API Reference
This page holds MRdataset's API documentation, which may help users and developers interface MRdataset with their own neuroimaging datasets. The sub-packages and modules fall into two categories: the core API and the high-level API. The core API contains modules for important elements (e.g. BaseDataset).
High-level API
The high-level API contains functions for importing datasets from disk. After importing, these objects can be saved and reloaded as pickle files.
- import_dataset(data_source: str | Path | List, ds_format: str = 'dicom', name: str | None = None, verbose: bool = False, is_complete: bool = True, config_path: str | Path | None = None, output_dir: str | Path | None = None, **_kwargs)
Create an MRdataset object from a data source according to the given arguments. This function acts as a wrapper around BaseDataset and is the main interface between this package and your dataset. It is used by both the CLI and Python scripts.
- Parameters:
- data_source: Union[str, Path, List]
Path to the dataset (path/to/my/dataset) containing the files, e.g. .dcm
- ds_format: str
Specify dataset type. Imports the module "{ds_format}.py", which will instantiate {ds_format}Dataset().
- name: str
Name/identifier for your dataset, like ADNI. The name is used to save files and reports. If not provided, a random name is generated, e.g. 54231
- verbose: bool
The flag allows you to change the verbosity of execution
- is_complete: bool
Whether the dataset is complete, as opposed to a subset of a larger dataset. It is useful for parallel processing of large datasets.
- config_path: Union[str, Path]
path to config file which contains the rules for reading the dataset e.g. sequences to read, subjects to ignore, etc.
- output_dir: Union[str, Path]
path to the directory where the output files will be saved.
- Returns:
- dataset: BaseDataset
dataset object containing the dataset
Examples
from MRdataset import import_dataset

data = import_dataset(data_source='/path/to/my/data/',
                      ds_format='dicom',
                      name='abcd_baseline',
                      config_path='mri-config.json',
                      output_dir='/path/to/my/output/dir/')
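The ds_format parameter described above selects a dataset class by name. The following is only a minimal sketch of that kind of dispatch, not MRdataset's actual code: the registry `_FORMATS`, the helper `pick_dataset_class`, and the stand-in classes are all hypothetical names introduced for illustration.

```python
class _SketchDataset:
    """Hypothetical stand-in for a dataset base class (not MRdataset's)."""

    def __init__(self, data_source, name=None):
        self.data_source = data_source
        self.name = name


class DicomDataset(_SketchDataset):
    """Stand-in only; the real class lives inside MRdataset."""


class BidsDataset(_SketchDataset):
    """Stand-in only; the real class lives inside MRdataset."""


# Assumed registry: maps a ds_format string to the class it selects.
_FORMATS = {"dicom": DicomDataset, "bids": BidsDataset}


def pick_dataset_class(ds_format):
    """Return the class registered for ds_format, or raise ValueError."""
    try:
        return _FORMATS[ds_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported ds_format: {ds_format!r}") from None


dataset_cls = pick_dataset_class("dicom")
dataset = dataset_cls("/path/to/my/data/", name="abcd_baseline")
```

A lookup table like this fails early with a clear error when an unknown format string is passed, which is one reasonable design for a wrapper function.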
- load_mr_dataset(filepath: str | Path) -> BaseDataset
Load a dataset from a file. The file must be a pickle file with extension .mrds.pkl
- Parameters:
- filepath: Union[str, Path]
path to the dataset file
- Returns:
- dataset: BaseDataset
dataset loaded from the file
Examples
from MRdataset import load_mr_dataset

dataset = load_mr_dataset('/path/to/my/dataset.mrds.pkl')
- save_mr_dataset(filepath: str | Path, mrds_obj: BaseDataset)
Save a dataset to a file with extension .mrds.pkl
- Parameters:
- filepath: Union[str, Path]
path to the dataset file
- mrds_obj: BaseDataset
dataset to be saved
- Returns:
- None
Examples
from MRdataset import import_dataset, save_mr_dataset

my_dataset = import_dataset(data_source='/path/to/my/data/',
                            ds_format='dicom',
                            name='abcd_baseline',
                            config_path='mri-config.json')
save_mr_dataset(filepath='/path/to/my/dataset.mrds.pkl', mrds_obj=my_dataset)
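The save and load functions above exchange datasets as pickle files with the extension .mrds.pkl. As a rough illustration of that round trip using only the standard library, the sketch below uses hypothetical helpers `save_sketch` and `load_sketch`. That both directions reject other extensions is an assumption here, and a plain dict stands in for a dataset object.

```python
import pickle
import tempfile
from pathlib import Path

SUFFIX = ".mrds.pkl"  # extension named in the documentation above


def _check_suffix(path):
    # Assumption: files with any other extension are rejected.
    if not path.name.endswith(SUFFIX):
        raise ValueError(f"expected a file ending in {SUFFIX}, got {path.name}")


def save_sketch(filepath, obj):
    """Pickle obj to filepath (hypothetical helper, not the library's)."""
    path = Path(filepath)
    _check_suffix(path)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)


def load_sketch(filepath):
    """Unpickle and return the object stored at filepath."""
    path = Path(filepath)
    _check_suffix(path)
    with path.open("rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.mrds.pkl"
    original = {"name": "abcd_baseline", "subjects": ["sub-01", "sub-02"]}
    save_sketch(target, original)
    restored = load_sketch(target)
```

Because unpickling can execute arbitrary code, pickle files should only be loaded from sources you trust.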
Core API
The core API contains modules for important elements of a neuroimaging experiment (e.g. Modality, Subject, Run).
- class DicomDataset
Bases: BaseDataset, ABC
This class represents a dataset of dicom files. It is a subclass of BaseDataset.
- Parameters:
- data_source: str or List[str]
The path or list of folders that contain the dicom files
- pattern: str
The pattern to match the file extension. Default is '*'.
- name: str
The name of the dataset. Default is 'DicomDataset'.
- config_path: str
The path to the config file.
- verbose: bool
Whether to print verbose output on console. Default is False.
- ds_format: str
The format of the dataset. Default is 'dicom'. Choose one of ['dicom']
Methods
- add(subject_id, session_id, seq_id, run_id, seq): Adds a given sequence to the provided subject_id, session_id and run_id for the dataset
- get(subject_id, session_id, seq_id, run_id): Returns a Sequence given subject/session/seq/run from the dataset
- get_sequence_ids(): Returns a list of all sequence IDs in the dataset
- get_subject_ids(seq_id): Returns a list of all subject IDs in the dataset for a given sequence ID
- load(): Default method to load the dataset.
- merge(other): Merges two dicom datasets
- save_process_log([output_dir]): Saves the log file to the output directory.
- subjects(): Returns a list of all subject IDs in the dataset
- traverse_horizontal(seq_id): Generator to traverse the dataset horizontally.
- traverse_vertical2(seq_id1, seq_id2): Generator to traverse the dataset vertically.
- traverse_vertical_multi(*seq_ids): Generator to traverse the dataset vertically.
- __init__(data_source, pattern='*', name='DicomDataset', config_path=None, verbose=False, output_dir=None, min_count=1, **kwargs)
constructor
- load()
Default method to load the dataset. It iterates over all the folders in the data_source and finds the sub-folders with at least min_count files. Then it iterates over all the sub-folders and processes them to find the dicom slices. It then runs some basic validation on them and adds them to the dataset.
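The folder-selection step described above can be sketched with pathlib. This is an illustration under stated assumptions rather than the library's implementation: `folders_with_min_files` is a hypothetical helper, empty placeholder files stand in for dicom slices, and no dicom parsing or validation is attempted.

```python
import tempfile
from pathlib import Path


def folders_with_min_files(data_source, pattern="*", min_count=1):
    """Yield (folder, files) for each folder under data_source that
    directly contains at least min_count files matching pattern."""
    root = Path(data_source)
    folders = sorted(p for p in [root, *root.rglob("*")] if p.is_dir())
    for folder in folders:
        files = sorted(f for f in folder.glob(pattern) if f.is_file())
        if len(files) >= min_count:
            yield folder, files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "series_a").mkdir()
    (root / "series_b").mkdir()
    # Three placeholder slices in series_a, a single one in series_b.
    for i in range(3):
        (root / "series_a" / f"slice_{i}.dcm").write_bytes(b"")
    (root / "series_b" / "slice_0.dcm").write_bytes(b"")

    kept = [folder.name
            for folder, _ in folders_with_min_files(root, "*.dcm", min_count=2)]
```

With min_count=2, only the folder holding three placeholder files passes the filter; the folder with a single file and the empty root are skipped.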
- merge(other)
Merges two dicom datasets
- save_process_log(output_dir=None)
Saves the log file to the output directory. This log file contains the information about how many dicom files were processed from each folder. This is used to speed up the loading process.
- Parameters:
- output_dir: str | Path
The path to the output directory. Default is None.
- class BidsDataset
Bases: BaseDataset, ABC
Class to represent a BIDS dataset. It is a subclass of BaseDataset. It gathers data from JSON files.
- Parameters:
- data_source: str or List[str]
The path to the dataset.
- pattern: str
The pattern to match for JSON files.
- name: str
The name of the dataset.
- config_path: str
The path to the config file.
- verbose: bool
Whether to print verbose output on console.
- ds_format: str
The format of the dataset. One of ['dicom', 'bids'].
Methods
- add(subject_id, session_id, seq_id, run_id, seq): Adds a given sequence to the provided subject_id, session_id and run_id for the dataset
- get(subject_id, session_id, seq_id, run_id): Returns a Sequence given subject/session/seq/run from the dataset
- get_sequence_ids(): Returns a list of all sequence IDs in the dataset
- get_subject_ids(seq_id): Returns a list of all subject IDs in the dataset for a given sequence ID
- load(): Default method to load the dataset.
- merge(other): Merges two datasets.
- subjects(): Returns a list of all subject IDs in the dataset
- traverse_horizontal(seq_id): Generator to traverse the dataset horizontally.
- traverse_vertical2(seq_id1, seq_id2): Generator to traverse the dataset vertically.
- traverse_vertical_multi(*seq_ids): Generator to traverse the dataset vertically.
- __init__(data_source, pattern='*.json', name='BidsDataset', config_path=None, verbose=False, output_dir=None, min_count=1, **kwargs)
constructor
- load()
Default method to load the dataset. It iterates over all the folders in the data_source and finds subfolders with at least min_count files matching the pattern. It then processes each subfolder and adds the sequence to the dataset.
- class BaseDataset
Bases: ABC
Base class for all datasets. The class provides a common interface to access the dataset in a hierarchical fashion. The hierarchy is as follows: Subject > Session > Sequence > Run
- Parameters:
- data_source: List | Path | str
valid path to the dataset on disk
- is_complete: bool
flag to indicate if the dataset is complete or not
- name: str
name of the dataset
- ds_format: str
format of the dataset. One of ['dicom', 'bids']
Methods
- add(subject_id, session_id, seq_id, run_id, seq): Adds a given sequence to the provided subject_id, session_id and run_id for the dataset
- get(subject_id, session_id, seq_id, run_id): Returns a Sequence given subject/session/seq/run from the dataset
- get_sequence_ids(): Returns a list of all sequence IDs in the dataset
- get_subject_ids(seq_id): Returns a list of all subject IDs in the dataset for a given sequence ID
- load(): Default method to load the dataset
- merge(other): Merges two datasets.
- subjects(): Returns a list of all subject IDs in the dataset
- traverse_horizontal(seq_id): Generator to traverse the dataset horizontally.
- traverse_vertical2(seq_id1, seq_id2): Generator to traverse the dataset vertically.
- traverse_vertical_multi(*seq_ids): Generator to traverse the dataset vertically.
- __init__(data_source: List | Path | str | None = None, is_complete: bool = True, name: str = 'Dataset', ds_format: str = 'dicom')
constructor
- add(subject_id, session_id, seq_id, run_id, seq)
Adds a given sequence to the provided subject_id, session_id and run_id for the dataset
- Parameters:
- subject_id: str
Unique identifier for the Subject. For example, a subject ID can be a string like 'sub-01' or '001'.
- session_id: str
Unique identifier for the Session. For example, a session ID can be a string like 'ses-01' or '001'. For DICOM datasets, this can be StudyInstanceUID.
- seq_id: str
Unique identifier for the Sequence. For example, a sequence ID can be a string like 'fMRI' or 't1w'.
- run_id: str
Unique identifier for the Run. For example, a run ID can be a string like 'run-01' or '001'. For DICOM datasets, this can be SeriesInstanceUID.
- seq: protocol.BaseSequence
Instance of the sequence
- get(subject_id, session_id, seq_id, run_id, default=None)
Returns a Sequence given subject/session/seq/run from the dataset
- Parameters:
- subject_id: str
Unique identifier for the Subject. For example, a subject ID can be a string like 'sub-01' or '001'.
- session_id: str
Unique identifier for the Session. For example, a session ID can be a string like 'ses-01' or '001'. For DICOM datasets, this can be StudyInstanceUID.
- seq_id: str
Unique identifier for the Sequence. For example, a sequence ID can be a string like 'fMRI' or 't1w'.
- run_id: str
Unique identifier for the Run. For example, a run ID can be a string like 'run-01' or '001'. For DICOM datasets, this can be SeriesInstanceUID.
- default: Any
Default value to return if the sequence is not found
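The add and get methods above operate on the Subject > Session > Sequence > Run hierarchy. One way to picture that behaviour is a nested dictionary, as in the sketch below. The class `HierarchySketch` and its `_tree` attribute are hypothetical, the internal storage of BaseDataset may differ, and plain strings stand in for sequence objects.

```python
class HierarchySketch:
    """Toy container keyed as subject > session > sequence > run."""

    def __init__(self):
        # Assumed storage: nested dicts, one level per hierarchy tier.
        self._tree = {}

    def add(self, subject_id, session_id, seq_id, run_id, seq):
        """Store seq, creating intermediate levels on demand."""
        runs = (self._tree.setdefault(subject_id, {})
                .setdefault(session_id, {})
                .setdefault(seq_id, {}))
        runs[run_id] = seq

    def get(self, subject_id, session_id, seq_id, run_id, default=None):
        """Return the stored sequence, or default if any level is missing."""
        try:
            return self._tree[subject_id][session_id][seq_id][run_id]
        except KeyError:
            return default


ds = HierarchySketch()
ds.add("sub-01", "ses-01", "t1w", "run-01", "T1_A")
found = ds.get("sub-01", "ses-01", "t1w", "run-01")
missing = ds.get("sub-01", "ses-02", "t1w", "run-01", default="not found")
```

Returning a default instead of raising mirrors the `default=None` parameter in the signature of get.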
- get_sequence_ids()
Returns a list of all sequence IDs in the dataset
- get_subject_ids(seq_id)
Returns a list of all subject IDs in the dataset for a given sequence ID
- Parameters:
- seq_id: str
Name of the Sequence ID
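The two lookups above can be pictured over a nested dictionary keyed as subject > session > sequence > run. The sketch below rests on that assumed layout; the functions `sequence_ids` and `subject_ids_for` are hypothetical, and sorting the sequence IDs is a choice made here, not documented behaviour.

```python
# Assumed layout: subject > session > sequence > run, with strings
# standing in for sequence objects.
tree = {
    "sub-01": {"ses-01": {"t1w": {"run-01": "T1_A"},
                          "fMRI": {"run-01": "BOLD_A"}}},
    "sub-02": {"ses-01": {"t1w": {"run-01": "T1_B"}}},
}


def sequence_ids(tree):
    """Return every sequence ID found in any session, sorted."""
    return sorted({seq_id
                   for sessions in tree.values()
                   for sequences in sessions.values()
                   for seq_id in sequences})


def subject_ids_for(tree, seq_id):
    """Return subjects with at least one session containing seq_id."""
    return [subject_id
            for subject_id, sessions in tree.items()
            if any(seq_id in sequences for sequences in sessions.values())]


all_sequences = sequence_ids(tree)
fmri_subjects = subject_ids_for(tree, "fMRI")
```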
- abstract load()
Default method to load the dataset
- merge(other)
Merges two datasets. This function is an alias for _merge(). It is provided for intuitive use. See _merge() for more details. It can be overloaded by the child classes to provide additional functionality.
- Parameters:
- other: BaseDataset
Another instance of BaseDataset to merge with the current dataset
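The semantics of _merge() are not spelled out here, so the following is only a sketch of what merging two hierarchies could look like. It assumes a nested-dictionary layout (subject > session > sequence > run) and that, when both sides hold the same run, the entry from `other` wins. The function `merge_trees` is hypothetical.

```python
import copy


def merge_trees(current, other):
    """Return a new tree holding the runs of both inputs.

    Assumption: on a clash for the same run, the entry from `other`
    replaces the one from `current`.
    """
    merged = copy.deepcopy(current)  # leave the inputs untouched
    for subject_id, sessions in other.items():
        for session_id, sequences in sessions.items():
            for seq_id, runs in sequences.items():
                target = (merged.setdefault(subject_id, {})
                          .setdefault(session_id, {})
                          .setdefault(seq_id, {}))
                target.update(runs)
    return merged


first = {"sub-01": {"ses-01": {"t1w": {"run-01": "T1_A"}}}}
second = {
    "sub-01": {"ses-01": {"fMRI": {"run-01": "BOLD_A"}}},
    "sub-02": {"ses-01": {"t1w": {"run-01": "T1_B"}}},
}
combined = merge_trees(first, second)
```

The sketch returns a new tree for clarity; whether the library's merge works in place or returns a new object is not stated in this reference.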
- subjects()
Returns a list of all subject IDs in the dataset
- traverse_horizontal(seq_id)
Generator to traverse the dataset horizontally, i.e., over all subjects, across sessions and runs, for a given sequence. The method will yield a tuple of (subject_id, session_id, run_id, sequence) for each sequence in the dataset.
- Parameters:
- seq_id: str
Name of the Sequence ID
- Yields:
- tuple_ids: tuple
A tuple of subject_id, session_id, run_id, and protocol.Sequence instance
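Horizontal traversal as described above can be sketched as a generator over a nested dictionary. The layout (subject > session > sequence > run) and the function `traverse_horizontal_sketch` are assumptions made for illustration, with strings standing in for sequence instances.

```python
tree = {
    "sub-01": {"ses-01": {"t1w": {"run-01": "T1_A"},
                          "fMRI": {"run-01": "BOLD_A"}}},
    "sub-02": {"ses-01": {"t1w": {"run-01": "T1_B", "run-02": "T1_C"}}},
}


def traverse_horizontal_sketch(tree, seq_id):
    """Yield (subject_id, session_id, run_id, seq) for every run of
    seq_id, across all subjects and sessions."""
    for subject_id, sessions in tree.items():
        for session_id, sequences in sessions.items():
            # Sessions without seq_id contribute nothing.
            for run_id, seq in sequences.get(seq_id, {}).items():
                yield subject_id, session_id, run_id, seq


rows = list(traverse_horizontal_sketch(tree, "t1w"))
```

This kind of traversal suits checks that compare the same sequence across subjects, since every yielded item belongs to one sequence ID.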
- traverse_vertical2(seq_id1, seq_id2)
Generator to traverse the dataset vertically, i.e., over sequences for a particular subject. The method will yield sequences from the same session for a given subject. For example, fMRI and associated field maps from the same session.
- Parameters:
- seq_id1: str
Name of the first Sequence ID
- seq_id2: str
Name of the second Sequence ID
- Yields:
- tuple_ids: tuple
A tuple of subj, sess, run, seq_one, seq_two
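Vertical traversal over two sequences can be sketched in the same way. Since the yielded tuple carries a single run, this sketch assumes the two sequences are paired when they share a run ID within a session. That pairing rule, the nested-dictionary layout, and the function `traverse_vertical2_sketch` are all assumptions, not confirmed library behaviour.

```python
tree = {
    "sub-01": {"ses-01": {"fMRI": {"run-01": "BOLD_A"},
                          "fmap": {"run-01": "FMAP_A"}}},
    "sub-02": {"ses-01": {"fMRI": {"run-01": "BOLD_B"}}},  # no field map
}


def traverse_vertical2_sketch(tree, seq_id1, seq_id2):
    """Yield (subj, sess, run, seq_one, seq_two) for sessions that
    contain both sequences under the same run ID (assumed pairing)."""
    for subject_id, sessions in tree.items():
        for session_id, sequences in sessions.items():
            runs_one = sequences.get(seq_id1, {})
            runs_two = sequences.get(seq_id2, {})
            for run_id, seq_one in runs_one.items():
                if run_id in runs_two:
                    yield (subject_id, session_id, run_id,
                           seq_one, runs_two[run_id])


pairs = list(traverse_vertical2_sketch(tree, "fMRI", "fmap"))
```

Subjects lacking either sequence in a session are skipped, so the consumer only sees complete pairs.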
- traverse_vertical_multi(*seq_ids)
Generator to traverse the dataset vertically, i.e., over sequences for a particular subject. The method will yield multiple sequences from the same session for a given subject. For example, fMRI, t1w and associated field maps from the same session.
- Parameters:
- seq_ids: list
Sequence IDs to retrieve from the dataset
- Returns:
- tuple_ids_data: tuple
A tuple of subj, sess, tuple_runs, tuple_seqs