API Reference#

This page holds MRdataset's API documentation, which may be helpful for users or developers building an interface to their own neuroimaging datasets. The sub-packages and modules fall into two categories: the high-level API and the core API. The core API contains modules for the important elements of a neuroimaging experiment (e.g. BaseDataset).

High-level API#

The high-level API contains functions that are useful for importing datasets from disk. After importing, these objects can be saved and reloaded as pickle files.

import_dataset(
data_source: str | Path | List,
ds_format: str = 'dicom',
name: str | None = None,
verbose: bool = False,
is_complete: bool = True,
config_path: str | Path | None = None,
output_dir: str | Path | None = None,
**_kwargs,
) → BaseDataset#

Create an MRdataset from the data source as per the arguments. This function acts as a wrapper around BaseDataset and is the main interface between this package and your dataset. It is used by both the CLI and Python scripts.

Parameters:
data_source: Union[str, Path, List]

path to the dataset, e.g. /path/to/my/dataset, containing files such as .dcm

ds_format: str

Specify the dataset type. Imports the module "{ds_format}.py", which will instantiate {ds_format}Dataset().

name: str

Name/Identifier for your dataset, e.g. ADNI. The name is used to save files and reports. If not provided, a random name (e.g. 54231) is generated.

verbose: bool

Flag to control the verbosity of execution

is_complete: bool

Whether the dataset is complete or only a subset of a larger dataset. This is useful for parallel processing of large datasets.

config_path: Union[str, Path]

path to the config file, which contains the rules for reading the dataset, e.g. sequences to read, subjects to ignore, etc.

output_dir: Union[str, Path]

path to the directory where the output files will be saved.

Returns:
dataset: BaseDataset

dataset object containing the imported data

Examples

from MRdataset import import_dataset
data = import_dataset(data_source='/path/to/my/data/',
                      ds_format='dicom', name='abcd_baseline',
                      config_path='mri-config.json',
                      output_dir='/path/to/my/output/dir/')

load_mr_dataset(filepath: str | Path) → BaseDataset#

Load a dataset from a file. The file must be a pickle file with extension .mrds.pkl

Parameters:
filepath: Union[str, Path]

path to the dataset file

Returns:
dataset: BaseDataset

dataset loaded from the file

Examples

from MRdataset import load_mr_dataset
dataset = load_mr_dataset('/path/to/my/dataset.mrds.pkl')

save_mr_dataset(
filepath: str | Path,
mrds_obj: BaseDataset,
) → None#

Save a dataset to a file with extension .mrds.pkl

Parameters:
filepath: Union[str, Path]

path to the dataset file

mrds_obj: BaseDataset

dataset to be saved

Returns:
None

Examples

from MRdataset import import_dataset, save_mr_dataset
my_dataset = import_dataset(data_source='/path/to/my/data/',
                            ds_format='dicom', name='abcd_baseline',
                            config_path='mri-config.json')
save_mr_dataset(filepath='/path/to/my/dataset.mrds.pkl',
                mrds_obj=my_dataset)

Core API#

The Core API contains modules for the important elements (e.g. Modality, Subject, Run, etc.) in a neuroimaging experiment.

class DicomDataset#

Bases: BaseDataset, ABC

This class represents a dataset of dicom files. It is a subclass of BaseDataset.

Parameters:
data_source: str or List[str]

The path or list of folders that contain the dicom files

pattern: str

The pattern to match the file extension. Default is '*'.

name: str

The name of the dataset. Default is 'DicomDataset'.

config_path: str

The path to the config file.

verbose: bool

Whether to print verbose output on console. Default is False.

ds_format: str

The format of the dataset. Default is 'dicom'. Choose one of ['dicom']

Methods

add(subject_id, session_id, seq_id, run_id, seq)

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

get(subject_id, session_id, seq_id, run_id)

Returns a Sequence given subject/session/seq/run from the dataset

get_sequence_ids()

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)

Returns a list of all subject IDs in the dataset for a given sequence ID

load()

Default method to load the dataset.

merge(other)

Merges two dicom datasets

save_process_log([output_dir])

Saves the log file to the output directory.

subjects()

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)

Generator to traverse the dataset horizontally.

traverse_vertical2(seq_id1, seq_id2)

Generator to traverse the dataset vertically.

traverse_vertical_multi(*seq_ids)

Generator to traverse the dataset vertically.

__init__(
data_source,
pattern='*',
name='DicomDataset',
config_path=None,
verbose=False,
output_dir=None,
min_count=1,
**kwargs,
)#

constructor

load()#

Default method to load the dataset. It iterates over all the folders in the data_source and finds the sub-folders with at least min_count files. Then it iterates over all the sub-folders and processes them to find the dicom slices. It then runs some basic validation on them and adds them to the dataset.
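
A minimal usage sketch of DicomDataset; the import path, the data paths and the config file below are assumptions, not part of the documented API:

from MRdataset import DicomDataset  # assumed to be exported at the package top level

# hypothetical paths; replace with your own
ds = DicomDataset(data_source='/path/to/my/dicom/data/',
                  pattern='*',
                  name='my_dicom_study',
                  config_path='mri-config.json',
                  output_dir='/path/to/my/output/dir/')
ds.load()  # walk data_source, validate the dicom slices and populate the dataset
print(ds.get_sequence_ids())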

merge(other)#

Merges two dicom datasets

save_process_log(output_dir=None)#

Saves the log file to the output directory. This log file contains information about how many dicom files were processed from each folder. It is used to speed up the loading process.

Parameters:
output_dir: str | Path

The path to the output directory. Default is None.
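
Continuing the DicomDataset sketch above, the processing log could be written alongside the other outputs; the directory is a placeholder:

ds.save_process_log(output_dir='/path/to/my/output/dir/')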

class BidsDataset#

Bases: BaseDataset, ABC

Class to represent a BIDS dataset. It is a subclass of BaseDataset. It gathers data from JSON files.

Parameters:
data_source: str or List[str]

The path to the dataset.

pattern: str

The pattern to match for JSON files.

name: str

The name of the dataset.

config_path: str

The path to the config file.

verbose: bool

Whether to print verbose output on console.

ds_format: str

The format of the dataset. One of ['dicom', 'bids'].

Methods

add(subject_id, session_id, seq_id, run_id, seq)

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

get(subject_id, session_id, seq_id, run_id)

Returns a Sequence given subject/session/seq/run from the dataset

get_sequence_ids()

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)

Returns a list of all subject IDs in the dataset for a given sequence ID

load()

Default method to load the dataset.

merge(other)

Merges two datasets.

subjects()

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)

Generator to traverse the dataset horizontally.

traverse_vertical2(seq_id1, seq_id2)

Generator to traverse the dataset vertically.

traverse_vertical_multi(*seq_ids)

Generator to traverse the dataset vertically.

__init__(
data_source,
pattern='*.json',
name='BidsDataset',
config_path=None,
verbose=False,
output_dir=None,
min_count=1,
**kwargs,
)#

constructor

load()#

Default method to load the dataset. It iterates over all the folders in the data_source and finds subfolders with at least min_count files matching the pattern. It then processes each subfolder and adds the sequence to the dataset.
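
A minimal usage sketch of BidsDataset; the import path, the data paths and the config file are assumptions, not part of the documented API:

from MRdataset import BidsDataset  # assumed to be exported at the package top level

# hypothetical BIDS directory containing JSON sidecar files
bids_ds = BidsDataset(data_source='/path/to/my/bids/data/',
                      pattern='*.json',
                      name='my_bids_study',
                      config_path='mri-config.json')
bids_ds.load()  # scan the JSON sidecars and populate the dataset
for seq_id in bids_ds.get_sequence_ids():
    print(seq_id, len(bids_ds.get_subject_ids(seq_id)))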

class BaseDataset#

Bases: ABC

Base class for all datasets. The class provides a common interface to access the dataset in a hierarchical fashion. The hierarchy is as follows: Subject > Session > Sequence > Run

Parameters:
data_source: List | Path | str

valid path to the dataset on disk

is_complete: bool

flag to indicate if the dataset is complete or not

name: str

name of the dataset

ds_format: str

format of the dataset. One of ['dicom', 'bids']

Methods

add(subject_id, session_id, seq_id, run_id, seq)

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

get(subject_id, session_id, seq_id, run_id)

Returns a Sequence given subject/session/seq/run from the dataset

get_sequence_ids()

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)

Returns a list of all subject IDs in the dataset for a given sequence ID

load()

default method to load the dataset

merge(other)

Merges two datasets.

subjects()

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)

Generator to traverse the dataset horizontally.

traverse_vertical2(seq_id1, seq_id2)

Generator to traverse the dataset vertically.

traverse_vertical_multi(*seq_ids)

Generator to traverse the dataset vertically.

__init__(
data_source: List | Path | str | None = None,
is_complete: bool = True,
name: str = 'Dataset',
ds_format: str = 'dicom',
)#

constructor

add(subject_id, session_id, seq_id, run_id, seq)#

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

Parameters:
subject_id: str

Unique identifier for the Subject. For example, a subject ID can be a string like 'sub-01' or '001'.

session_id: str

Unique identifier for the Session. For example, a session ID can be a string like 'ses-01' or '001'. For DICOM datasets, this can be the StudyInstanceUID.

seq_id: str

Unique identifier for the Sequence. For example, a sequence ID can be a string like 'fMRI' or 't1w'.

run_id: str

Unique identifier for the Run. For example, a run ID can be a string like 'run-01' or '001'. For DICOM datasets, this can be the SeriesInstanceUID.

seq: protocol.BaseSequence

Instance of the sequence

get(subject_id, session_id, seq_id, run_id, default=None)#

Returns a Sequence given subject/session/seq/run from the dataset

Parameters:
subject_id: str

Unique identifier for the Subject. For example, a subject ID can be a string like 'sub-01' or '001'.

session_id: str

Unique identifier for the Session. For example, a session ID can be a string like 'ses-01' or '001'. For DICOM datasets, this can be the StudyInstanceUID.

seq_id: str

Unique identifier for the Sequence. For example, a sequence ID can be a string like 'fMRI' or 't1w'.

run_id: str

Unique identifier for the Run. For example, a run ID can be a string like 'run-01' or '001'. For DICOM datasets, this can be the SeriesInstanceUID.

default: Any

Default value to return if the sequence is not found
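
A sketch of how get() and add() (documented above) might be combined to copy a single run between two datasets; the identifiers and paths are hypothetical placeholders:

from MRdataset import import_dataset

ds = import_dataset(data_source='/path/to/my/data/', ds_format='dicom')
subset = import_dataset(data_source='/path/to/my/other/data/', ds_format='dicom')

# fetch a specific run; returns the default (None) if it is absent
seq = ds.get(subject_id='sub-01', session_id='ses-01',
             seq_id='t1w', run_id='run-01', default=None)

if seq is not None:
    # register the same sequence under the same identifiers in the other dataset
    subset.add(subject_id='sub-01', session_id='ses-01',
               seq_id='t1w', run_id='run-01', seq=seq)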

get_sequence_ids()#

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)#

Returns a list of all subject IDs in the dataset for a given sequence ID

Parameters:
seq_id: str

Name of the Sequence ID
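
For example, to count subjects per sequence (a sketch, assuming ds is an already imported dataset):

for seq_id in ds.get_sequence_ids():
    subject_ids = ds.get_subject_ids(seq_id)
    print(f'{seq_id}: {len(subject_ids)} subjects')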

abstract load()#

default method to load the dataset

merge(other)#

Merges two datasets. This function is an alias for _merge() and is provided for intuitive use; see _merge() for more details. It can be overridden by child classes to provide additional functionality.

Parameters:
other: BaseDataset

Another instance of BaseDataset to merge with the current dataset
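
A sketch of merging two partial datasets, e.g. subsets imported in parallel with is_complete=False; the paths are placeholders:

from MRdataset import import_dataset

part1 = import_dataset(data_source='/path/to/part1/', ds_format='dicom',
                       is_complete=False)
part2 = import_dataset(data_source='/path/to/part2/', ds_format='dicom',
                       is_complete=False)
part1.merge(part2)  # part1 now also contains the subjects/sequences from part2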

subjects()#

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)#

Generator to traverse the dataset horizontally, i.e., across all subjects, sessions and runs for a given sequence. The method yields a tuple of (subject_id, session_id, run_id, sequence) for each run of the given sequence in the dataset.

Parameters:
seq_id: str

Name of the Sequence ID

Yields:
tuple_ids: tuple

A tuple of subject_id, session_id, run_id, and protocol.Sequence instance
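
For example, iterating over every run of one sequence (a sketch; ds and the sequence ID 't1w' are placeholders):

for subject_id, session_id, run_id, seq in ds.traverse_horizontal('t1w'):
    print(subject_id, session_id, run_id, seq)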

traverse_vertical2(seq_id1, seq_id2)#

Generator to traverse the dataset vertically, i.e., over the sequences of a particular subject. The method yields pairs of sequences from the same session for a given subject, for example, an fMRI scan and its associated field map from the same session.

Parameters:
seq_id1: str

Name of the Sequence ID

seq_id2: str

Name of the Sequence ID

Yields:
tuple_ids: tuple

A tuple of subj, sess, run, seq_one, seq_two
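
For example, pairing an fMRI run with its field map from the same session (a sketch; ds and the sequence IDs are placeholders):

for subj, sess, run, seq_one, seq_two in ds.traverse_vertical2('fMRI', 'field_map'):
    # seq_one and seq_two belong to the same session of the same subject
    print(subj, sess, run)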

traverse_vertical_multi(*seq_ids)#

Generator to traverse the dataset vertically, i.e., over the sequences of a particular subject. The method yields multiple sequences from the same session for a given subject, for example, fMRI, t1w and the associated field maps from the same session.

Parameters:
seq_ids: list

Sequence IDs to retrieve from the dataset

Yields:
tuple_ids_data: tuple

A tuple of subj, sess, tuple_runs, tuple_seqs
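
For example, pulling several sequences from the same session at once (a sketch; ds and the sequence IDs are placeholders):

for subj, sess, tuple_runs, tuple_seqs in ds.traverse_vertical_multi('fMRI', 't1w', 'field_map'):
    # tuple_seqs holds one sequence per requested seq_id, all from the same session
    print(subj, sess, tuple_runs)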