API Reference#

This page holds MRdataset's API documentation, which may be helpful for users or developers building an interface to their own neuroimaging datasets. The sub-packages and modules fall into two categories: the high-level API and the core API. The core API contains modules for the important elements of a neuroimaging experiment (e.g. BaseDataset).

High-level API#

The high-level API contains functions that are useful for importing datasets from disk. After importing, these objects can be saved and reloaded as pickle files.

import_dataset(
data_source: str | Path | List,
ds_format: str = 'dicom',
name: str | None = None,
verbose: bool = False,
is_complete: bool = True,
config_path: str | Path | None = None,
output_dir: str | Path | None = None,
**_kwargs,
) → BaseDataset#

Create an MRdataset from the data source as per the arguments. This function acts as a wrapper around BaseDataset and is the main interface between this package and your dataset. It is used by both the CLI and Python scripts.

Parameters:
data_source: Union[str, Path, List]

path to the dataset, e.g. /path/to/my/dataset, containing files such as .dcm

ds_format: str

Specify the dataset type. Imports the module "{ds_format}.py", which will instantiate {ds_format}Dataset().

name: str

Name/Identifier for your dataset, e.g. ADNI. The name is used to save files and reports. If not provided, a random name (e.g. 54231) is generated.

verbose: bool

Flag to control the verbosity of execution

is_complete: bool

Whether the dataset is complete or only a subset of a larger dataset. This is useful for parallel processing of large datasets.

config_path: Union[str, Path]

path to the config file, which contains the rules for reading the dataset, e.g. sequences to read, subjects to ignore, etc.

output_dir: Union[str, Path]

path to the directory where the output files will be saved.

Returns:
dataset: BaseDataset

dataset object containing the imported data

Examples

from MRdataset import import_dataset
data = import_dataset(data_source='/path/to/my/data/',
                      ds_format='dicom', name='abcd_baseline',
                      config_path='mri-config.json',
                      output_dir='/path/to/my/output/dir/')

load_mr_dataset(filepath: str | Path) → BaseDataset#

Load a dataset from a file. The file must be a pickle file with extension .mrds.pkl

Parameters:
filepath: Union[str, Path]

path to the dataset file

Returns:
dataset: BaseDataset

dataset loaded from the file

Examples

from MRdataset import load_mr_dataset
dataset = load_mr_dataset('/path/to/my/dataset.mrds.pkl')

save_mr_dataset(
filepath: str | Path,
mrds_obj: BaseDataset,
) → None#

Save a dataset to a file with extension .mrds.pkl

Parameters:
filepath: Union[str, Path]

path to the dataset file

mrds_obj: BaseDataset

dataset to be saved

Returns:
None

Examples

from MRdataset import import_dataset, save_mr_dataset
my_dataset = import_dataset(data_source='/path/to/my/data/',
                            ds_format='dicom', name='abcd_baseline',
                            config_path='mri-config.json')
save_mr_dataset(filepath='/path/to/my/dataset.mrds.pkl',
                mrds_obj=my_dataset)

Core API#

The Core API contains modules for the important elements (e.g. Modality, Subject, Run, etc.) in a neuroimaging experiment.

class DicomDataset#

Bases: BaseDataset, ABC

This class represents a dataset of dicom files. It is a subclass of BaseDataset.

Parameters:
data_source: str or List[str]

The path or list of folders that contain the dicom files

pattern: str

The pattern to match the file extension. Default is '*'.

name: str

The name of the dataset. Default is 'DicomDataset'.

config_path: str

The path to the config file.

verbose: bool

Whether to print verbose output on console. Default is False.

ds_format: str

The format of the dataset. Default is 'dicom'. Choose one of ['dicom']

Methods

add(subject_id, session_id, seq_id, run_id, seq)

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

get(subject_id, session_id, seq_id, run_id)

Returns a Sequence given subject/session/seq/run from the dataset

get_sequence_ids()

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)

Returns a list of all subject IDs in the dataset for a given sequence ID

load()

Default method to load the dataset.

merge(other)

Merges two dicom datasets

save_process_log([output_dir])

Saves the log file to the output directory.

subjects()

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)

Generator to traverse the dataset horizontally.

traverse_vertical2(seq_id1, seq_id2)

Generator to traverse the dataset vertically.

traverse_vertical_multi(*seq_ids)

Generator to traverse the dataset vertically.

__init__(
data_source,
pattern='*',
name='DicomDataset',
config_path=None,
verbose=False,
output_dir=None,
min_count=1,
**kwargs,
)#

constructor

load()#

Default method to load the dataset. It iterates over all the folders in the data_source and finds the sub-folders with at least min_count files. Then it iterates over all the sub-folders and processes them to find the dicom slices. It then runs some basic validation on them and adds them to the dataset.
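
A minimal usage sketch of DicomDataset; the import path, the data paths and the config file below are assumptions, not part of the documented API:

from MRdataset import DicomDataset  # assumed to be exported at the package top level

# hypothetical paths; replace with your own
ds = DicomDataset(data_source='/path/to/my/dicom/data/',
                  pattern='*',
                  name='my_dicom_study',
                  config_path='mri-config.json',
                  output_dir='/path/to/my/output/dir/')
ds.load()  # walk data_source, validate the dicom slices and populate the dataset
print(ds.get_sequence_ids())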

merge(other)#

Merges two dicom datasets

save_process_log(output_dir=None)#

Saves the log file to the output directory. This log file contains information about how many dicom files were processed from each folder. It is used to speed up the loading process.

Parameters:
output_dir: str | Path

The path to the output directory. Default is None.
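
Continuing the DicomDataset sketch above, the processing log could be written alongside the other outputs; the directory is a placeholder:

ds.save_process_log(output_dir='/path/to/my/output/dir/')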

class BidsDataset#

Bases: BaseDataset, ABC

Class to represent a BIDS dataset. It is a subclass of BaseDataset. It gathers data from JSON files.

Parameters:
data_source: str or List[str]

The path to the dataset.

pattern: str

The pattern to match for JSON files.

name: str

The name of the dataset.

config_path: str

The path to the config file.

verbose: bool

Whether to print verbose output on console.

ds_format: str

The format of the dataset. One of ['dicom', 'bids'].

Methods

add(subject_id, session_id, seq_id, run_id, seq)

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

get(subject_id, session_id, seq_id, run_id)

Returns a Sequence given subject/session/seq/run from the dataset

get_sequence_ids()

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)

Returns a list of all subject IDs in the dataset for a given sequence ID

load()

Default method to load the dataset.

merge(other)

Merges two datasets.

subjects()

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)

Generator to traverse the dataset horizontally.

traverse_vertical2(seq_id1, seq_id2)

Generator to traverse the dataset vertically.

traverse_vertical_multi(*seq_ids)

Generator to traverse the dataset vertically.

__init__(
data_source,
pattern='*.json',
name='BidsDataset',
config_path=None,
verbose=False,
output_dir=None,
min_count=1,
**kwargs,
)#

constructor

load()#

Default method to load the dataset. It iterates over all the folders in the data_source and finds subfolders with at least min_count files matching the pattern. It then processes each subfolder and adds the sequence to the dataset.
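
A minimal usage sketch of BidsDataset; the import path, the data paths and the config file are assumptions, not part of the documented API:

from MRdataset import BidsDataset  # assumed to be exported at the package top level

# hypothetical BIDS directory containing JSON sidecar files
bids_ds = BidsDataset(data_source='/path/to/my/bids/data/',
                      pattern='*.json',
                      name='my_bids_study',
                      config_path='mri-config.json')
bids_ds.load()  # scan the JSON sidecars and populate the dataset
for seq_id in bids_ds.get_sequence_ids():
    print(seq_id, len(bids_ds.get_subject_ids(seq_id)))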

class BaseDataset#

Bases: ABC

Base class for all datasets. The class provides a common interface to access the dataset in a hierarchical fashion. The hierarchy is as follows: Subject > Session > Sequence > Run

Parameters:
data_source: List | Path | str

valid path to the dataset on disk

is_complete: bool

flag to indicate if the dataset is complete or not

name: str

name of the dataset

ds_format: str

format of the dataset. One of ['dicom', 'bids']

Methods

add(subject_id, session_id, seq_id, run_id, seq)

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

get(subject_id, session_id, seq_id, run_id)

Returns a Sequence given subject/session/seq/run from the dataset

get_sequence_ids()

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)

Returns a list of all subject IDs in the dataset for a given sequence ID

load()

default method to load the dataset

merge(other)

Merges two datasets.

subjects()

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)

Generator to traverse the dataset horizontally.

traverse_vertical2(seq_id1, seq_id2)

Generator to traverse the dataset vertically.

traverse_vertical_multi(*seq_ids)

Generator to traverse the dataset vertically.

__init__(
data_source: List | Path | str | None = None,
is_complete: bool = True,
name: str = 'Dataset',
ds_format: str = 'dicom',
)#

constructor

add(subject_id, session_id, seq_id, run_id, seq)#

Adds a given sequence to the provided subject_id, session_id and run_id for the dataset

Parameters:
subject_id: str

Unique identifier for the Subject. For example, a subject ID can be a string like 'sub-01' or '001'.

session_id: str

Unique identifier for the Session. For example, a session ID can be a string like 'ses-01' or '001'. For DICOM datasets, this can be the StudyInstanceUID.

seq_id: str

Unique identifier for the Sequence. For example, a sequence ID can be a string like 'fMRI' or 't1w'.

run_id: str

Unique identifier for the Run. For example, a run ID can be a string like 'run-01' or '001'. For DICOM datasets, this can be the SeriesInstanceUID.

seq: protocol.BaseSequence

Instance of the sequence

get(subject_id, session_id, seq_id, run_id, default=None)#

Returns a Sequence given subject/session/seq/run from the dataset

Parameters:
subject_id: str

Unique identifier for the Subject. For example, a subject ID can be a string like 'sub-01' or '001'.

session_id: str

Unique identifier for the Session. For example, a session ID can be a string like 'ses-01' or '001'. For DICOM datasets, this can be the StudyInstanceUID.

seq_id: str

Unique identifier for the Sequence. For example, a sequence ID can be a string like 'fMRI' or 't1w'.

run_id: str

Unique identifier for the Run. For example, a run ID can be a string like 'run-01' or '001'. For DICOM datasets, this can be the SeriesInstanceUID.

default: Any

Default value to return if the sequence is not found
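
A sketch of how get() and add() (documented above) might be combined to copy a single run between two datasets; the identifiers and paths are hypothetical placeholders:

from MRdataset import import_dataset

ds = import_dataset(data_source='/path/to/my/data/', ds_format='dicom')
subset = import_dataset(data_source='/path/to/my/other/data/', ds_format='dicom')

# fetch a specific run; returns the default (None) if it is absent
seq = ds.get(subject_id='sub-01', session_id='ses-01',
             seq_id='t1w', run_id='run-01', default=None)

if seq is not None:
    # register the same sequence under the same identifiers in the other dataset
    subset.add(subject_id='sub-01', session_id='ses-01',
               seq_id='t1w', run_id='run-01', seq=seq)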

get_sequence_ids()#

Returns a list of all sequence IDs in the dataset

get_subject_ids(seq_id)#

Returns a list of all subject IDs in the dataset for a given sequence ID

Parameters:
seq_id: str

Name of the Sequence ID
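
For example, to count subjects per sequence (a sketch, assuming ds is an already imported dataset):

for seq_id in ds.get_sequence_ids():
    subject_ids = ds.get_subject_ids(seq_id)
    print(f'{seq_id}: {len(subject_ids)} subjects')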

abstract load()#

default method to load the dataset

merge(other)#

Merges two datasets. This function is an alias for _merge() and is provided for intuitive use; see _merge() for more details. It can be overridden by child classes to provide additional functionality.

Parameters:
other: BaseDataset

Another instance of BaseDataset to merge with the current dataset
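
A sketch of merging two partial datasets, e.g. subsets imported in parallel with is_complete=False; the paths are placeholders:

from MRdataset import import_dataset

part1 = import_dataset(data_source='/path/to/part1/', ds_format='dicom',
                       is_complete=False)
part2 = import_dataset(data_source='/path/to/part2/', ds_format='dicom',
                       is_complete=False)
part1.merge(part2)  # part1 now also contains the subjects/sequences from part2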

subjects()#

Returns a list of all subject IDs in the dataset

traverse_horizontal(seq_id)#

Generator to traverse the dataset horizontally, i.e., across all subjects, sessions and runs for a given sequence. The method yields a tuple of (subject_id, session_id, run_id, sequence) for each run of the given sequence in the dataset.

Parameters:
seq_id: str

Name of the Sequence ID

Yields:
tuple_ids: tuple

A tuple of subject_id, session_id, run_id, and protocol.Sequence instance
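
For example, iterating over every run of one sequence (a sketch; ds and the sequence ID 't1w' are placeholders):

for subject_id, session_id, run_id, seq in ds.traverse_horizontal('t1w'):
    print(subject_id, session_id, run_id, seq)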

traverse_vertical2(seq_id1, seq_id2)#

Generator to traverse the dataset vertically, i.e., over the sequences of a particular subject. The method yields pairs of sequences from the same session for a given subject, for example, an fMRI scan and its associated field map from the same session.

Parameters:
seq_id1: str

Name of the Sequence ID

seq_id2: str

Name of the Sequence ID

Yields:
tuple_ids: tuple

A tuple of subj, sess, run, seq_one, seq_two
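
For example, pairing an fMRI run with its field map from the same session (a sketch; ds and the sequence IDs are placeholders):

for subj, sess, run, seq_one, seq_two in ds.traverse_vertical2('fMRI', 'field_map'):
    # seq_one and seq_two belong to the same session of the same subject
    print(subj, sess, run)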

traverse_vertical_multi(*seq_ids)#

Generator to traverse the dataset vertically, i.e., over the sequences of a particular subject. The method yields multiple sequences from the same session for a given subject, for example, fMRI, t1w and the associated field maps from the same session.

Parameters:
seq_ids: list

Sequence IDs to retrieve from the dataset

Yields:
tuple_ids_data: tuple

A tuple of subj, sess, tuple_runs, tuple_seqs
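
For example, pulling several sequences from the same session at once (a sketch; ds and the sequence IDs are placeholders):

for subj, sess, tuple_runs, tuple_seqs in ds.traverse_vertical_multi('fMRI', 't1w', 'field_map'):
    # tuple_seqs holds one sequence per requested seq_id, all from the same session
    print(subj, sess, tuple_runs)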