API Reference

For the sake of brevity, we highlight only the key parts of the mrQA API.

The most important method is mrQA.project.check_compliance. Below is a summarized reference of commonly used methods.

mrQA.project module

mrQA.project.check_compliance(dataset: BaseDataset, strategy: str = 'majority', decimals: int = 3, output_dir: Optional[Union[Path, str]] = None, verbose: bool = False)[source]

Main function for checking compliance. Infers the reference protocol according to the user-chosen strategy, and then generates a compliance report.

Parameters
  • dataset (BaseDataset) – BaseDataset instance for the dataset to be checked for compliance

  • strategy (str) – Strategy employed to specify or automatically infer the reference protocol. Currently, ‘majority’ is the only allowed option

  • output_dir (Union[Path, str]) – Path to the directory in which to save the report

  • decimals (int) – Number of decimal places to round to (default: 3)

  • verbose (bool) – If True, print more details about the execution

Returns

report_path – Path to the generated report

Return type

Path

Raises
  • ValueError – If the input dataset is empty or otherwise invalid

  • NotImplementedError – If the input strategy is not supported

  • NotADirectoryError – If the output directory doesn’t exist
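
A minimal usage sketch for the common case. Note that the `import_dataset` helper from the companion MRdataset package, the placeholder study name, and the deferred-import structure below are illustrative assumptions, not part of this reference:

```python
def run_compliance_check(data_folder: str, output_dir: str):
    """Sketch: import a DICOM dataset and check it for compliance."""
    # Imports deferred so the sketch can be defined without mrQA installed.
    from MRdataset import import_dataset  # assumed companion package
    from mrQA.project import check_compliance

    dataset = import_dataset(data_source=data_folder, ds_format='dicom',
                             name='my_study')  # 'my_study' is a placeholder
    # Returns the Path to the generated compliance report
    return check_compliance(dataset, strategy='majority', output_dir=output_dir)
```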

mrQA.project.compare_with_majority(dataset: BaseDataset, decimals: int = 3) BaseDataset[source]

Method for post-acquisition compliance. Infers the reference protocol/values by looking for the most frequent values, and then identifies deviations from them.

Parameters
  • dataset (BaseDataset) – BaseDataset instance for the dataset which is to be checked for compliance

  • decimals (int) – Number of decimal places to round to (default: 3)

Returns

dataset – Adds the non-compliance information to the same BaseDataset instance and returns it.

Return type

BaseDataset
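
The majority strategy is easy to illustrate in isolation. The following self-contained sketch is not mrQA code; it only demonstrates the underlying idea of taking the most frequent (rounded) value as the reference and flagging subjects that deviate from it:

```python
from collections import Counter

def infer_reference(values, decimals=3):
    """Return the most frequent value (after rounding) as the reference."""
    rounded = [round(v, decimals) for v in values]
    return Counter(rounded).most_common(1)[0][0]

def find_deviations(subject_values, decimals=3):
    """Map subject -> value for subjects deviating from the majority value."""
    reference = infer_reference(subject_values.values(), decimals)
    return {sub: val for sub, val in subject_values.items()
            if round(val, decimals) != reference}

# Illustrative data: one acquisition parameter across subjects;
# sub-03 deviates from the majority value 2.0
tr = {'sub-01': 2.0, 'sub-02': 2.0, 'sub-03': 2.3, 'sub-04': 2.0}
print(infer_reference(tr.values()))   # 2.0
print(find_deviations(tr))            # {'sub-03': 2.3}
```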

mrQA.project.generate_report(dataset: BaseDataset, report_path: str, sub_lists_dir_path: str, output_dir: Union[Path, str]) Path[source]

Generates an HTML report aggregating and summarizing the non-compliance discovered in the dataset.

Parameters
  • dataset (BaseDataset) – BaseDataset instance for the dataset which is to be checked

  • report_path (str) – Name of the file to be generated, without extension. Ensures that naming is consistent across the report, dataset, and record files

  • sub_lists_dir_path (str) – Path to the directory in which the subject lists should be stored

  • output_dir (Union[Path, str]) – Directory in which the generated report should be stored.

Returns

output_path – Path to the generated report

Return type

Path

mrQA.run_parallel module

This module contains functions to run the compliance checks in parallel.

mrQA.run_parallel.create_script(data_source: Optional[Union[str, Path, Iterable]] = None, ds_format: str = 'dicom', include_phantom: bool = False, verbose: bool = False, output_dir: Optional[Union[Path, str]] = None, debug: bool = False, subjects_per_job: Optional[int] = None, hpc: bool = False, conda_dist: Optional[str] = None, conda_env: Optional[str] = None)[source]

Given a folder (or a list of folders), divides the work into smaller jobs, each containing a fixed number of subjects. These jobs can be executed in parallel to save time.

Parameters
  • data_source (str or List[str]) – Path (e.g. /path/to/my/dataset), or list of paths, to the directory containing the files

  • ds_format (str) – Dataset format. Currently, ‘dicom’ is the only supported option

  • include_phantom (bool) – Include phantom scans in the dataset

  • verbose (bool) – Print progress

  • output_dir (str) – Path to save the output dataset

  • debug (bool) – If True, the dataset will be created locally. This is useful for testing

  • subjects_per_job (int) – Number of subjects per job. Recommended value is 50 or 100

  • hpc (bool) – If True, the scripts will be generated for HPC, not for local execution

  • conda_dist (str) – Name of conda distribution

  • conda_env (str) – Name of conda environment

mrQA.run_parallel.get_parser()[source]
mrQA.run_parallel.main()[source]
mrQA.run_parallel.parse_args()[source]
mrQA.run_parallel.process_parallel(data_source: Union[str, Path], output_dir: Union[str, Path], out_mrds_path: Union[str, Path], name: Optional[str] = None, subjects_per_job: int = 5, conda_env: str = 'mrcheck', conda_dist: str = 'anaconda3', hpc: bool = False)[source]

Given a folder (or a list of folders), divides the work into smaller jobs, each containing a fixed number of subjects. These jobs can be executed in parallel to save time.

Parameters
  • data_source (str or Path) – Path to the folder containing the subject folders

  • output_dir (str or Path) – Path to the folder where the output will be saved

  • out_mrds_path (str or Path) – Path to the final output mrds file

  • name (str) – Name of the final output file

  • subjects_per_job (int) – Number of subjects to be processed in each job

  • conda_env (str) – Name of the conda environment to be used

  • conda_dist (str) – Name of the conda distribution to be used

  • hpc (bool) – If True, submit the jobs to an HPC instead of running them locally
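
A hedged usage sketch; the output filename, study name, and deferred-import structure below are illustrative assumptions:

```python
def run_parallel_check(data_root: str, out_dir: str):
    """Sketch: split a dataset into jobs and run the checks in parallel."""
    # Imports deferred so the sketch can be defined without mrQA installed.
    from pathlib import Path
    from mrQA.run_parallel import process_parallel

    out_mrds = Path(out_dir) / 'combined.mrds.pkl'  # filename is an assumption
    process_parallel(data_source=data_root,
                     output_dir=out_dir,
                     out_mrds_path=out_mrds,
                     name='my_study',       # placeholder name
                     subjects_per_job=50,   # one batch of subjects per job
                     hpc=False)             # run locally
```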

mrQA.run_parallel.split_ids_list(data_source: Union[str, Path], all_ids_path: Union[str, Path], per_batch_ids: Union[str, Path], output_dir: Union[str, Path], subjects_per_job: int = 50)[source]

Splits a given set of subjects into multiple batches and creates a separate text file per job, each containing the list of subjects to be processed in that job.

Parameters
  • data_source (Union[str, Path]) – Path to the root directory of the data

  • all_ids_path (Union[str, Path]) – Path to the file containing the complete list of subject IDs

  • per_batch_ids (Union[str, Path]) – Path to a file listing the per-job text files; each of these text files contains the subject IDs for the corresponding job

  • output_dir (Union[str, Path]) – Path to the output directory

  • subjects_per_job (int) – Number of subjects to process in each job

Returns

batch_ids_path_list – Paths to the text files, each containing a list of subjects

Return type

list
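
The batching performed here amounts to chunking the subject list. The self-contained sketch below illustrates that logic only; it is not the mrQA implementation, which also writes the per-job lists to text files on disk:

```python
def split_ids(subject_ids, subjects_per_job=50):
    """Chunk a list of subject IDs into per-job batches."""
    return [subject_ids[i:i + subjects_per_job]
            for i in range(0, len(subject_ids), subjects_per_job)]

# Seven illustrative subject IDs split into jobs of three
ids = [f'sub-{i:03d}' for i in range(1, 8)]
print(split_ids(ids, subjects_per_job=3))
# [['sub-001', 'sub-002', 'sub-003'], ['sub-004', 'sub-005', 'sub-006'], ['sub-007']]
```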

mrQA.run_parallel.submit_job(scripts_list_filepath: Union[str, Path], mrds_list_filepath: Union[str, Path], hpc: bool = False) None[source]

Executes the generated bash scripts, either locally or by submitting them to an HPC, producing the corresponding partial mrds files.

Parameters
  • scripts_list_filepath (str) – Path to the file containing list of bash scripts to be executed

  • mrds_list_filepath (str) – Path to the file containing list of partial mrds files to be created

  • hpc (bool) – If True, submit the scripts to an HPC instead of executing them locally

Return type

None