mvpa2.datasets.base.preprocessed_dataset(src, raw_loader, ds_converter, preproc_raw=None, preproc_ds=None, add_sa=None, **kwargs)

Convenience function to load and preprocess data into a dataset.

It wraps any given callable that converts data in some format into a PyMVPA dataset. Specifically, this function does three things.

  1. Provide an interface for pre-processing in raw data space.
  2. Convenience functionality to add sample attributes to the dataset.
  3. Provide an interface for sample pre-processing after initial conversion into a dataset

First, data is loaded with the specific raw_loader, and any desired raw data pre-processing is performed by calling `` preproc_raw`` with the output of the loader function. Next, ds_converter is called to yield an initial dataset. The user is responsible for passing callabled that are input/output compatible with each other.

Afterwards, any additional sample attributes are assigned to the dataset. Lastly, the resulting dataset is subjected to another pre-processing step by passing it to preproc_ds. This is another callable that can be any of PyMVPA’s mapper implementations (or another functions that takes a dataset as argument and returns a dataset).


src : any

Specification of the data source in any format that is understood by raw_loader.

raw_loader : callable

Callable that takes src as argument, and returned data in a form that is understood by ds_converter (and any given preproc_raw callable).

ds_converter : callable

Callable that takes the output of raw_loader or preproc_raw as argument and returns a PyMVPA dataset.

preproc_raw : callable or None

If not None, this callable is used to perform initial preprocessing after loading the data from its source. Must return data in a form that is understood by ds_converter.

preproc_ds : callable or None

If not None, this callable will be called with the created dataset to perform any additional pre-processing. The callable must return a dataset.

add_sa : dict or recarray or None

Additional sample attributes to assign to the dataset. In case of a NumPy record array, all values for each sub-dtype are assigned as an attribute under their respective field name.


Any additional arguments are passed on to ds_converter.




Load 4D BOLD fMRI data

>>> import nibabel as nb
>>> from mvpa2.datasets.mri import fmri_dataset
>>> from mvpa2.mappers.detrend import PolyDetrendMapper
>>> ds = preprocessed_dataset(
...         'mvpa2/data/bold.nii.gz', nb.load, fmri_dataset,
...         mask='mvpa2/data/mask.nii.gz',
...         preproc_ds=PolyDetrendMapper(polyord=2, auto_train=True))