mvpa2.clfs.stats.match_distribution(data, nsamples=None, loc=None, scale=None, args=None, test='kstest', distributions=None, **kwargs)

Determine best matching distribution.

Can be used for ‘smelling’ the data, as well as to choose a parametric distribution for data obtained from non-parametric testing (e.g. MCNullDist).

WiP: use with caution, API might change


data : np.ndarray

Array of the data for which to deduce the distribution. It has to be sufficiently large to support a reliable conclusion.

nsamples : int or None

If None, use all samples in data to estimate the parametric distribution. Otherwise, use only the specified number of samples, randomly selected from data.

loc : float or None

Loc for the distribution (if known)

scale : float or None

Scale for the distribution (if known)

test : str

What kind of testing to do. Choices:

‘p-roc’

Detection power for a given ROC. Needs two parameters: p=0.05 and tail='both'.

‘kstest’

‘Full-body’ distribution comparison. The best choice is the one with the minimal reported distance after estimating the parameters of the distribution. Parameter p=0.05 sets the threshold for rejecting the null hypothesis that the distribution is the same. WARNING: older versions of scipy (e.g. 0.5.2 in etch) have an incorrect kstest implementation and do not function properly.

distributions : None or list of str or tuple(str, dict)

Distributions to check. If None, all distributions known in scipy.stats are tested. If a distribution is specified as a tuple, it must contain the name and a dictionary of additional parameters (name, loc, scale, args). The entry ‘scipy’ adds all distributions known in scipy.stats.


**kwargs

Additional arguments needed by each particular test (see above).


>>> import numpy as np
>>> from mvpa2.clfs.stats import match_distribution
>>> data = np.random.normal(size=(1000, 1))
>>> matches = match_distribution(
...   data,
...   distributions=['rdist',
...                  ('rdist', {'name':'rdist_fixed',
...                             'loc': 0.0,
...                             'args': (10,)})],
...   nsamples=30, test='p-roc', p=0.05)
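
The idea behind the ‘kstest’ choice (estimate the parameters of each candidate distribution, then prefer the one with the smallest reported distance) can be sketched with scipy.stats alone. This is a minimal illustration under stated assumptions, not the implementation used by match_distribution: the helper name rank_by_ks, its names and seed arguments, and the tuple layout it returns are all hypothetical, and the optional subsampling only mimics what nsamples is described as doing.

```python
import numpy as np
from scipy import stats


def rank_by_ks(data, names=("norm", "laplace", "uniform"), nsamples=None, seed=0):
    """Hypothetical helper: fit each named scipy.stats distribution to the
    data and rank the candidates by their Kolmogorov-Smirnov distance."""
    data = np.asarray(data, dtype=float).ravel()
    if nsamples is not None:
        # Use only a random subset of the samples (assumed analogue of nsamples).
        rng = np.random.default_rng(seed)
        data = rng.choice(data, size=nsamples, replace=False)

    results = []
    for name in names:
        dist = getattr(stats, name)
        # Maximum-likelihood estimates: (shape parameters..., loc, scale).
        params = dist.fit(data)
        # KS test of the data against the fitted distribution.
        distance, p_value = stats.kstest(data, name, args=params)
        results.append((distance, name, p_value, params))

    # Smallest KS distance first, i.e. the best-matching candidate.
    results.sort(key=lambda r: r[0])
    return results


rng = np.random.default_rng(42)
sample = rng.normal(loc=1.0, scale=2.0, size=1000)
ranking = rank_by_ks(sample)
for distance, name, p_value, params in ranking:
    print(f"{name:8s} D={distance:.3f} p={p_value:.3f}")
```

Because the parameters are estimated from the same data that is then tested, the p-values reported by the KS test are optimistic; in a sketch like this, the distance is best read as a relative ranking criterion rather than as a strict hypothesis test.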