To provide the most recent news and documentation, www.pymvpa.org reflects the development 2.0 series (formerly called the 0.6 series) of PyMVPA. If you are interested in the documentation of the previous stable 0.4 series of PyMVPA, please visit v04.pymvpa.org.

mvpa2.clfs.stats.kstest

mvpa2.clfs.stats.kstest(rvs, cdf, args=(), N=20, alternative='two_sided', mode='approx', **kwds)

Perform the Kolmogorov-Smirnov test for goodness of fit.

This performs a test of the distribution G(x) of an observed random variable against a given distribution F(x). Under the null hypothesis the two distributions are identical, G(x)=F(x). The alternative hypothesis can be either ‘two_sided’ (default), ‘less’ or ‘greater’. The KS test is only valid for continuous distributions.
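The statistic itself is easy to compute. The pure-Python sketch below (an illustration, not the PyMVPA implementation) sorts the sample, evaluates F at each point, and takes the largest gaps between the empirical cdf G_n and F in each direction. The names `ks_statistics` and `norm_cdf` are hypothetical; pairing D+ = sup(G_n - F) with ‘greater’ and D- = sup(F - G_n) with ‘less’ is the convention implied by the one-sided examples below.

```python
import math
from typing import Callable, Sequence, Tuple


def norm_cdf(x: float) -> float:
    """Standard normal cdf; erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def ks_statistics(sample: Sequence[float],
                  cdf: Callable[[float], float]) -> Tuple[float, float, float]:
    """Return (D, D_plus, D_minus) for a one-sample KS test against cdf."""
    xs = sorted(sample)
    n = len(xs)
    cdfvals = [cdf(x) for x in xs]
    # D+ = sup_x (G_n(x) - F(x)): G_n jumps to (i + 1) / n at the i-th point (0-based)
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdfvals))
    # D- = sup_x (F(x) - G_n(x)): G_n equals i / n just below the i-th point
    d_minus = max(f - i / n for i, f in enumerate(cdfvals))
    # Two-sided statistic D = sup_x |G_n(x) - F(x)|
    return max(d_plus, d_minus), d_plus, d_minus


# The nine equally spaced points of the first example below (np.linspace(-15, 15, 9))
x = [-15.0 + 3.75 * i for i in range(9)]
d, d_plus, d_minus = ks_statistics(x, norm_cdf)
```

For this input the sketch gives D ≈ 0.44436, the statistic reported in the first example; D+ and D- coincide here because the points are symmetric about zero.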

Parameters :

rvs : string or array or callable

string: name of a distribution in scipy.stats

array: 1-D observations of random variables

callable: function that generates random variables; it must accept the keyword argument size

cdf : string or callable

string: name of a distribution in scipy.stats; if rvs is a string, cdf can evaluate to False or be the same as rvs

callable: function to evaluate the cdf

args : tuple, sequence

distribution parameters, used if rvs or cdf are strings

N : int

sample size if rvs is a string or callable

alternative : ‘two_sided’ (default), ‘less’ or ‘greater’

defines the alternative hypothesis (see the description above and the Notes)

mode : ‘approx’ (default) or ‘asymp’

defines the distribution used for calculating the p-value

‘approx’ : use an approximation to the exact distribution of the test statistic

‘asymp’ : use the asymptotic distribution of the test statistic
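How the p-value is obtained can be sketched as follows, assuming this function mirrors SciPy's `kstest` of the same period (an assumption; the PyMVPA source is not shown here). One-sided alternatives use the exact distribution of D+ (D- has the same distribution), evaluated here with the Birnbaum-Tingey series, and `mode` is assumed to have no effect on them. For ‘two_sided’, ‘asymp’ evaluates the limiting Kolmogorov distribution at sqrt(N)·D, while ‘approx’ keeps that value for large N or large p and otherwise doubles the exact one-sided tail probability. The thresholds 2666 and 0.80 - 0.3N/1000 are taken from that SciPy code, and all function names are hypothetical.

```python
import math


def kolmogorov_sf(x: float, terms: int = 100) -> float:
    """Limiting distribution: P(sqrt(n) * D > x) as n -> infinity."""
    if x < 0.2:
        # Here the sf differs from 1 by less than 1e-12, and the
        # alternating series below would need many terms to converge.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return 2.0 * total


def smirnov_sf(n: int, d: float) -> float:
    """Exact one-sided tail P(D+ >= d) via the Birnbaum-Tingey series."""
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    total = 0.0
    for j in range(math.floor(n * (1.0 - d)) + 1):
        total += (math.comb(n, j)
                  * max(0.0, 1.0 - d - j / n) ** (n - j)
                  * (d + j / n) ** (j - 1))
    return d * total


def ks_pvalue(d: float, n: int, alternative: str = "two_sided",
              mode: str = "approx") -> float:
    """p-value for a KS statistic d computed from n observations."""
    if alternative in ("less", "greater"):
        # One-sided: exact tail probability; mode is (assumed) unused
        return smirnov_sf(n, d)
    if alternative != "two_sided":
        raise ValueError("alternative must be 'two_sided', 'less' or 'greater'")
    p_asymp = kolmogorov_sf(d * math.sqrt(n))
    if mode == "asymp":
        return p_asymp
    if mode == "approx":
        # Assumed rule: trust the asymptotic value for large n or large p,
        # otherwise double the exact one-sided tail probability.
        if n > 2666 or p_asymp > 0.80 - n * 0.3 / 1000.0:
            return p_asymp
        return 2.0 * smirnov_sf(n, d)
    raise ValueError("mode must be 'approx' or 'asymp'")
```

Fed the statistics reported in the examples below, this sketch should reproduce the corresponding p-values, for instance about 0.041 for the ‘less’ test and about 0.089 for the two-sided test with mode='asymp'.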

Returns :

D : float

KS test statistic: D for ‘two_sided’, D+ for ‘greater’, or D- for ‘less’

p-value : float

one-tailed (for ‘less’ or ‘greater’) or two-tailed (for ‘two_sided’) p-value

Notes

In the one-sided test, the alternative is that the empirical cumulative distribution function of the random variable is “less” or “greater” than the cumulative distribution function F(x) of the hypothesis, G(x)<=F(x) or G(x)>=F(x), respectively.
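To make the direction concrete, here is a small worked check in an assumed setting: data drawn from N(0.2, 1) and tested against F = N(0, 1), as in the one-sided example below. Shifting to larger values makes G(x) <= F(x) everywhere, so only the gap F(x) - G(x) can be large; its maximum, Φ(0.1) - Φ(-0.1) ≈ 0.080 at x = 0.1, is the population counterpart of the statistic that ‘less’ uses. The helper name `normal_cdf` and the evaluation grid are illustrative choices.

```python
import math


def normal_cdf(x: float, loc: float = 0.0) -> float:
    """Unit-variance normal cdf with location loc."""
    return 0.5 * math.erfc(-(x - loc) / math.sqrt(2.0))


shift = 0.2
grid = [-5.0 + 0.001 * i for i in range(10001)]
# G(x): cdf of the shifted data-generating process; F(x): hypothesised N(0, 1)
gaps = [normal_cdf(x, loc=shift) - normal_cdf(x) for x in grid]

# G(x) - F(x) never becomes positive on the grid, i.e. G(x) <= F(x) ...
largest_excess = max(gaps)
# ... and F exceeds G by at most Phi(0.1) - Phi(-0.1), reached at x = shift / 2
largest_deficit = -min(gaps)
```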

Examples

>>> from scipy import stats
>>> import numpy as np
>>> from scipy.stats import kstest
>>> x = np.linspace(-15,15,9)
>>> kstest(x,'norm')
(0.44435602715924361, 0.038850142705171065)
>>> np.random.seed(987654321) # set random seed to get the same result
>>> kstest('norm','',N=100)
(0.058352892479417884, 0.88531190944151261)

This is equivalent to:

>>> np.random.seed(987654321)
>>> kstest(stats.norm.rvs(size=100),'norm')
(0.058352892479417884, 0.88531190944151261)

Test against a one-sided alternative hypothesis:

>>> np.random.seed(987654321)

Shift the distribution to larger values, so that cdf_dgp(x) < norm.cdf(x), where cdf_dgp is the cdf of the data-generating process:

>>> x = stats.norm.rvs(loc=0.2, size=100)
>>> kstest(x,'norm', alternative = 'less')
(0.12464329735846891, 0.040989164077641749)

This rejects equality of the distributions in favor of the alternative hypothesis ‘less’ at the 5% level.

>>> kstest(x,'norm', alternative = 'greater')
(0.0072115233216311081, 0.98531158590396395)

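Equality of the distributions is not rejected against the alternative hypothesis ‘greater’.

Two-sided test using the asymptotic distribution of the test statistic (mode='asymp'):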

>>> kstest(x,'norm', mode='asymp')
(0.12464329735846891, 0.08944488871182088)

Testing t-distributed random variables against the normal distribution:

With 100 degrees of freedom, the t distribution is close to the normal distribution, and kstest does not reject the hypothesis that the sample came from the normal distribution:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(100,size=100),'norm')
(0.072018929165471257, 0.67630062862479168)

With 3 degrees of freedom, the t distribution is sufficiently different from the normal distribution that we can reject the hypothesis that the sample came from the normal distribution at the alpha = 10% level:

>>> np.random.seed(987654321)
>>> stats.kstest(stats.t.rvs(3,size=100),'norm')
(0.131016895759829, 0.058826222555312224)
