To provide the most recent news and documentation www.pymvpa.org reflects the development 2.0 series (renamed 0.6 series) of PyMVPA. If you are interested in the documentation of the previous stable 0.4 series of PyMVPA, please visit v04.pymvpa.org.

Simple Plotting of Classifier BehaviorΒΆ

This example runs a number of classifiers on a simple 2D dataset and plots the decision surface of each classifier.

First compose some sample data – no PyMVPA involved.

import numpy as np

# set up the labeled data
# two skewed 2-D distributions
num_dat = 200
dist = 4
# Absolute max value allowed. Just to assure proper plots
xyamax = 10
feat_pos=np.random.randn(2, num_dat)
feat_pos[0, :] *= 2.
feat_pos[1, :] *= .5
feat_pos[0, :] += dist
feat_pos = feat_pos.clip(-xyamax, xyamax)
feat_neg=np.random.randn(2, num_dat)
feat_neg[0, :] *= .5
feat_neg[1, :] *= 2.
feat_neg[0, :] -= dist
feat_neg = feat_neg.clip(-xyamax, xyamax)

# set up the testing features
npoints = 101
x1 = np.linspace(-xyamax, xyamax, npoints)
x2 = np.linspace(-xyamax, xyamax, npoints)
x,y = np.meshgrid(x1, x2);
feat_test = np.array((np.ravel(x), np.ravel(y)))

Now load PyMVPA and convert the data into a proper Dataset.

from mvpa2.suite import *

# create the pymvpa dataset from the labeled features
patternsPos = dataset_wizard(samples=feat_pos.T, targets=1)
patternsNeg = dataset_wizard(samples=feat_neg.T, targets=0)
ds_lin = vstack((patternsPos, patternsNeg))

Let’s add another dataset: XOR. This problem is not linear separable and therefore need a non-linear classifier to be solved. The dataset is provided by the PyMVPA dataset warehouse.

# 30 samples per condition, SNR 2
ds_nl = pure_multivariate_signal(30, 2)
l1 = ds_nl.sa['targets'].unique[1]

datasets = {'linear': ds_lin, 'non-linear': ds_nl}

This demo utilizes a number of classifiers. The instantiation of a classifier involves almost no runtime costs, so it is easily possible compile a long list, if necessary.

# set up classifiers to try out
clfs = {
        'Ridge Regression': RidgeReg(),
        'Linear SVM': LinearNuSVMC(probability=1,
                      enable_ca=['probabilities']),
        'RBF SVM': RbfNuSVMC(probability=1,
                      enable_ca=['probabilities']),
        'SMLR': SMLR(lm=0.01),
        'Logistic Regression': PLR(criterion=0.00001),
        '3-Nearest-Neighbour': kNN(k=3),
        '10-Nearest-Neighbour': kNN(k=10),
        'GNB': GNB(common_variance=True),
        'GNB(common_variance=False)': GNB(common_variance=False),
        'LDA': LDA(),
        'QDA': QDA(),
        }

# How many rows/columns we need
nx = int(ceil(np.sqrt(len(clfs))))
ny = int(ceil(len(clfs)/float(nx)))

Now we are ready to run the classifiers. The following loop trains and queries each classifier to finally generate a nice plot showing the decision surface of each individual classifier, both for the linear and the non-linear dataset.

for id, ds in datasets.iteritems():
    # loop over classifiers and show how they do
    fig = 0

    # make a new figure
    pl.figure(figsize=(nx*4, ny*4))

    print "Processing %s problem..." % id

    for c in sorted(clfs):
        # tell which one we are doing
        print "Running %s classifier..." % (c)

        # make a new subplot for each classifier
        fig += 1
        pl.subplot(ny, nx, fig)

        # select the clasifier
        clf = clfs[c]

        # enable saving of the estimates used for the prediction
        clf.ca.enable('estimates')

        # train with the known points
        clf.train(ds)

        # run the predictions on the test values
        pre = clf.predict(feat_test.T)

        # if ridge, use the prediction, otherwise use the values
        if c == 'Ridge Regression':
            # use the prediction
            res = np.asarray(pre)
        elif 'Nearest-Ne' in c:
            # Use the dictionaries with votes
            res = np.array([e[l1] for e in clf.ca.estimates]) \
                  / np.sum([e.values() for e in clf.ca.estimates], axis=1)
        elif c == 'Logistic Regression':
            # get out the values used for the prediction
            res = np.asarray(clf.ca.estimates)
        elif c in ['SMLR']:
            res = np.asarray(clf.ca.estimates[:, 1])
        elif c in ['LDA', 'QDA'] or c.startswith('GNB'):
            # Since probabilities are logprobs -- just for
            # visualization of trade-off just plot relative
            # "trade-off" which determines decision boundaries if an
            # alternative log-odd value was chosen for a cutoff
            res = np.asarray(clf.ca.estimates[:, 1]
                             - clf.ca.estimates[:, 0])
            # Scale and position around 0.5
            res = 0.5 + res/max(np.abs(res))
        else:
            # get the probabilities from the svm
            res = np.asarray([(q[1][1] - q[1][0] + 1) / 2
                    for q in clf.ca.probabilities])

        # reshape the results
        z = np.asarray(res).reshape((npoints, npoints))

        # plot the predictions
        pl.pcolor(x, y, z, shading='interp')
        pl.clim(0, 1)
        pl.colorbar()
        # plot decision surfaces at few levels to emphasize the
        # topology
        pl.contour(x, y, z, [0.1, 0.4, 0.5, 0.6, 0.9],
                   linestyles=['dotted', 'dashed', 'solid', 'dashed', 'dotted'],
                   linewidths=1, colors='black', hold=True)

        # plot the training points
        pl.plot(ds.samples[ds.targets == 1, 0],
               ds.samples[ds.targets == 1, 1],
               "r.")
        pl.plot(ds.samples[ds.targets == 0, 0],
               ds.samples[ds.targets == 0, 1],
               "b.")

        pl.axis('tight')
        # add the title
        pl.title(c)

See also

The full source code of this example is included in the PyMVPA source distribution (doc/examples/pylab_2d.py).

Previous topic

kNN – Model Flexibility in Pictures

Next topic

Generating Topography plots

NeuroDebian

NITRC-listed