Using scikit-learn classifiers with PyMVPA

Scikit-learn is a rich library of algorithms, many of which implement the common estimator and predictor API. PyMVPA provides the wrapper class SKLLearnerAdapter, which enables the use of all of these algorithms within the PyMVPA framework. Through this adapter the scikit-learn API is presented as a PyMVPA learner interface that is fully compatible with all other building blocks of PyMVPA.
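Conceptually, this is the classic adapter pattern. The following toy sketch (with made-up class names, not the actual SKLLearnerAdapter implementation) illustrates how an object exposing scikit-learn's fit(X, y)/predict(X) methods can be presented through a train(dataset)/predict(samples) interface:

```python
# Illustrative sketch only -- NOT the actual SKLLearnerAdapter code.
# All class names here (ToyDataset, ToyLearnerAdapter, MajorityEstimator)
# are hypothetical stand-ins.

class ToyDataset(object):
    """Minimal stand-in for a PyMVPA dataset: samples plus targets."""
    def __init__(self, samples, targets):
        self.samples = samples
        self.targets = targets

class ToyLearnerAdapter(object):
    """Wraps an estimator that exposes fit(X, y) and predict(X)."""
    def __init__(self, estimator):
        self.estimator = estimator

    def train(self, dataset):
        # a PyMVPA-style learner is trained on a single dataset object
        self.estimator.fit(dataset.samples, dataset.targets)

    def predict(self, samples):
        return self.estimator.predict(samples)

class MajorityEstimator(object):
    """Trivial scikit-learn-style estimator: always predicts the
    most frequent training label."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

ds = ToyDataset([[0, 1], [1, 0], [1, 1]], ['a', 'b', 'a'])
clf = ToyLearnerAdapter(MajorityEstimator())
clf.train(ds)
print(clf.predict([[2, 2]]))  # prints ['a']
```

The real adapter does the same kind of translation, so a wrapped scikit-learn classifier can be dropped into any PyMVPA workflow that expects a learner.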

In this example we demonstrate this interface by mimicking the “Nearest Neighbors Classification” example from the scikit-learn documentation – applying the minimal modifications necessary to run two variants of the scikit-learn k-nearest neighbors algorithm implementation on PyMVPA datasets.


import numpy as np
import pylab as pl
from matplotlib.colors import ListedColormap
from sklearn import neighbors

n_neighbors = 15

So far the code has been identical. The first difference is the import of the adapter class. We also load the scikit-learn demo dataset, this time with the help of a wrapper function that yields a PyMVPA dataset.

# this first import is only required to run the example as part of the test suite
from mvpa2 import cfg
from mvpa2.clfs.skl.base import SKLLearnerAdapter

# load the iris dataset
from mvpa2.datasets.sources.skl_data import skl_iris
iris = skl_iris()
# compact dataset summary
print(iris)

The original example uses only the first two features of the dataset, since it intends to visualize learned classification boundaries in 2-D. We can do the same slicing directly on the PyMVPA dataset.

# use the first two features only
iris = iris[:, [0, 1]]


For visualization we will later need to map the literal class labels onto numerical values. Besides that, we continue with practically identical code.

d = {'setosa': 0, 'versicolor': 1, 'virginica': 2}
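The idea behind such a literal-to-numeric mapping can be sketched in plain Python (a conceptual illustration only, not the actual mvpa2 AttributeMap implementation; the helper name to_numeric is made up here):

```python
# Conceptual sketch: give each distinct label a stable integer code,
# so categorical targets can be fed to color maps and plotting routines.

def to_numeric(labels):
    # assign codes in sorted label order for determinism
    codes = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [codes[lab] for lab in labels]

print(to_numeric(['setosa', 'virginica', 'setosa', 'versicolor']))
# -> [0, 2, 0, 1]
```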

h = .02  # step size in the mesh

cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

for weights in ['uniform', 'distance']:
    # create an instance of the algorithm from scikit-learn,
    # wrap it by SKLLearnerAdapter and finally train it
    clf = neighbors.KNeighborsClassifier(n_neighbors, weights=weights)

The following lines contain the only significant modification with respect to a pure scikit-learn implementation: the classifier is wrapped into the adapter. The result is a PyMVPA classifier, and hence it can be trained with a dataset that contains both samples and targets.

    wrapped_clf = SKLLearnerAdapter(clf)
    wrapped_clf.train(iris)

    # shortcut
    X = iris.samples
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = wrapped_clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # to put the result into a color plot we now need numerical values
    # this can be done nicely in PyMVPA
    from mvpa2.datasets.formats import AttributeMap
    Z = AttributeMap().to_numeric(Z)
    Z = Z.reshape(xx.shape)

    # open a new figure for each of the two weighting schemes
    pl.figure()
    pl.pcolormesh(xx, yy, Z, cmap=cmap_light)

    # For plotting the training points we convert to numerical values again
    pl.scatter(X[:, 0], X[:, 1], c=AttributeMap().to_numeric(iris.targets),
               cmap=cmap_bold)
    pl.xlim(xx.min(), xx.max())
    pl.ylim(yy.min(), yy.max())
    pl.title("3-Class classification (k = %i, weights = '%s')"
             % (n_neighbors, weights))

# show the plots, unless the example runs non-interactively
# (e.g. as part of the test suite)
if cfg.getboolean('examples', 'interactive', True):
    pl.show()

This example shows that a PyMVPA classifier can be used in pretty much the same way as the corresponding scikit-learn classifier. What this example does not show is that, with the SKLLearnerAdapter class, any scikit-learn classifier can be employed in arbitrarily complex PyMVPA processing pipelines, where it gains automatic training and all other functionality of PyMVPA classifier implementations.

See also

The full source code of this example is included in the PyMVPA source distribution (doc/examples/