Using scikit-learn transformers with PyMVPA

Scikit-learn is a rich library of algorithms, many of which implement the transformer API. PyMVPA provides a wrapper class, SKLTransformer, that makes all of these algorithms usable within the PyMVPA framework. With this adaptor, the transformer API is exposed as a PyMVPA mapper interface that is fully compatible with all other building blocks of PyMVPA.
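
For instance, wrapping a scikit-learn transformer and applying it to a PyMVPA dataset takes only a few lines. The snippet below is a minimal sketch of the idea; the use of scikit-learn's PCA and a small random dataset is purely illustrative and not part of the example that follows:

import numpy as np
from sklearn.decomposition import PCA
from mvpa2.datasets import Dataset
from mvpa2.mappers.skl_adaptor import SKLTransformer

# build a small PyMVPA dataset from random samples with arbitrary targets
ds = Dataset(np.random.randn(50, 10),
             sa={'targets': np.repeat(['a', 'b'], 25)})
# wrap the scikit-learn transformer; the result behaves like a PyMVPA mapper
pca = SKLTransformer(PCA(n_components=2))
# calling the mapper on the dataset fits and transforms in one step
ds_red = pca(ds)
print(ds_red.shape)   # expected: (50, 2)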

In this example we demonstrate this interface by mimicking the “Comparison of Manifold Learning methods” example from the scikit-learn documentation, applying only the minimal modifications necessary to run a variety of scikit-learn algorithm implementations on PyMVPA datasets.

This script also prints the same timing information as the original.

print(__doc__)

from time import time

import pylab as pl
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.ticker import NullFormatter

from sklearn import manifold
# Next line to silence pyflakes. This import is needed.
Axes3D

n_points = 1000
n_neighbors = 10
n_components = 2

So far the code has been identical to the original. The first difference is the import of the adaptor class. We also load the scikit-learn demo dataset, but with the help of a wrapper function that yields a PyMVPA dataset.

# this first import is only required to run the example as part of the test suite
from mvpa2 import cfg
from mvpa2.mappers.skl_adaptor import SKLTransformer

# load the S-curve dataset
from mvpa2.datasets.sources.skl_data import skl_s_curve
ds = skl_s_curve(n_points)
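
The returned ds is an ordinary PyMVPA dataset: the three-dimensional S-curve coordinates are stored as the samples, and the position along the curve (used below to color the plots) is stored as the targets. A quick inspection could look like the following sketch; the shapes shown assume the wrapper mirrors the output of scikit-learn's make_s_curve:

print(ds.samples.shape)   # expected: (1000, 3)
print(ds.targets.shape)   # expected: (1000,)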

And we continue with practically identical code.

fig = pl.figure(figsize=(15, 8))
pl.suptitle("Manifold Learning with %i points, %i neighbors"
            % (1000, n_neighbors), fontsize=14)

# plot the original three-dimensional S-curve data
X = ds.samples
try:
    ax = fig.add_subplot(241, projection='3d')
    ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=ds.targets, cmap=pl.cm.Spectral)
    ax.view_init(4, -72)
except Exception:
    # fallback for matplotlib < 1.0
    ax = fig.add_subplot(241, projection='3d')
    pl.scatter(X[:, 0], X[:, 2], c=ds.targets, cmap=pl.cm.Spectral)

methods = ['standard', 'ltsa', 'hessian', 'modified']
labels = ['LLE', 'LTSA', 'Hessian LLE', 'Modified LLE']

for i, method in enumerate(methods):
    t0 = time()
    # create an instance of the algorithm from scikit-learn
    # and wrap it by SKLTransformer

The following lines show the only significant modification with respect to a pure scikit-learn implementation: the transformer is wrapped in the adaptor. The result is a mapper and can therefore be called directly with a dataset that contains both samples and targets, without explicitly calling fit() and transform().

    lle = SKLTransformer(manifold.LocallyLinearEmbedding(n_neighbors,
                                                          n_components,
                                                          eigen_solver='auto',
                                                          method=method))
    # call the SKLTransformer instance on the input dataset
    Y = lle(ds)

The rest of the example is unmodified, except that each remaining transformer is likewise wrapped in the mapper adaptor.

    t1 = time()
    print("%s: %.2g sec" % (methods[i], t1 - t0))

    ax = fig.add_subplot(242 + i)
    pl.scatter(Y[:, 0], Y[:, 1], c=ds.targets, cmap=pl.cm.Spectral)
    pl.title("%s (%.2g sec)" % (labels[i], t1 - t0))
    ax.xaxis.set_major_formatter(NullFormatter())
    ax.yaxis.set_major_formatter(NullFormatter())
    pl.axis('tight')

t0 = time()
# create an instance of the algorithm from scikit-learn
# and wrap it by SKLTransformer
iso = SKLTransformer(manifold.Isomap(n_neighbors=n_neighbors,
                                     n_components=n_components))
# call the SKLTransformer instance on the input dataset
Y = iso(ds)
t1 = time()
print("Isomap: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(246)
pl.scatter(Y[:, 0], Y[:, 1], c=ds.targets, cmap=pl.cm.Spectral)
pl.title("Isomap (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
pl.axis('tight')


t0 = time()
# create an instance of the algorithm from scikit-learn
# and wrap it by SKLTransformer
mds = SKLTransformer(manifold.MDS(n_components=n_components, max_iter=100,
                                  n_init=1, dissimilarity='euclidean'))
# call the SKLTransformer instance on the input dataset
Y = mds(ds)
t1 = time()
print("MDS: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(247)
pl.scatter(Y[:, 0], Y[:, 1], c=ds.targets, cmap=pl.cm.Spectral)
pl.title("MDS (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
pl.axis('tight')


t0 = time()
# create an instance of the algorithm from scikit-learn
# and wrap it by SKLTransformer
se = SKLTransformer(manifold.SpectralEmbedding(n_components=n_components,
                                               n_neighbors=n_neighbors))
# call the SKLTransformer instance on the input dataset
Y = se(ds)
t1 = time()
print("SpectralEmbedding: %.2g sec" % (t1 - t0))
ax = fig.add_subplot(248)
pl.scatter(Y[:, 0], Y[:, 1], c=ds.targets, cmap=pl.cm.Spectral)
pl.title("SpectralEmbedding (%.2g sec)" % (t1 - t0))
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
pl.axis('tight')

See also

The full source code of this example is included in the PyMVPA source distribution (doc/examples/skl_transformer_demo.py).