Introduction

PyMVPA is a Python module intended to ease pattern classification analysis of large datasets. It provides high-level abstraction of typical processing steps and implementations of a number of popular algorithms. While it is not limited to neuroimaging data, it is eminently suited for such datasets. PyMVPA is truly free software (in every respect) and additionally requires nothing but free software to run. Theoretically, PyMVPA should run on anything that can run a Python interpreter, although the proof is yet to come.

PyMVPA stands for Multivariate Pattern Analysis in Python.

What this Manual is NOT

This manual does not attempt to be a comprehensive introduction to machine learning theory. There is a wealth of high-quality textbooks about this field available. Two very good examples are: Pattern Recognition and Machine Learning by Christopher M. Bishop, and The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (a PDF has generously been made available online free of charge).

There is a growing number of introductory papers about the application of machine learning algorithms to (f)MRI data. A very high-level overview of the basic principles is available in Mur et al. (2009). A more detailed tutorial covering a wide variety of aspects is provided in Pereira et al. (2009). Two reviews by Norman et al. (2006) and Haynes and Rees (2006) give a broad overview of the literature.

This manual also does not describe every technical bit and piece of the PyMVPA package, but is instead focused on the user perspective. Developers should have a look at the API documentation, which is a detailed, comprehensive and up-to-date description of the whole package. Users looking for an overview of the public programming interface of the framework are referred to the Module Reference. The Module Reference is similar to the API reference, but hides overly technical information, which is only relevant for people intending to extend the framework by adding more functionality.

More examples and usage patterns extending the ones described here can be taken from the examples shipped with the PyMVPA source distribution (doc/examples/; some of them are also available in the Example Analyses and Scripts chapter of this manual) or even the unit test battery, also part of the source distribution (in the tests/ directory).

A bit of History

The roots of PyMVPA date back to early 2005. At that time it was a C++ library (no Python yet) developed by Michael Hanke and Sebastian Krüger, intended to make it easy to apply artificial neural networks to pattern recognition problems.

During a visit to Princeton University in spring 2005, Michael Hanke was introduced to the MVPA toolbox for Matlab, which had several advantages over a C++ library. Most importantly, it was easier to use. While a user of a C++ library is forced to write a significant amount of front-end code, users of the MVPA toolbox could simply load their data and start analyzing it, because the toolbox provides a common interface to functions drawn from a variety of libraries.

However, there are some disadvantages when writing a toolbox in Matlab. While users in general benefit from the powers of Matlab, they are at the same time bound to the goodwill of a commercial company. That this is indeed a problem becomes obvious when one considers the time when the vendor of Matlab was not willing to support the Mac platform. Therefore, even though the MVPA toolbox is GPL-licensed, it cannot fully benefit from the enormous advantages of the free software development model (free as in free speech, not only as in free beer).

For these reasons, Michael thought that a successor to the C++ library should remain truly free software, remain fully object-oriented (in contrast to the MVPA toolbox), but should be at least as easy to use and extensible as the MVPA toolbox.

After evaluating some possibilities, Michael decided that Python was the most promising candidate, fully capable of fulfilling the intended development goal. Python is a very powerful language that magically combines the possibility to write really fast code with a simplicity that allows one to learn the basic concepts within a few days.

One of the major advantages of Python is the availability of a huge number of so-called modules. Modules can include extensions written in a hardcore language like C (or even FORTRAN) and therefore allow one to incorporate high-performance code without having to leave the Python environment. Additionally, some Python modules even provide links to other toolkits. For example, RPy allows one to use the full functionality of R from inside Python. Even Matlab can be used via some Python modules (see PyMatlab for an example).

After the decision for Python was made, Michael started development with a simple k-Nearest-Neighbor classifier and a cross-validation class. Using the mighty NumPy package made it easy to support data of any dimensionality. Therefore PyMVPA can easily be used with 4d fMRI datasets, but equally well with EEG/MEG data (3d) or even non-neuroimaging datasets.
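To illustrate why NumPy makes the dimensionality of the data largely irrelevant, here is a minimal sketch in plain NumPy. It is not PyMVPA's actual implementation or API: the function names (flatten_samples, knn_predict, leave_one_out_accuracy) and the random toy data are assumptions made purely for illustration. Assuming the first axis indexes samples, any array can be reshaped into a samples-by-features matrix, after which the same k-Nearest-Neighbor classifier and cross-validation loop work regardless of how many dimensions the data originally had.

```python
import numpy as np


def flatten_samples(data):
    """Reshape an array of shape (n_samples, ...) into (n_samples, n_features)."""
    data = np.asarray(data)
    return data.reshape(data.shape[0], -1)


def knn_predict(train_x, train_y, test_x, k=3):
    """Label each test sample by majority vote among its k nearest training samples."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    diffs = test_x[:, np.newaxis, :] - train_x[np.newaxis, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Indices of the k closest training samples for every test sample
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for neighbor_idx in nearest:
        labels, counts = np.unique(train_y[neighbor_idx], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


def leave_one_out_accuracy(x, y, k=3):
    """Leave-one-sample-out cross-validation: fraction of correctly predicted samples."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # boolean mask selecting all samples except i
        pred = knn_predict(x[train], y[train], x[i:i + 1], k=k)
        correct += int(pred[0] == y[i])
    return correct / n


# Hypothetical toy data: two classes with 10 samples each, separated by a mean shift.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
# 4d array: 20 samples, each a 4x5x3 "volume" (fMRI-like layout)
volumes = rng.normal(size=(20, 4, 5, 3)) + labels[:, None, None, None] * 3.0
# 3d array: 20 samples, each 8 "channels" x 6 "time points" (EEG/MEG-like layout)
trials = rng.normal(size=(20, 8, 6)) + labels[:, None, None] * 3.0

for name, data in [("4d", volumes), ("3d", trials)]:
    x = flatten_samples(data)
    print(name, x.shape, leave_one_out_accuracy(x, labels))
```

The only dimensionality-specific step is the single reshape call. Everything after it operates on a 2d matrix, which is one plausible reason an array library makes support for arbitrary input shapes cheap.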

By September 2007 PyMVPA included support for reading and writing datasets from and to the NIfTI format, kNN and Support Vector Machine classifiers, as well as several analysis algorithms (e.g. searchlight and incremental feature search).

During another visit to Princeton in October 2007, Michael met with Yaroslav Halchenko and Per B. Sederberg. That meeting and the subsequent discussions and hacking sessions of Michael and Yaroslav led to a major refactoring of the PyMVPA codebase, making it much more flexible and extensible, faster, and easier to use than it had ever been before.

How to cite PyMVPA

Below is a list of publications about PyMVPA that have been published so far (in chronological order). If you use PyMVPA in your research, please cite the one that matches best, and email us the reference so we can add it to our Who Is Using It? page.

Peer-reviewed publications

Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V. & Pollmann, S. (2009). PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics, 7, 37-53.: First paper introducing fMRI data analysis with PyMVPA.

Hanke, M., Halchenko, Y. O., Sederberg, P. B., Olivetti, E., Fründ, I., Rieger, J. W., Herrmann, C. S., Haxby, J. V., Hanson, S. J. and Pollmann, S. (2009) PyMVPA: a unifying approach to the analysis of neuroscientific data. Frontiers in Neuroinformatics, 3:3.: Demonstration of PyMVPA capabilities concerning multi-modal or modality-agnostic data analysis.

Hanke, M., Halchenko, Y. O., Haxby, J. V., and Pollmann, S. (2010) Statistical learning analysis in neuroscience: aiming for transparency. Frontiers in Neuroscience. 4,1: 38-43: Focused review article emphasizing the role of transparency to facilitate adoption and evaluation of statistical learning techniques in neuroimaging research.

Haxby, J. V., Guntupalli, J. S., Connolly, A. C., Halchenko, Y. O., Conroy, B. R., Gobbini, M. I., Hanke, M. & Ramadge, P. J. (2011). A Common, High-Dimensional Model of the Representational Space in Human Ventral Temporal Cortex. Neuron, 72, 404–416: The Hyperalignment paper demonstrating its application to fMRI data in rich perceptual (movie) and categorization (monkey-dog) experiments.

Posters

Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V. & Pollmann, S. (2008). PyMVPA: A Python toolbox for machine-learning based data analysis.: Poster emphasizing PyMVPA’s capabilities concerning multi-modal data analysis at the annual meeting of the Society for Neuroscience, Washington, 2008.
Hanke, M., Halchenko, Y. O., Sederberg, P. B., Hanson, S. J., Haxby, J. V. & Pollmann, S. (2008). PyMVPA: A Python toolbox for classifier-based data analysis.: First presentation of PyMVPA at the conference Psychologie und Gehirn [Psychology and Brain], Magdeburg, 2008. This poster received the poster prize of the German Society for Psychophysiology and its Application.

Authors and Contributors

The PyMVPA developer team currently consists of:

Michael Hanke, University of Magdeburg, Germany
Yaroslav O. Halchenko, Dartmouth College, USA
Nikolaas N. Oosterhof, University of Trento, Italy

We are very grateful to the following people, who have contributed valuable advice, code or documentation to PyMVPA:

Florian Baumgartner, University of Magdeburg, Germany
Sven Buchholz, University of Magdeburg, Germany
Andrew C. Connolly, Dartmouth College, USA
Michael W. Cole, Washington University in St. Louis, USA
Ceyhun Çakar
Reka Daniel, Princeton University, USA
Greg Detre, Princeton University, USA
Matthias Ekman, Donders Institute, Netherlands
Ingo Fründ, TU Berlin, Germany
Christoph Gohlke, University of California, Irvine, USA
Scott Gorlin, MIT, USA
Satrajit Ghosh, MIT, USA
Jyothi Swaroop Guntupalli, Dartmouth College, USA
Valentin Haenel, TU Berlin, Germany
Stephen José Hanson, Rutgers University, USA
James V. Haxby, Dartmouth College, USA
James M. Hughes, Dartmouth College, USA
James Kyle, UCLA, USA
Emanuele Olivetti, Fondazione Bruno Kessler, Italy
Russell Poldrack, University of Texas, USA
Stefan Pollmann, University of Magdeburg, Germany
Geethapriya Raghavan, University of Texas Austin, USA
Rajeev Raizada, Dartmouth College, USA
Per B. Sederberg, Princeton University, USA
Tiziano Zito, BCCN, Germany

Acknowledgements

We are grateful to the developers and contributors of NumPy, SciPy and IPython for providing an excellent Python-based computing environment.

Additionally, as PyMVPA makes use of a lot of external software packages (e.g. classifier implementations), we want to acknowledge the authors of the respective tools and libraries (e.g. LIBSVM, MDP, scikit-learn, Shogun) and thank them for developing their packages as free and open source software.

Finally, we would like to thank the Debian project for providing us with hosting facilities for mailing lists and source code repositories, but most of all for developing the universal operating system.

Grant support

PyMVPA development was supported, in part, by the following research grants. This list includes grants funding development of specific algorithm implementations in PyMVPA, as well as grants supporting individuals to work on PyMVPA:

German Federal Ministry of Education and Research

BMBF 01GQ11112

German federal state of Saxony-Anhalt

Project: Center for Behavioral Brain Sciences

German Academic Exchange Service

PPP-USA D/05/504/7

McDonnell Foundation

US National Institute of Mental Health

5R01MH075706
F32MH085433-01A1

US National Science Foundation

NSF 1129764