# Looking here and there – Searchlights¶

Note

This tutorial part is also available for download as an IPython notebook: [ipynb]

In Classifiers – All Alike, Yet Different we have seen how we can implement a classification analysis, but we still have no clue about where in the brain (or our chosen ROIs) our signal of interest is located. And that is despite the fact that we have analyzed the data repeatedly, with different classifiers and investigated error rates and confusion matrices. So what can we do?

Ideally, we would like to have some way of estimating a score for each feature that indicates how important that particular feature (most of the time a voxel) is in the context of a certain classification task. There are various possibilities to get a vector of such per-feature scores in PyMVPA. We could simply compute an ANOVA F-score per each feature, yielding scores that would tell us which features vary significantly between any of the categories in our dataset.

Before we can take a look at the implementation details, let’s first recreate our preprocessed demo dataset. The code is very similar to that from Classifiers – All Alike, Yet Different and should raise no questions. We get a dataset with one sample per category per run.

```
>>> from mvpa2.tutorial_suite import *
>>> ds = get_haxby2001_data(roi='vt')
>>> ds.shape
(16, 577)
```

## Measures¶

Now that we have the dataset, computing the desired ANOVA F-scores is relatively painless:

```
>>> aov = OneWayAnova()
>>> f = aov(ds)
>>> print f
<Dataset: 1x577@float64, <fa: fprob>>
```

If the code snippet above is of no surprise then you probably got the basic
idea. We created an object instance `aov`

being a
`OneWayAnova`

. This instance is subsequently
*called* with a dataset and yields the F-scores wrapped into a
`Dataset`

. Where have we seen this before? Right!
This one differs little from a call to
`CrossValidation`

. Both are objects that get
instantiated (potentially with some custom arguments) and yield the results in
a dataset when called with an input dataset. This is called a processing
object and is a common concept in PyMVPA.

However, there is a difference between the two processing objects.
`CrossValidation`

returns a dataset with a single
feature – the accuracy or error rate, while
`OneWayAnova`

returns a vector with one value per
feature. The latter is called a
`FeaturewiseMeasure`

. But other than the number of
features in the returned dataset there is not much of a difference. All
measures in PyMVPA, for example, support an optional post-processing step.
During instantiation of a measure an arbitrary mapper can be specified to be
called internally to forward-map the results before they are returned. If, for
some reason, the F-scores need to be scaled into the interval [0,1], an
`FxMapper`

can be used to achieve that:

```
>>> aov = OneWayAnova(
... postproc=FxMapper('features',
... lambda x: x / x.max(),
... attrfx=None))
>>> f = aov(ds)
>>> print f.samples.max()
1.0
```

Exercise

Map the F-scores back into a brain volume and look at their distribution in the ventral temporal ROI.

Now that we know how to compute feature-wise F-scores we can start worrying
about them. Our original goal was to decipher information that is encoded
in the multivariate pattern of brain activation. But now we are using an
ANOVA, a **univariate** measure, to localize important voxels? There must
be something else – and there is!

## Searching, searching, searching, ...¶

Kriegeskorte et al. (2006) suggested an algorithm that takes a small, sphere-shaped neighborhood of brain voxels and computes a multivariate measure to quantify the amount of information encoded in its pattern (e.g. mutual information). Later on this searchlight approach has been extended to run a full classifier cross-validation in every possible sphere in the brain. Since that, numerous studies have employed this approach to localize relevant information in a locally constraint fashion.

We know almost all pieces to implement a searchlight analysis in PyMVPA. We can load and preprocess datasets, we can set up a cross-validation procedure.

```
>>> clf = kNN(k=1, dfx=one_minus_correlation, voting='majority')
>>> cv = CrossValidation(clf, HalfPartitioner())
```

The only thing left to do is that we have to split the dataset into all
possible sphere neighborhoods that intersect with the brain. To achieve this,
we can use `sphere_searchlight()`

:

```
>>> sl = sphere_searchlight(cv, radius=3, postproc=mean_sample())
```

This single line configures a searchlight analysis that runs a full
cross-validation in every possible sphere in the dataset. Each sphere has a
radius of three voxels. The algorithm uses the coordinates (by default
`voxel_indices`

) stored in a feature attribute of the input dataset to
determine local neighborhoods. From the `postproc`

argument you might
have guessed that this object is also a measure – and your are right. This
measure returns whatever value is computed by the basic measure (here this
is a cross-validation) and assigns it to the feature representing the
center of the sphere in the output dataset. For this initial example we are
not interested in the full cross-validation output (error per each fold),
but only in the mean error, hence we are using an appropriate mapper for
post-processing. As with any other processing object we have to
call it with a dataset to run the actual analysis:

```
>>> res = sl(ds)
>>> print res
<Dataset: 1x577@float64, <sa: cvfolds>, <fa: center_ids>, <a: mapper>>
```

That was it. However, this was just a toy example with only our ventral temporal ROI. Let’s now run it on a much larger volume, so we can actually localize something (even loading and preprocessing will take a few seconds). We will reuse the same searchlight setup and run it on this data as well. Due to the size of the data it might take a few minutes to compute the results, depending on the number of CPUs in the system.

```
>>> ds = get_haxby2001_data_alternative(roi=0)
>>> print ds.nfeatures
34888
>>> res = sl(ds)
```

Now let’s see what we got. Since a vector with 35k elements is a little hard to comprehend we have to resort to some statistics.

```
>>> sphere_errors = res.samples[0]
>>> res_mean = np.mean(res)
>>> res_std = np.std(res)
>>> # we deal with errors here, hence 1.0 minus
>>> chance_level = 1.0 - (1.0 / len(ds.uniquetargets))
```

As you’ll see, the mean empirical error is just barely below the chance level. However, we would not expect a signal for perfect classification performance in all spheres anyway. Let’s see for how many spheres the error is more the two standard deviations lower than chance.

```
>>> frac_lower = np.round(np.mean(sphere_errors < chance_level - 2 * res_std), 3)
```

So in almost 10% of all spheres the error is substantially lower than what we would expect for random guessing of the classifier – that is more than 3000 spheres!

Exercise

Look at the distribution of the errors
(hint: `hist(sphere_errors, bins=np.linspace(0, 1, 18))`

.
In how many spheres do you think the classifier actually picked up
real signal? What would be a good value to threshold the errors to
distinguish false from true positives? Think of it in the context of
statistical testing of fMRI data results. What problems are we facing
here?

Once you are done thinking about that – and only *after* you’re done,
project the sphere error map back into the fMRI volume and look at it as
a brain overlay in your favorite viewer (hint: you might want to store
accuracies instead of errors, if your viewer cannot visualize the lower
tail of the distribution:
`map2nifti(ds, 1.0 - sphere_errors).to_filename('sl.nii.gz')`

).
Did looking at the image change your mind?

## For real!¶

Now that we have an idea of what can happen in a searchlight analysis, let’s do another one, but this time on a more familiar ROI – the full brain.

Exercise

Load the dataset with `get_haxby2001_data_alternative(roi='brain')`

this will apply any required preprocessing for you. Now run a searchlight
analysis for radii 0, 1 and 3. For each resulting error map look at the
distribution of values, project them back into the fMRI volume and
compare them. How does the distribution change with radius and how does
it compare to results of the previous exercise? What would be a good
choice for the threshold in this case?

You have now performed a number of searchlight analyses, investigated the
results and probably tried to interpret them. What conclusions did you draw
from these analyses in terms of the neuroscientific aspects? What have you
learned about object representation in the brain? In this case we have run
8-way classification analyses and have looked at the average error rate across
all conditions in thousands of sphere-shaped ROIs in the brain. In some spheres the
classifier could perform well, i.e. it could predict all
samples equally well. However, this only applies to a handful of over 30k
spheres we have tested, and does not reveal whether the classifier was capable of
classifying *all* of the conditions or just some. For the vast majority
we observe errors somewhere
between the theoretical chance level and zero and we don’t know what caused
the error to decrease. We don’t even know which samples get misclassified.

From Classifiers – All Alike, Yet Different we know that there is a way out of this dilemma. We can look at the confusion matrix of a classifier to get a lot more information that is otherwise hidden. However, we cannot reasonably do this for thousands of searchlight spheres (Note that this is not completely true. See e.g. Connolly et al., 2012 for some creative use-cases for searchlights). It becomes obvious that a searchlight analysis is probably not the end of a data exploration but rather a crude take off, as it raises more questions than it answers.

Moreover, a searchlight cannot detect signals that extend beyond a small
local neighborhood. This property effectively limits the scope of analyses
that can employ this strategy. A study looking a global brain circuitry
will hardly restrict the analysis to patches of a few cubic millimeters of
brain tissue. As we have seen before, searchlights also have another nasty
aspect. Although they provide us with a multivariate localization measure,
they also inherit the curse of univariate fMRI data analysis –
multiple comparisons. PyMVPA comes with an algorithm that can help to
cope with the problem in the context of group analyses:
`GroupClusterThreshold`

.

Despite these limitations a searchlight analysis can be a valuable exploratory
tool if used appropriately. The capabilities of PyMVPA’s searchlight
implementation go beyond what we looked at in this tutorial. It is not only
possible to run *spatial* searchlights, but multiple spaces can be considered
simultaneously. This is further illustrated in
Multi-dimensional Searchlights.