Abstract
Although spike trains are the principal channel of communication between neurons, a single stimulus will elicit different spike trains from trial to trial. This variability, in both spike timings and spike number, can obscure the temporal structure of spike trains and often means that computations need to be run on numerous spike trains in order to extract features common across all the responses to a particular stimulus. This can increase the computational burden and obscure analytical results. As a consequence, it is useful to consider how to calculate a central spike train that summarizes a set of trials. Indeed, averaging responses over trials is routine for other signal types. Here, a simple method for finding a central spike train is described. The spike trains are first mapped to functions, these functions are averaged, and a greedy algorithm is then used to map the average function back to a spike train. The central spike trains are tested on a large data set. Their performance on a classification-based test is considerably better than the performance of the medoid spike trains.
1 Introduction
Spike trains are highly variable, with the same stimulus causing different responses for different trials. While a stimulus will modulate a neuron’s firing pattern on a longer timescale, noise will affect spike timings on a shorter timescale, masking the message encoded [3]. Therefore, it would often be useful to be able to summarize a set of such responses by averaging them, giving a single exemplar. This would speed up computations based on spiking responses and focus studies of coding in spike trains on features that are common across all responses to a given stimulus.
There have been numerous attempts to effectively summarize responses. These include the calculation of the peristimulus time histogram or spike density function [4,5,20] and the development of algorithms to calculate an ‘average’ or ‘central’ spike train [29]. Here, an algorithm for averaging spiking responses is proposed. It uses an average filtered function to construct a central spike train. The spike trains are mapped into the space of functions by filtering them with a causal exponential filter. The average of these functions is calculated. This average function is then mapped back to a spike train by finding a sequence of spikes whose filtered function is close to the average function.
This is an instance of the well-studied problem in kernel methods of finding the preimage of a point [18]. Here, the calculation is performed approximately using a greedy algorithm, a type of matching pursuit algorithm [13]. It can be implemented efficiently in this case because of the exponential filter used to map the spike trains to the space of functions.
These central spike trains are tested on a large data set recorded from zebra finch auditory neurons by the Frederic Theunissen laboratory and made available on the Collaborative Research in Computational Neuroscience database [1]. The effectiveness of the central spike train in summarizing the set of responses is studied in various ways. Perhaps most importantly, the central spike trains are tested using a transmitted information measure of metric-based classification. The performance of the central spike train as a classification template is compared to the performance of the obvious alternative, the medoid response. The medoid of a set of responses is taken to be the response in the set with the lowest average distance from the rest of the set. It is found that the central spike trains appear to be considerably more effective at summarizing the responses than the medoid responses.
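Given a precomputed pairwise distance matrix, the medoid as defined above is straightforward to extract. The following is a minimal sketch (the function and variable names are illustrative, not from the original implementation):

```python
import numpy as np

def medoid_index(D):
    """Index of the medoid: the element whose summed (equivalently, mean)
    distance to the other elements is smallest, given a symmetric pairwise
    distance matrix D."""
    D = np.asarray(D, dtype=float)
    return int(np.argmin(D.sum(axis=1)))

# three points on a line at 0, 1 and 3: the middle point is the medoid
D = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
```

For the toy matrix above, the row sums are 4, 3 and 5, so the medoid is the second element.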
2 Methods
The algorithm works by mapping all the spike trains to functions, averaging these in the function space and then finding the spike train that best corresponds to this average function.
The first part of the algorithm is the map from spike trains to functions, which is done by filtering. Given a spike train $\mathbf{u} = \{u_1, u_2, \ldots, u_n\}$, filtering maps it to a real function,
$$f(t;\mathbf{u}) = \sum_{i=1}^{n} h(t - u_i).$$
The kernel function $h(t)$ has to be specified. Here, the causal exponential is used,
$$h(t) = \begin{cases} 0 & t < 0, \\ \sqrt{2/\tau}\, e^{-t/\tau} & t \ge 0, \end{cases}$$
where the normalization factor of $\sqrt{2/\tau}$ gives the kernel unit $L^2$ norm.
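As an illustrative sketch, this filtering step can be written as follows (names are my own, not from the paper's implementation; the function evaluates the filtered spike train on a discrete time grid):

```python
import numpy as np

def filter_spike_train(spikes, t_grid, tau):
    """Map a spike train to a function by summing causal exponential kernels.

    Each spike at time u contributes sqrt(2/tau) * exp(-(t - u)/tau) for
    t >= u and nothing before it, so each kernel has unit L2 norm.
    """
    f = np.zeros_like(t_grid, dtype=float)
    for u in spikes:
        mask = t_grid >= u
        f[mask] += np.sqrt(2.0 / tau) * np.exp(-(t_grid[mask] - u) / tau)
    return f

# a single spike filtered on a 1 s grid
t_grid = np.linspace(0.0, 1.0, 1001)
f = filter_spike_train([0.2], t_grid, tau=0.05)
```

Because the kernel is normalized, the squared filtered function of a single spike integrates to approximately one on a sufficiently fine grid.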
The choice of kernel function is motivated by the van Rossum metric, which also involves filtering [24]. Indeed, the whole approach is motivated by the idea, illustrated by the van Rossum metric, that a useful way to calculate with spike trains is to first map them into the space of functions. This is in the spirit of the kernel-based smoothing often performed in estimating inhomogeneous neuronal firing rates, and also in the spirit of the reproducing kernel Hilbert space framework for spike train signal processing described in [15].
Now, given a collection of spike trains $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_N\}$, the filtering gives a collection of functions $\{f(t;\mathbf{u}_1), \ldots, f(t;\mathbf{u}_N)\}$, and these are averaged to give
$$\bar{f}(t) = \frac{1}{N} \sum_{a=1}^{N} f(t;\mathbf{u}_a).$$
The central spike train is then the spike train $\mathbf{v}$ whose filtered function lies closest to this average,
$$\mathbf{v} = \arg\min_{\mathbf{v}} \int \left[f(t;\mathbf{v}) - \bar{f}(t)\right]^2 dt.$$
This is illustrated in Fig. 1. However, rather than trying to solve this difficult minimization problem exactly, a greedy algorithm is used as an approximate approach.
Fig. 1. A schematic representation of the averaging algorithm. A is a raster of spiking responses to a single stimulus. These are converted to functions by filtering (a). B shows the collection of functions that results. These are averaged (b), giving the average function C. This average function is approximated (c) by D, a function which is itself the result of filtering. This filtering is represented by d and the corresponding spike train by E. The optimal choice of E is the central spike train.
In the greedy algorithm, spikes are added to the central spike train $\mathbf{v}$ one at a time. At each step, a candidate spike time $s$ is chosen to minimize the error
$$\mathcal{E}(s) = \int \left[f(t;\mathbf{v} \cup \{s\}) - \bar{f}(t)\right]^2 dt,$$
where $\mathbf{v}$ is the set of spike times already chosen. The value of this function is easily computed and so it can be rapidly minimized with respect to $s$. It might seem that the algorithm should continue only while the minimum value of $\mathcal{E}$ continues to decrease. A better approach is to continue choosing spikes until the number of spikes in the central spike train matches the average number of spikes in the collection being summarized.
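Since the kernel has unit norm, adding a spike at time $s$ changes the squared error by $1 - 2\langle \bar{f} - f(\cdot;\mathbf{v}),\, h(\cdot - s)\rangle$, so each greedy step amounts to placing the new spike where the kernel correlates most strongly with the residual. A grid-based sketch of this idea follows; the brute-force scan over candidate times and all names are my own simplifications, not the paper's efficient implementation:

```python
import numpy as np

def central_spike_train(trains, t_grid, tau, n_spikes):
    """Greedy approximation of the central spike train: each new spike is
    placed where the kernel correlates most strongly with the residual
    between the average function and the current filtered candidate."""
    dt = t_grid[1] - t_grid[0]
    amp = np.sqrt(2.0 / tau)

    def filtered(spikes):
        f = np.zeros_like(t_grid)
        for u in spikes:
            mask = t_grid >= u
            f[mask] += amp * np.exp(-(t_grid[mask] - u) / tau)
        return f

    f_bar = np.mean([filtered(u) for u in trains], axis=0)
    centre = []
    for _ in range(n_spikes):
        residual = f_bar - filtered(centre)
        best_u, best_corr = t_grid[0], -np.inf
        for u in t_grid:  # brute-force scan of candidate spike times
            mask = t_grid >= u
            k = amp * np.exp(-(t_grid[mask] - u) / tau)
            corr = np.sum(k * residual[mask]) * dt
            if corr > best_corr:
                best_corr, best_u = corr, u
        centre.append(best_u)
    return sorted(centre)

# three jittered responses with events near 0.3 s and 0.7 s
trains = [[0.29, 0.70], [0.30, 0.71], [0.31, 0.69]]
t_grid = np.linspace(0.0, 1.0, 501)
centre = central_spike_train(trains, t_grid, tau=0.05, n_spikes=2)
```

On this toy collection, the two greedy spikes land near the centres of the two jittered event clusters.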
Obviously, from the perspective of the spike train metric, the true average of the spike trains is given by the function average $\bar{f}(t)$ itself. Of course, the two functions, $\bar{f}(t)$ and $f(t;\mathbf{v})$, will not coincide exactly. One contribution to the difference is that $\bar{f}(t)$ will not, in general, lie in the image of the space of spike trains under the filter map, so even the best spike train approximation will not include all of its structure. The second contribution to the difference between $\bar{f}(t)$ and $f(t;\mathbf{v})$ comes from the greedy algorithm, which only approximates the true minimizer.
3 Results
The averaging algorithm has been tested using the very large zebra finch data set collected by the Frederic Theunissen laboratory at UC Berkeley and made available on the Collaborative Research in Computational Neuroscience database [1]. The details of the experiment and of the stimulus set are given in [2,8,26-28]. The data set consists of extracellular recordings from neurons in the auditory pathway of anesthetized zebra finches. Different sound stimuli are used, including a corpus of zebra finch songs. The song responses are considered here. The song corpus generally includes 20 songs. Here, for simplicity, only those data sets with a 20-song corpus and ten trials for each song are used. To give all the spike trains the same temporal length, only the first second is used; the lengths of the stimuli vary, but all are at least one second long. Although the algorithm described here works fine if some empty spike trains are included in the collection to be summarized, for ease of comparison with other methods any cell with empty spike trains is excluded. This gives a total of 183 cells for which a data set of 200 spike trains has been recorded.
A clustering measure has been used for testing the averaging algorithm. Roughly, the central spike trains are used as a template for clustering by song and the accuracy of this classification is then used as a measure of how well the algorithm performs. Obviously, a test based on distance-based clustering requires the choice of a distance measure between spike trains. Here, the van Rossum metric is used [24].
In the van Rossum metric, the distance between two spike trains $\mathbf{u}$ and $\mathbf{v}$ is calculated by mapping them into the space of functions by filtering and then using the $L^2$ metric on that space,
$$d(\mathbf{u},\mathbf{v}) = \sqrt{\int \left[f(t;\mathbf{u}) - f(t;\mathbf{v})\right]^2 dt},$$
where $f(t;\mathbf{u})$ and $f(t;\mathbf{v})$ are the filtered functions.
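A minimal grid-based sketch of this distance (names are illustrative; for efficient exact algorithms see [10]):

```python
import numpy as np

def van_rossum_distance(u, v, t_grid, tau):
    """Van Rossum distance: filter both trains with the causal exponential
    kernel, then take the L2 distance between the resulting functions."""
    def filtered(spikes):
        f = np.zeros_like(t_grid)
        for s in spikes:
            mask = t_grid >= s
            f[mask] += np.sqrt(2.0 / tau) * np.exp(-(t_grid[mask] - s) / tau)
        return f
    dt = t_grid[1] - t_grid[0]
    diff = filtered(u) - filtered(v)
    return float(np.sqrt(np.sum(diff**2) * dt))

t_grid = np.linspace(0.0, 2.0, 2001)
d = van_rossum_distance([0.5], [1.5], t_grid, tau=0.05)
```

With the unit-norm kernel, two single spikes separated by many multiples of $\tau$ lie at distance close to $\sqrt{2}$, and identical trains are at distance zero.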
To estimate the transmitted information measure of metric clustering, a confusion matrix $N$ is calculated. $N$ is a $20 \times 20$ matrix, with a row and a column for each song. Each response is considered in turn as a test response, and the remaining responses are grouped into clusters by song; the test response to song $i$ is assigned to the song $j$ whose cluster lies nearest to it, and $N_{ij}$ is incremented by one. The distance between the test response and a cluster of $n$ responses is given by a weighted average of the individual distances,
$$\left(\frac{1}{n}\sum_{a=1}^{n} d(\mathbf{u}_{\text{test}},\mathbf{u}_a)^{z}\right)^{1/z},$$
where z is intended to reduce the effect of outliers, with
After each response has been tested in this way, the entries of $N$ add up to the total number of responses, $T$. Diagonal elements correspond to responses that were correctly clustered. The transmitted information of the confusion matrix,
$$h = \sum_{ij} \frac{N_{ij}}{T} \log_2\left(\frac{N_{ij}\,T}{\sum_k N_{ik} \sum_k N_{kj}}\right),$$
is a useful measure of the accuracy of clustering indicated by the confusion matrix.
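This quantity is the mutual information between the true and assigned song labels, and is easy to compute from the matrix directly (a sketch, with illustrative names):

```python
import numpy as np

def transmitted_information(N):
    """Transmitted information, in bits, of a confusion matrix N: the mutual
    information between true (row) and assigned (column) labels."""
    N = np.asarray(N, dtype=float)
    total = N.sum()
    row = N.sum(axis=1)
    col = N.sum(axis=0)
    h = 0.0
    for i in range(N.shape[0]):
        for j in range(N.shape[1]):
            if N[i, j] > 0:  # zero entries contribute nothing
                h += (N[i, j] / total) * np.log2(N[i, j] * total / (row[i] * col[j]))
    return h
```

A perfectly diagonal 20-song confusion matrix gives $\log_2 20 \approx 4.32$ bits, while a uniform matrix gives zero.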
The maximum value of the transmitted information $h$ is obtained for perfect clustering of the responses; with 20 songs, this maximum is $\log_2 20 \approx 4.32$ bits. Here, for each cell, the value of $\tau$ which gives the highest value of $h$ is used.
Fig. 2. The clustering of the data. (a) is a histogram of the optimal
The same τ is used in the averaging algorithm. The minimization of the error
Although
Thus, the central spike trains are evaluated using a transmitted information measure. Again, each response is considered in turn and the remaining responses are clustered by stimulus. In this case, however, the distance between the test response and the clusters is calculated by finding the central spike train for each cluster and measuring the distance between the test response and this central response. Since the test response has been removed from its cluster, it is not used in calculating the central spike train, making this a cross-validation procedure. The confusion matrix and transmitted information are calculated in the usual way.
As a comparison, the transmitted information is also calculated using the same weighted
average metric distance defined above, both with
The average transmitted information is given in Table 1. This appears to indicate that the central spike train is effective in representing
the structure of the spike trains and provides a better exemplar than the medoid spike
train. Surprisingly, not only is the transmitted information for the central responses
considerably higher than for the medoid responses, it is also higher than the transmitted
information calculated using the average distances; both with
Fig. 3. Comparing the transmitted information. In (a), for each cell, the value of
Table 1. Transmitted information results. In the first column, the transmitted information
As described in the Methods section (Sect. 2), the halting criterion for adding spikes to the central spike train specifies that the number of spikes in the central spike train matches the average number of spikes for the collection of spike trains it is summarizing. The effect of using this, rather than the more natural halting criterion based on the error, is described in Table 2. It is clear that, in this case at least, the choice of halting criterion does not make a significant difference to the performance of the central spike train, though it does, of course, affect the spike count.
Table 2. Halting criteria comparison. In the first column, the transmitted information
The best clustering comes from using the function average
Because
Fig. 4. A cartoon representation of the space of functions. Here, the large rectangle represents the space of functions, with S, the image of the space of spike trains in the space of functions under the filter map represented by the wavy line. The spike train functions are marked by open circles, the function average by a closed circle, and the central spike train function, which is the point on S closest to the function average, is marked as a star
This optimization was performed for the responses to one song chosen at random for each cell. It was found that the distance between the central spike train and the function average after this optimization is 0.967 times what it was before, on average. As a comparison, the distance between the medoid and the function average is, on average, 1.407 times the distance of the central spike train from the function average.
It is possible that there is a middle ground between using the central spike trains
and function average to represent a collection of spike trains. One example would
be to use a weighted spike train
This was examined on the example data set by replacing each step of the greedy algorithm with a two-dimensional optimization over spike time and weight using the Nelder–Mead method, initialized using a grid search [14]. However, although this does decrease the error ℰ, it produces only a tiny improvement in the clustering performance.
Another measure of centrality is the summed distance between a spike train and the
spike trains in the collection
Thus, the medoid is much less central than the central spike train. This may seem
surprising since the medoid is chosen as the most central spike train in the collection.
However, it is normal in high-dimensional data for the medoid to lie some distance
from the center of a data set because the volume around the center is a smaller fraction
of the total volume in which the data points fall. For example, for uniformly distributed data, the fraction of a unit ball in $D$ dimensions which lies within $\epsilon$ of the center is $\epsilon^D$, which shrinks rapidly as $D$ increases.
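This concentration effect is easy to check numerically (a sketch, not from the paper):

```python
def central_fraction(eps, dim):
    """Fraction of a unit ball in `dim` dimensions lying within distance
    eps of the centre: eps raised to the power dim."""
    return eps ** dim

# the central region becomes a vanishing fraction as the dimension grows
low = central_fraction(0.5, 2)    # 0.25
high = central_fraction(0.5, 20)  # below 1e-5
```

So even a generously defined central region holds almost none of the volume in high dimensions, which is why the medoid of a high-dimensional data set tends to lie away from the center.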
The idea of ‘central’ is metric dependent and the construction of a central spike
train presented here is closely linked to the van Rossum metric; for example, the
error function ℰ in Eq. (7) essentially gives the average van Rossum distance between
the central spike train and the spike trains in the collection. It might be expected
then that the centrality of the central spike train depends on the choice of metric.
This has been tested by using the Victor–Purpura edit-distance metric [25] rather than the van Rossum metric to perform the clustering-based evaluation. In the Victor–Purpura metric, there is also a timescale,
Table 3. Clustering using the Victor–Purpura metric. In the first column, the transmitted information
4 Discussion
Although simple and straightforward, the averaging algorithm succeeds extremely well in summarizing the response sets in the data considered here. It is anticipated that this will have numerous practical applications in analyzing sets of electrophysiological responses.
It would be interesting to evaluate the algorithm using different data sets and to measure the extent to which it preserves the internal temporal features of the spike trains. It has been previously noted that averaging over many responses may obscure such features [30]. The aim of this paper is to define a central spike train, an object which can be interpreted as a sort of average but which is nonetheless a spike train. This is achieved in the sense that the central spike train is a list of spike times, but that does not necessarily mean it shares the less easily specified properties possessed by real spike trains. For example, real spike trains often have interspike interval distributions which are well described by a gamma distribution or an inverse Gaussian distribution [6,7]. However, as a type of average, the central spike train might differ with respect to properties of this sort, precisely because noise has been removed.
In fact, this is what seems to happen for the data examined in the Results section
(Sect. 3). By design, the mean interspike interval for the central spike train
It is hoped that temporal properties crucial to the coding structure of the spike train can be largely preserved in a single exemplar like the central spike train. This is typically the hope when constructing an average; it does not contain all the information present in the original collection of responses, but it does include a substantial part of the signal, as opposed to the noise responsible for trial-to-trial variation. In contrast, in the peristimulus time histogram the spikes across trials are aggregated into bins. Binning also constitutes an approach to summarizing a collection of trials, but it does so using an object which does not resemble the original signal. Thus, the peristimulus time histogram reduces temporal precision through binning, and the central spike train removes trial-to-trial structure by representing the collection as a single spike train. In each case, information is lost, but it is hoped that this lost information is noise. The peristimulus time histogram replaces the original spike trains with something like a rate; the construction here replaces them with the central spike train. In studying coding, this may have the advantage that, as a spike train, the effect of the central spike train on a postsynaptic neuron can be investigated.
One disadvantage of this algorithm is that it requires a value for τ, the timescale. Typically, this is chosen so as to optimize the metric clustering, and this optimization requires the calculation of the distance matrix for the responses for different values of τ. In applications with large data sets where the results are needed rapidly, this might become problematic. Consequently, it would be interesting to consider other methods of choosing τ based on easily accessed properties of the spike trains, such as spike number and spike-train-to-spike-train variability.
The average is the first moment of a random variable. Often it is useful to also examine quantities such as variance, which are derived from higher moments; for example, it might be interesting to examine whether some types of input produce a noisier output than others. Certainly, it is easy to estimate the variance in the function space, giving a form of the peristimulus variance histogram [16]. However, this is a function and it is not clear how to interpret the function variance in terms of spike trains. This would be an interesting topic for further work. Another approach would be to examine the distribution of displacements between the central spike train and the spike trains in the collection, something that has previously been considered using pairwise displacement in the collection [9].
It might also be informative to use the central spike train to average over different neurons rather than over different trials. In this application, deviation from the average would correspond, in part, to aspects of coding which differ from a summed population code.
The choice of the exponential kernel is somewhat arbitrary. However, from the example of the van Rossum metric [11] and of kernel density estimation in statistics [22], it is unlikely that changing the kernel will have a strong effect. One interesting idea might be to use the actual jitter distribution of spike times in the set of responses as a kernel. It would also be interesting to consider the biological basis for the averaging algorithm itself. Perhaps the electrodynamics of neurons can be, in part, interpreted as an averaging algorithm of this sort. Some models of auditory object recognition rely on the use of ‘template’ or ‘memory’ spike trains [12]. Perhaps the central spike train could play a role here.
Competing Interests
The authors declare that they have no competing interests.
Authors’ Contributions
Both authors planned, carried out, and wrote up the research.
Acknowledgements
We are very grateful to Frederic Theunissen, Sarah M.N. Woolley, Thane Fremouw, and Noopur Amin for their generosity in making their data available on the Collaborative Research in Computational Neuroscience website, and to an anonymous referee for suggesting the weighted spike train approach discussed in the Results section (Sect. 3). We thank the James S. McDonnell Foundation for financial support through CJH's Scholar Award in Understanding Human Cognition.
References

Single-unit recordings from multiple auditory areas in male zebra finches (Frederic Theunissen Laboratory, UC Berkeley) [http://crcns.org/datasets/aa/aa2].

Amin N, Gill P, Theunissen FE: Role of the zebra finch auditory thalamus in generating complex representations for natural sounds.
J Neurophysiol 2010, 104:784-798.

Brenner N, Agam O, Bialek W, de Ruyter van Steveninck R: Statistical properties of spike trains: universal and stimulus-dependent aspects.
Phys Rev E 2002, 66: Article ID 031907.

Cunningham JP, Yu BM, Shenoy KV, Sahani M: Inferring neural firing rates from spike trains using Gaussian processes.

Endres D, Oram M, Schindelin J, Földiák P: Bayesian binning beats approximate alternatives: estimating peristimulus time histograms.

Folks JL, Chhikara RS: The inverse Gaussian and its statistical application—a review.

Gerstein GL, Mandelbrot B: Random walk models for the spike activity of a single neuron.
Biophys J 1964, 4:41-68.

Gill P, Zhang J, Woolley SM, Fremouw T, Theunissen FE: Sound representation methods for spectrotemporal receptive field estimation.
J Comput Neurosci 2006, 21:5-20.

Gillespie JB, Houghton CJ: A metric space approach to the information capacity of spike trains.
J Comput Neurosci 2011, 30:201-209.

Houghton C, Kreuz T: On the efficient calculation of van Rossum distances.

Houghton C, Victor JD: Spike rates and spike metrics. In Visual Population Codes: Toward a Common Multivariate Framework for Cell Recording and Functional Imaging. Edited by Kriegeskorte N, Kreiman G. MIT Press, Cambridge; 2011:213-244.

Larson E, Billimoria CP, Sen K: A biologically plausible computational model for auditory object recognition.
J Neurophysiol 2009, 101:323-331.

Mallat SG, Zhang Z: Matching pursuits with time-frequency dictionaries.
IEEE Trans Signal Process 1993, 41:3397-3415.

Nelder JA, Mead R: A simplex method for function minimization.
Comput J 1965, 7:308-313.

Park I, Paiva ARC, Principe JC: A reproducing kernel Hilbert space framework for spike trains.
Neural Comput 2009, 21:424-449.

Pillow JW, Paninski L, Uzzell VJ, Simoncelli EP, Chichilnisky EJ: Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model.
J Neurosci 2005, 25:11003-11013.

Press WH, Teukolsky SA, Vetterling WT, Flannery BP: Numerical Recipes in C++: The Art of Scientific Computing. 2nd edition. Cambridge University Press, Cambridge; 2007.

Schölkopf B, Smola AJ: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge; 2002.

Shimazaki H, Shinomoto S: A recipe for optimizing a timehistogram.

Shinomoto S, et al.: Relating neuronal firing patterns to functional differentiation of cerebral cortex.
PLoS Comput Biol 2009, 5: Article ID e1000433.

Silverman BW: Density Estimation. Chapman & Hall, London; 1986.

Tranbarger KE: Point process prototypes, and other applications of point pattern distance metrics. PhD dissertation. UCLA; 2005.

van Rossum M: A novel spike distance.
Neural Comput 2001, 13:751-763.

Victor JD, Purpura KP: Nature and precision of temporal coding in visual cortex: a metricspace analysis.
J Neurophysiol 1996, 76:1310-1326.

Woolley SMN, Fremouw TE, Hsu A, Theunissen F: Tuning for spectrotemporal modulations as a mechanism for auditory discrimination of natural sounds.
Nat Neurosci 2005, 8:1371-1379.

Woolley SMN, Gill PR, Theunissen FE: Stimulusdependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain.
J Neurosci 2006, 26:2499-2512.

Woolley SMN, Gill PR, Fremouw T, Theunissen FE: Functional groups in the avian auditory system.
J Neurosci 2009, 29:2780-2793.

Wu W, Srivastava A: Towards statistical summaries of spike train data.
J Neurosci Methods 2011, 195:107-110.

Yu BM, Afshar A, Santhanam G, Ryu SI, Shenoy KV, Sahani M: Extracting dynamical structure embedded in neural activity.