Open Access Research

Gradient estimation in dendritic reinforcement learning

Mathieu Schiess, Robert Urbanczik and Walter Senn*

Author Affiliations

Department of Physiology, University of Bern, Bühlplatz 5, 3012, Bern, Switzerland

The Journal of Mathematical Neuroscience 2012, 2:2 doi:10.1186/2190-8567-2-2


The electronic version of this article is the complete one and can be found online at: http://www.mathematical-neuroscience.com/content/2/1/2


Received: 12 May 2011
Accepted: 15 February 2012
Published: 15 February 2012

© 2012 Schiess et al.; licensee Springer

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We study synaptic plasticity in a complex neuronal cell model where NMDA-spikes can arise in certain dendritic zones. In the context of reinforcement learning, two kinds of plasticity rules are derived, zone reinforcement (ZR) and cell reinforcement (CR), which both optimize the expected reward by stochastic gradient ascent. For ZR, the synaptic plasticity response to the external reward signal is modulated exclusively by quantities which are local to the NMDA-spike initiation zone in which the synapse is situated. CR, in addition, uses nonlocal feedback from the soma of the cell, provided by mechanisms such as the backpropagating action potential. Simulation results show that, compared to ZR, the use of nonlocal feedback in CR can drastically enhance learning performance. We suggest that the availability of nonlocal feedback for learning is a key advantage of complex neurons over networks of simple point neurons, which have previously been found to be largely equivalent with regard to computational capability.

Keywords:
Dendritic computation; reinforcement learning; spiking neuron

1 Introduction

Except for biologically detailed modeling studies, the overwhelming majority of works in mathematical neuroscience have treated neurons as point neurons, i.e., a neuron was assumed to be characterized by a linear aggregation of synaptic input followed by a nonlinearity in the generation of somatic action potentials. This disregards the fact that many neurons in the brain have complex dendritic arborization where synaptic inputs may be aggregated in highly nonlinear ways [1]. From an information processing perspective, sticking with the minimal point neuron may nevertheless seem justified since networks of such simple neurons already display remarkable computational properties: assuming infinite precision and noiseless arithmetic, a suitable network of spiking point neurons can simulate a universal Turing machine and, further, impressive information processing capabilities persist when one makes more realistic assumptions such as taking noise into account (see [2] and the references therein). Such generic observations are underscored by the detailed compartmental modeling of the computation performed in a hippocampal pyramidal cell [3]. There it was found that (in a rate coding framework) the input-output behavior of the complex cell is easily emulated by a simple two-layer network of point neurons.

If the computations of complex cells are readily emulated by relatively simple circuits of point neurons, the question arises why so many of the neurons in the brain are complex. Of course, the reason for this may be only loosely related to information processing proper; it might be that maintaining a complex cell is metabolically less costly than maintaining the equivalent network of point neurons. Here, we wish to explore a different hypothesis, namely that complex cells have crucial advantages with regard to learning. This hypothesis is motivated by the fact that many artificial intelligence algorithms for neural networks assume that synaptic plasticity is modulated by information which arises far downstream of the synapse. A prominent example is the backpropagation algorithm where error information needs to be transported upstream via the transpose of the connectivity matrix. But in real axons any fast information flow is strictly downstream, and this is why algorithms such as backpropagation are widely regarded as biologically unrealistic for networks of point neurons. When one considers complex cells, however, it seems far more plausible that synaptic plasticity could be modulated by events which arise relatively far downstream of the synapse. The backpropagating action potential, for instance, is often capable of conveying information on somatic spiking to synapses which are quite distal in the dendritic tree [4,5]. If nonlinear processing occurred in the dendritic tree during the forward propagation, this means that somatic spiking can modulate synaptic plasticity even when one or more layers of nonlinearities lie between the synapse and the soma. Thus, compared to networks of point neurons, more sophisticated plasticity rules could be biologically feasible in complex cells.

To study this issue, we formalize a complex cell as a two-layer network, with the first layer made up of initiation zones for NMDA-spikes (Figure 1). NMDA-spikes are regenerative events, caused by AMPA-mediated synaptic releases when the releases are both near coincident in time and spatially co-located on the dendrite [6-8]. Such NMDA-spikes boost the effect of the synaptic releases, leading to increases in the somatic potential which are stronger as well as longer compared to the effect obtained from a simple linear superposition of the excitatory postsynaptic potentials from the individual AMPA releases. Further, we assume that the contributions of NMDA-spikes from different initiation zones combine additively in the somatic potential and that this potential governs the generation of somatic action potentials via an escape noise process. While we would argue that this provides an adequate minimal model of dendritic computation in basal dendritic structures, one should bear in mind that our model seems insufficient to describe the complex interactions of basal and apical dendritic inputs in cortical pyramidal cells [9,10].

Figure 1. Sketch of the neuronal cell model. Spatio-temporally clustered postsynaptic potentials (PSP, green) can give rise to NMDA-spikes (red) which superimpose additively in the soma (blue) controlling the generation of action potentials (AP).

We will consider synaptic plasticity in the context of reinforcement learning, where the somatic action potentials control the delivery of an external reward signal. The goal of learning is to adjust the strength of the synaptic releases (the synaptic weights) so as to maximize the expected value of the reward signal. In this framework, one can mathematically derive plasticity rules [11,12] by assuming that weight adaptation follows a stochastic gradient ascent procedure in the expected reward [13]. Dopamine is widely believed to be the most important neurotransmitter for such reward modulated plasticity [14-16]. A simple-minded application of the approach in [13] leads to a learning rule where, except for the external reward signal, plasticity is determined by quantities which are local to each NMDA-spike initiation zone (NMDA-zone). Using this rule, NMDA-zones learn as independent agents which are oblivious of their interaction in generating somatic action potentials, with the external reward signal being the only mechanism for coordinating plasticity between the zones. Hence, we shall refer to this rule as zone reinforcement (ZR). Due to its simplicity, ZR would seem biologically feasible even if the network were not integrated into a single neuron. On the other hand, this approach to multi-agent reinforcement often leads to a learning performance which deteriorates quickly as the number of agents (here, NMDA-zones) increases since it lacks an explicit mechanism for differentially assigning credit to the agents [17,18]. By algebraic manipulation of the gradient formula leading to the basic ZR-rule, we derive a class of learning rules where synaptic plasticity is also modulated by somatic responses, in addition to reward and quantities local to the NMDA-zone. Such learning rules will be referred to as cell reinforcement (CR), since they would be biologically unrealistic if the nonlinearities were not integrated into a single cell. We present simulation results showing that one rule in the CR-class results in learning which is much faster than for the ZR-rule. This provides evidence for the hypothesis that enabling effective synaptic plasticity rules may be one evolutionary advantage conveyed by dendritic nonlinearities.

2 Stochastic cell model of a neuron

We assume a neuron with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M1">View MathML</a> initiation zones for NMDA-spikes, indexed by <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M2">View MathML</a>. An NMDA-zone is made up of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M3">View MathML</a> synapses, with synaptic strength <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M4">View MathML</a> (<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M5">View MathML</a>), where releases are triggered by presynaptic spikes. We denote by <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M6">View MathML</a> the set of times when presynaptic spikes arrive at synapse <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M7">View MathML</a>. 
In each NMDA-zone, the synaptic releases give rise to a time varying local membrane potential <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8">View MathML</a> which we assume to be given by a standard spike response equation

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M9">View MathML</a>

(1)

Here, X denotes the entire presynaptic input pattern of the neuron, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M10">View MathML</a> (arbitrary units) is the resting potential, and the postsynaptic response kernel ϵ is given by

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M11">View MathML</a>

We use <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M12">View MathML</a> for the membrane time constant, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M13">View MathML</a> for the synaptic rise time, and Θ is the Heaviside step function.
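As a rough illustration of the spike response equation (Equation 1), the following Python sketch evaluates a local potential as a resting value plus weighted, summed postsynaptic responses. The difference-of-exponentials kernel shape, its normalization, and all numerical values (`TAU_M`, `TAU_S`, `U_REST`, the weights and the spike times) are assumptions made for illustration only; the article's own kernel and constants are those of the equations above.

```python
import numpy as np

# Hypothetical constants (ms, arbitrary units); not taken from the article.
TAU_M = 10.0   # membrane time constant
TAU_S = 1.5    # synaptic rise time
U_REST = -1.0  # resting potential

def epsilon(t):
    """Assumed causal difference-of-exponentials kernel; zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)  # avoid overflow for negative arguments
    shape = (np.exp(-tp / TAU_M) - np.exp(-tp / TAU_S)) / (TAU_M - TAU_S)
    # np.where plays the role of the Heaviside step function.
    return np.where(t > 0.0, shape, 0.0)

def local_potential(t, weights, spike_times):
    """u(t) = U_REST + sum_i w_i * sum_{s in X_i} epsilon(t - s)."""
    t = np.asarray(t, dtype=float)
    u = np.full(t.shape, U_REST)
    for w, times in zip(weights, spike_times):
        for s in times:
            u = u + w * epsilon(t - s)
    return u

# Two synapses with hypothetical weights and presynaptic spike times (ms).
t_grid = np.arange(0.0, 100.0, 0.1)
u = local_potential(t_grid, weights=[2.0, 1.0],
                    spike_times=[[10.0, 12.0], [11.0]])
```

Before the first presynaptic spike the potential sits at `U_REST`; each release then adds a transient whose size scales with the synaptic weight.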

The local potential <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8">View MathML</a> controls the rate at which what we call NMDA-events are generated in the zone - in our model NMDA-events are closely related to the onset of NMDA-spikes as described in detail below. Formally, we assume that NMDA-events are generated by an inhomogeneous Poisson process with rate function <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M15">View MathML</a>, choosing

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M16">View MathML</a>

(2)

with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M17">View MathML</a> and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M18">View MathML</a>. We adopt the symbol <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19">View MathML</a> to denote the set of NMDA-event times in zone ν. For future use, we recall the standard result [19] that the probability density <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M20">View MathML</a> of an event-train <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19">View MathML</a> generated during an observation period running from <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M22">View MathML</a> to T satisfies

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M23">View MathML</a>

(3)

where <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M24','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M24">View MathML</a> is the δ-function representation of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19">View MathML</a>.
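The event-train density of an inhomogeneous Poisson process has the standard log form: the integral over the observation period of ln(rate) weighted by the δ-train, minus the rate itself. The sketch below samples such a train on a time grid and evaluates a discretized version of that log-density. The exponential form of the rate function, the parameters `Q` and `BETA`, the bin width, and the sinusoidal toy potential are all assumptions for illustration, not values from the article.

```python
import numpy as np

# Hypothetical rate-function parameters; the article's values are in Equation 2.
Q = 0.005    # rate scale (events per ms)
BETA = 3.0   # sensitivity to the local potential

def rate(u):
    """Assumed exponential rate function phi(u) = Q * exp(BETA * u)."""
    return Q * np.exp(BETA * np.asarray(u, dtype=float))

def sample_events(u, dt, rng):
    """Binned event train: at most one event per bin, with probability phi(u) * dt."""
    p = np.clip(rate(u) * dt, 0.0, 1.0)
    return rng.random(p.shape) < p

def log_likelihood(events, u, dt):
    """Discretized integral of ln(phi(u(t))) * Y(t) - phi(u(t)) over the grid."""
    phi = rate(u)
    return float(np.sum(np.log(phi[events])) - np.sum(phi) * dt)

rng = np.random.default_rng(0)
dt = 0.1                                  # bin width in ms
t = np.arange(0.0, 500.0, dt)
u = np.sin(2.0 * np.pi * t / 100.0)       # toy local potential, not from the article
events = sample_events(u, dt, rng)
print(events.sum(), log_likelihood(events, u, dt))
```

The bin-wise Bernoulli sampling is only accurate when `rate(u) * dt` is small, which is why the bin width is chosen well below the inverse of the peak rate.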

Conceptually, it would be simplest to assume that each NMDA-event initiates an NMDA-spike. But we need some mechanism for refractoriness, since NMDA-spikes have an extended duration (20-200 ms) and there is no evidence that multiple simultaneous NMDA-spikes can arise in a single NMDA-zone. Hence, we shall assume that, while an NMDA-event occurring in temporal isolation causes an NMDA-spike, a rapid succession of NMDA-events within one zone only leads to a somewhat longer but not to a stronger NMDA-spike. In particular, we will assume that an NMDA-spike contributes to the somatic potential during a period of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M26">View MathML</a> after the time of the last preceding NMDA-event. Hence, if an NMDA-event is followed by a second one with a <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M27">View MathML</a> delay, the first event initiates an NMDA-spike which lasts for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M28">View MathML</a> due to the second NMDA-event. Formally, we denote by <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M29">View MathML</a> the time of the last NMDA-event up to time t and model the somatic effect of an NMDA-spike by the response kernel

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M30">View MathML</a>

(4)

The main motivation for modeling the generation of NMDA-spikes in this way is that it proves mathematically convenient in the calculations below. Having said this, it is worth mentioning that treating NMDA-spikes as rectangular pulses seems reasonable, since their rise and fall times are typically short compared to the duration of the spike. Also, there is some evidence that increased excitatory presynaptic activity extends the duration of an NMDA-spike but does not increase its amplitude [7,8]. Qualitatively, the above model is in line with such findings.
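The rectangular-pulse idea can be sketched in a few lines of Python: the pulse is on whenever the most recent NMDA-event is younger than a fixed duration, so closely spaced events lengthen the pulse without raising it. The duration `DELTA`, the function name `nmda_pulse`, and the treatment of the interval boundaries are assumptions for illustration; the article's kernel is the one defined in Equation 4.

```python
import numpy as np

DELTA = 50.0  # hypothetical pulse duration (ms) after the last event

def nmda_pulse(t, event_times, delta=DELTA):
    """Return 1.0 where the last event at or before t is less than delta old, else 0.0."""
    t = np.asarray(t, dtype=float)
    events = np.sort(np.asarray(event_times, dtype=float))
    if events.size == 0:
        return np.zeros(t.shape)
    # Index of the last event at or before each t (-1 if none has occurred yet).
    idx = np.searchsorted(events, t, side="right") - 1
    has_event = idx >= 0
    last = np.where(has_event, events[np.clip(idx, 0, None)], -np.inf)
    return np.where(has_event & (t - last < delta), 1.0, 0.0)

# Two events 5 ms apart produce one pulse of 55 ms, not two stacked pulses.
t_grid = np.arange(-10.0, 100.0, 1.0)
pulse = nmda_pulse(t_grid, [0.0, 5.0])
```

Because the output is an indicator, its amplitude never exceeds one regardless of how many events fall inside the window, which mirrors the refractoriness assumption in the text.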

For specifying the somatic potential U of the neuron, we denote by Y the vector of all NMDA-event trains <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19">View MathML</a> and by Z the set of times when the soma generates action potentials. We then use

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M32">View MathML</a>

(5)

for the time course of the somatic potential, where the reset kernel κ is given by

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M33">View MathML</a>

This is a highly stylized model of the somatic potential since we assume that NMDA-zones contribute equally to the somatic potential (with a strength controlled by the positive parameter a) and that, further, the AMPA-releases themselves do not contribute directly to U. These restrictive assumptions may not be entirely unreasonable (for instance, AMPA-releases can be much more strongly attenuated on their way to the soma than NMDA-spikes). We wish to point out, however, that while they simplify the mathematics, the approach below does not rely on them.

Somatic firing is modeled as an escape noise process with an instantaneous rate function <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M34">View MathML</a> where

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M35','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M35">View MathML</a>

(6)

with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M36','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M36">View MathML</a> and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M37','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M37">View MathML</a>. As shown in [20], for the probability density <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M38','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M38">View MathML</a> of responding to the NMDA-events with a somatic spike train Z during the observation period this implies

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M39','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M39">View MathML</a>

(7)

with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M40','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M40">View MathML</a>.

3 Reinforcement learning

In reinforcement learning, one assumes a scalar reward function <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41">View MathML</a> providing feedback about the appropriateness of the somatic response Z to the input X. The goal of learning is to adapt the synaptic strengths so as to obtain appropriate somatic responses. For our neuronal model, the expected value <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42">View MathML</a> of the reward signal <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41">View MathML</a> is

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M44','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M44">View MathML</a>

(8)

where <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M45','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M45">View MathML</a> is the probability density of the input spike patterns and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M46','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M46">View MathML</a>. The goal of learning can now be formalized as finding a w maximizing <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42">View MathML</a> and synaptic plasticity rules can be obtained using stochastic gradient ascent procedures for this task.

In stochastic gradient ascent, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M48','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M48">View MathML</a>, and Z are sampled at each trial and every weight is updated by

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M49','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M49">View MathML</a>

where <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M50','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M50">View MathML</a> is the learning rate and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M51','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M51">View MathML</a> is an (unbiased) estimator of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M52','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M52">View MathML</a>. Under mild regularity conditions, convergence to a local optimum is guaranteed if one uses an appropriate schedule for decreasing η towards 0 during learning [21]. In biological modeling, one usually simply assumes a small but fixed learning rate.

The derivative of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42">View MathML</a> with respect to the weight of synapse <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M7">View MathML</a> can be written as

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M55','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M55">View MathML</a>

(9)

Hence, a simple choice for the gradient estimator is

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M56','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M56">View MathML</a>

(10)

with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M20">View MathML</a> given by Equation 3. Note that the conditional probability <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M58','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M58">View MathML</a> does not explicitly appear in the estimator, so the update is oblivious of the architecture of the model neuron, i.e., of how NMDA-events contribute to somatic spiking. Since the only learning mechanism for coordinating the responses of the different NMDA-zones is the global reward signal <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41">View MathML</a>, we refer to the update given by Equation 10 as ZR.
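To make the ZR update concrete, the sketch below computes a reward-weighted derivative of one zone's Poisson log-likelihood with respect to its synaptic weights and applies one gradient ascent step. It assumes an exponential rate function (so the ratio of its derivative to its value is the constant `BETA`), a time-discretized event train, and toy PSP traces; `Q`, `BETA`, `u_rest`, `eta`, the reward value and the array layout are hypothetical choices, not the article's.

```python
import numpy as np

Q, BETA = 0.005, 3.0   # hypothetical parameters of an assumed exponential rate function

def rate(u):
    return Q * np.exp(BETA * u)

def log_likelihood(w, psp, events, dt, u_rest=-1.0):
    """Discretized Poisson log-likelihood of one zone's event train."""
    u = u_rest + psp @ w            # local potential, shape (T,)
    phi = rate(u)
    return np.sum(np.log(phi[events])) - np.sum(phi) * dt

def zr_gradient(w, psp, events, reward, dt, u_rest=-1.0):
    """reward * d/dw log P(Y | X); for the exponential rate, phi'/phi = BETA."""
    u = u_rest + psp @ w
    phi = rate(u)
    # PSP summed over event bins, minus the rate-weighted integral of each PSP trace.
    score = BETA * (psp[events].sum(axis=0) - (phi[:, None] * psp).sum(axis=0) * dt)
    return reward * score

rng = np.random.default_rng(1)
T, M, dt = 2000, 4, 0.1
psp = rng.random((T, M)) * 0.2          # toy PSP traces, shape (time bins, synapses)
w = rng.random(M)
events = np.zeros(T, dtype=bool)
events[[300, 900, 1500]] = True         # a given sample of NMDA-event bins
eta, reward = 0.01, 1.0                 # hypothetical learning rate and reward
w_new = w + eta * zr_gradient(w, psp, events, reward, dt)
```

Note that nothing about the soma enters `zr_gradient` except through the scalar `reward`, which is the sense in which the update is oblivious of how NMDA-events contribute to somatic spiking.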

Better plasticity rules can be obtained by algebraic manipulation of Equations 8 and 9, yielding gradient estimators with a reduced variance compared to Equation 10; this should lead to faster learning. A simple and well-known example is adjusting the reinforcement baseline by choosing a constant c and replacing <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M41">View MathML</a> with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M61','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M61">View MathML</a> in Equation 10; this amounts to adding c to <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M62','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M62">View MathML</a> and hence does not change the gradient. But a judicious choice of c can reduce the variance of the gradient estimator. More ambitiously, one could consider analytically integrating out Y in Equation 8, yielding an estimator which directly considers the relationship between synaptic weights and somatic spiking because it is based on <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M63','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M63">View MathML</a>. While actually doing the integration analytically seems impractical, we shall obtain estimators below from a partial realization of this program.
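The effect of shifting the reward by a constant can be checked on a deliberately tiny example that is unrelated to the neuron model: a single Bernoulli unit with firing probability sigmoid(w) and a toy reward. The sketch compares score-function gradient estimates with and without a shift c. The unit, the reward and the value c = -1.5 are all hypothetical choices made so that the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(w):
    return 1.0 / (1.0 + np.exp(-w))

def gradient_samples(w, c, n, rng):
    """Estimates (R + c) * d/dw log P(y) for a Bernoulli unit with P(y=1) = sigmoid(w)."""
    p = sigmoid(w)
    y = (rng.random(n) < p).astype(float)
    reward = 1.0 + y          # toy reward: 2 if the unit fires, 1 otherwise
    score = y - p             # d/dw log P(y) for this parameterization
    return (reward + c) * score

rng = np.random.default_rng(0)
w = 0.0                       # p = 0.5, so the true gradient p * (1 - p) is 0.25
plain = gradient_samples(w, 0.0, 10_000, rng)
shifted = gradient_samples(w, -1.5, 10_000, rng)
print(plain.mean(), plain.var())
print(shifted.mean(), shifted.var())
```

In this toy case the shifted estimator takes the value 0.25 on every sample, so its variance vanishes while its mean is unchanged; in realistic settings a constant shift reduces the variance only partially.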

4 From zone reinforcement to cell reinforcement

Due to the algebraic symmetries of our model cell, it suffices to give explicit plasticity rules only for one synaptic weight. To reduce clutter we will thus focus on the first synapse <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M64','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M64">View MathML</a> in the first NMDA-zone.

4.1 Notational simplifications

Let <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M65','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M65">View MathML</a> denote the vector <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M66','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M66">View MathML</a> of all NMDA-event trains but the first and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M67','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M67">View MathML</a> the collection of synaptic weights <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M68','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M68">View MathML</a> in all but the first NMDA-zone. We rewrite the expected reward as

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M69','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M69">View MathML</a>

(11)

Since in Equation 11 only r depends on <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M64','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M64">View MathML</a> we just need to consider <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M71','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M71">View MathML</a>. Hence, we can regard X and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M65','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M65">View MathML</a> as fixed and suppress them in the notation. This allows us to write the somatic potential (Equation 5) simply as

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M73','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M73">View MathML</a>

(12)

using Y as shorthand for the NMDA-event train <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M74','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M74">View MathML</a> of the first zone and, further, incorporating into a time varying base potential <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M75','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M75">View MathML</a> the following contributions in Equation 5: (i) the resting potential, (ii) the influence of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M65','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M65">View MathML</a>, i.e., NMDA-events in the other zones, (iii) any reset caused by somatic spiking. Similarly, the notation for the local membrane potential of the first NMDA-zone becomes

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M77','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M77">View MathML</a>

(13)

where w stands for the strength <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M64','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M64">View MathML</a> of the first synapse, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M79','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M79">View MathML</a>, and the effect of the other synapses impinging on the zone is absorbed into <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M80','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M80">View MathML</a>. Finally, the w-dependent contribution r to the expected reward (Equation 11) can be written as

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M81','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M81">View MathML</a>

(14)

where also for R and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M82','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M82">View MathML</a> we have suppressed the dependence on X. In the reduced notation, the explicit expression (obtained from Equations 3 and 10) for the gradient estimator in ZR-learning is

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M83','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M83">View MathML</a>

(15)

4.2 Cell reinforcement

To simplify the manipulation of Equation 14, we replace the Poisson process generating Y by a discrete-time process with step-size <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M84','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M84">View MathML</a>. We assume that NMDA-events in Y can only occur at times <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M85','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M85">View MathML</a> where k runs from 1 to <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M86','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M86">View MathML</a> and introduce K independent binary random variables <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M87','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M87">View MathML</a> to record whether or not an NMDA-event occurred. For the probability of not having an NMDA-event at time <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M88','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M88">View MathML</a> we use

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M89','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M89">View MathML</a>

(16)

With this definition, we can recover the original Poisson process by taking the limit <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M90','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M90">View MathML</a>. We use <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M91','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M91">View MathML</a> to denote the entire response of the NMDA-zone and, to make contact with the set-based description of the NMDA-trains, we denote by <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M92','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M92">View MathML</a> the set of NMDA-event times in y, i.e., <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M93','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M93">View MathML</a>. Next, the discrete time version of Equation 14 is

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M94','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M94">View MathML</a>

(17)

where <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M95','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M95">View MathML</a>. In the end, we will recover r from <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M96','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M96">View MathML</a> by taking δ to zero.
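
The discretization step can be illustrated with a minimal sketch. It assumes a per-bin no-event probability of exp(-δ·rate), which is one common choice and need not coincide with the exact form of Equation 16; the rate function and all names below are hypothetical and chosen only for illustration. The sketch shows that the expected number of events in the binned process approaches the integral of the rate as δ shrinks.

```python
import math
import random


def rate(t):
    """Hypothetical time-varying NMDA-event rate (events per unit time)."""
    return 5.0 + 4.0 * math.sin(2.0 * math.pi * t)


def no_event_prob(t_k, delta):
    """Assumed probability of no event in the bin ending at t_k."""
    return math.exp(-delta * rate(t_k))


def expected_count(T, delta):
    """Exact expected number of events of the binned process on (0, T]."""
    K = round(T / delta)
    return sum(1.0 - no_event_prob(k * delta, delta) for k in range(1, K + 1))


def sample_train(T, delta, rng):
    """Draw one event train: K independent binary variables, one per bin."""
    K = round(T / delta)
    return [k * delta for k in range(1, K + 1)
            if rng.random() > no_event_prob(k * delta, delta)]


if __name__ == "__main__":
    # The integral of the hypothetical rate over one period is 5.
    for delta in (0.1, 0.01, 0.001):
        print(delta, expected_count(1.0, delta))
    print(sample_train(1.0, 0.001, random.Random(0)))
```

Because each bin holds at most one event, a coarse δ undercounts events where the rate is high; the discrepancy shrinks with δ, in line with recovering the Poisson process in the limit.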

The derivative of Equation 17 is

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M97','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M97">View MathML</a>

and to focus on the contributions to <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M98','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M98">View MathML</a> from each time bin we set

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M99','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M99">View MathML</a>

(18)

Hence, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M100','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M100">View MathML</a>.

We now exploit the trivial fact that we can think of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M101','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M101">View MathML</a> as a function linear in <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102">View MathML</a>, simply because <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102">View MathML</a> is binary. As a consequence, we can decompose <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M101','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M101">View MathML</a> into two terms: one which depends on <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102">View MathML</a> and one which does not. For this, we pick a scalar μ and rewrite <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M101','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M101">View MathML</a> as

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M107','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M107">View MathML</a>

(19)

where <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M108','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M108">View MathML</a> and

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M109','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M109">View MathML</a>

Plugging Equation 19 into Equation 18 yields <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110">View MathML</a> as the sum of two terms

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M111','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M111">View MathML</a>

(20)

Rearranging terms in <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M112','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M112">View MathML</a>, we get

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M113','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M113">View MathML</a>

Now, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M114','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M114">View MathML</a>, hence

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M115','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M115">View MathML</a>

(21)

The two equations above encapsulate our main idea for improving on ZR. In showing that <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M116','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M116">View MathML</a> we summed over the two outcomes <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M117','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M117">View MathML</a>, thus identifying a noise contribution in the ZR estimator <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M118','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M118">View MathML</a> for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110">View MathML</a> which vanishes through the averaging by the sampling procedure. Note that the remaining contribution <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M120','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M120">View MathML</a> has as a factor <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M121','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M121">View MathML</a>, a term which explicitly reflects how an NMDA-event at time <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M88','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M88">View MathML</a> contributes to the generation of somatic action potentials. In going from Equation 20 to Equation 21, we assumed that the parameter μ was constant. However, a quick perusal of the above derivation shows that this is not really necessary. To justify Equation 21, one just needs that μ does not depend on <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102">View MathML</a>, so that <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M124','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M124">View MathML</a> is indeed independent of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M102">View MathML</a>. In the sequel, it will turn out to be useful to introduce a value of μ which depends on somatic quantities.
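
The argument of this subsection can be checked on a generic toy case. The sketch below is not the paper's notation: it takes an arbitrary function f of a single binary variable y, writes it as a part linear in (y - μ) plus a part that does not depend on y, and sums the derivative of the event probability over the two outcomes. The names `decompose`, `gradient_terms`, `f0`, `f1` and `dp_dw` are hypothetical.

```python
def decompose(f0, f1, mu):
    """Return (slope, const) with f(y) == (y - mu) * slope + const for y in {0, 1}.

    f0 and f1 are the values f(0) and f(1); mu is an arbitrary scalar.
    """
    slope = f1 - f0
    const = mu * f1 + (1.0 - mu) * f0
    return slope, const


def gradient_terms(dp_dw, f0, f1, mu):
    """Split sum_y dP(y)/dw * f(y) into a y-dependent and a y-independent part.

    dp_dw is the derivative of P(y = 1) with respect to the weight, so the
    derivative of P(y = 0) is -dp_dw.
    """
    slope, const = decompose(f0, f1, mu)
    d_prob = {0: -dp_dw, 1: dp_dw}
    signal = sum(d_prob[y] * (y - mu) * slope for y in (0, 1))
    noise = sum(d_prob[y] * const for y in (0, 1))
    return signal, noise


if __name__ == "__main__":
    for mu in (0.0, 0.5, 0.9):
        print(mu, gradient_terms(0.3, 2.0, 5.0, mu))
```

In this toy case the y-independent part contributes nothing to the exact gradient for any μ, because the two probability derivatives cancel, while the y-dependent part always equals dp_dw · (f(1) - f(0)). The choice of μ therefore changes only how a sampled estimate fluctuates, not its mean.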

A drawback of Equations 20 and 21 is that they do not immediately lend themselves to Monte-Carlo estimation by sampling the process generating neuronal events. The reason is the missing term <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M126','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M126">View MathML</a> in the formula for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M120','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M120">View MathML</a>. To reintroduce the term, we set

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M128','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M128">View MathML</a>

(22)

and in view of Equations 20 and 21 have

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M129','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M129">View MathML</a>

Hence, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M130','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M130">View MathML</a> is an unbiased estimator of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110">View MathML</a> and, since <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M110">View MathML</a> gives the contribution to <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M98','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M98">View MathML</a> from the kth time step,

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M134','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M134">View MathML</a>

(23)

is an unbiased estimator of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M98','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M98">View MathML</a>. Note that, while unavoidable, the above recasting of the gradient calculation as an estimation procedure does seem risky. Due to the division by <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M136','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M136">View MathML</a> in introducing <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M137','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M137">View MathML</a>, Equation 22, rare somatic spike trains Z can potentially lead to large values of the estimator <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M138','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M138">View MathML</a>.
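
The risk mentioned here can be made concrete with a deliberately stripped-down toy, in which a single binary somatic outcome z stands in for the whole spike train Z and a single binary variable y for the NMDA-event. This is an assumption made for illustration, not the estimator of Equation 22: the sketch only mimics its key feature, a difference of likelihoods divided by the likelihood of the outcome that was actually sampled. All names and numbers are hypothetical.

```python
def estimator(y, z, dp_dw, q, reward):
    """One-sample value dp/dw * (P(z|y=1) - P(z|y=0)) * R(z) / P(z|y).

    q[y] is the assumed probability of z = 1 given y.
    """
    lik = {yy: (q[yy] if z == 1 else 1.0 - q[yy]) for yy in (0, 1)}
    return dp_dw * (lik[1] - lik[0]) * reward[z] / lik[y]


def summarize(p, dp_dw, q, reward):
    """Exact mean and largest magnitude of the estimator over all (y, z)."""
    mean, worst = 0.0, 0.0
    for y in (0, 1):
        p_y = p if y == 1 else 1.0 - p
        for z in (0, 1):
            p_z = q[y] if z == 1 else 1.0 - q[y]
            val = estimator(y, z, dp_dw, q, reward)
            mean += p_y * p_z * val
            worst = max(worst, abs(val))
    return mean, worst


if __name__ == "__main__":
    reward = {0: 1.0, 1: -1.0}
    # An event that almost always triggers z = 1 makes (y = 1, z = 0) rare.
    print(summarize(0.2, 1.0, {0: 0.5, 1: 0.99}, reward))
    print(summarize(0.2, 1.0, {0: 0.5, 1: 0.7}, reward))
```

The mean equals dp_dw · (q[1] - q[0]) · (R(1) - R(0)) in both settings, so the toy estimator is unbiased, but its largest single-sample value grows as the sampled outcome becomes less likely given y.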

To obtain a CR estimator <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M139','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M139">View MathML</a> for the expected reward <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M42">View MathML</a> in our original problem, we now just need to take δ to 0 in Equation 23 and tidy up a little. The detailed calculations are presented in Appendix 1; here we just display the final result:

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M141','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M141">View MathML</a>

(24)

In contrast to the ZR-estimator, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M139','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M139">View MathML</a> depends on somatic quantities via <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M143','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M143">View MathML</a> which assesses the effect of having an NMDA-event at time t on the probability of the observed somatic spike train. This requires the integration over the duration Δ of an NMDA-spike.

The CR-rule can be written as the sum of two terms, a time-discrete one depending on the NMDA-events <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M144','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M144">View MathML</a>, and a time-continuous one depending on the instantaneous NMDA-rate, both weighted by the effect of an NMDA-event on the probability of producing the somatic spike train:

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M145','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M145">View MathML</a>

5 Performance of zone and cell reinforcements

To compare the two plasticity rules, we first consider a rudimentary learning scenario where producing a somatic spike during a trial of duration <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M146','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M146">View MathML</a> is deemed an incorrect response, resulting in reward <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M147','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M147">View MathML</a>. The correct response is not to spike (<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M148','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M148">View MathML</a>) and this results in a reward of 0. With these reward signals, synaptic updates become less frequent as performance improves. This compensates somewhat for having a constant learning rate instead of the decreasing schedule which would ensure proper convergence of the stochastic gradient procedure. We use <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M149','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M149">View MathML</a> for the NMDA-spike strength in Equation 5, so that just 2-3 concurrent NMDA-spikes are likely to generate a somatic action potential. The input pattern X is held fixed and initial weight values are chosen so that correct and incorrect responses are equally likely before learning. Simulation details are given in Appendix 2. Given our choice of a and the initial weights, dendritic activity is already fairly low before learning and decreasing it to a very low level is all that is required for good performance in this simple task (Figure 2).

Fig. 2. Learning to stay quiescent. (A) Learning curves for cell reinforcement (blue) and zone reinforcement (red) when the neuron should not respond with any somatic firing to one pattern which is repeatedly presented. Values shown are averages over 40 runs with different initial weights and a different input pattern. (B) Distributions of the performance after 1500 trials. (C) A bad run of the CR-rule where performance drops dramatically after the 397th pattern presentation. The grey points show the Euclidean norm of the change <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M150','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M150">View MathML</a> in the neuron's weight matrix W, highlighting the excessively large synaptic update after trial 397. (D) Time course of the somatic potential during trial 397 (the straight line at <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M151','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M151">View MathML</a> marks a somatic spike). As shown more clearly by the blow-up in the bottom row, an NMDA-spike occurring at <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M152','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M152">View MathML</a> yields a value of U which stays strongly positive for some <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M153','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M153">View MathML</a>. (U drops thereafter because an NMDA-spike in a different zone ends.) Improbably, however, the sustained elevated value of U after <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M154','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M154">View MathML</a> does not lead to a somatic spike. Hence, the likelihood of the observed somatic response Z given the activity <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M19">View MathML</a> in the zone ν where the NMDA-spike at time <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M154','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M154">View MathML</a> occurred is quite small, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M157','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M157">View MathML</a>. Indeed, the actual somatic response would have been much more likely without the NMDA-spike, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M158','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M158">View MathML</a>. The discrepancy between the two probabilities yields a large value of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M159','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M159">View MathML</a> in Equation 24, leading to the strong weight change. Error bars in the figure show 1 SEM.

Simulations for ZR and CR (with a constant value of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M160','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M160">View MathML</a>) are shown in Figure 2A. Given the sophistication of the rule, the performance of CR is disappointing, yielding on average only a modest improvement over ZR. The histogram in Figure 2B shows that in most cases CR does in fact learn substantially faster than ZR but, in contrast to ZR, CR spectacularly fails on some runs. Performance in a bad run of the CR-rule is shown in Figure 2C, revealing that performance can deteriorate in a single trial. In this trial, a very unlikely somatic response was observed (Figure 2D), resulting in a large value of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M161','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M161">View MathML</a>, thus leading to an excessively large change in synaptic strength.

The finding that large fluctuations in the CR-estimator can arise from rare somatic events confirms the suspicion in Section 4.2 that recasting Equation 20 as a sampling procedure can lead to problems. Luckily, this can be addressed using the additional degree of freedom provided by the parameter μ in the CR-rule. To dampen the effect of the fluctuations in <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M161','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M161">View MathML</a>, we set μ to the time-dependent value

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M163','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M163">View MathML</a>

(25)

Note that μ is independent of whether or not <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M164','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M164">View MathML</a>. Hence, in view of our remark following Equation 21, this is in fact a valid choice for μ. The specific form of Equation 25 is to some extent motivated by aesthetic considerations. It simplifies the first line of Equation 24 to

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M165','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M165">View MathML</a>

(26)

We refer to this estimator as balanced cell reinforcement (bCR) (Figure 3).

Fig. 3. Balanced cell reinforcement (bCR, Equation 26) compared to zone reinforcement. (A) Average performance of bCR (green) and ZR (red) on the same task as in Figure 2A. (B) Performance when learning stimulus-response associations for four different patterns; bCR (green), ZR (red); a logarithmic scale is used for the x-axis. The inset shows the distribution of NMDA-spike durations after learning the task with bCR. The performance values in the figure are averages over 40 runs, and error bars show 1 SEM. (C) Development of the average reward signal <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M166','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M166">View MathML</a> for bCR (green) and ZR (red) when the task is to spike at the mid time of the single input pattern (<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M167','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M167">View MathML</a>, where <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M168','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M168">View MathML</a>, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M169','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M169">View MathML</a>, is the ith of the n output spike times, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M170','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M170">View MathML</a> the target spike time, and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M171','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M171">View MathML</a> the pattern duration; if there was no output spike within <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M172','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M172">View MathML</a> we added one at T, yielding <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M173','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M173">View MathML</a>). (D) Spike raster plot of the output spike times Z with <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M166','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M166">View MathML</a> shown in C using bCR. With ZR, the distribution of spike times after 3000 trials roughly corresponds to the one for bCR after 160 trials (vertical line at ∗), where the two performances coincide (see ∗ and black lines in C). The mean and standard deviation of the spike times at the end of the learning process, averaged across the last 300 trials, were <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M175','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M175">View MathML</a> and <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M176','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M176">View MathML</a> for bCR and ZR, respectively.

From the third line of Equation 24, one sees that the somato-dendritic interaction term in Equation 26 can be written as <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M177','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M177">View MathML</a>. This highlights the term's role in assessing how relevant an NMDA-event at time t is to the produced somatic spike train. In this, it is analogous to the <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M178','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M178">View MathML</a> terms in the CR-rule. But in contrast to these terms, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M179','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M179">View MathML</a> is bounded. In ZR, plasticity is driven by the exploration inherent in the stochasticity of NMDA-event generation. Formally, this is reflected by the difference <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M180','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M180">View MathML</a> entering as a factor in Equation 15, which represents the deviation of the sampled NMDA-events from the expected rate. In bCR, this difference has become a sum. Hence, exploration at the NMDA-event level is only of minor importance for the bCR-rule, where the essential driving force for plasticity is the somatic exploration entering through the factor <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M179','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M179">View MathML</a>.

Due to the modification, bCR consistently and markedly improves on ZR, as demonstrated by panel 5A, which compares the learning curves for the same task as in panel 6A. The performance improvement seems to become even larger for more demanding tasks. This is highlighted by panel 5B, which shows the performance when not just one but four different stimulus-response associations have to be learned. For two of the patterns, the correct somatic response was to emit at least one spike; for the other two patterns, the correct response was to stay quiescent. One of the four stimulus-response associations was randomly chosen on each trial and, as before, correct somatic responses led to a reward signal of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M182','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M182">View MathML</a> whereas incorrect responses resulted in <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M183','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M183">View MathML</a>. The inset to panel 5B shows the distribution of NMDA-spike durations after learning the four stimulus-response associations with bCR. Over 70% of the NMDA-spikes last for just a little longer than the minimal length of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M184','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M184">View MathML</a>. Further, nearly all of the spikes are shorter than <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M185','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M185">View MathML</a>, thus staying well within a physiologically reasonable range.

Panels 5C and 5D show results in a task where reward delivery is contingent on an appropriate temporal modulation of the firing rate. Also, in this second output coding paradigm, the bCR-update is found to be much more efficient in estimating the gradient of the expected reward.

6 Discussion

We have derived a class of synaptic plasticity rules for reinforcement learning in a complex neuronal cell model with NMDA-mediated dendritic nonlinearities. The novel feature of the rules is that the plasticity response to the external reward signal is shaped by the interaction of global somatic quantities with variables local to the dendritic zone where the nonlinear response to the synaptic release arises. Simulation results show that such so-called CR rules can strongly enhance learning performance compared to the case where the plasticity response is determined just from quantities local to the dendritic zone.

In the simulations, we have considered only a very simple task with a single complex cell learning stimulus-response associations. The results, however, show that, compared to ZR, the bCR rule provides a less noisy procedure for estimating the gradient of the log-likelihood of the somatic response given the neuronal input (<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M186','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M186">View MathML</a>). Estimating this gradient for each neuron is also the key step for reinforcement learning in networks of complex cells [13]. Further, simply memorizing the gradient estimator with an eligibility trace until reward information becomes available yields a learning procedure for partially observable Markov decision processes, i.e., tasks where the somatic response may have an influence on which stimuli are subsequently encountered and where reward delivery may be contingent on producing a sequence of appropriate somatic responses [22-24]. The quality of the gradient estimator is a crucial factor also in these cases. Hence, it is safe to assume that the observed performance advantage of the bCR rule carries over to learning scenarios which are much more complex than the ones considered here.

In this investigation, we have adopted a normative perspective, asking how the different variables arising in a complex neuronal model should interact in shaping the plasticity response, and striving for maximal mathematical transparency rather than maximal biological realism. Ultimately, of course, we have to face the question of how instructive the obtained results are for modeling biological reality. The question has two aspects which we will address in turn: (A) Can the quantities shaping the plasticity response be read out at the synapse? (B) Is the computational structure of the rules feasible?

(A) The global quantities in CR are the timing of somatic spikes as well as the value of the somatic potential. The fact that somatic spiking can modulate plasticity is well established by spike timing-dependent plasticity (STDP) experiments. In fact, such experiments can also provide phenomenological evidence for the modulation of synaptic plasticity by the somatic potential, or at least by a low-pass filtered version thereof. The evidence arises from the fact that the synaptic change for multiple spike interactions is not a linear superposition of the plasticity found when pairing a single pre-synaptic and a somatic spike. Explaining the discrepancy seems to require the introduction of the somatic potential as an additional modulating factor [25].

In CR-learning, however, we assume that the somatic potential U (Equation 5) can differ substantially from a local membrane potential <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8">View MathML</a> (Equation 1) and that both potentials have to be read out by a synapse located in the νth dendritic zone. In a purely electrophysiological framework, this is nonsensical. The way out is to note that what a synapse in CR-learning really needs is to differentiate between the total current flow into the neuron and the flow resulting from AMPA-releases in its local dendritic NMDA-zone. While the differential contribution of the two flows is going to be indistinguishable in any local potential reading, the difference could conceivably be established from the detailed ionic composition giving rise to the local potential at the synapse. A second, perhaps more likely, option arises when one considers that NMDA-spiking is widely believed to rely on the pre-binding of glutamate to NMDA-receptors [7]. Hence, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8">View MathML</a> could simply be the level of such NMDA-receptor bound glutamate, whereas U is relatively reliably inferred from the local potential. Such a reinterpretation does not change the basic structure of our model, although it might require adjusting some of the time constants governing the build-up of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M8">View MathML</a>.

(B) The plasticity rules considered here integrate over the duration T corresponding to the period during which somatic activity determines eventual reward delivery. But synapses are unlikely to know when such a period starts and ends. As in previous works [12,18], this can be addressed by replacing the integral by a low-pass filter with a time constant matched to the value of T. The CR-rules, however, when evaluating <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M143','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M143">View MathML</a> to assess the effect of an NMDA-spike, require a second integration extending from time t into the future up to <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M191','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M191">View MathML</a>. The acausality of integrating into the future can be taken care of by time shifting the integration variable in the first line of Equation 24, and similarly for Equation 26. But the time-shifted rules would require each synapse to buffer an impressive number of quantities. Hence, further approximations seem unavoidable and, in this regard, the bCR-rule (Equation 26) seems particularly promising due to its relatively simple structure. Approximating the hyperbolic tangent in the rule by a linear function yields an update which can be written as a proper double integral. This is an important step in obtaining a rule which can be implemented by a biologically reasonable cascade of low-pass filters.
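The replacement of a windowed integral by a low-pass filter can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `low_pass_trace`, the constant drive, and the value T = 500 ms are assumptions chosen for illustration, and only the 0.2 ms step is taken from Appendix 2. The sketch suggests that a leaky trace with time constant T settles at roughly the same scale as the integral of a constant signal over a window of length T, without the synapse having to know when the window starts or ends.

```python
import math

def low_pass_trace(signal, dt_ms, tau_ms, initial=0.0):
    """Leaky running integral: de/dt = -e / tau + signal(t), sampled every dt_ms."""
    decay = math.exp(-dt_ms / tau_ms)  # exact decay of the trace over one step
    trace = initial
    for value in signal:
        trace = trace * decay + value * dt_ms
    return trace

T_MS = 500.0   # assumed period length, for illustration only
DT_MS = 0.2    # time step quoted in Appendix 2
DRIVE = 1.0    # hypothetical constant integrand

# Integral of the constant drive over one window of length T.
window_integral = DRIVE * T_MS

# Filter output after running for ten time constants.
n_steps = int(round(10 * T_MS / DT_MS))
filtered = low_pass_trace([DRIVE] * n_steps, DT_MS, T_MS)
```

With a time-varying integrand the two quantities differ, since the filter weights recent input more strongly than older input; the sketch only illustrates why matching the time constant to T keeps the overall scale comparable.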

The derivation of the CR-rules presented above builds on previous work on reinforcement learning in a population of spiking point neurons [18,24,26]. But in contrast to neuronal firings, NMDA-spikes have a non-negligible extended duration and this makes the plasticity problem in our complex cell model more involved. The previous works introduced a feedback signal about the population decision which has a role similar to the somatic feedback in the present CR-rules. A key difference, however, is that the population feedback had to be temporally coarse grained since possible delivery mechanisms such as changing neurotransmitter levels are slow. In a complex cell model, however, a close to instantaneous somatic feedback can be assumed. As a consequence, the CR-rules can now support reinforcement learning also when the precise timing of somatic action potentials is crucial for reward delivery. Yet, if the soma only integrates NMDA-spikes which extend across <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M192','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M192">View MathML</a> or more, it appears to be difficult to reach a higher temporal precision in the somatic firing. In real neurons, the temporal precision is likely to result from the interaction of NMDA-spikes with AMPA-releases, with the NMDA-spikes determining periods of heightened excitability during which AMPA-releases can easily trigger a precise somatic action potential. While important in terms of neuronal functionality, incorporating the direct somatic effect of AMPA-releases into the model poses no mathematical challenge, just yielding additional plasticity terms similar to the ones for point neurons [20]. To focus on the main mathematical issues, we have not considered such direct somatic effects here.

Appendix 1

Here, we detail the steps leading from Equation 22 for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M138','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M138">View MathML</a> to Equation 24 for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M139','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M139">View MathML</a>.

We first obtain a more explicit form for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M138','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M138">View MathML</a>. In view of Equation 22, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M196','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M196">View MathML</a> if <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M197','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M197">View MathML</a>, whereas <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M198','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M198">View MathML</a> if there is NMDA-triggering at time <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M88','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M88">View MathML</a>. Hence, setting

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M200','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M200">View MathML</a>

and hence

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M201','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M201">View MathML</a>

Further, from Equation 16,

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M202','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M202">View MathML</a>

Hence, taking the limit <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M203','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M203">View MathML</a>, we obtain

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M204','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M204">View MathML</a>

which is equivalent to the first line of Equation 24.

We next need an explicit expression for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M143','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M143">View MathML</a>. Going back to its definition (Equation 24) and using Equations 7 and 12 yields

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M206','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M206">View MathML</a>

We next note that times s outside of the interval <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M207','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M207">View MathML</a> do not contribute to the above integrals since <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M208','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M208">View MathML</a> for such s. Further, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M209','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M209">View MathML</a> for <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M210','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M210">View MathML</a>. Hence,

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M211','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M211">View MathML</a>

For the term in square brackets we note that, since <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M212','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M212">View MathML</a> is zero or one, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M213','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M213">View MathML</a>. Hence, finally,

<a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M214','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M214">View MathML</a>

which gives the last line of Equation 24.

Appendix 2

Here, we provide the remaining simulation details.

An input pattern has a duration of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M171','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M171">View MathML</a> and is made up of 150 fixed spike trains chosen independently from a Poisson process with a mean firing rate of <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M216','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M216">View MathML</a> (independent realizations are used for each pattern). We think of the input as being generated by an input layer with 150 sites, with each NMDA-zone having a 50% probability of being connected to any given site. Hence, on average, an NMDA-zone receives 75 input spike trains and 37.5 spike trains are shared between any two NMDA-zones.
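The input construction described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' simulation code: the firing rate, the pattern duration, and the number of NMDA-zones below are placeholder values, and all function names are hypothetical. Only the 150 input sites and the 50% connection probability are taken from the text. The expected counts follow from 150 × 0.5 = 75 inputs per zone and 150 × 0.5 × 0.5 = 37.5 inputs shared by a pair of zones.

```python
import random

def poisson_spike_train(rate_hz, duration_s, rng):
    """Sample spike times of a homogeneous Poisson process on [0, duration_s)."""
    times = []
    t = rng.expovariate(rate_hz)  # exponential inter-spike intervals
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times

def make_pattern(n_sites, rate_hz, duration_s, rng):
    """One fixed input pattern: an independent spike train per input site."""
    return [poisson_spike_train(rate_hz, duration_s, rng) for _ in range(n_sites)]

def connect_zones(n_zones, n_sites, p_connect, rng):
    """For each zone, the set of input sites it is connected to."""
    return [{s for s in range(n_sites) if rng.random() < p_connect}
            for _ in range(n_zones)]

N_SITES = 150      # from the text
P_CONNECT = 0.5    # from the text
RATE_HZ = 6.0      # assumed placeholder value
DURATION_S = 0.5   # assumed placeholder value
N_ZONES = 20       # assumed placeholder value

rng = random.Random(0)
pattern = make_pattern(N_SITES, RATE_HZ, DURATION_S, rng)
zones = connect_zones(N_ZONES, N_SITES, P_CONNECT, rng)

# Empirical counterparts of the expected 75 inputs and 37.5 shared inputs.
mean_inputs = sum(len(z) for z in zones) / N_ZONES
pairs = [(a, b) for a in range(N_ZONES) for b in range(a + 1, N_ZONES)]
mean_shared = sum(len(zones[a] & zones[b]) for a, b in pairs) / len(pairs)
```

For a small number of zones, `mean_inputs` and `mean_shared` fluctuate around 75 and 37.5; they are sample averages, not exact values.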

A roughly optimized learning rate was used for all tasks and learning rules. Roughly optimized means that the learning rate used, <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M217','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M217">View MathML</a>, yields a performance which is better than when using <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M218','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M218">View MathML</a> or <a onClick="popup('http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M219','MathML',630,470);return false;" target="_blank" href="http://www.mathematical-neuroscience.com/content/2/1/2/mathml/M219">View MathML</a>.

In obtaining the learning curves, for each run a moving average of the actual trial-by-trial performance was computed using an exponential filter with time constant 0.1. Mean learning curves were subsequently obtained by averaging over 40 runs. The exception to this is the single-run learning curve in panel 6C. There, subsequent to each learning trial, 100 non-learning trials were used for estimating mean performance.
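The smoothing and averaging of learning curves might look as follows in Python. This is a hedged sketch: the text does not spell out how the "time constant 0.1" enters the filter, so reading it as the per-trial smoothing factor `alpha` is an assumption, as are the function names and the zero initial value.

```python
def exponential_moving_average(values, alpha=0.1, initial=0.0):
    """Running average with avg <- avg + alpha * (value - avg) on every trial."""
    avg = initial
    smoothed = []
    for value in values:
        avg += alpha * (value - avg)
        smoothed.append(avg)
    return smoothed

def mean_curve(curves):
    """Trial-wise mean over several runs of equal length."""
    n_runs = len(curves)
    n_trials = len(curves[0])
    return [sum(curve[i] for curve in curves) / n_runs for i in range(n_trials)]

# Two hypothetical runs of trial-by-trial performance (1 = correct, 0 = incorrect).
runs = [[1, 1, 0, 1], [0, 1, 1, 1]]
learning_curve = mean_curve([exponential_moving_average(r) for r in runs])
```

In the setting described above, `runs` would hold 40 performance sequences, one per run.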

Initial weights for each run were picked independently from a Gaussian with mean and variance equal to 0.5. Euler’s method with a time step of 0.2 ms was used for numerically integrating the differential equations.
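As a small illustration of these two details, the following Python sketch draws initial weights and applies forward Euler steps of 0.2 ms to a simple test equation. The helper names, the number of weights, and the decay equation with its 10 ms time constant are assumptions made for the example; the differential equations of the model itself are not reproduced here. Note that a variance of 0.5 corresponds to a standard deviation of √0.5 ≈ 0.707.

```python
import math
import random

def initial_weights(n, rng, mean=0.5, variance=0.5):
    """Draw n independent Gaussian weights with the given mean and variance."""
    std = math.sqrt(variance)  # random.gauss expects a standard deviation
    return [rng.gauss(mean, std) for _ in range(n)]

def euler_integrate(deriv, x0, dt, n_steps):
    """Forward Euler: x <- x + dt * deriv(x), repeated n_steps times."""
    x = x0
    for _ in range(n_steps):
        x += dt * deriv(x)
    return x

DT_MS = 0.2     # time step from the text
TAU_MS = 10.0   # hypothetical time constant of the test equation

weights = initial_weights(1000, random.Random(1))

# Integrate dx/dt = -x / tau over one time constant (50 steps of 0.2 ms);
# the exact solution at that time is exp(-1).
x_end = euler_integrate(lambda x: -x / TAU_MS, 1.0, DT_MS, 50)
```

Comparing `x_end` with `math.exp(-1)` gives a rough feel for the discretization error of this step size on a time scale of about 10 ms.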

Competing interests

The authors declare that they have no competing interests.

Acknowledgements

This study was supported by the Swiss National Science Foundation (SNSF, sinergia grant CRSIKO 122697/1) and a grant of the Swiss SystemsX.ch initiative (Neurochoice, evaluated by the SNSF).

References

  1. Polsky A, Mel BW, Schiller J: Computational subunits in thin dendrites of pyramidal cells.

    Nat Neurosci 2004, 7(Jun):621-627.

  2. Maass W: Computation with spiking neurons. In The Handbook of Brain Theory and Neural Networks. Edited by Arbib MA. MIT Press, Cambridge; 2003:1080-1083.

  3. Poirazi P, Brannon T, Mel BW: Pyramidal neuron as two-layer neural network.

    Neuron 2003, 37(Mar):989-999.

  4. Nevian T, Larkum ME, Polsky A, Schiller J: Properties of basal dendrites of layer 5 pyramidal neurons: a direct patch-clamp recording study.

    Nat Neurosci 2007, 10(Feb):206-214.

  5. Zhou WL, Yan P, Wuskell JP, Loew LM, Antic SD: Dynamics of action potential backpropagation in basal dendrites of prefrontal cortical pyramidal neurons.

    Eur J Neurosci 2008, 27(Feb):923-936.

  6. Schiller J, Major G, Koester HJ, Schiller Y: NMDA spikes in basal dendrites of cortical pyramidal neurons.

    Nature 2000, 404(Mar):285-289.

  7. Schiller J, Schiller Y: NMDA receptor-mediated dendritic spikes and coincident signal amplification.

    Curr Opin Neurobiol 2001, 11(Jun):343-348.

  8. Major G, Polsky A, Denk W, Schiller J, Tank DW: Spatiotemporally graded NMDA spike/plateau potentials in basal dendrites of neocortical pyramidal neurons.

    J Neurophysiol 2008, 99(May):2584-2601.

  9. Larkum ME, Zhu JJ, Sakmann B: A new cellular mechanism for coupling inputs arriving at different cortical layers.

    Nature 1999, 398(Mar):338-341.

  10. Larkum ME, Nevian T, Sandler M, Polsky A, Schiller J: Synaptic integration in tuft dendrites of layer 5 pyramidal neurons: a new unifying principle.

    Science 2009, 325(Aug):756-760.

  11. Seung H: Learning in spiking neural networks by reinforcement of stochastic synaptic transmission.

    Neuron 2003, 40:1063-1073.

  12. Fremaux N, Sprekeler H, Gerstner W: Functional requirements for reward-modulated spike-timing-dependent plasticity.

    J Neurosci 2010, 30(Oct):13326-13337.

  13. Williams R: Simple statistical gradient-following algorithms for connectionist reinforcement learning.

    Mach Learn 1992, 8:229-256.

  14. Matsuda Y, Marzo A, Otani S: The presence of background dopamine signal converts long-term synaptic depression to potentiation in rat prefrontal cortex.

    J Neurosci 2006, 26:4803-4810.

  15. Seol G, Ziburkus J, Huang S, Song L, Kim I, Takamiya K, Huganir R, Lee H, Kirkwood A: Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity.

    Neuron 2007, 55:919-929.

    Erratum in: Neuron 56:754.

  16. Pawlak V, Kerr JN: Dopamine receptor activation is required for corticostriatal spike-timing-dependent plasticity.

    J Neurosci 2008, 28(Mar):2435-2446.

  17. Werfel J, Xie X, Seung HS: Learning curves for stochastic gradient descent in linear feedforward networks.

    Neural Comput 2005, 17:2699-2718.

  18. Urbanczik R, Senn W: Reinforcement learning in populations of spiking neurons.

    Nat Neurosci 2009, 12:250-252.

  19. Dayan P, Abbott L: Theoretical Neuroscience. MIT Press, Cambridge; 2001.

  20. Pfister J, Toyoizumi T, Barber D, Gerstner W: Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning.

    Neural Comput 2006, 18:1318-1348.

  21. Bertsekas DP, Tsitsiklis JN: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs; 1989.

  22. Baxter J, Bartlett P: Infinite-horizon policy-gradient estimation.

    J Artif Intell Res 2001, 15:319-350.

  23. Baxter J, Bartlett P, Weaver L: Experiments with infinite-horizon, policy-gradient estimation.

    J Artif Intell Res 2001, 15:351-381.

  24. Friedrich J, Urbanczik R, Senn W: Spatio-temporal credit assignment in neuronal population learning.

    PLoS Comput Biol 2011, 7(Jun).

  25. Clopath C, Büsing L, Vasilaki E, Gerstner W: Connectivity reflects coding: a model of voltage-based STDP with homeostasis.

    Nat Neurosci 2010, 13(Mar):344-352.

  26. Friedrich J, Urbanczik R, Senn W: Learning spike-based population codes by reward and population feedback.

    Neural Comput 2010, 22:1698-1717.