{"title": "SLAYER: Spike Layer Error Reassignment in Time", "book": "Advances in Neural Information Processing Systems", "page_first": 1412, "page_last": 1421, "abstract": "Configuring deep Spiking Neural Networks (SNNs) is an exciting research avenue for low power spike event based computation. However, the spike generation function is non-differentiable and therefore not directly compatible with the standard error backpropagation algorithm. In this paper, we introduce a new general backpropagation mechanism for learning synaptic weights and axonal delays which overcomes the problem of non-differentiability of the spike function and uses a temporal credit assignment policy for backpropagating error to preceding layers. We describe and release a GPU accelerated software implementation of our method which allows training both fully connected and convolutional neural network (CNN) architectures. Using our software, we compare our method against existing SNN based learning approaches and standard ANN to SNN conversion techniques and show that our method achieves state of the art performance for an SNN on the MNIST, NMNIST, DVS Gesture, and TIDIGITS datasets.", "full_text": "SLAYER: Spike Layer Error Reassignment in Time\n\nSumit Bam Shrestha\u2217\n\nTemasek Laboratories @ NUS\nNational University of Singapore\n\nSingapore, 117411\n\nGarrick Orchard\u2020\n\nTemasek Laboratories @ NUS\nNational University of Singapore\n\nSingapore, 117411\n\ntslsbs@nus.edu.sg\n\ntslgmo@nus.edu.sg\n\nAbstract\n\nCon\ufb01guring deep Spiking Neural Networks (SNNs) is an exciting research avenue\nfor low power spike event based computation. However, the spike generation\nfunction is non-differentiable and therefore not directly compatible with the stan-\ndard error backpropagation algorithm. 
In this paper, we introduce a new general\nbackpropagation mechanism for learning synaptic weights and axonal delays which\novercomes the problem of non-differentiability of the spike function and uses a\ntemporal credit assignment policy for backpropagating error to preceding layers.\nWe describe and release a GPU accelerated software implementation of our method\nwhich allows training both fully connected and convolutional neural network (CNN)\narchitectures. Using our software, we compare our method against existing SNN\nbased learning approaches and standard ANN to SNN conversion techniques and\nshow that our method achieves state of the art performance for an SNN on the\nMNIST, NMNIST, DVS Gesture, and TIDIGITS datasets.\n\n1 Introduction\n\nArti\ufb01cial Neural Networks (ANNs), especially Deep Neural Networks, have become the go-to tool for\nmany machine learning tasks. ANNs achieve state of the art performance in applications ranging from\nimage classi\ufb01cation and object recognition, to object tracking, signal processing, natural language\nprocessing, self driving cars, health care diagnostics, and many more. In the currently popular second\ngeneration of ANNs, backpropagation of the error signal to the neurons in the preceding layer is the key\nto their learning prowess.\nHowever, ANNs generally require powerful GPUs and computing clusters to crunch their inputs\ninto useful outputs. Therefore, in scenarios where power consumption is constrained, on-site use of\nANNs may not be a viable option. 
On the other hand, biologically inspired spiking neurons have\nlong shown great theoretical potential as ef\ufb01cient computational units [1\u20133] and recent advances in\nSpiking Neural Network (SNN) hardware [4\u20136] have renewed research interest in this area.\nSNNs are similar to ANNs in terms of network topology, but differ in the choice of neuron model.\nSpiking neurons have memory and use a non-differentiable spiking neuron model (spike function)\nwhile ANNs typically have no memory and model each neuron using a continuously differentiable\nactivation function. Since the spike function is non-differentiable, the backpropagation mechanism\nused to train ANNs cannot be directly applied.\nNevertheless, a handful of supervised learning algorithms for SNNs have been proposed previously.\nThe majority of them are designed for a single neuron [7\u20139], but a few have proposed methods to work\naround the non-differentiable spike function and backpropagate error through multiple layers [10\u201314].\n\n\u2217\n\u2020\n\nbam_sumit@hotmail.com\nwww.garrickorchard.com , garrickorchard@gmail.com\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fEvent based methods such as SpikeProp [10] and EvSpikeProp [11] have the derivative term de\ufb01ned\nonly around the \ufb01ring time, whereas [12\u201314] ignore the temporal effect of spike signal. In Section 3.1\nwe describe the strengths and weaknesses of these approaches in more detail.\nThe main contribution of this paper is a general method of error backpropagation for SNNs (Section 3)\nwhich we call Spike LAYer Error Reassignment (SLAYER). SLAYER distributes the credit of error\nback through the SNN layers, much like the traditional backprop algorithm distributes error back\nthrough an ANN\u2019s layers. 
However, unlike backprop, SLAYER also distributes the credit of error\nback in time because a spiking neuron\u2019s current state depends on its previous states (and therefore, on\nthe previous states of its input neurons). SLAYER can simultaneously learn both synaptic weights\nand axonal delays, which only a few previous works have attempted [15, 16].\nWe have developed and released3 a CUDA accelerated framework to train SNNs using SLAYER.\nWe demonstrate SLAYER achieving state of the art accuracy for an SNN on neuromorphic datasets\n(Section 4) for visual digit recognition, action recognition, and spoken digit recognition.\nThe rest of the paper is organized as follows. We start by introducing notation for a general model\nof a spiking neuron and extending it to a multi-layer SNN in Section 2. Then, in Section 3 we\ndiscuss previously published methods for learning SNN parameters before deriving the SLAYER\nbackpropagation formulae. In Section 4, we demonstrate the effectiveness of SLAYER on different\nbenchmark datasets before concluding in Section 5.\n\n2 Spiking Neural Network: Background\n\nAn SNN is a type of ANN that uses more biologically realistic spiking neurons as its computational\nunits. In this Section we introduce a model for a spiking neuron before extending the formulation to\na multi-layer network of spiking neurons (SNN).\n\n2.1 Spiking Neuron Model\n\nSpiking neurons obtain their name from the fact that they only communicate using voltage spikes.\nAll inputs and outputs to the neuron are in the form of spikes, but the neuron maintains an internal\nstate over time. In this paper, we will use a simple yet versatile spiking neuron model known as the\nSpike Response Model (SRM) [17], described below.\nConsider an input spike train to a neuron, si(t) = \u2211_f \u03b4(t \u2212 ti(f)). Here ti(f) is the time of the f th\nspike of the ith input. 
In SRM, the incoming spikes are converted into a spike response signal, ai(t),\nby convolving si(t) with a spike response kernel \u03b5(\u00b7). This can be written as ai(t) = (\u03b5 \u2217 si)(t).\nSimilarly, the refractory response of a neuron is represented as (\u03bd \u2217 s)(t), where \u03bd(\u00b7) is the refractory\nkernel and s(t) is the neuron\u2019s output spike train.\nEach spike response signal is scaled by a synaptic weight wi to generate a Post Synaptic Potential (PSP). The neuron\u2019s state (membrane potential), u(t), is simply the sum of all PSPs and refractory\nresponses\n\nu(t) = \u2211_i wi (\u03b5 \u2217 si)(t) + (\u03bd \u2217 s)(t) = w\u22a4a(t) + (\u03bd \u2217 s)(t).\n\n(1)\n\nAn output spike is generated whenever u(t) reaches a prede\ufb01ned threshold \u03d1. More formally, the\nspike function fs(\u00b7) is de\ufb01ned as\n\nfs(u) : u \u2192 s, s(t) := s(t) + \u03b4(t \u2212 t(f +1)) where t(f +1) = min{t : u(t) = \u03d1, t > t(f )}.\n\n(2)\n\nUnlike the activation functions used in non-spiking ANNs, the derivative of the spike function is\nunde\ufb01ned, which is a major obstacle for backpropagating error from output to input for SNNs. Also,\nnote that the effect of an input spike is distributed into the future via the spike response kernels, which is\nthe reason for the temporal dependency in the spiking neuron.\nThe above formulation can be extended to include axonal delays by rede\ufb01ning the spike response\nkernel as \u03b5d(t) = \u03b5(t \u2212 d), where d \u2265 0 is the axonal delay.4\n\n3 The code for the SLAYER learning framework is publicly available at: https://bitbucket.org/bamsumit/slayer\n\nA brief video description of this work is available at: https://www.youtube.com/watch?v=JGdatqqci5o\n\n4 Synaptic delay can also be modelled in a similar manner. Here, we only consider axonal delay for simplicity.\n\n\f2.2 SNN Model\n\nHere we describe a feedforward neural network architecture with nl layers. 
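As a concrete illustration of the SRM dynamics in (1) and (2), the following is a minimal discrete-time sketch (not the paper's released CUDA implementation); the kernel shapes follow the forms given in Section 4, but the time constants, threshold, and input sizes here are illustrative assumptions.

```python
import numpy as np

def srm_neuron(spikes_in, w, theta=1.0, tau_s=5.0, tau_r=5.0, Ts=1.0):
    """Minimal discrete-time Spike Response Model (SRM) neuron.

    spikes_in : (num_inputs, num_steps) binary input spike trains s_i[n].
    w         : (num_inputs,) synaptic weights.
    Returns the output spike train s[n] and the membrane potential u[n].
    """
    n_in, n_steps = spikes_in.shape
    t = np.arange(n_steps) * Ts
    eps = (t / tau_s) * np.exp(1 - t / tau_s)   # spike response kernel eps(t)
    nu = -2 * theta * np.exp(1 - t / tau_r)     # refractory kernel nu(t)
    # Spike response signals a_i[n] = (eps * s_i)[n], the PSP terms of (1).
    a = np.array([np.convolve(s, eps)[:n_steps] for s in spikes_in])
    s_out = np.zeros(n_steps)
    u = np.zeros(n_steps)
    for n in range(n_steps):
        # Refractory response uses only the spikes emitted so far (causal).
        refr = np.convolve(s_out, nu)[:n_steps][n]
        u[n] = w @ a[:, n] + refr
        if u[n] >= theta:                       # threshold crossing, as in (2)
            s_out[n] = 1.0
    return s_out, u
```

For example, a single strong input spike at t = 0 drives u(t) over \u03d1 a few steps later and produces one output spike, after which the refractory kernel suppresses further firing.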
This formulation applies\nto fully connected, convolutional, as well as pooling layers. For implementation details, refer to\nthe supplementary material. Consider a layer l with Nl neurons, weights W (l) = [w1, \u00b7\u00b7\u00b7 , wNl+1]\u22a4 \u2208 RNl+1\u00d7Nl and axonal delays d(l) \u2208 RNl. Then the network forward propagation is as described\nbelow.\n\na(l)(t) = (\u03b5d \u2217 s(l))(t)\n\n(3)\n\nu(l+1)(t) = W (l) a(l)(t) + (\u03bd \u2217 s(l+1))(t)\n\n(4)\n\ns(l+1)(t) = fs(u(l+1)(t))\n\n(5)\n\nAlso note that the inputs, s(0)(t) = sin(t), and outputs, sout(t) = s(nl)(t), are spike trains rather than\nnumeric values.\n\n3 Backpropagation in SNN\n\nIn this Section, we \ufb01rst discuss prior works on learning in SNNs before presenting the details of error\nbackpropagation using SLAYER.\n\n3.1 Existing Methods\n\nPrevious works which use learning to con\ufb01gure a deep SNN (multiple hidden layers) can be grouped\ninto three main categories. The \ufb01rst category uses an ANN to train an equivalent shadow network.\nThe other two categories train directly on the SNN but differ in how they approximate the derivative\nof the spike function.\nThe \ufb01rst category leverages learning methods for conventional ANNs by training an ANN and\nconverting it to an SNN [18\u201325] with some loss of accuracy. There are different approaches to\novercome the loss of accuracy such as introducing extra constraints on the neuron \ufb01ring rate [23],\nscaling the weights [23\u201325], constraining the network parameters [20], formulating an equivalent\ntransfer function for a spiking neuron [19\u201322], adding noise to the model [21, 22], using probabilistic\nweights [18], and so on.\nThe second category keeps track of the membrane potential of spiking neurons only at spike times\nand backpropagates errors based only on membrane potentials at spike times. Examples include\nSpikeProp [10] and its derivatives [11, 26]. 
These methods are prone to the \u201cdead neuron\u201d problem:\nwhen no neurons spike, no learning occurs. Heuristic measures are required to revive the network\nfrom such a condition.\nThe third category of methods backpropagates errors based on the membrane potential of a spiking\nneuron at a single time step only. Different methods are used to approximate the derivative of the\nspike function. Panda et al. [12] use an expression similar to that of a multi-layer perceptron system,\nLee et al. [13] use a small signal approximation at the spike times, and Zenke et al. [14] simply propose\na surrogate function to serve as the derivative. All these methods ignore the temporal dependency\nbetween spikes. They credit the error at a given time step to the input signals at that time step only,\nthus neglecting the effect of earlier spike inputs.\n\n3.2 Backpropagation using SLAYER\n\nIn this Section we describe the Loss Function (Section 3.2.1), how error is assigned to previous time-points (Section 3.2.2), and how the derivative of the spike function is approximated (Section 3.2.3).\n\n3.2.1 The Loss Function\n\nConsider a loss function for the network in time interval t \u2208 [0, T ], de\ufb01ned as\n\nE := \u222b_0^T L(s(nl)(t), \u02c6s(t)) dt = \u222b_0^T (1/2) (e(nl)(s(nl)(t), \u02c6s(t)))^2 dt\n\n(6)\n\nwhere \u02c6s(t) is the target spike train, L(s(nl)(t), \u02c6s(t)) is the loss at time instance t and\ne(nl)(s(nl)(t), \u02c6s(t)) is the error signal at the \ufb01nal layer. For brevity we will write the error signal as e(nl)(t) from here on.\nTo learn a target spike train \u02c6s(t) an error signal of the form\n\ne(nl)(t) := \u03b5 \u2217 (s(nl)(t) \u2212 \u02c6s(t)) = a(nl)(t) \u2212 \u02c6a(t)\n\n(7)\n\nis a suitable choice. 
This loss function is similar to the van-Rossum distance [27].\nFor classi\ufb01cation tasks, a decision is typically made based on the number of output spikes during an\ninterval rather than the precise timing of the spikes. To handle such cases, the error signal during the\ninterval can be de\ufb01ned as\n\ne(nl)(t) := ( \u222b_{Tint} s(nl)(\u03c4 ) d\u03c4 \u2212 \u222b_{Tint} \u02c6s(\u03c4 ) d\u03c4 ),  t \u2208 Tint\n\n(8)\n\nand zero outside the interval Tint. Here we only need to de\ufb01ne the number of desired spikes during\nthe interval (the second integral term). The actual spike train \u02c6s(t) need not be de\ufb01ned.\n\n3.2.2 Temporal Dependencies to History\n\nIn the mapping from input spikes, s(l)(t), to membrane potential, u(l+1)(t), temporal dependencies\nare introduced due to the spike response kernel \u03b5(\u00b7), which distributes the effect of input spikes into\nfuture time values, i.e. the signal u(l+1)(t1) is dependent on current as well as past values of the inputs\ns(l)(t), t \u2264 t1. Step based learning approaches [12\u201314] ignore this temporal dependency and only\nuse signal values at the current time instance. Below we describe how SLAYER accounts for this\ntemporal dependency. Full details of the derivation are provided in the supplementary material.\nLet us, for the time being, discretize the system with a sampling time Ts such that t = n Ts, n \u2208 Z\nand use Ns to denote the total number of samples in the period t \u2208 [0, T ]. The signal values a(l)[n]\nand u(l)[n] have a contribution to future network losses at samples m = n, n + 1, \u00b7\u00b7\u00b7 , Ns. 
Taking\ninto account the temporal dependency, the gradient term is given by\n\n\u2207wi(l) E := Ts \u2211_{n=0}^{Ns} \u2202L[n]/\u2202wi(l) = Ts \u2211_{m=0}^{Ns} a(l)[m] \u2211_{n=m}^{Ns} \u2202L[n]/\u2202ui(l+1)[m].\n\n(9)\n\nThe backpropagation estimate of error in layer l is then\n\ne(l)[n] := \u2211_{m=n}^{Ns} \u2202L[m]/\u2202a(l)[n] = (W (l))\u22a4 \u03b4(l+1)[n]\n\n(10)\n\n\u03b4(l)[n] := \u2211_{m=n}^{Ns} \u2202L[m]/\u2202u(l)[n] = f\u2032s(u(l)[n]) \u00b7 (\u03b5d \u2299 e(l))[n].\n\n(11)\n\nHere \u2299 represents the element-wise correlation operation in time. The summation from n to Ns assigns\nthe credit of all the network losses at future times to the neuron at the current time. Note that at the\noutput layer, \u2202L[m]/\u2202a(nl)[n] = 0 for n \u2260 m, which results in e(nl)[n] = \u2202L[n]/\u2202a(nl)[n]. This is in\nagreement with the de\ufb01nition of output layer error in (6).\nSimilarly, for axonal delay with \u02d9a(l) = (\u02d9\u03b5d \u2217 s(l)), one can derive the delay gradient as follows.\n\n\u2207d(l)E := Ts \u2211_{n=0}^{Ns} \u2202L[n]/\u2202d(l) = \u2212Ts \u2211_{m=0}^{Ns} \u02d9a(l)[m] \u00b7 e(l)[m]\n\n(12)\n\n3.2.3 The Derivative of the Spike Function\n\nThe derivative of the spike function is always a problem for supervised learning in a multilayer SNN.\nIn Section 3.1, we discussed how prior works handle the derivative. Below we describe how SLAYER\ndeals with the spike function derivative.\nConsider the state of a spiking neuron at time t = \u03c4. The neuron can either be in the spiking state\n(u(\u03c4 ) \u2265 \u03d1) or the non-spiking state (u(\u03c4 ) < \u03d1). Now consider a perturbation in the membrane potential\nby an amount \u2206u(\u03c4 ) = \u00b1\u2206\u03b6 for \u2206\u03b6 > 0.\nA neuron in the non-spiking state will switch to the spiking state when perturbed by +\u2206\u03b6 if\n+\u2206\u03b6 \u2265 \u03d1 \u2212 u(\u03c4 ). 
Similarly, a neuron in the spiking state will switch to the non-spiking state when\nperturbed by \u2212\u2206\u03b6 if \u2212\u2206\u03b6 < \u03d1 \u2212 u(\u03c4 ). In both cases, there is a change in the spiking state of\nthe neuron when \u2206\u03b6 > |u(\u03c4 ) \u2212 \u03d1|. Fig. 1(a) shows these transitions. Therefore,\n\n\u2206s(\u03c4 )/\u2206u(\u03c4 ) = { \u03b4(t \u2212 \u03c4 )/\u2206\u03b6 when \u2206\u03b6 > |u(\u03c4 ) \u2212 \u03d1| ; 0 otherwise }.\n\n(13)\n\nFigure 1: (a) Transition of a spiking neuron\u2019s state due to random perturbation \u2206\u03b6. (b) Probability\ndensity function of spike state change.\n\nThis formulation is still problematic because of the Dirac-delta function. However, we can see that the\nderivative term is biased towards zero as |u(\u03c4 ) \u2212 \u03d1| increases. A good estimate of the derivative term\nf\u2032s(\u00b7) can be made using the probability of a change in spiking state.\nIf we denote the probability density function as \u03c1(t), then the probability of a spiking state change in\nan in\ufb01nitesimal time window of width \u2206t around \u03c4 and a small perturbation \u2206\u03b6 \u2192 0 as \u2206u \u2192 0\ncan be written as \u03c1(\u03c4 ) \u2206\u03b6 \u2206t. Now, the expected value of f\u2032s(\u03c4 ) can be written as\n\nE[f\u2032s(\u03c4 )] = lim_{\u2206\u03b6\u21920, \u2206t\u21920} ( \u03c1(\u03c4 ) \u2206\u03b6 \u2206t \u00b7 1/(\u2206t \u2206\u03b6) + (1 \u2212 \u03c1(\u03c4 ) \u2206\u03b6 \u2206t) \u00d7 0 ) = \u03c1(\u03c4 ).\n\n(14)\n\nThe derivative of the spike function represents the Probability Density Function (PDF) for change of state\nof a spiking neuron. For a completely deterministic spiking neuron model, it is a sum of impulses at\nspike times, which is equivalent to the spike train s(t). 
Nevertheless, we can relax the deterministic\nnature of the spiking neuron and use the stochastic spiking neuron approximation for backpropagating\nerrors.\nThe function \u03c1(t) = \u03c1(u(t) \u2212 \u03d1) must be high when u(\u03c4 ) is close to \u03d1 and must decrease as it moves\nfurther away. An example PDF is shown in Figure 1(b). A good formulation of this function is the\nspike escape rate function [28, 29] \u03c1(t), which is usually represented by an exponentially decaying\nfunction of \u03d1 \u2212 u(\u03c4 )\n\n\u03c1(t) = \u03b1 exp(\u2212\u03b2 |u(t) \u2212 \u03d1|).\n\n(15)\n\nThe negative portion of the fast sigmoid function, used in [14], is also a suitable candidate for \u03c1(u(t) \u2212 \u03d1).\n\n3.2.4 The SLAYER Backpropagation Pipeline\n\nNow, applying the limit Ts \u2192 0 for (9), (10) and (11) and using the expectation value of f\u2032s(t), we\narrive at the SLAYER backpropagation pipeline.\n\ne(l)(t) = \u2202L(t)/\u2202a(nl)(t) if l = nl;  e(l)(t) = (W (l))\u22a4 \u03b4(l+1)(t) otherwise\n\n(16)\n\n\u03b4(l)(t) = \u03c1(l)(t) \u00b7 (\u03b5d \u2299 e(l))(t)\n\n(17)\n\n\u2207W (l)E = \u222b_0^T \u03b4(l+1)(t) (a(l)(t))\u22a4 dt\n\n(18)\n\n\u2207d(l)E = \u2212\u222b_0^T \u02d9a(l)(t) \u00b7 e(l)(t) dt\n\n(19)\n\nThe gradients with respect to weights and delays are given by (18) and (19). It is straightforward to\nuse any of the optimization techniques, from the simple gradient descent method to adaptive methods\nsuch as RMSProp, ADAM, and NADAM, to drive the network towards convergence.\n\n4 Experiments and Results\n\nIn this Section we present the different experiments conducted and their results to evaluate the\nperformance of SLAYER. 
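As an illustration, the surrogate derivative (15) and its use in the backpropagated delta (17) can be sketched as follows; the values of \u03b1 and \u03b2 and the helper names here are illustrative assumptions, not the constants or code of the released framework.

```python
import numpy as np

def spike_escape_rate(u, theta=1.0, alpha=1.0, beta=5.0):
    """Surrogate spike-function derivative, eq. (15):
    rho(t) = alpha * exp(-beta * |u(t) - theta|).
    Largest when the membrane potential u is at the threshold theta,
    decaying as u moves away, as in Figure 1(b)."""
    return alpha * np.exp(-beta * np.abs(u - theta))

def kernel_correlate(e, eps):
    """Element-wise correlation in time: (eps (.) e)[n] = sum_k eps[k] * e[n + k]."""
    T = e.shape[-1]
    out = np.zeros_like(e)
    for k, ek in enumerate(eps):
        out[..., :T - k] += ek * e[..., k:]
    return out

def backprop_delta(u_l, W_l, delta_next, eps, theta=1.0, alpha=1.0, beta=5.0):
    """One hidden-layer step of (16)-(17).

    u_l        : (N_l, T) membrane potentials of layer l.
    W_l        : (N_{l+1}, N_l) weight matrix.
    delta_next : (N_{l+1}, T) delta signal of layer l+1.
    eps        : sampled spike response kernel eps_d.
    """
    e_l = W_l.T @ delta_next                   # eq. (16), hidden-layer case
    rho = spike_escape_rate(u_l, theta, alpha, beta)
    return rho * kernel_correlate(e_l, eps)    # eq. (17)
```

Note that the correlation looks forward in time, which is exactly the temporal credit assignment: error at a future step flows back to earlier membrane potentials.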
First, we train an SNN to produce a \ufb01xed Poisson spike train pattern in\nresponse to a given set of Poisson spike inputs. We use this simple example to show how SLAYER\nworks. Afterwards we present results of classi\ufb01cation tasks performed on both spiking datasets and\nnon-spiking datasets converted to spikes.\nSimulating an SNN is a time consuming process due to the additional temporal dimension of signals.\nAn ef\ufb01cient simulation framework is key to enabling training on practical spiking datasets. We use\nour CUDA accelerated SNN deep learning framework for SLAYER to perform all the simulations for\nwhich results are presented in this paper. All the accuracy values reported for SLAYER are averaged\nover 5 different independent trials. In our experiments, we use spike response kernels of the form\n\u03b5(t) = t/\u03c4s exp(1 \u2212 t/\u03c4s)\u0398(t) and \u03bd(t) = \u22122\u03d1 exp(1 \u2212 t/\u03c4r)\u0398(t). Here, \u0398(t) is the Heaviside step\nfunction. SLAYER, however, is independent of the choice of the kernels.\nThroughout this paper, we will use the following notation to indicate the SNN architecture. Layers\nare separated by - and spatial dimensions are separated by x. A convolution layer is represented by c\nand an aggregation layer is represented by a. For example, 34x34x2-8c5-2a-5o represents a 4 layer\nSNN with 34\u00d734\u00d72 input, followed by 8 convolution \ufb01lters (5\u00d75), followed by a 2\u00d72 aggregation\nlayer, and \ufb01nally a dense layer connected to 5 output neurons.\n\n4.1 Poisson Spike Train\n\nThis is a simple experiment to help understand the learning process in SLAYER. A Poisson spike\ntrain was generated for 250 different inputs over an interval of 50 ms. Similarly, a target spike train\nwas generated using a Poisson distribution. 
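Spike trains like these can be generated with a simple Bernoulli approximation of a Poisson process; the firing rate below is an illustrative assumption, as the paper does not state the rate used for this experiment.

```python
import numpy as np

def poisson_spikes(num_neurons, num_steps, rate_hz=50.0, ts_ms=1.0, seed=0):
    """Bernoulli approximation of Poisson spike trains.

    Each ts_ms-wide bin spikes independently with probability
    rate_hz * ts_ms / 1000 (valid for small bin widths).
    """
    rng = np.random.default_rng(seed)
    p = rate_hz * ts_ms / 1000.0
    return (rng.random((num_neurons, num_steps)) < p).astype(float)

# 250 Poisson inputs and one Poisson target over a 50 ms interval at 1 ms resolution.
inputs = poisson_spikes(250, 50)
target = poisson_spikes(1, 50, seed=1)
```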
The task is to learn to \ufb01re the desired spike train for the\nrandom spike inputs using an SNN with 25 hidden neurons.\n\nFigure 2: (a) Spike Raster plot during Poisson spike train learning. (b) Snapshot of SLAYER\nbackpropagated learning signals at the 20th learning epoch.\n\nThe learning plots are shown in Figure 2. From the learning spike raster, we can see that initially\nthere are output spikes distributed at random times (Figure 2(a) bottom). As learning progresses,\nthe unwanted spikes are suppressed and the spikes near the desired spike train are reinforced. The\nlearning \ufb01nally converges to the desired spike train at the 739th epoch. The learning snapshot at\nepoch 20 (Figure 2(b)) shows how the error signal is constructed. The spike raster for the input, hidden,\nand output layers is shown at the bottom. The blue plots show the respective signals for the output layer.\nThe conversion from error signal, e, to delta signal, \u03b4, shows that the error credit assigned depends on\nthe membrane potential value u. Note the temporal credit assignment of error. A nonzero value of e\nresults in non-zero values of \u03b4 at earlier points in time, even if the error signal, e, was zero at those\ntimes. Similar observations can be made for the hidden layer signals. Out of 25 hidden layer signals, one\nis highlighted in black and the rest are shown faded.\n\nTable 1: Benchmark Classi\ufb01cation Results\n\nDataset | Method | Architecture | Accuracy\nMNIST | Rueckauer et al. [25] | SNN converted from standard ANN | 99.44%\nMNIST | Lee et al. [13] | 28x28-800-10 | 99.31%\nMNIST | SLAYER | 28x28-12c5-2a-64c5-2a-10o | 99.36 \u00b1 0.05%\nNMNIST | Lee et al. [13] | 34x34x2-800-10 | 98.66%\nNMNIST | SKIM [30] | 34x34x2-10000-10 | 92.87%\nNMNIST | DART [31] | DART feature descriptor | 97.95%\nNMNIST | SLAYER | 34x34x2-500-500-10 | 98.89 \u00b1 0.06%\nNMNIST | SLAYER | 34x34x2-12c5-2a-64c5-2a-10o | 99.20 \u00b1 0.02%\nDVS Gesture | TrueNorth [32] | SNN (16 layers) | 91.77% (94.59%)\nDVS Gesture | SLAYER | SNN (8 layers) | 93.64 \u00b1 0.49%\nTIDIGITS | SOM-SNN [33] | MFCC-SOM-SNN | 97.6%\nTIDIGITS | Tavanaei et al. [34] | Spiking CNN and HMM | 96.00%\nTIDIGITS | SLAYER | MFCC-SOM, 484-500-500-11 | 99.09 \u00b1 0.13%\n\n4.2 MNIST Digit Classi\ufb01cation\n\nMNIST is a popular machine learning dataset. The task is to classify an image containing a single\ndigit. This dataset is a standard benchmark to test the performance of a learning algorithm. Since\nSLAYER is a spike based learning algorithm, the images are converted into spike trains spanning\n25 ms using the Generalized Integrate and Fire neuron model [35]. The standard split of 60,000 training\nsamples and 10,000 testing samples was used with no data augmentation. For classi\ufb01cation, we use\nthe spike counting strategy. During training, we specify a target of 20 spikes for the true neuron and 5\nspikes for each false neuron over the 25 ms period. During testing, the output class is the class which\ngenerates the highest spike count.\nThe classi\ufb01cation accuracy of SLAYER for MNIST classi\ufb01cation is listed in Table 1 along with other\nSNN based approaches. We achieve a testing accuracy of 99.36%, which is the best\nresult for completely SNN based learning. Although this accuracy does not fare well against state of the\nart deep learning methods, for an SNN based approach it is a commendable result.\n\n4.3 NMNIST Digit Classi\ufb01cation\n\nThe NMNIST dataset [36] consists of MNIST images converted into a spiking dataset using a\nDynamic Vision Sensor (DVS) moving on a pan-tilt unit. Each dataset sample is 300 ms long and\n34\u00d734 pixels in size, containing both \u2018on\u2019 and \u2018off\u2019 spikes. 
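The spike counting strategy used for classification in these experiments can be sketched as follows; this is a minimal illustration, and the helper names are ours, not from the released framework.

```python
import numpy as np

def predict_class(s_out):
    """Spike-count readout: given (num_classes, num_steps) output spike trains,
    the predicted class is the neuron with the highest spike count."""
    return int(np.argmax(s_out.sum(axis=1)))

def target_spike_counts(true_class, num_classes=10, true_count=20, false_count=5):
    """Desired per-class spike counts for the count-based error (8):
    e.g. for MNIST, 20 spikes for the true class and 5 for each false class."""
    counts = np.full(num_classes, false_count)
    counts[true_class] = true_count
    return counts
```

During training, the count-based error (8) pulls each output neuron's spike count towards these targets; during testing, only the argmax readout is used.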
This dataset is harder than MNIST because\none has to deal with saccadic motion. For NMNIST training, we use a target of 10 spikes for each\nfalse class neuron and 60 spikes for the true class neuron. The output class is the one with the greatest\nspike count. The training and testing separation is the same as the standard MNIST split of 60,000\ntraining samples and 10,000 testing samples. The NMNIST data was not stabilized before feeding it to\nthe network.\nThe results on NMNIST classi\ufb01cation listed in Table 1 show that SLAYER learning surpasses the\ncurrently reported state of the art result on the NMNIST dataset by Lee et al. [13] with a comparable\nnumber of neurons. However, the CNN architecture trained with SLAYER achieves the best result.\n\n4.4 DVS Gesture Classi\ufb01cation\n\nThe DVS Gesture [32] dataset consists of recordings of 29 different individuals performing 10\ndifferent actions such as clapping, hand waving, etc. The actions are recorded using a DVS camera\nunder three different lighting conditions. The problem is to classify the action sequence video into an\naction label. The dataset allows us to test SLAYER on a temporal task. For training we set a target\nspike count of 30 for false class neurons and 180 for the true class neuron. Samples from the \ufb01rst 23\nsubjects were used for training and the last 6 subjects were used for testing.\nThe results for DVS Gesture classi\ufb01cation are listed in Table 1. SLAYER achieves a very good testing\naccuracy of 93.64% on average. In SLAYER training as well as testing, only the \ufb01rst 1.5 s out of \u2248 6 s\nof action video for each class were used to classify the actions. For speed reasons, the SNN was\nsimulated with a temporal resolution of 5 ms. Despite these shortcomings, the accuracy results are\nexcellent, surpassing the testing accuracy of TrueNorth trained with EEDN [32]. With output \ufb01ltering,\nthe TrueNorth accuracy can be increased to 94.59%. 
Nevertheless, SLAYER is able to classify with\nsigni\ufb01cantly fewer neurons and layers. The TrueNorth approach uses additional neurons\nbefore the CNN classi\ufb01er for pre-processing, whereas in SLAYER, the spike data from the DVS is\ndirectly fed into the classi\ufb01er.\n\n4.5 TIDIGITS Classi\ufb01cation\n\nTIDIGITS [37] is an audio classi\ufb01cation dataset containing audio signals corresponding to digit\nutterances from \u2018zero\u2019 to \u2018nine\u2019 and \u2018oh\u2019. In this paper, we use audio data converted to spikes using\nthe MFCC transform followed by a Self Organizing Map (SOM) as described in [33]. For training, we\nspecify a target of 5 spikes for false classes and 20 spikes for the true class. The dataset was split into\n3950 training samples and 1000 testing samples.\nThe results for TIDIGITS classi\ufb01cation are listed in Table 1. SLAYER signi\ufb01cantly improves upon\nthe testing accuracy of the SNN based approach using SOM-SNN [33] on the same encoded spike\ninputs. However, the best reported accuracy for TIDIGITS classi\ufb01cation, 99.7% [38], uses an MFCC\nand HMM-GMM approach (non spiking). The accuracy of SLAYER, however, is still competitive at\n99.09%.\n\n5 Discussion\n\nWe have proposed a new error backpropagation method for SNNs which properly considers the temporal\ndependency between the input and output signals of a spiking neuron, handles the non-differentiable\nnature of the spike function, and is not prone to the dead neuron problem. The result is SLAYER,\na learning algorithm for learning both weight and axonal delay parameters in an SNN. We have\ndemonstrated SLAYER\u2019s effectiveness in achieving state of the art accuracy for an SNN on spoken\ndigit and visual digit recognition as well as visual action recognition.\nDuring training, we require both true and false neurons to \ufb01re, but specify a much higher spike count\ntarget for the true class neuron. 
This approach prevents neurons from going dormant, and they easily\nlearn to \ufb01re more frequently again when required. The desired spike count was chosen to be roughly\nproportional to the simulation interval.\nWith a proper scaling factor for the surrogate function (15), vanishing or exploding gradients are not an\nissue in SLAYER for any of the networks we have trained thus far, and we believe that SLAYER can\nbe used on even deeper networks.\nTemporal error credit assignment and axonal delay learning do increase the computational\nrequirements, respectively comprising 8.03% and 2.55% of the computation time for the training of the\nfully connected NMNIST network; this computational overhead is not signi\ufb01cant.\nWe believe that SLAYER is an important contribution towards efforts to implement backpropagation in\nan SNN. The development of a CUDA accelerated learning framework for SLAYER was instrumental\nin tackling bigger datasets in the SNN domain, although they are still not big when compared to the huge\ndatasets tackled by conventional (non-spiking) deep learning.\nNeuromorphic hardware such as TrueNorth [4], SpiNNaker [5], and Intel Loihi [6] shows the potential of\nimplementing large spiking neural networks in an extremely low power chip. These chips usually\ndo not have a learning mechanism, or have a primitive learning mechanism built into them. Learning\nmust typically be done of\ufb02ine. SLAYER has good potential to serve as an of\ufb02ine training system to\ncon\ufb01gure a network before deploying it to a chip.\n\n\fReferences\n\n[1] Wolfgang Maass. Lower bounds for the computational power of networks of spiking neurons. Neural Computation, 8(1):1\u201340, January 1996.\n\n[2] Wolfgang Maass. Noisy spiking neurons with temporal coding have more computational power than\nsigmoidal neurons. In Michael Mozer, Michael I. 
Jordan, and Thomas Petsche, editors, Advances in Neural\nInformation Processing Systems 9, NIPS, Denver, CO, USA, December 2-5, 1996, pages 211\u2013217. MIT\nPress, 1996.\n\n[3] Wolfgang Maass and Michael Schmitt. On the complexity of learning for a spiking neuron. In Proceedings\n\nof the tenth annual conference on Computational learning theory, pages 54\u201361. ACM, 1997.\n\n[4] Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan,\nBryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, Bernard Brezzo, Ivan Vo, Steven K. Esser,\nRathinakumar Appuswamy, Brian Taba, Arnon Amir, Myron D. Flickner, William P. Risk, Rajit Manohar,\nand Dharmendra S. Modha. A million spiking-neuron integrated circuit with a scalable communication\nnetwork and interface. Science, 345(6197):668\u2013673, 2014.\n\n[5] Steve B. Furber, Francesco Galluppi, Steve Temple, and Luis A. Plana. The spinnaker project. Proceedings\n\nof the IEEE, 102(5):652\u2013665, 2014.\n\n[6] Mike Davies, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday,\nGeorgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, et al. Loihi: A neuromorphic manycore processor\nwith on-chip learning. IEEE Micro, 38(1):82\u201399, 2018.\n\n[7] Filip Ponulak and Andrzej Kasinski. Supervised learning in spiking neural networks with ReSuMe:\nSequence learning, classi\ufb01cation, and spike shifting. Neural Computation, 22(2):467\u2013510, October 2009.\n[8] Ammar Mohemmed, Stefan Schliebs, Satoshi Matsuda, and Nikola Kasabov. Span: Spike pattern\nassociation neuron for learning spatio-temporal spike patterns. International Journal of Neural Systems,\n22(04):1250012, 2012. PMID: 22830962.\n\n[9] Robert G\u00fctig and Haim Sompolinsky. The tempotron: a neuron that learns spike timing\u2013based decisions.\n\nNature neuroscience, 9(3):420\u2013428, 2006.\n\n[10] Sander M. Bohte, Joost N. Kok, and Han La Poutre. 
Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing, 48(1):17–37, 2002.

[11] Sumit Bam Shrestha and Qing Song. Robust spike-train learning in spike-event based weight update. Neural Networks, 96:33–46, 2017.

[12] Priyadarshini Panda and Kaushik Roy. Unsupervised regenerative learning of hierarchical features in spiking deep networks for object recognition. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 299–306, July 2016.

[13] Jun Haeng Lee, Tobi Delbruck, and Michael Pfeiffer. Training deep spiking neural networks using backpropagation. Frontiers in Neuroscience, 10:508, 2016.

[14] Friedemann Zenke and Surya Ganguli. Superspike: Supervised learning in multi-layer spiking neural networks. arXiv preprint arXiv:1705.11146, 2017.

[15] Benjamin Schrauwen and Jan Van Campenhout. Extending SpikeProp. In 2004 IEEE International Joint Conference on Neural Networks (IJCNN), volume 1. IEEE, 2004.

[16] Aboozar Taherkhani, Ammar Belatreche, Yuhua Li, and Liam P. Maguire. DL-ReSuMe: a delay learning-based remote supervised method for spiking neurons. IEEE Transactions on Neural Networks and Learning Systems, 26(12):3137–3149, 2015.

[17] Wulfram Gerstner. Time structure of the activity in neural network models. Phys. Rev. E, 51:738–758, Jan 1995.

[18] Steve K. Esser, Rathinakumar Appuswamy, Paul Merolla, John V. Arthur, and Dharmendra S. Modha. Backpropagation for energy-efficient neuromorphic computing. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 1117–1125. Curran Associates, Inc., 2015.

[19] Eric Hunsberger and Chris Eliasmith. Spiking deep networks with LIF neurons. CoRR, abs/1510.08829, 2015.

[20] Steven K. Esser, Paul A. Merolla, John V. Arthur, Andrew S.
Cassidy, Rathinakumar Appuswamy, Alexander Andreopoulos, David J. Berg, Jeffrey L. McKinstry, Timothy Melano, Davis R. Barch, Carmelo di Nolfo, Pallab Datta, Arnon Amir, Brian Taba, Myron D. Flickner, and Dharmendra S. Modha. Convolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National Academy of Sciences, 113(41):11441–11446, 2016.

[21] Peter O'Connor, Daniel Neil, Shih-Chii Liu, Tobi Delbruck, and Michael Pfeiffer. Real-time classification and sensor fusion with a spiking deep belief network. Frontiers in Neuroscience, 7:178, 2013.

[22] Qian Liu, Yunhua Chen, and Steve B. Furber. Noisy softplus: an activation function that enables SNNs to be trained as ANNs. CoRR, abs/1706.03609, 2017.

[23] Peter U. Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2015.

[24] Peter U. Diehl, Bruno U. Pedroni, Andrew S. Cassidy, Paul Merolla, Emre Neftci, and Guido Zarrella. TrueHappiness: Neuromorphic emotion recognition on TrueNorth. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 4278–4285, July 2016.

[25] Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, Michael Pfeiffer, and Shih-Chii Liu. Conversion of continuous-valued deep networks to efficient event-driven networks for image classification. Frontiers in Neuroscience, 11:682, 2017.

[26] Sam McKennoch, Dingding Liu, and Linda G. Bushnell. Fast modifications of the SpikeProp algorithm. In 2006 IEEE International Joint Conference on Neural Networks (IJCNN'06), pages 3970–3977. IEEE, 2006.

[27] Justin Dauwels, François Vialatte, Theophane Weber, and Andrzej Cichocki. On similarity measures for spike trains.
In Advances in Neuro-Information Processing, pages 177–185. Springer, 2008.

[28] Wulfram Gerstner and Werner M. Kistler. Spiking neuron models: Single neurons, populations, plasticity. Cambridge University Press, 2002.

[29] Renaud Jolivet, Timothy J. Lewis, and Wulfram Gerstner. The spike response model: a framework to predict neuronal spike trains. In Artificial Neural Networks and Neural Information Processing – ICANN/ICONIP 2003, pages 846–853. Springer, 2003.

[30] Gregory K. Cohen, Garrick Orchard, Sio-Hoi Ieng, Jonathan Tapson, Ryad B. Benosman, and Andre van Schaik. Skimming digits: Neuromorphic classification of spike-encoded images. Frontiers in Neuroscience, 10:184, 2016.

[31] Bharath Ramesh, Hong Yang, Garrick Orchard, Ngoc Anh Le Thi, and Cheng Xiang. DART: distribution aware retinal transform for event-based cameras. CoRR, abs/1710.10800, 2017.

[32] Arnon Amir, Brian Taba, David Berg, Timothy Melano, Jeffrey McKinstry, Carmelo di Nolfo, Tapan Nayak, Alexander Andreopoulos, Guillaume Garreau, Marcela Mendoza, Jeff Kusnitz, Michael Debole, Steve Esser, Tobi Delbruck, Myron Flickner, and Dharmendra Modha. A low power, fully event-based gesture recognition system. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[33] Jibin Wu, Yansong Chua, and Haizhou Li. A biologically plausible speech recognition framework based on spiking neural networks. In International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2018 (Accepted).

[34] Amirhossein Tavanaei and Anthony Maida. Bio-inspired multi-layer spiking neural network extracts discriminative features from speech signals. In Derong Liu, Shengli Xie, Yuanqing Li, Dongbin Zhao, and El-Sayed M. El-Alfy, editors, Neural Information Processing, pages 899–908, Cham, 2017. Springer International Publishing.

[35] Jonathan W. Pillow, Liam Paninski, Valerie J. Uzzell, Eero P. Simoncelli, and E. J. Chichilnisky.
Prediction and decoding of retinal ganglion cell responses with a probabilistic spiking model. The Journal of Neuroscience, 25(47):11003–11013, 2005.

[36] Garrick Orchard, Ajinkya Jayawant, Gregory K. Cohen, and Nitish Thakor. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9:437, 2015.

[37] R. Gary Leonard and George Doddington. TIDIGITS speech corpus. Texas Instruments, Inc., 1993.

[38] M. Abdollahi and S. C. Liu. Speaker-independent isolated digit recognition using an AER silicon cochlea. In 2011 IEEE Biomedical Circuits and Systems Conference (BioCAS), pages 269–272, Nov 2011.