{"title": "Long short-term memory and Learning-to-learn in networks of spiking neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 787, "page_last": 797, "abstract": "Recurrent networks of spiking neurons (RSNNs) underlie the astounding computing and learning capabilities of the brain. But computing and learning capabilities of RSNN models have remained poor, at least in comparison with ANNs. We address two possible reasons for that. One is that RSNNs in the brain are not randomly connected or designed according to simple rules, and they do not start learning as a tabula rasa network. Rather, RSNNs in the brain were optimized for their tasks through evolution, development, and prior experience. Details of these optimization processes are largely unknown. But their functional contribution can be approximated through powerful optimization methods, such as backpropagation through time (BPTT). \n\nA second major mismatch between RSNNs in the brain and models is that the latter only show a small fraction of the dynamics of neurons and synapses in the brain. We include neurons in our RSNN model that reproduce one prominent dynamical process of biological neurons that takes place at the behaviourally relevant time scale of seconds: neuronal adaptation. We denote these networks as LSNNs because of their Long short-term memory. The inclusion of adapting neurons drastically increases the computing and learning capability of RSNNs if they are trained and configured by deep learning (BPTT combined with a rewiring algorithm that optimizes the network architecture). In fact, the computational performance of these RSNNs approaches for the first time that of LSTM networks. In addition RSNNs with adapting neurons can acquire abstract knowledge from prior learning in a Learning-to-Learn (L2L) scheme, and transfer that knowledge in order to learn new but related tasks from very few examples. 
We demonstrate this for supervised learning and reinforcement learning.", "full_text": "Long short-term memory and learning-to-learn in\n\nnetworks of spiking neurons\n\nGuillaume Bellec*, Darjan Salaj*, Anand Subramoney*, Robert Legenstein & Wolfgang Maass\n\nInstitute for Theoretical Computer Science\n\n{bellec,salaj,subramoney,legenstein,maass}@igi.tugraz.at\n\nGraz University of Technology, Austria\n\n* equal contributions\n\nAbstract\n\nRecurrent networks of spiking neurons (RSNNs) underlie the astounding comput-\ning and learning capabilities of the brain. But computing and learning capabilities\nof RSNN models have remained poor, at least in comparison with arti\ufb01cial neural\nnetworks (ANNs). We address two possible reasons for that. One is that RSNNs\nin the brain are not randomly connected or designed according to simple rules,\nand they do not start learning as a tabula rasa network. Rather, RSNNs in the\nbrain were optimized for their tasks through evolution, development, and prior\nexperience. Details of these optimization processes are largely unknown. But\ntheir functional contribution can be approximated through powerful optimization\nmethods, such as backpropagation through time (BPTT).\nA second major mismatch between RSNNs in the brain and models is that the\nlatter only show a small fraction of the dynamics of neurons and synapses in\nthe brain. We include neurons in our RSNN model that reproduce one promi-\nnent dynamical process of biological neurons that takes place at the behaviourally\nrelevant time scale of seconds: neuronal adaptation. We denote these networks\nas LSNNs because of their Long short-term memory. The inclusion of adapting\nneurons drastically increases the computing and learning capability of RSNNs if\nthey are trained and con\ufb01gured by deep learning (BPTT combined with a rewiring\nalgorithm that optimizes the network architecture). 
In fact, the computational per-\nformance of these RSNNs approaches for the \ufb01rst time that of LSTM networks.\nIn addition RSNNs with adapting neurons can acquire abstract knowledge from\nprior learning in a Learning-to-Learn (L2L) scheme, and transfer that knowledge\nin order to learn new but related tasks from very few examples. We demonstrate\nthis for supervised learning and reinforcement learning.\n\n1\n\nIntroduction\n\nRecurrent networks of spiking neurons (RSNNs) are frequently studied as models for networks of\nneurons in the brain. In principle, they should be especially well-suited for computations in the\ntemporal domain, such as speech processing, as their computations are carried out via spikes, i.e.,\nevents in time and space. But the performance of RSNN models has remained suboptimal also for\ntemporal processing tasks. One difference between RSNNs in the brain and RSNN models is that\nRSNNs in the brain have been optimized for their function through long evolutionary processes,\ncomplemented by a sophisticated learning curriculum during development. Since most details of\nthese biological processes are currently still unknown, we asked whether deep learning is able to\nmimic these complex optimization processes on a functional level for RSNN models. We used\nBPTT as the deep learning method for network optimization. Backpropagation has been adapted\npreviously for feed forward networks with binary activations in [1, 2], and we adapted BPTT to work\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00b4eal, Canada.\n\n\fin a similar manner for RSNNs. In order to also optimize the connectivity of RSNNs, we augmented\nBPTT with DEEP R, a biologically inspired heuristic for synaptic rewiring [3, 4]. Compared to\nLSTM networks, RSNNs tend to have inferior short-term memory capabilities. 
Since neurons in the\nbrain are equipped with a host of dynamics processes on time scales larger than a few dozen ms [5],\nwe enriched the inherent dynamics of neurons in our model by a standard neural adaptation process.\nWe \ufb01rst show (section 4) that this approach produces new computational performance levels of\nRSNNs for two common benchmark tasks: Sequential MNIST and TIMIT (a speech processing\ntask). We then show that it makes L2L applicable to RSNNs (section 5), similarly as for LSTM\nnetworks. In particular, we show that meta-RL [6, 7] produces new motor control capabilities of\nRSNNs (section 6). This result links a recent abstract model for reward-based learning in the brain\n[8] to spiking activity. In addition, we show that RSNNs with sparse connectivity and sparse \ufb01ring\nactivity of 10-20 Hz (see Fig. 1D, 2D, S1C) can solve these and other tasks. Hence these RSNNs\ncompute with spikes, rather than \ufb01ring rates.\nThe superior computing and learning capabilities of LSNNs suggest that they are also of interest for\nimplementation in spike-based neuromorphic chips such as Brainscales [9], SpiNNaker [10], True\nNorth [2], chips from ETH Z\u00a8urich [11], and Loihi [12]. In particular, nonlocal learning rules such\nas backprop are challenges for some of these neuromorphic devices (and for many brain models).\nHence alternative methods for RSNN learning of nonlinear functions are needed. We show in sec-\ntions 5 and 6 that L2L can be used to generate RSNNs that learn very ef\ufb01ciently even in the absence\nof synaptic plasticity.\nRelation to prior work: We refer to [13, 14, 15, 16] for summaries of preceding results on compu-\ntational capabilities of RSNNs. The focus there was typically on the generation of dynamic patterns.\nSuch tasks are not addressed in this article, but it will be shown in [17] that LSNNs provide an al-\nternative model to [16] for the generation of complex temporal patterns. Huh et al. 
[15] applied gradient descent to recurrent networks of spiking neurons. There, neurons without a leak were used. Hence, the voltage of a neuron could be used in that approach to store information over an unlimited length of time.\n\nWe are not aware of previous attempts to bring the performance of RSNNs for time series classification into the performance range of LSTM networks. We are also not aware of any previous literature on applications of L2L to SNNs.\n\n2 LSNN model\n\nNeurons and synapses in common RSNN models are missing many of the dynamic processes found in their biological counterparts, especially those on larger time scales. We integrate one of them into our RSNN model: neuronal adaptation. It is well known that a substantial fraction of excitatory neurons in the brain are adapting, with diverse time constants, see e.g. the Allen Brain Atlas for data from the neocortex of mice and humans. We refer to the resulting type of RSNNs as Long short-term memory Spiking Neural Networks (LSNNs). LSNNs consist of a population R of leaky integrate-and-fire (LIF) neurons (excitatory and inhibitory), and a second population A of excitatory LIF neurons whose excitability is temporarily reduced through preceding firing activity, i.e., these neurons are adapting (see Fig. 1C and Suppl.). Both populations R and A receive spike trains from a population X of external input neurons. Results of computations are read out by a population Y of external linear readout neurons, see Fig. 1C.\n\nCommon ways of fitting models of adapting neurons to data are described in [18, 19, 20, 21]. We use here the arguably simplest model: we assume that the firing threshold Bj(t) of neuron j increases by a fixed amount β/τa,j for each spike of this neuron, and then decays exponentially back to a baseline value b0_j with a time constant τa,j. 
Thus the threshold dynamics for a discrete time step of δt = 1 ms reads as follows:\n\nBj(t) = b0_j + β bj(t),   (1)\nbj(t + δt) = ρj bj(t) + (1 − ρj) zj(t),   (2)\n\nwhere ρj = exp(−δt/τa,j) and zj(t) is the spike train of neuron j, assuming values in {0, 1/δt}. Note that this dynamics of thresholds of adaptive spiking neurons is similar to the dynamics of the state of context neurons in [22]. It generally suffices to place the time constants of adapting neurons into the desired range for short-term memory (see Suppl. for the specific values used in each experiment).\n\n3 Applying BPTT with DEEP R to RSNNs and LSNNs\n\nWe optimize the synaptic weights, and in some cases also the connectivity matrix, of an LSNN for specific ranges of tasks. The optimization algorithm that we use, backpropagation through time (BPTT), is not claimed to be biologically realistic. But like evolutionary and developmental processes, BPTT can optimize LSNNs for specific task ranges. Backpropagation (BP) had already been applied in [1] and [2] to feedforward networks of spiking neurons. In these approaches, the gradient is backpropagated through spikes by replacing the non-existent derivative of the membrane potential at the time of a spike by a pseudo-derivative that smoothly increases from 0 to 1, and then decays back to 0. We reduced (“dampened”) the amplitude of the pseudo-derivative by a factor < 1 (see Suppl. for details). This enhances the performance of BPTT for RSNNs that compute during larger time spans, which requires backpropagation through several thousand layers of an unrolled feedforward network of spiking neurons. A similar implementation of BPTT for RSNNs was proposed in [15]. 
It is not yet clear which of these two versions of BPTT works best for a given task and a given network.\n\nIn order to optimize not only the synaptic weights of an RSNN but also its connectivity matrix, we integrated BPTT with the biologically inspired [3] rewiring method DEEP R [4] (see Suppl. for details). DEEP R converges theoretically to an optimal network configuration by continuously updating the set of active connections [23, 3, 4].\n\n4 Computational performance of LSNNs\n\nSequential MNIST: We tested the performance of LSNNs on a standard benchmark task that requires continuous updates of short-term memory over a long time span: sequential MNIST [24, 25]. We compare the performance of LSNNs with that of LSTM networks. The size of the LSNN, in the case of full connectivity, was chosen to match the number of parameters of the LSTM network. This led to 120 regular spiking and 100 adaptive neurons (with adaptation time constant τa of 700 ms), in comparison to 128 LSTM units. It turned out that the sparsely connected LSNN shown in Fig. 1C, which was generated by including DEEP R in BPTT, had only 12% of the synaptic connections, but performed better than the fully connected LSNN (see “DEEP R LSNN” versus “LSNN” in Fig. 1B).\n\nThe task is to classify the handwritten digits of the MNIST dataset when the pixels of each handwritten digit are presented sequentially, one after the other, in 784 steps, see Fig. 1A. After each presentation of a handwritten digit, the network is required to output the corresponding class. The grey values of pixels were given directly to artificial neural networks (ANNs), and were encoded by spikes for RSNNs. We considered both the case of a step size of 1 ms (requiring 784 ms for presenting the input image) and of 2 ms (requiring 1568 ms for each image; the adaptation time constant τa was set to 1400 ms in this case, see Fig. 1B). The top row of Fig. 
1D shows a version where the grey value of\nthe currently presented pixel is encoded by population coding through the \ufb01ring probability of the\n80 input neurons. Somewhat better performance was achieved when each of the 80 input neurons\nis associated with a particular threshold for the grey value, and this input neuron \ufb01res whenever the\ngrey value crosses its threshold in the transition from the previous to the current pixel (this input\nconvention is chosen for the SNN results of Fig. 1B). In either case, an additional input neuron be-\ncomes active when the presentation of the 784 pixel values is \ufb01nished, in order to prompt an output\nfrom the network. The \ufb01ring of this additional input neuron is shown at the top right of the top panel\nof Fig. 1D. The softmax of 10 linear output neurons Y is trained through BPTT to produce, during\nthis time segment, the label of the sequentially presented handwritten digit. We refer to the yellow\nshading around 800 ms of the output neuron for label 3 in the plot of the dynamics of the output\nneurons Y in Fig. 1D. This output was correct.\nA performance comparison is given in Fig. 1B. LSNNs achieve 94.7% and 96.4% classi\ufb01cation\naccuracy on the test set when every pixel is presented for 1 and 2ms respectively. An LSTM network\nachieves 98.5% and 98.0% accuracy on the same task setups. The LIF and RNN bars in Fig. 1B show\nthat this accuracy is out of reach for BPTT applied to spiking or nonspiking neural networks without\nenhanced short term memory capabilities. We observe that in the sparse architecture discovered by\nDEEP R, the connectivity onto the readout neurons Y is denser than in the rest of the network (see\nFig. 1C). Detailed results are given in the supplement.\n\n3\n\n\fFigure 1: Sequential MNIST. A The task is to classify images of handwritten digits when the\npixels are shown sequentially pixel by pixel, in a \ufb01xed order row by row. 
B The performance\nof RSNNs is tested for three different setups: without adapting neurons (LIF), a fully connected\nLSNN, and an LSNN with randomly initialized connectivity that was rewired during training (DEEP\nR LSNN). For comparison, the performance of two ANNs, a fully connected RNN and an LSTM\nnetwork are also shown. C Connectivity (in terms of connection probabilities between and within\nthe 3 subpopulations) of the LSNN after applying DEEP R in conjunction with BPTT. The input\npopulation X consisted of 60 excitatory and 20 inhibitory neurons. Percentages on the arrows from\nX indicate the average connection probabilities from excitatory and inhibitory neurons. D Dynamics\nof the LSNN after training when the input image from A was sequentially presented. From top to\nbottom: spike rasters from input neurons (X), and random subsets of excitatory (E) and inhibitory (I)\nregularly spiking neurons, and adaptive neurons (A), dynamics of the \ufb01ring thresholds of a random\nsample of adaptive neurons; activation of softmax readout neurons.\n\nSpeech recognition (TIMIT): We also tested the performance of LSNNs for a real-world speech\nrecognition task, the TIMIT dataset. A thorough study of the performance of many variations of\nLSTM networks on TIMIT has recently been carried out in [26]. We used exactly the same setup\nwhich was used there (framewise classi\ufb01cation) in order to facilitate comparison. We found that\na standard LSNN consisting of 300 regularly \ufb01ring (200 excitatory and 100 inhibitory) and 100\nexcitatory adapting neurons with an adaptation time constant of 200 ms, and with 20% connection\nprobability in the network, achieved a classi\ufb01cation error of 33.2%. This error is below the mean\nerror around 40% from 200 trials with different hyperparameters for the best performing (and most\ncomplex) version of LSTMs according to Fig. 
3 of [26], but above the mean of 29.7% of the 20 best-performing choices of hyperparameters for these LSTMs. The performance of the LSNN was, however, somewhat better than the error rates achieved in [26] for a less complex version of LSTMs without forget gates (mean of the best 20 trials: 34.2%).\n\nWe could not perform a similarly rigorous search over LSNN architectures and meta-parameters as was carried out in [26] for LSTMs. But if all adapting neurons are replaced by regularly firing excitatory neurons, one gets a substantially higher error rate than with the LSNN with adapting neurons: 37%. Details are given in the supplement.\n\n5 LSNNs learn-to-learn from a teacher\n\nOne likely reason why the learning capabilities of RSNN models have remained rather poor is that one usually requires a tabula rasa RSNN model to learn. In contrast, RSNNs in the brain have been optimized for their learning performance through a host of preceding processes, from evolution to prior learning of related tasks. We emulate a similar training paradigm for RSNNs using the L2L setup. We explore here only the application of L2L to LSNNs, but L2L can also be applied to RSNNs without adapting neurons [27]. An application of L2L to LSNNs is tempting, since L2L is most commonly applied in machine learning to their ANN counterparts: LSTM networks, see e.g. [6, 7]. LSTM networks are especially suited for L2L since they can accommodate two levels of learning and representation of learned insight: synaptic connections and weights can encode, on a higher level, a learning algorithm and prior knowledge on a large time scale, while the short-term memory of an LSTM network can accumulate, on a lower level of learning, knowledge during the current learning task. It has recently been argued [8] that the prefrontal cortex (PFC) similarly accumulates knowledge during fast reward-based learning in its short-term memory, without using dopamine-gated synaptic plasticity, see the text to Suppl. 
Fig. 3 in [8]. The experimental results of\n[28] suggest also a prominent role of short-term memory for fast learning in the motor cortex.\nThe standard setup of L2L involves a large, in fact in general in\ufb01nitely large, family F of learning\ntasks C. Learning is carried out simultaneously in two loops (see Fig. 2A). The inner loop learning\ninvolves the learning of a single task C by a neural network N , in our case by an LSNN. Some\nparameters of N (termed hyper-parameters) are optimized in an outer loop optimization to support\nfast learning of a randomly drawn task C from F. The outer loop training \u2013 implemented here\nthrough BPTT \u2013 proceeds on a much larger time scale than the inner loop, integrating performance\nevaluations from many different tasks C of the family F. One can interpret this outer loop as\na process that mimics the impact of evolutionary and developmental optimization processes, as\nwell as prior learning, on the learning capability of brain networks. We use the terms training and\noptimization interchangeably, but the term training is less descriptive of the longer-term evolutionary\nprocesses we mimic. Like in [29, 6, 7] we let all synaptic weights of N belong to the set of hyper-\nparameters that are optimized through the outer loop. Hence the network is forced to encode all\nresults from learning the current task C in its internal state, in particular in its \ufb01ring activity and\nthe thresholds of adapting neurons. Thus the synaptic weights of the neural network N are free to\nencode an ef\ufb01cient algorithm for learning arbitrary tasks C from F.\nWhen the brain learns to predict sensory inputs, or state changes that result from an action, this\ncan be formalized as learning from a teacher (i.e., supervised learning). The teacher is in this case\nthe environment, which provides \u2013 often with some delay \u2013 the target output of a network. 
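The interplay of the two loops described above can be illustrated with a minimal, runnable sketch. Here a toy stand-in replaces both the LSNN and BPTT: the inner loop adapts only an internal state variable (playing the role of the network's short-term memory), while the outer loop optimizes a hyperparameter across many tasks drawn from a family F. All names, the scalar task family, and the grid-search outer loop are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy family F of tasks: each task C is a scalar target to be predicted.
tasks = rng.uniform(-1.0, 1.0, size=200)

def inner_loop(eta, target, n_trials=20):
    """Learn one task C using only an internal state (no weight changes).
    eta plays the role of a hyperparameter set by the outer loop."""
    estimate = 0.0                              # short-term-memory-like state
    losses = []
    for _ in range(n_trials):
        losses.append((estimate - target) ** 2)
        estimate += eta * (target - estimate)   # state update within the task
    return float(np.mean(losses))

def outer_objective(eta):
    """Performance integrated over many tasks from F, as in the outer loop."""
    return float(np.mean([inner_loop(eta, t) for t in tasks]))

# Outer-loop optimization; a simple grid search stands in for BPTT here.
etas = np.linspace(0.05, 1.0, 20)
best_eta = min(etas, key=outer_objective)
```

The outer loop discovers the state-update rate that makes inner-loop learning fast on the whole family, mirroring how BPTT in the outer loop shapes the LSNN's weights so that its internal dynamics implement a learning algorithm.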
The\nL2L results of [29] show that LSTM networks can learn nonlinear functions from a teacher without\nmodifying their synaptic weights, using their short-term memory instead. We asked whether this\nform of learning can also be attained by LSNNs.\nTask: We considered the task of learning complex non-linear functions from a teacher. Speci\ufb01cally,\nwe chose as family F of tasks a class of continuous functions of two real-valued variables (x1, x2).\nThis class was de\ufb01ned as the family of all functions that can be computed by a 2-layer arti\ufb01cial\nneural network of sigmoidal neurons with 10 neurons in the hidden layer, and weights and biases\nfrom [-1, 1], see Fig. 2B. Thus overall, each such target network (TN) from F was de\ufb01ned through\n40 parameters in the range [-1, 1]: 30 weights and 10 biases. We gave the teacher input to the LSNN\nfor learning a particular TN C from F in a delayed manner as in [29]: The target output value was\ngiven after N had provided its guessed output value for the preceding input.\nThis delay of the feedback is consistent with biologically plausible scenarios. Simultaneously, hav-\ning a delay for the feedback prevents N from passing on the teacher value as output without \ufb01rst\nproducing a prediction on its own.\nImplementation: We considered a LSNN N consisting of 180 regularly \ufb01ring neurons (population\nR) and 120 adapting neurons (population A) with a spread of adaptation time constants sampled\nuniformly between 1 and 1000 ms and with full connectivity. Sparse connectivity in conjunction\nwith rewiring did not improve performance in this case. All neurons in the LSNN received input\nfrom a population X of 300 external input neurons. A linear readout received inputs from all neurons\nin R and A. The LSNN received a stream of 3 types of external inputs (see top row of Fig. 
2D): the values of x1, x2, and of the output C(x′1, x′2) of the TN for the preceding input pair x′1, x′2 (set to 0 at the first trial), all represented through population coding in an external population of 100 spiking neurons. It produced outputs in the form of weighted spike counts during 20 ms windows from all neurons in the network (see bottom row of Fig. 2D), where the weights for this linear readout were trained, like all weights inside the LSNN, in the outer loop, and remained fixed during the learning of a particular TN.\n\nThe training procedure in the outer loop of L2L was as follows: network training was divided into training episodes. At the start of each training episode, a new target network TN was randomly chosen and used to generate target values C(x1, x2) ∈ [0, 1] for randomly chosen input pairs (x1, x2). 500 of these input pairs and targets were used as training data, and presented one per step to the LSNN during the episode, where each step lasted 20 ms. LSNN parameters were updated using BPTT to minimize the mean squared error between the LSNN output and the target in the training set, using gradients computed over batches of 10 such episodes, which formed one iteration of the outer loop. In other words, each weight update included gradients calculated on the input/target pairs from 10 different TNs. This training procedure forced the LSNN to adapt its parameters in a way that supported the learning of many different TNs, rather than specializing on predicting the output of a single TN. After training, the weights of the LSNN remained fixed, and it was required to learn the input/output behaviour of TNs from F that it had never seen before in an online manner, just by using its short-term memory and dynamics. See the Suppl. for further details.\n\nResults: Most of the functions that are computed by TNs from the class F are nonlinear, as illustrated in Fig. 
2G for the case of inputs (x1, x2) with x1 = x2. Hence learning the input/output behaviour of any such TN with biologically realistic local plasticity mechanisms presents a daunting challenge for an SNN. Fig. 2C shows that after a few thousand training iterations in the outer loop, the LSNN achieves a low MSE for learning new TNs from the family F, significantly surpassing the performance of an optimal linear approximator (linear regression) that was trained on all 500 pairs of inputs and target outputs, see the orange curve in Fig. 2C,E. In view of the fact that each TN is defined by 40 parameters, it comes as some surprise that the resulting network learning algorithm of the LSNN for learning the input/output behaviour of a new TN produces in general a good approximation of the TN after just 5 to 20 trials, where in each trial one randomly drawn labelled example is presented. One sample of a generic learning process is shown in Fig. 2D. Each sequence of examples evokes an internal model that is stored in the short-term memory of the LSNN. Fig. 2H shows the fast evolution of internal models of the LSNN for the TN during the first trials (visualized for a 1D subset of the 2D input space). We make the current internal model of the LSNN visible by probing its prediction C(x1, x2) for hypothetical new inputs at evenly spaced points (x1, x2) in the domain (without allowing it to modify its short-term memory; all other inputs advance the network state according to the dynamics of the LSNN). One sees that the internal model of the LSNN is from the beginning a smooth function, of the same type as the ones defined by the TNs in F. Within a few trials this smooth function approximated the TN quite well. Hence the LSNN had acquired during the training in the outer loop of L2L a prior for the types of functions that are to be learnt, which was encoded in its synaptic weights. This prior was in fact quite efficient, since Fig. 
2E and F\nshow that the LSNN was able to learn a TN with substantially fewer trials than a generic learning\nalgorithm for learning the TN directly in an arti\ufb01cial neural network as in Fig. 2A: BP with a prior\nthat favored small weights and biases (see end of Sec. 3 in suppl.). These results suggest that L2L\nis able to install some form of prior knowledge about the task in the LSNN. We conjectured that the\nLSNN \ufb01ts internal models for smooth functions to the examples it received.\nWe tested this conjecture in a second, much simpler, L2L scenario. Here the family F consisted of\nall sinus functions with arbitrary phase and amplitudes between 0.1 and 5. Fig. 2I shows that the\nLSNN also acquired an internal model for sinus functions (made visible analogously as in Fig. 2H)\nin this setup from training in the outer loop. Even when we selected examples in an adversarial\nmanner, which happened to be in a straight line, this did not disturb the prior knowledge of the\nLSNN.\nAltogether the network learning that was induced through L2L in the LSNNs is of particular interest\nfrom the perspective of the design of learning algorithms, since we are not aware of previously\ndocumented methods for installing structural priors for online learning of a recurrent network of\nspiking neurons.\n\n6\n\n\fFigure 2: LSNNs learn to learn from a teacher. A L2L scheme for an SNN N . B Architecture\nof the two-layer feed-forward target networks (TNs) used to generate nonlinear functions for the\nLSNN to learn; weights and biases were randomly drawn from [-1,1]. C Performance of the LSNN\nin learning a new TN during (left) and after (right) training in the outer loop of L2L. Performance is\ncompared to that of an optimal linear predictor \ufb01tted to the batch of all 500 experiments for a TN. 
D\nNetwork input (top row, only 100 of 300 neurons shown), internal spike-based processing with low\n\ufb01ring rates in the populations R and A (middle rows), and network output (bottom row) for 25 trials\nof 20 ms each. E Learning performance of the LSNN for 10 new TNs. Performance for a single TN\nis shown as insert, a red cross marks step 7 after which output predictions became very good for this\nTN. The spike raster for this learning process is the one depicted in C. Performance is compared to\nthat of an optimal linear predictor, which, for each example, is \ufb01tted to the batch of all preceding\nexamples. F Learning performance of BP for the same 10 TNs as in D, working directly on the\nANN from A, with a prior for small weights. G Sample input/output curves of TNs on a 1D subset\nof the 2D input space, for different weight and bias values. H These curves are all fairly smooth,\nlike the internal models produced by the LSNN while learning a particular TN. I Illustration of the\nprior knowledge acquired by the LSNN through L2L for another family F (sinus functions). Even\nadversarially chosen examples (Step 4) do not induce the LSNN to forget its prior.\n\n7\n\n\fFigure 3: Meta-RL results for an LSNN. A, B Performance improvement during training in the\nouter loop. C, D Samples of navigation paths produced by the LSNN before and after this training.\nBefore training, the agent performs a random walk (C). In this example it does not \ufb01nd the goal\nwithin the limited episode duration. After training (D), the LSNN had acquired an ef\ufb01cient explo-\nration strategy that uses two pieces of abstract knowledge: that the goal always lies on the border,\nand that the goal position is the same throughout an episode. Note that all synaptic weights of the\nLSNNs remained \ufb01xed after training.\n\n6 LSNNs learn-to-learn from reward\n\nWe now turn to an application of meta reinforcement learning (meta-RL) to LSNNs. 
In meta-RL,\nthe LSNN receives rewards instead of teacher inputs. Meta-RL has led to a number of remarkable\nresults for LSTM networks, see e.g. [6, 7]. In addition, [8] demonstrates that meta-RL provides a\nvery interesting perspective of reward-based learning in the brain. We focused on one of the more\nchallenging demos of [6] and [7], where an agent had to learn to \ufb01nd a target in a 2D arena, and to\nnavigate subsequently to this target from random positions in the arena. This task is related to the\nwell-known biological learning paradigm of the Morris water maze task [30, 31]. We study here the\ncapability of an agent to discover two pieces of abstract knowledge from the concrete setup of the\ntask: the distribution of goal positions, and the fact that the goal position is constant within each\nepisode. We asked whether the agent would be able to exploit the pieces of abstract knowledge from\nlearning for many concrete episodes, and use it to navigate more ef\ufb01ciently.\nTask: An LSNN-based agent was trained on a family of navigation tasks with continuous state and\naction spaces in a circular arena. The task is structured as a sequence of episodes, each lasting 2\nseconds. The goal was placed randomly for each episode on the border of the arena. When the agent\nreached the goal, it received a reward of 1, and was placed back randomly in the arena. When the\nagent hit a wall, it received a negative reward of -0.02 and the velocity vector was truncated to remain\ninside the arena. The objective was to maximize the number of goals reached within the episode.\nThis family F of tasks is de\ufb01ned by the in\ufb01nite set of possible goal positions. For each episode, an\noptimal agent is expected to explore until it \ufb01nds the goal position, memorize it and exploits this\nknowledge until the end of the episode by taking the shortest path to the goal. 
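The task just described can be captured by a minimal environment sketch. The rewards (+1 at the goal, −0.02 at the wall) and the random goal on the border follow the text above; the arena radius, goal-capture distance, and the exact form of the wall truncation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

ARENA_R = 1.0    # arena radius (illustrative)
GOAL_TOL = 0.15  # capture distance around the goal (illustrative)

class ArenaTask:
    """One episode of the navigation family F: a fixed goal on the border."""

    def __init__(self):
        phi = rng.uniform(0.0, 2.0 * np.pi)
        self.goal = ARENA_R * np.array([np.cos(phi), np.sin(phi)])
        self.pos = self._random_position()

    def _random_position(self):
        while True:  # rejection-sample a point inside the circular arena
            p = rng.uniform(-ARENA_R, ARENA_R, size=2)
            if np.linalg.norm(p) < ARENA_R:
                return p

    def step(self, velocity):
        """Apply a velocity vector; return the reward for this time step."""
        new_pos = self.pos + velocity
        if np.linalg.norm(new_pos) > ARENA_R:
            # Hitting the wall: project back inside (a simplification of the
            # velocity truncation described above) and receive -0.02.
            self.pos = new_pos / np.linalg.norm(new_pos) * ARENA_R
            return -0.02
        self.pos = new_pos
        if np.linalg.norm(self.pos - self.goal) < GOAL_TOL:
            self.pos = self._random_position()  # reward 1, random re-placement
            return 1.0
        return 0.0

task = ArenaTask()
reward = task.step(np.array([0.02, 0.0]))  # one small step; 0.0 unless goal or wall
```

An agent that has acquired the two pieces of abstract knowledge would first move outward to search along the border, then repeatedly take the direct path from each random reset to the remembered goal.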
We trained an LSNN\nso that the network could control the agent's behaviour in all tasks, without changing its network\nweights.\nImplementation: Since LSNNs with just a few hundred neurons are not able to process visual input,\nwe provided the agent's current position within the arena through a place-cell-like Gaussian\npopulation rate encoding. The lack of visual input alone made it challenging\nto move along a smooth path, or to stay within a safe distance from the wall. The agent received\ninformation about positive and negative rewards in the form of spikes from external neurons. For\ntraining in the outer loop, we used BPTT together with DEEP R applied to the surrogate objective\nof the Proximal Policy Optimization (PPO) algorithm [32]. In this task the LSNN had 400 recurrent\nunits (200 excitatory, 80 inhibitory, and 120 adaptive neurons with an adaptation time constant τa of\n1200 ms), and the network was rewired with a fixed connectivity of 20%. The resulting network diagram\nand spike raster are shown in Suppl. Fig. 1.\nResults: The network behaviour before, during, and after L2L optimization is shown in Fig. 3.\nFig. 3A shows that significant improvements emerged only after a large number of training episodes.\nA close look at Fig. 3B shows that before 52k training episodes, the intermediate path planning strategies did not seem to use the discovered goal position to make subsequent paths shorter.\nHence the agents had not yet discovered that the goal position does not change during an episode.\nAfter training for 300k episodes, one sees from the sample paths in Fig. 3D that both pieces of abstract knowledge had been discovered by the agent. The first path in Fig. 3D shows that the agent\nexploits that the goal is located on the border of the maze. The second and last paths show that\nthe agent knows that the position is fixed throughout an episode.
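The place-cell-like input encoding from the Implementation paragraph can be sketched as follows. Gaussian population rate coding of the position is stated in the text; the number of cells, tuning width, peak rate, and function names are illustrative assumptions.

```python
import numpy as np

def place_cell_rates(pos, centers, sigma=0.2, max_rate=100.0):
    """Gaussian population rate code for a 2D position.
    Each 'place cell' fires at a rate that falls off with the distance
    between the agent's position and the cell's preferred location.
    `sigma` and `max_rate` are illustrative assumptions.

    pos:     array of shape (2,), current agent position
    centers: array of shape (n_cells, 2), place-cell centers
    Returns firing rates (Hz) of shape (n_cells,)."""
    dist2 = np.sum((centers - pos) ** 2, axis=1)
    return max_rate * np.exp(-dist2 / (2.0 * sigma ** 2))

def poisson_spikes(rates, dt=1e-3, rng=None):
    """Sample one time step of input spikes from the population rates."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.random(rates.shape) < rates * dt

# Example: a 5x5 grid of place cells covering a unit arena
grid = np.linspace(-1.0, 1.0, 5)
centers = np.array([[x, y] for x in grid for y in grid])
rates = place_cell_rates(np.array([0.0, 0.0]), centers)
```

Feeding the sampled spike trains into the recurrent network gives the agent a smoothly varying, population-coded estimate of its own position in place of visual input.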
Altogether this demo shows that\nmeta-RL can be applied to RSNNs, and produces previously unseen capabilities of sparsely firing RSNNs to extract abstract knowledge from experimentation, and to use it in clever ways for\ncontrolling behaviour.\n\n7 Discussion\n\nWe have demonstrated that deep learning provides a useful new tool for the investigation of networks\nof spiking neurons: it allows us to create architectures and learning algorithms for RSNNs with\nenhanced computing and learning capabilities. In order to demonstrate this, we adapted BPTT\nso that it works efficiently for RSNNs, and can be combined with a biologically inspired synaptic\nrewiring method (DEEP R). We have shown in Section 4 that this method allows us to create sparsely\nconnected RSNNs that approach the performance of LSTM networks on common benchmark tasks\nfor the classification of spatio-temporal patterns (sequential MNIST and TIMIT). This qualitative\njump in the computational power of RSNNs was supported by the introduction of adapting neurons\ninto the model. Adapting neurons introduce a spread of longer time constants into RSNNs, as they\ndo in the neocortex according to [33]. We refer to the resulting variation of the RSNN model as\nLSNNs, because of the resulting longer short-term memory capability. This form of short-term\nmemory is of particular interest from the perspective of the energy efficiency of SNNs, because it stores\nand transmits information through the non-firing of neurons: a neuron that holds information in\nits increased firing threshold tends to fire less often.\nWe have shown in Fig. 2 that an application of deep learning (BPTT and DEEP R) in the outer loop\nof L2L provides a new paradigm for learning of nonlinear input/output mappings by an RSNN. This\nlearning task was previously thought to require an implementation of BP in the RSNN. We have shown that it\nrequires no BP, not even changes of synaptic weights.
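The threshold-based short-term memory discussed above can be illustrated with a minimal adaptive-threshold leaky integrate-and-fire neuron. The qualitative mechanism (the firing threshold jumps at each spike and decays slowly, here with the 1200 ms adaptation time constant mentioned in Section 6) follows the text; all other parameter values and the exact update equations are illustrative assumptions, not the paper's neuron model.

```python
import numpy as np

def simulate_adaptive_lif(input_current, dt=1e-3, tau_m=20e-3, tau_a=1200e-3,
                          v_th0=1.0, beta=0.5, r_m=1.0):
    """Sketch of a leaky integrate-and-fire neuron with an adaptive threshold.
    After each spike the effective threshold is raised by `beta` and then
    decays back towards the baseline v_th0 with time constant tau_a.
    All parameter values here are illustrative assumptions."""
    v, a = 0.0, 0.0                             # membrane potential, adaptation variable
    spikes, thresholds = [], []
    for i_t in input_current:
        v += dt / tau_m * (-v + r_m * i_t)      # leaky integration of the input
        a *= np.exp(-dt / tau_a)                # slow decay of the adaptation variable
        if v >= v_th0 + a:                      # effective threshold is raised by a
            spikes.append(1)
            v = 0.0                             # reset after a spike
            a += beta                           # raise the threshold after each spike
        else:
            spikes.append(0)
        thresholds.append(v_th0 + a)
    return np.array(spikes), np.array(thresholds)
```

With a constant suprathreshold input, the firing rate decreases over time as the threshold accumulates: a neuron that has fired recently "remembers" this in its elevated threshold while emitting fewer spikes, which is the energy-efficient, activity-silent memory referred to above.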
Furthermore, we have shown that this new\nform of network learning enables RSNNs, after suitable training with similar learning tasks in the\nouter loop of L2L, to learn a new task from the same class substantially faster. The reason is that\nthe prior deep learning has installed abstract knowledge (priors) about common properties of these\nlearning tasks in the RSNN. To the best of our knowledge, transfer learning capabilities and the use\nof prior knowledge (see Fig. 2I) have not previously been demonstrated for SNNs. Fig. 3 shows\nthat L2L also extends to the capability of RSNNs to learn from rewards (meta-RL). For example,\nit enables an RSNN – without any additional outer control or clock – to embody an agent that first\nsearches an arena for a goal, and subsequently exploits the learnt knowledge in order to navigate\nquickly from random initial positions to this goal. Here, for the sake of simplicity, we considered only\nthe more common case where all synaptic weights are determined by the outer loop of L2L. But\nsimilar results arise when only some of the synaptic weights are learnt in the outer loop, while other\nsynapses employ local synaptic plasticity rules to learn the current task [27].\nAltogether we expect that the new methods and ideas that we have introduced will advance our understanding and reverse engineering of RSNNs in the brain. For example, the RSNNs that emerged\nin Fig. 1-3 all compute and learn with brain-like sparse firing activity, quite different from an SNN\nthat operates with rate codes.
In addition, these RSNNs present new functional uses of short-term\nmemory that go far beyond remembering a preceding input as in [34], and suggest new forms of\nactivity-silent memory [35].\nApart from these implications for computational neuroscience, our finding that RSNNs can acquire\npowerful computing and learning capabilities with very energy-efficient sparse firing activity provides new application paradigms for spike-based computing hardware through non-firing.\n\nAcknowledgments\n\nThis research/project was supported by the HBP Joint Platform, funded from the European Union's\nHorizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 720270 (Human Brain Project SGA1) and under the Specific Grant Agreement No. 785907 (Human Brain Project SGA2). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro P6000 GPU used for this research. Research leading to\nthese results has in part been carried out on the Human Brain Project PCP Pilot Systems at the\nJülich Supercomputing Centre, which received co-funding from the European Union (Grant Agreement No. 604102). We gratefully acknowledge Sandra Diaz, Alexander Peyser and Wouter Klijn\nfrom the Simulation Laboratory Neuroscience of the Jülich Supercomputing Centre for their support. The computational results presented have been achieved in part using the Vienna Scientific\nCluster (VSC).\n\nReferences\n[1] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Binarized neural\nnetworks: Training deep neural networks with weights and activations constrained to +1 or -1. arXiv\npreprint arXiv:1602.02830, 2016.\n\n[2] Steven K. Esser, Paul A. Merolla, John V. Arthur, Andrew S. Cassidy, Rathinakumar Appuswamy,\nAlexander Andreopoulos, David J. Berg, Jeffrey L. McKinstry, Timothy Melano, Davis R.
Barch,\nCarmelo di Nolfo, Pallab Datta, Arnon Amir, Brian Taba, Myron D. Flickner, and Dharmendra S. Modha.\nConvolutional networks for fast, energy-efficient neuromorphic computing. Proceedings of the National\nAcademy of Sciences, 113(41):11441–11446, November 2016.\n\n[3] David Kappel, Robert Legenstein, Stefan Habenschuss, Michael Hsieh, and Wolfgang Maass. Reward-based stochastic self-configuration of neural circuits. eNEURO, 2018.\n\n[4] Guillaume Bellec, David Kappel, Wolfgang Maass, and Robert Legenstein. Deep rewiring: Training very\nsparse deep networks. International Conference on Learning Representations (ICLR), 2018.\n\n[5] Uri Hasson, Janice Chen, and Christopher J Honey. Hierarchical process memory: memory as an integral\ncomponent of information processing. Trends in Cognitive Sciences, 19(6):304–313, 2015.\n\n[6] Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles\nBlundell, Dharshan Kumaran, and Matt Botvinick. Learning to reinforcement learn. arXiv preprint\narXiv:1611.05763, 2016.\n\n[7] Yan Duan, John Schulman, Xi Chen, Peter L Bartlett, Ilya Sutskever, and Pieter Abbeel. RL2: Fast\nreinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779, 2016.\n\n[8] Jane X Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo,\nDemis Hassabis, and Matthew Botvinick. Prefrontal cortex as a meta-reinforcement learning system.\nNature Neuroscience, 2018.\n\n[9] Johannes Schemmel, Daniel Brüderle, Andreas Grübl, Matthias Hock, Karlheinz Meier, and Sebastian\nMillner. A wafer-scale neuromorphic hardware system for large-scale neural modeling. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1947–1950. IEEE, 2010.\n\n[10] Steve B Furber, David R Lester, Luis A Plana, Jim D Garside, Eustace Painkras, Steve Temple, and\nAndrew D Brown.
Overview of the SpiNNaker system architecture. IEEE Transactions on Computers,\n62(12):2454–2467, 2013.\n\n[11] Ning Qiao, Hesham Mostafa, Federico Corradi, Marc Osswald, Fabio Stefanini, Dora Sumislawska, and\nGiacomo Indiveri. A reconfigurable on-line learning spiking neuromorphic processor comprising 256\nneurons and 128k synapses. Frontiers in Neuroscience, 9:141, 2015.\n\n[12] Mike Davies, Narayan Srinivasa, Tsung-Han Lin, Gautham Chinya, Yongqiang Cao, Sri Harsha Choday, Georgios Dimou, Prasad Joshi, Nabil Imam, Shweta Jain, et al. Loihi: A neuromorphic manycore\nprocessor with on-chip learning. IEEE Micro, 38(1):82–99, 2018.\n\n[13] Chris Eliasmith. How to build a brain: A neural architecture for biological cognition. Oxford University\nPress, 2013.\n\n[14] Brian DePasquale, Mark M Churchland, and LF Abbott. Using firing-rate dynamics to train recurrent\nnetworks of spiking model neurons. arXiv preprint arXiv:1601.07620, 2016.\n\n[15] Dongsung Huh and Terrence J Sejnowski. Gradient descent for spiking neural networks. arXiv preprint\narXiv:1706.04698, 2017.\n\n[16] Wilten Nicola and Claudia Clopath. Supervised learning in spiking neural networks with FORCE training.\nNature Communications, 8(1):2208, 2017.\n\n[17] Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, and Wolfgang Maass. Computational properties of networks of spiking neurons with adapting neurons. In preparation, 2018.\n\n[18] Wulfram Gerstner, Werner M. Kistler, Richard Naud, and Liam Paninski. Neuronal dynamics: From\nsingle neurons to networks and models of cognition. Cambridge University Press, 2014.\n\n[19] Christian Pozzorini, Skander Mensi, Olivier Hagens, Richard Naud, Christof Koch, and Wulfram Gerstner. Automated high-throughput characterization of single neurons by means of simplified spiking\nmodels.
PLoS Computational Biology, 11(6):e1004275, 2015.\n\n[20] Nathan W Gouwens, Jim Berg, David Feng, Staci A Sorensen, Hongkui Zeng, Michael J Hawrylycz,\nChristof Koch, and Anton Arkhipov. Systematic generation of biophysically detailed models for diverse\ncortical neuron types. Nature Communications, 9(1), 2018.\n\n[21] Corinne Teeter, Ramakrishnan Iyer, Vilas Menon, Nathan Gouwens, David Feng, Jim Berg, Aaron Szafer,\nNicholas Cain, Hongkui Zeng, Michael Hawrylycz, et al. Generalized leaky integrate-and-fire models\nclassify multiple neuron types. Nature Communications, 1(1):1–15, 2018.\n\n[22] Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, and Marc'Aurelio Ranzato. Learning\nlonger memory in recurrent neural networks. arXiv preprint arXiv:1412.7753, 2014.\n\n[23] David Kappel, Stefan Habenschuss, Robert Legenstein, and Wolfgang Maass. Network plasticity as\nBayesian inference. PLoS Computational Biology, 11(11):e1004485, 2015.\n\n[24] Quoc V. Le, Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of\nrectified linear units. CoRR, abs/1504.00941, 2015.\n\n[25] Rui Costa, Ioannis Alexandros Assael, Brendan Shillingford, Nando de Freitas, and Tim Vogels. Cortical\nmicrocircuits as gated-recurrent neural networks. In Advances in Neural Information Processing Systems,\npages 272–283, 2017.\n\n[26] Klaus Greff, Rupesh K Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. LSTM: A\nsearch space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2017.\n\n[27] Anand Subramoney, Guillaume Bellec, Franz Scherr, Robert Legenstein, and Wolfgang Maass. Recurrent\nnetworks of spiking neurons learn to learn. In preparation, 2018.\n\n[28] Matthew G Perich, Juan A Gallego, and Lee E Miller. A neural population mechanism for rapid learning.\nNeuron, 2018.\n\n[29] Sepp Hochreiter, A Steven Younger, and Peter R Conwell.
Learning to learn using gradient descent. In\nInternational Conference on Artificial Neural Networks, pages 87–94. Springer, 2001.\n\n[30] Richard Morris. Developments of a water-maze procedure for studying spatial learning in the rat. Journal\nof Neuroscience Methods, 11(1):47–60, 1984.\n\n[31] Eleni Vasilaki, Nicolas Frémaux, Robert Urbanczik, Walter Senn, and Wulfram Gerstner. Spike-based\nreinforcement learning in continuous state and action space: when policy gradient methods fail. PLoS\nComputational Biology, 5(12):e1000586, 2009.\n\n[32] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.\n\n[33] Allen Institute. © 2018 Allen Institute for Brain Science. Allen Cell Types Database, cell feature search.\nAvailable from: celltypes.brain-map.org/data. 2018.\n\n[34] Gianluigi Mongillo, Omri Barak, and Misha Tsodyks. Synaptic theory of working memory. Science (New\nYork, N.Y.), 319(5869):1543–1546, March 2008.\n\n[35] Mark G. Stokes. 'Activity-silent' working memory in prefrontal cortex: a dynamic coding framework.\nTrends in Cognitive Sciences, 19(7):394–405, 2015.