{"title": "Dynamic Modelling of Chaotic Time Series with Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 311, "page_last": 318, "abstract": null, "full_text": "Reinforcement Learning Predicts the Site \nof Plasticity for Auditory Remapping in \n\nthe Barn Owl \n\nAlexandre Pouget \n\nalex@salk.edu \n\nCedric Deffayet \n\ncedric@salk.edu \n\nTerrence J. Sejnowski \n\nterry@salk.edu \n\nHoward Hughes Medical Institute \n\nThe Salk Institute \nLa Jolla, CA 92037 \n\nDepartment of Biology \n\nUniversity of California, San Diego \n\nand \n\nEcole Normale Superieure \n\n45 rue d'Ulm \n\n75005 Paris, France \n\nAbstract \n\nThe auditory system of the barn owl contains several spatial maps. \nIn young barn owls raised with optical prisms over their eyes, these \nauditory maps are shifted to stay in register with the visual map, \nsuggesting that the visual input imposes a frame of reference on \nthe auditory maps. However, the optic tectum, the first site of \nconvergence of visual with auditory information, is not the site of \nplasticity for the shift of the auditory maps; the plasticity occurs \ninstead in the inferior colliculus, which contains an auditory map \nand projects into the optic tectum. We explored a model of the owl \nremapping that uses a global reinforcement signal whose delivery is \ncontrolled by visual foveation. A Hebbian learning rule gated by reinforcement \nlearned to adjust the auditory maps appropriately. In addition, \nreinforcement learning preferentially adjusted the weights in \nthe inferior colliculus, as in the owl brain, even though the weights \nwere allowed to change throughout the auditory system. This observation \nraises the possibility that the site of learning does not \nhave to be genetically specified, but could be determined by how \nthe learning procedure interacts with the network architecture. \n\n\f126 \n\nAlexandre Pouget, Cedric Deffayet, Terrence J. 
Sejnowski \n\n[Figure 1 diagram labels: Visual System; Optic Tectum; Inferior Colliculus, External Nucleus; Forebrain Field L; Ovoidalis Nucleus; Thalamic Relay; Inferior Colliculus, Central Nucleus; Cochlea] \n\nFigure 1: Schematic view of the auditory pathways in the barn owl. \n\n1 Introduction \n\nThe barn owl relies primarily on sounds to localize prey [6] with an accuracy vastly \nsuperior to that of humans. Figure 1A illustrates some of the nuclei involved in \nprocessing auditory signals. The barn owl determines the location of sound sources \nby comparing the time and amplitude differences of the sound wave between the \ntwo ears. These two cues are combined for the first time in the shell and \ncore of the inferior colliculus (ICc), which is shown at the bottom of the diagram. \nCells in the ICc are frequency tuned and subject to spatial aliasing. This prevents \nthem from unambiguously encoding the position of objects. The first unambiguous \nauditory map is found at the next stage, in the external capsule of the inferior \ncolliculus (ICx), which itself projects to the optic tectum (OT). The OT is the \nfirst subforebrain structure that contains a multimodal spatial map in which cells \ntypically have spatially congruent visual and auditory receptive fields. \n\nIn addition, these subforebrain auditory pathways send one major collateral toward \nthe forebrain via a thalamic relay. These collaterals originate in the ICc and are \nthought to convey the spatial location of objects to the forebrain [3]. Within the \nforebrain, two major structures have been implicated in auditory processing: the \narchistriatum and field L. The archistriatum sends a projection to both the inferior \ncolliculus and the optic tectum. \nKnudsen and Knudsen (1985) have shown that these auditory maps can adapt to \nsystematic changes in the sensory input. 
Furthermore, the adaptation appears to \nbe under the control of visual input, which imposes a frame of reference on the \nincoming auditory signals. In owls raised with optical prisms, which introduce a \nsystematic shift in part of the visual field, the visual map in the optic tectum was \nidentical to that found in control animals, but the auditory map in the ICx was \nshifted by the amount of visual shift introduced by the prisms. This plasticity \nensures that the visual and auditory maps stay in spatial register during growth \nand other perturbations that produce sensory mismatch. \n\nSince vision instructs audition, one might expect the auditory map to shift in the \noptic tectum, the first site of visual-auditory convergence. Surprisingly, Brainard \nand Knudsen (1993b) observed that the synaptic changes took place between the \nICc and the ICx, one synapse before the site of convergence. \n\nThese observations raise two important questions: First, how does the animal know \nhow to adapt the weights in the ICx in the absence of a visual teaching signal? \nSecond, why does the change take place at this particular location and not in the \nOT, where a teaching signal would be readily available? \n\nIn a previous model [7], this shift was simulated using backpropagation to broadcast \nthe error back through the layers and by constraining the weight changes to the \nprojection from the ICc to ICx. There is, however, no evidence for a feedback \nprojection from the OT to the ICx that could transmit the error signal; \nnor is there evidence to exclude plasticity at other synapses in these pathways. \n\nIn this paper, we suggest an alternative approach in which vision guides the remapping \nof auditory maps by controlling the delivery of a scalar reinforcement signal. 
\nLearning proceeds by generating random actions and increasing the probability \nof actions that are consistently reinforced [1, 5]. In addition, we show that reinforcement \nlearning correctly predicts the site of learning in the barn owl, namely \nat the ICx-ICc synapse, whereas backpropagation [8] does not favor this location \nwhen plasticity is allowed at every synapse. This raises a general issue: the site of \nsynaptic adjustment might be imposed by the combination of the architecture and \nlearning rule, without having to restrict plasticity to a particular synapse. \n\n2 Methods \n\n2.1 Network Architecture \n\nThe network architecture of the model, based on the barn owl auditory system and \nshown in figure 2A, contains two parallel pathways. The input layer was an 8x21 \nmap corresponding to the ICc in which units responded to frequency and interaural \nphase differences. These responses were pooled together to create auditory spatial \nmaps at subsequent stages in both pathways. The rest of the network contained a \nseries of similar auditory maps, which were connected topographically by receptive \nfields 13 units wide. We did not distinguish between field L and the archistriatum \nin the forebrain pathways and simply used two auditory maps, both called FBr. \n\nWe used multiplicative (sigma-pi) units in the OT whose activities were determined \naccording to: \n\n$$y_i = \\sum_{j,k} w_{ij}^{FBr} \\, y_j^{FBr} \\, w_{ik}^{ICx} \\, y_k^{ICx}$$ \n\n(1) \n\nThe multiplicative interaction between ICx and FBr activities was an important \nassumption of our model. It forced the ICx and FBr to agree on a particular \nposition before the OT was activated. As a result, if the ICx-OT synapses were \nmodified during learning, the ICx-FBr synapses had to be changed accordingly. \n\nFigure 2: Schematic diagram of weights (white blocks) in the barn owl auditory \nsystem. A) Diagram of the initial weights in the network. 
B) Pattern of weights \nafter training with reinforcement learning on a prism-induced shift of four units. The \nremapping took place within the ICx and FBr. C) Pattern of weights after training \nwith backpropagation. This time the ICx-OT and FBr-OT weights changed. \n\nWeights were clipped between 5.0 and 0.01, except for the FBr-ICx connections \nwhose values were allowed to vary between 8.0 and 0.01. The minimum values were \nset to 0.01 instead of zero to prevent getting trapped in unstable local minima, which \nare often associated with weight values of zero. The strong coupling between FBr \nand ICx was another important assumption of the model whose consequence will \nbe discussed in the last section. \n\nExamples were generated by simply activating one unit in the ICc while keeping the \nothers at zero, thereby simulating the pattern of activity that would be triggered by \na single localized auditory stimulus. In all simulations, we modeled a prism-induced \nshift of four units. \n\n2.2 Reinforcement learning \n\nWe used stochastic units and trained the network using reinforcement learning [1]. \nThe weighted sum of the inputs, $net_i$, passed through a sigmoid, f(x), is interpreted \nas the probability, $P_i$, that the unit will be active: \n\n$$P_i = 0.99 \\, f(net_i) + 0.01$$ \n\n(2) \n\nwhere the output of the unit $y_i$ was: \n\n$$y_i = \\begin{cases} 0 & \\text{with probability } 1 - P_i \\\\ 1 & \\text{with probability } P_i \\end{cases}$$ \n\n(3) \n\nBecause of the form of the equation for $P_i$, all units in the network had a small \nprobability (0.01) of being spontaneously active in the absence of any inputs. This \nis what allowed the network to perform a stochastic search in action space to find \nwhich actions were consistently associated with positive reinforcement. \n\nWe ensured that at most one unit was active per trial by using a winner-take-all \ncompetition in each layer. 
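
The stochastic unit of equations (2) and (3), followed by the winner-take-all step, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the logistic form of the sigmoid f and the random tie-breaking among active units are assumptions, and the function names are hypothetical.

```python
import math
import random

def firing_probability(net):
    """Equation (2): logistic sigmoid (assumed form of f) rescaled so that
    every unit keeps at least a 0.01 chance of firing."""
    return 0.99 / (1.0 + math.exp(-net)) + 0.01

def sample_layer(nets, rng):
    """Equation (3) for each unit, then winner-take-all.

    Assumption: when several units fire, one of them is kept at random;
    the text only states that at most one unit is active per trial.
    """
    active = [i for i, net in enumerate(nets)
              if rng.random() < firing_probability(net)]
    y = [0] * len(nets)
    if active:
        y[rng.choice(active)] = 1
    return y

rng = random.Random(0)
nets = [-5.0] * 21      # weak drive everywhere ...
nets[10] = 4.0          # ... except one strongly driven unit
y = sample_layer(nets, rng)
```

Because the probability never drops below 0.01, weakly driven units still fire occasionally, which is what lets the network explore alternative mappings.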
\nAdjustable weights in the network were updated after each training example with a \nHebb-like rule gated by reinforcement: \n\n(4) \n\nA trial consisted of choosing a random target location for the auditory input (ICc); \nthe output of the OT was then used to generate a head movement. The reinforcement, \nr, was then set to 1 for head movements resulting in the foveation of the stimulus \nand to -0.05 otherwise. \n\n2.3 Backpropagation \n\nFor the backpropagation network, we used deterministic units with sigmoid activation \nfunctions in which the output of a unit was given by: \n\n(5) \n\nwhere $net_i$ is the weighted sum of the inputs as before. \n\nThe chain rule was used to compute the partial derivatives of the squared error, \nE, with respect to each weight, and the weights were updated after each training \nexample according to: \n\n(6) \n\nThe target vectors were similar to the input vectors, namely only one OT unit was \nrequired to be activated for a given pattern, but at a position displaced by 4 units \ncompared to the input. \n\n3 Results \n\n3.1 Learning site with reinforcement \n\nIn a first set of simulations we kept the ICc-ICx and ICc-FBr weights fixed. Plasticity \nwas allowed at these sites in later simulations. \n\nFigure 2A shows the initial set of weights before learning starts. The central diagonal \nlines in the weight diagrams illustrate the fact that each unit receives only one \nnon-zero weight from the unit in the layer below at the same location. \n\nThere are two solutions to the remapping: either the weights change within the \nICx and FBr, or from the ICx and the FBr to the OT. As shown in figure 2B, \nreinforcement learning converged to the first solution. \nIn contrast, the weights \nbetween the other layers were unaltered, even though they were allowed to change. 
\n\nTo prove that the network could have actually learned the second solution, we \ntrained a network in which the ICc-ICx weights were kept fixed. As we expected, \nthe network shifted its maps simultaneously in both sets of weights converging onto \nthe OT, and the resulting weights were similar to the ones illustrated in figure 2C. \nHowever, to reach this solution, three times as many training examples were needed. \n\nThe reason why learning in the ICx and FBr was favored can be attributed to the \nprobabilistic nature of reinforcement learning. If the probability of finding one \nsolution is p, the probability of finding it twice independently is $p^2$. Learning in the \nICx and FBr is not independent because of the strong connection from the FBr to \nthe ICx. When the remapping is learned in the FBr, this connection automatically \nremaps the activities in the ICx, which in turn allows the ICx-ICx weights to \nremap appropriately. In the OT, on the other hand, the multiplicative connection \nbetween the ICx and FBr weights prevents cooperation between these two sets of \nweights. Consequently, they have to change independently, a process which took \nmuch more training. \n\n3.2 Learning at the ICc-ICx and ICc-FBr synapses \n\nThe aliasing and sharp frequency tuning in the response of ICc neurons greatly \nslows down learning at the ICc-ICx and ICc-FBr synapses. We found that when \nthese synapses were free to change, the remapping still took place within the ICx \nor FBr (figure 3). \n\n3.3 Learning site with backpropagation \n\nIn contrast to reinforcement learning, backpropagation adjusted the weights in two \nlocations: between the ICx and the OT and between the FBr and OT (figure 2C). \nThis is the consequence of the tendency of the backpropagation algorithm to first \nchange the weights closest to where the error is injected. 
\n\n3.4 Temporal evolution of weights \n\nWhether we used reinforcement or supervised learning, the map shifted in a very \nsimilar way. The original set of weights decreased while the new weights \nsimultaneously increased, such that both sets of weights coexisted \nhalf way through learning. This indicates that the map shifted directly from the \noriginal setting to the new configuration without going through intermediate shifts. \n\nThis temporal evolution of the weights is consistent with the findings of Brainard and \nKnudsen (1993a), who found that during the intermediate phase of the remapping, \ncells in the inferior colliculus typically have two receptive fields. More recent work, \nhowever, indicates that for some cells the remapping is more continuous (Brainard \nand Knudsen, personal communication), a behavior that was not reproduced by \neither learning rule. \n\nFigure 3: Even when the ICc-ICx weights are free to change, the network updates \nthe weights in the ICx first. A separate weight matrix is shown for each isofrequency \nmap from the ICc to ICx. The final weight matrices were predominantly diagonal; \nin contrast, the weight matrix in ICx was shifted. \n\n4 Discussion \n\nOur simulations suggest a biologically plausible mechanism by which vision can \nguide the remapping of auditory spatial maps in the owl's brain. Unlike previous \napproaches, which relied on visual signals as an explicit teacher in the optic tectum [7], \nour model uses a global reinforcement signal whose delivery is controlled by \nthe foveal representation of the visual system. Other global reinforcement signals \nwould work as well. For example, a part of the forebrain might compare auditory \nand visual patterns and report spatial mismatch between the two. 
This signal could \nbe easily incorporated in our network and would also remap the auditory map in \nthe inferior colliculus. \n\nOur model demonstrates that the site of synaptic plasticity can be constrained \nby the interaction between reinforcement learning and the network architecture. \nReinforcement learning converged to the most probable solution through stochastic \nsearch. In the network, the strong lateral coupling between ICx and FBr and the \nmultiplicative interaction in the OT favored a solution in which the remapping took \nplace simultaneously in the ICx and FBr. A similar mechanism may be at work \nin the barn owl's brain. Collaterals from FBr to ICx are known to exist, but the \nmultiplicative interaction has not been reported in the barn owl optic tectum. \nLearning mechanisms may also limit synaptic plasticity. NMDA receptors have been \nreported in the ICx, but they might not be expressed at other synapses. There may, \nhowever, be other mechanisms for plasticity. \n\nThe site of remapping in our model was somewhat different from the existing observations. \nWe found that the change took place within the ICx whereas Brainard \nand Knudsen [3] report that it is between the ICc and the ICx. A close examination \nof their data (figure 11 in [3]) reveals that cells at the bottom of ICx were not \nremapped, as predicted by our model, but at the same time, there is little anatomical \nor physiological evidence for a functional and hierarchical organization within \nthe ICx. Additional recordings are needed to resolve this issue. We conclude that \nfor the barn owl's brain, as well as for our model, synaptic plasticity within ICx \nwas favored over changes between ICc and ICx. This supports the hypothesis that \nreinforcement learning is used for remapping in the barn owl auditory system. 
\n\nAcknowledgments \n\nWe thank Eric Knudsen and Michael Brainard for helpful discussions on plasticity \nin the barn owl auditory system and the results of unpublished experiments. Peter \nDayan and P. Read Montague helped with useful insights on the biological basis of \nreinforcement learning in the early stages of this project. \n\nReferences \n\n[1] A.G. Barto and M.I. Jordan. Gradient following without backpropagation in \nlayered networks. Proc. IEEE Int. Conf. Neural Networks, 2:629-636, 1987. \n\n[2] M.S. Brainard and E.I. Knudsen. Dynamics of the visual calibration of the \nmap of interaural time difference in the barn owl's optic tectum. In Society For \nNeuroscience Abstracts, volume 19, page 369.8, 1993. \n\n[3] M.S. Brainard and E.I. Knudsen. Experience-dependent plasticity in the inferior \ncolliculus: a site for visual calibration of the neural representation of auditory \nspace in the barn owl. The Journal of Neuroscience, 13:4589-4608, 1993. \n\n[4] E. Knudsen and P. Knudsen. Vision guides the adjustment of auditory localization \nin the young barn owls. Science, 230:545-548, 1985. \n\n[5] P.R. Montague, P. Dayan, S.J. Nowlan, A. Pouget, and T.J. Sejnowski. Using \naperiodic reinforcement for directed self-organization during development. \nIn S.J. Hanson, J.D. Cowan, and C.L. Giles, editors, Advances in Neural Information \nProcessing Systems, volume 5. Morgan-Kaufmann, San Mateo, CA, \n1993. \n\n[6] R.S. Payne. Acoustic location of prey by barn owls (Tyto alba). Journal of \nExperimental Biology, 54:535-573, 1970. \n\n[7] D.J. Rosen, D.E. Rumelhart, and E.I. Knudsen. A connectionist model of the \nowl's sound localization system. In Advances in Neural Information Processing \nSystems, volume 6. Morgan-Kaufmann, San Mateo, CA, 1994. \n\n[8] D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations \nby error propagation. In D. E. Rumelhart, J. L. 
McClelland, and the PDP \nResearch Group, editors, Parallel Distributed Processing, volume 1, chapter 8, \npages 318-362. MIT Press, Cambridge, MA, 1986. \n\n\fDynamic Modelling of Chaotic Time Series with \n\nNeural Networks \n\nJose C. Principe, Jyh-Ming Kuo \n\nComputational NeuroEngineering Laboratory \nUniversity of Florida, Gainesville, FL 32611 \n\nprincipe@synapse.ee.ufl.edu \n\nAbstract \n\nThis paper discusses the use of artificial neural networks for dynamic \nmodelling of time series. We argue that multistep prediction is more \nappropriate to capture the dynamics of the underlying dynamical system, \nbecause it constrains the iterated model. We show how this method can be \nimplemented by a recurrent ANN trained with trajectory learning. We also \nshow how to select the trajectory length to train the iterated predictor for the \ncase of chaotic time series. Experimental results corroborate the proposed \nmethod. \n\n1.0 Introduction \n\nThe search for a model of an experimental time series has been an important problem \nin science. For a long time the linear model was almost exclusively used to describe \nthe system that produced the time series [1], but recently nonlinear models have also \nbeen proposed to replace the linear ones [2]. Lapedes and Farber [3] showed how \nartificial neural networks (ANNs) can be used to identify the dynamics of the \nunknown system that produced the time series. They simply used a multilayer \nperceptron to predict the next point in state space, and trained this topology with \nbackpropagation. This paper explores more complex neural topologies and training \nmethods with the goal of improving the quality of the identification of the dynamical \nsystem, and of better understanding the issues of dynamic modelling with neural \nnetworks, which are still far from fully understood. 
\nAccording to Takens' embedding theorem, a map $F: \\mathbb{R}^{2m+1} \\to \\mathbb{R}^{2m+1}$ exists that \ntransforms the current reconstructed state y(t) to the next state y(t+1), i.e. \n\n$$y(t+1) = F(y(t))$$ \n\n(1) \n\nwhere m is the estimated dimension of the unknown dynamical system $\\Phi$. Note that \nthe map contains several trivial (nonlinear) filters and a predictor. The predictive \nmapping $r: \\mathbb{R}^{2m+1} \\to \\mathbb{R}$ can be expressed as \n\n$$x(t+1) = r(\\mathbf{x}(t))$$ \n\n(2) \n\nwhere $\\mathbf{x}(t) = [x(t-2m) \\; \\ldots \\; x(t-1) \\; x(t)]^T$. This is actually the estimated nonlinear \nautoregressive model of the input time series. The existence of this predictive model \nlays a theoretical basis for dynamic modelling in the sense that we can build from a \nvector time series a model to approximate the mapping r. If the conditions of \nTakens' embedding theorem are met, this mapping captures some of the properties of \nthe unknown dynamical system $\\Phi$ that produced the time series [7]. \nAt present there is still no theory that can guarantee that the predictor has \nsuccessfully identified the original model $\\Phi$. The simple point by point comparison \nbetween the original and predicted time series used as goodness of fit for non-chaotic \ntime series breaks down for chaotic ones [5]. Two chaotic time series can be very \ndifferent pointwise but be produced by the same dynamical system (two trajectories \naround the same attractor). The dynamic invariants (correlation dimension, Lyapunov \nexponents) measure global properties of the attractor, so they should be used as the \ncriterion for deciding whether dynamic modelling has succeeded. Hence, a pragmatic approach \nin dynamic modelling is to seed the predictor with a point in state space, feed the \noutput to its input as an autonomous system, and create a new time series. 
If the \ndynamic invariants computed from this time series match the ones from the original \ntime series, then we say that dynamic modelling was successful [5]. The long term \nbehavior of the autonomous predictive model seems to be the key factor in determining whether \nthe predictor identified the original model. This is the distinguishing factor between \nprediction of chaotic time series and dynamic modelling. The former only addresses \nthe instantaneous prediction error, while the latter is interested in long term behavior. \nIn order to use this theory, one needs to address the choices of predictor \nimplementation. Due to the universal mapping characteristics of multilayer \nperceptrons (MLPs) and the existence of well established learning rules to adapt the \nMLP coefficients, this type of network appears as an appropriate choice [3]. However, \none must realize that the MLP is a static mapper, and in dynamic modelling we are \ndealing with time varying signals, where the past of the signal contains vital \ninformation to describe the mapping. The design considerations to select the neural \nnetwork topology are presented elsewhere [4]. We just would like to say that the MLP \nhas to be enhanced with short term memory mechanisms, and that the estimation of \nthe correlation dimension should be used to set the size of the memory layer. The \nmain goal of the paper is to establish the methodology to efficiently train neural \nnetworks for dynamic modelling. \n\n2. Iterated versus Single Step Prediction. \n\nFrom eqn. 2 it seems that the resulting dynamic model F can be obtained through \nsingle step prediction. This has been the conventional way to handle dynamic \nmodelling [2],[3]. 
The predictor is adapted by minimizing the error \n\n$$E = \\sum_{i=2m+1}^{L} \\mathrm{dist}\\left( x(i+1) - \\tilde{F}(\\mathbf{x}(i)) \\right)$$ \n\n(3) \n\nwhere L is the length of the time series, x(i) is the ith data sample, $\\tilde{F}$ \nis the map developed by the predictor and dist() is a distance measure (normally the L2 norm). \nNotice that the training to obtain the mapping is done independently from sample to \nsample, i.e. \n\n$$x(i+1) = \\tilde{F}(\\mathbf{x}(i)) + \\delta_1$$ \n\n$$x(i+j) = \\tilde{F}(\\mathbf{x}(i+j-1)) + \\delta_j$$ \n\nwhere $\\delta_j$ are the instantaneous prediction errors, which are minimized during \ntraining. Notice that the predictor is being optimized under the assumption that the \nprevious point in state space is known without error. \nThe problem with this approach can be observed when we iterate the predictor as an \nautonomous system to generate the time series samples. If one wants to produce two \nsamples in the future from sample i, the predicted sample i+1 needs to be utilized to \ngenerate sample i+2. The predictor was not optimized to do this job, because during \ntraining the true i+1 sample was assumed known. As long as $\\delta_1$ is nonzero (as will \nalways be the case for nontrivial problems), errors will accumulate rapidly. Single step \nprediction is more associated with extrapolation than with dynamic modelling, which \nrequires the identification of the unique mapping that produces the time series. \nWhen the autonomous system generates samples, past values are used as inputs to \ngenerate the following samples, which means that the training should also constrain \nthe iterates of the predictive mapping. Putting it in a simple way, we should train the \npredictor in the same way we are going to use it for testing (i.e. as an autonomous \nsystem). \nWe propose multistep prediction (or trajectory learning) as the way to constrain the \niterates of the mapping developed by the predictor. 
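
The gap between single-step accuracy and autonomous behavior can be seen on a toy example. The sketch below is an illustration under stated assumptions, not an experiment from this paper: the "true" system is the logistic map, the "trained" predictor is a slightly mis-estimated copy of it (coefficient 3.99 instead of 4), the window holds a single past sample, and all function names are hypothetical.

```python
def iterate_autonomous(predict, seed, steps):
    """Run a predictor as an autonomous system: each output is fed back
    as the newest input sample."""
    window = list(seed)
    out = []
    for _ in range(steps):
        nxt = predict(window)
        out.append(nxt)
        window = window[1:] + [nxt]
    return out

def one_step(predict, series, order):
    """Predict every sample from the TRUE previous `order` samples."""
    return [predict(series[i - order:i]) for i in range(order, len(series))]

# Assumed toy system and imperfect model (not from the paper).
true_map = lambda w: 4.0 * w[-1] * (1.0 - w[-1])
model = lambda w: 3.99 * w[-1] * (1.0 - w[-1])

series = [0.3] + iterate_autonomous(true_map, [0.3], 30)

# Errors when the true previous sample is always available.
one_err = [abs(p - x) for p, x in zip(one_step(model, series, 1), series[1:])]

# Errors when the model runs on its own predictions.
free = iterate_autonomous(model, series[:1], 30)
free_err = [abs(p - x) for p, x in zip(free, series[1:])]
```

The first autonomous prediction coincides with the first single-step prediction, but from the second step onward the model is driven by its own slightly wrong outputs, so `free_err` leaves the small range that bounds `one_err`.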
Let us define \n\n$$E = \\sum_{i=2m+1}^{k} \\mathrm{dist}\\left( x(i+1) - \\hat{x}(i+1) \\right)$$ \n\n(4) \n\nwhere k is the number of prediction steps (length of the trajectory) and $\\hat{x}(i+1)$ is the \nestimate given by the predictive map \n\n$$\\hat{x}(i+1) = \\tilde{F}(\\hat{x}(i-2m), \\ldots, \\hat{x}(i))$$ \n\n(5) \n\nwith \n\n$$\\hat{x}(i) = \\begin{cases} x(i) & 0 \\le i \\le 2m \\\\ \\tilde{F}(\\hat{x}(i-2m-1), \\ldots, \\hat{x}(i-1)) & i > 2m \\end{cases}$$ \n\nEquation (5) states that $\\hat{x}(i)$ is the (i-2m)th iterate of the predictive part of the map \n(for i>2m), i.e. \n\n$$\\hat{x}(i) = \\tilde{F}(\\tilde{F}(\\ldots \\tilde{F}(\\mathbf{x}(2m)))) = \\tilde{F}^{\\,i-2m}(\\mathbf{x}(2m))$$ \n\n(6) \n\nHence, by minimizing the criterion expressed by equation (4), an optimal multistep \npredictor is obtained. The number of constraints that are imposed during learning is \nassociated with k, the number of prediction steps, which corresponds to the number \nof iterations of the map. The more iterations, the less likely it is that a sub-optimal solution is \nfound, but note that the training time increases proportionally. In a chaotic \ntime series there is a more important consideration that must be brought into the \npicture, the divergence of nearby trajectories, as we are going to see in a following \nsection. \n\n3. Multistep prediction with neural networks \n\nFigure 1 shows the topology proposed in [4] to identify the nonlinear mapping. \nNotice that the proposed topology is a recurrent neural network with a global \nfeedback loop. This topology was selected to allow the training of the predictor in the \nsame way as it will be used in testing, i.e. using the previous network outputs to \npredict the next point. This recurrent architecture should be trained with a mechanism \nthat will constrain the iterates of the map as was discussed above. Single step \nprediction does not fit this requirement. \nWith multistep prediction, the model system can be trained in the same way as it is \nused in testing. 
We seed the dynamic net with a set of input samples, disconnect the \ninput and feed back the predicted sample to the input for k steps. The mean square \nerror between the predicted and true sample at each step is used as the cost function \n(equation (4)). If the network topology were feedforward, batch learning with static \nbackpropagation could be used to train the net. However, as \na recurrent topology is utilized, a learning paradigm such as backpropagation through \ntime (BPTT) or real time recurrent learning (RTRL) must be utilized [6]. The use of \nthese training methods should not come as a surprise since we are in fact fitting a \ntrajectory over time, so the gradients are time varying. This learning method is \nsometimes called \"trajectory learning\" in the recurrent learning literature [6]. A \ncriterion to select the length of the trajectory k will be presented below. \nThe procedure described above must be repeated for several different segments of the \ntime series. For each new training segment, 2m+1 samples of the original time series \nare used to seed the predictor. To ease the training we suggest that successive training \nsequences of length k overlap by q samples (q<k). For chaotic time series we also \nsuggest that the error be weighted according to the largest Lyapunov exponent. Hence \nthe cost function becomes \n\n$$E = \\sum_{j=0}^{r} \\sum_{i=2m+1}^{k} h(i) \\cdot \\mathrm{dist}\\left( x(i+jq+1) - \\hat{x}(i+jq+1) \\right)$$ \n\n(7) \n\nwhere r is the number of training sequences, and \n\n$$h(i) = \\left( e^{\\lambda_{max} \\Delta t} \\right)^{-(i-2m-1)}$$ \n\n(8) \n\nIn this equation $\\lambda_{max}$ is the largest Lyapunov exponent and $\\Delta t$ the sampling interval. \nWith this weighting the errors for later iterations are given less credit, as they should be, \nsince due to the divergence of trajectories a small error is magnified proportionally \nto the largest Lyapunov exponent [7]. \n\n4. 
Finding the length of the trajectory \n\nFrom the point of view of dynamic modelling, each training sequence should \npreferably contain enough information to model the attractor. This means that each \nsequence should be no shorter than the orbital length around the attractor. We \nproposed to estimate the orbital length as the reciprocal of the median frequency of \nthe spectrum of the time series [8]. Basically this quantity is the average time \nrequired for a point to return to the same neighborhood in the attractor. \nThe length of the trajectory is also equivalent to the number of constraints we impose \non the iterative map describing the dynamical model. However, in a chaotic time \nseries there is another fundamental limitation imposed on the trajectory length: the \nnatural divergence of trajectories, which is controlled by $\\lambda_{max}$, the largest Lyapunov \nexponent. If the trajectory length is too long, then instabilities in the training can be \nexpected. A full discussion of this topic is beyond the scope of this paper, and is \npresented elsewhere [8]. We just want to say that when $\\lambda_{max}$ is positive there is an \nuncertainty region around each predicted point that is a function of the number of \nprediction steps (due to cumulative error). If the trajectory length is too long, the \nuncertainty regions from two neighboring trajectories will overlap, creating \nconflicting requirements for training (the model is required to develop a map that \nfollows both segments A and B; see Figure 2). \nIt turns out that one can approximately find the number of iterations $i_s$ that will \nguarantee no overlap of uncertainty regions [8]. The length of the principal axis of \nthe uncertainty region around a signal trajectory at iteration i can be estimated as \n\n$$\\varepsilon_i = \\varepsilon_0 e^{\\lambda_{max} i \\Delta t}$$ \n\n(9) \n\nwhere $\\varepsilon_0$ is the initial separation. 
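
Equation (9) is easy to evaluate numerically. The sketch below uses illustrative values for the initial separation, the largest Lyapunov exponent and the sampling interval; these numbers are assumptions chosen for the example, not values from the experiments, and the function name is hypothetical.

```python
import math

def uncertainty_axis(eps0, lam_max, dt, i):
    """Equation (9): length of the principal axis of the uncertainty region
    after i iterations, starting from an initial separation eps0."""
    return eps0 * math.exp(lam_max * i * dt)

# Assumed values: eps0 = 0.01, lam_max = 0.05 nats/s, dt = 1 s.
# Evaluate the axis length every 10 iterations, from 0 to 40.
growth = [uncertainty_axis(0.01, 0.05, 1.0, i) for i in range(0, 41, 10)]
```

With a positive exponent the axis grows by the same factor every ten iterations, so the uncertainty increases exponentially with the number of prediction steps.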
Now assuming that the two principal axes of nearby \ntrajectories are collinear, we should choose the number of iterations $i_s$ such that the \ndistance $d_i$ between trajectories is larger than the uncertainty region, i.e. $d_i \\geq 2\\epsilon_i$. \nThe estimate of $i_s$ should be averaged over a number of neighboring training \nsequences (~50, depending on the signal dynamics). \nHence, to apply this method three quantities must be estimated: the largest Lyapunov \nexponent, using one of the accepted algorithms; the initial separation, which can be \nestimated from the one-step predictor; and $i_s$, by averaging local divergence. The \ncomputation time required to estimate these quantities is usually much less than that of \nsetting the length of the trajectory by trial and error until a reasonable learning curve \nis achieved. \nWe also developed a method to train predictors for chaotic signals with large $\\lambda_{max}$, \nbut it will not be covered in this paper [8]. \n5. Results \nWe used this methodology to model the Mackey-Glass system (d=30, sampled at 1/6 \nHz). A signal of 500 samples was obtained by 4th order Runge-Kutta integration and \nnormalized to the range [-1, 1]. The largest Lyapunov exponent for this signal is 0.0071 \nnats/sec. We selected a time delay neural network (TDNN) with topology 8-14-1. The \noutput unit is linear, and the hidden layer has sigmoid nonlinearities. The number of \ntaps in the delay line is 8. \nWe trained a one-step predictor and the multistep predictor with the methodology \ndeveloped in this paper to compare results. The single step predictor was trained with \nstatic backpropagation with no momentum and a step size of 0.001. Training was \nstopped after 500 iterations. The final MSE was 0.000288. After training, the \npredictor was seeded with the first 8 points of the time series and iterated 3,000 \ntimes. Figure 3a shows the corresponding output. 
Notice that the waveform produced \nby the model is much more regular than the Mackey-Glass signal, showing that some \nfine detail of the attractor has not been captured. \nNext we trained the same TDNN with a global feedback loop (TDNNGF). The \nestimation of $i_s$ over the neighboring orbits gave a value of 14, which is \ntaken as the length of the trajectory. We displaced each training sequence by 3 \nsamples (q=3 in eqn 7). BPTT was used to train the TDNNGF for 500 iterations over \nthe same signal. The final MSE was 0.000648, higher than for the TDNN case, so one \nmight think that the resulting predictor is worse. The TDNNGF predictor was \ninitialized with the same 8 samples of the time series and iterated 3,000 times. \nFigure 3b shows the resulting waveform. It \"looks\" much closer to the original \nMackey-Glass time series. We computed the average prediction error as a function of \niteration for both predictors and also the theoretical rate of divergence of trajectories \nassuming an initial error $\\epsilon_0$ (Casdagli conjecture, which is the square of eqn 9) [7]. \nAs can be seen in Figure 4 the TDNNGF is much closer to the theoretical limit, which \nmeans a much better model. We also computed the correlation dimension and the \nLyapunov exponent estimated from the generated time series, and the figures \nobtained from the TDNNGF are closer to those of the original time series. \nFigure 5 shows the instability present in the training when the trajectory length is \nabove the estimated value of 14. For this case the trajectory length is 20. As can be \nseen the MSE decreases but then fluctuates, showing instability in the training. \n6. Conclusions \n\nThis paper addresses dynamic modelling with artificial neural networks. We showed \nthat the network topology should be recurrent such that the iterative map is \nconstrained during learning. 
This is a necessity since dynamic modelling seeks to \ncapture the long term behavior of the dynamical system. These models can also be \nused as sample-by-sample predictors. Since the network topology is recurrent, \nbackpropagation through time or real time recurrent learning should be used in \ntraining. In this paper we showed how to select the length of the trajectory to avoid \ninstability during training. \nA lot more work needs to be done to reliably capture dynamical properties of time \nseries and encapsulate them in artificial models. But we believe that the careful \nanalysis of the dynamic characteristics and the study of its impact on the predictive \nmodel performance is much more promising than guess work. According to this (and \nother) studies, modelling of chaotic time series of low $\\lambda_{max}$ seems a reality. We have \nextended some of this work to time series with larger $\\lambda_{max}$, and successfully \ncaptured the dynamics of the Lorenz system [8]. But there, the parameters for \nlearning have to be much more carefully selected, and some of the choices are still \narbitrary. The main issue is that the trajectories diverge so rapidly that predictors have \na hard time capturing information regarding the global system dynamics. It would be \ninteresting to study the limit of predictability of this type of approach for high \ndimensional and high $\\lambda_{max}$ chaos. \n\nPredictor    Corr. Dim.      Lyapunov \nMG30         2.70+/-0.05     0.0073+/-0.0001 \nTDNNGF       2.65+/-0.03     0.0074+/-0.0001 \nTDNN         1.60+/-0.10     0.0063+/-0.0001 \n\nFigure 1. Proposed recurrent architecture (TDNNGF) \n\nFigure 2. State space representation in training a model \n\n7. Acknowledgments \n\nThis work was partially supported by NSF grant #ECS-9208789, and ONR #1494-94-1-0858. \n\nFigure 3a. 
Generated sequence with the TDNN \n\nFigure 3b. Generated sequence with the TDNNGF \n\nFigure 4. Comparison of predictors (curves: TDNN error, TDNNGF error, Casdagli conjecture) \n\nFigure 5. Instability in training \n\n8. References \n[1] Box, G. E., and G. M. Jenkins, Time Series Analysis, Forecasting and Control, \nHolden Day, San Francisco, 1970. \n[2] Weigend, A. S., B. A. Huberman, and D. E. Rumelhart, \"Predicting the future: a \nconnectionist approach,\" International Journal of Neural Systems, vol. 1, pp. 193-209, 1990. \n[3] Lapedes, R., and R. Farber, \"Nonlinear signal processing using neural networks: \nprediction and system modelling,\" Technical Report LA-UR87-2662, Los Alamos \nNational Laboratory, Los Alamos, New Mexico, 1987. \n[4] Kuo, J. M., and J. C. Principe, \"A systematic approach to chaotic time series modeling \nwith neural networks,\" in IEEE Workshop on Neural Nets for Signal Processing, \nErmioni, Greece, 1994. \n[5] Principe, J. C., A. Rathie, and J. M. Kuo, \"Prediction of chaotic time series with \nneural networks and the issue of dynamic modeling,\" International Journal of \nBifurcation and Chaos, vol. 2, no. 4, pp. 989-996, 1992. \n[6] Hertz, J., A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural \nComputation, Addison-Wesley, Redwood City, CA, 1991. \n[7] Casdagli, M., \"Nonlinear prediction of chaotic time series,\" Physica D 35, \npp. 335-356, 1989. \n[8] Kuo, J. M., \"Nonlinear Dynamic Modelling with Artificial Neural Networks,\" \nPh.D. dissertation, University of Florida, 1993. 
\n\n\f", "award": [], "sourceid": 934, "authors": [{"given_name": "Jose", "family_name": "Principe", "institution": null}, {"given_name": "Jyh-Ming", "family_name": "Kuo", "institution": null}]}