{"title": "Sequential Adaptation of Radial Basis Function Neural Networks and its Application to Time-series Prediction", "book": "Advances in Neural Information Processing Systems", "page_first": 721, "page_last": 727, "abstract": "", "full_text": "Sequential Adaptation of Radial Basis Function \n\nNeural Networks and its Application to \n\nTime-series Prediction \n\nv. Kadirkamanathan \nEngineering Department \nCambridge University \nCambridge CB2 IPZ, UK \n\nM. Niranjan \n\nF. Fallside \n\nAbstract \n\nWe develop a sequential adaptation algorithm for radial basis function \n(RBF) neural networks of Gaussian nodes, based on the method of succes(cid:173)\nsive F-Projections. This method makes use of each observation efficiently \nin that the network mapping function so obtained is consistent with that \ninformation and is also optimal in the least L 2-norm sense. The RBF \nnetwork with the F-Projections adaptation algorithm was used for pre(cid:173)\ndicting a chaotic time-series. We compare its performance to an adapta(cid:173)\ntion scheme based on the method of stochastic approximation, and show \nthat the F-Projections algorithm converges to the underlying model much \nfaster. \n\n1 \n\nINTRODUCTION \n\nSequential adaptation is important for signal processing applications such as time(cid:173)\nseries prediction and adaptive control in nonstationary environments. With increas(cid:173)\ning computational power, complex algorithms that can offer better performance \ncan be used for these tasks. A sequential adaptation scheme, called the method \nof successive F-Projections [Kadirkamanathan & Fallside, 1990], makes use of each \nobservation efficiently in that, the function so obtained is consistent with that ob(cid:173)\nservation and is the optimal posterior in the least L 2-norm sense. \n\nIn this paper we present an adaptation algorithm based on this method for the \nradial basis function (RBF) network of Gaussian nodes [Broomhead & Lowe, 1988]. 
It is a memoryless adaptation scheme, since neither information about the past samples nor the previous adaptation directions are retained. Also, the observations are presented only once. The RBF network employing this adaptation scheme was used for predicting a chaotic time-series. The performance of the algorithm is compared to a memoryless sequential adaptation scheme based on the method of stochastic approximation. \n\n
2 METHOD OF SUCCESSIVE F-PROJECTIONS \n\n
The principle of F-Projection [Kadirkamanathan et al., 1990] is a general method of choosing a posterior estimate of an unknown function f*, when there exists a prior estimate and new information about f* in the form of constraints. The principle states that, of all the functions that satisfy the constraints, one should choose the posterior f_n that has the least L2-norm distance ||f_n - f_{n-1}|| from the prior estimate f_{n-1} of f*, viz., \n\n
f_n = arg min_{f in H_f} ||f - f_{n-1}||   (1) \n\n
where H_f is the set of functions that satisfy the new constraints, and \n\n
||f - f_{n-1}||^2 = Int_{x in C} ||f(x) - f_{n-1}(x)||^2 |dx| = D(f, f_{n-1})   (2) \n\n
where x is the input vector and |dx| is the infinitesimal volume in the input space domain C. \n\n
In functional analysis theory, the metric D(., .) describes the L2-normed linear space of square integrable functions. Since an inner product can be defined in this space, it is also the Hilbert space of square integrable functions [Linz, 1984]. Constraints of the form y_n = f(x_n) are linear in this space, and the functions that satisfy the constraint lie in a hyperplane subspace H_f. The posterior f_n obtained from the principle can be seen to be a projection of f_{n-1} onto the subspace H_f containing f*, the underlying function that generates the observation set, and hence is optimal (i.e., the best possible choice); see Figure 1.
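For intuition, the projection of eqn (1) has a simple finite-dimensional analogue (this sketch is ours, not the paper's function-space algorithm): for a model linear in its parameters, f(x) = theta . phi(x), the constraint y_n = f(x_n) defines a hyperplane in parameter space, and the posterior with the least L2 change from the prior parameters is the orthogonal projection onto that hyperplane.

```python
import numpy as np

def f_projection_step(theta, phi_x, y):
    # Project theta onto the hyperplane {t : t . phi_x = y}: the smallest
    # L2 change in parameters that makes the model fit the new observation.
    residual = y - theta @ phi_x              # prediction error on the new sample
    return theta + phi_x * residual / (phi_x @ phi_x)

# One observation (x_n, y_n), with phi_x holding the basis responses phi(x_n).
theta = np.zeros(3)
phi_x = np.array([1.0, 0.5, 0.25])
theta = f_projection_step(theta, phi_x, y=2.0)
# The constraint now holds exactly: theta @ phi_x equals 2.0.
```

In function space the same idea applies with the inner product of eqn (2); for networks that are nonlinear in their parameters, the norm must instead be minimized numerically, as in Section 3.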
Figure 1: Principle of F-Projection. \n\n
Neural networks can be viewed as constructing a function in their input space. The structure of the neural network and its finite number of parameters restrict the class of functions that can be constructed to a subset of the functions in the Hilbert space. Neural networks therefore approximate the underlying function that describes the set of observations. Hence, the principle of F-Projection now yields a posterior f(theta_n) in H_f that is an approximation of f_n (see Figure 1). \n\n
The method of successive F-Projections is the application of the principle of F-Projection to a sequence of observations or information [Kadirkamanathan et al., 1990]. For neural networks, the method gives an algorithm with the following two steps: \n\n
* Initialise the parameters with random values or values based on a priori knowledge. \n
* For each pattern (x_i, y_i), i = 1 ... n, determine the posterior parameter estimate \n\n
theta_i = arg min_theta Int_{x in C} ||f(x, theta) - f(x, theta_{i-1})||^2 |dx|   such that f(x_i, theta) = y_i \n\n
where (x_i, y_i), for i = 1 ... n, constitutes the observation set, theta is the neural network parameter set and f(x, theta) is the function constructed by the neural network. \n\n
3 F-PROJECTIONS FOR AN RBF NETWORK \n\n
The class of radial basis function (RBF) neural networks was first introduced by Broomhead & Lowe [1988]. One such network is the RBF network of Gaussian nodes. The function constructed at the output node of the RBF network of Gaussian nodes, f(x), is derived from a set of basis functions of the form \n\n
phi_i(x) = exp( -(x - mu_i)^T C_i^{-1} (x - mu_i) ),   i = 1 ... m   (3) \n\n
Each basis function phi_i(x) is described at the output of a hidden node and is centered on mu_i in the input space. phi_i(x) is a function of the radial weighted distance between the input vector x and the node centre mu_i.
In general, C_i is diagonal with elements [sigma_{i1}, sigma_{i2}, ..., sigma_{iN}]. f(x) is a linear combination of the m basis functions, \n\n
f(x) = Sum_{i=1}^{m} alpha_i phi_i(x)   (4) \n\n
and theta = [..., alpha_i, mu_i, sigma_i, ...] is then the parameter vector for the RBF network. \n\n
There are two reasons for developing the sequential adaptation algorithm for the RBF network of Gaussian nodes. Firstly, the method of successive F-Projections is based on minimizing the hypervolume change in the hypersurface when learning new patterns. The RBF network of Gaussian nodes constructs a localized hypersurface, and therefore the changes will also be local. This results in the adaptation of only a few nodes, and therefore the algorithm is quite stable. Secondly, the L2-norm measure of the hypervolume change can be solved analytically for the RBF network of Gaussian nodes. \n\n
The method of successive F-Projections is developed under deterministic, noise-free conditions. When the observations are noisy, the constraint that f(x_i, theta) = y_i must be relaxed to \n\n
||f(x_i, theta) - y_i||^2 <= epsilon   (5) \n\n
Hence, the sequential adaptation scheme is modified to \n\n
theta_i = arg min_theta J(theta)   (6) \n\n
J(theta) = Int_{x in C} ||f(x, theta) - f(x, theta_{i-1})||^2 |dx| + c_i ||f(x_i, theta) - y_i||^2   (7) \n\n
where c_i is the penalty parameter that trades off the importance of learning the new pattern against losing the information of the past patterns. This minimization can be performed by a gradient descent procedure, halted when the change Delta J falls below a threshold. The complete adaptation algorithm is as follows: \n\n
* Choose theta_0 randomly. \n
* For each pattern (i = 1 ... P): \n
    theta^(0) = theta_{i-1} \n
    Repeat (kth iteration): \n
        theta^(k) = theta^(k-1) - eta Grad J |_{theta = theta^(k-1)} \n
    until Delta J^(k) < epsilon_th \n\n
where Grad J is the gradient vector of J(theta) with respect to theta, Delta J^(k) = J(theta^(k)) - J(theta^(k-1)) is the change in the cost function, and epsilon_th is a threshold. Note that alpha_i, mu_i, sigma_i for i = 1 ... m are all adapted. The details of the algorithm can be found in the report by Kadirkamanathan, Niranjan & Fallside [1991]. \n\n
4 TIME-SERIES PREDICTION \n\n
An area of application for sequential adaptation of neural networks is the prediction of time-series in nonstationary environments, where the underlying model generating the time-series is time-varying. The adaptation algorithm must also result in the convergence of the neural network to the underlying model under stationary conditions. The usual approach to predicting a time-series is to train the neural network on a set of training data obtained from the series [Lapedes & Farber, 1987; Farmer & Sidorowich, 1988; Niranjan, 1991]. Our sequential adaptation approach differs from this in that adaptation takes place for each sample. \n\n
In this work, we examine the performance of the F-Projections adaptation algorithm for the RBF network of Gaussian nodes in predicting a deterministic chaotic series. The chaotic series under investigation is the logistic map [Lapedes & Farber, 1987], whose dynamics is governed by the equation \n\n
x_n = 4 x_{n-1} (1 - x_{n-1})   (8) \n\n
This is a first-order nonlinear process in which only the previous sample determines the value of the present sample. Since neural networks offer the capability of constructing any arbitrary mapping to a sufficient accuracy, a network with a number of input nodes equal to the process order will find the underlying model. Hence, we use the RBF network of Gaussian nodes with a single input node. We are thus able to compare the map the RBF network constructed with the actual map given by eqn (8).
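The training series is generated directly from eqn (8); a minimal sketch (the seed value 0.3 is an arbitrary choice of ours):

```python
def logistic_map_series(x0, n):
    # Generate n samples of the logistic map x_n = 4 x_{n-1} (1 - x_{n-1}),
    # the first-order chaotic process used as the prediction target.
    series = [x0]
    for _ in range(n - 1):
        x = series[-1]
        series.append(4.0 * x * (1.0 - x))
    return series

# Each (past sample, present sample) pair is one training pattern for the
# single-input network; the series remains in the interval [0, 1].
series = logistic_map_series(0.3, 100)
patterns = list(zip(series[:-1], series[1:]))
```

Presenting each pattern exactly once, in order, reproduces the sequential setting of the experiments below.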
First, an RBF network with 2 input nodes and 8 Gaussian nodes was used to predict the logistic map chaotic series of 100 samples. Each sample was presented only once for training. Training was temporarily halted after 0, 20, 40, 60, 80 and 100 samples, and in each case the prediction error residual was found. This is shown in Figure 2, where the increasing darkness of the curves stands for the increasing number of patterns used for training. It is evident from this figure that the prediction model improves very quickly from the initial state and then keeps slowly improving as the number of training patterns is increased. \n\n
Figure 2: Evolution of prediction error residuals (prediction error against time, in samples). \n\n
In order to compare the performance of the sequential adaptation algorithm, a memoryless adaptation scheme was also used to predict the chaotic series. The scheme is the LMS or stochastic approximation (sequential back-propagation [White, 1987]), where one iteration takes place for each sample. The iteration is given by \n\n
theta_i = theta_{i-1} - eta Grad J |_{theta = theta_{i-1}}   (9) \n\n
where \n\n
J(theta) = ||f(x_i, theta) - y_i||^2   (10) \n\n
is the squared prediction error for the present sample. \n\n
Next, the RBF network with a single input node and 8 Gaussian units was used to predict the chaotic series. The F-Projections and the stochastic approximation adaptation algorithms were used to train this network on 60 samples. The maps constructed by a network trained by each of these schemes for 0, 20 and 60 samples, together with the samples used for training, are shown in Figure 3. Again, each sample was presented only once for training. \n\n
Figure 3: Map f(x) constructed by the RBF network: (a) F-Projections, (b) stochastic approximation. (Each panel plots the current sample against the past sample, with curves for 0, 20 and 60 training patterns.) \n\n
The stochastic approximation algorithm fails to construct a close-fit mapping of the underlying function after training on 60 samples. The F-Projections algorithm, however, provides a close-fit map after training on only 20 samples. It also shows stability by maintaining the map up to training on 60 samples. The speed of convergence achieved, in terms of the number of samples used for training, is much higher for the F-Projections. \n\n
Comparing the cost functions minimized by the F-Projections and the stochastic approximation algorithms, given by eqns (7) and (10), it is clear that the difference is only the additional integral term in eqn (7). This term is not a function of the present observation, but of the a priori parameter values. The addition of such a term incorporates the a priori knowledge of the network, together with the present observation, in determining the posterior parameter values. The faster convergence of the F-Projections indicates the importance of the extended cost function. Even though the cost term for the F-Projections was developed for a recursive estimation algorithm, it can be applied to a block estimation method as well. The cost function given by eqn (7) can be seen to be an extension of the nonlinear least squared error to incorporate a priori knowledge.
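The extended cost of eqn (7) is straightforward to write down. The sketch below is ours: it approximates the integral term by an average over a grid of input points (the actual algorithm evaluates it analytically for Gaussian nodes), and the penalty value c is illustrative.

```python
import numpy as np

def rbf(x, theta):
    # 1-D RBF network of Gaussian nodes:
    # f(x) = sum_i alpha_i * exp(-(x - mu_i)^2 / sigma_i^2)
    alpha, mu, sigma = theta
    return float(np.sum(alpha * np.exp(-((x - mu) ** 2) / sigma ** 2)))

def extended_cost(theta, theta_prev, x_i, y_i, c=10.0):
    # Eqn (7): change of the constructed hypersurface from the prior map
    # (grid approximation of the integral term) plus the penalised squared
    # prediction error on the new sample. Dropping the first term leaves the
    # pure stochastic-approximation cost of eqn (10).
    grid = np.linspace(0.0, 1.0, 50)
    change = np.mean([(rbf(x, theta) - rbf(x, theta_prev)) ** 2 for x in grid])
    return change + c * (rbf(x_i, theta) - y_i) ** 2
```

Gradient descent on this cost, stopped when the decrease falls below a threshold, gives one F-Projection adaptation step as in eqn (6): a large c lets the new pattern dominate, while a small c preserves the prior map.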
5 CONCLUSIONS \n\n
The principle of F-Projection proposed by Kadirkamanathan et al. [1990] provides an optimal posterior estimate of a function from the prior estimate and new information. Based on it, they propose a sequential adaptation scheme called the method of successive F-Projections. We have developed a sequential adaptation algorithm for the RBF network of Gaussian nodes based on this method. \n\n
Applying the RBF network with the F-Projections algorithm to the prediction of a chaotic series, we have found that the RBF network was able to map the underlying function. The prediction error residuals at the end of training with different numbers of samples indicate that, after a substantial reduction in the error in the initial stages, the error steadily decreased as more samples were presented for training. By comparison with the performance of the stochastic approximation algorithm, we show the superior convergence achieved by the F-Projections. \n\n
Comparing the cost functions minimized by the F-Projections and the stochastic approximation algorithms reveals that the F-Projections uses both the prediction error for the current sample and the a priori values of the parameters, whereas the stochastic approximation algorithm uses only the prediction error. We also point out that such a cost term, which includes the a priori knowledge of the network, can be used for training an already-trained network upon receipt of further information. \n\n
References \n\n
[1] Broomhead, D.S. & Lowe, D. (1988), \"Multi-variable Interpolation and Adaptive Networks\", RSRE Memo No. 4148, Royal Signals and Radar Establishment, Malvern. \n
[2] Farmer, J.D. & Sidorowich, J.J. (1988), \"Exploiting chaos to predict the future and reduce noise\", Technical Report, Los Alamos National Laboratory.
[3] Kadirkamanathan, V. & Fallside, F. (1990), \"F-Projections: A nonlinear recursive estimation algorithm for neural networks\", Technical Report CUED/F-INFENG/TR.53, Cambridge University Engineering Department. \n
[4] Kadirkamanathan, V., Niranjan, M. & Fallside, F. (1991), \"Adaptive RBF network for time-series prediction\", Technical Report CUED/F-INFENG/TR.56, Cambridge University Engineering Department. \n
[5] Lapedes, A.S. & Farber, R. (1987), \"Non-linear signal processing using neural networks: Prediction and system modelling\", Technical Report, Los Alamos National Laboratory, Los Alamos, New Mexico 87545. \n
[6] Linz, P. (1984), \"Theoretical Numerical Analysis\", John Wiley, New York. \n
[7] Niranjan, M. (1991), \"Implementing threshold autoregressive models for time series prediction on a multilayer perceptron\", Technical Report CUED/F-INFENG/TR.50, Cambridge University Engineering Department. \n
[8] White, H. (1987), \"Some asymptotic results for learning in single hidden layer feedforward network models\", Technical Report, Department of Economics, University of California, San Diego. \n", "award": [], "sourceid": 373, "authors": [{"given_name": "V.", "family_name": "Kadirkamanathan", "institution": null}, {"given_name": "M.", "family_name": "Niranjan", "institution": null}, {"given_name": "F.", "family_name": "Fallside", "institution": null}]}