{"title": "A Theory for Neural Networks with Time Delays", "book": "Advances in Neural Information Processing Systems", "page_first": 162, "page_last": 168, "abstract": null, "full_text": "A Theory for Neural Networks with Time Delays \n\nBert de Vries \nDepartment of Electrical Engineering \nUniversity of Florida, CSE 447 \nGainesville, FL 32611 \n\nJose C. Principe \nDepartment of Electrical Engineering \nUniversity of Florida, CSE 444 \nGainesville, FL 32611 \n\nAbstract \n\nWe present a new neural network model for processing temporal patterns. This model, the gamma neural model, is as general as a convolution delay model with arbitrary weight kernels $w(t)$. We show that the gamma model can be formulated as a (partially prewired) additive model. A temporal Hebbian learning rule is derived, and we establish links to related existing models for temporal processing. \n\n1 INTRODUCTION \n\nIn this paper, we are concerned with developing neural nets with short-term memory for processing temporal patterns. Two basic ways of incorporating short-term memory into the neural system equations have been reported in the literature. The first approach uses reverberating (self-recurrent) units of type $\\frac{dx}{dt} = -a\\sigma(x) + e$, which hold a trace of the past neural net states $x(t)$ or the input $e(t)$. Elman (1988) and Jordan (1986) have used this approach successfully. Its disadvantage is the lack of weighting flexibility in the temporal domain: because the system equations are described by first-order dynamics, they implement a recency gradient (exponential for linear units). \n\nThe second approach explicitly includes delays in the neural system equations. A general formulation of this type requires a time-dependent weight matrix $W(t)$. 
In such a system, multiplicative interactions are replaced by temporal convolution operations, leading to the following system equations for an additive convolution model: \n\n$$\\frac{dx}{dt} = \\int_0^t W(t-s)\\,\\sigma(x(s))\\,ds + e. $$ (1) \n\nDue to the complexity of general convolution models, only strong simplifications of the weight kernel have been proposed. Lang et al. (1990) use a delta function kernel, $W(t) = \\sum_{k=0}^{K} W_k\\,\\delta(t - t_k)$, which is the core of the Time-Delay-Neural-Network (TDNN). Tank and Hopfield (1987) prewire $W(t)$ as a weighted sum of dispersive delay kernels, $W(t) = \\sum_{k=0}^{K} W_k \\left(\\frac{t}{t_k}\\right)^k e^{k\\left(1 - \\frac{t}{t_k}\\right)} = \\sum_{k=0}^{K} W_k h_k(t, t_k)$. The kernels $h_k(t, t_k)$ are the integrands of the gamma function. Tank and Hopfield described a one-layer system for classification of isolated words. We will refer to their model as a Concentration-In-Time-Network (CITN). The system parameters were non-adaptive, although a Hebbian rule equivalent in functional differential equation form was suggested. \n\nIn this paper, we develop a theory for neural convolution models that are expressed through a sum of gamma kernels. We show that such a gamma neural network can be reformulated as a (Grossberg) additive model. As a consequence, the substantial learning and stability theory for additive models is directly applicable to gamma models. \n\n2 THE GAMMA NEURAL MODEL - FORMAL DERIVATION \n\nConsider the N-dimensional convolution model: \n\n$$\\frac{dx}{dt} = -ax + W_0 y + \\int_0^t ds\\, W(t-s)\\, y(s) + e, $$ (2) \n\nwhere $x(t)$, $y(t) = \\sigma(x)$ and $e(t)$ are N-dimensional signals; $W_0$ is $N \\times N$ and $W(t)$ is $N \\times N \\times [0, \\infty]$. 
The weight matrix $W_0$ communicates the direct neural interactions, whereas $W(t)$ holds the weights for the delayed neural interactions. We now assume that $W(t)$ can be written as a linear combination of normalized gamma kernels, that is, \n\n$$W(t) = \\sum_{k=1}^{K} W_k g_k(t), $$ (3) \n\nwhere \n\n$$g_k(t) = \\frac{\\mu^k}{(k-1)!}\\, t^{k-1} e^{-\\mu t}, \\quad k = 1, \\ldots, K. $$ (4) \n\nHere $\\mu$ is a decay parameter and $k$ a (lag) order parameter. If $W(t)$ decays exponentially to zero for $t \\to \\infty$, then it follows from the completeness of Laguerre polynomials that this approximation can be made arbitrarily close (Cohen et al., 1979). In other words, for all physically plausible weight kernels there is a $K$ such that $W(t)$ can be expressed as (3), (4). The following properties hold for the gamma kernels $g_k(t)$: \n\n- [1] The gamma kernels are related by a set of linear homogeneous ODEs: \n\n$$\\frac{dg_1}{dt} = -\\mu g_1, \\qquad \\frac{dg_k}{dt} = -\\mu g_k + \\mu g_{k-1}, \\quad k = 2, \\ldots, K. $$ (5) \n\n- [2] The peak value ($\\frac{dg_k}{dt} = 0$) occurs at $t_p = \\frac{k-1}{\\mu}$. \n\n- [3] The area of the gamma kernels is normalized, that is, $\\int_0^\\infty ds\\, g_k(s) = 1$. \n\nSubstitution of (3) into (2) yields \n\n$$\\frac{dx}{dt} = -ax + \\sum_{k=0}^{K} W_k y_k + e, $$ (6) \n\nwhere we defined $y_0(t) = y(t)$ and the gamma state variables \n\n$$y_k(t) = \\int_0^t ds\\, g_k(t-s)\\, y_0(s), \\quad k = 1, \\ldots, K. $$ (7) \n\nThe gamma state variables hold memory traces of the neural states $y_0(t)$. The important question is how to compute $y_k(t)$. Differentiating (7) using Leibniz' rule yields \n\n$$\\frac{dy_k}{dt} = \\int_0^t \\frac{dg_k(t-s)}{dt}\\, y(s)\\, ds + g_k(0)\\, y(t). $$ (8) \n\nWe now use gamma kernel property [1] (eq. (5)) to obtain \n\n$$\\frac{dy_k}{dt} = -\\mu \\int_0^t g_k(t-s)\\, y(s)\\, ds + \\mu \\int_0^t g_{k-1}(t-s)\\, y(s)\\, ds + g_k(0)\\, y(t), $$ (9) \n\nwhere the $g_{k-1}$ term is absent for $k = 1$. Note that since $g_k(0) = 0$ for $k \\geq 2$ and $g_1(0) = \\mu$, (9) evaluates to \n\n$$\\frac{dy_k}{dt} = -\\mu y_k + \\mu y_{k-1}, \\quad k = 1, \\ldots, K. $$ 
(10) \n\nThe gamma model is described by (6) and (10). This extended set of ordinary differential equations (ODEs) is equivalent to the convolution model, described by the set of functional differential equations (2), (3) and (4). \n\nIt is a valid question to ask whether the system of ODEs that describes the gamma model can still be expressed as a neural network model. The answer is affirmative, since the gamma model can be formulated as a regular (Grossberg) additive model. To see this, define the $N(K+1)$-dimensional augmented state vector $X = \\begin{bmatrix} x \\\\ y_1 \\\\ \\vdots \\\\ y_K \\end{bmatrix}$, the neural output signal $Y = \\begin{bmatrix} \\sigma(x) \\\\ y_1 \\\\ \\vdots \\\\ y_K \\end{bmatrix}$, an external input $E = \\begin{bmatrix} e \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{bmatrix}$, a diagonal matrix of decay parameters $M = \\begin{bmatrix} a & & & \\\\ & \\mu & & \\\\ & & \\ddots & \\\\ & & & \\mu \\end{bmatrix}$ and the weight (super)matrix $\\Omega = \\begin{bmatrix} W_0 & W_1 & \\cdots & W_K \\\\ \\mu & 0 & \\cdots & 0 \\\\ & \\ddots & \\ddots & \\vdots \\\\ 0 & & \\mu & 0 \\end{bmatrix}$. Then the gamma model can be rewritten in the following form: \n\n$$\\frac{dX}{dt} = -MX + \\Omega Y + E, $$ (11) \n\nthe familiar Grossberg additive model. \n\n3 HEBBIAN LEARNING IN THE GAMMA MODEL \n\nThe additive model formulation of the gamma model allows a direct generalization of learning techniques to the gamma model. Note, however, that the augmented state vector $X$ contains the gamma state variables $y_1, \\ldots, y_K$, which are essentially (dispersively) delayed neural states. As a result, although associative learning rules for conventional additive models only encode the simultaneous correlation of neural states, the gamma learning rules are able to encode temporal associations as well. Here we present Hebbian learning for the gamma model. \n\nThe Hebbian postulate is often translated mathematically into a learning rule of the form $\\frac{dW}{dt} = \\eta\\, x(t)\\, y^T(t)$, where $\\eta$ is a learning rate constant, $x$ the neural activation vector and $y^T$ the neuron output signal vector. 
This procedure is not likely to encode temporal order, since information about past states is not incorporated in the learning equations. \n\nTank and Hopfield (1987) proposed a generalized Hebbian learning rule with delays that can be written as \n\n$$\\frac{dW}{dt} = \\eta\\, x(t) \\int_0^t ds\\, g(s)\\, y^T(t-s), $$ (12) \n\nwhere $g(s)$ is a normalized delay kernel. Notice that (12) is a functional differential equation, for which explicit solutions and convergence criteria are not known (for most implementations of $g(s)$). In the gamma model, the signals $\\int_0^t ds\\, g_k(s)\\, y(t-s)$ are computed by the system and locally available as $y_k(t)$ at the synaptic junctions $W_k$. Thus, in the gamma model, (12) reduces to \n\n$$\\frac{dW_k}{dt} = \\eta\\, x(t)\\, y_k^T(t). $$ (13) \n\nThis learning rule encodes simultaneous correlations (for $k = 0$) as well as temporal associations (for $k \\geq 1$). Since the gamma Hebb rule is structurally similar to the conventional Hebb rule, it is also local in both time and space. \n\n4 RELATION TO OTHER MODELS \n\nThe gamma model is related to Tank and Hopfield's CITN model in that both models decompose $W(t)$ into a linear combination of gamma kernels. The weights in the CITN system are preset and fixed. The gamma model, expressed as a regular additive system, allows conventional adaptation procedures to train the system parameters: $\\mu$ and $K$ adapt the depth and shape of the memory, while $W_0, \\ldots, W_K$ encode spatiotemporal correlations between neural states. \n\nTime-Delay-Neural-Nets (TDNN) are characterized by a tapped delay line memory structure. The relation is best illustrated by an example. 
Consider a linear one-layer feedforward convolution model, described by \n\n$$\\begin{aligned} x(t) &= e(t) \\\\ y(t) &= \\int_0^t W(t-s)\\, x(s)\\, ds \\end{aligned} $$ (14) \n\nwhere $x(t)$, $e(t)$ and $y(t)$ are N-dimensional signals and $W(t)$ is an $N \\times N \\times [0, \\infty]$ dimensional weight matrix. This system can be approximated in discrete time by \n\n$$\\begin{aligned} x(n) &= e(n) \\\\ y(n) &= \\sum_{m=0}^{n} W(n-m)\\, x(m) \\end{aligned} $$ (15) \n\nwhich is the TDNN formulation. An alternative approximation of the convolution model by means of a (discrete-time) gamma model is described by (figure 1) \n\n$$\\begin{aligned} x_0(n) &= e(n) \\\\ x_k(n) &= (1 - \\mu)\\, x_k(n-1) + \\mu\\, x_{k-1}(n-1), \\quad k = 1, \\ldots, K \\\\ y(n) &= \\sum_{k=0}^{K} W_k x_k(n) \\end{aligned} $$ (16) \n\nThe recursive memory structure in the gamma model is stable for $0 \\leq \\mu \\leq 2$, but an interesting memory structure is obtained only for $0 < \\mu \\leq 1$. For $\\mu = 0$, this system collapses to a static additive net; in this case, no information from past signal values is stored in the net. For $0 < \\mu < 1$, the system works as a discrete-time CITN. The gamma memory structure consists of a cascade of first-order leaky integrators. Since the total memory structure is of order $K$, the shape of the memory is not restricted to a recency gradient. The effective memory depth approximates $\\frac{K}{\\mu}$ for small $\\mu$. For $\\mu = 1$, the gamma model becomes a TDNN; in this case, memory is implemented by a tapped delay line. The strength of the gamma model is that the parameters $\\mu$ and $K$ can be adapted by conventional additive learning procedures. Thus, determining the optimal temporal structure of the neural system, whether of CITN or TDNN type, is part of the training phase in a gamma neural net. Finally, the application of the gamma memory structure is of course not limited to one-layer feedforward systems. 
The topologies suggested by Jordan (1986) and Elman (1988) can easily be extended to include gamma memory. \n\n[Figure 1: A one-layer ffw gamma net (input e(n)).] \n\n5 CONCLUSIONS \n\nWe have introduced the gamma neural model, a neural net model for temporal processing that generalizes most existing approaches, such as the CITN and TDNN models. The model can be described as a conventional dynamic additive model, enabling direct application of existing learning procedures for additive models. In the gamma model, dynamic objects are encoded by the same learning equations as static objects. \n\nAcknowledgments \n\nThis work has been partially supported by NSF grants ECS-8915218 and DDM-8914084. \n\nReferences \n\nCohen et al., 1979. Stable oscillations in single species growth models with hereditary effects. Mathematical Biosciences 44:255-268. \n\nDe Vries and Principe, 1990. The gamma neural net - A new model for temporal processing. Submitted to Neural Networks, Nov. 1990. \n\nElman, 1988. Finding structure in time. CRL technical report 8801. \n\nJordan, 1986. Attractor dynamics and parallelism in a connectionist sequential machine. Proc. Cognitive Science 1986. \n\nLang et al., 1990. A time-delay neural network architecture for isolated word recognition. Neural Networks, vol. 3 (1). \n\nTank and Hopfield, 1987. Concentrating information in time: analog neural networks with applications to speech recognition problems. 1st Int. Conf. on Neural Networks, IEEE. \n", "award": [], "sourceid": 356, "authors": [{"given_name": "Bert", "family_name": "de Vries", "institution": null}, {"given_name": "Jos\u00e9", "family_name": "Pr\u00edncipe", "institution": null}]}