{"title": "A Novel Reinforcement Model of Birdsong Vocalization Learning", "book": "Advances in Neural Information Processing Systems", "page_first": 101, "page_last": 108, "abstract": null, "full_text": "A Novel Reinforcement Model of\nBirdsong Vocalization Learning\n\nKenji Doya\n\nATR Human Information Processing Research Laboratories\n2-2 Hikaridai, Seika, Kyoto 619-02, Japan\n\nTerrence J. Sejnowski\n\nHoward Hughes Medical Institute\nUCSD and Salk Institute,\nSan Diego, CA 92186-5800, USA\n\nAbstract\n\nSongbirds learn to imitate a tutor song through auditory and motor learning. We have developed a theoretical framework for song learning that accounts for response properties of neurons that have been observed in many of the nuclei that are involved in song learning. Specifically, we suggest that the anterior forebrain pathway, which is not needed for song production in the adult but is essential for song acquisition, provides synaptic perturbations and adaptive evaluations for syllable vocalization learning. A computer model based on reinforcement learning was constructed that could replicate a real zebra finch song with 90% accuracy based on a spectrographic measure. The second generation of the birdsong model replicated the tutor song with 96% accuracy.\n\n1  INTRODUCTION\n\nStudies of motor pattern generation have generally focussed on innate motor behaviors that are genetically preprogrammed and fine-tuned by adaptive mechanisms (Harris-Warrick et al., 1992). Birdsong learning provides a favorable opportunity for investigating the neuronal mechanisms for the acquisition of complex motor patterns. Much is known about the neuroethology of birdsong and its neuroanatomical substrate (see Nottebohm, 1991 and Doupe, 1993 for reviews), but relatively little is known about the overall system from a computational viewpoint.  
We propose a set of hypotheses for the functions of the brain nuclei in the song system and explore their computational strength in a model based on biological constraints. The model could reproduce real and artificial birdsongs in a few hundred learning trials.\n\nFigure 1: Major songbird brain nuclei involved in song control. The dark arrows show the direct motor control pathway and the gray arrows show the anterior forebrain pathway. Abbreviations: Uva, nucleus uvaeformis of the thalamus; NIf, nucleus interface of the neostriatum; L, field L (primary auditory area of the forebrain); HVc, higher vocal center; RA, robust nucleus of the archistriatum; DM, dorso-medial part of the nucleus intercollicularis; nXIIts, tracheosyringeal part of the hypoglossal nucleus; AVT, ventral area of Tsai of the midbrain; X, area X of lobus parolfactorius; DLM, medial part of the dorsolateral nucleus of the thalamus; LMAN, lateral magnocellular nucleus of the anterior neostriatum.\n\n2  NEUROETHOLOGY OF BIRDSONG\n\nAlthough songs from individual birds of the same species may sound quite similar, a young male songbird learns to sing by imitating the song of a tutor, which is usually the father or another adult male in the colony. If a young bird does not hear a tutor song during a critical period, it will sing short, poorly structured songs, and if a bird is deafened in the period when it practices vocalization, it develops highly abnormal songs. These observations indicate that there are two phases in song learning: the sensory learning phase, when a young bird memorizes song templates, and the motor learning phase, in which the bird establishes the motor programs using auditory feedback (Konishi, 1965).  
These two phases can be separated by several months in some species, implying that birds have a remarkable capability for memorizing complex temporal sequences. Once a song is crystallized, its pattern is very stable. Even deafening the bird has little immediate effect.\n\nThe brain nuclei involved in song learning are shown in Figure 1. The primary motor control pathway is composed of Uva, NIf, HVc, RA, DM, and nXIIts. If any of these nuclei is lesioned, a bird cannot sing normally. Experimental studies suggest that HVc is involved in generating syllable sequences and that RA produces motor commands for each syllable (Vu et al., 1994). Interestingly, neurons in HVc, RA and nXIIts show vigorous auditory responses, suggesting that the motor control system is closely coupled with the auditory system (Nottebohm, 1991).\n\nFigure 2: Schematic of primary song control nuclei and their proposed functions in the present model of birdsong learning. [Figure labels: auditory preprocessing; syllable encoding; sequence generation; motor pattern generation; reinforcement; attention; memory of tutor song; normalized evaluation; synaptic perturbation; gradient estimate.]\n\nThere is also a \"bypass\" from HVc to RA, called the anterior forebrain pathway, which consists of area X, DLM, and LMAN (Doupe, 1993). This pathway is not directly involved in vocalization because lesions in these nuclei in adult birds do not impair their crystallized songs. However, lesions in area X and LMAN during the motor learning phase result in contrasting deficits. The songs of LMAN-lesioned birds crystallize prematurely, whereas the songs of area X-lesioned birds remain variable (Scharff and Nottebohm, 1991).  
It has been suggested that this pathway is responsible for the storage of song templates (Doupe and Konishi, 1991) or guidance of the synaptic connection from HVc to RA (Mooney, 1992).\n\n3  FUNCTIONAL NEUROANATOMY OF BIRDSONG\n\nThe song learning process can be decomposed into three stages. In the first stage, suitable internal acoustic representations of syllables and syllable combinations are constructed. This \"auditory template\" can be assembled by unsupervised learning schemes like clustering and principal components analysis. The second stage involves the encoding of phonetic sequences using the internal representation. If the representation is sparse or nearly orthogonal, sequential transitions can be easily encoded by Hebbian learning. The third stage is an inverse mapping from the internal auditory representation into spatio-temporal patterns of motor commands. This can be accomplished by exploration in the space of motor commands using reinforcement learning. The responses of the units that encode the acoustic primitives can be used to evaluate the resulting auditory signal and direct the exploration.\n\nHow are these three computational stages organized within the brain areas and pathways of the songbird? Figure 2 gives an overview of our current working hypothesis. Auditory inputs are pre-processed in field L. Some higher-order representations, such as syllables and syllable combinations, are established in HVc depending on the bird's auditory experience. Moreover, transitions between syllables are encoded in the HVc network. The sequential activation of syllable coding units in HVc is transformed into the time course of motor commands in RA. DM and nXIIts control breathing and the muscles of the syrinx, the bird's vocal organ.\n\nFigure 3: (a) The syrinx of songbirds.  
(b) The model syrinx.\n\nThe consequences of selective lesions of areas in the anterior forebrain pathway (Scharff and Nottebohm, 1991) are consistent with the failures expected for a reinforcement learning system. In particular, we suggest that this pathway serves the function of an adaptive critic with stochastic search elements (Barto et al., 1983). We propose that LMAN perturbs the synaptic connections from HVc to RA and that area X regulates LMAN according to the song evaluation. Modulation of the HVc to RA connection by LMAN is biologically plausible since LMAN input to RA is mediated mainly by NMDA type synapses, which can modulate the amplitude of the mainly non-NMDA type synaptic input from HVc (Mooney, 1992).\n\nThe assumption that area X provides evaluation is supported by the fact that it receives a catecholaminergic projection (dopamine or norepinephrine) from a midbrain nucleus, AVT (Lewis et al., 1981). These neurotransmitters are used in many species for reinforcement or attention signals. It is known that auditory learning is enhanced when associated with visual or social interaction with the tutor. Area X is a candidate region where auditory inputs from HVc are associated with reinforcing input from AVT during auditory learning.\n\n4  CONSTRUCTION OF SONG LEARNING MODEL\n\nIn order to test the above hypothesis, we constructed a computer model of the birdsong learning system. The specific aim was to simulate the process of explorative motor learning, in which the time course of the motor command for each syllable is determined by auditory template matching. We assumed that orthogonal representations for syllables and their sequential activation were already established in HVc and that an auditory template matching mechanism exists in area X.\n\n4.1  The syrinx\n\nThe bird's syrinx is located near the junction of the trachea and the bronchi (Vicario, 1991). 
Its sound source is the tympaniform membrane, which faces the bronchus on one side and the air sac on the other (Figure 3a). When some of the syringeal muscles contract, the lumen of the bronchus is throttled, which produces vibration in the membrane. When stretched along one dimension, the membrane produces harmonic sounds, but when stretched along two dimensions, the sound contains non-harmonic components (Casey and Gaunt, 1985).\n\nAccordingly, we provided two sound sources for the model syrinx (Figure 3b). The fundamental frequency of the harmonic component was controlled by the membrane tension in one direction (T1). The amplitude of the noisy component was proportional to the membrane tension in an orthogonal direction (T2). The mixture of these sounds went through a bandpass filter whose resonance frequency was controlled by the throttling of the bronchus (Th). The overall sound amplitude was determined by the strength of expiration (Ex). By controlling the time course of these four variables (Ex, Th, T1, T2), the model could produce a wide variety of \"bird-like\" chirps and warbles.\n\nFigure 4: RA units with different spatio-temporal output profiles are driven by locally coded HVc units. LMAN perturbs the weights W between HVc and RA. The output units in DM and nXIIts drive the model syrinx.\n\n4.2  Motor pattern generation in RA\n\nRA is topographically organized, each part projecting to different motoneuron pools in nXIIts (Vicario, 1991). Also, RA neurons have complex temporal responses to the inputs from HVc (Mooney, 1992). Therefore, we assumed that RA consists of groups of neurons with specific spatial and temporal output properties, as shown in Figure 4.  
For each of the four motor command variables, we provided several units with different temporal response kernels. The sequential activation of syllable coding units in HVc drove the RA units through the weights W. Their responses were linearly combined and squashed between 0 and 1 to make the final motor commands.\n\n4.3  Weight space search by LMAN and area X\n\nWith the above model of the motor output, the task is to find a connection matrix W that maximizes a template matching measure. One way of doing this is to perturb the output of the units and correlate it with the input and the evaluation (Barto et al., 1983). An alternative way, adopted here, is to perturb the weights and correlate them with the evaluation.\n\nWe used the following stochastic gradient ascent algorithm. The weight matrix W is modulated by ΔW, given by the sum of the evaluation gradient estimate G and a random component. The modulated weight persists if the resulting vocalization is better than the recent average evaluation E[r]. The evaluation gradient estimate G is updated by the sum of the perturbations ΔW weighted with the normalized evaluation.\n\n    ΔW := G + random perturbation\n    r := evaluation of the song generated with W + ΔW\n    W := W + ΔW  if r > E[r]\n    G := α [(r - E[r]) / sqrt(V[r])] ΔW + (1 - α) G,\n\nwhere 0 < α < 1 provides a form of \"momentum\" in weight space similar to that used in supervised learning. The average and the variance of the evaluation are also estimated on-line as follows.\n\n    E[r] := α r + (1 - α) E[r]\n    V[r] := α (r - E[r])^2 + (1 - α) V[r].\n\n4.4  Spectrographic template matching\n\nWe assumed that evaluation for vocalization is given separately for each of the syllables in a song. The sound signal was analyzed by an 80 channel spectrogram.  
Each output channel was sent to an analog delay line similar to the gamma filter (de Vries and Principe, 1992). The snapshot image of this (80 channels) x (12 steps) delay line at the end of each syllable was stored as the template. The same delay line image of the syllable generated by the model was compared with the template. This allowed some compensation for variable syllable duration. The direction cosine between the two delay line images was used for the reinforcement signal, r.\n\n5  SIMULATION RESULTS\n\nOne phrase of a recorded zebra finch song (Figure 5a) was the target. Templates were stored for the five syllables in the phrase. Five HVc units coded the five different syllables and 16 RA units represented the four output variables and four different temporal kernels. Learning was started with small random weights. After 200 to 300 trials, the syllable evaluation by direction cosine reached 0.9 (Figure 5d, solid line). The syllables of the learned song resembled the overall frequency profiles of the original syllables. The complex spectrographic structure of the original syllables was, however, not accurately reproduced (Figure 5b).\n\nOne reason for this imperfect replication could be the difference between the real zebra finch syrinx and our model syrinx. In order to check the significance of this difference, we took syllable templates from the model song (Figure 5b) and trained another model with random initial weights. In this case, the direction cosine went up to 0.96 (Figure 5d, dotted line) and the two songs sounded quite similar to human ears (Figure 5c).\n\nWe also checked the importance of the gradient estimate G in our algorithm. The dashed line in Figure 5d shows the performance of the model with G = 0, that is, simple random walk learning. The learning was hopelessly slow and resembled the deficit seen after lesion of area X (Figure 2). 
\n\nFigure 5: Spectrograms of (a) the original zebra finch song, (b) the learned song based on the tutor in (a), and (c) the second generation learned song based on the tutor in (b). (d) Learning curves, plotted against the number of trials, for two tutors: a zebra finch song (solid line) and a model song (dotted line) compared with an undirected random walk search in weight space (dashed line). Weight perturbation was given by a Gaussian distribution with σ = 0.1. The averaging parameter was α = 0.2. Simulating 500 trials took 30 minutes on a Sparc Station 10.\n\n6  DISCUSSION\n\nWe have assumed that each vocalized syllable was separately evaluated. If the evaluation is given only at the end of a song or a phrase, learning can be much more difficult because of the temporal credit assignment problem. If we assume that birds take the easiest strategy available, there should be syllable-specific evaluation and separate perturbation mechanisms. In some songbirds, individual syllables are practiced out of order at an early stage, and only later is the sequence matched to the auditory template.\n\nSelectivity of auditory responses in both HVc and area X develops during motor learning (Volman, 1993; Doupe, 1993). We can expect such a change in response tuning in area X if the evaluations of syllables or syllable sequences are normalized with respect to recent average performance, as we assumed in our model.\n\nMany simplifying assumptions were made in the present model: syllables were unary coded in HVc; simple spectrographic template matching was used; the numbers of motor output variables and temporal kernels were fairly small; and the sound synthesizer was much simpler than a real syrinx.  
However, it is not difficult to replace these idealizations with more biologically accurate models. Since the number of learning trials needed in the present model was much smaller than in real birdsong learning (tens of thousands of trials), there is margin for further elaboration.\n\nAcknowledgments\n\nWe thank M. Lewicki for the zebra finch song data and M. Konishi, A. Doupe, M. Lewicki, E. Vu, D. Perkel and G. Striedter for their helpful discussions.\n\nReferences\n\nBarto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846.\n\nCasey, R. M. and Gaunt, A. S. (1985). Theoretical models of the avian syrinx. Journal of Theoretical Biology, 116:45-64.\n\nde Vries, B. and Principe, J. C. (1992). The gamma model: A new neural model for temporal processing. Neural Networks, 5:565-576.\n\nDoupe, A. J. (1993). A neural circuit specialized for vocal learning. Current Opinion in Neurobiology, 3:104-111.\n\nDoupe, A. J. and Konishi, M. (1991). Song-selective auditory circuits in the vocal control system of the zebra finch. Proceedings of the National Academy of Sciences, USA, 88:11339-11343.\n\nHarris-Warrick, R. M., Marder, E., Selverston, A. I., and Moulins, M. (1992). Dynamic Biological Networks: The Stomatogastric Nervous System. MIT Press, Cambridge, MA.\n\nKonishi, M. (1965). The role of auditory feedback in the control of vocalization in the white-crowned sparrow. Zeitschrift für Tierpsychologie, 22:770-783.\n\nLewis, J. W., Ryan, S. M., Arnold, A. P., and Butcher, L. L. (1981). Evidence for a catecholaminergic projection to area X in the zebra finch. Journal of Comparative Neurology, 196:347-354.\n\nMooney, R. (1992).  
Synaptic basis of developmental plasticity in a birdsong nucleus. Journal of Neuroscience, 12:2464-2477.\n\nNottebohm, F. (1991). Reassessing the mechanisms and origins of vocal learning in birds. Trends in Neurosciences, 14:206-211.\n\nScharff, C. and Nottebohm, F. (1991). A comparative study of the behavioral deficits following lesions of various parts of the zebra finch song systems: Implications for vocal learning. Journal of Neuroscience, 11:2896-2913.\n\nVicario, D. S. (1991). Neural mechanisms of vocal production in songbirds. Current Opinion in Neurobiology, 1:595-600.\n\nVolman, S. F. (1993). Development of neural selectivity for birdsong during vocal learning. Journal of Neuroscience, 13:4737-4747.\n\nVu, E. T., Mazurek, M. E., and Kuo, Y.-C. (1994). Identification of a forebrain motor programming network for the learned song of zebra finches. Journal of Neuroscience, 14:6924-6934.\n", "award": [], "sourceid": 888, "authors": [{"given_name": "Kenji", "family_name": "Doya", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}]}