{"title": "Multiple Paired Forward-Inverse Models for Human Motor Learning and Control", "book": "Advances in Neural Information Processing Systems", "page_first": 31, "page_last": 37, "abstract": null, "full_text": "Multiple Paired Forward-Inverse Models\nfor Human Motor Learning and Control\n\nMasahiko Haruno*\nmharuno@hip.atr.co.jp\n\nDaniel M. Wolpert†\nwolpert@hera.ucl.ac.uk\n\nMitsuo Kawato*°\nkawato@hip.atr.co.jp\n\n* ATR Human Information Processing Research Laboratories,\n2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan.\n† Sobell Department of Neurophysiology, Institute of Neurology,\nQueen Square, London WC1N 3BG, United Kingdom.\n° Dynamic Brain Project, ERATO, JST, Kyoto, Japan.\n\nAbstract\n\nHumans demonstrate a remarkable ability to generate accurate and appropriate motor behavior under many different and often uncertain environmental conditions. This paper describes a new modular approach to human motor learning and control, based on multiple pairs of inverse (controller) and forward (predictor) models. This architecture simultaneously learns the multiple inverse models necessary for control as well as how to select the inverse models appropriate for a given environment. Simulations of object manipulation demonstrate the ability to learn multiple objects, appropriate generalization to novel objects, and the inappropriate activation of motor programs based on visual cues, followed by on-line correction, seen in the \"size-weight illusion\".\n\n1 Introduction\n\nGiven the multitude of contexts within which we must act, there are two qualitatively distinct strategies for motor control and learning. The first is to use a single controller, which would need to be highly complex to allow for all possible scenarios.
If this controller were unable to encapsulate all the contexts, it would need to adapt every time the context of the movement changed before it could produce appropriate motor commands; this would produce transient and possibly large performance errors. Alternatively, a modular approach can be used in which multiple controllers co-exist, with each controller suitable for one or a small set of contexts. Such a modular strategy has been introduced in the \"mixture of experts\" architecture for supervised learning [6]. This architecture comprises a set of expert networks and a gating network which performs classification by combining each expert's output. These networks are trained simultaneously so that the gating network splits the input space into regions in which particular experts can specialize.\nTo apply such a modular strategy to motor control, two problems must be solved. First, how are the set of inverse models (controllers) learned to cover the contexts which might be experienced (the module learning problem)? Second, given a set of inverse modules (controllers), how is the correct subset selected for the current context (the module selection problem)? From human psychophysical data we know that such a selection process must be driven by two distinct processes: feedforward switching based on sensory signals, such as the perceived size of an object, and switching based on feedback of the outcome of a movement. For example, on picking up an object which appears heavy, feedforward switching may activate controllers responsible for generating a large motor impulse.
However, feedback processes, based on contact with the object, can indicate that it is in fact light, thereby switching control to inverse models appropriate for a light object.\n\nIn the context of motor control and learning, Gomi and Kawato [4] combined the feedback-error-learning [7] approach and the mixture of experts architecture to learn multiple inverse models for different manipulated objects. They used both the visual shapes of the manipulated objects and intrinsic signals, such as somatosensory feedback and efference copy of the motor command, as the inputs to the gating network. Using this architecture it was quite difficult to acquire multiple inverse models. This difficulty arose because a single gating network needed to divide up, based solely on control error, the large input space into complex regions. Furthermore, Gomi and Kawato's model could not demonstrate feedforward controller selection prior to movement execution.\nHere we describe a model of human motor control which addresses these problems and can solve the module learning and selection problems in a computationally coherent manner. The basic idea of the model is that the brain contains multiple pairs (modules) of forward (predictor) and inverse (controller) models (MPFIM) [10]. Within each module, the forward and inverse models are tightly coupled both during their acquisition and use, in which the forward models determine the contribution (responsibility) of each inverse model's output to the final motor command. This architecture can simultaneously learn the multiple inverse models necessary for control as well as how to select the inverse models appropriate for a given environment in both a feedforward and a feedback manner.
\n\n2 Multiple paired forward-inverse models\n\nFigure 1: A schematic diagram showing how the MPFIM architecture is used to control arm movement while manipulating different objects. Parenthesized numbers in the figure relate to the equations in the text.\n\n2.1 Motor learning and feedback selection\n\nFigure 1 illustrates how the MPFIM architecture can be used to learn and control arm movements when the hand manipulates different objects. Central to the multiple paired forward-inverse model is the notion of dividing up experience using predictive forward models. We consider n undifferentiated forward models which each receive the current state, x_t, and motor command, u_t, as input. The output of the ith forward model is x̂^i_{t+1}, the prediction of the next state at time t:\n\nx̂^i_{t+1} = φ(w^i_t, x_t, u_t)   (1)\n\nwhere w^i_t are the parameters of a function approximator φ (e.g. neural network weights) used to model the forward dynamics. These predicted next states are compared to the actual next state to provide the responsibility signal, which represents the extent to which each forward model presently accounts for the behavior of the system. Based on the prediction errors of the forward models, the responsibility signal λ^i_t for the i-th forward-inverse model pair (module) is calculated by the soft-max function:\n\nλ^i_t = exp(-|x_t - x̂^i_t|^2 / 2σ^2) / Σ_{j=1}^n exp(-|x_t - x̂^j_t|^2 / 2σ^2)   (2)\n\nwhere x_t is the true state of the system and σ is a scaling constant.
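Equation (2) is straightforward to evaluate numerically. The following sketch (the function and variable names, and the numbers, are illustrative, not from the paper) computes the soft-max responsibilities for three forward models:

```python
import numpy as np

def responsibilities(x_t, x_hat, sigma=1.0):
    """Soft-max responsibility signals (Equation 2): each module's squared
    prediction error is passed through a Gaussian and normalized across
    modules, so the lambdas lie between 0 and 1 and sum to 1."""
    err2 = np.sum((x_t - x_hat) ** 2, axis=-1)   # |x_t - x_hat^i|^2 per module
    lik = np.exp(-err2 / (2.0 * sigma ** 2))     # unnormalized, in (0, 1]
    return lik / lik.sum()

# Hypothetical example: three forward models predict a 1-D next state.
x_true = np.array([1.0])
x_hat = np.array([[0.95], [1.5], [3.0]])         # module 0 is most accurate
lam = responsibilities(x_true, x_hat, sigma=0.5)
```

The module with the smallest prediction error receives the largest responsibility, and a smaller σ sharpens the competition between modules.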
The soft-max transforms the errors using the exponential function and then normalizes these values across the modules, so that the responsibilities lie between 0 and 1 and sum to 1 over the modules. Those forward models which capture the current behavior, and therefore produce small prediction errors, will have high responsibilities¹. The responsibilities are then used to control the learning of the forward models in a competitive manner, with those models with high responsibilities receiving proportionally more of their error signal than modules with low responsibility. The competitive learning among forward models is similar in spirit to the \"annealed competition of experts\" architecture [9].\n\nΔw^i_t = ε λ^i_t (dφ^i/dw^i) (x_t - x̂^i_t)   (3)\n\nFor each forward model there is a paired inverse model whose inputs are the desired next state x*_{t+1} and the current state x_t. The ith inverse model produces a motor command u^i_t as output:\n\nu^i_t = ψ(α^i_t, x*_{t+1}, x_t)   (4)\n\nwhere α^i_t are the parameters of some function approximator ψ.\nThe total motor command is the summation of the outputs from these inverse models, using the responsibilities λ^i_t to weight the contributions:\n\nu_t = Σ_{i=1}^n λ^i_t u^i_t = Σ_{i=1}^n λ^i_t ψ(α^i_t, x*_{t+1}, x_t)   (5)\n\nOnce again, the responsibilities are used to weight the learning of each inverse model. This ensures that inverse models learn only when their paired forward models make accurate predictions. Although for supervised learning the desired control command u*_t is needed (but is generally not available), we can approximate (u*_t - u^i_t) with the feedback motor command signal u_fb [7].
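For linear modules, the three uses of the responsibilities in Equations (3)-(6) reduce to a few lines. Below is a minimal scalar sketch (our own simplification: one parameter per forward model and one per inverse model; the names, values, and learning rate are illustrative):

```python
import numpy as np

# Scalar sketch of Equations (3)-(6): forward models x_hat^i = w_i * x_t,
# inverse models u^i = a_i * x_star.
w = np.array([0.5, 1.0, 2.0])    # forward model parameters w^i
a = np.array([0.2, 0.8, 1.4])    # inverse model parameters alpha^i

def control_and_learn(x_t, x_next, x_star, u_fb, lam, eps=0.1):
    # Equation (5): total command is the responsibility-weighted sum of u^i.
    u_total = np.sum(lam * a * x_star)
    # Equation (3): forward learning gated by lambda; the gradient of
    # w * x_t with respect to w is x_t.
    w_new = w + eps * lam * (x_next - w * x_t) * x_t
    # Equation (6): inverse learning gated by lambda, with the unavailable
    # error (u* - u^i) approximated by the feedback command u_fb.
    a_new = a + eps * lam * u_fb * x_star
    return u_total, w_new, a_new

u_total, w_new, a_new = control_and_learn(
    x_t=1.0, x_next=1.5, x_star=1.0, u_fb=0.05, lam=np.array([0.8, 0.1, 0.1]))
```

Because every update is scaled by λ^i_t, the module currently responsible for the behavior absorbs most of the error signal, while the others are left nearly untouched.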
\n\n¹ Because selecting modules can be regarded as a hidden state estimation problem, an alternative way to determine appropriate forward models is to use the EM algorithm [3].\n\nΔα^i_t = ε λ^i_t (dψ^i/dα^i) (u*_t - u^i_t) ≈ ε λ^i_t (dψ^i/dα^i) u_fb   (6)\n\nIn summary, the responsibility signals are used in three ways: first to gate the learning of the forward models (Equation 3), second to gate the learning of the inverse models (Equation 6), and third to gate the contribution of the inverse models to the final motor command (Equation 5).\n\n2.2 Multiple responsibility predictors: Feedforward selection\n\nWhile the system described so far can learn multiple controllers and switch between them based on prediction errors, it cannot provide switching before a motor command has been generated and the consequences of this action evaluated. To allow the system to switch controllers based on contextual information, we introduce a new component, the responsibility predictor (RP). The input to this module, y_t, contains contextual sensory information (Figure 1), and each RP produces a prediction of its own module's responsibility:\n\nλ̂^i_t = η(γ^i_t, y_t)   (7)\n\nwhere γ^i_t are the parameters of a function approximator η. These estimated responsibilities can then be compared to the actual responsibilities λ^i_t generated from the responsibility estimator. These error signals are used to update the weights of the RP by supervised learning.\nFinally, a mechanism is required to combine the responsibility estimates derived from the feedforward RP and from the forward models' prediction errors derived from feedback.
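This combination of the feedforward estimate with the prediction-error evidence can be sketched in a few lines (hypothetical numbers; the function name is ours). The RP output acts as a prior, the Gaussian-transformed forward-model errors act as a likelihood, and their normalized product is the final responsibility:

```python
import numpy as np

def posterior_responsibility(prior, x_t, x_hat, sigma=1.0):
    """Multiply the feedforward RP estimates (prior) by the Gaussian
    likelihood of each forward model's prediction error, then normalize
    across modules to obtain the final (posterior) responsibilities."""
    err2 = np.sum((x_t - x_hat) ** 2, axis=-1)
    likelihood = np.exp(-err2 / (2.0 * sigma ** 2))
    post = prior * likelihood
    return post / post.sum()

# Hypothetical case: contextual cues favor module 0, but module 2's
# forward model best predicts the sensed state, so feedback wins out.
prior = np.array([0.7, 0.2, 0.1])
lam = posterior_responsibility(prior, np.array([2.0]),
                               np.array([[0.0], [1.0], [2.0]]), sigma=0.5)
```

Even a strong contextual prior is overturned once the prediction errors clearly favor another module, which is exactly the corrective behavior exploited in the size-weight illusion simulation later in the paper.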
We determine the final value of the responsibility by using Bayes rule: multiplying the transformed feedback errors exp(-|x_t - x̂^i_t|^2 / 2σ^2) by the feedforward responsibility λ̂^i_t and then normalizing across the modules within the responsibility estimator:\n\nλ^i_t = λ̂^i_t exp(-|x_t - x̂^i_t|^2 / 2σ^2) / Σ_{j=1}^n λ̂^j_t exp(-|x_t - x̂^j_t|^2 / 2σ^2)\n\nThe estimates of the responsibilities produced by the RP can be considered as prior probabilities because they are computed before the movement execution, based only on extrinsic signals, and do not rely on knowing the consequences of the action. Once an action takes place, the forward models' errors can be calculated, and this can be thought of as the likelihood after the movement execution, based on knowledge of the result of the movement. The final responsibility, which is the product of the prior and likelihood, normalized across the modules, represents the posterior probability. Adaptation of the RP ensures that the prior probability becomes closer to the posterior probability.\n\n3 Simulation of arm tracking while manipulating objects\n\n3.1 Learning and control of different objects\n\nObject:  α    β    γ\nM:       5.0  8.0  2.0\nB:       7.0  3.0  10.0\nK:       4.0  1.0  1.0\n\nFigure 2: Schematic illustration of the simulation experiment in which the arm makes reaching movements while grasping different objects with mass M, damping B and spring K. The object properties are shown in the table.\n\nTo examine motor learning and control we simulated a task in which the hand had to track a given trajectory (30 s, shown in Fig. 3(b)) while holding different objects (Figure 2). The manipulated object was periodically switched every 5 s between three different objects α, β and γ, in this order.
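Each manipulated object is a second-order linear system, so its response to a motor command can be reproduced with a short simulation. A sketch (semi-implicit Euler with a step size of our choosing; the paper does not specify its integrator):

```python
import numpy as np

def simulate_object(u, M, B, K, dt=0.01):
    """Simulate one manipulated object, M*x'' + B*x' + K*x = u, with
    semi-implicit Euler integration; returns the position trajectory."""
    x, v = 0.0, 0.0
    xs = np.empty(len(u))
    for t, u_t in enumerate(u):
        acc = (u_t - B * v - K * x) / M
        v += dt * acc
        x += dt * v
        xs[t] = x
    return xs

# Object alpha from Figure 2 (M=5.0, B=7.0, K=4.0) under a constant unit
# force settles toward the static equilibrium x = u/K = 0.25.
traj = simulate_object(np.ones(1000), M=5.0, B=7.0, K=4.0)
```

Switching the (M, B, K) triple every 5 s of simulated time, as in the task, changes the plant dynamics abruptly, which is what makes the module-selection problem nontrivial for the controller.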
The physical characteristics of these objects are shown in Figure 2. The task was exactly the same as that of Gomi and Kawato [4], and simulates recent grip force-load force coupling experiments by Flanagan and Wing [2].\nIn the first simulation, three forward-inverse model pairs (modules) were used: the same number of modules as the number of objects. We assumed the existence of a perfect inverse dynamic model of the arm for the control of reaching movements. In each module, both forward (φ in (1)) and inverse (ψ in (4)) models were implemented as a linear neural network². The use of linear networks allowed M, B and K to be estimated from the forward and inverse model weights. Let M^f_j, B^f_j, K^f_j be the estimates from the jth forward model and M^i_j, B^i_j, K^i_j be the estimates from the jth inverse model.\nFigure 3(a) shows the evolution of the forward model estimates M^f_j, B^f_j, K^f_j for the three modules during learning. During learning the desired trajectory (Fig. 3(b)) was repeated 200 times. The three modules started from randomly selected initial conditions (open arrows) and converged to very good approximations of the three objects (filled arrows), as shown in Table 1. Each of the three modules converged to the α, β and γ objects, respectively. It is interesting to note that all the estimates of the forward models are superior to those of the inverse models. This is because the inverse model learning depends on how modules are switched by the forward models.\n\nFigure 3: (a) Learning acquisition of three pairs of forward and inverse models corresponding to three objects. (b) Responsibility signals from the three modules (top 3) and tracking performance (bottom) at the beginning (left) and at the end (right) of learning.\n\nModule:  M^f     B^f     K^f     M^i     B^i     K^i\n2:       5.0071  7.0040  4.0000  5.0102  6.9554  4.0089\n3:       8.0029  3.0010  0.9999  7.8675  3.0467  0.9527\n\nTable 1: Learned object characteristics\n\nFigure 3(b) shows the performance of the model at the beginning (left) and end (right) of learning. The top 3 panels show the responsibility signals of the α, β and γ modules in this order, and the bottom panel shows the hand's actual and desired trajectories. At the start of learning, the three modules were equally poor and thus generated almost equal responsibilities (1/3) and were involved in control almost equally. As a result, the overall control performance was poor, with large trajectory errors. However, at the end of learning, the three modules switched almost perfectly (only three noisy spikes were observed in the top 3 panels on the right), and no trajectory error was visible at this resolution in the bottom panel. If we compare these results with Figure 7 of Gomi and Kawato [4] for the same task, the superiority of the MPFIM compared to the gating-expert architecture is apparent. Note that the number of free parameters (synaptic weights) is smaller in the current architecture than in the other. The difference in performance comes from two features of the basic architecture. First, in the gating architecture a single gating network tries to divide the space, while in MPFIM many forward models split the space. Second, in the gating architecture only a single control error is used to divide the space, but multiple prediction errors are simultaneously utilized in MPFIM.\n\n² Any kind of architecture can be adopted instead of linear networks.
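Because the modules are linear, the object parameters can be read directly off the converged weights. A sketch of the forward-model case (the particular input ordering (u, x', x) and the acceleration-prediction form are our assumptions about how the linear network is parameterized):

```python
def params_from_forward_weights(w_u, w_v, w_x):
    """If a linear forward model predicts acceleration as
    x'' = w_u*u + w_v*x' + w_x*x, and the true object obeys
    x'' = (u - B*x' - K*x) / M, then the converged weights identify
    w_u = 1/M, w_v = -B/M, w_x = -K/M, so (M, B, K) can be recovered."""
    M = 1.0 / w_u
    return M, -w_v * M, -w_x * M

# Weights a converged module would hold for an object with M=5, B=7, K=4:
M, B, K = params_from_forward_weights(0.2, -1.4, -0.8)
```

The inverse-model estimates follow the same idea with the roles of input and output exchanged, which is why both sets of estimates appear in Table 1.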
\n\n3.2 Generalization to a novel object\n\nA natural question regarding the MPFIM architecture is how many modules need to be used. In other words, what happens if the number of objects exceeds the number of modules, or an already trained MPFIM is presented with an unfamiliar object? To examine this, an MPFIM trained on 4 objects α, β, γ and δ was presented with a novel object η (its (M, B, K) is (2.02, 3.23, 4.47)). Because the object dynamics can be represented in a 3-dimensional parameter space, and the 4 modules already acquired define 4 vertices of a tetrahedron within the 3-D space, arbitrary object dynamics contained within the tetrahedron can be decomposed into a weighted average of the existing 4 forward modules (an internal division point of the 4 vertices). The theoretically calculated weights of η were (0.15, 0.20, 0.35, 0.30). Interestingly, each module's responsibility signal averaged over the trajectory was (0.14, 0.24, 0.37, 0.26). Although the responsibility was computed by soft-max in the space of acceleration predictions, and had no direct relation to the space of (M, B, K), the two vectors had very similar values. This demonstrates the flexibility of the MPFIM architecture, which originates from its probabilistic soft-switching mechanism. This is in sharp contrast to the hard switching of Narendra [8], for which only one controller can be selected at a time.
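The internal-division-point weights can be computed as a small linear solve. A sketch (the four vertices below are hypothetical, since the (M, B, K) values of the four training objects in this simulation are not listed; only η's dynamics are taken from the text):

```python
import numpy as np

def decomposition_weights(vertices, target):
    """Solve sum_i lam_i * v_i = target subject to sum_i lam_i = 1:
    the weights expressing a novel object's (M, B, K) as an internal
    division point of the modules' vertices in parameter space."""
    V = np.asarray(vertices, dtype=float)
    A = np.vstack([V.T, np.ones(len(V))])          # 4 equations, 4 unknowns
    b = np.append(np.asarray(target, dtype=float), 1.0)
    lam, *_ = np.linalg.lstsq(A, b, rcond=None)
    return lam

# Novel object eta with (M, B, K) = (2.02, 3.23, 4.47); hypothetical vertices
# standing in for the four trained modules.
vertices = [[1.0, 1.0, 1.0], [5.0, 1.0, 9.0], [1.0, 9.0, 5.0], [3.0, 5.0, 1.0]]
lam = decomposition_weights(vertices, [2.02, 3.23, 4.47])
```

When the target lies inside the tetrahedron, the resulting weights are all non-negative and can be compared directly with the time-averaged responsibility signals, as done in the text.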
\n\n3.3 Feedforward selection and the size-weight illusion\n\nFigure 4: Responsibility predictions based on contextual information of 2-D object shapes (top 3 traces) and the corresponding acceleration error of control induced by the illusion (bottom trace).\n\nIn this section, we simulated prior selection of inverse models by responsibility predictors based on contextual information, and reproduced the size-weight illusion. Each object was associated with a 2-D shape represented as a 3x3 binary matrix, which was randomly placed at one of four possible locations on a 4x4 retinal matrix (see Gomi and Kawato for more details). The retinal matrix was used as the contextual input to the RP (a 3-layer sigmoidal feedforward network). During the course of learning, the combinations of manipulated objects and visual cues were fixed as A-α, B-β and C-γ. After 200 iterations of the trajectory, the combination A-γ was presented for the first time. Figure 4 plots the responsibility signals of the three modules (top 3 traces) and the corresponding acceleration error of control induced by the illusion (bottom trace). The result replicates the size-weight illusion [1, 5], seen in the erroneous responsibility prediction of the α responsibility predictor based on the contextual signal A, and its correction by the responsibility signal calculated by the forward models. Until the onset of movement (time 0), A was always associated with the light α, and C was always associated with the heavy γ.
Prior to movement, when A was associated with γ, the α module was switched on by the visual contextual information; but soon after the movement was initiated, the responsibility signal from the forward model's prediction dominated, and the γ module was properly selected. Furthermore, after a while, the responsibility predictors of the modules were re-learned to capture this new association between the object's visual shape and its dynamics.\nIn conclusion, the MPFIM model of human motor learning and control, like the human motor system, can learn multiple tasks, shows generalization to new tasks and an ability to switch between tasks appropriately.\n\nAcknowledgments\nWe thank Zoubin Ghahramani for helpful discussions on the Bayesian formulation of this model. Partially supported by Special Coordination Funds for promoting Science and Technology at the Science and Technology Agency of the Japanese government, and by an HFSP grant.\n\nReferences\n[1] E. Brenner, B. Jeroen, and J. Smeets. Size illusion influences how we lift but not how we grasp an object. Exp Brain Res, 111:473-476, 1996.\n[2] J.R. Flanagan and A. Wing. The role of internal models in motion planning and control: Evidence from grip force adjustments during movements of hand-held loads. J Neurosci, 17(4):1519-1528, 1997.\n[3] A.M. Fraser and A. Dimitriadis. Forecasting probability densities by using hidden Markov models with mixed states. In A.S. Weigend and N.A. Gershenfeld, editors, Time series prediction: Forecasting the future and understanding the past, pages 265-282. Addison-Wesley, 1993.\n[4] H. Gomi and M. Kawato. Recognition of manipulated objects by motor learning with modular architecture networks. Neural Networks, 6:485-497, 1993.\n[5] A. Gordon, H. Forssberg, R. Johansson, and G. Westling.
Visual size cues in the programming of manipulative forces during precision grip. Exp Brain Res, 83:477-482, 1991.\n[6] R. Jacobs, M. Jordan, S. Nowlan, and G. Hinton. Adaptive mixtures of local experts. Neural Computation, 3:79-87, 1991.\n[7] M. Kawato. Feedback-error-learning neural network for supervised learning. In R. Eckmiller, editor, Advanced neural computers, pages 365-372. North-Holland, 1990.\n[8] K. Narendra and J. Balakrishnan. Adaptive control using multiple models. IEEE Transactions on Automatic Control, 42(2):171-187, 1997.\n[9] K. Pawelzik, J. Kohlmorgen, and K. Muller. Annealed competition of experts for a segmentation and classification of switching dynamics. Neural Computation, 8:340-356, 1996.\n[10] D.M. Wolpert and M. Kawato. Multiple paired forward and inverse models for motor control. Neural Networks, 11:1317-1329, 1998.\n", "award": [], "sourceid": 1585, "authors": [{"given_name": "Masahiko", "family_name": "Haruno", "institution": null}, {"given_name": "Daniel", "family_name": "Wolpert", "institution": null}, {"given_name": "Mitsuo", "family_name": "Kawato", "institution": null}]}