{"title": "A Neural Network for Motion Detection of Drift-Balanced Stimuli", "book": "Advances in Neural Information Processing Systems", "page_first": 714, "page_last": 721, "abstract": null, "full_text": "A Neural Network for Motion Detection of \n\nDrift-Balanced Stimuli \n\nHilary Tunley* \n\nSchool of Cognitive and Computer Sciences \n\nSussex University \nBrighton, England. \n\nAbstract \n\nThis paper briefly describes an artificial neural network for preattentive \nvisual processing. The network is capable of determining image motion in \na type of stimulus which defeats most popular methods of motion detection \n- a subset of second-order visual motion stimuli known as drift-balanced \nstimuli (DBS). The processing stages of the network described in this paper \ncan be integrated into a model capable of simultaneous motion extraction, \nedge detection, and the determination of occlusion. \n\n1 INTRODUCTION \n\nPrevious methods of motion detection have generally been based on one of \ntwo underlying approaches: correlation and gradient-filter. Probably the best \nknown example of the correlation approach is the Reichardt movement detector \n[Reichardt 1961]. The gradient-filter (GF) approach underlies the work of Adelson \nand Bergen [Adelson 1985], and Heeger [Heeger 1988], amongst others. \nThese motion-detecting methods cannot track DBS, because DBS lack essential \ncomponents of information needed by such methods. Both the correlation and \nGF approaches impose constraints on the input stimuli. Throughout the image \nsequence, correlation methods require information that is spatiotemporally \ncorrelatable, and GF motion detectors assume temporally constant spatial gradients. \n\n*Current address: Experimental Psychology, School of Biological Sciences, Sussex \n\nUniversity. 
\n\nThe network discussed here does not impose such constraints. Instead, it extracts \nmotion energy and exploits the spatial coherence of movement (defined more formally \nin the Gestalt theory of common fate [Koffka 1935]) to achieve tracking. \n\nThe remainder of this paper discusses DBS image sequences, then correlation methods, \nthen GF methods in more detail, followed by a qualitative description of this \nnetwork, which can process DBS. \n\n2 SECOND-ORDER AND DRIFT-BALANCED STIMULI \n\nThere has been considerable recent interest in second-order visual stimuli, and DBS in \nparticular ([Chubb 1989, Landy 1991]). DBS are stimuli which give a clear percept \nof directional motion, yet Fourier analysis reveals a lack of coherent motion energy, \nor energy present in a direction opposing that of the displacement (hence the term \n'drift-balanced'). Examples of DBS include image sequences in which the contrast \npolarity of the edges present reverses between frames. \n\nA subset of DBS, which are also processed by the network, are known as microbalanced \nstimuli (MBS). MBS contain no correlatable features and are drift-balanced \nat all scales. The MBS image sequences used for this work were created \nfrom a random-dot image in which an area is successively shifted by a constant \ndisplacement between each frame and simultaneously re-randomised. \n\n3 EXISTING METHODS OF MOTION DETECTION \n\n3.1 CORRELATION METHODS \n\nCorrelation methods perform a local cross-correlation in image space: the matching \nof features in local neighbourhoods (depending upon displacement/speed) between \nimage frames underlies the motion detection. Examples of this method include \n[Van Santen 1985]. 
Most correlation models suffer from noise degradation in that \nany noise features extracted by the edge detection are available for spurious \ncorrelation. \n\nThere has been much recent debate questioning the validity of correlation methods \nfor modelling human motion detection abilities. In addition to DBS, there is also \nincreasing psychophysical evidence ([Landy 1991, Mather 1991]) which correlation \nmethods cannot account for. \n\nThese factors suggest that correlation techniques are not suitable for low-level \nmotion processing where no information is available concerning what is moving (as \nwith MBS). However, correlation is a more plausible method when working with \nhigher level constructs such as tracking in model-based vision (e.g. [Bray 1990]). \n\n3.2 GRADIENT-FILTER (GF) METHODS \n\nGF methods use a combination of spatial filtering to determine edge positions and \ntemporal filtering to determine whether such edges are moving. A common assumption \nused by GF methods is that spatial gradients are constant. A recent method by \nVerri [Verri 1990], for example, argues that flow detection is based upon the notion \nof tracking spatial gradient magnitude and/or direction, and that any variation in \nthe spatial gradient is due to some form of motion deformation - i.e. rotation, \nexpansion or shear. Whilst for scenes containing smooth surfaces this is a valid \napproximation, it is not the case for second-order stimuli such as DBS. \n\n4 THE NETWORK \n\nA simplified diagram illustrating the basic structure of the network (based upon \nearlier work [Tunley 1990, Tunley 1991a, Tunley 1991b]) is shown in Figure 1 \n(the edge detection stage is discussed elsewhere [Tunley 1990, Tunley 1991b, \nTunley 1992]). \n\nFigure 1: The Network (Schematic). R: Receptor Units - detect temporal \nchanges in image intensity (polarity-independent). M: Motion Units - detect \ndistribution of change information. O: Occlusion Units - detect changes in \nmotion distribution. E: Edge Units - detect edges directly from occlusion. \n\n4.1 INPUT RECEPTOR UNITS \n\nThe units in the input layer respond to rectified local changes in image intensity \nover time. Each unit has a variable adaption rate, resulting in temporal sensitivity \n- a fast adaption rate gives a high temporal filtering rate. The main advantages of \nthis temporal averaging process are: \n\n\u2022 Averaging removes the D.C. component of image intensity. This eliminates \nproblematic gain for motion in high brightness areas of the image \n[Heeger 1988]. \n\n\u2022 The random nature of DBS/MBS generation cannot guarantee that each pixel \nchange is due to local image motion. 
Local temporal averaging smooths the \nmoving regions, thus creating a more coherently structured input for the motion \nunits. \n\nThe input units have a pointwise rectifying response governed by an autoregressive \nfilter of the following form: \n\n(1) \n\nwhere $\\alpha \\in [0,1]$ is a variable which controls the degree of temporal filtering of the \nchange in input intensity, $n$ and $n - 1$ are successive image frames, and $R_n$ and $I_n$ \nare the filter output and input, respectively. \n\nThe receptor unit responses for two different $\\alpha$ values are shown in Figure 2. $\\alpha$ can \nthus be used to alter the amount of motion blur produced for a particular frame \nrate, effectively producing a unit with differing velocity sensitivity. \n\nFigure 2: Receptor Unit Response: (a) $\\alpha = 0.3$; (b) $\\alpha = 0.7$. \n\n4.2 MOTION UNITS \n\nThese units determine the coherence of image changes indicated by corresponding \nreceptor units. First-order motion produces highly-tuned motion activity - i.e. a \nstrong response in a particular direction - whilst second-order motion results in less \ncoherent output. \n\nThe operation of a basic motion detector can be described by: \n\n(2) \n\nwhere $M$ is the detector, $(i', j')$ is a point in frame $n$ at a distance $d$ from $(i, j)$, \na point in frame $n - 1$, in the direction $k$. Therefore, for coherent motion (i.e. \nfirst-order), in direction $k$ at a speed of $d$ units/frame, as $n \\to \\infty$: \n\n(3) \n\nThe convergence of motion activity can be seen using an example. The stimulus \nsequence used consists of a bar of re-randomising texture moving to the right in \nfront of a leftward moving background with the same texture (i.e. random dots). 
\nThe bar motion is second-order as it contains no correlatable features, whilst the \nbackground consists of a simple first-order shifting of dots between frames. \nFigures 3, 4 and 5 show two-dimensional images of the leftward motion activity for the \nstimulus after 3, 4 and 6 frames respectively. The activity for the background, which has \ncoherent leftward movement (at speed $d$ units/frame), gradually reduces to zero, whilst \nthe microbalanced rightwards-moving bar remains active. Since, by the definition \nof Chubb and Sperling [Chubb 1989], first-order detectors produce no response \nto MBS, the non-zero response obtained for second-order motion suggests \nthat this detector is second-order with regard to motion detection. \n\nFigure 3: Leftward Motion Response to Third Frame in Sequence. \n\nFigure 4: Leftward Motion Response to Fourth Frame. \n\nFigure 5: Leftward Motion Response to Sixth Frame. \n\nThe motion units in this model are arranged on a hexagonal grid. This grid is \nknown as a flow web as it allows information to flow both laterally between units \nof the same type, and between the different units in the model (motion, occlusion \nor edge). Each flow web unit is represented by three variables - a position $(a, b)$ \nand a direction $k$, which is evenly spaced between 0 and 360 degrees. In this model \neach $k$ is an integer between 1 and $k_{max}$ - the value of $k_{max}$ can be varied to vary \nthe sensitivity of the units. \n\nA way of using first-order techniques to discriminate between first- and second-order \nmotions is through the concept of coherence. 
At any point in the motion-processed \nimages in Figures 3-5, a measure of the overall variation in motion activity \ncan be used to distinguish between the motion of the microbalanced bar and its \nbackground. The motion energy for a detector with displacement $d$ and orientation \n$k$, at position $(a, b)$, can be represented by $E_{abkd}$. For each motion unit, responding \nover distance $d$, in each cluster the energy present can be defined as: \n\n$$E_{abkdn} = \\frac{\\min_k(M_{abkd})}{M_{abkd}} \\qquad (4)$$ \n\nwhere $\\min_k(x_k)$ is the minimum value of $x$ found searching over $k$ values. If motion \nis coherent, and of approximately the correct speed for the detector $M$, then as \n$n \\to \\infty$: \n\n(5) \n\nwhere $k_m$ is the actual direction of the motion. In practice $n$ need only reach \naround 5 for convergence to occur. Also, more importantly, under the same convergence \nconditions: \n\n(6) \n\nThis is due to the fact that the minimum activation value in a group of first-order \ndetectors at point $(a, b)$ will be the same as the actual value in the direction $k_m$. \nBy similar reasoning, for non-coherent motion as $n \\to \\infty$: \n\n$$E_{abkdn} \\to 1 \\quad \\forall k \\qquad (7)$$ \n\nin other words there is no peak of activity in a given direction. The motion energy \nis ambiguous at a large number of points in most images, except at discontinuities \nand on well-textured surfaces. \n\nA measure of motion coherence used for the motion units can now be defined as: \n\n$$M_c(abkd) = \\frac{E_{abkd}}{\\sum_{k=1}^{k_{max}} E_{abkd}} \\qquad (8)$$ \n\nFor coherent motion in direction $k_m$ as $n \\to \\infty$: \n\n(9) \n\nWhilst for second-order motion, also as $n \\to \\infty$: \n\n(10) \n\nUsing this approach the total $M_c$ activity at each position - regardless of coherence, \nor lack of it - is unity. 
Motion energy is the same in all moving regions; the difference \nis in the distribution, or tuning, of that energy. \n\nFigures 6, 7 and 8 show how motion coherence allows the flow web structure to \nreveal the presence of motion in microbalanced areas whilst not affecting the easily \ndetected background motion for the stimulus. \n\nFigure 6: Motion Coherence Response to Third Frame \n\nFigure 7: Motion Coherence Response to Fourth Frame \n\nFigure 8: Motion Coherence Response to Sixth Frame \n\n4.3 OCCLUSION UNITS \n\nThese units identify discontinuities in second-order motion, which are vitally \nimportant when computing the direction of that motion. They determine spatial and \ntemporal changes in motion coherence and can process single or multiple motions at \neach image point. Established and newly-activated occlusion units work, through \na gating process, to enhance continuously-displacing surfaces, utilising the concept \nof visual inertia. \n\nThe implementation details of the occlusion stage of this model are discussed \nelsewhere [Tunley 1992], but some output from the occlusion units for the above \nsecond-order stimulus is shown in Figures 9 and 10. The figures show how the edges of \nthe bar can be determined. \n\nReferences \n\n[Adelson 1985] E.H. Adelson and J.R. Bergen. Spatiotemporal energy models for \nthe perception of motion. J. Opt. Soc. Am. 2, 1985. \n\n[Bray 1990] A.J. Bray. Tracking objects using image disparities. Image and \nVision Computing, 8, 1990. \n\n[Chubb 1989] C. Chubb and G. Sperling. Second-order motion perception: \nSpace/time separable mechanisms. In Proc. Workshop on Visual \nMotion, Irvine, CA, USA, 1989. 
\n\nFigure 9: Occluding Motion Information: Occlusion activity produced by an \nincrease in motion coherence activity. \n\nFigure 10: Occluding Motion Information: Occlusion activity produced by a \ndecrease in motion activity at a point. Some spurious activity is produced due to the \nrandom nature of the second-order motion information. \n\n[Heeger 1988] D.J. Heeger. Optical Flow using spatiotemporal filters. Int. J. \nComp. Vision, 1, 1988. \n\n[Koffka 1935] K. Koffka. Principles of Gestalt Psychology. Harcourt Brace, \n1935. \n\n[Landy 1991] M.S. Landy, B.A. Dosher, G. Sperling and M.E. Perkins. The \nkinetic depth effect and optic flow II: First- and second-order \nmotion. Vis. Res. 31, 1991. \n\n[Mather 1991] G. Mather. Personal Communication. \n\n[Reichardt 1961] W. Reichardt. Autocorrelation, a principle for the evaluation of \nsensory information by the central nervous system. In W. Rosenblith, \neditor, Sensory Communications. Wiley NY, 1961. \n\n[Van Santen 1985] J.P.H. Van Santen and G. Sperling. Elaborated Reichardt \ndetectors. J. Opt. Soc. Am. 2, 1985. \n\n[Tunley 1990] H. Tunley. Segmenting Moving Images. In Proc. Int. Neural Network \nConf (INNC90), Paris, France, 1990. \n\n[Tunley 1991a] H. Tunley. Distributed dynamic processing for edge detection. In \nProc. British Machine Vision Conf (BMVC91), Glasgow, Scotland, \n1991. \n\n[Tunley 1991b] H. Tunley. Dynamic segmentation and optic flow extraction. In \nProc. Int. Joint Conf Neural Networks (IJCNN91), Seattle, \nUSA, 1991. \n\n[Tunley 1992] H. Tunley. Second-order motion processing: A distributed \napproach. CSRP 211, School of Cognitive and Computing Sciences, \nUniversity of Sussex (forthcoming). \n\n[Verri 1990] A. Verri, F. Girosi and V. Torre. Differential techniques for optic \nflow. J. Opt. Soc. Am. 7, 1990. \n", "award": [], "sourceid": 561, "authors": [{"given_name": "Hilary", "family_name": "Tunley", "institution": null}]}