{"title": "Keeping Flexible Active Contours on Track using Metropolis Updates", "book": "Advances in Neural Information Processing Systems", "page_first": 859, "page_last": 865, "abstract": null, "full_text": "Keeping flexible active contours on track using\nMetropolis updates\n\nTrausti T. Kristjansson\nUniversity of Waterloo\nttkristj@uwaterloo.ca\n\nBrendan J. Frey\nUniversity of Waterloo\nfrey@uwaterloo.ca\n\nAbstract\n\nCondensation, a form of likelihood-weighted particle filtering, has been\nsuccessfully used to infer the shapes of highly constrained \"active\" contours\nin video sequences. However, when the contours are highly flexible\n(e.g. for tracking fingers of a hand), a computationally burdensome number\nof particles is needed to successfully approximate the contour distribution.\nWe show how the Metropolis algorithm can be used to update a\nparticle set representing a distribution over contours at each frame in a\nvideo sequence. We compare this method to condensation using a video\nsequence that requires highly flexible contours, and show that the new\nalgorithm performs dramatically better than the condensation algorithm.\nWe discuss the incorporation of this method into the \"active contour\"\nframework where a shape-subspace is used to constrain shape variation.\n\n1 Introduction\n\nTracking objects with flexible shapes in video sequences is currently an important topic in\nthe vision community. Methods include curve fitting [9], layered models [1, 2, 3], Bayesian\nreconstruction of 3-D models from video [6], and active contour models [10, 14, 15].\n\nFitting curves to the outlines of objects has been attempted using various methods, including\n\"Snakes\" [8, 9], where an energy function is minimized so as to find the best fit. As\nwith other optimization methods, this approach suffers from local minima. 
This problem\nis amplified when using real data where edge noise can prevent the fit of the contour to the\ndesired object outline.\n\nIn contrast, Blake et al. [10] introduced a probabilistic framework for curve fitting and\ntracking. Instead of proposing one single best fit for the contour, a probability distribution\nover contours is found. The distribution is represented as a particle set where each particle\nrepresents one contour shape. Inference in these \"active contour\" models is accomplished\nusing particle filtering.\n\nIn the \"active contour\" method, a probabilistic dynamic system is used to model the distribution\nover the outline of the object (the contour) $Y_t$ and the observations $Z_t$ at time $t$.\nTracking is performed by inference in this model.\n\nThe outline of an object is tracked through successive frames in a video by using a particle\ndistribution. Each particle $x_n$ represents a single contour $Y$ (see footnote 1) that approximates the outline\nof the object. For any given frame, a set of particles represents the probability distribution\nover positions and shapes of an object.\n\nFigure 1: (a) Condensation with Gaussian dynamics (result for best $\\sigma = 2$ shown) applied\nto a video sequence. The 200 contours corresponding to 200 particles fail to track the\ncomplex outline of the hand. The pictures show every 24th frame of a 211-frame sequence.\n(b) Metropolis updates with only 12 particles keep the contours on track. At each step, 4\niterations of Metropolis updates are applied with $\\sigma = 3$. 
\n\nIn order to find the likelihood of an observation $Z_t$, given a particle $x_n$, lines perpendicular\nto the contour are examined and edges are detected. A variety of distributions can be\nused to model the likelihood of the edge positions along each line. We assume that the\nposition of the edge belonging to the object is drawn from a Gaussian with mean position\nat the intersection of the contour and the measurement line, $Y(s_m)$, and the positions of the\nother edges are drawn from a Poisson distribution. The observation likelihood for a single\nmeasurement line $z_m$ can be simplified to [10]\n\n$$p(z_m | x_n) \\propto 1 + \\frac{1}{\\sqrt{2\\pi}\\,\\sigma_{ml}\\,\\alpha} \\sum_j \\exp\\left[-\\frac{|z_{m,j} - B(s_m) x_n|^2}{2\\sigma_{ml}^2}\\right]$$ (1)\n\nwhere $z_{m,j}$ denotes the coordinates of an edge on measurement line $m$, and $B(s_m) x_n = Y_n(s_m)$\nis the intersection of the contour and the measurement line (see later). $\\alpha = q\\lambda$,\nwhere $q$ is the probability of not observing the edge, and $\\lambda$ is the rate of the Poisson process.\n$\\sigma_{ml}$ defines the standard deviation in pixels. A multitude of measurement lines is used\nalong the contour, and (assuming independence) the contour likelihood is\n\n$$p(Z | x_n) = \\prod_{m \\in M} p(z_m | x_n)$$ (2)\n\nwhere $M$ is the set of measurement lines.\n\nFootnote 1. Notation: We will use $Y$ to refer to a curve, parameterized by $x$, and $Y(s)$ for a particular point\non the curve. $x$ refers to a particle consisting of subspace parameters, or in our case, control points.\n$n$ indexes a particle in a particle set, $i$ indexes a component of a particle (i.e. a single control point),\n$m$ indexes measurement lines and $t$ is used as a frame index.\n\nAs mentioned, in the condensation algorithm, a particle set is used to represent the distribution\nof contours. Starting from an initial distribution, a new distribution for a successive\nframe is produced by propagating each particle using the system dynamics $P(x_t | x_{t-1})$. 
\nNow the observation likelihood $P(Z_t | x_t)$ is calculated for each particle, and the particle\nset is resampled with replacement, using the likelihoods as weights. The resulting set of\nparticles approximates the posterior distribution at time $t$ and is then propagated to the next\nframe.\n\nFigure 1(a) shows the results of using condensation with 200 particles. As can be seen, the\nresult is poor. Intuitively, the reason condensation fails is that it is highly unlikely to draw\na particle that has raised control points over the four fingers, while keeping the remainder\nfixed. Figure 1(b) shows the result of using Metropolis updates and 12 particles (equivalent\namount of computation).\n\n2 Keeping contours on track using Metropolis updates\n\nTo reduce the dimensionality of the inference, a subspace is often used. For example, a\nfixed shape is only allowed horizontal and vertical translation. Using a subspace reduces\nthe size of the required particle set, allowing for successful tracking using standard condensation.\nIf the object can deform, a subspace that captures the allowed deformations\nmay be used [15]. This increases the flexibility of the contour, but at the cost of enlarged\ndimensionality. In order to learn such a subspace, a large number of training samples are\nused, which are supplied by hand fitting contour shapes to a large number of frames. However,\neven moderately detailed contours (say, the outline of a hand) will have many control\npoints that interact in complex ways, making subspace modeling difficult or impractical.\n\n2.1 Metropolis sampling\n\nMetropolis sampling is a popular Markov Chain Monte Carlo method for problems of large\ndimensionality [16, 17]. A new particle is drawn from a proposal density $Q(x'; x_t)$, where\nin our case, $x_t$ is a particle (i.e. 
a set of control points) at time $t$, and $x'$ is a tentative new\nparticle produced by perturbing a subset of the control points.\n\n$$Q_i(x' | x_t) = \\frac{1}{\\sqrt{2\\pi\\sigma^2}} \\exp\\left[-\\frac{(x' - x_t)^2}{2\\sigma^2}\\right]$$ (3)\n\nWe then calculate\n\n$$a = \\frac{p(x' | x_{t-1})\\, p(Z_t | x')}{p(x_t | x_{t-1})\\, p(Z_t | x_t)} \\cdot \\frac{Q(x_t; x')}{Q(x'; x_t)}$$ (4)\n\nwhere $p(x_t | x_{t-1}) p(Z_t | x_t)$ is proportional to the posterior probability of observing the contour\nin that position. If $a \\geq 1$ the proposed particle is accepted. If $a < 1$, it is accepted\nwith probability $a$. Since $Q$ is symmetric, the second factor $Q(x_t; x')/Q(x'; x_t) = 1$.\n\nMetropolis sampling can be used in the framework of particle propagation in two ways. It\ncan either be used to fit splines around contours of a training set that is used to construct a\nshape subspace, e.g. by PCA, or it can be used to refine the shapes of the subspace to\nthe actual data during tracking.\n\n2.2 B-splines\n\nB-splines or basis function splines are parametric curves, defined as follows:\n\n$$Y(s) = B(s) C$$ (5)\n\nwhere $Y(s)$ is a two dimensional vector consisting of the 2-D coordinates of a point on the\ncurve, $B(s)$ is a matrix of polynomial basis functions, and $C$ is a vector of control points.\nIn other words, a point along the curve $Y(s)$ is a weighted sum of the values of the basis\nfunctions $B(s)$ for a particular value of $s$, where the weights are given by the values of\n$C$. The basis functions of B-splines have the characteristic that they are non-zero over a\nlimited range of $s$. Thus a particular control point will only affect a portion of the curve.\nFor regular B-splines of order 4 (the basis functions are 3rd degree polynomials), a single\ncontrol point will only affect $Y(s)$ over a range of $s$ of length 4. Conversely, for a particular\n$s_m$ ($m : s_m \\in \\mathrm{support}(x_i)$, where $i$ indexes the component of $x$ that has been altered),\n$Y(s_m)$ is affected by at most 4 control points (fewer towards the ends). 
\nAs mentioned before, a detailed contour can have a large number of control points, and thus\nhigh dimensionality, so it is common to use a subspace. In this case $C$ can be written as\n$C = W x + C_0$ where $W$ defines a linear subspace, $C_0$ is the template of control points,\nand $x$ represents perturbations from the template in the subspace.\n\nIn this work we examine unconstrained models, where no prior knowledge about the deformations\nor dynamics of the object is presumed. In this case $W$ is the identity matrix,\n$C_0 = 0$, and $x$ are the actual coordinates of the control points. This allows the contour to\ndeform in any way.\n\n2.3 Metropolis updates in condensation\n\nThe new algorithm consists of two steps: a Metropolis step, followed by a resampling step.\n\n1. Iterate over control points:\n\n\u2022 For one control point at a time, draw a proposal particle by drawing a new\ncontrol point $x'_i$ from a 2-D Gaussian centered at the current control point\n$x_{t,i}$, Eq. (3), and keeping all others unchanged.\n\n\u2022 Calculate the observation likelihood for the new control point, Eq. (2).\n\n\u2022 Calculate $a$ (Eq. 4) and reject or accept the new particle.\n\n2. Resample\n\n3. Get next image in video\n\nIf the particle distribution at $t - 1$ reflects $P(x_{t-1} | Z_1, \\ldots, Z_{t-1})$, the Metropolis updates\nwill converge to $P(x_t | Z_1, \\ldots, Z_t)$ [16].\n\nAs mentioned above, the effect of altering the position of a control point is to change the\nshape of the contour locally since the basis functions have limited support. Thus, when\nevaluating $p(x'_t | x_{t-1}) p(Z_t | x'_t)$ for a proposed particle, we only need to reexamine measurement\nlines and evaluate $p(z_{m,t} | x'_{n,t})$ for lines in the affected interval, and similarly for\n$p(x'_{n,t} | x_{n,t-1})$. This allows for an efficient algorithm implementation. 
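\n\nThe saving can be made explicit with a short sketch. Here $M_i$ is a hypothetical symbol, introduced only for this illustration, for the set of measurement lines that fall inside the support of the altered control point $i$. Assuming a symmetric proposal and the independence assumption behind Eq. (2), the factors for all measurement lines outside $M_i$ are identical in the numerator and denominator of Eq. (4) and cancel:\n\n$$a = \\frac{p(x'_t | x_{t-1})}{p(x_t | x_{t-1})} \\prod_{m \\in M_i} \\frac{p(z_{m,t} | x'_t)}{p(z_{m,t} | x_t)}$$\n\nUnder these assumptions, only the likelihood factors for $m \\in M_i$ need to be recomputed after each single-control-point proposal.\n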
\nThe computation $c_M$ required to update a single particle using Metropolis, compared to\ncondensation, is $c_M = o \\cdot it \\cdot c_c$ where $o$ is the order of the B-spline, $it$ is the number\nof iterations, and $c_c$ is the number of computations required to update a particle using\ncondensation. Thus, in the case of fourth order splines such as the ones we use, the increase\nin computation for a single particle is only a factor of four for a single iteration, and eight for two\niterations. However, we have seen that far fewer particles are required.\n\nFigure 2: The behavior of the algorithm with Metropolis updates is shown at frame 100\n($t = 100$) as a function of iterations and $\\sigma$. The columns show, from left to right, 1, 2, 4\nand 8 iterations, and the rows, from top to bottom, show $\\sigma = \\{1, 2, 3, 4\\}$. The rejection\nratio (i.e. the ratio of rejected proposal particles to the total number of proposed particles)\nis shown as a bar on the right side of each image.\n\n3 Results\n\nWe tested our algorithm on the video sequence shown in Figure 1. The contour had 56 2-D\ncontrol points, i.e. a state space of 112 dimensions. Such high dimensionality is required for\nthe detailed contours required to properly outline the fingers of the hand.\n\nThe results presented are for relatively noise free data, i.e. free from background clutter.\nThis allows us to contrast the performance of using Metropolis updates and standard condensation,\nfor the scenarios of interest, i.e. the learning of subspace models and contour\nrefinement.\n\nFigure 1(b) shows the results for the Metropolis updates for 12 particles, 4 iterations and\n$\\sigma = 3$. The figure shows every 24th frame from frame 1 to frame 211. The outline of the\nsplayed fingers is tracked very successfully. 
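\n\nAs a rough check on the claim that 12 Metropolis particles require about as much computation as 200 condensation particles, assume the cost model of Section 2.3 (cost per particle = spline order times number of iterations times the condensation cost per particle, written here as $c_c$) with order 4 and 4 iterations:\n\n$$12 \\cdot (4 \\cdot 4 \\cdot c_c) = 192\\, c_c \\approx 200\\, c_c$$\n\nSo, under that cost model, the two settings are of comparable cost. 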
\n\nFigure 1(a) shows every 24th frame for the condensation algorithm of equivalent complexity,\nusing 200 particles and $\\sigma = 2$. This value of $\\sigma$ gave the best results for 200 particles.\nAs can be seen, the little finger is tracked moderately well. However, the other parts of the\nhand are very poorly tracked. For lower values of $\\sigma$ the contour distribution did not track\nthe hand, but stayed in roughly the position of the initial contour distribution. For higher\nvalues of $\\sigma$, the contour looped around in the general area of the fingers.\n\nFigure 2 shows the contour distribution for frame 100 and 12 particles, for different numbers\nof iterations and values of $\\sigma$. When $\\sigma = 1$ and 2 the contour distribution does not keep\nup with the deformation. For $\\sigma = 4$ the contour is correctly tracked except for the case of a\nsingle iteration. The rejection ratio (i.e. the ratio of rejected proposal particles to the total\nnumber of proposed particles) is shown as a bar on the right side of each image. Notice\nthat the general trend is that the rejection ratio increases as $\\sigma$ increases, and decreases as the\nnumber of iterations is increased (due to a smaller $\\sigma$ at each step).\n\nIntuitively, it is not surprising that our new algorithm outperforms standard condensation.\nIn the case of condensation, Gaussian noise is added to each control point at each time\nstep. One particle may be correctly positioned for the little finger and poorly positioned\nfor the forefinger, whereas another particle may be well positioned around the forefinger\nand poorly positioned around the little finger. In order to track the deformation of the hand,\nsome particles are required that track both the little finger and the forefinger (and all other\nparts too). In contrast, the Metropolis updates are likely to reject particles that are locally\nworse than the current particle, but accept local improvements. 
\n\nIt should be noted that for lower dimensional problems, the increase in tracking performance\nis not as dramatic. E.g. in the case of tracking a rotating head, using a 12 control\npoint B-spline, the two algorithms performed comparably.\n\n4 Future work and conclusion\n\nWe are currently examining the effects of background clutter on the performance of the\nalgorithm. We are also investigating other sequences and groupings of control points for\ngenerating proposal particles, and ways of using subspace models in combination with\nMetropolis updates.\n\nIn this paper we showed how Metropolis updates can be used to keep highly flexible active\ncontours on track, and an efficient implementation strategy was presented. For high\ndimensional problems, which are common for detailed shapes, the new algorithm\nproduces dramatically better results than standard condensation.\n\nAcknowledgments\n\nWe thank Andrew Blake and Dale Schuurmans for helpful discussions.\n\nReferences\n\n[1] J. Y. A. Wang and E. H. Adelson, \"Representing moving images with layers.\" IEEE\nTransactions on Image Processing, Special Issue: Image Sequence Compression, vol.\n3, no. 5, 1994, pp. 625-638.\n\n[2] Y. Weiss, \"Smoothness in layers: Motion segmentation using nonparametric mixture\nestimation.\" Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1997.\n\n[3] A. Jepson and M. J. Black, \"Mixture models for optical flow computation.\" Proceedings\nof the IEEE Conference on Computer Vision and Pattern Recognition.\n\n[4] W. T. Freeman and P. A. Viola, \"Bayesian model of surface perception.\" Advances in\nNeural Information Processing Systems 10, MIT Press, 1998.\n\n[5] W. Freeman, E. Pasztor, \"Learning low-level vision,\" Proceedings of the International\nConference on Computer Vision, 1999, pp. 1182-1189.\n\n[6] N. 
R. Howe, M. E. Leventon, W. T. Freeman, \"Bayesian Reconstruction of 3D Human\nMotion from Single-Camera Video.\" To appear in: Advances in Neural Information\nProcessing Systems 12, edited by S. A. Solla, T. K. Leen, and K.-R. M\u00fcller, 2000. TR99-37.\n\n[7] G. E. Hinton, Z. Ghahramani and Y. W. Teh, \"Learning to parse images.\" In S. A. Solla,\nT. K. Leen, and K.-R. M\u00fcller (eds) Advances in Neural Information Processing Systems\n12, MIT Press, 2000.\n\n[8] D. Terzopoulos, R. Szeliski, \"Tracking with Kalman snakes.\" In A. Blake and A. Yuille\n(eds) Active Vision, 3-20. MIT Press, Cambridge, MA, 1992.\n\n[9] N. Papanikolopoulos, P. Khosla, T. Kanade, \"Vision and Control Techniques for robotic\nvisual tracking,\" In Proc. IEEE Int. Conf. Robotics and Automation 1, 1991, pp. 851-856.\n\n[10] A. Blake, M. Isard, \"Active Contours,\" Springer-Verlag, 1998, ISBN 3540762175.\n\n[11] J. MacCormick, A. Blake, \"A probabilistic exclusion principle for tracking multiple\nobjects,\" Proc. 7th IEEE Int. Conf. Computer Vision, 1999.\n\n[12] M. Isard, A. Blake, \"ICONDENSATION: Unifying low-level and high-level tracking\nin a stochastic framework,\" Proc. 5th European Conf. Computer Vision, vol. 1, 1998,\npp. 893-908.\n\n[13] J. Sullivan, A. Blake, M. Isard, J. MacCormick, \"Object Localization by Bayesian\nCorrelation,\" Proc. Int. Conf. Computer Vision, 1999.\n\n[14] T. F. Cootes, G. H. Edwards, C. J. Taylor, \"Active Appearance Models,\" Proceedings\nof the European Conference on Computer Vision, Vol. 2, 1998, pp. 484-498.\n\n[15] I. Matthews, J. A. Bangham, R. Harvey and S. Cox. Proc. Auditory-Visual Speech\nProcessing (AVSP), 1998, pp. 73-78.\n\n[16] R. M. Neal, \"Probabilistic Inference Using Markov Chain Monte Carlo Methods,\"\nTechnical Report CRG-TR-93-1, University of Toronto, 1993.\n\n[17] D. J. C. MacKay, \"Introduction to Monte Carlo methods.\" In M. I. 
Jordan (ed.) Learning\nin Graphical Models, MIT Press, Cambridge, MA, 1999.\n", "award": [], "sourceid": 1835, "authors": [{"given_name": "Trausti", "family_name": "Kristjansson", "institution": null}, {"given_name": "Brendan", "family_name": "Frey", "institution": null}]}