{"title": "The Computation of Stereo Disparity for Transparent and for Opaque Surfaces", "book": "Advances in Neural Information Processing Systems", "page_first": 385, "page_last": 392, "abstract": null, "full_text": "The Computation of Stereo Disparity for Transparent and for Opaque Surfaces\n\nSuthep Madarasmi\nComputer Science Department\nUniversity of Minnesota\nMinneapolis, MN 55455\n\nDaniel Kersten\nDepartment of Psychology\nUniversity of Minnesota\n\nTing-Chuen Pong\nComputer Science Department\nUniversity of Minnesota\n\nAbstract\n\nThe classical computational model for stereo vision incorporates a uniqueness inhibition constraint to enforce a one-to-one feature match, thereby sacrificing the ability to handle transparency. Critics of the model disregard the uniqueness constraint and argue that the smoothness constraint can provide the excitation support required for transparency computation. However, this modification fails in neighborhoods with sparse features. We propose a Bayesian approach to stereo vision with priors favoring cohesive over transparent surfaces. The disparity and its segmentation into a multi-layer \"depth planes\" representation are simultaneously computed. The smoothness constraint propagates support within each layer, providing mutual excitation for non-neighboring transparent or partially occluded regions. Test results for various random-dot and other stereograms are presented.\n\n1  INTRODUCTION\n\nThe horizontal disparity in the projection of a 3-D point in a parallel stereo imaging system can be used to compute depth through triangulation. As the number of points in the scene increases, the correspondence problem increases in complexity due to the matching ambiguity. 
Prior constraints on surfaces are needed to arrive at a correct solution. Marr and Poggio [1976] use the smoothness constraint to resolve matching ambiguity and the uniqueness constraint to enforce a 1-to-1 match. Their smoothness constraint tends to oversmooth at occluding boundaries and their uniqueness assumption discourages the computation of stereo transparency for two overlaid surfaces. Prazdny [1985] disregards the uniqueness inhibition term to enable transparency perception. However, this smoothness constraint is locally enforced and fails to provide excitation for spatially disjoint regions and for sparse transparency.\n\nMore recently, Bayesian approaches have been used to incorporate prior constraints for stereopsis (see [Clark and Yuille, 1990] for a review) while overcoming the problem of oversmoothing. Line processes are activated for disparity discontinuities to mark the smoothness boundaries while the disparity is simultaneously computed. A drawback of such methods is the lack of an explicit grouping of image sites into piece-wise smooth regions. In addition, when presented with a stereogram of overlaid (transparent) surfaces such as the random-dot stereogram in figure 5, these methods produce multiple edges in the image, while we clearly perceive two distinct, overlaid surfaces. With edges as output, further grouping of overlapping surfaces is impossible using the edges as boundaries. This suggests that surface grouping should be performed simultaneously with disparity computation.\n\n2  THE MULTI-LAYER REPRESENTATION\n\nWe propose a Bayesian approach to computing disparity and its segmentation that uses a different output representation from the previous, edge-based methods. 
Our representation was inspired by the observations of Nakayama et al. [1989] that mid-level processing such as the grouping of objects behind occluders is performed for objects within the same \"depth plane\".\n\nAs an example, consider the stereogram of a floating square shown in figure 1a. The edge-based segmentation method computes the disparity and marks the disparity edges as shown in figure 1b. Our approach produces two types of output at each pixel: a layer (depth plane) number and a disparity value for that layer. The goal of the system is to place points that could have arisen from a single smooth surface in the scene into one distinct layer. The output for our multi-surface representation is shown in figure 1c. Note that the floating square has a unique layer label, namely layer 4, and the background has another label of 2. Layers 1 and 3 have no data support and are, therefore, inactive.\n\nThe rest of the pixels in each layer that have no data support obtain values by a membrane fitting process using the computed disparity as anchors. The occluded parts of surfaces are, thus, represented in each layer. In addition, disjoint regions of a single surface due to occlusion are represented in a single layer. This representation of occluded parts is an important difference between our representation and a similar representation for segmentation by Darrell and Pentland [1991].\n\n
Figure 1: a) A gray scale display of a noisy stereogram depicting a floating square. b) Edge based method: disparity computed and disparity discontinuity computed. c) Multi-Surface method: disparity computed, surface grouping performed by layer assignment, and disparity for each layer filled in. (Panel labels: disp. = 0, disp. = 4, Layer 4, Layer 2.)\n\n3  ALGORITHM AND SIMULATION METHOD\n\nWe use Bayes' [1783] rule to compute the scene attribute, namely disparity $u$ and its layer assignment $l$ for each layer:\n\n$$p(u, l \\mid d^L, d^R) = \\frac{p(d^L, d^R \\mid u, l)\\, p(u, l)}{p(d^L, d^R)}$$\n\nwhere $d^L$ and $d^R$ are the left and right intensity image data. Each constraint is expressed as a local cost function using the Markov Random Field (MRF) assumption [Geman and Geman, 1984] that pixel values are conditional only on their nearest neighbors. Using the Gibbs-MRF equivalence, the energy function can be written as a probability function:\n\n$$p(x) = \\frac{1}{Z} e^{-E(x)/T}$$\n\nwhere $Z$ is the normalizing constant, $T$ is the temperature, $E$ is the energy cost function, and $x$ is a random variable.\n\nOur energy constraints can be expressed as\n\n$$E = \\lambda_D V_D + \\lambda_S V_S + \\lambda_G V_G + \\lambda_E V_E + \\lambda_R V_R$$\n\nwhere the $\\lambda$'s are the weighting factors and the $V_D$, $V_S$, $V_G$, $V_E$, $V_R$ functions are the data matching cost, the smoothness term, the gap term, the edge shape term, and the disparity versus intensity edge coupling term, respectively.\n\nThe data matching constraint prefers matches with similar intensity and contrast:\n\n$$V_D = \\sum_{i=1}^{M} \\Big[ |d^R_i - d^L_k| + \\gamma \\sum_{j \\in N_i} \\big| (d^R_i - d^R_j) - (d^L_k - d^L_m) \\big| \\Big]$$\n\nwith the image indices $k$ and $m$ given by the ordered pairs $k = (\\mathrm{row}(i), \\mathrm{col}(i) + u_{c_i i})$ and $m = (\\mathrm{row}(j), \\mathrm{col}(j) + u_{c_i i})$, where $M$ is the number of pixels in the image, $c_i$ is the layer classification for site $i$, and $u_{li}$ is the disparity at layer $l$. The $\\gamma$ weighs absolute intensity versus contrast matching. 
\nThe $\\lambda_D$ is higher for points that belong to unambiguous features such as straight vertical contours, so that ambiguous pixels rely more on their prior constraints. Also, if neighboring pixels have a higher disparity than the current pixel and are in a different layer, the current pixel's $\\lambda_D$ is lowered since its corresponding point in the left image is likely to be occluded.\n\nFigure 2: Cost function $V_S$ (cost plotted against depth difference). a) The smoothness cost is quadratic until the disparity difference is high and an edge process is activated. b) In our simulations we use a threshold below which the smoothness cost is scaled down and above which a different layer assignment is accepted at a constant high cost.\n\nThe equation for the smoothness term is given by:\n\n$$V_S = \\sum_{i}^{M} \\sum_{l}^{L} \\sum_{j \\in N_i} V_l(u_{li}, u_{lj})\\, a_l$$\n\nwhere $N_i$ are the neighbors of $i$, $V_l$ is the local smoothness potential, $a_l$ is the activity level for layer $l$ defined by the percent of pixels belonging to layer $l$, and $L$ is the number of layers in the system. The local smoothness potential is given by:\n\n... if (a - b)^2 < Tn\n... otherwise\n\nwhere $\\mu$ is the weighting term between depth smoothness and directional derivative smoothness. The $\\Delta_k$ is the difference operation in various directions $k$, and $T$ is the threshold. Instead of the commonly used quadratic smoothness function graphed in figure 2a, we use the $\\sigma$ function graphed in figure 2b which resembles the Ising potential. This allows for some flexibility since $\\lambda_S$ is set rather high in our simulations. 
\nThe $V_G$ term ensures a gap in the values of corresponding pixels between layers:\n\nThis ensures that if a site $i$ belongs to layer $c_i$, then all points $j$ neighboring $i$ for each layer $l$ must have different disparity values $u_{lj}$ than $u_{c_i i}$.\n\nThe edge or boundary shape constraint $V_E$ incorporates two types of constraints: a cohesive measure and a saliency measure. The costs for various neighborhood configurations are given in figure 3.\n\nThe constraint $V_R$ ensures that if there is no edge in intensity then there should be no edge in the disparity. This is particularly important to avoid local minima for gray scale images since there is so much ambiguity in the matching.\n\nFigure 3: Cost function $V_E$. The costs associated with nearest neighborhood layer label configurations. a) Fully cohesive region (lowest cost). b) Two opaque regions with straight line boundary. c) Two opaque regions with diagonal line boundary. d) Opaque regions with no figural continuity. e) Transparent region with dense samplings. f) Transparent region with no other neighbors (highest cost). (Figure legend: \u2022 same layer label; \u25a1 different layer label. Panel costs shown: 0, 0.2, 0.25, 0.5, 0.7, 1.)\n\nFigure 4: Stereogram of floating cylinder shown in crossed and uncrossed disparity. Only disparity values in the active layers are shown. A wire-frame rendering for layer 3 which captures the cylinder is shown. (Panel titles: Layer 3, layer labels, Wire-frame plot of Layer 3.)\n\nThe Gibbs Sampler [Geman and Geman, 1984] with simulated annealing is used to compute the disparity and layer assignments. After each iteration of the Gibbs Sampler, the missing values within each layer are filled-in using the disparity at the available sites. 
A quadratic energy functional enforces smoothness of disparity and of disparity difference in various directions. A gradient descent approach minimizes this energy and the missing values are filled-in.\n\n4  SIMULATION RESULTS\n\nAfter normalizing each of the local costs to lie between 0 and 1, the values for the weighting parameters used in decreasing order are: $\\lambda_S$, $\\lambda_R$, $\\lambda_D$, $\\lambda_E$, $\\lambda_G$, with the $\\lambda_D$ value moved to follow $\\lambda_G$ if a pixel is partially occluded. The results for a random-dot stereogram with a floating half-cylinder are shown in figure 4. Note that for clarity only the visible pixels within each layer are displayed, though the remaining pixels are filled-in. A wire-frame rendering for layer 3 is also provided.\n\nFigure 5 is a random-dot stereogram with features from two transparent fronto-parallel surfaces. The output consists primarily of two labels corresponding to the foreground and the background. Note that when the stereogram is fused, the percept is of two overlaid surfaces with various small, noisy regions of incorrect matches.\n\nFigure 6 is a random-dot stereogram depicting many planar-parallel surfaces. Note that there are two disjoint regions which are classified into the same layer since they form a single surface.\n\nFigure 5: Random-dot stereogram of two overlaid surfaces. Layers 1 and 4 are the mostly activated layers. Only 5 of the layers are shown here. (Panel titles: Layer 1, Layer 2, Layer 3, Layer 4, Layer 5, layer labels.)\n\nFigure 6: Random-dot stereogram of multiple flat surfaces. Layer 4 captures two regions since they belong to the same surface (equal disparity). (Panel title: layer labels.)\n\n
A gray-scale stereogram depicting a floating square occluding the letter 'C' also floating above the background is shown in figure 7. A feature-based matching scheme is bound to fail here since locally one cannot correctly attribute the computed disparity at a matched corner of the rectangle, for example, to either the rectangle, the background, or to both regions. Our $V_R$ constraint forces the system to attempt various matches until points with no intensity discontinuity have no disparity discontinuity. Another important feature is that the two ends of the letter 'C' are in the same \"depth plane\" [Nakayama et al., 1989] and may later be merged to complete the letter.\n\nFigure 8 is a gray scale stereogram depicting 4 distant surfaces with planar disparity. At occluding boundaries, the region corresponding to the further surface in the right image has no corresponding region in the left image. A high $\\lambda_D$ would only force these points to find an incorrect match and add to the system's errors. The $\\lambda_D$ reduction factor for partially occluded points reduces the data matching requirement for such points. This is crucial for obtaining correct matches especially since the images are sparsely textured and the dependence on accurate information from the textured regions is high.\n\nA transparency example of a fence in front of a billboard is given in figure 9. Note", "award": [], "sourceid": 709, "authors": [{"given_name": "Suthep", "family_name": "Madarasmi", "institution": null}, {"given_name": "Daniel", "family_name": "Kersten", "institution": null}, {"given_name": "Ting-Chuen", "family_name": "Pong", "institution": null}]}