{"title": "Markov Random Fields Can Bridge Levels of Abstraction", "book": "Advances in Neural Information Processing Systems", "page_first": 396, "page_last": 403, "abstract": null, "full_text": "Markov Random Fields Can Bridge Levels of Abstraction\n\nPaul R. Cooper\nInstitute for the Learning Sciences\nNorthwestern University\nEvanston, IL\ncooper@ils.nwu.edu\n\nPeter N. Prokopowicz\nInstitute for the Learning Sciences\nNorthwestern University\nEvanston, IL\nprokopowicz@ils.nwu.edu\n\nAbstract\n\nNetwork vision systems must make inferences from evidential information across levels of representational abstraction, from low level invariants, through intermediate scene segments, to high level behaviorally relevant object descriptions. This paper shows that such networks can be realized as Markov Random Fields (MRFs). We show first how to construct an MRF functionally equivalent to a Hough transform parameter network, thus establishing a principled probabilistic basis for visual networks. Second, we show that these MRF parameter networks are more capable and flexible than traditional methods. In particular, they have a well-defined probabilistic interpretation, intrinsically incorporate feedback, and offer richer representations and decision capabilities.\n\n1 INTRODUCTION\n\nThe nature of the vision problem dictates that neural networks for vision must make inferences from evidential information across levels of representational abstraction. For example, local image evidence about edges might be used to determine the occluding boundary of an object in a scene. This paper demonstrates that parameter networks [Ballard, 1984], which use voting to bridge levels of abstraction, can be realized with Markov Random Fields (MRFs).\n\nWe show two main results. 
First, an MRF is constructed with functionality formally equivalent to that of a parameter net based on the Hough transform. Establishing this equivalence provides a sound probabilistic foundation for neural networks for vision. This is particularly important given the fundamentally evidential nature of the vision problem.\n\nSecond, we show that parameter networks constructed from MRFs offer a more flexible and capable framework for intermediate vision than traditional feedforward parameter networks with threshold decision making. In particular, MRF parameter nets offer a richer representational framework, the potential for more complex decision surfaces, an integral treatment of feedback, and probabilistically justified decision and training procedures. Implementation experiments demonstrate these features.\n\nTogether, these results establish a basis for the construction of integrated network vision systems with a single well-defined representation and control structure that intrinsically incorporates feedback.\n\n2 BACKGROUND\n\n2.1 HOUGH TRANSFORM AND PARAMETER NETS\n\nOne approach to bridging levels of abstraction in vision is to combine local, highly variable evidence into segments which can be described compactly by their parameters. The Hough transform offers one method for obtaining these high-level parameters. Parameter networks implement the Hough transform in a parallel feedforward network. The central idea is voting: local low-level evidence casts votes via the network for compatible higher-level parameterized hypotheses. The classic Hough example finds lines from edges. 
Here local evidence about the direction and magnitude of image contrast is combined to extract the parameters of lines (e.g. slope-intercept), which are more useful scene segments. The Hough transform is widely used in computer vision (e.g. [Bolle et al., 1988]) to bridge levels of abstraction.\n\n2.2 MARKOV RANDOM FIELDS\n\nMarkov Random Fields offer a formal foundation for networks [Geman and Geman, 1984] similar to that of the Boltzmann machine. MRFs define a prior joint probability distribution over a set $X$ of discrete random variables. The possible values for the variables can be interpreted as possible local features or hypotheses. Each variable is associated with a node $s$ in an undirected graph (or network), and can be written as $X_s$. An assignment of values to all the variables in the field is called a configuration, and is denoted $\\omega$; an assignment of a single variable is denoted $\\omega_s$. Each fully-connected neighborhood $c$ in a configuration of the field has a weight, or clique potential, $V_c$.\n\nWe are interested in the probability distributions $P$ over the random field $X$. Markov Random Fields have a locality property:\n\n$$P(X_s = \\omega_s \\mid X_r = \\omega_r,\\ r \\in S,\\ r \\neq s) = P(X_s = \\omega_s \\mid X_r = \\omega_r,\\ r \\in N_s) \\tag{1}$$\n\nthat says roughly that the state of a site depends only upon the state of its neighbors ($N_s$). MRFs can also be characterized in terms of an energy function $U$ with a Gibbs distribution:\n\n$$P(\\omega) = \\frac{e^{-U(\\omega)/T}}{Z} \\tag{2}$$\n\nwhere $T$ is the temperature, and $Z$ is a normalizing constant.\n\nIf we are interested only in the prior distribution $P(\\omega)$, the energy function $U$ is defined as:\n\n$$U(\\omega) = \\sum_{c \\in C} V_c(\\omega) \\tag{3}$$\n\nwhere $C$ is the set of cliques defined by the neighborhood graph, and the $V_c$ are the clique potentials. 
Specifying the clique potentials thus provides a convenient way to specify the global joint prior probability distribution $P$, i.e. to encode prior domain knowledge about plausible structures.\n\nSuppose we are instead interested in the distribution $P(\\omega \\mid O)$ on the field after an observation $O$, where an observation constitutes a combination of spatially distinct observations at each local site. The evidence from an observation at a site is denoted $P(O_s \\mid \\omega_s)$ and is called a likelihood. Assuming likelihoods are local and spatially distinct, it is reasonable to assume that they are conditionally independent. Then, with Bayes' Rule we can derive:\n\n(4)\n\nThe MRF definition, together with evidence from the current problem, leaves a probability distribution over all possible configurations. An algorithm is then used to find a solution, normally the configuration of maximal probability, or equivalently, minimal energy as expressed in equation 4. The problem of minimizing non-convex energy functions, especially those with many local minima, has been the subject of intense scrutiny recently (e.g. [Kirkpatrick et al., 1983; Hopfield and Tank, 1985]). In this paper we focus on developing MRF representations wherein the minimum energy configuration defines a desirable goal, not on methods of finding the minimum. In our experiments we have used the deterministic Highest Confidence First (HCF) algorithm [Chou and Brown, 1990].\n\nMRFs have been widely used in computer vision applications, including image restoration, segmentation, and depth reconstruction [Geman and Geman, 1984; Marroquin, 1985; Chellapa and Jain, 1991]. All these applications involve flat representations at a single level of abstraction. 
A novel aspect of our work is the hierarchical framework which explicitly represents visual entities at different levels of abstraction, so that these higher-order entities can serve as an interpretation of the data as well as play a role in further constraint satisfaction at even higher levels.\n\n3 CONSTRUCTING MRFS EQUIVALENT TO PARAMETER NETWORKS\n\nHere we define a Markov Random Field that computes a Hough transform; i.e. it detects higher-order features by tallying weighted votes from low-level image components and thresholding the sum. The MRF has one discrete variable for the higher-order feature, whose possible values are exists and doesn't exist, and one discrete variable for each voting element, with the same two possible values. Such a field could be replicated in space to compute many features simultaneously.\n\nFigure 1: Left: Hough-transform parameter net. Input determines confidence $f_i$ in each low-level feature; these confidences are weighted ($w_i$), summed, and thresholded. Right: Equivalent MRF. Circles show variables with possible labels and non-zero unary clique potentials; lines show neighborhoods; potentials are for the four labellings of the binary cliques. Clique potentials: unary $\\neg E$: $-\\theta$; binary $E e_i$: $-w_i f_{max}$; $E \\neg e_i$: $K_2$; $\\neg E e_i$: $K_1$; $\\neg E \\neg e_i$: $0$.\n\nThe construction follows from two ideas: first, the clique potentials of the network are defined such that only two of the many configurations need be considered, the other configurations being penalized by high clique potentials (i.e. low a priori probability).
 One configuration encodes the decision that the higher-order feature exists, the other that it doesn't exist. The second point is that the energy of the \"doesn't exist\" configuration is independent of the observation, while the energy of the \"exists\" configuration decreases as the evidence strengthens.\n\nConsider a parameter net for the Hough transform that represents only a single parameterized image segment (e.g. a line segment) and a set of low-level features (e.g. edges) which vote for it (Figure 1, left). The variables, labels, and neighborhoods of the equivalent MRF are defined in the right side of Figure 1. The clique potentials, which depend on the Hough parameters, are shown in the right side of the figure for a single neighborhood of the graph. (There are four ways to label this clique.) Unspecified unary potentials are zero. Evidence applies only to the labels $e_i$; it is the likelihood of making a local observation $O_i$:\n\nIn lemma 1, we show that the configuration $\\omega_E = E e_1 e_2 \\ldots e_n$ has an energy equal to the negated weighted sum of the feature inputs, and configuration $\\omega_{\\neg E} = \\neg E\\, \\neg e_1 \\neg e_2 \\ldots \\neg e_n$ has a constant energy equal to the negated Hough threshold. Then, in lemma 2, we show that the clique potentials restrict the possible configurations to only these two, so that the network must have its minimum energy in a configuration whose high-level feature has the correct label. 
\n\n(5)\n\nLemma 1:\n$U(\\omega_E \\mid O) = -\\sum_{i=1}^{n} w_i f_i$\n$U(\\omega_{\\neg E} \\mid O) = -\\theta$\nProof: The energy contributed by the clique potentials in $\\omega_E$ is $\\sum_{i=1}^{n} -w_i f_{max}$. Defining $W = \\sum_{i=1}^{n} w_i$, this simplifies to $-W f_{max}$.\nThe evidence also contributes to the energy of $\\omega_E$, in the form $-\\sum_{i=1}^{n} \\log e_i$. Substituting from 5 into 4 and simplifying gives the total posterior energy of $\\omega_E$:\n\n$$U(\\omega_E \\mid O) = -W f_{max} + W f_{max} - \\sum_{i=1}^{n} w_i f_i = -\\sum_{i=1}^{n} w_i f_i \\tag{6}$$\n\nThe energy of the configuration $\\omega_{\\neg E}$ does not depend on evidence derived from the Hough features. It has only one clique with a non-zero potential, the unary clique of label $\\neg E$. Hence $U(\\omega_{\\neg E} \\mid O) = -\\theta$. $\\Box$\n\nLemma 2:\n$(\\forall \\omega)(\\omega = E \\ldots \\neg e_k \\ldots) \\Rightarrow U(\\omega \\mid O) > U(\\omega_E \\mid O)$\n$(\\forall \\omega)(\\omega = \\neg E \\ldots e_k \\ldots) \\Rightarrow U(\\omega \\mid O) > U(\\omega_{\\neg E} \\mid O)$\nProof: For a mixed configuration $\\omega = E \\ldots \\neg e_k \\ldots$, changing label $\\neg e_k$ to $e_k$ adds energy because of the evidence associated with $e_k$. This is at most $w_k f_{max}$. It also removes energy because of the potential of the clique $E e_k$, which is $-w_k f_{max}$. Because the clique potential $K_2$ from $E \\neg e_k$ is also removed, if $K_2 > 0$, then changing this label always reduces the energy.\nFor a mixed configuration $\\omega = \\neg E \\ldots e_k \\ldots$, changing the low-level label $e_k$ to $\\neg e_k$ cannot add to the energy contributed by evidence, since $\\neg e_k$ has no evidence associated with it. There is no binary clique potential for $\\neg E \\neg e$, but the potential $K_1$ for clique $\\neg E e_k$ is removed. Therefore, again, choosing any $K_1 > 0$ reduces energy and ensures that compatible labels are preferred. $\\Box$\n\nFrom lemma 2, there are two configurations that could possibly have minimal posterior energy. 
From lemma 1, the configuration which represents the existence of the higher-order feature is preferred if and only if the weighted sum of the evidence exceeds threshold, as in the Hough transform.\n\nOften it is desirable to find the mode in a high-level parameter space rather than those elements which surpass a fixed threshold. Finding a single mode is easy to do in a Hough-like MRF; add lateral connections between the exists labels of the high-level features to form a winner-take-all network. If the potentials for these cliques are large enough, it is not possible for more than one variable corresponding to a high-level feature to be labeled exists.\n\n4 BEYOND HOUGH TRANSFORMS: MRF PARAMETER NETS\n\nThe essentials of a parameter network are a set of variables representing low-order features, a set of variables representing high-order features, and the appropriate weighted connections between them. This section explores the characteristics of more \"natural\" MRF parameter networks, still based on the same variables and connections, but not limited to binary label sets and sum/threshold decision procedures.\n\nFigure 2: Noisy image data\n\nFigure 3: Three parameter-net MRF experiments: white dots in the lower images indicate the decision that a horizontal or vertical local edge is present. Upper images show the horizontal and vertical lines found. The left net is a feedforward Hough transform; the middle net uses positive feedback from lines to edges; the right net uses negative feedback, from non-existing lines to non-existing edges.\n\n4.1 EXPERIMENTS WITH FEEDBACK\n\nThe Hough transform and its parameter net instantiation are inherently feedforward. 
In contrast, all MRFs intrinsically incorporate feedback. We experimented with a network designed to find lines from edges. Horizontal and vertical edge inputs are represented at the low level, and horizontal and vertical lines which span the image at the high level. The input data look like Figure 2. Probabilistic evidence for the low-level edges is generated from pixel data using a model of edge-image formation [Sher, 1987]. The edges vote for compatible lines. In Figure 3, the decision of the feedforward, Hough transform MRF is shown at the left: edges exist where the local evidence is sufficient; lines exist where enough votes are received.\n\nKeeping the same topology, inputs, and representations in the MRF, we added top-down feedback by changing binary clique potentials so that the existence of a line at the high level is more strongly compatible with the existence of its edges. Missing edges are filled in (middle). By making non-existent lines strongly incompatible with the existence of edges, noisy edges are substantially removed (right). Other MRFs for segmentation [Chou and Brown, 1990; Marroquin, 1985] find collinear edges, but cannot reason about lines and therefore cannot exploit top-down feedback.\n\n4.2 REPRESENTATION AND DECISION MAKING\n\nBoth parameter nets and MRFs represent confidence in local hypotheses, but here the MRF framework has intrinsic advantages. MRFs can simultaneously represent independent beliefs for and against the same hypotheses. In an active vision system, which must reason about gathering as well as interpreting evidence, one could extend this to include the label don't know, allowing explicit reasoning about the condition in which the local evidence insufficiently supports any decision. 
MRFs can also express higher-order constraints as more than a set of pairs. The exploitation of appropriate 3-cliques, for example, has been shown to be very useful [Cooper, 1990].\n\nSince the potentials in an MRF are related to local conditional probabilities, there is a principled way to obtain them. Observations can be used to estimate local joint probabilities, which can be converted to the clique potentials defining the prior distribution on the field [Pearl, 1988; Swain, 1990].\n\nMost evidence integration schemes require, in addition to the network topology and parameters, the definition of a decision making process (e.g. thresholding) and a theory of parameter acquisition for that process, which is often ad hoc. To estimate the maximum posterior probability of an MRF, on the other hand, is intrinsically to make a decision among the possibilities embedded in the chosen variables and labels.\n\nThe space of possible decisions (interpretations of problem input) is also much richer for MRFs than for parameter networks. For both nets, the nodes for which evidence is available define an n-dimensional problem input space. The weights divide this space into regions defined by the one best interpretation (configuration) for all problems in that region. With parameter nets, these regions are separated by planes, since only the sum of the inputs matters. In MRFs, the energy depends on the log-product of the evidence and the sum of the potentials, allowing more general decision surfaces. Non-linear decisions such as AND or XOR are easy to encode, whereas they are impossible for the linear Hough transform.\n\n5 CONCLUSION\n\nThis paper has shown that parameter networks can be constructed with Markov Random Fields. 
MRFs can thus bridge representational levels of abstraction in network vision systems. Furthermore, it has been demonstrated that MRFs offer the potential for a significantly more powerful implementation of parameter nets, even if their topological architecture is identical to traditional Hough networks. In short, at least one method is now available for constructing intermediate vision solutions with Markov Random Fields.\n\nIt may thus be possible to build entire integrated vision systems with a single well-justified formal framework: Markov Random Fields. Such systems would have a unified representational scheme, constraints and evidence with well-defined semantics, and a single control structure. Furthermore, feedback and feedforward flow of information, crucial in any complete vision system, is intrinsic to MRFs.\n\nOf course, the task still remains to build a functioning vision system for some domain. In this paper we have said nothing about the definition of specific \"features\" and the constraints between them that would constitute a useful system. But providing essential tools implemented in a well-defined formal framework is an important step toward building robust, functioning systems.\n\nAcknowledgements\n\nSupport for this research was provided by NSF grant #IRI-9110492 and by Andersen Consulting, through their founding grant to the Institute for the Learning Sciences. Patrick Yuen wrote the MRF simulator that was used in the experiments.\n\nReferences\n\n[Ballard, 1984] D.H. Ballard, \"Parameter Networks,\" Artificial Intelligence, 22(3):235-267, 1984.\n\n[Bolle et al., 1988] Ruud M. Bolle, Andrea Califano, Rick Kjeldsen, and R.W. 
Taylor, \"Visual Recognition Using Concurrent and Layered Parameter Networks,\" Technical Report RC-14249, IBM Research Division, T.J. Watson Research Center, Dec 1988.\n\n[Chellapa and Jain, 1991] Rama Chellapa and Anil Jain, editors, Markov Random Fields: Theory and Application, Academic Press, 1991.\n\n[Chou and Brown, 1990] Paul B. Chou and Christopher M. Brown, \"The Theory and Practice of Bayesian Image Labeling,\" International Journal of Computer Vision, 4:185-210, 1990.\n\n[Cooper, 1990] Paul R. Cooper, \"Parallel Structure Recognition with Uncertainty: Coupled Segmentation and Matching,\" In Proceedings of the Third International Conference on Computer Vision ICCV '90, Osaka, Japan, December 1990.\n\n[Geman and Geman, 1984] Stuart Geman and Donald Geman, \"Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,\" PAMI, 6(6):721-741, November 1984.\n\n[Hopfield and Tank, 1985] J. J. Hopfield and D. W. Tank, \"'Neural' Computation of Decisions in Optimization Problems,\" Biological Cybernetics, 52:141-152, 1985.\n\n[Kirkpatrick et al., 1983] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, \"Optimization by Simulated Annealing,\" Science, 220:671-680, 1983.\n\n[Marroquin, 1985] Jose Luis Marroquin, \"Probabilistic Solution of Inverse Problems,\" Technical report, MIT Artificial Intelligence Laboratory, September 1985.\n\n[Pearl, 1988] Judea Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, 1988.\n\n[Sher, 1987] David B. Sher, \"A Probabilistic Approach to Low-Level Vision,\" Technical Report 232, Department of Computer Science, University of Rochester, October 1987.\n\n[Swain, 1990] Michael J. 
Swain, \"Parameter Learning for Markov Random Fields with Highest Confidence First Estimation,\" Technical Report 350, Dept. of Computer Science, University of Rochester, August 1990.\n", "award": [], "sourceid": 505, "authors": [{"given_name": "Paul", "family_name": "Cooper", "institution": null}, {"given_name": "Peter", "family_name": "Prokopowicz", "institution": null}]}