{"title": "Softening Discrete Relaxation", "book": "Advances in Neural Information Processing Systems", "page_first": 438, "page_last": 444, "abstract": null, "full_text": "Softening Discrete Relaxation \n\nAndrew M. Finch, Richard C. Wilson and Edwin R. Hancock \n\nDepartment of Computer Science, \n\nUniversity of York, York, Y01 5DD, UK \n\nAbstract \n\nThis paper describes a new framework for relational graph match(cid:173)\ning. The starting point is a recently reported Bayesian consistency \nmeasure which gauges structural differences using Hamming dis(cid:173)\ntance. The main contributions of the work are threefold. Firstly, \nwe demonstrate how the discrete components of the cost func(cid:173)\ntion can be softened. The second contribution is to show how \nthe softened cost function can be used to locate matches using \ncontinuous non-linear optimisation. Finally, we show how the res(cid:173)\nulting graph matching algorithm relates to the standard quadratic \nassignment problem. \n\nIntroduction \n\n1 \nGraph matching [6, 5, 7, 2, 3, 12, 11J is a topic of central importance in pattern \nperception. The main computational issues are how to compare inexact relational \ndescriptions (7J and how to search efficiently for the best match [8J. These two issues \nhave recently stimulated interest in the connectionist literature (9, 6, 5, lOJ. For \ninstance, Simic [9], Suganathan et al. (101 and Gold et ai. [6, 51 have addressed the \nissue of how to expressively measure relational distance. Both Gold and Rangarajan \n(61 and Suganathan et al [101 have shown how non-linear optimisation techniques \nsuch as mean-field annealing [lOJ and graduated assignment [61 can be applied to \nfind optimal matches. \n\nIn a recent series of papers we have developed a Bayesian framework for relational \ngraph matching [2, 3, 11, 121. 
The novelty resides in the fact that relational consistency is gauged by a probability distribution that uses Hamming distance to measure structural differences between the graphs under match. This new framework has not only been used to match complex infra-red [3] and radar imagery [11], it has also been used to successfully control a graph-edit process [12] of the sort originally proposed by Sanfeliu and Fu [7]. The optimisation of this relational consistency measure has hitherto been confined to the use of discrete update procedures [11, 2, 3]. Examples include discrete relaxation [7, 11], simulated annealing [4, 3] and genetic search [2]. Our aim in this paper is to consider how the optimisation of the relational consistency measure can be realised by continuous means [6, 10]. Specifically, we consider how the matching process can be effected using a non-linear technique similar to mean-field annealing [10] or graduated assignment [6]. In order to achieve this goal we must transform our discrete cost function [11] into a form suitable for optimisation by continuous techniques. The key idea is to exploit the apparatus of statistical physics [13] to compute the effective Gibbs potentials for our discrete relaxation process. The potentials are in fact weighted sums of Hamming distance enumerated over the consistent relations of the model graph. The quantities of interest in the optimisation process are the derivatives of the global energy function computed from the Gibbs potentials. In the case of our weighted sum of Hamming distance, these derivatives take on a particularly interesting form which provides an intuitive insight into the dynamics of the update process. 
An experimental evaluation of the technique reveals not only that it is successful in matching noise-corrupted graphs, but that it significantly outperforms the optimisation of the standard quadratic energy function. \n\n2 Relational Consistency \n\nOur overall goal in this paper is to formulate a non-linear optimisation technique for matching relational graphs. We use the notation G = (V, E) to denote the graphs under match, where V is the set of nodes and E is the set of edges. Our aim in matching is to associate nodes in a graph G_D = (V_D, E_D) representing data to be matched against those in a graph G_M = (V_M, E_M) representing an available relational model. Formally, the matching is represented by a function f : V_D -> V_M from the nodes in the data graph G_D to those in the model graph G_M. We represent the structure of the two graphs using a pair of connection matrices. The connection matrix for the data graph consists of the binary array \n\nD_ab = 1 if (a, b) ∈ E_D, 0 otherwise   (1) \n\nwhile that for the model graph is \n\nM_αβ = 1 if (α, β) ∈ E_M, 0 otherwise   (2) \n\nThe current state of match between the two graphs is represented by the function f : V_D -> V_M. In other words, the statement f(a) = α means that the node a ∈ V_D is matched to the node α ∈ V_M. The binary representation of the current state of match is captured by a set of assignment variables which convey the following meaning \n\ns_aα = 1 if f(a) = α, 0 otherwise   (3) \n\nThe basic goal of the matching process is to optimise a consistency measure which gauges the structural similarity of the matched data graph and the model graph. In a recent series of papers, Wilson and Hancock [11, 12] have shown how consistency of match can be modelled using a Bayesian framework. 
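As a concrete illustration of this representation (a minimal sketch with our own toy graphs, not taken from the paper), the connection matrices of equations (1)-(2) and the assignment variables of equation (3) can be built as follows:

```python
def connection_matrix(n, edges):
    # Binary connection matrix of equations (1)-(2):
    # entry [a][b] is 1 iff (a, b) is an (undirected) edge.
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        m[a][b] = 1
        m[b][a] = 1
    return m

def assignment_matrix(n_data, n_model, f):
    # Assignment variables of equation (3): s[a][alpha] = 1 iff f(a) = alpha.
    return [[1 if f[a] == alpha else 0 for alpha in range(n_model)]
            for a in range(n_data)]

# toy example: a 3-node path graph matched to an identical model graph
D = connection_matrix(3, [(0, 1), (1, 2)])
M = connection_matrix(3, [(0, 1), (1, 2)])
S = assignment_matrix(3, 3, {0: 0, 1: 1, 2: 2})
```

Here each row of S is a one-hot encoding of the match function f, the discrete representation that the rest of the paper sets out to soften.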
The basic idea is to construct a probability distribution which models the effect of memoryless matching errors in generating departures from consistency between the data and model graphs. Suppose that S_α = α ∪ {β | (α, β) ∈ E_M} represents the set of nodes that form the immediate contextual neighbourhood of the node α in the model graph. Further suppose that Γ_a = f(a) ∪ {f(b) | (a, b) ∈ E_D} represents the set of matches assigned to the contextual neighbourhood of the node a ∈ V_D of the data graph. Basic to Wilson and Hancock's modelling of relational consistency is to regard the complete set of model-graph relations as mutually exclusive causes from which the potentially corrupt matched model-graph relations arise. As a result, the probability of the matched configuration Γ_a can be expressed as a mixture distribution over the corresponding space of model-graph configurations \n\nP(Γ_a) = Σ_{α ∈ V_M} P(Γ_a | S_α) P(S_α)   (4) \n\nThe modelling of the match confusion probabilities P(Γ_a | S_α) draws on the assumption that the error process is independent of location. This allows P(Γ_a | S_α) to be factorised over its component matches. Individual label errors are further assumed to act with a memoryless probability P_e. With these ingredients the probability of the matched neighbourhood Γ_a reduces to [11, 12] \n\nP(Γ_a) = (K_a / |V_M|) Σ_{α ∈ V_M} exp[-μ H(a, α)]   (5) \n\nwhere K_a = (1 - P_e)^{|Γ_a|} and the exponential constant is related to the probability of label errors, i.e. μ = ln((1 - P_e)/P_e). Consistency of match is gauged by the \"Hamming distance\" H(a, α) between the matched relation Γ_a and the set of consistent neighbourhood structures S_α, ∀α ∈ V_M, from the model graph. 
According to our binary representation of the matching process, the distance measure is computed using the connectivity matrices and the assignment variables in the following manner \n\nH(a, α) = Σ_{b ∈ V_D} Σ_{β ∈ V_M} M_αβ D_ab (1 - s_bβ)   (6) \n\nThe probability distribution P(Γ_a) may be regarded as providing a natural way of modelling departures from consistency at the neighbourhood level. Matching consistency is graded by Hamming distance, and controlled hardening may be induced by reducing the label-error probability P_e towards zero. \n\n3 The Effective Potential for Discrete Relaxation \n\nWe commence the development of our graduated assignment approach to discrete relaxation by computing an effective Gibbs potential U(Γ_a) for the matching configuration Γ_a. In other words, we aim to replace the compound exponential probability distribution appearing in equation (5) by the single Gibbs distribution \n\nP(Γ_a) ∝ exp[-μ U(Γ_a)]   (7) \n\nOur route to the effective potential is provided by statistical physics. If we represent P(Γ_a) by an equivalent Gibbs distribution with an identical partition function, then the equilibrium configurational potential is related to the partial derivative of the log-probability with respect to the coupling constant μ in the following manner [13] \n\nU(Γ_a) = - ∂ ln P(Γ_a) / ∂μ   (8) \n\nUpon substituting for P(Γ_a) from equation (5) \n\nU(Γ_a) = Σ_{α ∈ V_M} H(a, α) exp[-μ H(a, α)] / Σ_{α ∈ V_M} exp[-μ H(a, α)]   (9) \n\nIn other words, the neighbourhood Gibbs potentials are simply weighted sums of Hamming distance between the data and model graphs. In fact the local clique potentials display an interesting barrier property. The potential contribution is concentrated at Hamming distance H ≈ 1/μ. Both very large and very small Hamming distances contribute insignificantly to the energy function, i.e. lim_{H -> 0} H exp[-μH] = 0 and lim_{H -> ∞} H exp[-μH] = 0. 
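To make the measure concrete, the following sketch (a toy example of our own, not from the paper) evaluates the Hamming distance of equation (6) and the effective potential of equation (9):

```python
import math

# toy 3-node path graphs; node 2 of the data graph is wrongly matched
D = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # data-graph connection matrix
M = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # model-graph connection matrix
S = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]   # assignment variables

def hamming(D, M, S, a, alpha):
    # H(a, alpha) of equation (6)
    return sum(M[alpha][beta] * D[a][b] * (1 - S[b][beta])
               for b in range(len(D)) for beta in range(len(M)))

def effective_potential(D, M, S, a, mu):
    # U(Gamma_a) of equation (9): Hamming distances weighted by exp(-mu * H)
    hs = [hamming(D, M, S, a, alpha) for alpha in range(len(M))]
    ws = [math.exp(-mu * h) for h in hs]
    return sum(h * w for h, w in zip(hs, ws)) / sum(ws)
```

At mu = 0 the potential is the plain average of the Hamming distances over the model neighbourhoods; as mu grows, the most consistent (smallest-distance) neighbourhoods dominate, which is the controlled hardening described above.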
\nWith the neighbourhood matching potentials to hand, we construct a global \"matching-energy\" \n\nE = Σ_{a ∈ V_D} U(Γ_a)   (10) \n\nby summing the contributions over the nodes of the data graph. \n\n4 Optimising the Global Cost Function \n\nWe are now in a position to develop a continuous update algorithm by softening the discrete ingredients of our graph matching potential. The idea is to compute the derivatives of the global energy given in equation (10) and to effect the softening process using the soft-max idea of Bridle [1]. \n\n4.1 Softassign \n\nThe energy function represented by equations (9) and (10) is defined over the discrete matching variables s_aα. The basic idea underpinning this paper is to realise a continuous process for updating the assignment variables. The optimal step size is determined by computing the partial derivatives of the global matching energy with respect to the assignment variables. We commence by computing the derivatives of the contributing neighbourhood Gibbs potentials, i.e. \n\n∂U(Γ_a)/∂s_bβ = -D_ab Σ_{α ∈ V_M} ξ_aα M_αβ [1 - μ(H(a, α) - U(Γ_a))] \n\nwhere \n\nξ_aα = exp[-μ H(a, α)] / Σ_{α' ∈ V_M} exp[-μ H(a, α')]   (11) \n\nTo further develop this result, we must compute the derivatives of the Hamming distances. From equation (6) it follows that \n\n∂H(a, α)/∂s_bβ = -M_αβ D_ab   (12) \n\nIt is now a straightforward matter to show that the derivative of the global matching energy is equal to \n\n∂E/∂s_bβ = -Σ_{a ∈ V_D} D_ab Σ_{α ∈ V_M} ξ_aα M_αβ [1 - μ(H(a, α) - U(Γ_a))]   (13) \n\nWe would like our continuous matching variables to remain constrained to lie within the range [0, 1]. Rather than using a linear update rule, we exploit Bridle's soft-max ansatz [1]. In doing this we arrive at an update process which has many features in common with the well-known mean-field equations of statistical physics \n\ns_aα <- exp[-(1/T) ∂E/∂s_aα] / Σ_{α' ∈ V_M} exp[-(1/T) ∂E/∂s_aα']   (14) \n\nThe mathematical structure of this update process is important and deserves further comment. The quantity ξ_aα defined in equation (11) naturally plays the role of a matching probability. The first term appearing under the square bracket in equation (13) can therefore be thought of as analogous to the optimal update direction for the standard quadratic cost function [10, 6]; we will discuss this relationship in more detail in Section 4.2. The second term modifies this principal update direction by taking into account the weighted fluctuations in the Hamming distance about the effective potential, or average Hamming distance. If the average fluctuation is zero, then there is no net modification to the update direction. When the net fluctuation is non-zero, the direction of update is modified so as to compensate for the movement of the mean value of the effective potential. This corrective tracking process provides an explicit mechanism for maintaining contact with the minimum of the effective potential under rescaling effects induced by changes in the value of the coupling constant μ. Moreover, since the fluctuation term is itself proportional to μ, this has an insignificant effect for P_e ≈ 1/2 but dominates the update process when P_e -> 0. \n\n4.2 Quadratic Assignment Problem \n\nBefore we proceed to experiment with the new graph matching process, it is interesting to briefly review the standard quadratic formulation of the matching problem investigated by Simic [9], Suganathan et al. [10] and Gold and Rangarajan [6]. The common feature of these algorithms is to commence from the quadratic cost function \n\nE_q = -(1/2) Σ_{a ∈ V_D} Σ_{b ∈ V_D} Σ_{α ∈ V_M} Σ_{β ∈ V_M} D_ab M_αβ s_aα s_bβ   (15) \n\nIn this case the derivative of the global cost function is linear in the assignment variables, i.e. 
\n\n∂E_q/∂s_aα = -Σ_{b ∈ V_D} Σ_{β ∈ V_M} D_ab M_αβ s_bβ   (16) \n\nThis step size is equivalent to that appearing in equation (14) provided that μ = 0, i.e. P_e -> 1/2. The update is realised by applying the soft-max ansatz of equation (14). In the next section, we will provide some experimental comparison with the resulting matching process. However, it is important to stress that the update process adopted here is very simplistic and leaves considerable scope for further refinement. For instance, Gold and Rangarajan [6] have exploited the doubly stochastic properties of Sinkhorn matrices to ensure two-way symmetry in the matching process. \n\n5 Experiments and Conclusions \n\nOur main aim in this Section is to compare the non-linear update equations with the optimisation of the quadratic matching criterion described in Section 4.2. The data for our study is provided by synthetic Delaunay graphs. These graphs are constructed by generating random dot patterns. Each random dot is used to seed a Voronoi cell. The Delaunay triangulation is the region adjacency graph for the Voronoi cells. In order to pose demanding tests of our matching technique, we have added controlled amounts of corruption to the synthetic graphs. This is effected by deleting and adding a specified fraction of the dots from the initial random patterns. The associated Delaunay graph is therefore subject to structural corruption. We measure the degree of corruption by the fraction of surviving nodes in the corrupted Delaunay graph. \n\nOur experimental protocol has been as follows. For a series of different corruption levels, we have generated a sample of 100 random graphs. The graphs contain 50 nodes each. According to the specified corruption level, we have both added and deleted a predefined fraction of nodes at random locations in the initial graphs so as to maintain their overall size. 
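As an illustration of the continuous update machinery used in these experiments, here is a minimal sketch of the soft-max update of equation (14), applied for brevity to the gradient of the quadratic cost of Section 4.2; the toy graphs, temperature schedule and iteration count are our own illustrative assumptions, not the paper's experimental settings:

```python
import math

def quadratic_gradient(D, M, S, a, alpha):
    # dE_q/ds_{a,alpha} for the quadratic cost: linear in the assignments
    return -sum(D[a][b] * M[alpha][beta] * S[b][beta]
                for b in range(len(D)) for beta in range(len(M)))

def softmax_update(D, M, S, T):
    # one parallel sweep of the soft-max update of equation (14)
    new_S = []
    for a in range(len(D)):
        logits = [-quadratic_gradient(D, M, S, a, alpha) / T
                  for alpha in range(len(M))]
        top = max(logits)                      # stabilise the exponentials
        ws = [math.exp(l - top) for l in logits]
        z = sum(ws)
        new_S.append([w / z for w in ws])
    return new_S

# toy run: match a 3-node path graph to an identical model graph,
# starting from uniform assignments and slowly lowering T (log schedule)
D = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
M = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S = [[1.0 / 3.0] * 3 for _ in range(3)]
for t in range(1, 50):
    S = softmax_update(D, M, S, 1.0 / math.log(1.0 + t))
```

After annealing, each row of S remains a distribution over model nodes; on this symmetric toy problem the centre node locks onto the model centre, while the two end nodes share their mass equally between the two model end nodes (the graph automorphism leaves that ambiguity unresolved).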
For each graph we measure the quality of match by computing the fraction of the surviving nodes for which the assignment variables indicate the correct match. The value of the temperature T in the update process has been controlled using a logarithmic annealing schedule of the form suggested by Geman and Geman [4]. We initialise the assignment variables uniformly across the set of matches by setting s_aα = 1/|V_M|, ∀a, α. \n\nWe have compared the results obtained with two different versions of the matching algorithm. The first of these involves updating the softened assignment variables by applying the non-linear update equation given in (14). The second matching algorithm involves applying the same optimisation apparatus to the quadratic cost function defined in equation (15) in a simplified form of the quadratic assignment algorithm [6, 10]. \n\nFigure 1 shows the final fraction of correct matches for both algorithms. The data curves show the correct matching fraction averaged over the graph samples as a function of the corruption fraction. The main conclusion that can be drawn from these plots is that the new matching technique described in this paper significantly outperforms its conventional quadratic counterpart described in Section 4.2. The main difference between the two techniques resides in the fact that our new method relies on updating with derivatives of the energy function that are non-linear in the assignment variables. \n\nTo conclude, our main contribution in this paper has been to demonstrate how the discrete Bayesian relational consistency measure of Wilson and Hancock [11] can be cast in a form that is amenable to continuous non-linear optimisation. We have shown how the method relates to the standard quadratic assignment algorithm extensively studied in the connectionist literature [6, 9, 10]. Moreover, an experimental analysis reveals that the method offers superior performance in terms of noise control. \n\n[Figure 1: Experimental comparison: softened discrete relaxation (dotted curve); matching using the quadratic cost function (solid curve).] \n\nReferences \n\n[1] Bridle J.S., \"Training stochastic model recognition algorithms can lead to maximum mutual information estimation of parameters\", NIPS 2, pp. 211-217, 1990. \n\n[2] Cross A.D.J., R.C. Wilson and E.R. Hancock, \"Genetic search for structural matching\", Proceedings ECCV96, LNCS 1064, pp. 514-525, 1996. \n\n[3] Cross A.D.J. and E.R. Hancock, \"Relational matching with stochastic optimisation\", IEEE Computer Society International Symposium on Computer Vision, pp. 365-370, 1995. \n\n[4] Geman S. and D. Geman, \"Stochastic relaxation, Gibbs distributions and Bayesian restoration of images\", IEEE PAMI, PAMI-6, pp. 721-741, 1984. \n\n[5] Gold S., A. Rangarajan and E. Mjolsness, \"Learning with pre-knowledge: Clustering with point and graph-matching distance measures\", Neural Computation, 8, pp. 787-804, 1996. \n\n[6] Gold S. and A. Rangarajan, \"A graduated assignment algorithm for graph matching\", IEEE PAMI, 18, pp. 377-388, 1996. \n\n[7] Sanfeliu A. and Fu K.S., \"A distance measure between attributed relational graphs for pattern recognition\", IEEE SMC, 13, pp. 353-362, 1983. \n\n[8] Shapiro L. 
and R.M. Haralick, \"Structural description and inexact matching\", IEEE PAMI, 3, pp. 504-519, 1981. \n\n[9] Simic P., \"Constrained nets for graph matching and other quadratic assignment problems\", Neural Computation, 3, pp. 268-281, 1991. \n\n[10] Suganathan P.N., E.K. Teoh and D.P. Mital, \"Pattern recognition by graph matching using Potts MFT networks\", Pattern Recognition, 28, pp. 997-1009, 1995. \n\n[11] Wilson R.C., Evans A.N. and Hancock E.R., \"Relational matching by discrete relaxation\", Image and Vision Computing, 13, pp. 411-421, 1995. \n\n[12] Wilson R.C. and Hancock E.R., \"Relational matching with dynamic graph structures\", Proceedings of the Fifth International Conference on Computer Vision, pp. 450-456, 1995. \n\n[13] Yuille A., \"Generalised deformable models, statistical physics and matching problems\", Neural Computation, 2, pp. 1-24, 1990. \n", "award": [], "sourceid": 1308, "authors": [{"given_name": "Andrew", "family_name": "Finch", "institution": null}, {"given_name": "Richard", "family_name": "Wilson", "institution": null}, {"given_name": "Edwin", "family_name": "Hancock", "institution": null}]}