{"title": "Graph Matching for Shape Retrieval", "book": "Advances in Neural Information Processing Systems", "page_first": 896, "page_last": 902, "abstract": null, "full_text": "The Bias-Variance Tradeoff and the Randomized \n\nGACV \n\nGrace Wahba, Xiwu Lin and Fangyu Gao \n\nDept of Statistics \nUniv of Wisconsin \n\n1210 W Dayton Street \nMadison, WI 53706 \n\nwahba,xiwu,fgao@stat.wisc.edu \n\nDong Xiang \n\nSAS Institute, Inc. \nSAS Campus Drive \n\nCary, NC 27513 \n\nsasdxx@unx.sas.com \n\nRonald Klein, MD and Barbara Klein, MD \n\nDept of Ophthalmalogy \n610 North Walnut Street \n\nMadison, WI 53706 \n\nkleinr,kleinb@epi.ophth.wisc.edu \n\nAbstract \n\nWe propose a new in-sample cross validation based method (randomized \nGACV) for choosing smoothing or bandwidth parameters that govern the \nbias-variance or fit-complexity tradeoff in 'soft' classification. Soft clas(cid:173)\nsification refers to a learning procedure which estimates the probability \nthat an example with a given attribute vector is in class 1 vs class O. The \ntarget for optimizing the the tradeoff is the Kullback-Liebler distance \nbetween the estimated probability distribution and the 'true' probabil(cid:173)\nity distribution, representing knowledge of an infinite population. The \nmethod uses a randomized estimate of the trace of a Hessian and mimics \ncross validation at the cost of a single relearning with perturbed outcome \ndata. \n\n1 \n\nINTRODUCTION \n\nWe propose and test a new in-sample cross-validation based method for optimizing the bias(cid:173)\nvariance tradeoff in 'soft classification' (Wahba et al1994), called ranG ACV (randomized \nGeneralized Approximate Cross Validation). Summarizing from Wahba et al(l994) we are \ngiven a training set consisting of n examples, where for each example we have a vector \nt E T of attribute values, and an outcome y, which is either 0 or 1. 
Based on the training data it is desired to estimate the probability $p$ of the outcome 1 for any new examples in the future. In 'soft' classification the estimate $\hat{p}(t)$ of $p(t)$ is of particular interest, and might be used by a physician to tell patients how they might modify their risk $p$ by changing (some component of) $t$, for example, cholesterol as a risk factor for heart attack. Penalized likelihood estimates are obtained for $p$ by assuming that the logit $f(t)$, $t \in \mathcal{T}$, which satisfies $p(t) = e^{f(t)}/(1 + e^{f(t)})$, is in some space $\mathcal{H}$ of functions. Technically $\mathcal{H}$ is a reproducing kernel Hilbert space, but you don't need to know what that is to read on. Let the training set be $\{y_i, t_i, i = 1, \dots, n\}$. Letting $f_i = f(t_i)$, the negative log likelihood $\mathcal{L}\{y_i, t_i, f_i\}$ of the observations, given $f$, is

$$\mathcal{L}\{y_i, t_i, f_i\} = \sum_{i=1}^{n} [-y_i f_i + b(f_i)], \qquad (1)$$

where $b(f) = \log(1 + e^f)$. The penalized likelihood estimate of the function $f$ is the solution to: Find $f \in \mathcal{H}$ to minimize $I_\lambda(f)$:

$$I_\lambda(f) = \sum_{i=1}^{n} [-y_i f_i + b(f_i)] + J_\lambda(f), \qquad (2)$$

where $J_\lambda(f)$ is a quadratic penalty functional depending on parameter(s) $\lambda = (\lambda_1, \dots, \lambda_q)$ which govern the so-called bias-variance tradeoff. Equivalently the components of $\lambda$ control the tradeoff between the complexity of $f$ and the fit to the training data. In this paper we sketch the derivation of the ranGACV method for choosing $\lambda$, and present some preliminary but favorable simulation results, demonstrating its efficacy. This method is designed for use with penalized likelihood estimates, but it is clear that it can be used with a variety of other methods which contain bias-variance parameters to be chosen, and for which minimizing the Kullback-Leibler ($KL$) distance is the target. In the work of which this is a part, we are concerned with $\lambda$ having multiple components.
Thus, it will be highly convenient to have an in-sample method for selecting $\lambda$, if one that is accurate and computationally convenient can be found.

Let $p_\lambda$ be the estimate and $p$ be the 'true' but unknown probability function, and let $p_i = p(t_i)$, $p_{\lambda i} = p_\lambda(t_i)$. For in-sample tuning, our criterion for a good choice of $\lambda$ is the $KL$ distance $KL(p, p_\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left[ p_i \log \frac{p_i}{p_{\lambda i}} + (1 - p_i) \log \frac{1 - p_i}{1 - p_{\lambda i}} \right]$. We may replace $KL(p, p_\lambda)$ by the comparative $KL$ distance ($CKL$), which differs from $KL$ by a quantity which does not depend on $\lambda$. Letting $f_{\lambda i} = f_\lambda(t_i)$, the $CKL$ is given by

$$CKL(p, p_\lambda) \equiv CKL(\lambda) = \frac{1}{n} \sum_{i=1}^{n} [-p_i f_{\lambda i} + b(f_{\lambda i})]. \qquad (3)$$

$CKL(\lambda)$ depends on the unknown $p$, and it is desired to have a good estimate or proxy for it, which can then be minimized with respect to $\lambda$.

It is known (Wong 1992) that no exact unbiased estimate of $CKL(\lambda)$ exists in this case, so that only approximate methods are possible. A number of authors have tackled this problem, including Utans and Moody (1993), Liu (1993), and Gu (1992). The iterative UBR method of Gu (1992) is included in GRKPACK (Wang 1997), which implements general smoothing spline ANOVA penalized likelihood estimates with multiple smoothing parameters. It has been successfully used in a number of practical problems, see, for example, Wahba et al. (1994, 1995). The present work represents an approach in the spirit of GRKPACK but which employs several approximations, and may be used with any data set, no matter how large, provided that an algorithm for solving the penalized likelihood equations, either exactly or approximately, can be implemented.
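As a small numerical illustration of the quantities defined so far (a sketch of ours, not the authors' code), the following computes the negative log likelihood term $-p_i f_{\lambda i} + b(f_{\lambda i})$ stably and verifies that $KL$ and $CKL$ differ only by the $\lambda$-free entropy term $\frac{1}{n}\sum_i [p_i \log p_i + (1-p_i)\log(1-p_i)]$:

```python
import numpy as np

def ckl(p, f_lam):
    # CKL(lambda) = (1/n) sum_i [ -p_i f_{lambda i} + b(f_{lambda i}) ],
    # with b(f) = log(1 + e^f) computed stably via logaddexp
    return np.mean(-p * f_lam + np.logaddexp(0.0, f_lam))

def kl(p, p_lam):
    # KL(p, p_lambda) as defined in the text
    return np.mean(p * np.log(p / p_lam)
                   + (1 - p) * np.log((1 - p) / (1 - p_lam)))
```

For any true $p$ and fitted logits $f_\lambda$, `kl(p, sigmoid(f_lam)) - ckl(p, f_lam)` equals the entropy term above, which does not depend on $\lambda$; minimizing either criterion over $\lambda$ is therefore equivalent.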
2 THE GACV ESTIMATE

In the general penalized likelihood problem the minimizer $f_\lambda(\cdot)$ of (2) has a representation

$$f_\lambda(t) = \sum_{\nu=1}^{M} d_\nu \phi_\nu(t) + \sum_{i=1}^{n} c_i Q_\lambda(t_i, t), \qquad (4)$$

where the $\phi_\nu$ span the unpenalized part of $\mathcal{H}$, $Q_\lambda(s, t)$ is a reproducing kernel (positive definite function) for the penalized part of $\mathcal{H}$, and $c = (c_1, \dots, c_n)'$ satisfies $M$ linear conditions, so that there are (at most) $n$ free parameters in $f_\lambda$. Typically the unpenalized functions $\phi_\nu$ are low degree polynomials. If $f_\lambda(\cdot)$ is of the form (4) then $J_\lambda(f_\lambda)$ is a quadratic form in $c$. Substituting (4) into (2) results in $I_\lambda$ a convex functional in $c$ and $d$, and $c$ and $d$ are obtained numerically via a Newton-Raphson iteration, subject to the conditions on $c$. For large $n$, the second sum on the right of (4) may be replaced by $\sum_{k=1}^{K} c_{i_k} Q_\lambda(t_{i_k}, t)$, where the $t_{i_k}$ are chosen via one of several principled methods.

To obtain the GACV we begin with the ordinary leaving-out-one cross validation function $CV(\lambda)$ for the $CKL$:

$$CV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} [-y_i f_{\lambda i}^{[-i]} + b(f_{\lambda i})], \qquad (5)$$

where $f_\lambda^{[-i]}$ is the solution to the variational problem of (2) with the $i$th data point left out and $f_{\lambda i}^{[-i]}$ is the value of $f_\lambda^{[-i]}$ at $t_i$. Although $f_\lambda(\cdot)$ is computed by solving for $c$ and $d$, the GACV is derived in terms of the values $(f_1, \dots, f_n)'$ of $f$ at the $t_i$. Where there is no confusion between functions $f(\cdot)$ and vectors $(f_1, \dots, f_n)'$ of values of $f$ at $t_1, \dots, t_n$, we let $f = (f_1, \dots, f_n)'$. For any $f(\cdot)$ of the form (4), $J_\lambda(f)$ also has a representation as a non-negative definite quadratic form in $(f_1, \dots, f_n)'$. Letting $\Sigma_\lambda$ be twice the matrix of this quadratic form we can rewrite (2) as

$$I_\lambda(f, y) = \sum_{i=1}^{n} [-y_i f_i + b(f_i)] + \frac{1}{2} f' \Sigma_\lambda f. \qquad (6)$$

Let $W = W(f)$ be the $n \times n$ diagonal matrix with $\sigma_{ii} \equiv p_i(1 - p_i)$ in the $ii$th position. Using the fact that $\sigma_{ii}$ is the second derivative of $b(f_i)$, we have that $H = [W + \Sigma_\lambda]^{-1}$ is the inverse Hessian of the variational problem (6).
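The Newton-Raphson iteration for the finite-dimensional problem (6) can be sketched directly in terms of the gradient $-(y - p) + \Sigma_\lambda f$ and the Hessian $W + \Sigma_\lambda$; the snippet below is our illustration (with $\Sigma_\lambda$ supplied directly rather than assembled from the basis functions of (4)):

```python
import numpy as np

def fit_penalized_logistic(y, Sigma_lam, n_iter=25):
    """Newton-Raphson minimization of
    I_lambda(f, y) = sum_i [-y_i f_i + b(f_i)] + (1/2) f' Sigma_lam f
    over the vector f = (f_1, ..., f_n)' of values at the t_i."""
    f = np.zeros(len(y))
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-f))              # p_i = e^{f_i}/(1+e^{f_i})
        grad = -(y - p) + Sigma_lam @ f           # gradient of I_lambda at f
        hess = np.diag(p * (1 - p)) + Sigma_lam   # W + Sigma_lambda
        f = f - np.linalg.solve(hess, grad)
    return f
```

Since $I_\lambda$ is strictly convex when $\Sigma_\lambda$ is positive definite, the iteration drives the gradient to zero.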
In Xiang and Wahba (1996), several Taylor series approximations, along with a generalization of the leaving-out-one lemma (see Wahba 1990), are applied to (5) to obtain an approximate cross validation function $ACV(\lambda)$, which is a second order approximation to $CV(\lambda)$. Letting $h_{ii}$ be the $ii$th entry of $H$, the result is

$$CV(\lambda) \approx ACV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} [-y_i f_{\lambda i} + b(f_{\lambda i})] + \frac{1}{n} \sum_{i=1}^{n} \frac{h_{ii} y_i (y_i - p_{\lambda i})}{1 - h_{ii} w_{ii}}. \qquad (7)$$

Then the GACV is obtained from the ACV by replacing $h_{ii}$ by $\frac{1}{n} \sum_{i=1}^{n} h_{ii} \equiv \frac{1}{n} \mathrm{tr}(H)$ and replacing $1 - h_{ii} w_{ii}$ by $\frac{1}{n} \mathrm{tr}[I - W^{1/2} H W^{1/2}]$, giving

$$GACV(\lambda) = \frac{1}{n} \sum_{i=1}^{n} [-y_i f_{\lambda i} + b(f_{\lambda i})] + \frac{\mathrm{tr}(H)}{n} \frac{\sum_{i=1}^{n} y_i (y_i - p_{\lambda i})}{\mathrm{tr}[I - W^{1/2} H W^{1/2}]}, \qquad (8)$$

where $W$ is evaluated at $f_\lambda$. Numerical results based on an exact calculation of (8) appear in Xiang and Wahba (1996). The exact calculation is limited to small $n$, however.

3 THE RANDOMIZED GACV ESTIMATE

Given any 'black box' which, given $\lambda$ and a training set $\{y_i, t_i\}$, produces $f_\lambda(\cdot)$ as the minimizer of (2), and thence $f_\lambda = (f_{\lambda 1}, \dots, f_{\lambda n})'$, we can produce randomized estimates of $\mathrm{tr}\,H$ and $\mathrm{tr}[I - W^{1/2} H W^{1/2}]$ without any explicit calculation of these matrices. This is done by running the 'black box' on perturbed data $\{y_i + \delta_i, t_i\}$. For the $y_i$ Gaussian, randomized trace estimates of the Hessian of the variational problem (the 'influence matrix') have been studied extensively and shown to be essentially as good as exact calculations for large $n$; see for example Girard (1998). Randomized trace estimates are based on the fact that if $A$ is any square matrix and $\delta$ is a zero mean random $n$-vector with independent components with variance $\sigma_\delta^2$, then $E\delta' A \delta = \sigma_\delta^2 \,\mathrm{tr}\,A$. See Gong et al. (1998) and references cited there for experimental results with multiple regularization parameters.
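For small $n$, (8) can be evaluated exactly; the following sketch (ours, not GRKPACK) computes $GACV(\lambda)$ from $y$, $f_\lambda$ and $\Sigma_\lambda$:

```python
import numpy as np

def gacv(y, f_lam, Sigma_lam):
    # Exact evaluation of (8); requires an n x n inverse, so it is
    # feasible for small n only.
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-f_lam))
    w = p * (1 - p)                              # diagonal of W
    H = np.linalg.inv(np.diag(w) + Sigma_lam)    # inverse Hessian
    sw = np.sqrt(w)
    denom = np.trace(np.eye(n) - sw[:, None] * H * sw[None, :])
    obs = np.mean(-y * f_lam + np.logaddexp(0.0, f_lam))
    return obs + (np.trace(H) / n) * np.sum(y * (y - p)) / denom
```

The first term is the in-sample observed likelihood fit; the second is the trace-based correction that stands in for leaving-out-one.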
Returning to the 0-1 data case, it is easy to see that the minimizer $f_\lambda(\cdot)$ of $I_\lambda$ is continuous in $y$, notwithstanding the fact that in our training set the $y_i$ take on only the values 0 or 1. Letting $f_\lambda^y = (f_{\lambda 1}, \dots, f_{\lambda n})'$ be the minimizer of (6) given $y = (y_1, \dots, y_n)'$, and $f_\lambda^{y+\delta}$ be the minimizer given data $y + \delta = (y_1 + \delta_1, \dots, y_n + \delta_n)'$ (the $t_i$ remain fixed), Xiang and Wahba (1997) show, again using Taylor series expansions, that $f_\lambda^{y+\delta} - f_\lambda^y \approx [W(f_\lambda^y) + \Sigma_\lambda]^{-1} \delta$. This suggests that $\frac{1}{\sigma_\delta^2} \delta'(f_\lambda^{y+\delta} - f_\lambda^y)$ provides an estimate of $\mathrm{tr}[W(f_\lambda^y) + \Sigma_\lambda]^{-1}$. However, if we take the solution $f_\lambda^y$ to the nonlinear system for the original data $y$ as the initial value for a Newton-Raphson calculation of $f_\lambda^{y+\delta}$, things become even simpler. Applying a one step Newton-Raphson iteration gives

$$f_\lambda^{y+\delta,1} = f_\lambda^y - \left[\frac{\partial^2}{\partial f^2} I_\lambda(f_\lambda^y, y+\delta)\right]^{-1} \frac{\partial}{\partial f} I_\lambda(f_\lambda^y, y+\delta). \qquad (9)$$

Since $\frac{\partial}{\partial f} I_\lambda(f_\lambda^y, y+\delta) = -\delta + \frac{\partial}{\partial f} I_\lambda(f_\lambda^y, y) = -\delta$, and $\left[\frac{\partial^2}{\partial f^2} I_\lambda(f_\lambda^y, y+\delta)\right]^{-1} = [W(f_\lambda^y) + \Sigma_\lambda]^{-1}$, we have $f_\lambda^{y+\delta,1} = f_\lambda^y + [W(f_\lambda^y) + \Sigma_\lambda]^{-1} \delta$. The result is the following ranGACV function:

$$\mathrm{ranGACV}(\lambda) = \frac{1}{n} \sum_{i=1}^{n} [-y_i f_{\lambda i} + b(f_{\lambda i})] + \frac{\delta'(f_\lambda^{y+\delta,1} - f_\lambda^y)}{n} \cdot \frac{\sum_{i=1}^{n} y_i (y_i - p_{\lambda i})}{[\delta'\delta - \delta' W(f_\lambda^y)(f_\lambda^{y+\delta,1} - f_\lambda^y)]}. \qquad (10)$$

To reduce the variance in (10), we may draw $R$ independent replicate vectors $\delta_1, \dots, \delta_R$, and replace the term after the '+' in (10) by

$$\frac{1}{R} \sum_{r=1}^{R} \frac{\delta_r'(f_\lambda^{y+\delta_r,1} - f_\lambda^y)}{n} \cdot \frac{\sum_{i=1}^{n} y_i (y_i - p_{\lambda i})}{[\delta_r'\delta_r - \delta_r' W(f_\lambda^y)(f_\lambda^{y+\delta_r,1} - f_\lambda^y)]}$$

to obtain an $R$-replicated ranGACV function.

4 NUMERICAL RESULTS

In this section we present simulation results which are representative of more extensive simulations to appear elsewhere. In each case, $K \ll n$ was chosen by a sequential clustering algorithm. In that case, the $t_i$ were grouped into $K$ clusters and one member of each cluster selected at random.
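The ranGACV recipe above can be sketched compactly; in this illustration of ours the one-step Newton-Raphson difference $f_\lambda^{y+\delta,1} - f_\lambda^y = [W(f_\lambda^y) + \Sigma_\lambda]^{-1}\delta$ is obtained by a direct linear solve, standing in for one rerun of the 'black box' on the perturbed data:

```python
import numpy as np

def ran_gacv(y, f_lam, Sigma_lam, sigma_delta=1e-3, R=5, seed=0):
    """R-replicated ranGACV of (10).  Each replicate perturbs the
    outcomes with a Gaussian vector delta and uses the one-step
    Newton-Raphson difference [W + Sigma_lambda]^{-1} delta."""
    rng = np.random.default_rng(seed)
    n = len(y)
    p = 1.0 / (1.0 + np.exp(-f_lam))
    w = p * (1 - p)                                  # diagonal of W(f^y)
    obs = np.mean(-y * f_lam + np.logaddexp(0.0, f_lam))
    num = np.sum(y * (y - p))
    terms = []
    for _ in range(R):
        delta = rng.normal(0.0, sigma_delta, n)
        diff = np.linalg.solve(np.diag(w) + Sigma_lam, delta)
        terms.append((delta @ diff / n) * num
                     / (delta @ delta - delta @ (w * diff)))
    return obs + np.mean(terms)
```

Only matrix-vector solves against $W + \Sigma_\lambda$ are needed per replicate, which is what makes the method usable when forming or inverting $H$ explicitly is out of reach.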
The model is fit. Then the number of clusters is doubled and the model is fit again. This procedure continues until the fit does not change. In the randomized trace estimates the random variates were Gaussian. Penalty functionals were (multivariate generalizations of) the cubic spline penalty functional $\lambda \int_0^1 (f''(x))^2 dx$, and smoothing spline ANOVA models were fit.

4.1 EXPERIMENT 1. SINGLE SMOOTHING PARAMETER

In this experiment $t \in [0, 1]$, $f(t) = 2\sin(10t)$, $t_i = (i - .5)/500$, $i = 1, \dots, 500$. A random number generator produced 'observations' $y_i = 1$ with probability $p_i = e^{f_i}/(1 + e^{f_i})$, to get the training set. $Q_\lambda$ is given in Wahba (1990) for this cubic spline case; $K = 50$. Since the true $p$ is known, the true $CKL$ can be computed. Fig. 1(a) gives a plot of $CKL(\lambda)$ and 10 replicates of $\mathrm{ranGACV}(\lambda)$. In each replicate $R$ was taken as 1, and $\delta$ was generated anew as a Gaussian random vector with $\sigma_\delta = .001$. Extensive simulations with different $\sigma_\delta$ showed that the results were insensitive to $\sigma_\delta$ from 1.0 to $10^{-6}$. The minimizer of $CKL$ is at the filled-in circle and the 10 minimizers of the 10 replicates of ranGACV are the open circles. Any one of these 10 provides a rather good estimate of the $\lambda$ that goes with the filled-in circle. Fig. 1(b) gives the same experiment, except that this time $R = 5$. It can be seen that the minimizers of ranGACV become even more reliable estimates of the minimizer of $CKL$, and the $CKL$ at all of the ranGACV estimates are actually quite close to its minimum value.

4.2 EXPERIMENT 2. ADDITIVE MODEL WITH $\lambda = (\lambda_1, \lambda_2)$

Here $t \in [0, 1] \otimes [0, 1]$. $n = 500$ values of $t_i$ were generated randomly according to a uniform distribution on the unit square and the $y_i$ were generated according to $p_i = e^{f_i}/(1 + e^{f_i})$ with $t = (x_1, x_2)$ and $f(t) = 5 \sin 2\pi x_1 - 3 \sin 2\pi x_2$.
An additive model, a special case of the smoothing spline ANOVA model (see Wahba et al., 1995), of the form $f(t) = \mu + f_1(x_1) + f_2(x_2)$ with cubic spline penalties on $f_1$ and $f_2$, was used. $K = 50$, $\sigma_\delta = .001$, $R = 5$. Figure 1(c) gives a plot of $CKL(\lambda_1, \lambda_2)$ and Figure 1(d) gives a plot of $\mathrm{ranGACV}(\lambda_1, \lambda_2)$. The open circles mark the minimizer of ranGACV in both plots and the filled-in circle marks the minimizer of $CKL$. The inefficiency, as measured by $CKL(\hat\lambda)/\min_\lambda CKL(\lambda)$, is 1.01. Inefficiencies near 1 are typical of our other similar simulations.

4.3 EXPERIMENT 3. COMPARISON OF ranGACV AND UBR

This experiment used a model similar to the model fit by GRKPACK for the risk of progression of diabetic retinopathy given $t = (x_1, x_2, x_3) = $ (duration, glycosylated hemoglobin, body mass index) in Wahba et al. (1995) as 'truth'. A training set of 669 examples was generated according to that model, which had the structure $f(x_1, x_2, x_3) = \mu + f_1(x_1) + f_2(x_2) + f_3(x_3) + f_{1,3}(x_1, x_3)$. This (synthetic) training set was fit by GRKPACK and also using $K = 50$ basis functions with ranGACV. Here there are $p = 6$ smoothing parameters (there are 3 smoothing parameters in $f_{1,3}$) and the ranGACV function was searched by a downhill simplex method to find its minimizer. Since the 'truth' is known, the $CKL$ for $\hat\lambda$ and for the GRKPACK fit using the iterative UBR method were computed. This was repeated 100 times, and the 100 pairs of $CKL$ values appear in Figure 1(e). It can be seen that UBR and ranGACV give similar $CKL$ values about 90% of the time, while ranGACV has the lower $CKL$ for most of the remaining cases.

4.4 DATA ANALYSIS: AN APPLICATION

Figure 1(f) represents part of the results of a study of association at baseline of pigmentary abnormalities with various risk factors in 2585 women between the ages of 43 and 86 in the Beaver Dam Eye Study, R. Klein et al. (1995).
The attributes are: $x_1$ = age, $x_2$ = body mass index, $x_3$ = systolic blood pressure, $x_4$ = cholesterol. $x_5$ and $x_6$ are indicator variables for taking hormones and history of drinking. The smoothing spline ANOVA model fitted was $f(t) = \mu + d_1 x_1 + d_2 x_2 + f_3(x_3) + f_4(x_4) + f_{34}(x_3, x_4) + d_5 I(x_5) + d_6 I(x_6)$, where $I$ is the indicator function. Figure 1(f) represents a cross section of the fit for $x_5$ = no, $x_6$ = no, $x_2$, $x_3$ fixed at their medians and $x_1$ fixed at the 75th percentile. The dotted lines are the Bayesian confidence intervals, see Wahba et al. (1995). There is a suggestion of a borderline inverse association of cholesterol. The reason for this association is uncertain. More details will appear elsewhere.

Principled soft classification procedures can now be implemented in much larger data sets than previously possible, and the ranGACV should be applicable in general learning.

References

Girard, D. (1998), 'Asymptotic comparison of (partial) cross-validation, GCV and randomized GCV in nonparametric regression', Ann. Statist. 26, 315-334.

Girosi, F., Jones, M. & Poggio, T. (1995), 'Regularization theory and neural networks architectures', Neural Computation 7, 219-269.

Gong, J., Wahba, G., Johnson, D. & Tribbia, J. (1998), 'Adaptive tuning of numerical weather prediction models: simultaneous estimation of weighting, smoothing and physical parameters', Monthly Weather Review 125, 210-231.

Gu, C. (1992), 'Penalized likelihood regression: a Bayesian analysis', Statistica Sinica 2, 255-264.

Klein, R., Klein, B. & Moss, S. (1995), 'Age-related eye disease and survival. The Beaver Dam Eye Study', Arch Ophthalmol 113, 1995.

Liu, Y. (1993), Unbiased estimate of generalization error and model selection in neural network, manuscript, Department of Physics, Institute of Brain and Neural Systems, Brown University.

Utans, J. & Moody, J.
(1993), Selecting neural network architectures via the prediction risk: application to corporate bond rating prediction, in 'Proc. First Int'l Conf. on Artificial Intelligence Applications on Wall Street', IEEE Computer Society Press.

Wahba, G. (1990), Spline Models for Observational Data, SIAM. CBMS-NSF Regional Conference Series in Applied Mathematics, v. 59.

Wahba, G. (1995), Generalization and regularization in nonlinear learning systems, in M. Arbib, ed., 'Handbook of Brain Theory and Neural Networks', MIT Press, pp. 426-430.

Wahba, G., Wang, Y., Gu, C., Klein, R. & Klein, B. (1994), Structured machine learning for 'soft' classification with smoothing spline ANOVA and stacked tuning, testing and evaluation, in J. Cowan, G. Tesauro & J. Alspector, eds, 'Advances in Neural Information Processing Systems 6', Morgan Kaufmann, pp. 415-422.

Wahba, G., Wang, Y., Gu, C., Klein, R. & Klein, B. (1995), 'Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy', Ann. Statist. 23, 1865-1895.

Wang, Y. (1997), 'GRKPACK: Fitting smoothing spline analysis of variance models to data from exponential families', Commun. Statist. Sim. Comp. 26, 765-782.

Wong, W. (1992), Estimation of the loss of an estimate, Technical Report 356, Dept. of Statistics, University of Chicago, Chicago, IL.

Xiang, D. & Wahba, G. (1996), 'A generalized approximate cross validation for smoothing splines with non-Gaussian data', Statistica Sinica 6, 675-692, preprint TR 930 available via www.stat.wisc.edu/~wahba -> TRLIST.

Xiang, D. & Wahba, G. (1997), Approximate smoothing spline methods for large data sets in the binary case, Technical Report 982, Department of Statistics, University of Wisconsin, Madison WI. To appear in the Proceedings of the 1997 ASA Joint Statistical Meetings, Biometrics Section, pp 94-98 (1998). Also in TRLIST as above.
Figure 1: (a) and (b): Single smoothing parameter comparison of ranGACV and $CKL$. (c) and (d): Two smoothing parameter comparison of ranGACV and $CKL$. (e): Comparison of ranGACV and UBR. (f): Probability estimate from Beaver Dam Study.

Graph Matching for Shape Retrieval

Benoit Huet, Andrew D.J. Cross and Edwin R. Hancock*

Department of Computer Science, University of York
York, YO1 5DD, UK

Abstract

This paper describes a Bayesian graph matching algorithm for data-mining from large structural data-bases. The matching algorithm uses edge-consistency and node attribute similarity to determine the a posteriori probability of a query graph for each of the candidate matches in the data-base. The node feature-vectors are constructed by computing normalised histograms of pairwise geometric attributes.
Attribute similarity is assessed by computing the Bhattacharyya distance between the histograms. Recognition is realised by selecting the candidate from the data-base which has the largest a posteriori probability. We illustrate the recognition technique on a data-base containing 2500 line patterns extracted from real-world imagery. Here the recognition technique is shown to significantly outperform a number of algorithm alternatives.

1 Introduction

Since Barrow and Popplestone [1] first suggested that relational structures could be used to represent and interpret 2D scenes, there has been considerable interest in the machine vision literature in developing practical graph-matching algorithms [8, 3, 10]. The main computational issues are how to compare relational descriptions when there is significant structural corruption [8, 10] and how to search for the best match [3]. Despite resulting in significant improvements in the available methodology for graph-matching, there has been little progress in applying the resulting algorithms to large-scale object recognition problems. Most of the algorithms developed in the literature are evaluated for the relatively simple problem of matching a model-graph against a scene known to contain the relevant structure. A more realistic problem is that of taking a large number (maybe thousands) of scenes and retrieving the ones that best match the model. Although this problem is key to data-mining from large libraries of visual information, it has invariably been approached using low-level feature comparison techniques. Very little effort [7, 4] has been devoted to matching higher-level structural primitives such as lines, curves or regions.

* corresponding author erh@cs.york.ac.uk
Moreover, because of the perceived fragility of the graph matching process, there has been even less effort directed at attempting to retrieve shapes using relational information.

Here we aim to fill this gap in the literature by using graph-matching as a means of retrieving the shape from a large data-base that most closely resembles a query shape. Although the indexation of images in large data-bases is a problem of current topicality in the computer vision literature [5, 6, 9], the work presented in this paper is more ambitious. Firstly, we adopt a structural abstraction of the shape recognition problem and match using attributed relational graphs. Each shape in our data-base is a pattern of line-segments. The structural abstraction is a nearest neighbour graph for the centre-points of the line-segments. In addition, we exploit attribute information for the line patterns. Here the geometric arrangement of the line-segments is encapsulated using a histogram of Euclidean invariant pairwise (binary) attributes. For each line-segment in turn we construct a normalised histogram of relative angle and length with the remaining line-segments in the pattern. These histograms capture the global geometric context of each line-segment. Moreover, we interpret the pairwise geometric histograms as measurement densities for the line-segments which we compare using the Bhattacharyya distance.

Once we have established the pattern representation, we realise object recognition using a Bayesian graph-matching algorithm. This is a two-step process. Firstly, we establish correspondence matches between the individual tokens in the query pattern and each of the patterns in the data-base. The correspondence matches are sought so as to maximise the a posteriori measurement probability.
Once the MAP correspondence matches have been established, the second step in our recognition architecture involves selecting the line-pattern from the data-base which has maximum matching probability.

2 MAP Framework

Formally our recognition problem is posed as follows. Each ARG in the data-base is a triple, $G = (V_G, E_G, A_G)$, where $V_G$ is the set of vertices (nodes), $E_G$ is the edge set ($E_G \subset V_G \times V_G$), and $A_G$ is the set of node attributes. In our experimental example, the nodes represent line-structures segmented from 2D images. The edges are established by computing the N-nearest neighbour graph for the line-centres. Each node $j \in V_G$ is characterised by a vector of attributes, $\mathbf{x}_j$, and hence $A_G = \{\mathbf{x}_j ; j \in V_G\}$. In the work reported here the attribute-vector represents the contents of a normalised pairwise attribute histogram.

The data-base of line-patterns is represented by the set of ARGs $D = \{G\}$. The goal is to retrieve from the data-base $D$ the individual ARG that most closely resembles a query pattern $Q = (V_Q, E_Q, A_Q)$. We pose the retrieval process as one of associating with the query the graph from the data-base that has the largest a posteriori probability. In other words, the class identity of the graph which most closely corresponds to the query is

$$\omega_Q = \arg\max_{G' \in D} P(G'|Q)$$

However, since we wish to make a detailed structural comparison of the graphs, rather than comparing their overall statistical properties, we must first establish a set of best-match correspondences between each ARG in the data-base and the query $Q$. The set of correspondences between the query $Q$ and the ARG $G$ is a relation $f_G : V_G \mapsto V_Q$ over the vertex sets of the two graphs. The mapping function consists of a set of Cartesian pairings between the nodes of the two graphs, i.e. $f_G = \{(a, \alpha); a \in V_G, \alpha \in V_Q\} \subseteq V_G \times V_Q$.
Although this may appear to be a brute force method, it must be stressed that we view this process of correspondence matching as the final step in the filtering of the line-patterns. We provide more details of practical implementation in the experimental section of this paper.

With the correspondences to hand we can re-state our maximum a posteriori probability recognition objective as a two step process. For each graph $G$ in turn, we locate the maximum a posteriori probability mapping function $f_G$ onto the query $Q$. The second step is to perform recognition by selecting the graph whose mapping function results in the largest matching probability. These two steps are succinctly captured by the following statement of the recognition condition

$$\omega_Q = \arg\max_{G' \in D} \max_{f_{G'}} P(f_{G'}|G', Q)$$

This global MAP condition is developed into a useful local update formula by applying the Bayes formula to the a posteriori matching probability. The simplification is as follows

$$P(f_G|G, Q) = \frac{p(A_G, A_Q|f_G)\, P(f_G|V_G, E_G, V_Q, E_Q)\, P(V_G, E_G)\, P(V_Q, E_Q)}{P(G)\, P(Q)}$$

The terms on the right-hand side of the Bayes formula convey the following meaning. The conditional measurement density $p(A_G, A_Q|f_G)$ models the measurement similarity of the node-sets of the two graphs. The conditional probability $P(f_G|E_G, E_Q)$ models the structural similarity of the two graphs under the current set of correspondence matches. The assumptions used in developing our simplification of the a posteriori matching probability are as follows. Firstly, we assume that the joint measurements are conditionally independent of the structure of the two graphs provided that the set of correspondences is known, i.e. $P(A_G, A_Q|f_G, E_G, V_G, E_Q, V_Q) = P(A_G, A_Q|f_G)$. Secondly, we assume that there is conditional independence of the two graphs in the absence of correspondences.
In other words, $P(V_G, E_G, V_Q, E_Q) = P(V_Q, E_Q)\, P(V_G, E_G)$ and $P(G, Q) = P(G)\, P(Q)$. Finally, the graph priors $P(V_G, E_G)$, $P(V_Q, E_Q)$, $P(G)$ and $P(Q)$ are taken as uniform and are eliminated from the decision making process.

To continue our development, we first focus on the conditional measurement density, $p(A_G, A_Q|f_G)$, which models the process of comparing attribute similarity on the nodes of the two graphs. Assuming statistical independence of node attributes, the conditional measurement density $p(A_G, A_Q|f_G)$ can be factorised over the Cartesian pairs $(a, \alpha) \in V_G \times V_Q$ which constitute the correspondence match $f_G$ in the following manner

$$p(A_G, A_Q|f_G) = \prod_{(a, \alpha) \in f_G} p(\mathbf{x}_a, \mathbf{x}_\alpha \,|\, f_G(a) = \alpha)$$

As a result the correspondence matches may be optimised using a simple node-by-node discrete relaxation procedure. The rule for updating the match assigned to the node $a$ of the graph $G$ is

$$f_G(a) = \arg\max_{\alpha \in V_Q \cup \{\phi\}} p(\mathbf{x}_a, \mathbf{x}_\alpha \,|\, f(a) = \alpha)\, P(f_G|E_G, E_Q)$$

In order to model the structural consistency of the set of assigned matches, we turn to the framework recently reported by Finch, Wilson and Hancock [2]. This work provides a framework for computing graph-matching energies using the weighted Hamming distance between matched cliques. Since we are dealing with a large-scale object recognition system, we would like to minimise the computational overheads associated with establishing correspondence matches. For this reason, rather than working with graph neighbourhoods or cliques, we chose to work with the relational units of the smallest practical size. In other words we satisfy ourselves with measuring consistency at the edge level. For edge-units, the structural matching probability $P(f_G|V_G, E_G, V_Q, E_Q)$ is computed from the formula

$$P(f_G|V_G, E_G, V_Q, E_Q) \propto \prod_{(a,b) \in E_G} \prod_{(\alpha,\beta) \in E_Q} (1 - P_e)^{S_{a,\alpha} S_{b,\beta}}\, P_e^{(1 - S_{a,\alpha} S_{b,\beta})}$$

where $P_e$ is the probability of an error appearing on one of the edges of the matched structure.
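To make the edge-level consistency model concrete, here is a small sketch of ours (with hypothetical edge lists and a candidate mapping held as a dictionary) that evaluates the logarithm of the edge-level structural matching term: each pair of edges contributes $\ln(1-P_e)$ when both endpoints are consistently matched and $\ln P_e$ otherwise:

```python
import math

def structural_log_prob(E_G, E_Q, f, P_e=0.1):
    """Log of the edge-level structural matching term for the
    candidate mapping f (a dict from nodes of G to nodes of Q)."""
    total = 0.0
    for (a, b) in E_G:
        for (alpha, beta) in E_Q:
            # s plays the role of S_{a,alpha} S_{b,beta}
            s = 1 if (f.get(a) == alpha and f.get(b) == beta) else 0
            total += s * math.log(1 - P_e) + (1 - s) * math.log(P_e)
    return total
```

A consistent mapping scores higher: with `E_G = E_Q = [(0, 1)]` and `P_e = 0.1`, the mapping `{0: 0, 1: 1}` scores $\ln 0.9$ while `{0: 0, 1: 2}` scores $\ln 0.1$.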
The $S_{a,\alpha}$ are assignment variables which are used to represent the current state of match and convey the following meaning

$$S_{a,\alpha} = \begin{cases} 1 & \text{if } f_G(a) = \alpha \\ 0 & \text{otherwise} \end{cases}$$

3 Histogram-based consistency

We now furnish some details of the shape retrieval task used in our experimental evaluation of the recognition method. In particular, we focus on the problem of recognising 2D line patterns in a manner which is invariant to rotation, translation and scale. The raw information available for each line segment is its orientation (angle with respect to the horizontal axis) and its length (see figure 1). To illustrate how the Euclidean invariant pairwise feature attributes are computed, suppose that we denote the line segments associated with the nodes indexed $a$ and $b$ by the vectors $\mathbf{y}_a$ and $\mathbf{y}_b$ respectively. The vectors are directed away from their point of intersection. The pairwise relative angle attribute is given by

$$\theta_{a,b} = \arccos\left[\frac{\mathbf{y}_a}{|\mathbf{y}_a|} \cdot \frac{\mathbf{y}_b}{|\mathbf{y}_b|}\right]$$

From the relative angle we compute the directed relative angle. This involves giving the relative angle a positive sign if the direction of the angle from the baseline $\mathbf{y}_a$ to its partner $\mathbf{y}_b$ is clockwise and a negative sign if it is counter-clockwise. This allows us to extend the range of angles describing pairs of segments from $[0, \pi]$ to $[-\pi, \pi]$.

Figure 1: Geometry for shape representation

The directed relative position $\vartheta_{a,b}$ is represented by the normalised length ratio between the oriented baseline vector $\mathbf{y}_a$, of length $D_{ab}$, and the vector of length $D'_{ab}$ joining the end ($b$) of the baseline segment ($ab$) to the intersection of the segment pair ($cd$):

$$\vartheta_{a,b} = \frac{1}{\frac{1}{2}\left(1 + \frac{2 D'_{ab}}{D_{ab}}\right)}$$
A relative position of 0 indicates that the two segments are parallel, while a relative position of 1 indicates that the two segments intersect at the middle point of the baseline.

The Euclidean invariant angle and position attributes \theta_{a,b} and \vartheta_{a,b} are binned in a histogram. Suppose that S_a(\mu, \nu) = \{(a, b) \mid \theta_{a,b} \in A_\mu \wedge \vartheta_{a,b} \in R_\nu \wedge b \in V_G\} is the set of nodes whose pairwise geometric attributes with the node a are spanned by the range of directed relative angles A_\mu and the relative position attribute range R_\nu. The contents of the histogram bin spanning the two attribute ranges are given by H_a(\mu, \nu) = |S_a(\mu, \nu)|. Each histogram contains n_A relative angle bins and n_R length ratio bins. The normalised geometric histogram bin-entries are computed as follows

h_a(\mu, \nu) = \frac{H_a(\mu, \nu)}{\sum_{\mu'=1}^{n_A} \sum_{\nu'=1}^{n_R} H_a(\mu', \nu')}

The probability of match between the pattern-vectors is computed using the Bhattacharyya distance between the normalised histograms.

P(f(a) = \alpha \mid x_a, x_\alpha) = \sum_{\mu=1}^{n_A} \sum_{\nu=1}^{n_R} \sqrt{h_a(\mu, \nu) h_\alpha(\mu, \nu)} = \exp[-B_{a,\alpha}]

With this modelling ingredient, the condition for recognition is

W_Q = \arg\max_{G} \sum_{(a,b) \in E_G} \sum_{(\alpha,\beta) \in E_Q} \{-B_{a,\alpha} - B_{b,\beta} + \ln(1 - P_e) S_{a,\alpha} S_{b,\beta} + \ln P_e (1 - S_{a,\alpha} S_{b,\beta})\}

4 Experiments

The aim in this section is to evaluate the graph-based recognition scheme on a data-base of real-world line-patterns. We have conducted our recognition experiments with a data-base of 2500 line-patterns each containing over a hundred lines. The line-patterns have been obtained by applying line/edge detection algorithms to the raw grey-scale images followed by polygonisation. For each line-pattern in the data-base, we construct the six-nearest neighbour graph.
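The geometric histogram and the Bhattacharyya-style match probability described in Section 3 can be sketched as follows. The bin counts n_A = n_R = 8 and the uniform binning scheme are illustrative assumptions; the paper does not fix them here.

```python
import numpy as np

def geometric_histogram(pairs, n_A=8, n_R=8):
    """Bin a node's pairwise (directed relative angle, relative position)
    attributes into an n_A x n_R count histogram H_a(mu, nu), then
    normalise to unit sum to obtain h_a(mu, nu). Angles lie in
    [-pi, pi], positions in (0, 1]; uniform bins are assumed."""
    H = np.zeros((n_A, n_R))
    for theta, pos in pairs:
        mu = min(int((theta + np.pi) / (2 * np.pi) * n_A), n_A - 1)
        nu = min(int(pos * n_R), n_R - 1)
        H[mu, nu] += 1.0
    return H / H.sum()

def match_probability(h_a, h_alpha):
    """Match probability exp(-B_{a,alpha}) computed as the Bhattacharyya
    coefficient of the two normalised histograms, where B is the
    Bhattacharyya distance -ln sum sqrt(h_a * h_alpha)."""
    return float(np.sum(np.sqrt(h_a * h_alpha)))
```

Identical histograms give a match probability of exactly 1, and disjoint histograms give 0, so the quantity behaves as a similarity weight in the recognition energy.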
The feature extraction process together with other details of the data used in our study are described in recent papers where we have focussed on the issues of histogram representation [4] and the optimal choice of the relational structure for the purposes of recognition. In order to prune the set of line-patterns for detailed graph-matching we select about 10% of the data-base using a two-step process. This consists of first refining the data-base using a global histogram of pairwise attributes [4]. The top quartile of matches selected in this way are then further refined using a variant of the Hausdorff distance to select the set of pairwise attributes that best match against the query.

The recognition task is posed as one of recovering the line-pattern which most closely resembles a digital map. The original images from which our line-patterns have been obtained are from a number of diverse sources. However, a subset of the images are aerial infra-red line-scan views of southern England. Two of these infra-red images correspond to different views of the area covered by the digital map. These views are obtained when the line-scan device is flying at different altitudes. The line-scan device used to obtain the aerial images introduces severe barrel distortions, and hence the map and aerial images are not simply related via a Euclidean or affine transformation. The remaining line-patterns in the data-base have been extracted from trademarks and logos. It is important to stress that although the raw images are obtained from different sources, there is nothing salient about their associated line-pattern representations that allows us to distinguish them from one another.
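The Hausdorff-based refinement step of the pruning process can be sketched as follows. The paper leaves its "variant" of the Hausdorff distance unspecified, so this uses the plain directed Hausdorff distance over sets of 2D attribute pairs as a hedged stand-in; the keep_fraction parameter is likewise an assumption.

```python
def directed_hausdorff(A, B):
    """Directed Hausdorff distance between two finite sets of 2D
    attribute pairs: the max over a in A of the min over b in B of the
    Euclidean distance. A stand-in for the unspecified variant used in
    the refinement step."""
    return max(min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 for b in B)
               for a in A)

def prune(query_attrs, database, keep_fraction=0.1):
    """Rank the model line-patterns (each represented by its set of
    pairwise attributes) by distance to the query's attribute set and
    keep roughly the requested fraction for detailed graph-matching."""
    ranked = sorted(database,
                    key=lambda attrs: directed_hausdorff(query_attrs, attrs))
    k = max(1, int(len(ranked) * keep_fraction))
    return ranked[:k]
```

This keeps the expensive iterative matching confined to a short candidate list, which is the point of the two-step selection described above.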
\n\n\fGraph Matchingfor Shape Retrieval \n\n901 \n\n(a) Digital Map \n\n(b) Target 1 \n\n(c) Target 2 \n\nFigure 2: Images from the data-base \n\nMoreover, since it is derived from a digital map rather than one of the images in \nthe data-base, the query is not identical to any of the line-patterns in the model \nlibrary. \n\nWe aim to assess the importance of different attributes representation on the re(cid:173)\ntrieval process. To this end, we compare node-based and the histogram-based at(cid:173)\ntribute representation. \\Ve also consider the effect of taking the relative angle and \nrelative position attributes both singly and in tandem. The final aspect of the \ncomparison is to consider the effects of using the attributes purely for initialisation \npurposes and also in a persistent way during the iteration of the matching process. \nTo this end we consider the following variants of our algorithm . \n\n\u2022 Non-Persistent Attributes: Here we ignore the attribute information \nprovided by the node-histograms after the first iteration and attempt to \nmaximise the structural congruence of the graphs . \n\n\u2022 Local attributes: Here we use only the single node attributes rather than \n\nan attribute histogram to model the a posteriori matching probabilities. \n\nGraph Matching Strategy \n\nRetrieval \nAccuracy \n\nIterations \nper recall \n\nReI. Position Attribute iInitialisation only) \nReI. Angle Attribute (Initialisation only) \n\nReI. Angle + Position Attributes (Initialisation only) \n\n1D ReI. Position Histogram (Initialisation only) \n1D ReI. Angle Histogram (Initialisation only) \n\n2D Histogram (Initialisation only) \nReI. Position Attribute (Persistent) \nReI. Angle Attribute (Persistent) \n\nReI. Angle + Position Attributes (Persistent) \n\n1D ReI. Position Histogram (Persistent) \n1D ReI. 
Angle Histogram (Persistent) \n\n39% \n45% \n58% \n42% \n59% \n68% \n63% \n89% \n98% \n66% \n92% \n100% \n\n5.2 \n4.75 \n4.27 \n4.7 \n4.2 \n3.9 \n3.96 \n3.59 \n3.31 \n3.46 \n3.23 \n3.12 \n\n2D Histogram (Persistent) \n\nTable 1: Recognition performance of various recognition strategies averaged over \n26 queries in a database of 260 line-patterns \nIn Table 1 we present the recognition performance for each of the recognition strate(cid:173)\ngies in turn. The table lists the recall performance together with the average number \n\n\f902 \n\nB. Huet, A. D. 1. Cross and E. R. Hancock \n\nof iterations per recall for each of the recognition strategies in turn. The main fea(cid:173)\ntures to note from this table are as follows . Firstly, the iterative recall using the full \nhistogram representation outperforms each of the remaining recognition methods \nin terms of both accuracy and computational overheads. Secondly, it is interesting \nto compare the effect of using the histogram in the initialisation-only and iteration \npersistent modes. In the latter case the recall performance is some 32% better than \nin the former case. In the non-persistent mode the best recognition accuracy that \ncan be obtained is 68%. Moreover, the recall is typically achieved in only 3.12 it(cid:173)\nerations as opposed to 3.9 (average over 26 queries on a database of 260 images) . \nFinally, the histogram representation provides better performance, and more signif(cid:173)\nicantly, much faster recall than the single-attribute similarity measure. When the \nattributes are used singly, rather than in tandem, then it is the relative angle that \nappears to be the most powerful. \n\n5 Conclusions \n\nWe have presented a practical graph-matching algorithm for data-mining in large \nstructural libraries. The main conclusion to be drawn from this study is that the \ncombined use of structural and histogram information improves both recognition \nperformance and recall speed. 
There are a number of ways in which the ideas presented in this paper can be extended. Firstly, we intend to explore a more perceptually meaningful representation of the line patterns, using grouping principles derived from Gestalt psychology. Secondly, we are exploring the possibility of formulating the filtering of line-patterns prior to graph matching using Bayes decision trees.

References

[1] H. Barrow and R. Popplestone. Relational descriptions in picture processing. Machine Intelligence, 5:377-396, 1971.

[2] A. Finch, R. Wilson, and E. Hancock. Softening discrete relaxation. Advances in NIPS 9, edited by M. Mozer, M. Jordan and T. Petsche, MIT Press, pages 438-444, 1997.

[3] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE PAMI, 18:377-388, 1996.

[4] B. Huet and E. Hancock. Relational histograms for shape indexing. IEEE ICCV, pages 563-569, 1998.

[5] W. Niblack et al. The QBIC project: Querying images by content using color, texture and shape. Image and Vision Storage and Retrieval, pages 173-187, 1993.

[6] A. P. Pentland, R. W. Picard, and S. Sclaroff. Photobook: tools for content-based manipulation of image databases. Storage and Retrieval for Image and Video Databases II, pages 34-47, February 1994.

[7] K. Sengupta and K. Boyer. Organising large structural databases. IEEE PAMI, 17(4):321-332, 1995.

[8] L. Shapiro and R. Haralick. A metric for comparing relational descriptions. IEEE PAMI, 7(1):90-94, 1985.

[9] M. Swain and D. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11-32, 1991.

[10] R. Wilson and E. R. Hancock. Structural matching by discrete relaxation. IEEE PAMI, 19(6):634-648, June 1997.
\n\n\f", "award": [], "sourceid": 1484, "authors": [{"given_name": "Benoit", "family_name": "Huet", "institution": null}, {"given_name": "Andrew", "family_name": "Cross", "institution": null}, {"given_name": "Edwin", "family_name": "Hancock", "institution": null}]}