{"title": "Clustering data through an analogy to the Potts model", "book": "Advances in Neural Information Processing Systems", "page_first": 416, "page_last": 422, "abstract": null, "full_text": "Clustering data through an analogy to the Potts model \n\nMarcelo Blatt, Shai Wiseman and Eytan Domany \n\nDepartment of Physics of Complex Systems, The Weizmann Institute of Science, Rehovot 76100, Israel \n\nAbstract \n\nA new approach to clustering is proposed. The method is based on an analogy to a physical model: the ferromagnetic Potts model at thermal equilibrium is used as an analog computer for this hard optimization problem. We do not assume any structure of the underlying distribution of the data. The phase space of the Potts model is divided into three regions: the ferromagnetic, super-paramagnetic and paramagnetic phases. The region of interest is the super-paramagnetic one, where domains of aligned spins appear. The range of temperatures where these structures are stable is indicated by a non-vanishing magnetic susceptibility. We use a very efficient Monte Carlo algorithm to measure the susceptibility and the spin-spin correlation function. The values of the spin-spin correlation function in the super-paramagnetic phase serve to identify the partition of the data points into clusters. \n\nMany natural phenomena can be viewed as optimization processes, and the drive to understand and analyze them has yielded powerful mathematical methods. Thus, when wishing to solve a hard optimization problem, it may be advantageous to apply these methods through a physical analogy. Indeed, techniques from statistical physics have recently been adapted to solving hard optimization problems (see e.g. Yuille and Kosowsky, 1994). In this work we formulate the problem of clustering in terms of a ferromagnetic Potts spin model. 
Using the Monte Carlo method we estimate physical quantities such as the spin-spin correlation function and the susceptibility, and deduce from them the number of clusters and the cluster sizes. \n\nCluster analysis is an important technique in exploratory data analysis and is applied in a variety of engineering and scientific disciplines. The problem of partitional clustering can be formally stated as follows. With every one of the i = 1, 2, ..., N patterns represented as a point x_i in a d-dimensional metric space, determine the partition of these N points into M groups, called clusters, such that points in a cluster are more similar to each other than to points in different clusters. The value of M also has to be determined. \n\nThe two main approaches to partitional clustering are called parametric and non-parametric. In parametric approaches some knowledge of the clusters' structure is assumed (e.g. each cluster can be represented by a center and a spread around it). This assumption is incorporated in a global criterion, and the goal is to assign the data points so that the criterion is minimized. A typical example is variance minimization (Rose, Gurewitz, and Fox, 1993). In non-parametric approaches, on the other hand, a local criterion is used to build clusters by utilizing the local structure of the data. For example, clusters can be formed by identifying high-density regions in the data space, or by assigning a point and its K nearest neighbors to the same cluster. In recent years many parametric partitional clustering algorithms rooted in statistical physics have been presented (see e.g. Buhmann and Kuhnel, 1993). In the present work we use methods of statistical physics in non-parametric clustering. \n\nOur aim is to use a physical problem as an analog to the clustering problem. 
The notion of clusters arises very naturally in Potts spin models (Wang and Swendsen, 1990), where clusters are closely related to ordered regions of spins. We place a Potts spin variable s_i at each point x_i (which represents one of the patterns), and introduce a short-range ferromagnetic interaction J_ij between pairs of spins, whose strength decreases as the inter-spin distance ||x_i - x_j|| increases. The system is governed by the Hamiltonian (energy function) \n\nH = - Σ_<i,j> J_ij δ_{s_i,s_j} ,    s_i = 1, ..., q ,    (1) \n\nwhere the notation <i,j> stands for neighboring points i and j, in a sense that is defined later. We then study the ordering properties of this inhomogeneous Potts model. \n\nAs a concrete example, place a Potts spin at each of the data points of figure 1. \n\nFigure 1: This data set is made of three rectangles, each consisting of 800 points uniformly distributed, and a uniform rectangular background of lower density, also consisting of 800 points. Points classified (with T_clus = 0.08 and θ = 0.5) as belonging to the three largest clusters are marked by crosses, squares and x's. The fourth cluster is of size 2, and all others are single-point clusters, marked by triangles. \n\nAt high temperatures the system is in a disordered (paramagnetic) phase. As the temperature is lowered, larger and larger regions of high density of points (or spins) exhibit local ordering, until a phase transition occurs and the spins in the three rectangular high-density regions become completely aligned, i.e. within each region all s_i take the same value (the super-paramagnetic phase). The aligned regions define the clusters which we wish to identify. As the temperature is further lowered, a pseudo-transition occurs and the system becomes completely ordered (ferromagnetic). 
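As an illustration of the model just described, the energy of a spin configuration under Hamiltonian (1) can be sketched in a few lines. The Gaussian decay of J_ij with distance and the K-nearest-neighbor choice of <i,j> are assumptions made here for concreteness; the text defines <i,j> precisely only later.

```python
import math

def potts_energy(points, spins, k=2, a=1.0):
    """Energy H = -sum over neighbor pairs <i,j> of J_ij * delta(s_i, s_j).

    Illustrative choices (not prescribed by the text): <i,j> are K-nearest-
    neighbor pairs, and J_ij = exp(-d_ij^2 / (2 a^2)) decays with distance.
    """
    n = len(points)
    H = 0.0
    for i in range(n):
        # the k nearest neighbors of point i (excluding i itself)
        nbrs = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))[1:k + 1]
        for j in nbrs:
            if j > i:  # count each unordered pair once
                J = math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * a ** 2))
                if spins[i] == spins[j]:
                    H -= J  # aligned neighboring spins lower the energy
    return H
```

For three collinear points with all spins aligned, every neighbor pair contributes -J_ij, whereas a configuration with all spins distinct has zero energy, which is the ferromagnetic character the construction relies on.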
\n\n1 A mean field model \n\nTo support our main idea, we analyze an idealized set of points for which the division into natural classes is distinct. The points are divided into M groups. The distance between any two points within the same group is d_1, while the distance between any two points belonging to different groups is d_2 > d_1 (d can be regarded as a similarity index). Following our main idea, we associate a Potts spin with each point, an interaction J_1 between points separated by distance d_1, and an interaction J_2 between points separated by d_2, where 0 ≤ J_2 < J_1. Hence the Hamiltonian (1) becomes \n\nH = - J_1 Σ_μ Σ_{i<j ∈ μ} δ_{s_i,s_j} - J_2 Σ_{μ<ν} Σ_{i ∈ μ, j ∈ ν} δ_{s_i,s_j} ,    (2) \n\nwhere μ, ν = 1, ..., M label the groups. \n\nTwo neighboring points i and j whose spin-spin correlation G_ij exceeds a threshold θ are defined as \"friends\". Then all mutual friends (including friends of friends, etc.) are assigned to the same cluster. We chose θ = 0.5. \n\nIn order to show how this algorithm works, let us consider the distribution of points presented in figure 1. Because of the overlap of the larger sparse rectangle with the smaller rectangles, and due to statistical fluctuations, the three dense rectangles actually contain 883, 874 and 863 points. Going through steps (a) to (d), we obtained the susceptibility as a function of the temperature, as presented in figure 3. The susceptibility χ is maximal at T_max = 0.03 and vanishes at T_vanish = 0.13. In figure 1 we present the clusters obtained according to steps (f) and (g) at T_clus = 0.08. The sizes of the largest clusters, in descending order, are 900, 894, 877 and 2, and all the rest are composed of only one point. The three biggest clusters correspond to the clusters we are looking for, while the background is decomposed into clusters of size one. \n\nFigure 3: The susceptibility density as a function of the temperature. 
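The \"friends\" construction above amounts to taking the connected components of the graph whose edges are pairs with correlation above θ. A minimal sketch, assuming a precomputed spin-spin correlation matrix G (the numerical values below are made up for illustration):

```python
def correlation_clusters(G, theta=0.5):
    """Pairs with spin-spin correlation G[i][j] > theta are 'friends'; clusters
    are the connected components of the friendship graph (friends of friends),
    found here with a simple union-find structure."""
    n = len(G)
    parent = list(range(n))

    def find(x):
        # follow parent links to the root, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if G[i][j] > theta:
                parent[find(i)] = find(j)  # merge the two friends' components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Because the friendship relation is closed transitively, a chain of pairwise correlations above θ joins its endpoints into one cluster even if their direct correlation is low, which matches the \"friends of friends\" rule in the text.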
\n\nLet us discuss the effect of the parameters on the procedure. The number of Potts states, q, mainly determines the sharpness of the transition and the critical temperature: the higher q, the sharper the transition. On the other hand, more statistics (more SW sweeps) must be gathered as the value of q increases. From our simulations we conclude that the influence of q is very weak. The maximal number of neighbors, K, also affects the results very little; we obtained quite similar results for a wide range of K (5 ≤ K ≤ 20). \n\nNo dramatic changes were observed in the classification when choosing clustering temperatures T_clus other than that suggested in (e). However, this choice is clearly ad hoc, and a better choice should be found. Our method does not provide a natural way to choose a threshold θ for the spin-spin correlation function. In practice, though, the classification is not very sensitive to the value of θ, and values in the range 0.2 < θ < 0.8 yield similar results. The reason is that the frequency distribution of the values of the spin-spin correlation function exhibits two peaks, one close to 1 and the other close to 0, while for intermediate values it is very close to zero. In figure 4 we present the average size of the largest SW cluster as a function of the temperature, along with the size of the largest cluster obtained by the thresholding procedure (described in (g)) using three different threshold values, θ = 0.2, 0.5 and 0.9. Note the agreement between the largest cluster size defined by the threshold θ = 0.5 and the average size of the largest SW cluster at all temperatures (this agreement holds for the smaller clusters as well). It supports our thresholding procedure as a sensible one at all temperatures. \n\nFigure 4: Average size of the largest SW cluster as a function of the temperature, denoted by the solid line. The triangles, x's and squares denote the size of the largest cluster obtained with thresholds θ = 0.2, 0.5 and 0.9, respectively. \n\n6 Discussion \n\nOther methods that were proposed previously, such as Fukunaga's (1990), can be formulated as a Metropolis relaxation of a ferromagnetic Potts model at T = 0. The clusters are then determined by the points having the same spin value at the local minimum of the energy at which the relaxation process terminates. Clearly this procedure depends strongly on the initial conditions, and there is a high probability of getting stuck in a metastable state that does not correspond to the desired answer. Such a T = 0 method does not provide any way to distinguish between \"good\" and \"bad\" metastable states. We applied Fukunaga's method to the data set of figure 1 using many different initial conditions; the right answer was never obtained. In all runs, domain walls appeared that broke a cluster into two or more parts. Our method generalizes Fukunaga's by introducing a finite temperature at which the division into clusters is stable. In addition, the SW dynamics is completely insensitive to the initial conditions and extremely efficient. \n\nWork in progress shows that our method is especially suitable for hierarchical clustering. This is done by identifying clusters at several temperatures, chosen according to features of the susceptibility curve. In particular, our method is successful in dealing with \"real life\" problems such as the Iris data and the Landsat data. \n\nAcknowledgments \n\nWe thank I. Kanter for many useful discussions. 
This research has been supported by the US-Israel Bi-national Science Foundation (BSF) and the Germany-Israel Science Foundation (GIF). \n\nReferences \n\nJ.M. Buhmann and H. Kuhnel (1993); Vector quantization with complexity costs, IEEE Trans. Inf. Theory 39, 1133. \n\nK. Fukunaga (1990); Introduction to Statistical Pattern Recognition, Academic Press. \n\nK. Rose, E. Gurewitz, and G.C. Fox (1993); Constrained clustering as an optimization method, IEEE Trans. Patt. Anal. Mach. Intell. PAMI 15, 785. \n\nS. Wang and R.H. Swendsen (1990); Cluster Monte Carlo algorithms, Physica A 167, 565. \n\nF.Y. Wu (1982); The Potts model, Rev. Mod. Phys. 54, 235. \n\nA.L. Yuille and J.J. Kosowsky (1994); Statistical physics algorithms that converge, Neural Computation 6, 341.", "award": [], "sourceid": 1092, "authors": [{"given_name": "Marcelo", "family_name": "Blatt", "institution": null}, {"given_name": "Shai", "family_name": "Wiseman", "institution": null}, {"given_name": "Eytan", "family_name": "Domany", "institution": null}]}