{"title": "Measures of Clustering Quality: A Working Set of Axioms for Clustering", "book": "Advances in Neural Information Processing Systems", "page_first": 121, "page_last": 128, "abstract": "Aiming towards the development of a general clustering theory, we discuss abstract axiomatization for clustering. In this respect, we follow up on the work of Kelinberg, (Kleinberg) that showed an impossibility result for such axiomatization. We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in Kleinberg. As opposed to previous work focusing on clustering functions, we propose to address clustering quality measures as the primitive object to be axiomatized. We show that principles like those formulated in Kleinberg's axioms can be readily expressed in the latter framework without leading to inconsistency. A clustering-quality measure is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how `strong' or `conclusive' the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (`axioms') that express these requirement and extend the translation of Kleinberg's axioms to our framework. We propose several natural clustering quality measures, all satisfying the proposed axioms. In addition, we show that the proposed clustering quality can be computed in polynomial time.", "full_text": "Measures of Clustering Quality: A Working Set of\n\nAxioms for Clustering\n\nMargareta Ackerman and Shai Ben-David\n\nSchool of Computer Science\nUniversity of Waterloo, Canada\n\nAbstract\n\nAiming towards the development of a general clustering theory, we discuss ab-\nstract axiomatization for clustering. In this respect, we follow up on the work of\nKleinberg, ([1]) that showed an impossibility result for such axiomatization. 
We argue that an impossibility result is not an inherent feature of clustering, but rather, to a large extent, it is an artifact of the specific formalism used in [1].\nAs opposed to previous work focusing on clustering functions, we propose to address clustering-quality measures as the object to be axiomatized. We show that principles like those formulated in Kleinberg\u2019s axioms can be readily expressed in the latter framework without leading to inconsistency.\nA clustering-quality measure (CQM) is a function that, given a data set and its partition into clusters, returns a non-negative real number representing how strong or conclusive the clustering is. We analyze what clustering-quality measures should look like and introduce a set of requirements (axioms) for such measures. Our axioms capture the principles expressed by Kleinberg\u2019s axioms while retaining consistency.\nWe propose several natural clustering-quality measures, all satisfying the proposed axioms. In addition, we analyze the computational complexity of evaluating the quality of a given clustering and show that, for the proposed CQMs, it can be computed in polynomial time.\n\n1 Introduction\n\nIn his highly influential paper [1], Kleinberg advocates the development of a theory of clustering that will be \u201cindependent of any particular algorithm, objective function, or generative data model.\u201d As a step in that direction, Kleinberg sets up a set of \u201caxioms\u201d aimed to define what a clustering function is. Kleinberg suggests three axioms, each sounding plausible, and shows that these seemingly natural axioms lead to a contradiction: there exists no function that satisfies all three requirements.\nKleinberg\u2019s result is often interpreted as stating the impossibility of defining what clustering is, or even of developing a general theory of clustering. We disagree with this view. 
In this paper we show that the impossibility result is, to a large extent, due to the specific formalism used by Kleinberg rather than being an inherent feature of clustering.\nRather than attempting to define what a clustering function is (an attempt that [1] shows must fail), we turn our attention to the closely related issue of evaluating the quality of a given data clustering. We develop a formalism and a consistent axiomatization of that latter notion.\nAs it turns out, the clustering-quality framework is richer and more flexible than that of clustering functions. In particular, it allows the postulation of axioms that capture the features that Kleinberg\u2019s axioms aim to express, without leading to a contradiction.\n\nA clustering-quality measure is a function that maps pairs of the form (dataset, clustering) to some ordered set (say, the set of non-negative real numbers), so that these values reflect how \u2018good\u2019 or \u2018cogent\u2019 that clustering is.\nMeasures of the quality of a clustering are of interest not only as a vehicle for axiomatizing clustering. The need to measure the quality of a given data clustering arises naturally in many clustering tasks. The aim of clustering is to uncover meaningful groups in data. However, not every partitioning of a given data set reflects such structure. Upon obtaining a clustering, usually via some algorithm, a user needs to determine whether this clustering is sufficiently meaningful to rely upon for further data mining analysis or practical applications. 
Clustering-quality measures (CQMs) aim to answer that need by quantifying how good a specific clustering is.\nClustering-quality measures may also be used to help in clustering model selection by comparing different clusterings over the same data set (e.g., comparing the results of a given clustering paradigm over different choices of clustering parameters, such as the number of clusters).\nWhen posed with the problem of finding a clustering-quality measure, a first attempt may be to invoke the loss (or objective) function used by a clustering algorithm, such as k-means or k-median, as a clustering-quality measure. However, such measures have some shortcomings for the purpose at hand. Namely, these measures are usually not scale-invariant, and they cannot be used to compare the quality of clusterings obtained by different algorithms aiming to minimize different clustering costs (e.g., k-means with different values of k). See Section 6 for more details.\nClustering quality has been previously discussed in the applied statistics literature, where a variety of techniques for evaluating \u2018cluster validity\u2019 were proposed. Many of these methods, such as the external criteria discussed in [2], are based on assuming some predetermined generative data model, and as such do not answer our quest for a general theory of clustering. In this work, we are concerned with quality measures that do not assume any specific generative model; for examples, see the internal criteria surveyed in [2].\nWe formulate a theoretical basis for clustering-quality evaluations. We propose a set of requirements (\u2018axioms\u2019) for clustering-quality measures. We demonstrate the relevance and consistency of these axioms by showing that several natural notions satisfy these requirements. In particular, we introduce quality measures that reflect the underlying intuition of center-based and linkage-based clustering. 
These notions all satisfy our axioms, and, given a data clustering, their value on that clustering can be computed in polynomial time.\nPaper outline: we begin by presenting Kleinberg\u2019s axioms for clustering functions and discuss their failure. In Section 4 we show how these axioms can be translated into axioms pertaining to clustering-quality measures, prove that the resulting set of axioms is consistent, discuss desired properties of an axiomatization, and propose an accordingly revised set of axioms. Next, in Section 5 we present several clustering-quality measures, and claim that they all satisfy our axioms. Finally, in Section 5.3, we show that the quality of a clustering can be computed in polynomial time with respect to our proposed clustering-quality measures.\n\n2 Definitions and Notation\n\nLet X be some domain set (usually finite). A function d : X \u00d7 X \u2192 R is a distance function over X if d(xi, xj) \u2265 0 for all xi, xj \u2208 X; d(xi, xj) > 0 if and only if xi \u2260 xj; and d(xi, xj) = d(xj, xi) for all xi, xj \u2208 X. Note that we do not require the triangle inequality.\nA k-clustering of X is a k-partition, C = {C1, C2, . . . , Ck}. That is, Ci \u2229 Cj = \u2205 for i \u2260 j and C1 \u222a C2 \u222a . . . \u222a Ck = X. A clustering of X is a k-clustering of X for some k \u2265 1. A clustering is trivial if each of its clusters contains just one point, or if it consists of just one cluster.\nFor x, y \u2208 X and a clustering C of X, we write x \u223cC y whenever x and y are in the same cluster of clustering C, and x \u2241C y otherwise.\nA clustering function for some domain set X is a function that takes a distance function d over X and outputs a clustering of X.\nA clustering-quality measure (CQM) is a function that is given a clustering C over (X, d) (where d is a distance function over X) and returns a non-negative real number, as well as satisfies some additional requirements. 
In this work we explore the question of what these requirements should be.\n\n3 Kleinberg\u2019s Axioms\n\nKleinberg [1] proposes the following three axioms for clustering functions. These axioms are intended to capture the meaning of clustering by determining which functions (from a domain set endowed with a distance function) are worthy of being considered clustering functions and which are not. Kleinberg shows that the set is inconsistent: there exists no function that satisfies all three axioms.\nThe first two axioms require invariance of the clustering that f defines under certain changes of the input distance function.\nFunction Scale Invariance: Scale invariance requires that the output of a clustering function be invariant to uniform scaling of the input.\nA function f is scale-invariant if for every distance function d and positive \u03bb, f(d) = f(\u03bbd) (where \u03bbd is defined by setting, for every pair of domain points x, y, \u03bbd(x, y) = \u03bb \u00b7 d(x, y)).\nFunction Consistency: Consistency requires that if within-cluster distances are decreased, and between-cluster distances are increased, then the output of a clustering function does not change. Formally,\n\u2022 Given a clustering C over (X, d), a distance function d\u2032 is a C-consistent variant of d, if d\u2032(x, y) \u2264 d(x, y) for all x \u223cC y, and d\u2032(x, y) \u2265 d(x, y) for all x \u2241C y.\n\u2022 A function f is consistent if f(d) = f(d\u2032) whenever d\u2032 is an f(d)-consistent variant of d.\nFunction Richness: Richness requires that by modifying the distance function, any partition of the underlying data set can be obtained.\nA function f is rich if for each partitioning, C, of X, there exists a distance function d over X so that f(d) = C.\n\nTheorem 1 (Kleinberg, [1]) There exists no clustering function that simultaneously satisfies scale invariance, consistency, and richness.\n\nDiscussion: The intuition behind these axioms is rather clear. 
Let us consider, for example, the Consistency requirement. It seems reasonable that by pulling closer points that are in the same cluster and pushing further apart points in different clusters, our confidence in the given clustering will only rise. However, while this intuition can be readily formulated in terms of clustering quality (namely, \u201cchanges such as these should not decrease the quality of a clustering\u201d), the formulation through clustering functions says more. It actually requires that such changes to the underlying distance function should not create any new contenders for the best clustering of the data.\nFor example, consider Figure 1, where we illustrate a good 6-clustering. On the right-hand side, we show a consistent change of this 6-clustering. Notice that the resulting data has a 3-clustering that is noticeably better than the original 6-clustering. While one may argue that the quality of the original 6-clustering has not decreased as a result of the distance changes, the quality of the 3-clustering has improved beyond that of the 6-clustering. This illustrates a significant weakness of the consistency axiom for clustering functions.\n\nFigure 1: A consistent change of a 6-clustering.\n\nThe implicit requirement that the original clustering remains the best clustering following a consistent change is at the heart of Kleinberg\u2019s impossibility result. As we shall see below, once we relax that extra requirement the axioms are no longer unsatisfiable.\n\n4 Axioms of Clustering-Quality Measures\n\nIn this section we change the primitive that is being defined by the axioms from clustering functions to clustering-quality measures (CQMs). We reformulate the above three axioms in terms of CQMs and show that this revised formulation is not only consistent, but is also satisfied by a number of natural clustering quality measures. 
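The consistency condition can be checked mechanically. The following sketch is ours (the paper gives no code); `is_consistent_variant` tests whether a second distance function only shrinks within-cluster distances and only grows between-cluster distances:

```python
from itertools import combinations

def is_consistent_variant(points, clustering, d, d_prime):
    """Return True iff d_prime is a C-consistent variant of d:
    within-cluster distances may only shrink and between-cluster
    distances may only grow."""
    cluster_of = {x: i for i, cluster in enumerate(clustering) for x in cluster}
    for x, y in combinations(points, 2):
        if cluster_of[x] == cluster_of[y]:
            if d_prime(x, y) > d(x, y):   # a within-cluster distance grew
                return False
        elif d_prime(x, y) < d(x, y):     # a between-cluster distance shrank
            return False
    return True

# Toy example: two clusters on the line; halve distances inside a
# cluster and double distances across clusters.
points = [0, 1, 10, 11]
clustering = [{0, 1}, {10, 11}]
d = lambda x, y: abs(x - y)
d_half = lambda x, y: 0.5 * abs(x - y) if (x < 5) == (y < 5) else 2 * abs(x - y)
```

Here `d_half` is a C-consistent variant of `d`, while uniformly halving all distances is not (between-cluster distances would shrink).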
In addition, we extend the set of axioms by adding another axiom (of clustering-quality measures) that is required to rule out some measures that should not be counted as CQMs.\n\n4.1 Clustering-Quality Measure Analogues to Kleinberg\u2019s Axioms\n\nThe translation of the Scale Invariance axiom to the CQM terminology is straightforward:\n\nDefinition 1 (Scale Invariance) A quality measure m satisfies scale invariance if for every clustering C of (X, d), and every positive \u03bb, m(C, X, d) = m(C, X, \u03bbd).\n\nThe translation of the Consistency axiom is the place where the resulting CQM formulation is indeed weaker than the original axiom for functions. While it clearly captures the intuition that consistent changes to d should not hurt the quality of a given partition, it allows the possibility that, as a result of such a change, some partitions will improve more than others.1\n\nDefinition 2 (Consistency) A quality measure m satisfies consistency if for every clustering C over (X, d), whenever d\u2032 is a C-consistent variant of d, then m(C, X, d\u2032) \u2265 m(C, X, d).\n\nDefinition 3 (Richness) A quality measure m satisfies richness if for each non-trivial clustering C of X, there exists a distance function d over X such that C = Argmax_{C\u2032} m(C\u2032, X, d).\n\nTheorem 2 Consistency, scale invariance, and richness for clustering-quality measures form a consistent set of requirements.\n\nProof: To show that scale invariance, consistency, and richness form a consistent set of axioms, we present a clustering-quality measure that satisfies the three axioms. This measure captures a quality that is intuitive for center-based clusterings. In Section 5, we introduce more quality measures that capture the goal of other types of clusterings. 
All of these CQMs satisfy the above three axioms.\nFor each point in the data set, consider the ratio of the distance from the point to its closest center to the distance from the point to its second-closest center. Intuitively, the smaller this ratio is, the better the clustering (points are \u2018more confident\u2019 about their cluster membership). We use the average of this ratio as a quality measure.\n\nDefinition 4 (Relative Point Margin) The K-Relative Point Margin of x \u2208 X is K-RMX,d(x) = d(x, cx) / d(x, cx\u2032), where cx \u2208 K is the closest center to x, cx\u2032 \u2208 K is a second-closest center to x, and K \u2286 X.\n\n1The following formalization assumes that larger values of m indicate better clustering quality. For some quality measures, smaller values indicate better clustering quality, in which case we reverse the direction of the inequalities for consistency and use Argmin instead of Argmax for richness.\n\nA set K is a representative set of a clustering C if it consists of exactly one point from each cluster of C.\n\nDefinition 5 (Representative Set) A set K is a representative set of clustering C = {C1, C2, . . . , Ck} if |K| = k and for all i, K \u2229 Ci \u2260 \u2205.\n\nDefinition 6 (Relative Margin) The Relative Margin of a clustering C over (X, d) is RMX,d(C) = min_{K is a representative set of C} avg_{x \u2208 X\\K} K-RMX,d(x).\nSmaller values of Relative Margin indicate better clustering quality.\n\nLemma 1 Relative Margin is scale-invariant.\nProof: Let C be a clustering of (X, d). Let d\u2032 be a distance function so that d\u2032(x, y) = \u03b1d(x, y) for all x, y \u2208 X and some \u03b1 \u2208 R+. Then for any points x, y, z \u2208 X, d(x, y)/d(x, z) = d\u2032(x, y)/d\u2032(x, z). Note also that scaling does not change the centers selected by Relative Margin. Therefore, RMX,d\u2032(C) = RMX,d(C).\n\nLemma 2 Relative Margin is consistent.\nProof: Let C be a clustering of (X, d). 
Let d\u2032 be a C-consistent variant of d. Then for x \u223cC y, d\u2032(x, y) \u2264 d(x, y), and for x \u2241C y, d\u2032(x, y) \u2265 d(x, y). Therefore, RMX,d\u2032(C) \u2264 RMX,d(C).\n\nLemma 3 Relative Margin is rich.\nProof: Given a non-trivial clustering C over a data set X, consider the distance function d where d(x, y) = 1 for all x \u223cC y, and d(x, y) = 10 for all x \u2241C y. Then C = Argmin_{C\u2032} RMX,d(C\u2032).\nIt follows that scale invariance, consistency, and richness are consistent axioms.\n\n4.2 Soundness and Completeness of Axioms\n\nWhat should a set of \u201caxioms for clustering\u201d satisfy? Usually, when a set of axioms is proposed for some semantic notion (or a class of objects, say clustering functions), the aim is to have both soundness and completeness. Soundness means that every element of the described class satisfies all axioms (so, in particular, soundness implies consistency of the axioms), and completeness means that every property shared by all objects of the class is implied by the axioms. Intuitively, ignoring logical subtleties, a set of axioms is complete for a class of objects if any element outside that class fails at least one of these axioms.\nIn our context, there is a major difficulty: there exists no semantic definition of what clustering is. We wish to use the axioms as a definition of clustering functions, but then what is the meaning of soundness and completeness? We have to settle for less. While we do not have a clear definition of what clustering is and what it is not, we do have some examples of functions that should be considered clustering functions, and we can come up with some examples of partitionings that are clearly not worthy of being called \u201cclustering\u201d. 
We replace soundness by the requirement that all of our axioms are satisfied by all these examples of common clustering functions (relaxed soundness), and we require that partitioning functions that are clearly not clusterings fail at least one of our axioms (relaxed completeness).\nIn this respect, the axioms of [1] badly fail (the relaxed version of) soundness. For each of these axioms there are natural clustering functions that fail to satisfy it (this is implied by Kleinberg\u2019s demonstration that any pair of axioms is satisfied by a natural clustering function, while the three together never hold). We argue that our scale invariance, consistency, and richness are sound for the class of CQMs. However, they do not make a complete set of axioms, even in our relaxed sense. There are functions that should not be considered \u201creasonable clustering-quality measures\u201d and yet satisfy these three axioms. One type of \u201cnon-clustering-function\u201d is a function that makes cluster-membership decisions based on the identity of domain points. For example, consider the function that returns the Relative Margin of a data set whenever some specific pair of data points belong to the same cluster, and twice the Relative Margin of the data set otherwise. We overcome this problem by introducing a new axiom.\n\n4.3 Isomorphism Invariance\n\nThis axiom resembles the permutation-invariance objective-function axiom of Puzicha et al. [3], modeling the requirement that clustering should be indifferent to the individual identity of clustered elements. 
This axiom of clustering-quality measures does not have a corresponding Kleinberg axiom.\n\nDefinition 7 (Clustering Isomorphism) Two clusterings C and C\u2032 over the same domain, (X, d), are isomorphic, denoted C \u2248d C\u2032, if there exists a distance-preserving isomorphism \u03c6 : X \u2192 X, such that for all x, y \u2208 X, x \u223cC y if and only if \u03c6(x) \u223cC\u2032 \u03c6(y).\n\nDefinition 8 (Isomorphism Invariance) A quality measure m is isomorphism-invariant if for all clusterings C, C\u2032 over (X, d) where C \u2248d C\u2032, m(C, X, d) = m(C\u2032, X, d).\n\nTheorem 3 The set of axioms consisting of Isomorphism Invariance, Scale Invariance, Consistency, and Richness (all in their CQM formulation) is a consistent set of axioms.\n\nProof: Just note that the Relative Margin quality measure satisfies all four axioms.\n\nAs mentioned in the above discussion, to have a satisfactory axiom system for any notion, one needs to require more than just consistency. To be worthy of being labeled \u2018axioms\u2019, the requirements we propose should be satisfied by any reasonable notion of CQM. Of course, since we cannot define which CQMs are \u201creasonable\u201d, we cannot turn this into a formal statement. What we can do, however, is demonstrate that a variety of natural CQMs do satisfy all our axioms. This is done in the next section.\n\n5 Examples of Clustering Quality Measures\n\nIn a survey of validity measures, Milligan [2] discusses examples of quality measures that satisfy our axioms (namely, scale invariance, consistency, richness, and isomorphism invariance). We have verified that the best-performing internal criteria examined in [2] satisfy all our axioms.\nIn this section, we introduce two novel CQMs: a measure that reflects the underlying intuition of linkage-based clustering, and a measure for center-based clustering. 
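As a concrete illustration of a center-based CQM, the Relative Margin of Definition 6 can be computed directly from its definition. The sketch below is ours (all names included); it enumerates representative sets exhaustively, so it is practical only for small numbers of clusters:

```python
from itertools import product

def relative_margin(points, clustering, d):
    """Relative Margin (Definition 6): the minimum, over representative
    sets K (one point per cluster), of the average over x in X \\ K of
    d(x, nearest point of K) / d(x, second-nearest point of K).
    Lower values indicate better clustering quality.  Assumes k >= 2."""
    best = float("inf")
    for K in product(*[sorted(c) for c in clustering]):
        ratios = []
        for x in points:
            if x in K:
                continue
            d1, d2 = sorted(d(x, c) for c in K)[:2]
            ratios.append(d1 / d2)
        best = min(best, sum(ratios) / len(ratios))
    return best

# Two well-separated clusters on the line.
points = [0, 1, 10, 11]
clustering = [{0, 1}, {10, 11}]
d = lambda x, y: abs(x - y)
```

Because the measure is a ratio of distances, rescaling `d` by any positive constant leaves the result unchanged, matching Lemma 1.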
In addition to satisfying the axioms, these measures can, given a clustering, be computed in polynomial time.\n\n5.1 Weakest Link\n\nIn linkage-based clustering, whenever a pair of points share the same cluster they are connected via a tight chain of points in that cluster. The weakest-link quality measure focuses on the longest link in such a chain.\n\nDefinition 9 (Weakest Link Between Points) For points x, y in a cluster Ci of a clustering C over (X, d), C-WLX,d(x, y) = min_{x1, x2, . . . , x\u2113 \u2208 Ci} (max(d(x, x1), d(x1, x2), . . . , d(x\u2113, y))).\n\nThe weakest link of C is the maximal value of C-WLX,d(x, y) over all pairs of points belonging to the same cluster, divided by the shortest between-cluster distance.\n\nDefinition 10 (Weakest Link of C) The Weakest Link of a clustering C over (X, d) is WL(C) = (max_{x \u223cC y} C-WLX,d(x, y)) / (min_{x \u2241C y} d(x, y)).\nThe range of values of Weakest Link is (0, \u221e).\n\n5.2 Additive Margin\n\nIn Section 4.1, we introduced Relative Margin, a quality measure for center-based clustering. We now introduce another quality measure for center-based clustering. Instead of looking at ratios, Additive Margin evaluates differences.\n\nDefinition 11 (Additive Point Margin) The K-Additive Point Margin of x is K-AMX,d(x) = d(x, cx\u2032) \u2212 d(x, cx), where cx \u2208 K is the closest center to x, cx\u2032 \u2208 K is a second-closest center to x, and K \u2286 X.\n\nThe Additive Margin of a clustering is the average Additive Point Margin divided by the average within-cluster distance. 
The normalization is necessary for scale invariance.\n\nDefinition 12 (Additive Margin) The Additive Margin of a center-based clustering C over (X, d) is AMX,d(C) = min_{K is a representative set of C} ( (1/|X|) \u2211_{x \u2208 X} K-AMX,d(x) ) / ( (1/|{{x, y} \u2286 X | x \u223cC y}|) \u2211_{x \u223cC y} d(x, y) ).\nUnlike Relative Margin, Additive Margin gives higher values to better clusterings.\n\n5.3 Computational Complexity\n\nFor a clustering-quality measure to be useful, it is important to be able to quickly compute the quality of a clustering using that measure. The quality of a clustering using the measures presented in this paper can be computed in time polynomial in n (the number of points in the data set), for any fixed number of clusters k.\nUsing Relative or Additive Margin, it takes O(n^(k+1)) operations to compute the clustering quality of a data set, which is exponential in k. If a set of centers is given, the Relative Margin can be computed in O(nk) operations and the Additive Margin can be computed in O(n^2) operations. The Weakest Link of a clustering can be computed in O(n^3) operations.\n\n5.4 Variants of Quality Measures\n\nGiven a clustering-quality measure, we can construct new quality measures with different characteristics by applying the quality measure to a subset of clusters. It suffices to consider a quality measure m that is defined for clusterings consisting of 2 clusters. Given such a measure, we can create new quality measures. For example, mmin(C, X, d) = min_{S \u2286 C, |S| = 2} m(S, X, d) measures the worst quality of a pair of clusters in C.\nAlternately, we can define mmax(C, X, d) and mavg(C, X, d), which evaluate the best or average quality of a pair of clusters in C. 
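The mmin/mmax/mavg construction can be sketched in code. The following illustration is ours, assuming a base measure m(S, X, d) defined on pairs of clusters:

```python
from itertools import combinations

def pairwise_variants(m):
    """Lift a quality measure m, defined on 2-cluster clusterings, to
    full clusterings by applying it to every pair of clusters and
    taking the min, max, or average of the resulting scores."""
    def scores(C, X, d):
        return [m(list(pair), X, d) for pair in combinations(C, 2)]
    def m_min(C, X, d):
        return min(scores(C, X, d))
    def m_max(C, X, d):
        return max(scores(C, X, d))
    def m_avg(C, X, d):
        s = scores(C, X, d)
        return sum(s) / len(s)
    return m_min, m_max, m_avg

# Toy base measure (a stand-in for a real CQM restricted to two
# clusters): total size of the pair of clusters.
m = lambda S, X, d: len(S[0]) + len(S[1])
m_min, m_max, m_avg = pairwise_variants(m)
```

The three derived measures share m's invariances, since each is a pointwise min/max/average of applications of m.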
A nice feature of these variations is that if m satisfies the four axioms of clustering-quality measures, then so do mmin, mmax, and mavg.\nMore generally, if m is defined for clusterings with an arbitrary number of clusters, we can define a quality measure as a function of m over larger clusterings. For example, mmax-subset(C, X, d) = max_{S \u2286 C, |S| \u2265 2} m(S, X, d). Many such variations, which apply existing clustering-quality measures to subsets of clusters, satisfy the axioms of clustering-quality measures whenever the original quality measure satisfies the axioms.\n\n6 Dependence on Number of Clusters\n\nThe clustering-quality measures discussed so far are independent of the number of clusters, which enables the comparison of clusterings with different numbers of clusters. In this section we discuss an alternative type of clustering-quality evaluation that depends on the number of clusters. Such quality measures arise naturally from common loss functions (or objective functions) that drive clustering algorithms, such as k-means or k-median.\nThese common loss functions fail to satisfy two of our axioms: scale invariance and richness. One can easily overcome the dependence on scaling by normalization. 
As we will show, the resulting normalized loss functions make a different type of clustering-quality measure from the measures we previously discussed, due to their dependence on the number of clusters.\nA natural remedy to the failure of scale invariance is to normalize a loss function by dividing it by the variance of the data, or alternatively, by the loss of the 1-clustering of the data.\n\nDefinition 13 (L-normalization) The L-normalization of a clustering C over (X, d) is L-normalize(C, X, d) = L(Call, X, d) / L(C, X, d), where Call denotes the 1-clustering of X.\n\nCommon loss functions, even after normalization, usually have a bias towards either more refined or more coarse clusterings: they assign lower cost (that is, higher quality) to more refined (respectively, coarser) clusterings. This prevents using them as a meaningful tool for comparing the quality of clusterings with different numbers of clusters. We formalize this feature of common clustering loss functions through the notion of refinement preference:\n\nDefinition 14 (Refinement and coarsening) For a pair of clusterings C, C\u2032 of the same domain, we say C\u2032 is a refinement of C (or, equivalently, that C is a coarsening of C\u2032) if every cluster Ci of C is a union of clusters of C\u2032.\n\nDefinition 15 (Refinement/Coarsening Preference) A measure m is refinement-preferring if every clustering C of (X, d) that has a non-trivial refinement also has a refinement C\u2032 with m(C\u2032, X, d) > m(C, X, d). Coarsening-preferring measures are defined analogously.\n\nNote that both refinement-preferring and coarsening-preferring measures fail to satisfy the Richness axiom.\nIt seems that there is a divide between two types of evaluations for clusterings: those that satisfy richness, and those that satisfy either refinement or coarsening preference. 
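Definition 13 can be illustrated concretely. The sketch below is ours; as the loss L it uses a k-median-style cost over a distance function (each cluster pays the sum of distances to its best medoid), since the paper does not fix a particular loss:

```python
def k_median_loss(C, X, d):
    """k-median-style loss: each cluster contributes the sum of
    distances from its points to the cluster's best medoid."""
    return sum(min(sum(d(x, c) for x in cluster) for c in cluster)
               for cluster in C)

def l_normalize(L, C, X, d):
    """Definition 13: the loss of the 1-clustering divided by the
    loss of C.  The ratio cancels uniform scaling of d."""
    return L([set(X)], X, d) / L(C, X, d)

X = [0, 1, 10, 11]
C = [{0, 1}, {10, 11}]
d = lambda x, y: abs(x - y)
```

Because the scaling factor cancels in the ratio, `l_normalize` returns the same value for d and for 2d, recovering scale invariance; richness, however, still fails, as noted above.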
To evaluate the quality of a clustering using a refinement- (or coarsening-) preferring measure, it is essential to fix the number of clusters. Since the correct number of clusters is often unknown, measures that are independent of the number of clusters apply in a more general setting.\n\n7 Conclusions\n\nWe have investigated the possibility of providing a general axiomatic basis for clustering. Our starting point was the impossibility result of Kleinberg. We argue that a natural way to overcome these negative conclusions is by changing the primitive used to formulate the axioms from clustering functions to clustering-quality measures (CQMs). We demonstrate the merits of the latter framework by providing a set of axioms for CQMs that captures the essence of all of Kleinberg\u2019s axioms while maintaining consistency. We propose several CQMs that satisfy our proposed set of axioms. We hope that this work, and our demonstration of a way to overcome the \u201cimpossibility result\u201d, will stimulate further research towards a general theory of clustering.\n\nReferences\n\n[1] Jon Kleinberg. \u201cAn Impossibility Theorem for Clustering.\u201d Advances in Neural Information Processing Systems (NIPS) 15, 2002.\n[2] Glenn W. Milligan. \u201cA Monte Carlo study of 30 internal criterion measures for cluster analysis.\u201d Psychometrika, 46:187-195, 1981.\n[3] J. Puzicha, T. Hofmann, and J. Buhmann. \u201cTheory of Proximity Based Clustering: Structure Detection by Optimization.\u201d Pattern Recognition, 33, 2000.\n", "award": [], "sourceid": 383, "authors": [{"given_name": "Shai", "family_name": "Ben-David", "institution": null}, {"given_name": "Margareta", "family_name": "Ackerman", "institution": null}]}