{"title": "Coresets for Clustering with Fairness Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 7589, "page_last": 7600, "abstract": "In a recent work, \\cite{chierichetti2017fair} studied the following ``fair'' variants of classical clustering problems such as k-means and k-median: given a set of n data points in R^d and a binary type associated to each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its underlying proportion. Subsequent work has focused on either extending this setting to when each data point has multiple, non-disjoint sensitive types such as race and gender \\cite{bera2019fair}, or to address the problem that the clustering algorithms in the above work do not scale well. The main contribution of this paper is an approach to clustering with fairness constraints that involve {\\em multiple, non-disjoint} attributes, that is {\\em also scalable}. Our approach is based on novel constructions of coresets: for the k-median objective, we construct an \\eps-coreset of size O(\\Gamma k^2 \\eps^{-d}) where \\Gamma is the number of distinct collections of groups that a point may belong to, and for the k-means objective, we show how to construct an \\eps-coreset of size O(\\Gamma k^3\\eps^{-d-1}). The former result is the first known coreset construction for the fair clustering problem with the k-median objective, and the latter result removes the dependence on the size of the full dataset as in~\\cite{schmidt2018fair} and generalizes it to multiple, non-disjoint attributes. Importantly, plugging our coresets into existing algorithms for fair clustering such as \\cite{backurs2019scalable} results in the fastest algorithms for several cases. 
Empirically, we assess our approach over the \\textbf{Adult} and \\textbf{Bank} datasets, and show that the coreset sizes are much smaller than the full dataset; applying coresets indeed accelerates the running time of computing the fair clustering objective while ensuring that the resulting objective difference is small.", "full_text": "Coresets for Clustering with Fairness Constraints

Lingxiao Huang∗ (Yale University, USA), Shaofeng H.-C. Jiang∗ (Weizmann Institute of Science, Israel), Nisheeth K. Vishnoi∗ (Yale University, USA)

Abstract

In a recent work, [20] studied the following “fair” variants of classical clustering problems such as k-means and k-median: given a set of n data points in R^d and a binary type associated to each data point, the goal is to cluster the points while ensuring that the proportion of each type in each cluster is roughly the same as its underlying proportion. Subsequent work has focused on either extending this setting to when each data point has multiple, non-disjoint sensitive types such as race and gender [7], or to address the problem that the clustering algorithms in the above work do not scale well [42, 8, 6]. The main contribution of this paper is an approach to clustering with fairness constraints that involve multiple, non-disjoint types, that is also scalable. Our approach is based on novel constructions of coresets: for the k-median objective, we construct an ε-coreset of size O(Γ k^2 ε^{-d}) where Γ is the number of distinct collections of groups that a point may belong to, and for the k-means objective, we show how to construct an ε-coreset of size O(Γ k^3 ε^{-d-1}). The former result is the first known coreset construction for the fair clustering problem with the k-median objective, and the latter result removes the dependence on the size of the full dataset as in [42] and generalizes it to multiple, non-disjoint types.
Plugging our coresets into existing algorithms for fair clustering such as [6] results in the fastest algorithms for several cases. Empirically, we assess our approach over the Adult, Bank, Diabetes and Athlete datasets, and show that the coreset sizes are much smaller than the full dataset; applying coresets indeed accelerates the running time of computing the fair clustering objective while ensuring that the resulting objective difference is small. We also achieve a speed-up to recent fair clustering algorithms [6, 7] by incorporating our coreset construction.

1 Introduction

Clustering algorithms are widely used in automated decision-making tasks, e.g., unsupervised learning [43], feature engineering [33, 27], and recommendation systems [10, 40, 21]. With the increasing applications of clustering algorithms in human-centric contexts, there is a growing concern that, if left unchecked, they can lead to discriminatory outcomes for protected groups, e.g., females or black people. For instance, the proportion of a minority group assigned to some cluster can be far from its underlying proportion, even if the clustering algorithm does not take the sensitive attribute into its decision making [20]. Such an outcome may, in turn, lead to unfair treatment of minority groups; e.g., women may receive proportionally fewer job recommendations with high salary [22, 38] due to their underrepresentation in the cluster of high-salary recommendations.

To address this issue, Chierichetti et al. [20] recently proposed the fair clustering problem that requires the clustering assignment to be balanced with respect to a binary sensitive type, e.g., sex.^2 Given a set X of n data points in R^d and a binary type associated to each data point, the goal is to cluster the points such that the proportion of each type in each cluster is roughly the same as

∗Authors are listed in alphabetical order of family names.
Full version: [31].
^2 A type consists of several disjoint groups, e.g., the sex type consists of females and males.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

its underlying proportion, while ensuring that the clustering objective is minimized. Subsequent work has focused on either extending this setting to when each data point has multiple, non-disjoint sensitive types [7] (Definition 2.3), or to address the problem that the clustering algorithms do not scale well [20, 41, 42, 8, 6].

Due to the large scale of datasets, several existing fair clustering algorithms have to take samples instead of using the full dataset, since their running time is at least quadratic in the input size [20, 41, 8, 7]. Very recently, Backurs et al. [6] proposed a nearly linear approximation algorithm for fair k-median, but it only works for a binary type. It is still unknown whether there exists a scalable approximation algorithm for multiple sensitive types [6]. To improve the running time of fair clustering algorithms, a powerful technique called coresets was introduced. Roughly, a coreset for fair clustering is a small weighted point set, such that for any k-subset and any fairness constraint, the fair clustering objective computed over the coreset is approximately the same as that computed from the full dataset (Definition 2.1). Thus, a coreset can be used as a proxy for the full dataset: one can apply any fair clustering algorithm on the coreset, achieve a good approximate solution on the full dataset, and hope to speed up the algorithm. As mentioned in [6], using coresets can indeed accelerate the computation time and save storage space for fair clustering problems. Another benefit is that one may want to compare the clustering performance under different fairness constraints, and hence it may be more efficient to repeatedly use coresets.
Currently, the only known result for coresets for fair clustering is by Schmidt et al. [42], who constructed an ε-coreset for fair k-means clustering. However, their coreset size includes a log n factor and is restricted to a single sensitive type. Moreover, there is no known coreset construction for other commonly used clustering objectives, e.g., fair k-median.

Our contributions. Our main contribution is an efficient construction of coresets for clustering with fairness constraints that involve multiple, non-disjoint types. Technically, we show efficient constructions of ε-coresets of size independent of n for both fair k-median and fair k-means, summarized in Table 1. Let Γ denote the number of distinct collections of groups that a point may belong to (see the first paragraph of Section 4 for the formal definition).

• Our coreset for fair k-median is of size O(Γ k^2 ε^{-d}) (Theorem 4.1), which is the first known coreset to the best of our knowledge.
• For fair k-means, our coreset is of size O(Γ k^3 ε^{-d-1}) (Theorem 4.2), which improves the result of [42] by a Θ(log n / (ε k^2)) factor and generalizes it to multiple, non-disjoint types.
• As mentioned in [6], applying coresets can accelerate the running time of fair clustering algorithms, while suffering only an additional (1 + ε) factor in the approximation ratio. Setting ε = Ω(1) and plugging our coresets into existing algorithms [42, 7, 6], we directly achieve scalable fair clustering algorithms, summarized in Table 2.

We present novel technical ideas to deal with fairness constraints for coresets.

• Our first technical contribution is a reduction to the case Γ = 1 (Theorem 4.3) which greatly simplifies the problem.
Our reduction not only works for our specific construction, but also for all coreset constructions in general.
• Furthermore, to deal with the Γ = 1 case, we provide several interesting geometric observations for the optimal fair k-median/means clustering (Lemma 4.1), which may be of independent interest.

We implement our algorithm and conduct experiments on the Adult, Bank, Diabetes and Athlete datasets.

• A vanilla implementation results in a coreset whose size depends on ε^{-d}. Our implementation is inspired by our theoretical results and produces coresets whose size is much smaller in practice. This improved implementation is still within the framework of our analysis, and the same worst-case theoretical bound still holds.
• To validate the performance of our implementation, we experiment with varying ε for both fair k-median and k-means. As expected, the empirical error is well under the theoretical guarantee ε, and the size does not suffer from the ε^{-d} factor. Specifically, for fair k-median, we achieve 5% empirical error using only 3% of the points of the original datasets, and we achieve similar error using 20% of the points of the original dataset for the k-means case. In addition, our coreset for fair k-means has lower empirical error than both uniform sampling and the coreset of [42].

Table 1: Summary of coreset results. T1(n) and T2(n) denote the running time of an O(1)-approximate algorithm for k-median and k-means, respectively.

       | k-Median: size  | construction time        | k-Means: size          | construction time
[42]   | --              | --                       | O(Γ k ε^{-d-2} log n)  | Õ(k ε^{-d-2} n log n + T2(n))
This   | O(Γ k^2 ε^{-d}) | O(k ε^{-d+1} n + T1(n))  | O(Γ k^3 ε^{-d-1})      | O(k ε^{-d+1} n + T2(n))

Table 2: Summary of fair clustering algorithms.
∆ denotes the maximum number of groups that a point may belong to, and “multi” means the algorithm can handle multiple non-disjoint types.

       | multi | k-Median: approx. ratio | time                          | multi | k-Means: approx. ratio | time
[20]   |       | O(1)                    | Ω(n^2)                        |       | --                     | --
[42]   |       | --                      | --                            |       | O(1)                   | n^{O(k)}
[6]    |       | Õ(d log n)              | O(d n log n + T1(n))          |       | --                     | --
[8]    |       | (3.488, 1)              | Ω(n^2)                        |       | (4.675, 1)             | Ω(n^2)
[7]    | ✓     | (O(1), 4∆ + 4)          | Ω(n^2)                        | ✓     | (O(1), 4∆ + 4)         | Ω(n^2)
This   |       | Õ(d log n)              | O(d l k^2 log(lk) + T1(lk^2)) |       | O(1)                   | (lk)^{O(k)}
This   | ✓     | (O(1), 4∆ + 4)          | Ω(l^{2∆} k^4)                 | ✓     | (O(1), 4∆ + 4)         | Ω(l^{2∆} k^6)

• The small size of the coreset translates to a more than 200x speed-up (with error ~10%) in the running time of computing the fair clustering objective when the fairness constraint F is given. We also apply our coreset to the recent fair clustering algorithms [6, 7], and drastically improve their running time, by approximately 5-15 times for [6] and 15-30 times for [7], for all above-mentioned datasets plus a large dataset, Census1990, that consists of 2.5 million records, even taking the coreset construction time into consideration.

1.1 Other related works

There are increasingly many works on fair clustering algorithms. Chierichetti et al. [20] introduced the fair clustering problem for a binary type and obtained approximation algorithms for fair k-median/center. Backurs et al. [6] improved the running time to nearly linear for fair k-median, but the approximation ratio is Õ(d log n). Rösner and Schmidt [41] designed a 14-approximate algorithm for fair k-center, and the ratio was improved to 5 by [8]. For fair k-means, Schmidt et al. [42] introduced the notion of fair coresets, and presented an efficient streaming algorithm. More generally, Bercea et al.
[8] proposed a bi-criteria approximation for fair k-median/means/center/supplier/facility location. Very recently, Bera et al. [7] presented a bi-criteria approximation algorithm for the fair (k, z)-clustering problem (Definition 2.3) with arbitrary group structures (potentially overlapping), and Anagnostopoulos et al. [5] improved their results by proposing the first constant-factor approximation algorithm. It is still open to design a near-linear time O(1)-approximate algorithm for the fair (k, z)-clustering problem.

There are other fair variants of clustering problems. Ahmadian et al. [4] studied a variant of the fair k-center problem in which the number of each type in each cluster has an upper bound, and proposed a bi-criteria approximation algorithm. Chen et al. [19] studied the fair clustering problem in which any n/k points are entitled to form their own cluster if there is another center closer in distance for all of them. Kleindessner et al. [34] investigated the fair k-center problem in which each center has a type, and the selection of the k-subset is restricted to include a fixed amount of centers belonging to each type. In another paper [35], they developed fair variants of spectral clustering (a heuristic k-means clustering framework) by incorporating the proportional fairness constraints proposed by [20].

The notion of coresets was first proposed by Agarwal et al. [2]. There has been a large body of work for unconstrained clustering problems in Euclidean spaces [3, 28, 18, 29, 36, 24, 25, 9]. Apart from these, for the general (k, z)-clustering problem, Feldman and Langberg [24] presented an ε-coreset of size Õ(d k ε^{-2z}) in Õ(nk) time. Huang et al. [30] showed an ε-coreset of size Õ(ddim(X) · k^3 ε^{-2z}), where ddim(X) is the doubling dimension that measures the intrinsic dimensionality of a space. For the special case of k-means, Braverman et al.
[9] improved the size to Õ(k ε^{-2} · min{k/ε, d}) by a dimension reduction approach. Works such as [24] use an importance sampling technique which avoids the ε^{-d} size factor, but it is unknown whether such approaches can be used in fair clustering.

2 Problem definition

Consider a set X ⊆ R^d of n data points, an integer k (the number of clusters), and l groups P1, . . . , Pl ⊆ X. An assignment constraint, which was proposed by Schmidt et al. [42], is a k × l integer matrix F. A clustering C = {C1, . . . , Ck}, which is a k-partitioning of X, is said to satisfy assignment constraint F if

|Ci ∩ Pj| = Fij, ∀i ∈ [k], j ∈ [l].

For a k-subset C = {c1, . . . , ck} ⊆ X (the center set) and z ∈ R_{>0}, we define K_z(X, F, C) as the minimum value of ∑_{i ∈ [k]} ∑_{x ∈ Ci} d^z(x, ci) among all clusterings C = {C1, . . . , Ck} that satisfy F, which we call the optimal fair (k, z)-clustering value. If there is no clustering satisfying F, K_z(X, F, C) is set to be infinity. The following is our notion of coresets for fair (k, z)-clustering. This generalizes the notion introduced in [42], which only considers a partitioned group structure.

Definition 2.1 (Coreset for fair clustering). Given a set X ⊆ R^d of n points and l groups P1, . . .
, Pl ⊆ X, a weighted point set S ⊆ R^d with weight function w : S → R_{>0} is an ε-coreset for the fair (k, z)-clustering problem if, for each k-subset C ⊆ R^d and each assignment constraint F ∈ Z^{k×l}_{≥0}, it holds that K_z(S, F, C) ∈ (1 ± ε) · K_z(X, F, C).

Since points in S might receive fractional weights, we change the definition of K_z a little, so that in evaluating K_z(S, F, C), a point x ∈ S may be partially assigned to more than one cluster and the total amount of assignments of x equals w(x).

The currently most general notion of fairness in clustering was proposed by [7], which enforces both upper bounds and lower bounds on any group’s proportion in a cluster.

Definition 2.2 ((α, β)-proportionally-fair). A clustering C = (C1, . . . , Ck) is (α, β)-proportionally-fair (α, β ∈ [0, 1]^l) if, for each cluster Ci and each j ∈ [l], it holds that αj ≤ |Ci ∩ Pj| / |Ci| ≤ βj.

The above definition directly implies that for each cluster Ci and any two groups Pj1, Pj2 (j1, j2 ∈ [l]),

αj1 / βj2 ≤ |Ci ∩ Pj1| / |Ci ∩ Pj2| ≤ βj1 / αj2.

In other words, the ratio between the numbers of points belonging to groups Pj1 and Pj2 in each cluster is bounded from both sides. Indeed, similar fairness constraints have been investigated by works on other fundamental algorithmic problems such as data summarization [14], ranking [16, 44], elections [12], personalization [17, 13], classification [11], and online advertising [15]. Naturally, Bera et al. [7] also defined the fair clustering problem with respect to (α, β)-proportional fairness as follows.

Definition 2.3 ((α, β)-proportionally-fair (k, z)-clustering). Given a set X ⊆ R^d of n points, l groups P1, . . .
, Pl ⊆ X, and two vectors α, β ∈ [0, 1]^l, the objective of (α, β)-proportionally-fair (k, z)-clustering is to find a k-subset C = {c1, . . . , ck} ⊆ R^d and an (α, β)-proportionally-fair clustering C = {C1, . . . , Ck}, such that the objective function ∑_{i ∈ [k]} ∑_{x ∈ Ci} d^z(x, ci) is minimized.

Our notion of coresets is very general, and we relate it to the (α, β)-proportionally-fair clustering problem via the following observation, which is similar to Proposition 5 in [42].

Proposition 2.1. Given a k-subset C, the assignment restriction required by (α, β)-proportional fairness can be modeled as a collection of assignment constraints.

As a result, if a weighted set S is an ε-coreset satisfying Definition 2.1, then for any α, β ∈ [0, 1]^l, the (α, β)-proportionally-fair (k, z)-clustering value computed from S must be a (1 ± ε)-approximation of that computed from X.

3 Technical overview

We introduce novel techniques to tackle the assignment constraints. Recall that Γ denotes the number of distinct collections of groups that a point may belong to. Our first technical contribution is a general
Speci\ufb01cally, they show that it suf\ufb01ces to construct at most O(k\u03b5\u2212d+1) lines,\nproject X to their closest lines and construct an \u03b5/3-coreset for each line. The coreset for each line\nis then constructed by partitioning the line into poly(k/\u03b5) contiguous sub-intervals, and designate\nat most two points to represent each sub-interval and include these points in the coreset. In their\nanalysis, a crucially used property is that the clustering for any given centers partitions X into k\ncontiguous parts on the line, since each point must be assigned to its nearest center. However, this\nproperty might not hold in fair clustering, which is our main dif\ufb01culty. Nonetheless, we manage\nto show a new structural lemma, that the optimal fair k-median/means clustering partitions X into\nO(k) contiguous intervals. Speci\ufb01cally, for fair k-median, the key geometric observation is that there\nalways exists a center whose corresponding optimal fair k-median cluster forms a contiguous interval\n(Claim 4.1), and this combined with an induction implies the optimal fair clustering partitions X into\n2k \u2212 1 intervals. For fair k-means, we show that each optimal fair cluster actually forms a single\ncontiguous interval. Thanks to the new structural properties, plugging in a slightly different set of\nparameters in [29] yields fair coresets.\n\n4 Coresets for fair clustering\nFor each x \u2208 X, denote Px = {i \u2208 [l] : x \u2208 Pi} as the collection of groups that x belongs to. Let\n\u0393X denote the number of distinct Px\u2019s, i.e. \u0393X := |{Px : x \u2208 X}|. Let Tz(n) denote the running\ntime of a constant approximation algorithm for the (k, z)-clustering problem. The main theorems are\nas follows.\nTheorem 4.1 (Coreset for fair k-median (z = 1)). 
There exists an algorithm that constructs an ε-coreset for the fair k-median problem of size O(Γ k^2 ε^{-d}), in O(k ε^{-d+1} n + T1(n)) time.

Theorem 4.2 (Coreset for fair k-means (z = 2)). There exists an algorithm that constructs an ε-coreset for the fair k-means problem of size O(Γ k^3 ε^{-d-1}), in O(k ε^{-d+1} n + T2(n)) time.

Note that Γ_X is usually small. For instance, if there is only one sensitive attribute [42], then each Px is a singleton and hence Γ_X = l. More generally, let Λ denote the maximum number of groups that any point belongs to; then Γ_X ≤ l^Λ, but there are often only O(1) sensitive attributes for each point.

As noted above, the main technical difficulty for the coreset construction is to deal with the assignment constraints. We make an important observation (Theorem 4.3) that one only needs to prove Theorem 4.1 for the case l = 1. The proof of Theorem 4.3 can be found in the full version. This theorem is a generalization of Theorem 7 in [42], and the coreset of [42] actually extends to arbitrary group structures thanks to our theorem.

Theorem 4.3 (Reduction from l groups to a single group). Suppose there exists an algorithm that computes an ε-coreset of size t for the fair (k, z)-clustering problem of X̂ with l = 1, in time T(|X̂|, ε, k, z). Then there exists an algorithm that takes a set X and computes an ε-coreset of size Γ_X · t for the fair (k, z)-clustering problem, in time Γ_X · T(|X|, ε, k, z).

Our coreset constructions for both fair k-median and fair k-means are similar to that in [29], except using a different set of parameters. At a high level, the algorithm reduces general instances to instances where the data lie on a line, and it only remains to give a coreset for the line case.
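The reduction behind Theorem 4.3 is straightforward to realize in code. The following minimal Python sketch is an illustration only (not the paper's implementation); it assumes the groups are given as sets of point indices, and `coreset_alg` is a hypothetical callback standing in for any l = 1 construction, e.g., Algorithm 1.

```python
from collections import defaultdict

def reduce_to_single_group(X, groups, coreset_alg):
    """Theorem 4.3 reduction (sketch): partition X by the collection P_x of
    groups each point belongs to, run a single-group (l = 1) coreset
    construction on each part, and return the union of the per-part coresets.
    The union has size at most Gamma_X times the per-part coreset size."""
    parts = defaultdict(list)
    for idx, x in enumerate(X):
        # P_x = set of group indices containing point idx
        P_x = frozenset(j for j, P in enumerate(groups) if idx in P)
        parts[P_x].append(x)
    S, w = [], []
    for part in parts.values():
        part_S, part_w = coreset_alg(part)   # any l = 1 coreset algorithm
        S.extend(part_S)
        w.extend(part_w)
    return S, w
```

The number of keys in `parts` is exactly Γ_X, which is where the Γ_X factor in the size and running time of Theorem 4.3 comes from.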
Next, we focus on fair k-median; the construction for the k-means case is similar and can be found in the full version.

Remark 4.1. Theorem 4.3 can be applied to construct an ε-coreset of size O(Γ_X k ε^{-d+1}) for the fair k-center clustering problem, since Har-Peled’s coreset result [28] directly provides an ε-coreset of size O(k ε^{-d+1}) for the case of l = 1.

4.1 The line case

Since l = 1, we interpret F as an integer vector in Z^k_{≥0}. For a weighted point set S with weight w : S → R_{≥0}, we define the mean of S by S̄ := (1/|S|) ∑_{p ∈ S} w(p) · p and the error of S by ∆(S) := ∑_{p ∈ S} w(p) · d(p, S̄). Denote by OPT the optimal value of the unconstrained k-median clustering. Our construction is similar to [29] and we summarize it in Algorithm 1. An illustration of Algorithm 1 may be found in Figure 1.

Figure 1: An illustration of Algorithm 1 that divides X into 9 batches (e.g., batch B1 has w(B1) = 4 and batch B9 has w(B9) = 3, and every batch B satisfies ∆(B) ≤ ξ).

Algorithm 1: FairMedian-1D(X, k)
Input: X = {x1, . . . , xn} ⊂ R^d lying on the real line where x1 ≤ . . . ≤ xn; an integer k ∈ [n]; a number OPT, the optimal value of k-median clustering.
Output: an ε-coreset S of X together with weights w : S → R_{≥0}.
1. Set a threshold ξ = ε · OPT / (30k);
2. Consider the points from x1 to xn and group them into batches in a greedy way: each batch B is a maximal point set satisfying ∆(B) ≤ ξ;
3. Denote by B(X) the collection of all batches. Let S ← {B̄ : B ∈ B(X)};
4. For each point x = B̄ ∈ S, set w(x) ← |B|;
5. Return (S, w).

Theorem 4.4 (Coreset for fair k-median when X lies on a line). Algorithm 1 computes an ε/3-coreset S for fair k-median clustering of X, in time O(|X|).

The running time is immediate, since for each batch B ∈ B(X) it only costs O(|B|) time to compute B̄. Hence, Algorithm 1 runs in O(|X|) time.
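For concreteness, Algorithm 1 can be sketched in a few lines of Python. The version below is illustrative only: it assumes unit-weight input points already sorted on the line, and it recomputes ∆ from scratch at each step for clarity, so unlike a prefix-sum implementation it does not achieve the O(|X|) running time.

```python
def fair_median_1d(xs, k, opt, eps):
    """Sketch of Algorithm 1 (FairMedian-1D): greedily group the sorted
    points xs into maximal batches B with Delta(B) <= xi, then output each
    batch mean with weight |B|."""
    xi = eps * opt / (30 * k)              # step 1: threshold xi

    def delta(batch):                      # Delta(B) = sum_p d(p, mean(B))
        m = sum(batch) / len(batch)
        return sum(abs(p - m) for p in batch)

    S, w, batch = [], [], [xs[0]]
    for x in xs[1:]:
        if delta(batch + [x]) > xi:        # current batch is maximal: close it
            S.append(sum(batch) / len(batch))
            w.append(len(batch))
            batch = []
        batch.append(x)
    S.append(sum(batch) / len(batch))      # close the last batch
    w.append(len(batch))
    return S, w
```

Note that a single point always forms a valid batch (its error is 0), so the greedy scan never gets stuck.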
We focus on correctness in the following. In [29], it was shown that S is an ε/3-coreset for the unconstrained k-median clustering problem. Their analysis crucially uses the fact that the optimal clustering partitions X into k contiguous intervals. Unfortunately, this nice “contiguous” property does not hold in our case because of the assignment constraint F ∈ Z^k_{≥0}. To resolve this issue, we prove a new structural property (Lemma 4.1): the optimal fair k-median clustering actually partitions X into only O(k) contiguous intervals. With this property, Theorem 4.4 is implied by a similar argument as in [29]. The detailed proof can be found in the full version.

Lemma 4.1 (Fair k-median clustering consists of 2k − 1 contiguous intervals). Suppose X := {x1, . . . , xn} ⊂ R^d lies on the real line where x1 ≤ . . . ≤ xn. For every k-subset C = (c1, . . . , ck) ⊆ R^d and every assignment constraint F ∈ Z^k_{≥0}, there exists an optimal fair k-median clustering that partitions X into at most 2k − 1 contiguous intervals.

Proof. We prove by induction on k. The induction hypothesis is that, for every k ≥ 1, Lemma 4.1 holds for any data set X, any k-subset C ⊆ R^d and any assignment constraint F ∈ Z^k_{≥0}. The base case k = 1 holds trivially since all points in X must be assigned to c1.

Assume the lemma holds for k − 1 (k ≥ 2); we will prove the inductive step for k. Let C*_1, . . . , C*_k be the optimal fair k-median clustering w.r.t. C and F, where C*_i ⊆ X is the subset assigned to center ci. We present the structural property in Claim 4.1, whose proof can be found in the full version.

Claim 4.1. There exists i ∈ [k] such that C*_i consists of exactly one contiguous interval.

We continue the proof of the inductive step by fixing such an index i0 and constructing a reduced instance (X′, F′, C′) where a) C′ := C \ {c_{i0}}; b) X′ := X \ C*_{i0}; and c) F′ is formed by removing the i0-th coordinate of F.
Applying the hypothesis to (X′, F′, C′), we know that the optimal fair (k − 1)-median clustering consists of at most 2k − 3 contiguous intervals. Combining these with C*_{i0}, which is exactly one contiguous interval, increases the number of intervals by at most 2. Thus, we conclude that the optimal fair k-median clustering for (X, F, C) has at most 2k − 1 contiguous intervals. This finishes the inductive step.

4.2 Extending to higher dimension

The extension is the same as that of [29]. We start with a set of k centers that is an O(1)-approximate solution C* to unconstrained k-median. Then we emit O(ε^{-d+1}) rays around each center c in C* (which correspond to an O(ε)-net on the unit sphere centered at c), and project data points to the nearest ray, such that the total projection cost is at most ε · OPT/3. Then for each line, we compute an ε/3-coreset for fair k-median by Theorem 4.4, and let S denote the combination of the coresets generated from all lines. By the same argument as in Theorem 2.9 of [29], S is an ε-coreset for fair k-median clustering, which implies Theorem 4.1. The detailed proof can be found in the full version.

Remark 4.2. In fact, it suffices to emit an arbitrary set of rays such that the total projection cost is
This observation is crucially used in our implementations (Section 5) to reduce\nthe size of the coreset, particularly to avoid the construction of the O(\u03b5)-net which is of O(\u03b5\u2212d) size.\n\n5 Empirical results\n\nWe implement our algorithm and evaluate its performance on real datasets.3 The implementation\nmostly follows our description of algorithms, but a vanilla implementation would bring in an \u03b5\u2212d\nfactor in the coreset size. To avoid this, as observed in Remark 4.2, we may actually emit any set\nof rays as long as the total projection cost is bounded, instead of \u03b5\u2212d rays. We implement this idea\nby \ufb01nding the smallest integer m and m lines, such that the minimum cost of projecting data onto\nm lines is within the error threshold. In our implementation for fair k-means, we adopt the widely\nused Lloyd\u2019s heuristic [37] to \ufb01nd the m lines, where the only change to Lloyd\u2019s heuristic is that, for\neach cluster, we need to \ufb01nd a line that minimizes the projection cost instead of a point, and we use\nSVD to ef\ufb01ciently \ufb01nd this line optimally. Unfortunately, the above approach does not work for fair\nk-median, as the SVD does not give the optimal line. As a result, we still need to construct the \u03b5-net,\nbut we alternatively employ some heuristics to \ufb01nd the net adaptively w.r.t. the dataset.\nOur evaluation is conducted on four datasets: Adult (~50k), Bank (~45k) and Diabetes (~100k) from\nUCI Machine Learning Repository [23], and Athlete (~200k) from [1], which are also considered in\nprevious papers [20, 42, 7]. 
For all datasets, we choose numerical features to form a vector in R^d for each record, where d = 6 for Adult, d = 10 for Bank, d = 29 for Diabetes and d = 3 for Athlete. We choose two sensitive types for the first three datasets: sex and marital for Adult (9 groups, Γ = 14); marital and default for Bank (7 groups, Γ = 12); sex and age for Diabetes (12 groups, Γ = 20); and we choose a binary sensitive type sex for Athlete (2 groups, Γ = 2). In addition, in the full version, we also discuss how the following affect the results: a) choosing a binary type as the sensitive type, or b) normalization of the dataset. We pick k = 3 (the number of clusters) throughout our experiments. We define the empirical error as |K_z(S, F, C) / K_z(X, F, C) − 1| (which is the same measure as ε) for some F and C. To evaluate the empirical error, we draw 500 independent random samples of (F, C) and report the maximum empirical error among these samples. For each (F, C), the fair clustering objectives K_z(·, F, C) may be formulated as integer linear programs (ILPs). We use CPLEX [32] to solve the ILPs, report the average running times^4 T_X and T_S for evaluating the objective on the dataset X and the coreset S respectively, and also report the running time T_C for constructing the coreset S.

For both k-median and k-means, we employ uniform sampling (Uni) as a baseline, in which we partition X into Γ parts according to the distinct Px’s (the collection of groups that x belongs to) and take uniform samples from each part. Additionally, for k-means, we select another baseline from a recent work [42] that presented a coreset construction for fair k-means, whose implementation is based on the BICO library, a high-performance coreset-based library for computing k-means clustering [26].
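To make the evaluation of K_z(·, F, C) concrete: once the centers C are fixed, the problem is a transportation problem. The sketch below is a hypothetical illustration (using scipy rather than CPLEX, and restricted to the simplest case l = 1 with points on the line); it solves the LP relaxation, whose optimum coincides with the ILP optimum for integer weights and F because transportation polytopes have integral vertices.

```python
import numpy as np
from scipy.optimize import linprog

def fair_cost(points, weights, centers, F, z=1):
    """LP for K_z(S, F, C) with a single group (l = 1): route the weight
    w(x) of each point to the centers so that center i receives exactly
    F[i] mass, minimizing the total transported d^z cost."""
    n, k = len(points), len(centers)
    cost = np.array([[abs(x - c) ** z for c in centers] for x in points])
    A_eq = np.zeros((n + k, n * k))      # variables y[x, i], x-major order
    for x in range(n):
        A_eq[x, x * k:(x + 1) * k] = 1.0     # all of w(x) is assigned
    for i in range(k):
        A_eq[n + i, i::k] = 1.0              # center i receives F[i]
    b_eq = np.concatenate([weights, F])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, method="highs")
    return res.fun if res.success else float("inf")   # infeasible F -> infinity
```

The empirical error of a coreset S against the full dataset X is then |fair_cost(S, ...) / fair_cost(X, ...) − 1|, matching the measure defined above.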
We evaluate the performance of our coreset for fair k-means against BICO and Uni. We remark that the BICO and Uni implementations do not support specifying the parameter ε directly, but only a hinted size of the resulting coreset. Hence, we first evaluate our coreset, and then set the hinted size for Uni and BICO to the size of our coreset.

3https://github.com/sfjiang1990/Coresets-for-Clustering-with-Fairness-Constraints.
4The experiments are conducted on a 4-core desktop CPU with 64 GB RAM.

We also showcase the speed-up of two recently published approximation algorithms when applying a 0.5-coreset. The first algorithm is a practically efficient, O(log n)-approximate algorithm for fair k-median [6] that works for a binary type, referred to as FairTree. The other is a bi-criteria approximation algorithm [7] for both fair k-median and k-means, referred to as FairLP. We slightly modify the implementations of FairTree and FairLP to enable them to work with our coreset, in particular making them handle weighted inputs efficiently. In addition to the above-mentioned Adult, Bank, Diabetes and Athlete datasets, we run experiments on the large dataset Census1990, which consists of about 2.5 million records (where we select d = 13 features and a binary type).

Table 3: performance of ε-coresets for fair k-median w.r.t. varying ε.

Dataset    ε     emp. err.           size    T_S (ms)  T_C (ms)  T_X (ms)
                 Ours      Uni
Adult      10%   2.36%    12.28%      262        13       408      7101
           20%   4.36%    17.17%      215        12       311        -
           30%   4.46%    15.12%      161         9       295        -
           40%   8.52%    31.96%      139         9       282        -
Bank       10%   1.45%     5.32%     2393       111       971      5453
           20%   2.24%     3.38%     1101        50       689        -
           30%   4.18%    14.60%      506        24       476        -
           40%   5.35%    10.53%      293        14       452        -
Diabetes   10%   0.55%     6.38%    85822     12112    141212     17532
           20%   1.62%    15.44%    34271      3267     16040        -
           30%   3.61%     1.92%     6693       411      5017        -
           40%   5.33%     3.67%     2949       160      3916        -
Athlete    10%   1.14%     2.87%     3959        96      8141     74851
           20%   2.59%     4.38%      685        19      3779        -
           30%   4.86%     4.98%      316        11      2763        -
           40%   8.25%    16.59%      112         7      2390        -

Table 4: performance of ε-coresets for fair k-means w.r.t. varying ε.

Dataset    ε     emp. err.                     size    T_S (ms)  T_C (ms)        T_X (ms)
                 Ours      BICO      Uni                         Ours    BICO
Adult      10%   0.28%    10.63%     1.04%      880        44    1351     786     7404
           20%   0.55%     2.87%     1.12%      610        29     511     788       -
           30%   1.17%    19.91%     4.06%      503        26     495     750       -
           40%   2.20%    48.10%     4.45%      433        22     492     768       -
Bank       10%   2.85%    30.68%     2.71%      409        19     507     718     5128
           20%   2.93%    45.09%     4.59%      280        14     478     712       -
           30%   2.68%    24.82%     6.10%      230        11     531     711       -
           40%   2.30%    33.42%     5.66%      194        10     505     690       -
Diabetes   10%   4.39%    10.54%     1.91%    50163      5300   65189    2615    16312
           20%  11.24%    11.32%     4.41%     3385       168    5138    1544       -
           30%  14.52%    20.54%    13.46%      958        44    2680    1480       -
           40%  13.95%    22.05%    10.92%      775        35    2657    1462       -
Athlete    10%   5.43%    10.96%     4.94%     1516        36   14534    1160    73743
           20%  11.41%    21.31%    10.62%      213         9    3566    1090       -
           30%  13.18%    29.97%    16.93%       98         7    2591    1076       -
           40%  13.01%    29.74%   152.31%       83         6    2613    1066       -

5.1 Results

Tables 3 and 4 summarize the accuracy-size trade-off of our coresets for fair k-median and k-means respectively, under different error guarantees ε. Since the coreset construction time T_C for Uni is very small (usually less than 50 ms), we do not report it in the tables. A key finding from the tables is that the size of the coreset does not suffer from the ε^{-d} factor, thanks to our optimized implementation.

Table 5: speed-up of fair clustering algorithms using our coreset.
T_ALG/obj_ALG is the runtime/clustering objective w/o our coreset, and T'_ALG/obj'_ALG is the runtime/clustering objective on top of our coreset.

Dataset                           obj_ALG       obj'_ALG      T_ALG (s)  T'_ALG (s)  T_C (s)
Adult       FairTree (z = 1)    2.09 × 10^9   1.23 × 10^9       12.62       0.38      0.63
            FairLP (z = 2)      1.23 × 10^14  1.44 × 10^14      19.92       0.20      1.03
Bank        FairTree (z = 1)    5.69 × 10^6   4.70 × 10^6       14.62       0.64      0.60
            FairLP (z = 2)      1.53 × 10^9   1.46 × 10^9       17.41       0.08      0.50
Diabetes    FairTree (z = 1)    1.13 × 10^6   9.50 × 10^5       19.26       1.70      2.96
            FairLP (z = 2)      1.47 × 10^7   1.08 × 10^7       55.11       0.41      2.61
Athlete     FairTree (z = 1)    2.50 × 10^6   2.42 × 10^6       29.94       1.34      2.35
            FairLP (z = 2)      3.33 × 10^7   2.89 × 10^7       37.50       0.03      2.42
Census1990  FairTree (z = 1)    9.38 × 10^6   7.65 × 10^6      450.79      23.36     20.28
            FairLP (z = 2)      4.19 × 10^7   1.32 × 10^7     1048.72       0.06     31.05

As for fair k-median, the empirical error of our coreset is well under control. In particular, to achieve 5% empirical error, less than 3% of the data is needed for all datasets, which yields a ~200x acceleration in evaluating the objective, and a 10x acceleration even when the coreset construction time is taken into account.5 Regarding the running time, our coreset construction time scales roughly linearly with the size of the coreset, i.e., our algorithm is output-sensitive. The empirical error of Uni is comparable to ours on Diabetes, but in general its worst-case error is unbounded (2x-10x that of our coreset, and even larger than ε), and it appears unstable as ε varies.
Our coreset works well for fair k-means, and it also offers a significant acceleration of evaluating the objective.
Compared with BICO, our coreset achieves smaller empirical error for fixed ε, and its construction time is between 0.5x and 2x that of BICO. Again, the empirical error of Uni can be 2x smaller than ours and BICO's on Diabetes, but its worst-case error is unbounded in general.
Table 5 demonstrates the speed-up of FairTree and FairLP with the help of our coreset. We observe that applying our coresets offers a 5x-15x speed-up to FairTree and a 15x-30x speed-up to FairLP on all datasets, even when the coreset construction time is taken into account. In particular, the runtime of FairLP on top of our coreset is less than 1s on all datasets, which is extremely fast. We also observe that the clustering objective obj'_ALG on top of our coresets is usually within 0.6-1.2 times obj_ALG, the objective without the coreset (noting that coresets might shrink the objective). The only exception is FairLP on Census1990, where obj'_ALG is only 35% of obj_ALG. A possible reason is that an important step in the implementation of FairLP is to compute an approximate (unconstrained) k-means clustering solution on the dataset using the sklearn library [39]. However, sklearn tends to trade accuracy for speed when the dataset gets large. As a result, FairLP actually finds a better approximate k-means solution on the coreset than on the large dataset Census1990, and hence applying the coreset achieves a much smaller clustering objective.

6 Future work

This paper constructs ε-coresets for the fair k-median/means clustering problem whose size is independent of the size of the full dataset, for data that may have multiple, non-disjoint types. To the best of our knowledge, our coreset for fair k-median is the first known coreset construction for this problem. For fair k-means, we improve the coreset size of the prior result [42], and extend it to multiple, non-disjoint types.
The empirical results show that our coresets are indeed much smaller than the full dataset and yield significant reductions in the running time of computing the fair clustering objective.
Our work leaves several interesting future directions. For unconstrained clustering, several works use sampling approaches so that the coreset size does not depend exponentially on the Euclidean dimension d. It would be interesting to investigate whether sampling approaches can be applied to construct fair coresets with size bounds similar to the unconstrained setting. Another direction is to construct coresets for general fair (k, z)-clustering beyond k-median/means/center.

5The same coreset may be used for clustering with any assignment constraints, so its construction time would be averaged out if multiple fair clustering tasks are performed.

Acknowledgments

This research was supported in part by NSF CCF-1908347, SNSF 200021_182527, ONR Award N00014-18-1-2364 and a Minerva Foundation grant.

References
[1] 120 years of olympic history: athletes and results. https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results.

[2] Pankaj K Agarwal, Sariel Har-Peled, and Kasturi R Varadarajan. Approximating extent measures of points. Journal of the ACM (JACM), 51(4):606–635, 2004.

[3] Pankaj K Agarwal and Cecilia Magdalena Procopiuc. Exact and approximation algorithms for clustering. Algorithmica, 33(2):201–226, 2002.

[4] Sara Ahmadian, Alessandro Epasto, Ravi Kumar, and Mohammad Mahdian. Clustering without over-representation. In The 36th International Conference on Machine Learning (ICML), 2019.

[5] Aris Anagnostopoulos, Luca Becchetti, Matteo Böhm, Adriano Fazzone, Stefano Leonardi, Cristina Menghini, and Chris Schwiegelshohn.
Principal fairness: Removing bias via projections. In The 36th International Conference on Machine Learning (ICML), 2019.

[6] Arturs Backurs, Piotr Indyk, Krzysztof Onak, Baruch Schieber, Ali Vakilian, and Tal Wagner. Scalable fair clustering. In The 36th International Conference on Machine Learning (ICML), 2019.

[7] Suman K. Bera, Deeparnab Chakrabarty, and Maryam Negahbani. Fair algorithms for clustering. CoRR, abs/1901.02393, 2019.

[8] Ioana O Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R Schmidt, and Melanie Schmidt. On the cost of essentially fair clusterings. arXiv preprint arXiv:1811.10319, 2018.

[9] Vladimir Braverman, Dan Feldman, and Harry Lang. New frameworks for offline and streaming coreset constructions. CoRR, abs/1612.00889, 2016.

[10] Robin Burke, Alexander Felfernig, and Mehmet H Göker. Recommender systems: An overview. AI Magazine, 32(3):13–18, 2011.

[11] L. Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K. Vishnoi. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 319–328. ACM, 2019.

[12] L. Elisa Celis, Lingxiao Huang, and Nisheeth K. Vishnoi. Multiwinner voting with fairness constraints. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 144–151. AAAI Press, 2018.

[13] L. Elisa Celis, Sayash Kapoor, Farnood Salehi, and Nisheeth K. Vishnoi. Controlling polarization in personalization: An algorithmic framework. In Fairness, Accountability, and Transparency in Machine Learning, 2019.

[14] L. Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, and Nisheeth K. Vishnoi. Fair and diverse DPP-based data summarization. In International Conference on Machine Learning, pages 715–724, 2018.

[15] L.
Elisa Celis, Anay Mehrotra, and Nisheeth K. Vishnoi. Towards controlling discrimination in online Ad auctions. In International Conference on Machine Learning, 2019.

[16] L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), volume 107, page 28. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018.

[17] L. Elisa Celis and Nisheeth K. Vishnoi. Fair personalization. In Fairness, Accountability, and Transparency in Machine Learning, 2017.

[18] Ke Chen. On k-median clustering in high dimensions. In SODA, pages 1177–1185. Society for Industrial and Applied Mathematics, 2006.

[19] Xingyu Chen, Brandon Fain, Charles Lyu, and Kamesh Munagala. Proportionally fair clustering. In The 36th International Conference on Machine Learning (ICML), 2019.

[20] Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, pages 5029–5037, 2017.

[21] Joydeep Das, Partha Mukherjee, Subhashis Majumder, and Prosenjit Gupta. Clustering-based recommender system using principles of voting theory. In 2014 International Conference on Contemporary Computing and Informatics (IC3I), pages 230–235. IEEE, 2014.

[22] Amit Datta, Michael Carl Tschantz, and Anupam Datta. Automated experiments on Ad privacy settings: A tale of opacity, choice, and discrimination. Proceedings on Privacy Enhancing Technologies, 2015(1):92–112, 2015.

[23] Dheeru Dua and Casey Graff. UCI machine learning repository. http://archive.ics.uci.edu/ml, University of California, Irvine, School of Information and Computer Sciences, 2017.

[24] D. Feldman and M. Langberg. A unified framework for approximating and clustering data.
In STOC, pages 569–578, 2011.

[25] Dan Feldman, Melanie Schmidt, and Christian Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In SODA, pages 1434–1453, 2013.

[26] Hendrik Fichtenberger, Marc Gillé, Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. BICO: BIRCH meets coresets for k-means clustering. In ESA, 2013.

[27] Elena L Glassman, Rishabh Singh, and Robert C Miller. Feature engineering for clustering student solutions. In Proceedings of the first ACM conference on Learning@ scale conference, pages 171–172. ACM, 2014.

[28] Sariel Har-Peled. Clustering motion. Discrete & Computational Geometry, 31(4):545–565, 2004.

[29] Sariel Har-Peled and Akash Kushal. Smaller coresets for k-median and k-means clustering. Discrete & Computational Geometry, 37(1):3–19, 2007.

[30] Lingxiao Huang, Shaofeng Jiang, Jian Li, and Xuan Wu. Epsilon-coresets for clustering (with outliers) in doubling metrics. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 814–825. IEEE, 2018.

[31] Lingxiao Huang, Shaofeng H.-C. Jiang, and Nisheeth K. Vishnoi. Coresets for clustering with fairness constraints. CoRR, abs/1906.08484, 2019.

[32] IBM. IBM ILOG CPLEX optimization studio CPLEX user's manual, version 12 release 6, 2015.

[33] Sheng-Yi Jiang, Qi Zheng, and Qian-Sheng Zhang. Clustering-based feature selection. Acta Electronica Sinica, 36(12):157–160, 2008.

[34] Matthäus Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. Fair k-center clustering for data summarization. In The 36th International Conference on Machine Learning (ICML), 2019.

[35] Matthäus Kleindessner, Samira Samadi, Pranjal Awasthi, and Jamie Morgenstern.
Guarantees for spectral clustering with fairness constraints. In The 36th International Conference on Machine Learning (ICML), 2019.

[36] Michael Langberg and Leonard J. Schulman. Universal ε-approximators for integrals. In SODA, pages 598–607, 2010.

[37] Stuart Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982.

[38] Claire Cain Miller. Can an algorithm hire better than a human? The New York Times, 25, 2015.

[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[40] Manh Cuong Pham, Yiwei Cao, Ralf Klamma, and Matthias Jarke. A clustering approach for collaborative filtering recommendation using social network analysis. J. UCS, 17(4):583–604, 2011.

[41] Clemens Rösner and Melanie Schmidt. Privacy preserving clustering with constraints. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018.

[42] Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. Fair coresets and streaming algorithms for fair k-means clustering. arXiv preprint arXiv:1812.10854, 2018.

[43] Pang-Ning Tan, Michael Steinbach, Vipin Kumar, et al. Cluster analysis: basic concepts and algorithms. Introduction to data mining, 8:487–568, 2006.

[44] Ke Yang and Julia Stoyanovich. Measuring fairness in ranked outputs.
In Proceedings of the 29th International Conference on Scientific and Statistical Database Management, page 22. ACM, 2017.