Review for NeurIPS paper: Probabilistic Fair Clustering

NeurIPS 2020

Probabilistic Fair Clustering

Review 1

Summary and Contributions: The authors introduce a probabilistic version of fair clustering, where group membership is only known probabilistically. In unsupervised learning (and specifically fair clustering) this is a new problem that, as far as I know, has not yet been studied. Also, it seems to apply in many practical cases. The authors also present approximation algorithms for the special case of 2 groups, as well as the generalization to multiple groups. Additionally, the authors study “metric membership” where groups have ordering, e.g., age (where the fairness constraint is slightly different). Empirical experiments are executed on a handful of datasets to demonstrate the algorithms effectiveness in practice.First, this paper proposes to study a problem that is both seemingly useful in practice as well as interesting in its own right. Second, the approach taken to solve the two color variant of the problems follows previous work nicely and seems theoretically sound.

Strengths: First, this paper proposes to study a problem that is both seemingly useful in practice as well as interesting in its own right. Second, the approach taken to solve the two color variant of the problems is interesting, follows previous work nicely and seems theoretically sound. Third, the experimental results provide sufficient validation of the proposed algorithms. Fourth, the writing and clarity are very strong.

Weaknesses: Although I believe that in theory Algorithm 2 is correct, it is not very interesting (i.e., it boils down to simply sampling colors independently, and then solving a non-probabilistic problem). Moreover, I’m a bit concerned about the application of Algorithm 2 in practice, when the number of datapoints is small (indeed, the only empirical demonstration is performed with the largest dataset). In my opinion, the current motivation for metric membership is weak (i.e., that when groups are ordered information may be “left on the table”). Please motivate this with an example problem where guaranteeing fairness for metric membership is important. EDIT AFTER AUTHOR RESPONSE: thank you for addressing my concerns and pointing to the relevant sections in the appendix. Also, thank for agreeing to add more details and motivation for the metric membership piece.

Correctness: As far as I can tell, the theoretical claims are correct. The setup for the experiments is appropriate, well-described and well executed.

Clarity: Yes, the paper is well written and it is easy to follow most of the ideas. I especially appreciated the diagrams in appendix C (and have suggested additional improvements below). One thing that is unclear and I’m curious about is how the choice of the function mapping points fractionally to groups (in the 2 group case) affects the experimental results. Could you discuss this a bit? I imagine this has a significant effect on the empirical solutions.

Relation to Prior Work: Yes, relevant prior work is discussed and it’s clear how this work differs from what’s previously been done.

Reproducibility: Yes

Additional Feedback: In appendix C, you use the word “wedge” when I think you mean “edge”. The figures in appendix C could be improved by labeling E_{C,C_i} and E_{C_i, S} etc. I would consider moving Algorithm 1 to the appendix in exchange for a Figure that explains how to construct the flow network. Instead of leaving the proofs entirely to the appendix, I would suggest accompanying each theoretical statement with a “proof sketch” to describe the basic idea in the corresponding proof. The charts in Figures 2, 3 and 4 are tiny. Please make these bigger. I recommend changing the colors of the fair algs in Figure 2 to be a shade of green, and the unfair algs to be a shade of red. I’m curious how the solutions produced by the algorithms you present are different from: 1) assigning each point to the group to which it is most likely a member and 2) using a previous fair clustering algorithm. One thing that would add nicely to the paper is if, aside from the data inherently exhibiting probabilistic group memberships, there were advantages of your algorithms over previous approaches (with some pre-processing applied to assign points to groups).

Review 2

Summary and Contributions: The submission deals with clusterings that offer fair representation of data points from different classes or groups ("colors") representing, e.g., minorities. The contribution of the submission is to consider the case that group/class membership is not necessarily deterministic, but each point may belong to groups with a certain probability. A second, related model considered in the submission is the metric membership model (MMM). Here, instead of a group, each point is associated with a membership integer from 0,...,R. The model then imposes that in each cluster, the average membership lies within given bounds. The motivation here is that we could for instance want to have the same average of a protected ordered commodity (income, age,...) in each cluster. The author provides a black-box approximation that works both for PFC and MMM: Suppose that we have an alpha-approximation for the colorblind problem (without fairness constraints). Also suppose that we have an algorithm for the problem of fairly assigning points to fixed centers with minimum cost, and that this algorithm violates the above probabilistic fairness constraints/MMM constraints by at most gamma. Then, the author proves that there is an (alpha+2)a-approximation algorithm for PFC/MMM that violates the probabilistic fairness constraints by at most gamma. For the probabilistic case with more than two colors, the result is under the assumption that each cluster in the optimum solution has size n^r for some constant 0 < r < 1. The approximation algorithm works along similar lines to a Bera et al./Bercea et al. result: First compute a colorblind solution, then reassign the points to the the colorblind centers in a fair fashion. In the 2-color case and the MMM case, the re-assignment is done by computing a minimum cost flow (similarly to Bercea et al.); in the multi-color case, the algorithm simply samples a color for each point according to the given probabilities and then applies the deterministic algorithm by Bera et al./Bercea et al. The algorithms requires a lower bound for the cluster size in the input. The theoretical results are evaluated by computational experiments: The authors analyze the increase in cost when going from the approximate colorblind to the approximate fair solution, as well as the concrete fairness violation. Edit after the author rebuttal: I thank the authors for their detailed and convincing response that has addressed my concerns. I recommend to accept the paper.

Strengths: It is an interesting result that the Bercea et al./Bera et al. algorithm may be extended to the probabilistic case; even if the randomized algorithm is fairly straight-forward. The metric membership model looks interesting as well.

Weaknesses: The required lower bound on the minimum cluster size is an additional hurdle for practical applications. I feel that the computational results could be more extensive.

Correctness: Yes, but the authors could have tested the influence of cluster size assumption on more data sets.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: - Theorem 4.1: What's the current best possible for gamma; in particular in the metric membership case? If I understand correctly, that value is R for MMM as by Theorem 4.2 and 2\eps for PFC by Theorem 4.5? I would state that explicitely somewhere. - In the beginning of 4.1.3, I feel the explanation of the connection between the two-color case and the metric membership model could be more verbose. - Figure 3: You should note that price of fairness is with respect to *two* approximation algorithms; to know the true price of fairness we would need to know both the optimum colorblind solution and the optimum fair solution. Figure 3b looks a bit unconclusive. - Figure 4: Why did you choose k=5? What happens for different values of k?

Review 3

Summary and Contributions: This paper studies the problem of fair clustering. In this case one wants to optimize an objective such as k-means/k-median/k-center clustering under additional constraints that in each cluster, the number of points belonging to a sensitive attribute category is at least some value (l) and at most some value(u). Extending prior work, the authors study the case when the membership into sensitive groups is not completely deterministic. This aims to capture realistic scenarios where is it hard to exactly infer the sensitive attribute information about data points. The authors model this scenario as follows. For each data point v, there is a probability distribution p_v over the possible demographic groups (colors in the terminology of the paper). In this case the fairness constraint requires that the expected group memberships in each cluster satisfy the lower and the upper bounds. The authors also study the metric case when the group attribute is integer valued in range [1,R] for a certain value R. Similar to the result of [11], the authors show that any alpha-approximation for the unconstrained problem provided an alpha-approximation to the probabilistic constrained problem with a violation of at most 1 in the constraints. This holds for two groups. For the general case the authors show that under the assumption that each optimal cluster is not too small the problem can be reduced to the deterministic fair clustering formulation. For the metric case this violation is at most R. The overall algorithmic strategy is similar to [11] and the novel aspect of the paper is the rounding algorithm for the LP relaxation of the constraints. Finally, the authors present experimental results comparing their algorithm to the unconstrained optimization in terms of constraint violations and the price of fairness, i.e., ratio of constrained objective to the unconstrained objective. *********Post Rebuttal********* My score post rebuttal remains the same and I'm leaning towards accepting this paper.

Strengths: + The paper extends prior work to a more realistic setting where group memberships are not deterministically known. + Provides algorithms with worst case theoretical guarantees on the approximation ratio and the constraint violations.

Weaknesses: - The algorithmic template closely follows the work of [11]. - For the case of multiple groups the authors need to assume a lower bound on cluster sizes. However, in practice I think this is a reasonable assumption.

Correctness: Yes, the claims are correct.

Clarity: The paper is generally well written.

Relation to Prior Work: Prior work is adequately discussed.

Reproducibility: Yes

Additional Feedback:

Review 4

Summary and Contributions: This paper studies a new variant of the fair clustering problem. Given a set of colored data points in a metric space, the objective of fair clustering is to find a set of k centers, and an assignment that maps every data point to one of the centers, so that the k-mean/k-median/k-center costs are minimized, while the generated clusters should satisfy the fairness requirement: for every color h and every cluster C, the percentage of points with color h in C should be upper and lower bounded by some constants. This paper studies a variant of the fair clustering problem, where the color for every point is not deterministic and is given as a discrete distribution. The fairness constraint is then modified to lower/upper bounds of expected percentage of points of every color in a cluster. Algorithms with proved theoretical quality guarantee are presented. Experiments using real-world datasets are also provided. The paper considers 2 cases of the problem: two color case and multiple color case. For both cases, approximate algorithms are presented. For two color case, the algorithm first identifies the cluster centers using known algorithm by ignoring the colors, then formulates the problem of assigning every data point to every cluster as an LP problem (with relaxation), and finally a min-cost flow based rounding scheme is applied to obtain the assignment from the solution of LP. For multiple color case, the paper makes the assumption that the size of every cluster is large enough. Under this assumption, a sampling based algorithm is presented for the problem.

Strengths: The presented result is interesting. This variant of fair clustering problem seems reasonable, the rounding algorithm seems new, and the analysis is non-trivial.

Weaknesses: The presentation of the paper needs significant improvement. In general, the paper is readable, but many places look awkward (see examples in Clarity). Also, the experiment part is less impressive than the theoretical analysis. The proposed algorithms are compared with classic color-blind clustering algorithm, not previous fair clustering algorithms. Such comparisons do not provide much information. # Update after the rebuttal: The author feedback addressed my concerns and I believe that with an improved writing, this can be an accepted paper.

Correctness: The results seem to be correct. No major issue has been identified.

Clarity: Below are some places that look awkward. (1) In the abstract and introduction, a brief description of the problem should be provided. I really have little idea of what this paper is about from a technical point of view until page 3, where the formal definitions are given. (2) While the two color Probabilistic Fair Clustering and Metric Membership Fair Clustering are highly related, it is not appropriate to use the same notation "FA-PFC" for the problem of solving their constraints ((3b) and 5(b)). They in fact use a different set of notations: u and l in 5(b) do not have subscript, while in 3(b) they have. This is extremely confusing. (3) Introduction of Metric Membership Fair Clustering in Section 3 is awkward. It just suddenly shows up, and at this point it seems irrelevant to the main result of the paper: "generalize this [the problem] by assuming imperfect knowledge of group membership" (4) In Section 4, before describing the algorithm details, the paper should give a high level description of main idea. Also consider labelling section 4.1.3 using "Step 3". (5) Many sentences are not well-crafted.

Relation to Prior Work: I think the paper should cite some work related to chromatic clustering, which can be viewed as a special case of the fairness clustering problem. For example, the paper "A Unified Framework for Clustering Constrained Data without Locality Property", Ding and Xu in SODA 2015. The approach in this paper is somewhat similar to the one in the SODA paper.

Reproducibility: Yes

Additional Feedback: