NeurIPS 2020

### Review 1

Summary and Contributions: EDIT: Thank you for the rebuttal. I appreciate you pointing out the "sensitive attribute included" portion of the experiments; it's good that you included it, and it would be good to have an accompanying figure in the appendix (or in the main body, space permitting). My main sticking point is that, although the paper claims to learn representations that transfer to new tasks, there are no theoretical results addressing the model's predictive performance on new tasks. This seems like a large hole, and without those results the paper is still marginal in my opinion; I worry that readers will not get much out of it. However, it is well put together and the contributions are real, so, averaging all of that out, my score remains at 6.

--------------------------------------------------

This paper gives results on fair representation learning in a multi-task learning setup, including a bound, for a binary sensitive attribute, on how well the fairness of the learned representation generalizes to new tasks.

Strengths:
- This is a useful idea: given some new task, how fair should we expect our learned representation function to be? It is very relevant to the fair representation learning community at NeurIPS.
- The theoretical results seem sound and answer a reasonable question about how fair the representations will be under new distributions over the inputs and sensitive attributes.
- The experiments demonstrate that, in the proposed setting, the method seems to be useful and to outperform some baselines.

Weaknesses:
- It is not clear to me that the paper's theoretical results on transfer learning are that interesting, since they do not address shifts in P(Y | X, S), which do not affect this (label-independent) notion of fairness but do affect accuracy. Rather, I would characterize the results as being about the generalization of fair representations under covariate shift in (X, S).
- The experiments are not particularly broad, and the comparison to baselines is not necessarily the strongest one possible (a weak adversary, and no sensitive attribute included). But I'm not sure the experiments are that important in this paper anyway.

Correctness: Yes, this paper seems correct.

Clarity: Mostly clear.

Relation to Prior Work: Yes.

Reproducibility: No

Additional Feedback:
- It would be nice to see the code.
- A clarification is needed here: since the fairness definition considered is label-independent, when we talk about transferring to new tasks we are not concerned with a shift in the label distribution (marginal or conditional). This is an important point that is not mentioned at all.
- Related work: it would be nice to see more differentiation between the approaches in the current literature (for instance, [4, 11, 23, 24, 27, 28, 37, 19, 40] are not all identical papers and present different approaches).
- L140: clarify that the non-linearity maps dimension r to dimension r. As written, it looks like h has a scalar output (which I know is not the case).
- L159: clarify what "Gap" is.
- L163: clarify what "approximately satisfied" means.
- L199: shouldn't Proposition 5 be an inequality rather than an equality?
- In general, the paper should at least comment on accuracy: would we expect accuracy to degrade at all?
- L221: the (M [24]) notation is very confusing; I'm not sure why you are using it.
- L238: the scheme for choosing the model is interesting, although I would prefer to see the full tradeoff between fairness and accuracy.
- Experiments: I'm not sure the comparison to [11] and [24] is entirely fair, since those methods usually expect the sensitive attribute to be part of the data and also require a powerful adversary to be useful.
- L253: why would the University need to be anonymized?
- Conclusion: it feels like transfer learning was not fully explored here, since differing distributions P(Y | X, S) are extremely important in transfer learning, and shifts in that distribution do not figure into the theoretical results at all.
- It was disappointing to me that there are no results on the fairness/accuracy tradeoff in the generalization setting.
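For illustration only: the review does not state how the paper defines "Gap" at L159, but a common way to quantify demographic parity for a binary sensitive attribute is the absolute difference in mean model output between the two groups. A minimal sketch under that assumption (the helper name `demographic_parity_gap` is hypothetical, not from the paper):

```python
import numpy as np


def demographic_parity_gap(scores, s):
    """Absolute difference between the mean model output of the two groups
    of a binary sensitive attribute s (0/1). Hypothetical helper: the paper's
    "Gap" may be defined differently."""
    scores = np.asarray(scores, dtype=float)
    s = np.asarray(s)
    if not (np.any(s == 0) and np.any(s == 1)):
        raise ValueError("both groups (s == 0 and s == 1) must be present")
    return float(abs(scores[s == 0].mean() - scores[s == 1].mean()))


# Group 0 has mean output 0.5, group 1 has mean output 0.0, so the gap is 0.5.
gap = demographic_parity_gap([1, 0, 0, 0], [0, 0, 1, 1])
```

Note that this quantity never looks at Y, which is why a shift in P(Y | X, S) leaves it unchanged while still affecting accuracy.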

### Review 2

Summary and Contributions: Post author response: Thanks for the clarifying remarks. I agree that demographic parity (DP) is a reasonable starting point for such work. I particularly appreciate the explanation of the potential relevance of the setup, which I'd love to see discussed in a later version of the paper. While the other reviewers have raised some interesting points (in particular the distinction between shifts in (X, S) and shifts in P(Y | X, S), which is worth a brief discussion as well), I still believe this submission would be of value to the NeurIPS community and will stick to my original score.

---------------

This submission develops theory and methodology for learning demographically fair representations in a multitask setting, such that the learned representations transfer to new tasks drawn from a fixed distribution. It establishes bounds on demographic parity on new tasks and demonstrates the effectiveness of its MMD- and Sinkhorn-based algorithms on various real-world datasets.

Strengths: Both the setup and the theoretical claims are stated clearly and, as far as I can tell, are correct. The empirical evaluation is well thought through, includes relevant comparisons to existing work, and demonstrates the effectiveness of the algorithms (see the "Additional Feedback" section for some comments on the empirical evaluation). I believe the contributions of this submission are novel and generally relevant to the NeurIPS community (see the "Weaknesses" section for some potential caveats regarding their relevance for fairness).

Weaknesses:
* In my opinion, the biggest weakness is that it is not entirely clear how the contributions advance our understanding and practice of fair machine learning in applications. First, the submission focuses entirely on demographic parity, which is rarely considered a standalone criterion for fairness in practice. Moreover, as a community we are currently grappling with how, when, and where fairness-enforcing algorithms may or may not make sense at all, and are swiftly moving away from context-blind statistical group fairness measures. The present submission further specializes these criteria to a fictitious multitask setting over the same base space. Against this backdrop, it is hard for me to envision scenarios in which the devised methods could have real-world impact. It would be great if the authors could share their thoughts on whether and when these techniques may be useful and could be deployed in good conscience.
* My second concern is related to the first: I did not fully understand how the multitask setting is realized in the experiments on static supervised datasets. What is the distribution $\rho$? How many tasks are there for each dataset, and how are they defined?

However, on a theoretical and methodological level, I still believe that the contributions are significant and interesting enough to warrant publication at NeurIPS.

Correctness: I did not check all proofs in detail, but from the well-written proof sketches and a glance over the supplementary material, I believe that all claims in the submission are correct. The methods used for the empirical evaluation seem valid; in particular, the authors are careful about model selection and provide some error quantification across three datasets, comparing to the most relevant baselines from existing work. However, I would have liked more details on the precise setup of the empirical evaluation for each dataset, including a description of $\rho$ and of the different tasks used, as well as training details (network sizes, optimizers, etc.). (Due to space restrictions, this can be moved to the appendix in my opinion.) Ideally, I would like to see an implementation, to ensure reproducibility and a fair comparison to baselines. As it stands, there is not sufficient detail to reproduce the empirical results.

Clarity: The paper is very well written and structured. My only comment is that the meta-distribution $\rho$ over tasks is first mentioned only in l.152. It should be introduced at the first mention of "generalization to new tasks" (e.g., l.104 or even earlier). I was confused by this statement at first, since some assumption on the similarity or distribution of the tasks is surely required. I especially enjoyed the concise yet rigorous introduction of the setup and the required assumptions, as well as the recap of MMD and the Sinkhorn divergence.

Relation to Prior Work: To the best of my knowledge, relevant existing work is acknowledged appropriately. Instead of the long, generic list of references in l.65 (and what these works try to achieve in general), I would have liked to see a brief discussion of whether and how the presented methodology deviates from existing work. It seems to me that, beyond the introduction of multiple tasks, minimizing a discrepancy measure such as MMD, KL, or the Sinkhorn divergence in a gradient-based fashion is quite similar to existing work. Is that the case?

Reproducibility: No

Additional Feedback:
* I liked how the empirical estimators for MMD are described in detail. How is the Sinkhorn divergence in eq. (5) estimated from finite samples?
* The restriction to 1-hidden-layer networks and to linear functions g (or general functions of linear projections), starting in l.128, came as a surprise to me, as it seems that at least some of the subsequent claims hold more generally. Can the authors comment on whether this was merely a choice for the empirical evaluation, and to what extent the theoretical considerations depend on these assumptions? The linear projections seem to be necessary for some parts (e.g., bounding outcome fairness by representation fairness), but could I just drop in a k-layer NN and obtain similar results (theoretically and empirically)?
* l.188: What are the major complications in extending Theorem 1 to the Sinkhorn divergence? Have the authors tried it? From the abstract and introduction, it seems that theoretical guarantees are provided for both MMD and Sinkhorn; the scope of the guarantees should be made more explicit.
* Typo: in eq. (2), both expectations are over Q.
* Typo: l.159, "in term of" should read "in terms of".
* Typo: l.188, "in order to exten Theorem" should read "in order to extend Theorem".
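To make the finite-sample question about eq. (5) concrete: the review does not say which estimator the paper uses, but one standard plug-in approach treats each sample as a uniform empirical measure, computes the entropic OT cost with log-domain Sinkhorn iterations, and debiases it as $S_\varepsilon(\alpha,\beta) = \mathrm{OT}_\varepsilon(\alpha,\beta) - \tfrac12\mathrm{OT}_\varepsilon(\alpha,\alpha) - \tfrac12\mathrm{OT}_\varepsilon(\beta,\beta)$. This is a minimal sketch, not the paper's method; the squared Euclidean cost, the regularization `eps`, the fixed iteration count, and the use of the transport cost $\langle P, C\rangle$ as $\mathrm{OT}_\varepsilon$ are all assumptions.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))


def entropic_ot_cost(x, y, eps=0.1, n_iter=500):
    """Transport cost <P, C> of the entropy-regularized plan P between the
    uniform empirical measures on x (n, d) and y (m, d). Assumed cost: squared
    Euclidean distance."""
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    n, m = cost.shape
    log_a = np.full(n, -np.log(n))  # uniform weights on x
    log_b = np.full(m, -np.log(m))  # uniform weights on y
    f, g = np.zeros(n), np.zeros(m)  # dual potentials
    for _ in range(n_iter):
        # Alternately update the potentials so the plan's row sums match a
        # and its column sums match b (Sinkhorn iterations in log space).
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, axis=0)
    log_plan = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - cost) / eps
    return float(np.sum(np.exp(log_plan) * cost))


def sinkhorn_divergence(x, y, eps=0.1, n_iter=500):
    """Debiased Sinkhorn divergence between two finite samples."""
    return (entropic_ot_cost(x, y, eps, n_iter)
            - 0.5 * entropic_ot_cost(x, x, eps, n_iter)
            - 0.5 * entropic_ot_cost(y, y, eps, n_iter))
```

For example, with `x` holding representations of points with S = 0 and `y` those with S = 1, `sinkhorn_divergence(x, y)` estimates how far apart the two groups' representation distributions are. The debiasing terms make the estimate exactly 0 when the two samples coincide, which the raw entropic cost does not guarantee.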

### Review 3

Summary and Contributions: In this paper, the authors propose a novel method to learn a fair representation shared across tasks in a multi-task learning setting. The final objective function is composed of two terms: a loss term (square or logistic loss) and a fairness term, measured via the Sinkhorn divergence or the maximum mean discrepancy (MMD). The experimental results on three datasets support its effectiveness.

Strengths:
1. The paper focuses on building fair and accurate models across different tasks, which is a very important problem in the machine learning community.
2. It applies a multi-task learning framework to leverage task similarities for fair representation learning.
3. The experimental results on three public datasets show that the proposed method is superior to its competitors.

Weaknesses:
1. The manuscript is difficult to read and the problem is not clearly stated, which makes it hard to evaluate the novelty and applicability of the method beyond the specific datasets evaluated here. Specifically, what is the main contribution of this study? Is it the first study to learn fair representations in a multi-task learning scenario?
2. It is good to see that the authors provide a learning bound for the objective function. However, little comment is given to connect the theorems to the problem context or the observed results, so the significance of Theorem 1 is not well justified.
3. The contribution and innovation of the method are insufficient for the NeurIPS conference: many of the components involved, such as MMD and the Sinkhorn divergence (SNK), already exist.
4. There are many mistakes in the formulas throughout the paper. For instance, Eq. (2) should be $\|\mathbb{E}_{X\sim P}\,\phi(X) - \mathbb{E}_{X\sim Q}\,\phi(X)\|^2$; the authors should double-check the formulas before submission.
5. Several symbol definitions are also missing. For instance, what does the operator $d(\cdot)$ stand for in Eq. (6)?
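The Eq. (2) correction in point 4 has a direct plug-in estimator: replacing both expectations by sample means and expanding the squared norm with a kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle$ gives $\frac{1}{n^2}\sum_{i,i'} k(x_i, x_{i'}) + \frac{1}{m^2}\sum_{j,j'} k(y_j, y_{j'}) - \frac{2}{nm}\sum_{i,j} k(x_i, y_j)$. A minimal sketch, assuming a Gaussian kernel with a hypothetical bandwidth `sigma`; the paper's kernel and estimator may differ.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)) for all pairs of rows."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def mmd2_biased(x, y, sigma=1.0):
    """Plug-in (V-statistic) estimate of ||E_P phi(X) - E_Q phi(X)||^2 from
    samples x ~ P with shape (n, d) and y ~ Q with shape (m, d)."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)
```

Expanding the norm this way means $\phi$ never has to be computed explicitly; only kernel evaluations between samples are needed.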

Correctness: There are some mistakes in the formulas. The empirical methodology seems to be correct.

Clarity: The paper is difficult to follow, and the main motivation is not clearly stated.

Relation to Prior Work: The difference between the proposed method and existing algorithms is not clearly explained.

Reproducibility: Yes