{"title": "On the Fairness of Disentangled Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 14611, "page_last": 14624, "abstract": "Recently there has been a significant interest in learning disentangled representations, as they promise increased interpretability, generalization to unseen scenarios and faster learning on downstream tasks. \nIn this paper, we investigate the usefulness of different notions of disentanglement for improving the fairness of downstream prediction tasks based on representations.\nWe consider the setting where the goal is to predict a target variable based on the learned representation of high-dimensional observations (such as images) that depend on both the target variable and an unobserved sensitive variable.\nWe show that in this setting both the optimal and empirical predictions can be unfair, even if the target variable and the sensitive variable are independent.\nAnalyzing the representations of more than 12600 trained state-of-the-art disentangled models, we observe that several disentanglement scores are consistently correlated with increased fairness, suggesting that disentanglement may be a useful property to encourage fairness when sensitive variables are not observed.", "full_text": "On the Fairness of Disentangled Representations\n\nFrancesco Locatello2,5, Gabriele Abbati3, Tom Rainforth4, Stefan Bauer5, Bernhard Sch\u00f6lkopf5, and\n\nOlivier Bachem1\n\n2Dept. of Computer Science, ETH Zurich\n\n3Dept. of Engineering Science, University of Oxford\n\n1Google Research, Brain Team\n\n4Dept. of Statistics, University of Oxford\n\n5Max-Planck Institute for Intelligent Systems\n\nAbstract\n\nRecently there has been a signi\ufb01cant interest in learning disentangled representa-\ntions, as they promise increased interpretability, generalization to unseen scenarios\nand faster learning on downstream tasks. 
In this paper, we investigate the usefulness of different notions of disentanglement for improving the fairness of downstream prediction tasks based on representations. We consider the setting where the goal is to predict a target variable based on the learned representation of high-dimensional observations (such as images) that depend on both the target variable and an unobserved sensitive variable. We show that in this setting both the optimal and empirical predictions can be unfair, even if the target variable and the sensitive variable are independent. Analyzing the representations of more than 12 600 trained state-of-the-art disentangled models, we observe that several disentanglement scores are consistently correlated with increased fairness, suggesting that disentanglement may be a useful property to encourage fairness when sensitive variables are not observed.

1 Introduction

In representation learning, observations are often assumed to be samples from a random variable x which is generated by a set of unobserved factors of variation z [6, 14, 53, 89]. Informally, the goal of representation learning is to find a transformation r(x) of the data which is useful for different downstream classification tasks [6]. A recent line of work argues that disentangled representations offer many of the desired properties of useful representations. Indeed, isolating each independent factor of variation into the independent components of a representation vector should make it both interpretable and simplify downstream prediction tasks [6, 7, 29, 35, 56, 58, 60, 74, 82, 87, 89, 90, 1, 28].

Previous work [54, 61] has alluded to a possible connection between the motivations of disentanglement and fair machine learning. Given the societal relevance of machine-learning driven decision processes, fairness has become a highly active field [4]. Assuming the existence of a complex causal graph with partially observed and potentially confounded observations [48], sensitive protected attributes (e.g., gender, race, etc.) can leak undesired information into a classification task in different ways. For example, the inherent assumptions of the algorithm might cause discrimination towards protected groups, the data collection process might be biased, or the causal graph itself might allow for unfairness because society is unfair [5, 11, 68, 73, 75, 83]. The goal of fair machine learning algorithms is to predict a target variable y through a classifier ŷ without being biased by some sensitive factors s. The negative impact of s in terms of discrimination within the classification task can be quantified using a variety of fairness notions, such as demographic parity [10, 97], individual fairness [21], equalized odds or equal opportunity [34, 94], and concepts based on causal reasoning [48, 55].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

[Figure 1 diagram: unobserved factors, including the target variable y and the sensitive variable s, pass through an unknown mixing mechanism to produce the observation x, from which a representation r(x) is learned and a prediction ŷ is made, which should be accurate while being fair.]

Figure 1: Causal graph and problem setting. We assume the observations x are manifestations of independent factors of variation. We aim at predicting the value of some factors of variation y without being influenced by the unobserved sensitive variable s. Even though target and sensitive variable are in principle independent, they are entangled in the observations by an unknown mixing mechanism. Our goal for fair representation learning is to learn a good representation r(x) so that any downstream classifier will be both accurate and fair.
Note that the representation is learned without supervision and when training the classifier we do not observe and do not know which variables are sensitive.

In this paper, we investigate the downstream usefulness of disentangled representations through the lens of fairness. For this, we consider the standard setup of disentangled representation learning, in which observations are the result of an (unknown) mixing mechanism of independent ground-truth factors of variation as depicted in Figure 1. To evaluate the learned representations r(x) of these observations, we assume that the set of ground-truth factors of variation includes both a target factor y, which we would like to predict from the learned representation, and an underlying sensitive factor s, which we want to be fair to in the sense of demographic parity [10, 97], i.e. such that p(ŷ = y | s = s1) = p(ŷ = y | s = s2) ∀ y, s1, s2. The key difference to prior work is that in this setting one never observes the sensitive variable s nor the other factors of variation except the target variable, which is itself only observed when learning the model for the downstream task. This setup is relevant when sensitive variables may not be recorded due to privacy reasons. Examples include learning general-purpose embeddings from a large number of images or building a world model based on video input of a robot.

Our key contributions can be summarized as follows:

• We motivate the setup of Figure 1 and discuss how general-purpose representations can lead to unfair predictions. In particular, we show theoretically that predictions can be unfair even if we use the Bayes optimal classifier and if the target variable and the sensitive variable are independent. Furthermore, we motivate why disentanglement in the representation may encourage fairness of the downstream prediction models.

• We evaluate the demographic parity of more than 90 000 downstream prediction models trained on more than 10 000 state-of-the-art disentangled representations on seven different data sets. Our results indicate that there are considerable dissimilarities between different representations in terms of fairness, indicating that the representation used matters.

• We relate the fairness of the representations to six different disentanglement scores of the same representations and find that disentanglement, in particular when measured using the DCI Disentanglement score [22], appears to be consistently correlated with increased fairness.

• We further investigate the relationship between fairness, the performance of the downstream models and the disentanglement scores. The fairness of the prediction also appears to be correlated to the accuracy of the downstream predictions, which is not surprising given that downstream accuracy is correlated with disentanglement.

Roadmap: In Section 2, we briefly review the state-of-the-art approaches to extract and evaluate disentangled representations. In Section 3, we highlight the role of the unknown mixing mechanism on the fairness of the classification. In Section 4, we describe our experimental setup and empirical findings. In Section 5, we briefly review the literature on disentanglement and fair representation learning. In Section 6, we discuss our findings and their implications.

2 Background on learning disentangled representations

Consider the setup shown in Figure 1 where the observations x are caused by k independent sources z1, . . . , zk. The generative model takes the form [71]:

p(x, z) = p(x | z) ∏_i p(z_i).

Informally, disentanglement learning treats the generative mechanisms as latent variables and aims at finding a representation r(x) with independent components where a change in a dimension of z corresponds to a change in a dimension of r(x) [6]. This intuitive definition can be formalized in a topological sense [35] and in the causality setting [87]. A large number of disentanglement scores measuring different aspects of disentangled representations have been proposed in recent years.

Disentanglement scores. The BetaVAE score [36] measures disentanglement by training a linear classifier to predict the index of a fixed factor of variation from the representation. The FactorVAE score [49] corrects a failure case of the BetaVAE score using a majority vote classifier on the relative variance of each dimension of r(x) after intervening on z. The Mutual Information Gap (MIG) [13] computes for each factor of variation the normalized gap between the top two entries in the matrix of pairwise mutual information between z and r(x). Modularity [79] measures whether each dimension of r(x) depends on at most one factor of variation using the matrix of pairwise mutual information between factors and representation dimensions. The Disentanglement metric of [22] (which we call DCI Disentanglement following [61]) is based on the entropy of the probability that a dimension of r(x) is useful for predicting z. This probability can be estimated from the feature importance of a random forest classifier. Finally, the SAP score [54] computes the average gap in the classification error of the two most predictive latent dimensions for each factor.

Unsupervised methods. State-of-the-art approaches for unsupervised disentanglement learning are based on representations learned by VAEs [51].
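As a concrete illustration of one of the scores above, the Mutual Information Gap can be sketched in a few lines, given a precomputed matrix of pairwise mutual information between factors and representation dimensions. The function name, normalization by factor entropy, and the toy numbers below are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mig(mi, factor_entropies):
    """Mutual Information Gap (sketch): for each factor, the gap between the
    two representation dimensions with highest mutual information, normalized
    by the factor's entropy, averaged over factors.

    mi: (num_factors, num_codes) matrix of mutual information I(z_j; r(x)_i).
    factor_entropies: (num_factors,) entropies H(z_j) used for normalization.
    """
    sorted_mi = np.sort(mi, axis=1)[:, ::-1]  # descending per factor
    gaps = (sorted_mi[:, 0] - sorted_mi[:, 1]) / factor_entropies
    return float(np.mean(gaps))

# Toy example: two factors, three representation dimensions.
mi = np.array([[1.0, 0.1, 0.0],    # factor 0 captured mostly by code dim 0
               [0.0, 0.2, 0.9]])   # factor 1 captured mostly by code dim 2
entropies = np.array([1.0, 1.0])
print(round(mig(mi, entropies), 2))  # → 0.8
```

A well-disentangled representation concentrates each factor's information in a single dimension, so the gap (and hence the score) is large.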
For the representation to be disentangled, the loss is enriched with a regularizer that encourages structure in the aggregate encoder distribution [2, 14, 13, 24, 36, 49, 65]. In causality, it is often argued that the true generative model is the simplest factorization of the distribution of the variables in the causal graph [74]. Under this hypothesis, β-VAE [36] and AnnealedVAE [9] limit the capacity of the VAE bottleneck so that it will be forced to learn disentangled representations. The FactorVAE [49] and β-TCVAE [13] enforce that the aggregate posterior q(z) is factorial by penalizing its total correlation. The DIP-VAE [54] and the approach of [65] introduce a "disentanglement prior" for the aggregated posterior. We refer to Appendix B of [61] and Section 3 of [89] for a more detailed description of these regularizers.

3 The dangers of general purpose representations for fairness

Our goal in this paper is to understand how disentanglement impacts the fairness of general purpose representations. For this reason, we put ourselves in the simple setup of Figure 1 where we assume that the observations x depend on a set of independent ground-truth factors of variation through an unknown mixing mechanism. The key goal behind general purpose representations is to learn a vector-valued function r(x) that allows us to solve many downstream tasks that depend on the ground-truth factors of variation. From a representation learning perspective, a good representation should thus extract most of the information on the factors of variation [6], ideally in a way that enables easy learning from that representation, i.e., with few samples.

As one builds machine learning models for different tasks on top of such general purpose representations, it is not clear how the properties of the representations relate to the fairness of the predictions. In particular, for different downstream prediction tasks, there may be different sensitive variables that we would like to be fair to. This is modeled in our setting of Figure 1 by allowing one ground-truth factor of variation to be the target variable y and another one to be the sensitive variable s.¹

¹ Please see Section 4.1 for how this is done in the experiments.

There are two key differences to prior setups in the fairness literature: First, we assume that one only observes the observations x when learning the representation r(x), and the target variable y only when solving the downstream classification task. The sensitive variable s and the remaining ground-truth factors of variation are not observed. We argue that this is an interesting setting because for many large scale data sets labels may be scarce. Furthermore, if we can be fair with respect to unobserved but independent ground-truth factors of variation – for example by using disentangled representations – this might even allow us to avoid biases for sensitive factors that we are not aware of. The second difference is that we assume that the target variable y and the sensitive variable s are independent. While beyond the scope of this paper, it would also be interesting to study the setting where ground-truth factors of variation are dependent.

Why can representations be unfair in this setting? While the assumption that the target variable y and the sensitive variable s are independent may seem overly restrictive, we argue that even in this setting fairness is non-trivial to achieve. Since we only observe x or the learned representations r(x), the target variable y and the sensitive variable s may be conditionally dependent. If we now train a prediction model based on x or r(x), there is no guarantee that predictions will be fair with respect to s.

There are additional considerations: First, the following theorem shows that the fairness notion of demographic parity may not be satisfied even if we find the optimal prediction model (i.e., p(ŷ | x) = p(y | x)) on entangled representations (for example when the representation is the identity function, i.e. r(x) = x).

Theorem 1. If x is entangled with s and y, the use of a perfect classifier for ŷ, i.e., p(ŷ | x) = p(y | x), does not imply demographic parity, i.e., p(ŷ = y | s = s1) = p(ŷ = y | s = s2) ∀ y, s1, s2.

The proof is provided in Appendix A. While this result provides a worst-case example, it should be interpreted with care. In particular, such instances may not allow for good and fair predictions regardless of the representations² and real world data may satisfy additional assumptions not satisfied by the provided counterexample.

Second, the unknown mixing mechanism that relates y, s to x may be highly complex, and in practice the downstream learned prediction model will likely not be equal to the theoretically optimal prediction model p(ŷ | r(x)). As a result, the downstream prediction model may be unable to properly invert the unknown mixing mechanism and successfully separate y and s, in particular as it may not be incentivized to do so.
Finally, implicit biases and specific structures of the downstream model may interact and lead to different overall predictions for different sensitive groups in s.

Why might disentanglement help? The key idea why disentanglement may help in this setting is that disentanglement promises to capture information about different generative factors in different latent dimensions. This limits the mutual information between different code dimensions and encourages the predictions to depend only on the latent dimensions corresponding to the target variable and not on those corresponding to the sensitive ground-truth factor of variation. More formally, in the context of Theorem 1, consider a disentangled representation where the two factors of variation s and y are separated into independent components (say r(x)_y only depends on y and r(x)_s on s). Then, the optimal classifier can learn to ignore the part of its input which is independent of y since p(ŷ | r(x)) = p(y | r(x)) = p(y | r(x)_y, r(x)_s) = p(y | r(x)_y), as y is independent from r(x)_s. While such an optimal classifier on the representation r(x) might be fairer than the optimal classifier on the observation x, it may also have a lower prediction accuracy.

4 Do disentangled representations matter?

Experimental conditions. We adopt the setup of [61], which offers the most extensive benchmark comparison of disentangled representations to date. Their analysis spans seven datasets: in four of them (dSprites [36], Cars3D [78], SmallNORB [59] and Shapes3D [49]), a deterministic function of the factors of variation is incorporated into the mixing process; they further introduce three additional variants of dSprites: Noisy-dSprites, Color-dSprites, and Scream-dSprites. In the latter datasets, the mixing mechanism contains a random component that takes the form of noisy pixels, random colors and structured backgrounds from the Scream painting. Each of these seven datasets provides access to the generative model for evaluation purposes.

² In this case, even properties of representations such as disentanglement may not help.

Figure 2: (Left) Distribution of unfairness for learned representations. Legend: dSprites = (A), Color-dSprites = (B), Noisy-dSprites = (C), Scream-dSprites = (D), SmallNORB = (E), Cars3D = (F), Shapes3D = (G). (Right) Rank correlation of unfairness and disentanglement scores on the various data sets.

Figure 3: Unfairness of representations versus DCI Disentanglement on the different data sets.

Our experimental pipeline works in three stages. First, we take the 12 600 pre-trained models of [61], which cover a large number of hyperparameters and random seeds for the most prominent approaches: β-VAE, AnnealedVAE, FactorVAE, β-TCVAE, DIP-VAE-I and II. These methods are trained on the raw data without any supervision. Details on architecture, hyperparameters and implementation of the methods can be found in Appendices B, C, G, and H of [61]. In the second stage, we assume to observe a target variable y that we should predict from the representation while we do not observe the sensitive variable s. For each trained model, we consider each possible pair of factors of variation as target and sensitive variables. For the prediction, we consider the same gradient boosting classifier [27] as in [61], which was trained on 10 000 labeled examples (subsequently denoted by GBT10000) and which achieves higher accuracy than the cross-validated logistic regression. In the third stage, we observe the values of all the factors of variation and have access to the whole generative model. With this we compute the disentanglement metrics and use the following score to measure the unfairness of the predictions:

unfairness(ŷ) = (1/|S|) Σ_{s ∈ S} TV( p(ŷ), p(ŷ | s = s) )

where TV is the total variation.
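The unfairness score just defined can be sketched as follows, comparing the marginal distribution of the predictions against the distribution conditioned on each value of the sensitive variable. The function names and toy arrays are ours, for illustration only:

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def unfairness(y_hat, s, num_classes):
    """Average TV(p(y_hat), p(y_hat | s = v)) over the values v of s.

    y_hat: predicted class per example; s: sensitive value per example.
    """
    y_hat, s = np.asarray(y_hat), np.asarray(s)
    marginal = np.bincount(y_hat, minlength=num_classes) / len(y_hat)
    tvs = []
    for v in np.unique(s):
        group = y_hat[s == v]
        conditional = np.bincount(group, minlength=num_classes) / len(group)
        tvs.append(total_variation(marginal, conditional))
    return float(np.mean(tvs))

# Fair predictions: identical prediction distribution in both groups.
fair = unfairness([0, 1, 0, 1], [0, 0, 1, 1], num_classes=2)
# Maximally unfair predictions: y_hat fully determined by s.
unfair = unfairness([0, 0, 1, 1], [0, 0, 1, 1], num_classes=2)
print(fair, unfair)  # 0.0 0.5
```

A score of 0 means demographic parity holds exactly; larger values directly quantify its violation.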
In other words, we compute the average total variation of the prediction after intervening on s, thus directly measuring the violation of demographic parity. The reported unfairness score for each trained representation is the average unfairness of all downstream classification tasks we considered for that representation.

4.1 The unfairness of general purpose representations and the relation to disentanglement

In Figure 2 (left), we show the distribution of unfairness scores for different representations on different data sets. We clearly observe that learned representations can be unfair, even in the setting where the target variable and the sensitive variable are independent. In particular, the total variation can reach as much as 15%–25% on five out of seven data sets. This confirms the importance of trying to find general-purpose representations that are less unfair.

We also observe in Figure 2 (left) that there is considerable spread in unfairness scores for different learned representations. This indicates that the specific representation used matters and that predictions with low unfairness can be achieved. To investigate whether disentanglement is a useful property to guarantee less unfair representations, we show the rank correlation between a wide range of disentanglement scores and the unfairness score in Figure 2 (right). We observe that all disentanglement scores except Modularity appear to be consistently correlated with a lower unfairness score for all data sets.

Figure 4: Unfairness of representations versus downstream accuracy on the different data sets.

Figure 5: Rank correlation between the adjusted disentanglement scores (left) and between original scores and the adjusted version (right).

While the considered disentanglement metrics (except Modularity) have been found to be correlated (see [61]), we observe significant differences between scores: Figure 2 (right) indicates that DCI Disentanglement is correlated the most, followed by the Mutual Information Gap, the BetaVAE score, the FactorVAE score, the SAP score and finally Modularity. The strong correlation of DCI Disentanglement is confirmed by Figure 3, where we plot the unfairness score against the DCI Disentanglement score for each model. Again, we observe that the large gaps in unfairness seem to be related to differences in the representation. We show the corresponding plots for all metrics in Figure 9 in the Appendix.

These results provide an encouraging case for disentanglement being helpful in finding fairer representations. However, they should be interpreted with care: Even though we have considered a diverse set of methods and disentangled representations, the computed correlation scores depend on the distribution of considered models. If one were to consider an entirely different set of methods, hyperparameters and corresponding representations, the observed relationship may differ.

4.2 Adjusting for downstream performance

Prior work [61] has observed that disentanglement metrics are correlated with how well ground-truth factors of variation can be predicted from the representation using gradient boosted trees. It is thus not surprising that the unfairness of a representation is also consistently correlated to the average accuracy of a gradient boosted trees classifier using 10 000 samples (see Figure 4). In this section, we investigate whether disentanglement is also correlated with a higher fairness if we compare representations with the same accuracy as measured by GBT10000 scores. Given two representations with the same downstream performance, is the more disentangled one also more fair? The key challenge is that for a given representation there may not be other ones with exactly the same downstream performance.

For this, we adjust all the disentanglement scores and the unfairness score for the effect of downstream performance. We use a k-nearest neighbors regression from Scikit-learn [72] to predict, for any model, each disentanglement score and the unfairness from its five nearest neighbors in terms of GBT10000 (which we write as N(GBT10000)). This can be seen as a one-dimensional non-parametric estimate of the disentanglement score (or fairness score) based on the GBT10000 score. The adjusted metric is computed as the residual score after the average score of the neighbors is subtracted, namely

Adj. Metric = Metric − (1/5) Σ_{i ∈ N(GBT10000)} Metric_i

Intuitively, the adjusted metrics measure how much more disentangled (fairer) a given representation is compared to an average representation with the same downstream performance.

Figure 6: Latent traversals (each column corresponds to a different latent variable being varied) on Shapes3D for the model with best adjusted MIG.

In Figure 5 (left), we observe that the rank correlation between the adjusted disentanglement scores (except Modularity) on Color-dSprites is consistently positive. This indicates that the adjusted scores do measure a similar property of the representation even when adjusted for performance. This result is consistent across data sets (Figure 10 of the Appendix). The only exception appears to be SmallNORB, where the adjusted DCI Disentanglement, MIG and SAP score correlate with each other but do not correlate well with the BetaVAE and FactorVAE score (which only correlate with each other). On Shapes3D we observe a similar result, but the correlation between the two groups of scores is stronger than on SmallNORB.
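The adjustment in Section 4.2 can be sketched as follows. For clarity we use plain NumPy rather than Scikit-learn's k-nearest-neighbors regressor used in the paper, and whether a model is excluded from its own neighborhood is our assumption; the arrays are illustrative:

```python
import numpy as np

def adjust_for_accuracy(metric, gbt_accuracy, k=5):
    """Residual of a per-model score after subtracting the mean score of its
    k nearest neighbours in downstream accuracy (a sketch of Adj. Metric).

    metric: per-model scores (disentanglement or unfairness).
    gbt_accuracy: per-model GBT10000 accuracy defining the neighbourhoods.
    """
    metric = np.asarray(metric, dtype=float)
    acc = np.asarray(gbt_accuracy, dtype=float)
    adjusted = np.empty_like(metric)
    for i in range(len(metric)):
        # k nearest neighbours of model i in accuracy, excluding model i itself
        order = np.argsort(np.abs(acc - acc[i]), kind="stable")
        neighbours = [j for j in order if j != i][:k]
        adjusted[i] = metric[i] - metric[neighbours].mean()
    return adjusted

# Toy example: the last model is far more disentangled than models with
# comparable accuracy, so its adjusted score stays strongly positive.
print(adjust_for_accuracy([1.0, 2.0, 3.0, 10.0], [0.10, 0.20, 0.35, 0.90], k=2))
```

An adjusted score above zero thus means "more disentangled (or more unfair) than an average representation with the same downstream accuracy", which is exactly the comparison the section is after.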
Similarly, Figure 5 (right) shows the rank correlation between the disentanglement metrics and their adjusted versions. As expected, we observe that there still is a significant positive correlation. This indicates that the adjusted scores still capture a significant part of the unadjusted scores. We observe in Figure 11 of the Appendix that this result appears to be consistent across the different data sets, again with the exception of SmallNORB. As a sanity check, we finally confirm by visual inspection that the adjusted metrics still measure disentanglement. In Figure 6, we plot latent traversals for the model with the highest adjusted MIG score on Shapes3D and observe that the model appears well disentangled.

Finally, Figure 7 shows the rank correlation between the adjusted disentanglement scores and the adjusted fairness score for each of the data sets. Overall, we observe that higher disentanglement still seems to be correlated with an increased fairness, even when accounting for downstream performance. Exceptions appear to be the adjusted Modularity score, the adjusted BetaVAE and FactorVAE score on Shapes3D, and the adjusted MIG, DCI Disentanglement, Modularity and SAP on SmallNORB. As expected, the correlations appear to be weaker than for the unadjusted scores (see Figure 2 (right)) but we still observe some residual correlation.

Figure 7: Rank correlation of unfairness and disentanglement scores on the various data sets (left). Rank correlation of adjusted unfairness and adjusted disentanglement scores on the various data sets (right).

How do we identify fair models? In this section, we observed that disentangled representations allow us to train fairer classifiers, regardless of their accuracy. This leaves us with the question of how we can find fair representations. [61] showed that without access to supervision or inductive biases, disentangled representations cannot be identified. However, existing methods heavily rely on inductive biases such as architecture, hyperparameter choices, mean-field assumptions, and smoothness induced through randomness [65, 80, 86]. In practice, training a large number of models with different losses and hyperparameters will result in a large number of different representations, some of which might be more disentangled than others, as can be seen for example in Figure 3. From Theorem 1, we know that optimizing for accuracy on a fixed representation does not guarantee learning a fair classifier, as demographic parity theoretically depends on the representation when the sensitive variable is not observed.

When we fix a classification algorithm, in our case GBT10000, and train it over a variety of representations with different degrees of disentanglement, we obtain both different degrees of fairness and different downstream performance. If the disentanglement of the representation is the only confounder between the performance of the classifier and its fairness, as depicted in Figure 8, the classification accuracy may be used as a proxy for fairness. To test whether this holds in practice, we perform the following experiment. We sample a data set, a seed for the unsupervised disentanglement models and, among the factors of variation, we sample one to be y and one to be s. Then, we train a classifier predicting y from r(x) using all the models trained on that data set with the specific seed. We compare the unfairness of the classifier achieving the highest prediction accuracy on y with that of a randomly chosen classifier from the ones we trained. We observe that the classifier selected using test accuracy is also fairer 84.2% of the time. We remark that this result explicitly makes use of a large number of representations of different quality on which we train the same classification algorithm.
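This selection heuristic can be mimicked with a toy simulation under the Figure 8 assumption. All quantities below are synthetic (not the paper's data): disentanglement acts as the common cause, noisily driving accuracy up and unfairness down, and we check how often picking the most accurate model also picks a fairer one than a random choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def selection_trial(num_reps=10):
    """One trial of the selection heuristic on synthetic models.

    Disentanglement is the (only) confounder: more disentangled
    representations yield noisily higher accuracy and lower unfairness.
    """
    d = rng.uniform(size=num_reps)                    # disentanglement levels
    accuracy = d + 0.2 * rng.normal(size=num_reps)
    unfairness = 1.0 - d + 0.2 * rng.normal(size=num_reps)
    best = int(np.argmax(accuracy))                   # select by test accuracy
    random_choice = int(rng.integers(num_reps))       # baseline: random model
    return unfairness[best] <= unfairness[random_choice]

wins = np.mean([selection_trial() for _ in range(1000)])
print(f"accuracy-selected model is fairer in {wins:.0%} of trials")
```

As in the paper's experiment, accuracy-based selection wins well over half the time, but not always: the noise terms stand in for the other confounders that make the proxy imperfect.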
Under the assumption of Figure 8, where the disentanglement of the representation is the only difference explaining the different predictions, the best performing classifier is also fairer than one trained on a different representation. Since disentanglement is likely not the only confounder, model selection based on downstream performance is not guaranteed to always be fairer than random model selection.

5 Related Work

Figure 8: If disentanglement is a causal parent of downstream performance and fairness and there are no hidden confounders, then the former can be used as a proxy for the latter.

Ideas related to disentangling the factors of variation have a long tradition in machine learning, dating back to the non-linear ICA literature [17, 3, 46, 42, 43, 44, 32]. Disentangling pose from content and content from motion are also classical computer vision problems that have been tackled with various degrees of supervision and inductive bias [92, 93, 40, 25, 18, 31, 42]. In this paper, we understand disentanglement in the sense of [6, 87, 35, 61]. [61] recently proved that without access to supervision or inductive biases, disentanglement learning is impossible as disentangled models cannot be identified. In this paper, we evaluate the representation using a supervised downstream task where both target and sensitive variables are observed. Semi-supervised variants have been extensively studied over the years. [77, 15, 66, 70, 50, 52, 1] assume partially observed factors of variation that should be disentangled from the other unobserved ones. Weaker forms of supervision, like relational information or additional assumptions on the effect of the factors of variation, were also studied [39, 16, 47, 31, 91, 26, 19, 41, 93, 62, 53, 81, 8] and applied in the sequential data and reinforcement learning settings [88, 85, 57, 69, 37, 38]. Overall, the disentanglement literature is interested in isolating the effect of every factor of variation regardless of how the representation should be used downstream.

From the fairness perspective, representation learning has been used as a means to separate the detrimental effects that labeled sensitive factors could have on the classification task [67, 34]. We remark that this setup is different from what we consider in this paper, as we do not assume access to any labeled information when learning a representation. In particular, we do not assume to know what the downstream task will be and what the sensitive variables are (if any). [21, 95] introduce the idea that a fair representation should preserve all information about the individual's attributes except for the membership to protected groups. In practice, [63] extends the VAE objective with a Maximum Mean Discrepancy [33] to ensure independence between the latent representation and the sensitive factors. [12] introduces the idea of data pre-processing as a tool to control for downstream discrimination. The authors of [84] instead propose an information-theoretic approach in which the mutual information between the data and the representation is maximized, while the one between the sensitive attributes and the representation is minimized. Furthermore, there are several approaches that employ adversarial training [30] to avoid information leakage between the sensitive attributes and the representation [23, 64, 96]. Finally, representation learning has recently proved to be useful in counterfactual fairness [55, 45].

6 Conclusion

In this paper, we provide the first empirical evidence that disentanglement might prove beneficial for learning fair representations, supporting the conjectures of [61, 54].
We show that general-purpose representations can lead to substantial unfairness, even in the setting where the sensitive variable and the target variable are independent and one only has access to observations that depend on both of them. Yet, the choice of representation appears to be crucial: we find that increased disentanglement of a representation is consistently correlated with increased fairness on downstream prediction tasks across a wide range of representations and data sets. Furthermore, we discuss the relationship between fairness, downstream accuracy, and disentanglement, and find evidence that the correlation between disentanglement metrics and the unfairness of downstream predictions also appears to hold when one accounts for downstream accuracy. We believe that these results serve as a motivation for further investigation into the practical benefits of disentangled representations, especially in the context of fairness. Finally, we argue that fairness should be among the desired properties of general-purpose representation learning beyond VAEs [20, 76]. As we highlighted in this paper, it appears possible to learn representations that are useful, interpretable, and fairer. Progress on this problem could allow machine-learning driven decision making to be both better and fairer.

Acknowledgements

The authors thank Sylvain Gelly and Niki Kilbertus for helpful discussions and comments. Francesco Locatello is supported by the Max Planck ETH Center for Learning Systems, by an ETH core grant (to Gunnar Rätsch), and by a Google Ph.D. Fellowship. This work was partially done while Francesco Locatello was at Google Research Zurich. Gabriele Abbati acknowledges funding from Google DeepMind and the University of Oxford.
Tom Rainforth is supported in part by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007–2013) / ERC grant agreement no. 617071 and in part by EPSRC funding under grant EP/P026753/1.

References

[1] Tameem Adel, Zoubin Ghahramani, and Adrian Weller. Discovering interpretable representations for both deep generative and discriminative models. In International Conference on Machine Learning, pages 50–59, 2018.

[2] Alexander A Alemi, Ian Fischer, Joshua V Dillon, and Kevin Murphy. Deep variational information bottleneck. arXiv preprint arXiv:1612.00410, 2016.

[3] Francis Bach and Michael Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3(7):1–48, 2002.

[4] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and machine learning. https://fairmlbook.org/, 2019.

[5] Solon Barocas and Andrew D Selbst. Big data's disparate impact. Calif. L. Rev., 104:671, 2016.

[6] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

[7] Yoshua Bengio, Yann LeCun, et al. Scaling learning algorithms towards AI. Large-scale Kernel Machines, 34(5):1–41, 2007.

[8] Diane Bouchacourt, Ryota Tomioka, and Sebastian Nowozin. Multi-level variational autoencoder: Learning disentangled representations from grouped observations. In AAAI Conference on Artificial Intelligence, 2018.

[9] Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in beta-VAE. arXiv preprint arXiv:1804.03599, 2018.

[10] Toon Calders, Faisal Kamiran, and Mykola Pechenizkiy. Building classifiers with independency constraints.
In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18. IEEE, 2009.

[11] Toon Calders and Indrė Žliobaitė. Why unbiased computational processes can lead to discriminative decision procedures. In Discrimination and Privacy in the Information Society, pages 43–57. Springer, 2013.

[12] Flavio Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3992–4001, 2017.

[13] Tian Qi Chen, Xuechen Li, Roger Grosse, and David Duvenaud. Isolating sources of disentanglement in variational autoencoders. In Advances in Neural Information Processing Systems, 2018.

[14] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems, 2016.

[15] Brian Cheung, Jesse A Livezey, Arjun K Bansal, and Bruno A Olshausen. Discovering hidden factors of variation in deep networks. arXiv preprint arXiv:1412.6583, 2014.

[16] Taco Cohen and Max Welling. Learning the irreducible representations of commutative Lie groups. In International Conference on Machine Learning, 2014.

[17] Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.

[18] Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, Iain Matthews, and Greg Mori. Factorized variational autoencoders for modeling audience reactions to movies. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[19] Emily L Denton and Vighnesh Birodkar. Unsupervised learning of disentangled representations from video.
In Advances in Neural Information Processing Systems, 2017.

[20] Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, and Aaron Courville. Adversarially learned inference. In International Conference on Learning Representations, 2016.

[21] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

[22] Cian Eastwood and Christopher KI Williams. A framework for the quantitative evaluation of disentangled representations. In International Conference on Learning Representations, 2018.

[23] Harrison Edwards and Amos Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.

[24] Babak Esmaeili, Hao Wu, Sarthak Jain, Alican Bozkurt, N Siddharth, Brooks Paige, Dana H Brooks, Jennifer Dy, and Jan-Willem Meent. Structured disentangled representations. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2525–2534, 2019.

[25] Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, and Gunnar Rätsch. Deep self-organization: Interpretable discrete representation learning on time series. In International Conference on Learning Representations, 2019.

[26] Marco Fraccaro, Simon Kamronn, Ulrich Paquet, and Ole Winther. A disentangled recognition and nonlinear dynamics model for unsupervised learning. In Advances in Neural Information Processing Systems, 2017.

[27] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.

[28] Muhammad Waleed Gondal, Manuel Wüthrich, Djordje Miladinović, Francesco Locatello, Martin Breidt, Valentin Volchkov, Joel Akpo, Olivier Bachem, Bernhard Schölkopf, and Stefan Bauer.
On the transfer of inductive bias from simulation to the real world: a new disentanglement dataset. In Advances in Neural Information Processing Systems, 2019.

[29] Ian Goodfellow, Honglak Lee, Quoc V Le, Andrew Saxe, and Andrew Y Ng. Measuring invariances in deep networks. In Advances in Neural Information Processing Systems, 2009.

[30] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.

[31] Ross Goroshin, Michael F Mathieu, and Yann LeCun. Learning to linearize under uncertainty. In Advances in Neural Information Processing Systems, 2015.

[32] Luigi Gresele, Paul K. Rubenstein, Arash Mehrjou, Francesco Locatello, and Bernhard Schölkopf. The incomplete rosetta stone problem: Identifiability results for multi-view nonlinear ICA. In Conference on Uncertainty in Artificial Intelligence (UAI), 2019.

[33] Arthur Gretton, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems, pages 513–520, 2007.

[34] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[35] Irina Higgins, David Amos, David Pfau, Sebastien Racaniere, Loic Matthey, Danilo Rezende, and Alexander Lerchner. Towards a definition of disentangled representations. arXiv preprint arXiv:1812.02230, 2018.

[36] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. beta-VAE: Learning basic visual concepts with a constrained variational framework.
In International Conference on Learning Representations, 2017.

[37] Irina Higgins, Arka Pal, Andrei Rusu, Loic Matthey, Christopher Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, and Alexander Lerchner. DARLA: Improving zero-shot transfer in reinforcement learning. In International Conference on Machine Learning, 2017.

[38] Irina Higgins, Nicolas Sonnerat, Loic Matthey, Arka Pal, Christopher P Burgess, Matko Bošnjak, Murray Shanahan, Matthew Botvinick, Demis Hassabis, and Alexander Lerchner. SCAN: Learning hierarchical compositional visual concepts. In International Conference on Learning Representations, 2018.

[39] Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang. Transforming auto-encoders. In International Conference on Artificial Neural Networks, 2011.

[40] Jun-Ting Hsieh, Bingbin Liu, De-An Huang, Li F Fei-Fei, and Juan Carlos Niebles. Learning to decompose and disentangle representations for video prediction. In Advances in Neural Information Processing Systems, 2018.

[41] Wei-Ning Hsu, Yu Zhang, and James Glass. Unsupervised learning of disentangled and interpretable representations from sequential data. In Advances in Neural Information Processing Systems, 2017.

[42] Aapo Hyvarinen and Hiroshi Morioka. Unsupervised feature extraction by time-contrastive learning and nonlinear ICA. In Advances in Neural Information Processing Systems, 2016.

[43] Aapo Hyvärinen and Petteri Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 1999.

[44] Aapo Hyvarinen, Hiroaki Sasaki, and Richard E Turner. Nonlinear ICA using auxiliary variables and generalized contrastive learning. In International Conference on Artificial Intelligence and Statistics, 2019.

[45] Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference.
In International Conference on Machine Learning, pages 3020–3029, 2016.

[46] Christian Jutten and Juha Karhunen. Advances in nonlinear blind source separation. In International Symposium on Independent Component Analysis and Blind Signal Separation, pages 245–256, 2003.

[47] Theofanis Karaletsos, Serge Belongie, and Gunnar Rätsch. Bayesian representation learning with oracle constraints. arXiv preprint arXiv:1506.05011, 2015.

[48] N. Kilbertus, M. Rojas Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf. Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems 30, pages 656–666, 2017.

[49] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. In International Conference on Machine Learning, 2018.

[50] Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, 2014.

[51] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.

[52] Jack Klys, Jake Snell, and Richard Zemel. Learning latent subspaces in variational autoencoders. In Advances in Neural Information Processing Systems, 2018.

[53] Tejas D Kulkarni, William F Whitney, Pushmeet Kohli, and Josh Tenenbaum. Deep convolutional inverse graphics network. In Advances in Neural Information Processing Systems, 2015.

[54] Abhishek Kumar, Prasanna Sattigeri, and Avinash Balakrishnan. Variational inference of disentangled latent concepts from unlabeled observations. In International Conference on Learning Representations, 2018.

[55] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness.
In Advances in Neural Information Processing Systems, pages 4066–4076, 2017.

[56] Brenden M Lake, Tomer D Ullman, Joshua B Tenenbaum, and Samuel J Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 40, 2017.

[57] Adrien Laversanne-Finot, Alexandre Pere, and Pierre-Yves Oudeyer. Curiosity driven exploration of learned disentangled goal spaces. In Conference on Robot Learning, 2018.

[58] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.

[59] Yann LeCun, Fu Jie Huang, and Leon Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In IEEE Conference on Computer Vision and Pattern Recognition, 2004.

[60] Karel Lenc and Andrea Vedaldi. Understanding image representations by measuring their equivariance and equivalence. In IEEE Conference on Computer Vision and Pattern Recognition, 2015.

[61] Francesco Locatello, Stefan Bauer, Mario Lucic, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In International Conference on Machine Learning, 2019.

[62] Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Rätsch, Sylvain Gelly, and Bernhard Schölkopf. Competitive training of mixtures of independent deep generative models. In Workshop at the 6th International Conference on Learning Representations (ICLR), 2018.

[63] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.

[64] David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018.

[65] Emile Mathieu, Tom Rainforth, N. Siddharth, and Yee Whye Teh. Disentangling disentanglement in variational auto-encoders.
arXiv preprint arXiv:1812.02833, 2018.

[66] Michael F Mathieu, Junbo J Zhao, Aditya Ramesh, Pablo Sprechmann, and Yann LeCun. Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems, 2016.

[67] Daniel McNamara, Cheng Soon Ong, and Robert C Williamson. Provably fair representations. arXiv preprint arXiv:1710.04394, 2017.

[68] C Munoz, M Smith, and DJ Patil. Big data: A report on algorithmic systems, opportunity, and civil rights. Executive Office of the President, May 2016.

[69] Ashvin V Nair, Vitchyr Pong, Murtaza Dalal, Shikhar Bahl, Steven Lin, and Sergey Levine. Visual reinforcement learning with imagined goals. In Advances in Neural Information Processing Systems, 2018.

[70] Siddharth Narayanaswamy, T Brooks Paige, Jan-Willem Van de Meent, Alban Desmaison, Noah Goodman, Pushmeet Kohli, Frank Wood, and Philip Torr. Learning disentangled representations with semi-supervised deep generative models. In Advances in Neural Information Processing Systems, 2017.

[71] Judea Pearl. Causality. Cambridge University Press, 2009.

[72] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[73] Dino Pedreshi, Salvatore Ruggieri, and Franco Turini. Discrimination-aware data mining. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 560–568. ACM, 2008.

[74] Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference - Foundations and Learning Algorithms.
Adaptive Computation and Machine Learning Series. MIT Press, 2017.

[75] John Podesta, Penny Pritzker, Ernest J Moniz, John Holdren, and Jeffrey Zients. Big data: Seizing opportunities, preserving values. Executive Office of the President, Washington, DC: The White House, 2014.

[76] Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A Alemi, and George Tucker. On variational bounds of mutual information. In International Conference on Machine Learning, pages 5171–5180, 2019.

[77] Scott Reed, Kihyuk Sohn, Yuting Zhang, and Honglak Lee. Learning to disentangle factors of variation with manifold interaction. In International Conference on Machine Learning, 2014.

[78] Scott Reed, Yi Zhang, Yuting Zhang, and Honglak Lee. Deep visual analogy-making. In Advances in Neural Information Processing Systems, 2015.

[79] Karl Ridgeway and Michael C Mozer. Learning deep disentangled embeddings with the f-statistic loss. In Advances in Neural Information Processing Systems, 2018.

[80] Michal Rolinek, Dominik Zietlow, and Georg Martius. Variational autoencoders recover PCA directions (by accident). In IEEE Conference on Computer Vision and Pattern Recognition, 2019.

[81] Adrià Ruiz, Oriol Martinez, Xavier Binefa, and Jakob Verbeek. Learning disentangled representations with reference-based variational autoencoders. arXiv preprint arXiv:1901.08534, 2019.

[82] Jürgen Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863–879, 1992.

[83] Wim Schreurs, Mireille Hildebrandt, Els Kindt, and Michaël Vanfleteren. Cogitas, ergo sum. The role of data protection law and non-discrimination law in group profiling in the private sector. In Profiling the European Citizen, pages 241–270. Springer, 2008.

[84] Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, and Stefano Ermon. Learning controllable fair representations.
arXiv preprint arXiv:1812.04218, 2018.

[85] Xander Steenbrugge, Sam Leroux, Tim Verbelen, and Bart Dhoedt. Improving generalization for abstract reasoning tasks using disentangled feature representations. In Workshop on Relational Representation Learning at NeurIPS, 2018.

[86] Jan Stühmer, Richard Turner, and Sebastian Nowozin. ISA-VAE: Independent subspace analysis with variational autoencoders, 2019.

[87] Raphael Suter, Djordje Miladinović, Stefan Bauer, and Bernhard Schölkopf. Interventional robustness of deep latent variable models. In International Conference on Machine Learning, 2019.

[88] Valentin Thomas, Emmanuel Bengio, William Fedus, Jules Pondard, Philippe Beaudoin, Hugo Larochelle, Joelle Pineau, Doina Precup, and Yoshua Bengio. Disentangling the independently controllable factors of variation by interacting with the world. Learning Disentangled Representations Workshop at NeurIPS, 2017.

[89] Michael Tschannen, Olivier Bachem, and Mario Lucic. Recent advances in autoencoder-based representation learning. arXiv preprint arXiv:1812.05069, 2018.

[90] Sjoerd van Steenkiste, Francesco Locatello, Jürgen Schmidhuber, and Olivier Bachem. Are disentangled representations helpful for abstract visual reasoning? arXiv preprint arXiv:1905.12506, 2019.

[91] William F Whitney, Michael Chang, Tejas Kulkarni, and Joshua B Tenenbaum. Understanding visual concepts with continuation learning. arXiv preprint arXiv:1602.06822, 2016.

[92] Jimei Yang, Scott E Reed, Ming-Hsuan Yang, and Honglak Lee. Weakly-supervised disentangling with recurrent transformations for 3D view synthesis. In Advances in Neural Information Processing Systems, 2015.

[93] Li Yingzhen and Stephan Mandt.
Disentangled sequential autoencoder. In International Conference on Machine Learning, 2018.

[94] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.

[95] Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.

[96] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340. ACM, 2018.

[97] Indre Zliobaite. On the relation between accuracy and fairness in binary classification. arXiv preprint arXiv:1505.05723, 2015.