{"title": "Equality of Opportunity in Classification: A Causal Approach", "book": "Advances in Neural Information Processing Systems", "page_first": 3671, "page_last": 3681, "abstract": "The Equalized Odds (for short, EO) is one of the most popular measures of discrimination used in the supervised learning setting. It ascertains fairness through the balance of the misclassification rates (false positive and negative) across the protected groups -- e.g., in the context of law enforcement, an African-American defendant who would not commit a future crime will have an equal opportunity of being released, compared to a non-recidivating Caucasian defendant. Despite this noble goal, it has been acknowledged in the literature that statistical tests based on the EO are oblivious to the underlying causal mechanisms that generated the disparity in the first place (Hardt et al. 2016). This leads to a critical disconnect between statistical measures readable from the data and the meaning of discrimination in the legal system, where compelling evidence that the observed disparity is tied to a specific causal process deemed unfair by society is required to characterize discrimination. The goal of this paper is to develop a principled approach to connect the statistical disparities characterized by the EO and the underlying, elusive, and frequently unobserved, causal mechanisms that generated such inequality. We start by introducing a new family of counterfactual measures that allows one to explain the misclassification disparities in terms of the underlying mechanisms in an arbitrary, non-parametric structural causal model. This will, in turn, allow legal and data analysts to interpret currently deployed classifiers through causal lens, linking the statistical disparities found in the data to the corresponding causal processes. 
Leveraging the new family of counterfactual measures, we develop a learning procedure to construct a classifier that is statistically efficient, interpretable, and compatible with the basic human intuition of fairness. We demonstrate our results through experiments in both real (COMPAS) and synthetic datasets.", "full_text": "Equality of Opportunity in Classi\ufb01cation:\n\nA Causal Approach\n\nJunzhe Zhang\n\nPurdue University, USA\nzhang745@purdue.edu\n\nAbstract\n\nElias Bareinboim\n\nPurdue University, USA\n\neb@purdue.edu\n\nThe Equalized Odds (for short, EO) is one of the most popular measures of dis-\ncrimination used in the supervised learning setting. It ascertains fairness through\nthe balance of the misclassi\ufb01cation rates (false positive and negative) across the\nprotected groups \u2013 e.g., in the context of law enforcement, an African-American\ndefendant who would not commit a future crime will have an equal opportunity of\nbeing released, compared to a non-recidivating Caucasian defendant. Despite this\nnoble goal, it has been acknowledged in the literature that statistical tests based\non the EO are oblivious to the underlying causal mechanisms that generated the\ndisparity in the \ufb01rst place (Hardt et al. 2016). This leads to a critical disconnect\nbetween statistical measures readable from the data and the meaning of discrimina-\ntion in the legal system, where compelling evidence that the observed disparity is\ntied to a speci\ufb01c causal process deemed unfair by society is required to characterize\ndiscrimination. The goal of this paper is to develop a principled approach to con-\nnect the statistical disparities characterized by the EO and the underlying, elusive,\nand frequently unobserved, causal mechanisms that generated such inequality. 
We\nstart by introducing a new family of counterfactual measures that allows one to\nexplain the misclassi\ufb01cation disparities in terms of the underlying mechanisms\nin an arbitrary, non-parametric structural causal model. This will, in turn, allow\nlegal and data analysts to interpret currently deployed classi\ufb01ers through causal\nlens, linking the statistical disparities found in the data to the corresponding causal\nprocesses. Leveraging the new family of counterfactual measures, we develop a\nlearning procedure to construct a classi\ufb01er that is statistically ef\ufb01cient, interpretable,\nand compatible with the basic human intuition of fairness. We demonstrate our\nresults through experiments in both real (COMPAS) and synthetic datasets.\n\nIntroduction\n\n1\nThe goal of supervised learning is to provide a statistical basis upon which individuals with different\ngroup memberships can be reliably classi\ufb01ed. For instance, a bank may want to learn a function from\na set of background factors so as to determine whether a customer will repay her loan; a university\nmay train a classi\ufb01er to predict the future GPA of an applicant to decide whether to accept her into\nthe program. The growing adoption of automated systems based on standard classi\ufb01cation algorithms\nthroughout society (including in law enforcement, education, and \ufb01nance [13, 4, 8, 21, 1]) has raised\nconcerns about potential issues due to unfairness and discrimination.\nA recent high-pro\ufb01le example is a risk assessment tool called COMPAS,\nwhich has been widely used across the US to inform decisions in the criminal\njustice system. Fig. 
1 graphically describes this setting – X represents the race (0 for Caucasian, 1 for African-American) of a defendant and Y stands for the recidivism outcome (0 for no, 1 otherwise), which are mediated by the prior convictions W, and confounded by other demographic information Z (e.g., age, gender) of the defendant.

[Figure 1: COMPAS – causal diagram over the race X, the outcome Y, the prior convictions W, and the demographics Z.]

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

[Figure 2: (a-d) Causal diagrams of classifiers f, f1, f2, f3 in COMPAS, with (a) f(x, z, w), (b) f1(x) = x, (c) f2(w) = w, (d) f3(z) = z. Nodes represent variables, directed arrows stand for functional relationships, and bi-directed arrows for unknown associations.]

The COMPAS tool is a classifier f(x, z, w) (shown in Fig. 2(a)) providing a prediction Ŷ on whether the defendant is expected to commit a future crime. An analysis performed by the news organization ProPublica revealed that the odds of receiving a positive prediction (Ŷ = 1) for defendants who did not recidivate were on average higher among African-Americans than their Caucasian counterparts [1]. 
In words, the error rates of COMPAS disproportionately misclassified African-American defendants.

Many attempts have been made to model discrimination in the classification setting [26, 14, 11, 9, 15]. A recent, noteworthy framework comes under the rubric of Equalized Odds [7] (also referred to as Error Rate Balance [5]), which constrains the classification algorithm such that its disparate error rate ER_x0,x1(ŷ|y) = P(ŷ|x1, y) − P(ŷ|x0, y) is equalized (and equal to 0) across different demographics x0, x1, i.e., the odds of misclassification do not disproportionately affect any population sub-group. In the COMPAS example, the condition ER_x0,x1(Ŷ = 1|Y = 0) = 0 implies that an African-American defendant who does not commit a future crime will have an equal opportunity of getting released, compared to non-recidivating Caucasian defendants. This notion of fairness is natural in many learning settings and, indeed, has been implemented in a number of algorithms [7, 6, 25, 23].

Unfortunately, the framework of equalized odds is not without its problems. To witness, consider a binary instance of Fig. 1 where the values of X and Z are determined such that x = z and W is decided by the function w ← x. We are concerned with the ER disparity induced by different classifiers f1, f2, f3 (Fig. 2(b-d)), where, for instance, ŷ ← f1(x) = x (i.e., f1 takes only X as input, and ignores the other features). Remarkably, a simple analysis shows that ER_x0,x1(Ŷ = 1|Y = 0) is the same (and equal to 1) in all three classifiers, despite their fundamentally different mechanisms associating X and Ŷ. 
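This claim can be checked directly by simulation. The sketch below is our own illustration (not from the paper): it samples the binary instance just described, where z = x and w ← x, and computes ER_x0,x1(Ŷ = 1|Y = 0) for f1, f2, f3. The outcome model for Y is an arbitrary assumption here, since any choice leaves the conclusion unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Binary instance of Fig. 1: Z copies X (x = z), W is decided by w <- x,
# and the true outcome Y follows an arbitrary (assumed) model.
X = rng.integers(0, 2, n)
Z = X.copy()
W = X.copy()
Y = (rng.random(n) < 0.3).astype(int)

def error_rate(Y_hat, y=0):
    """ER_x0,x1(Yhat = 1 | Y = y) = P(Yhat=1 | x1, y) - P(Yhat=1 | x0, y)."""
    m1, m0 = (X == 1) & (Y == y), (X == 0) & (Y == y)
    return Y_hat[m1].mean() - Y_hat[m0].mean()

f1, f2, f3 = X, W, Z      # f1(x) = x, f2(w) = w, f3(z) = z
print([error_rate(f) for f in (f1, f2, f3)])   # all three equal 1.0
```

All three classifiers exhibit the same disparity even though f2 acts only through the mediator W and f3 only through the spurious association with Z.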
Note that f1, f2, f3 correspond to the direct path X → Ŷ, the indirect path X → W → Ŷ, and the remaining spurious (non-causal) paths (e.g., X ↔ Z → Ŷ), respectively.

This observation is not entirely new, and is part of a pattern noted by [7] – statistical tests based on the disparate ER are oblivious to the underlying causal mechanisms that generated the data. This realization has dramatic implications for the applicability of supervised learning in the real world since it seems to suggest that commonsense notions of discrimination, for example, the unequalized false positive rate caused by direct discrimination (X → Ŷ), cannot be formally articulated, measured from data, and, therefore, controlled. More importantly, the legal frameworks of anti-discrimination laws in the US (e.g., Title VII) require that to establish a prima facie case of discrimination, the plaintiff must demonstrate "a strong causal connection" between the alleged discriminatory practice and the observed statistical disparity, otherwise the case will be dismissed (Texas Dept. of Housing and Community Affairs v. Inclusive Communities Project, Inc., 576 U.S. __ (2015)). Without a robust causal basis, evidence of a disparate ER on its own is not sufficient to lead to any legal liability.

More recently, the use of causal reasoning to help open the black-box of decision-making systems has attracted considerable interest in the community, leading to fine-grained explanations of observed statistical biases [11, 10, 25, 9]. One of the main tasks of causal inference is to explain "how nature works," or, more technically, to decompose a composite statistical measure (e.g., the total variation TV_x0,x1(ŷ) = P(ŷ|x1) − P(ŷ|x0)) into its most elementary and interpretable components [24, 17, 29]. 
In particular, [28] introduced the causal explanation formula, which allows fairness analysts to decompose TV into detailed counterfactual measures describing the effects along the direct, indirect, and spurious paths from X to Ŷ. While [28] explains how the statistical inequality in the observed outcome is brought about, it is unclear how to apply such insight to correct the problematic behaviors of an alleged, discriminatory policy. Furthermore, the explanation formula allows the decomposition of marginal measures such as TV, but it is unable to explain disparities represented by conditional ones, such as the ER (e.g., among non-recidivating African-American defendants).

This paper aims to overcome these challenges. We develop a causal framework to link the disparities realized through the ER and the (unobserved) causal mechanisms by which the protected attribute X affects change in the prediction Ŷ. Specifically, (1) we introduce a family of counterfactual measures capable of describing the ER in terms of the direct, indirect, and spurious paths from X to Ŷ in an arbitrary structural causal model (Defs. 1-3), and we prove different qualitative and quantitative properties of these measures (Thms. 1-2); (2) we derive adjustment-like formulas to estimate the counterfactual ERs from observational data (Thms. 3-4), which are accompanied by an efficient algorithm (Alg. 1, Thm. 5) to find the corresponding admissible sets; (3) we operationalize the proposed counterfactual estimands through a novel procedure to learn a fair classifier subject to constraints over the effects along the underlying causal mechanisms (Algs. 2-3, Thm. 6).

2 Preliminaries and Notations

We use capital letters to denote variables (X) and small letters for their values (x). We use the abbreviation P(x) to represent the probability P(X = x). 
For arbitrary sets A and B, let A\B denote the set difference {x : x ∈ A and x ∉ B}, and let |A| be the dimension of the set A.

The basic semantical framework of our analysis rests on structural causal models (SCM) [16, Ch. 7]. An SCM is a tuple ⟨M, P(u)⟩, where M consists of a set of endogenous (observed) variables V and exogenous (unobserved) variables U. The values of each Vi ∈ V are determined by a structural function fVi taking as arguments a combination of other endogenous and exogenous variables (i.e., Vi ← fVi(PAi, Ui), PAi ⊆ V, Ui ⊆ U). Values of U are drawn from the distribution P(u). Each SCM is associated with a directed acyclic graph (DAG) G = ⟨V, E⟩, termed a causal diagram, where nodes V represent endogenous variables and directed edges E stand for functional relations (e.g., see Fig. 1). By convention, U are not explicitly shown; a bi-directed arrow between Vi and Vj indicates the presence of an unobserved confounder (UC) Uk affecting both Vi and Vj, i.e., Vi ← Uk → Vj.

A path is a sequence of edges where each pair of adjacent edges in the sequence share a node. We use d-separation and blocking interchangeably, following the convention in [16]. A path from a node X to a node Ŷ that consists exclusively of directed arrows pointing away from X is called causal; all the other non-causal paths are called spurious. The causal paths can be further categorized into the direct path X → Ŷ and the indirect paths, e.g., X → W → Ŷ of Fig. 2(a). Let (X → Ŷ)_G, (X →_i Ŷ)_G and (X ↔_s Ŷ)_G denote, respectively, the direct, indirect, and spurious paths between X and Ŷ in a DAG G. A descendant of X is any node which X has a causal path to (including X itself). 
The descendant set of a set X is all descendants of any node in X, which we denote by De(X)_G.

An intervention on a set of variables X ⊆ V, denoted by do(x), is an operation where the values of X are set to constants x, regardless of how they were ordinarily determined (through the functions fX). We denote by ⟨Mx, P(u)⟩ the sub-model of an SCM ⟨M, P(u)⟩ induced by do(x). The potential response of Ŷ to the intervention do(x), denoted by Ŷx(u), is the solution of Ŷ with U = u in the sub-model Mx; it can be read as the counterfactual sentence "the value that Ŷ would have obtained in situation U = u, had X been x." Statistically, averaging over U's distribution P(u) leads to the counterfactual variable Ŷx. For a more detailed discussion of SCMs, please refer to [16, 2].

3 Counterfactual Analysis of Unequalized Classification Errors

In this section, we investigate the unequalized odds of misclassification observed in COMPAS by devising three simple thought experiments. These experiments can be generalized into a set of novel counterfactual measures, providing a fine-grained explanation of how the ER disparity of a classifier f(p̂a) is brought about. Throughout our analysis, we will let X be the protected attribute, Ŷ be the prediction, and Y be the true outcome; P̂A is a set of (possible) input features of the predictor Ŷ. We will denote by the value x1 the disadvantaged group and by x0 the advantaged group. Given the space constraints, all proofs are included in the full technical report [27, Appendix A].

We consider first the impact of the direct discrimination (i.e., the direct path X → Ŷ) on the ER disparity observed in COMPAS. We will devise a thought experiment concerning a Caucasian defendant who does not recidivate (i.e., x0, y). 
Imagine a hypothetical situation where this defendant were a non-recidivating African-American (x1, y), while keeping the prior convictions W and other demographic information Z fixed at the level that the defendant x0, y currently has. We then measure the prediction Ŷ in this imagined world (counterfactually), compared to what the defendant currently receives from COMPAS (factually). If the prediction were different in these two situations, e.g., Ŷ changes from 0 to 1, we could then say the path X → Ŷ is active, i.e., the direct discrimination against African-American defendants exists.

[Figure 3: Graphical representation of the counterfactual direct ER in COMPAS: (a) P(ŷ_{x1,y,W_{x0,y},Z} | x0, y) minus (b) P(ŷ | x0, y).]

Figs. 3(a-b) represent this thought experiment graphically. Fig. 3(b) shows the conditional SCM ⟨M, P(u|x0, y)⟩ of the non-recidivating Caucasian defendant (x0, y): variables X, Z, W are correlated by conditioning on the collider Y [16, pp. 339]; we omit the true outcome Y for simplicity. Using this model as the baseline (i.e., what factually happened in reality), we change in Fig. 3(a) the input of X to the direct path X → Ŷ to x1 (edges in G represent functional relations), while keeping the value of X to the other variables (W, Z) fixed at the baseline level x0, y. In this reality, the variable Z_{x0,y} = Z since Z is a non-descendant node of X and Y [16, pp. 232]; the intervention on Y is omitted since Y does not directly affect the prediction Ŷ. Since the direct path X → Ŷ is the only difference between the models of Figs. 3(a-b), the change in Ŷ thus measures the influence of X → Ŷ. Indeed, this hypothetical procedure can be generalized, applicable to any classifier in an arbitrary SCM, which we summarize as follows.

Definition 1 (Counterfactual Direct Error Rate). 
Given an SCM ⟨M, P(u)⟩ and a classifier f(p̂a), the counterfactual direct error rate for a sub-population x, y (with prediction ŷ ≠ y) is defined as:

ER^d_x0,x1(ŷ|x, y) = P(ŷ_{x1,y,(P̂A\X)_{x0,y}} | x, y) − P(ŷ_{x0,y} | x, y).    (1)

In Eq. 1, Ŷ_{x1,y,(P̂A\X)_{x0,y}} can be further simplified as Ŷ_{x1,(P̂A\X)_{x0,y}} since Y is not an input of f(p̂a). The subscript (P̂A\X)_{x0,y} is the solution of the input features (besides X), (P̂A\X)(u), in the sub-model M_{x0,y}; values of U are drawn from the distribution P(u) such that X(u) = x, Y(u) = y. The query of Eq. 1 can be read as: "For an individual with the protected attribute X = x and the true outcome Y = y, how would the prediction Ŷ change had X been x1, while keeping all the other features P̂A\X at the level that they would attain had X = x0 and Y = y, compared to the prediction Ŷ she/he would receive had X been x0 and Y been y?"

Similarly, we can devise a thought experiment to measure the effect of the indirect discrimination, mediated by the prior convictions W, i.e., the indirect path X → W → Ŷ. Consider again the non-recidivating Caucasian defendant x0, y. We conceive a scenario where the prior convictions W of the defendant x0, y change to the level that they would have achieved had the defendant been a non-recidivating African-American x1, y, while keeping the other features X, Z fixed at the level that they currently are. Fig. 4(a) describes this hypothetical scenario: we change only the input value of the edge X → W to x1, while keeping all the other paths untouched (at the baseline). We then measure the prediction Ŷ in both the counterfactual (Fig. 4(a)) and factual (Fig. 4(b)) worlds and compare their differences. 
The change in the prediction between these models thus represents the influence of the indirect path X → W → Ŷ. We generalize this thought experiment and provide an estimand of the indirect paths for any SCM and classifier f, namely:

Definition 2 (Counterfactual Indirect Error Rate). Given an SCM ⟨M, P(u)⟩ and a classifier f(p̂a), the counterfactual indirect error rate for a sub-population x, y (with prediction ŷ ≠ y) is defined as:

ER^i_x0,x1(ŷ|x, y) = P(ŷ_{x0,y,(P̂A\X)_{x1,y}} | x, y) − P(ŷ_{x0,y} | x, y).    (2)

[Figure 4: Graphical representations of the counterfactual indirect ER in COMPAS: (a) P(ŷ_{x0,y,W_{x1,y},Z} | x0, y) minus (b) P(ŷ | x0, y).]

Finally, we introduce a hypothetical procedure measuring the influence of the spurious relations between the protected attribute X and the prediction Ŷ through the population attributes that are non-descendants of both X and Ŷ, e.g., the path X ↔ Z → Ŷ in Fig. 2(a). We consider a Caucasian x0, y and an African-American x1, y defendant who both would not recidivate.

[Figure 5: Graphical representations of the counterfactual spurious ER in COMPAS: (a) P(ŷ_{x0,y} | x1, y) minus (b) P(ŷ_{x0,y} | x0, y).]

We measure the prediction Ŷ these defendants would receive had they both been 
non-recidivating Caucasians (x0, y). Figs. 5(a-b) describe this experimental setup. Since the causal influences of X (on Ŷ) are fixed at x0 in both models, the difference in Ŷ must be due to the population characteristics that are not affected by X, i.e., the spurious X − Ŷ relationships.

Definition 3 (Counterfactual Spurious Error Rate). Given an SCM ⟨M, P(u)⟩ and a classifier f(p̂a), the counterfactual spurious error rate for a sub-population x, y (with prediction ŷ ≠ y) is defined as:

ER^s_x0,x1(ŷ|y) = P(ŷ_{x0,y} | x1, y) − P(ŷ_{x0,y} | x0, y).    (3)

Def. 3 generalizes the thought experiment described above to an arbitrary SCM. In the above equation, the distribution P(ŷ_{x0,y} | x0, y) coincides with P(ŷ | x0, y) since the variable Ŷ_{x0,y} = Ŷ given that X = x0, Y = y (the composition axiom [16, Ch. 7.3]). Eq. 3 can be read as the counterfactual sentence: "For two demographics x0, x1 with the same true outcome Y = y, how would the prediction Ŷ differ had they both been x0, y?"

3.1 Properties of Counterfactual Error Rates

Theorem 1. Given an SCM ⟨M, P(u)⟩ and a classifier f(p̂a), for any x0, x1, x, ŷ, y, the counterfactual ERs of Defs. 1-3 obey the following properties: (1) (X ↛ Ŷ)_G|Y ⇒ ER^d_x0,x1(ŷ|x, y) = 0; (2) |(X →_i Ŷ)_G|Y| = 0 ⇒ ER^i_x0,x1(ŷ|x, y) = 0; (3) |(X ↔_s Ŷ)_G|Y| = 0 ⇒ ER^s_x0,x1(ŷ|y) = 0, where G|Y is the causal diagram of the conditional SCM ⟨My, P(u|y)⟩.

The conditional causal diagram G|Y is obtained from the original model G by (1) removing the node Y and (2) adding bi-directed arrows between nodes whose associated exogenous variables are correlated in P(u|y)¹ (e.g., Fig. 3(b)). Thm. 1 says that Defs. 1-3 provide prima facie evidence for discrimination detection. For instance, ER^d_x0,x1(ŷ|x, y) ≠ 0 implies that the path X → Ŷ is active, i.e., the direct discrimination exists. 
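To make Defs. 1-3 concrete, the following Monte Carlo sketch evaluates the three counterfactual ERs on an illustrative SCM of our own choosing (not the COMPAS ground truth): each unit u is sampled once, the relevant sub-models are solved for that unit, and the contrasts are averaged over the conditioned sub-population. In this model Y is a sink, so intervening on Y in the subscripts reduces to conditioning on it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Assumed SCM (illustrative): Z <- uz; X <- Z xor ux; W <- X xor uw;
# Y <- (W or Z) xor uy; the classifier is f(x, w, z) = 1{x + w + z >= 2}.
uz = rng.random(n) < 0.5
ux = rng.random(n) < 0.3
uw = rng.random(n) < 0.2
uy = rng.random(n) < 0.1

Z = uz
X = Z ^ ux
def W_at(x):               # structural function f_W evaluated under do(X = x)
    return x ^ uw
Y = (W_at(X) | Z) ^ uy

def f(x, w, z):
    return ((x.astype(int) + w + z) >= 2).astype(int)

x0, x1, y = False, True, False
cond = (X == x0) & (Y == y)                 # units with X(u) = x0, Y(u) = y
all_x0, all_x1 = np.full(n, x0), np.full(n, x1)

base = f(all_x0, W_at(all_x0), Z)           # yhat_{x0,y}(u)
# Def. 1: flip only the direct input of X to Yhat; W, Z stay at the x0 level
ER_d = f(all_x1, W_at(all_x0), Z)[cond].mean() - base[cond].mean()
# Def. 2: flip only the mediated input, W at the level it attains under x1
ER_i = f(all_x0, W_at(all_x1), Z)[cond].mean() - base[cond].mean()
# Def. 3: same counterfactual yhat_{x0,y}, contrasted across the two groups
cond1 = (X == x1) & (Y == y)
ER_s = base[cond1].mean() - base[cond].mean()
print(ER_d, ER_i, ER_s)                     # all strictly positive here
```

All three pathways are active in this model, so all three measures come out non-zero, consistent with the contrapositive reading of Thm. 1.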
It is expected that the proposed counterfactual measures capture the relative strength of the different active pathways connecting the nodes X and Ŷ in the underlying SCM. We now derive how the counterfactual ERs are quantitatively related to the unequalized odds of misclassification induced by an arbitrary classifier.

Theorem 2 (Causal Explanation Formula of Equalized Odds). For any x0, x1, ŷ, y, the quantities ER_x0,x1(ŷ|y), ER^d_x0,x1(ŷ|x, y), ER^i_x0,x1(ŷ|x, y) and ER^s_x0,x1(ŷ|y) obey the following non-parametric relationship:

ER_x0,x1(ŷ|y) = ER^d_x0,x1(ŷ|x0, y) − ER^i_x1,x0(ŷ|x0, y) − ER^s_x1,x0(ŷ|y).    (4)

Thm. 2 guarantees that the disparate ER with the transition from x0 to x1 is equal to the counterfactual direct ER with this transition, minus the indirect and spurious ERs with the reverse transition, from x1 to x0, on the sub-population x0, y. Together with Thm. 1, each decomposing term in Eq. 4 thus estimates the adverse impact of its corresponding discriminatory mechanism on the total ER disparity. For instance, in COMPAS, ER^d_x0,x1(ŷ1|x0, y) explains how much the direct racial discrimination accounts for the unequalized false positive rate ER_x0,x1(ŷ1|y0) between non-recidivating African-American (x1, y) and Caucasian (x0, y) defendants. Perhaps surprisingly, this result holds non-parametrically, which means that the counterfactual ERs decompose following Thm. 2 for any functional form of the classifier and of the underlying causal model where the dataset was generated. Owing to their generality and ubiquity, we refer to this equation as the "Causal Explanation Formula" for the disparate ER in classification tasks.

Connections with Other Counterfactual Measures. Defs. 
1-3 can be seen as a generalization of the marginal counterfactual measures, including the counterfactual effects introduced in [28] and the natural effects in [17, 11, 15]. Unable to consider the additional evidence (in classification, the true outcome Y = y), the fairness analysis framework based on these marginal measures fails to provide a fine-grained quantitative explanation of the ER disparity (as in Thm. 2). The counterfactual fairness of [10] is another counterfactual measure. As noted in [28], however, it considers only the effects along the causal paths from the protected attribute X to the outcome Ŷ, and is thus unable to provide a full account of the X − Ŷ associations, including the spurious relations. We provide in Appendix B [27] a more detailed discussion about the relationships between our measures and the existing ones.

¹G|Y explicitly represents the change of information flow due to conditioning on the true outcome Y: the information via arrows pointing away from Y is intercepted; measuring the collider Y makes its (marginally independent) common causes dependent, also known as the "explaining away" effect [16, pp. 339].

4 Estimating Counterfactual Error Rates

The Explanation Formula provides the precise relation between the counterfactual ERs, but it does not specify how they should be estimated from data. When the underlying SCM is provided, the counterfactual direct, indirect and spurious ERs (Defs. 1-3) are all well-defined and computable via the three-step algorithm of "predictions, interventions and counterfactuals" described in [16, Ch. 7.1]. However, the SCMs are not fully known in many applications, and one must estimate the proposed counterfactual measures from the passively-collected (observational) data. 
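When the SCM is fully specified, the counterfactual computation can indeed be run directly, and the Explanation Formula (Eq. 4) can be checked numerically. The self-contained sketch below uses an illustrative SCM and classifier of our own choosing; the decomposition holds up to floating-point error, as Thm. 2 promises non-parametrically.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Illustrative SCM: Z <- uz; X <- Z xor ux; W <- X xor uw; Y <- (W or Z) xor uy.
uz, ux = rng.random(n) < 0.5, rng.random(n) < 0.3
uw, uy = rng.random(n) < 0.2, rng.random(n) < 0.1
Z = uz
X = Z ^ ux
W_at = lambda x: x ^ uw                      # f_W under do(X = x)
Y = (W_at(X) | Z) ^ uy

f = lambda x, w, z: ((x.astype(int) + w + z) >= 2).astype(int)

x0, x1, y = False, True, False
c0, c1 = (X == x0) & (Y == y), (X == x1) & (Y == y)
full = lambda v: np.full(n, v)

yh_x0 = f(full(x0), W_at(full(x0)), Z)       # yhat_{x0,y}(u)
yh_x1 = f(full(x1), W_at(full(x1)), Z)       # yhat_{x1,y}(u)
yh_d  = f(full(x1), W_at(full(x0)), Z)       # yhat_{x1,(PA\X)_{x0,y}}(u)

ER   = yh_x1[c1].mean() - yh_x0[c0].mean()   # ER_x0,x1(yhat|y)
ER_d = yh_d[c0].mean()  - yh_x0[c0].mean()   # ERd_x0,x1(yhat|x0, y)
ER_i = yh_d[c0].mean()  - yh_x1[c0].mean()   # ERi_x1,x0(yhat|x0, y)
ER_s = yh_x1[c0].mean() - yh_x1[c1].mean()   # ERs_x1,x0(yhat|y)
print(ER, ER_d - ER_i - ER_s)                # the two numbers coincide
```

Note that the reverse-transition terms of Eq. 4 appear here: ER^i_x1,x0 contrasts yh_d against yh_x1 on the (x0, y) units, and ER^s_x1,x0 contrasts the same counterfactual yh_x1 across the two groups, so the telescoping sum recovers the observed disparity exactly.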
Let a classifier f(p̂a) be denoted by f(ŵ, ẑ), where Ẑ ⊆ P̂A are non-descendants of both X and Y, and the subset of features Ŵ = P̂A\Ẑ. We first characterize a set of classifiers for which such estimation is still feasible.

Definition 4 (Explanation Criterion). Given a DAG G and a classifier ŷ ← f(ŵ, ẑ), a set of covariates C satisfies the explanation criterion relative to f (called the explaining set) if and only if (1) Ẑ ⊆ C; (2) C ∩ Forb({X, Y}, Ŵ\X) = ∅, where Forb({X, Y}, Ŵ\X) is the set of descendants Wi ∈ De(W)_G for some W ∉ {X, Y} on a proper causal path² from {X, Y} to Ŵ\X in G; and (3) all spurious paths from {X, Y} to Ŵ\X in G are blocked by C. A classifier f is counterfactually explainable (ctf-explainable) if and only if it has an explaining set C satisfying Conditions 1-3.

Consider again the COMPAS model of Fig. 1. The classifier f(x, w, z) has input features Ŵ = {X, W} and Ẑ = {Z}. The set C = {Z} does not satisfy the explanation criterion relative to f since it does not block the spurious path Y ← W. Indeed, one could show that there exists no set C satisfying Def. 4 relative to f, i.e., f(x, w, z) is not ctf-explainable. However, if we remove the prior convictions W from the feature set, the new classifier f(x, z) is ctf-explainable with C = {Z}: Ẑ = C = {Z} satisfies Condition 1; Conditions 2-3 follow immediately since Ŵ\X = ∅.

Def. 4 constitutes a sufficient condition under which the counterfactual ERs could, at least in principle, be estimated from the observational data. This yields identification formulas as shown next:

Theorem 3. Given a causal diagram G and a classifier f(ŵ, ẑ), if f is ctf-explainable (Def. 4) with an explaining set C, then ER^d_x0,x1(ŷ|x, y), ER^i_x0,x1(ŷ|x, y) and ER^s_x0,x1(ŷ|y) can be estimated as follows:

ER^d_x0,x1(ŷ|x, y) = Σ_{ŵ,c} (P(ŷ_{x1,ŵ\x,ẑ}) − P(ŷ_{x0,ŵ\x,ẑ})) P(ŵ\x | x0, c, y) P(c | x, y),    (5)

ER^i_x0,x1(ŷ|x, y) = Σ_{ŵ,c} P(ŷ_{x1,ŵ\x,ẑ}) (P(ŵ\x | x1, c, y) − P(ŵ\x | x0, c, y)) P(c | x, y),    (6)

ER^s_x0,x1(ŷ|y) = Σ_{ŵ,c} P(ŷ_{x1,ŵ\x,ẑ}) P(ŵ\x | x1, c, y) (P(c | x1, y) − P(c | x0, y)),    (7)

where P(ŷ_{ŵ,ẑ}) is well-defined and computable from the classifier f(ŵ, ẑ)³.

In Eqs. 5-7, the conditional distributions P(c|x, y) and P(ŵ\x | x0, c, y) do not involve any counterfactual variable, which means that they are readily estimable by any method from the observational data (e.g., through deep nets). Continuing with the COMPAS example, we could thus estimate the counterfactual ERs of f(x, z) from the distribution P(x, y, z, w) using Thm. 3 with C = {Z}.

Inverse Propensity Weighting Estimators. Eqs. 5-7 involve summing over all possible values of Ŵ, C, which may present computational and sample complexity challenges as the cardinalities of Ŵ, C grow very rapidly. There exist robust statistical estimation techniques, known as inverse propensity weighting (IPW) [12, 18], to circumvent such issues. 
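The IPW estimator for the counterfactual direct ER proposed next (Eq. 8) can be sketched in a few lines. The simulation below is our own illustration: empirical frequencies stand in for the parametric model of P̂(x|c, y), the data-generating SCM is assumed, and the classifier f(x, z) is ctf-explainable with explaining set C = {Z} as in the running example; the IPW estimate is compared against the direct sub-population estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000

# Illustrative observational data: Z and X share a hidden confounder uc,
# and Y depends on X, Z.  Here What\X is empty, so f(x, z) is
# ctf-explainable with explaining set C = {Z}.
uc = rng.random(n) < 0.5
Z = (uc ^ (rng.random(n) < 0.1)).astype(int)
X = (uc ^ (rng.random(n) < 0.3)).astype(int)
Y = ((X & Z) ^ (rng.random(n) < 0.2)).astype(int)

def f(x, z):                              # a deterministic classifier f(x, z)
    return ((x + z) >= 1).astype(int)

x0, x1, y = 0, 1, 0
# P(yhat_{x1,z}) - P(yhat_{x0,z}) per sample: indicators of a deterministic f
delta = f(np.ones(n, int), Z) - f(np.zeros(n, int), Z)

def p_x_given_cy(xv, c):                  # empirical Phat(x | C = c, y)
    sel = (Z == c) & (Y == y)
    return (X[sel] == xv).mean()

# Eq. 8 with x = x1: reweight the (x0, y) samples so their C-distribution
# matches the target sub-population (x1, y), then divide by Phat(x1, y).
ratio = np.array([p_x_given_cy(x1, c) / p_x_given_cy(x0, c) for c in (0, 1)])
w = np.where((X == x0) & (Y == y), ratio[Z], 0.0)
er_ipw = (delta * w).mean() / ((X == x1) & (Y == y)).mean()

# Sanity check: the direct estimate on the (x1, y) sub-population
er_direct = delta[(X == x1) & (Y == y)].mean()
print(er_ipw, er_direct)                  # close to each other
```

Because Ŵ\X is empty here, the mediator term drops out and the weight reduces to the propensity ratio P̂(x1|c, y)/P̂(x0|c, y); with larger feature sets the same construction applies with the full weight of Eq. 8.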
Given the observed data D = {Yi, Ŵi, Ci}, i = 1, ..., n, we propose the IPW estimator for ER^d_x0,x1(ŷ|x, y) as follows:

ÊR^d_x0,x1(ŷ|x, y) = (1/n) Σ_{i=1}^{n} (P(ŷ_{x1,Ŵi\Xi,Ẑi}) − P(ŷ_{x0,Ŵi\Xi,Ẑi})) P̂(x|Ci, y) I{Xi = x0, Yi = y} / (P̂(x0|Ci, y) P̂(x, y)),    (8)

where I{·} is an indicator function and P̂(x, y) is the sample mean estimator of P(x, y) (X, Y are finite). P̂(x|c, y) is a reliable estimator of the conditional distribution P(x|c, y) and, in practice, can be obtained by assuming some parametric model such as logistic regression.

²A causal path from {X, Y} to Ŵ\X is proper if it does not intersect {X, Y} except at the end point [20].
³For a deterministic f(ŵ, ẑ), the probabilities P(ŷ_{ŵ,ẑ}) = I{ŷ = f(ŵ, ẑ)}, where I{·} is an indicator function.

Algorithm 1: FindExpSet
Input: Feature set {Ŵ, Ẑ}, DAG G = ⟨V, E⟩
Output: Explaining set C (Def. 4) relative to f(ŵ, ẑ) in G, or ⊥ if f is not ctf-explainable.
1: Apply FindSep [22] to find a set C with Ẑ ⊆ C ⊆ V\Forb({X, Y}, Ŵ\X) such that it d-separates {X, Y} and Ŵ\X in G^pbd_{X,Y},Ŵ\X.
2: return C

Algorithm 2: Causal-SFFS
Input: Samples D = {Yi, Vi}, i = 1, ..., n, a causal diagram G
Output: A family of ctf-explainable classifiers F
Initialization: P̂A_0 = ∅, k = 0.
1: while k < |V| do
2:   Let the subset V̂_k be defined as {vi ∈ V\P̂A_k : FindExpSet(P̂A_k ∪ vi, G) ≠ ⊥}.
3:   Let v_{k+1} = argmax_{vi ∈ V̂_k} J(P̂A_k ∪ {vi}).
4:   Let P̂A_{k+1} = P̂A_k ∪ v_{k+1}; k = k + 1.
5:   Continue with the conditional exclusion of [19, Steps 2-3] and update the counter k.
6: end while
7: return F = {∀f : P̂A_k → Ŷ}

Algorithm 3: Ctf-FairLearning
Input: Samples D, DAG G, εd, εi, εs > 0
Output: A fair classifier f
1: Let F = C-SFFS(D, G).
2: Obtain a fair classifier f from F by solving Eq. 9 subject to |ER^d| ≤ εd, |ER^i| ≤ εi, |ER^s| ≤ εs.

Theorem 4. For a ctf-explainable classifier f(ŵ, ẑ), ÊR^d_x0,x1(ŷ|x, y) (Eq. 8) is a consistent estimator for ER^d_x0,x1(ŷ|x, y) (Eq. 5) if the model for P(x|c, y) is correctly specified.

We provide IPW estimators for the counterfactual indirect and spurious ERs in Appendix A [27].

4.1 Finding the Adjustment Set for Explainable Classifiers

Two natural questions arise here: (1) how to systematically test whether a classifier f is ctf-explainable, and (2) if so, how to find a set C satisfying the explanation criterion so that the counterfactual ERs can be identified. In this section, we develop an efficient method to answer these questions. Given a DAG G, we denote by G^pbd_{X,Y},Ŵ\X the proper backdoor graph obtained from G by removing the first edge of every proper causal path from {X, Y} to Ŵ\X [22]. We next formulate, in graphical terms, a set of identification conditions equivalent to the explanation criterion defined in Def. 4.

Definition 5 (Constructive Explanation Criterion). Given a DAG G and a classifier f(ŵ, ẑ), covariates C satisfy the constructive explanation criterion relative to f if and only if (1) Ẑ ⊆ C ⊆ V\Forb({X, Y}, Ŵ\X), where Forb({X, Y}, Ŵ\X) is the set of nodes forbidden by Def. 4; (2) C d-separates {X, Y} and Ŵ\X in the proper backdoor graph G^pbd_{X,Y},Ŵ\X.

Theorem 5. Given a causal diagram G and a classifier f, covariates C satisfy the explanation criterion (Def. 
4) relative to f if and only if it satisfies the constructive explanation criterion (Def. 5) relative to f.

Thm. 5 allows us to use the algorithmic framework developed by [22] for constructing d-separating sets in DAGs. We summarize this procedure as FindExpSet in Alg. 1. Specifically, the sub-routine FindSep finds a covariate set C with Ẑ ⊆ C ⊆ V\Forb({X, Y}, Ŵ\X) such that C d-separates all paths between {X, Y} and Ŵ\X in Gpbd_{{X,Y}, Ŵ\X}, i.e., the explaining set relative to the classifier f(ŵ, ẑ) (Def. 4). This algorithm runs in O(n + m) time, where n is the number of nodes and m is the number of edges in the proper backdoor graph Gpbd_{{X,Y}, Ŵ\X}.

5 Achieving Equalized Counterfactual Error Rates
So far we have focused on analyzing the unequalized counterfactual ERs of an existing predictor in the environment. A more interesting problem is how to obtain an optimal classifier such that its induced counterfactual ERs along a specific discriminatory mechanism are equalized.

Given finite samples D = {Yi, Vi}ⁿ_{i=1} drawn from P(y, v) (where the protected attribute X ∈ V), the associated causal diagram G, and a set of candidate ctf-explainable classifiers F, the goal of supervised learning is to obtain an optimal classifier f*(p̂a) from F such that a loss function L(D, f) measuring the distance between the prediction Ŷ and the true outcome Y is minimized. We will elaborate later on how to construct the ctf-explainable set F. Among the quantities involved in Thm. 3, the counterfactual distribution P(ŷ_{x, ŵ\x, ẑ}) is defined by the classifier f, and the other conditional distributions (e.g., P(c|x, y)) are estimable from the data D. We can thus represent a counterfactual ER (e.g., direct) of a classifier f ∈ F as a function g(D, f) (e.g., Eq.
8). A fair classifier is obtained by minimizing L(D, f) subject to a box constraint over g(D, f), namely,

$$\min_{f \in \mathcal{F}} L(D, f) \quad \text{s.t.} \quad |g(D, f)| \leq \epsilon, \tag{9}$$

where ε ∈ ℝ⁺ and the smaller ε is, the fairer the learned classifier will be. In general, the constraints |g(D, f)| ≤ ε are non-convex, and solving the problem of Eq. 9 seems to be difficult. However, this optimization problem becomes significantly simpler in certain cases, solvable using standard convex optimization methods [3]. We provide two canonical settings that fit this requirement.

First, we assume that the features V are discrete, and let θ_{ŷ,x,ŵ\x,ẑ} denote the probabilities P(ŷ_{x, ŵ\x, ẑ}). The counterfactual constraints |g(D, f)| ≤ ε are thus reducible to a set of linear inequalities on the parameter space {θ}. Second, consider a classifier making decisions based on a decision boundary Ỹ = θ⊤φ(x, ŵ\x, ẑ) (e.g., logistic regression), where φ(·) is the basis function. The boundary Ỹ acts as a proxy for the prediction Ŷ. For instance, the condition ERd_{x0,x1}(ỹ|x, y) = 0 implies ERd_{x0,x1}(ŷ|x, y) = 0. The same reasoning applies to the counterfactual indirect and spurious ERs. We will employ the techniques in [25] and approximate the constraints |g(D, f)| ≤ ε using the counterfactual ERs of X on the boundary Ỹ. Assume that we are interested in the mean effect and replace the quantities P(ŷ_{x, ŵ\x, ẑ}) in Thm. 3 with θ⊤φ(x, ŵ\x, ẑ). Given the convexity of L(D, f), Eq. 9 is then a convex optimization problem and can thus be efficiently solved using standard methods.

5.1 Constructing Counterfactually Explainable Classifiers
The counterfactual explainability (Def.
4) of a classifier f relies on its input features P̂A: the smaller the set P̂A is, the easier it is to find an explaining set C relative to f(p̂a). In practice, some features contain critical information about the prediction task, which means that their exclusion could lead to poorer performance. This observation suggests a novel feature selection problem in the fairness-aware classification task: we would like to find a subset P̂A of the available features V such that each classifier in the candidate set F = {∀f : P̂A → Ŷ} is ctf-explainable, without significant loss of prediction accuracy.

Our solution builds on the procedure FindExpSet (Alg. 1) and the classic method of Sequential Floating Forward Selection (SFFS) [19]. Let P̂A_k be the set of k features. The score function J(p̂a_k) evaluates the candidate subset P̂A_k and returns a measure of its "goodness". In practice, this score could be obtained by computing statistical measures of dependence, or by evaluating the best in-class predictive accuracy of classifiers in {∀f : P̂A_k → Ŷ} on the validation data. We denote our method by Causal SFFS (C-SFFS) and summarize it in Alg. 2. Starting with a subset P̂A_k, C-SFFS (Steps 2-3) adds the one feature which gives the highest score J. FindExpSet ensures that the resulting subset P̂A_{k+1} induces a ctf-explainable classifier f(p̂a_{k+1}). Step 5 repeatedly removes the least significant feature v_d from the newly-formed P̂A_k until no feature can be excluded to improve the score J. During the exclusion phase, we do not apply FindExpSet, since removing features from a ctf-explainable classifier does not violate the explanation criterion (Def. 4). It follows immediately from the soundness of FindExpSet that C-SFFS always returns a ctf-explainable set F.

Theorem 6.
For F = C-SFFS(D, G), each classifier f ∈ F is ctf-explainable.

We summarize in Alg. 3 the procedure for training an optimal classifier satisfying the fairness constraints over the counterfactual ERs. ERd, ERi, and ERs stand for the counterfactual quantities ERd_{x0,x1}(ŷ|x0, y), ERi_{x1,x0}(ŷ|x0, y), and ERs_{x1,x0}(ŷ|y), respectively. We use C-SFFS (Alg. 2) to obtain a candidate set F such that each f ∈ F is ctf-explainable. The fair classifier is computed by solving the optimization problem in Eq. 9 subject to the box constraints over ERd, ERi, and ERs.

6 Simulations and Experiments
In this section, we illustrate our approach on both synthetic and real datasets. We focus on the false positive rate ERx0,x1(ŷ1|y0) across demographics x0 = 0, x1 = 1, where ŷ1 = 1, y0 = 0, and the corresponding components ERd_{x0,x1}(ŷ1|x0, y0), ERi_{x1,x0}(ŷ1|x0, y0), and ERs_{x1,x0}(ŷ1|y0) (following Thm. 2). We shorten the notation and write ERx0,x1(ŷ1|y0) = ER, and similarly ERd, ERi, and ERs. Details of the experiments are provided in Appendix C [27].

Figure 6: Standard fairness prediction model (a causal diagram over X, Z, W, Y, and the common descendant D).

Figure 7: Results of Experiments 1-2. Measures that are not estimable via the explanation criterion are shaded and highlighted. ER stands for the false positive rate ERx0,x1(ŷ1|y0); ERd, ERi, and ERs represent the corresponding counterfactual direct, indirect, and spurious ERs (Thm. 2). Classifiers fopt, fer, and fctf in Exp. 1 correspond to, respectively, blue, orange, and yellow in Fig. (a); fopt, fer, fopt-, fer-, and fctf- in Exp. 2 correspond to blue, orange, yellow, purple, and green in Fig. (b).

Experiment 1: Standard Prediction Model  We consider a generalized COMPAS model containing the common descendant D, shown in Fig. 6, which we call here the standard fairness prediction model (for short, standard prediction model). We train two classifiers with the same feature set {X, W, Z, D}: the first is obtained via standard, unconstrained optimization, which we call fopt, and the second constrains the disparate ER to half of that of fopt, which we call fer. We further compute the counterfactual ERs (Defs. 1-3). The results are shown in Fig. 7(a). We first confirm that the procedure fer is sound in the sense that fer (90.4%) achieves a predictive accuracy comparable to fopt (90.4%) while reducing the disparate ER in half (ERer = -0.238, ERopt = -0.476). Second, ERd is larger in fer (ERd_er = 0.620) than in the unconstrained fopt (ERd_opt = 0.381). This materializes the concern acknowledged in [7], namely, that optimizing based on ER may not be enforcing any type of real-life fairness notion related to the underlying causal mechanism. To circumvent this issue, we train a classifier with the same feature set such that its counterfactual ERs are reduced to half of those of the unconstrained fopt, called fctf. The results (Fig. 7(a)) support the counterfactual approach: fctf (90.1%) reports an ER comparable to fer (ERctf = -0.238), but significantly smaller direct, indirect, and spurious ER disparities (ERd_ctf = 0.191, ERi_ctf = -0.194, ERs_ctf = -0.236).

Experiment 2: COMPAS  In the COMPAS model of Fig. 1, we are interested in predicting whether a defendant would recidivate, while avoiding direct discrimination (the threshold ε = 0.01). We compute a classifier fer with feature set {X, Z, W} subject to |ERer| ≤ ε. We also include an unconstrained classifier fopt as the baseline. The results (Fig.
7(b)) reveal that fer (73.7%) and fopt (74.6%) are comparable in prediction accuracy, while fer has a much smaller disparate ER (ERer = -0.005, ERopt = -0.077). Given that the underlying causal model is not fully known, we can only estimate the counterfactual direct ER from passively-collected samples. Since classifiers with feature set {X, W, Z} are not ctf-explainable in the COMPAS model (Def. 4), ERd of fer and fopt cannot be identified via Thm. 3. The previous analysis (Experiment 1) implies that ERd could be significant even when ER is small, which suggests one should be wary of the direct discrimination of fer and fopt. To overcome this issue, we remove W from the feature set and obtain fopt- and fer- following a similar procedure. We estimate their ERd via Thm. 3 with covariates C = {Z}. The results show that the direct discrimination is significant in both fer- and fopt- (ERd_er- = 0.015, ERd_opt- = -0.066). To remove the direct discrimination, we train a classifier fctf- following Alg. 3 with the features {X, Z} and εd = ε. The results support the efficacy of Alg. 3: fctf- performs slightly worse in prediction accuracy (72.1%) but ascertains no direct discrimination (ERd_ctf- = -0.001).

7 Conclusions
We introduced a new family of counterfactual measures capable of explaining disparities in the misclassification rates (false positive and false negative) across different demographics in terms of the causal mechanisms underlying the specific prediction process. We then developed machinery based on these measures to allow data scientists (1) to diagnose whether a classifier is operating in a discriminatory fashion against specific groups, and (2) to learn a new classifier subject to fairness constraints in terms of fine-grained misclassification rates.
In practice, this approach constitutes a formal solution to the notorious lack of interpretability of the equalized odds. We hope the causal machinery put forward here will help data scientists to analyze already deployed systems as well as to construct new classifiers that are fair even when the training data comes from an unfair world.

Acknowledgments
This research is supported in part by grants from IBM Research, Adobe Research, NSF IIS-1704352, and IIS-1750807 (CAREER).

References
[1] J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica, 23, 2016.
[2] E. Bareinboim and J. Pearl. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113:7345-7352, 2016.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] T. Brennan, W. Dieterich, and B. Ehret. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior, 36(1):21-40, 2009.
[5] A. Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153-163, 2017.
[6] G. Goh, A. Cotter, M. Gupta, and M. P. Friedlander. Satisfying real-world goals with dataset constraints. In Advances in Neural Information Processing Systems, pages 2415-2423, 2016.
[7] M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315-3323, 2016.
[8] A. E. Khandani, A. J. Kim, and A. W. Lo. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767-2787, 2010.
[9] N. Kilbertus, M. R. Carulla, G. Parascandolo, M. Hardt, D. Janzing, and B. Schölkopf.
Avoiding discrimination through causal reasoning. In Advances in Neural Information Processing Systems, pages 656-666, 2017.
[10] M. J. Kusner, J. Loftus, C. Russell, and R. Silva. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4069-4079, 2017.
[11] L. Zhang, Y. Wu, and X. Wu. A causal framework for discovering and removing direct and indirect discrimination. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 3929-3935, 2017.
[12] J. K. Lunceford and M. Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19):2937-2960, 2004.
[13] J. F. Mahoney and J. M. Mohen. Method and system for loan origination and underwriting, Oct. 23 2007. US Patent 7,287,008.
[14] K. Mancuhan and C. Clifton. Combating discrimination using Bayesian networks. Artificial Intelligence and Law, 22(2):211-238, Jun 2014.
[15] R. Nabi and I. Shpitser. Fair inference on outcomes. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[16] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000. 2nd edition, 2009.
[17] J. Pearl. Direct and indirect effects. In Proc. of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 411-420. Morgan Kaufmann, CA, 2001.
[18] J. Pearl, M. Glymour, and N. P. Jewell. Causal Inference in Statistics: A Primer. John Wiley & Sons, 2016.
[19] P. Pudil, J. Novovičová, and J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15(11):1119-1125, 1994.
[20] I. Shpitser, T. VanderWeele, and J. Robins. On the validity of covariate adjustment for estimating causal effects.
In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, pages 527-536. AUAI, Corvallis, OR, 2010.
[21] L. Sweeney. Discrimination in online ad delivery. Queue, 11(3):10, 2013.
[22] B. van der Zander, M. Liśkiewicz, and J. Textor. Constructing separators and adjustment sets in ancestral graphs. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence. AUAI, 2014.
[23] B. Woodworth, S. Gunasekar, M. I. Ohannessian, and N. Srebro. Learning non-discriminatory predictors. In Conference on Learning Theory, pages 1920-1953, 2017.
[24] S. Wright. The method of path coefficients. The Annals of Mathematical Statistics, 5(3):161-215, 1934.
[25] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171-1180. International World Wide Web Conferences Steering Committee, 2017.
[26] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Artificial Intelligence and Statistics, pages 962-970, 2017.
[27] J. Zhang and E. Bareinboim. Equality of opportunity in classification: A causal approach. Technical Report R-37, AI Lab, Purdue University, 2018.
[28] J. Zhang and E. Bareinboim. Fairness in decision-making – the causal explanation formula. In Proceedings of AAAI Conference on Artificial Intelligence, pages 2037-2045, 2018.
[29] J. Zhang and E. Bareinboim. Non-parametric path analysis in structural causal models.
In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, 2018.