{"title": "Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer", "book": "Advances in Neural Information Processing Systems", "page_first": 6147, "page_last": 6157, "abstract": "In many machine learning applications, there are multiple decision-makers involved, both automated and human. The interaction between these agents often goes unaddressed in algorithmic development. In this work, we explore a simple version of this interaction with a two-stage framework containing an automated model and an external decision-maker. The model can choose to say PASS, and pass the decision downstream, as explored in rejection learning. We extend this concept by proposing \"learning to defer\", which generalizes rejection learning by considering the effect of other agents in the decision-making process. We propose a learning algorithm which accounts for potential biases held by external decision-makers in a system. Experiments demonstrate that learning to defer can make systems not only more accurate but also less biased. Even when working with inconsistent or biased users, we show that deferring models still greatly improve the accuracy and/or fairness of the entire system.", "full_text": "Predict Responsibly: Improving Fairness and\n\nAccuracy by Learning to Defer\n\nDavid Madras, Toniann Pitassi & Richard Zemel\n\n{madras,toni,zemel}@cs.toronto.edu\n\nUniversity of Toronto\n\nVector Institute\n\nAbstract\n\nIn many machine learning applications, there are multiple decision-makers involved,\nboth automated and human. The interaction between these agents often goes\nunaddressed in algorithmic development. In this work, we explore a simple version\nof this interaction with a two-stage framework containing an automated model\nand an external decision-maker. The model can choose to say PASS, and pass the\ndecision downstream, as explored in rejection learning. 
We extend this concept by\nproposing learning to defer, which generalizes rejection learning by considering\nthe effect of other agents in the decision-making process. We propose a learning\nalgorithm which accounts for potential biases held by external decision-makers\nin a system. Experiments demonstrate that learning to defer can make systems\nnot only more accurate but also less biased. Even when working with inconsistent\nor biased users, we show that deferring models still greatly improve the accuracy\nand/or fairness of the entire system.\n\n1\n\nIntroduction\n\nCan humans and machines make decisions jointly? A growing use of automated decision-making in\ncomplex domains such as loan approvals [5], medical diagnosis [14], and criminal justice [26], has\nraised questions about the role of machine learning in high-stakes decision-making, and the role of\nhumans in overseeing and applying machine predictions. Consider a black-box model which outputs\nrisk scores to assist a judge presiding over a bail case [19]. How does a risk score factor into the\ndecision-making process of an external agent such as a judge? How should this in\ufb02uence how the\nscore is learned? The model producing the score may be state-of-the-art in isolation, but its true\nimpact comes as an element of the judge\u2019s decision-making process.\nWe argue that since these models are often used as part of larger systems e.g. in tandem with another\ndecision maker (like a judge), they should learn to predict responsibly: the model should predict\nonly if its predictions are reliably aligned with the system\u2019s objectives, which often include accuracy\n(predictions should mostly indicate ground truth) and fairness (predictions should be unbiased with\nrespect to different subgroups).\nRejection learning [8, 10] proposes a solution: allow models to reject (not make a prediction) when\nthey are not con\ufb01dently accurate. 
However, this approach is inherently nonadaptive: both the model\nand the decision-maker act independently of one another. When a model is working in tandem\nwith some external decision-maker, the decision to reject should depend not only on the model\u2019s\ncon\ufb01dence, but also on the decision-maker\u2019s expertise and weaknesses. For example, if the judge\u2019s\nblack-box is uncertain about some subgroup, but the judge is very inaccurate or biased towards that\nsubgroup, we may prefer the model make a prediction despite its uncertainty.\n\n32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, Canada.\n\n\fOur main contribution is the formulation of adaptive rejection learning, which we call learning\nto defer, where the model works adaptively with the decision-maker. We provide theoretical and\nexperimental evidence that learning to defer improves upon the standard rejection learning paradigm,\nif models are intended to work as part of a larger system. We show that embedding a deferring model\nin a pipeline can improve the accuracy and fairness of the system as a whole. Experimentally, we\nsimulate three scenarios where our model can defer judgment to external decision makers, echoing\nrealistic situations where downstream decision makers are inconsistent, biased, or have access to side\ninformation. Our experimental results show that in each scenario, learning to defer allows models to\nwork with users to make fairer, more responsible decisions.\n\n2 Learning to Defer\n\n2.1 A Joint Decision-Making Framework\n\nFigure 1: A larger decision system containing an automated model. When the model predicts, the\nsystem outputs the model\u2019s prediction; when the model says PASS, the system outputs the decision-\nmaker\u2019s (DM\u2019s) prediction. 
Standard rejection learning considers the model stage, in isolation, as the\nsystem output, while learning-to-defer optimizes the model over the system output.\n\nA complex real-world decision system can be modeled as an interactive process between various\nagents including decision makers, enforcement groups, and learning systems. Our focus in this\npaper is on a two-agent model, between one decision-maker and one learning model, where the\ndecision \ufb02ow is in two stages. This simple but still interactive setup describes many practical systems\ncontaining multiple decision-makers (Fig 1). The \ufb01rst stage is an automated model whose parameters\nwe want to learn. The second stage is some external decision maker (DM) which we do not have\ncontrol over e.g. a human user, a proprietary black-box model. The decision-making \ufb02ow is modeled\nas a cascade, where the \ufb01rst-step model can either predict (positive/negative) or say PASS. If it\npredicts, the DM will output the model\u2019s prediction. However, if it says PASS, the DM makes its own\ndecision. This scenario is one possible characterization of a realistic decision task, which can be an\ninteractive (potentially game-theoretic) process.\nWe can consider the \ufb01rst stage to be \ufb02agging dif\ufb01cult cases for review, culling a large pool of inputs,\nauditing the DM for problematic output, or simply as an assistive tool. In our setup, we assume\nthat the DM has access to information that the model does not \u2014 re\ufb02ecting a number of practical\nscenarios where DMs later in the chain may have more resources for ef\ufb01ciency, security, or contextual\nreasons. However, the DM may be \ufb02awed, e.g. biased or inconsistent. A tradeoff suggests itself:\ncan a machine learning model be combined with the DM to leverage the DM\u2019s extra insight, but\novercome its potential \ufb02aws?\nWe can describe the problem of learning an automated model in this framework as follows. 
There exist data X ∈ R^n, ground-truth labels Y ∈ {0, 1}, and some auxiliary data Z ∈ R^m which is only available to the DM. If we let s ∈ {0, 1} be a PASS indicator variable (s = 1 means PASS), then the joint probability of the system in Fig. 1 can be expressed as follows:

P_defer(Y|X, Z) = ∏_i [P_M(Y_i = 1|X_i)^{Y_i} (1 − P_M(Y_i = 1|X_i))^{1−Y_i}]^{1−s_i} · [P_D(Y_i = 1|X_i, Z_i)^{Y_i} (1 − P_D(Y_i = 1|X_i, Z_i))^{1−Y_i}]^{s_i}   (1)

where P_M is the probability assigned by the automated model, P_D is the probability assigned by the DM, and i indexes examples. This can be seen as a mixture of Bernoullis, where the labels are generated by either the model or the DM as determined by s. For convenience, we compress the probabilistic notation:

Ŷ_M = f(X) = P_M(Y = 1|X) ∈ [0, 1];   Ŷ_D = h(X, Z) = P_D(Y = 1|X, Z) ∈ [0, 1]
Ŷ = (1 − s)Ŷ_M + sŶ_D ∈ [0, 1];   s = g(X) ∈ {0, 1}   (2)

Ŷ_M, Ŷ_D, and Ŷ are model predictions, DM predictions, and system predictions, respectively (left to right in Fig. 1). The DM function h is a fixed, unknown black-box. Therefore, learning good {Ŷ_M, s} involves learning functions f and g which can adapt to h; the goal is to make Ŷ a good predictor of Y. To do so, we want to find the maximum likelihood solution of Eq. 1. We can minimize its negative log-likelihood L_defer, which can be written as:

L_defer(Y, Ŷ_M, Ŷ_D, s) = −log P_defer(Y|X, Z) = −Σ_i [(1 − s_i)ℓ(Y_i, Ŷ_M,i) + s_i ℓ(Y_i, Ŷ_D,i)]   (3)

where ℓ(Y, p) = Y log p + (1 − Y) log(1 − p), i.e. the log-probability of the label with respect to some prediction p. Minimizing L_defer is what we call learning to defer. 
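As a minimal sketch of the objective in Eq. 3 (NumPy; the function names are ours, not from the paper's code release):

```python
import numpy as np

def log_prob(y, p, eps=1e-12):
    # l(Y, p) = Y log p + (1 - Y) log(1 - p): the log-probability of
    # label Y under predicted probability p, clipped for stability.
    p = np.clip(p, eps, 1 - eps)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

def defer_loss(y, y_model, y_dm, s):
    # Negative log-likelihood of Eq. 3: the model is scored on kept
    # examples (s = 0) and the DM on deferred examples (s = 1).
    return -np.sum((1 - s) * log_prob(y, y_model) + s * log_prob(y, y_dm))
```

Deferring on exactly those examples where the DM assigns the true label higher probability can only lower this loss, which is what the gating variable s tries to learn.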
In learning to defer, we aim to learn a model which outputs predictive probabilities Ŷ_M and binary deferral decisions s, in order to optimize the output of the system as a whole. The role of s is key here: rather than just an expression of uncertainty, we can think of it as a gating variable, which tries to predict whether Ŷ_M or Ŷ_D will have lower loss on any given example. This leads naturally to a mixture-of-experts learning setup; however, we are only able to optimize the parameters of one expert (Ŷ_M), whereas the other expert (Ŷ_D) is out of our control. We discuss this further in Sec. 3.
We now examine the relationship between learning to defer and rejection learning. Specifically, we will show that learning to defer is a generalization of rejection learning and argue why it is an important improvement over rejection learning for many machine learning applications.

2.2 Learning to Reject

Rejection learning is the predominant paradigm for learning models with a PASS option (see Sec. 4). In this area, the standard method is to optimize the accuracy-rejection tradeoff: how much can a model improve its accuracy on the cases it does classify by PASS-ing some cases? This is usually learned by minimizing a classification objective L_reject with a penalty γ_reject for each rejection [10], where Y is ground truth, Ŷ_M is the model output, and s is the reject variable (s = 1 means PASS), all binary:

L_reject(Y, Ŷ_M, s) = −Σ_i [(1 − s_i)ℓ(Y_i, Ŷ_M,i) + s_i γ_reject]   (4)

where ℓ is usually classification accuracy, i.e. ℓ(Y_i, Ŷ_i) = 1[Y_i = Ŷ_i]. 
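Under the accuracy loss ℓ(Y_i, Ŷ_i) = 1[Y_i = Ŷ_i], Eq. 4 can be sketched as follows (a hypothetical helper, not the paper's implementation):

```python
import numpy as np

def reject_loss(y, y_pred, s, gamma_reject):
    # Eq. 4 with l = classification accuracy: correct kept predictions
    # are rewarded, and every PASS pays the flat penalty gamma_reject.
    correct = (y == y_pred).astype(float)
    return -np.sum((1 - s) * correct + s * gamma_reject)
```

With 0 < γ_reject < 1, the loss is reduced by rejecting exactly those examples the model would get wrong; this flat penalty is what plays the role of a constant DM loss in the equivalence shown in Sec. 2.3.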
If we instead consider ℓ to be the log-probability of the label, then we can interpret L_reject probabilistically as the negative log-likelihood of the joint distribution P_reject:

P_reject(Y|X) = ∏_i [Ŷ_M,i^{Y_i} (1 − Ŷ_M,i)^{1−Y_i}]^{1−s_i} exp(γ_reject)^{s_i}   (5)

2.3 Learning to Defer is Adaptive Rejection Learning

In learning to defer, the model leverages information about the DM to make PASS decisions adaptively. We can consider how learning to defer relates to rejection learning. Examining their loss functions (Eq. 3 and Eq. 4 respectively), the only difference is that the rejection loss has a constant γ_reject where the deferring loss has the variable term ℓ(Y, Ŷ_D). This leads us to the following:

Theorem. Let ℓ(Y, Ŷ) be our desired example-wise objective, where Y = arg min_Ŷ −ℓ(Y, Ŷ). Then, if the DM has constant loss (e.g. is an oracle), there exist values of γ_reject, γ_defer for which the learning-to-defer and rejection learning objectives are equivalent.

Proof. As in Eq. 4, the standard rejection learning objective is

L_reject(Y, Ŷ_M, Ŷ_D, s) = −Σ_i [(1 − s_i)ℓ(Y_i, Ŷ_M,i) + s_i γ_reject]   (6)

where the first term encourages a low negative loss ℓ for non-PASS examples and the second term penalizes PASS at a constant rate, γ_reject. 
If we include a similar γ_defer penalty, the deferring loss function is

L_defer(Y, Ŷ_M, Ŷ_D, s) = −Σ_i [(1 − s_i)ℓ(Y_i, Ŷ_M,i) + s_i ℓ(Y_i, Ŷ_D,i) + s_i γ_defer]   (7)

Now, if the DM has constant loss, meaning ℓ(Y, Ŷ_D) = α, we have (with γ_defer = γ_reject − α):

L_defer(Y, Ŷ_M, Ŷ_D, s) = −Σ_i [(1 − s_i)ℓ(Y_i, Ŷ_M,i) + s_i · α + s_i γ_defer]
                        = −Σ_i [(1 − s_i)ℓ(Y_i, Ŷ_M,i) + s_i(γ_defer + α)] = L_reject(Y, Ŷ_M, s)   (8) ∎

2.4 Why Learn to Defer?

The proof in Sec. 2.3 shows the central point of learning to defer: rejection learning is exactly a special case of learning to defer, namely that of a DM with constant loss α on each example. We find that the adaptive version (learning to defer) more accurately describes real-world decision-making processes. Often, a PASS is not the end of a decision-making sequence. Rather, a decision must eventually be made on every example by a DM, whether the automated model predicts or not, and the DM will not, in general, have constant loss on each example.
Say our model is trained to detect melanoma, and when it says PASS, a human doctor can run an extra suite of medical tests. The model learns that it is very inaccurate at detecting amelanocytic (non-pigmented) melanoma, and says PASS if this might be the case. However, suppose that the doctor is even less accurate at detecting amelanocytic melanoma than the model is. Then, we may prefer the model to make a prediction despite its uncertainty. Conversely, if there are some illnesses that the doctor knows well, then the doctor may have a more informed, nuanced opinion than the model. Then, we may prefer the model say PASS more frequently relative to its internal uncertainty.
Saying PASS on the wrong examples can also have fairness consequences. 
If the doctor\u2019s decisions\nbias against a certain group, then it is probably preferable for a less-biased model to defer less\nfrequently on the cases of that group. If some side information helps a DM achieve high accuracy on\nsome subgroup, but confuses the DM on another, then the model should defer most frequently on the\nDM\u2019s high accuracy subgroup, to ensure fair and equal treatment is provided to all groups. In short,\nif the model we train is part of a larger pipeline, then we should train and evaluate the performance of\nthe pipeline with this model included, rather than solely focusing on the model itself. We note that\nit is unnecessary to acquire decision data from a speci\ufb01c DM; rather, data could be sampled from\nmany DMs (potentially using crowd-sourcing). Research suggests that common trends exist in DM\nbehavior [6, 11], suggesting that a model trained on some DM or group of DMs could generalize to\nunseen DMs.\n\n3 Formulating Adaptive Models within Decision Systems\n\nIn our decision-making pipeline, we aim to formulate a fair model which can be used for learning to\ndefer (Eq. 3) (and by extension non-adaptive rejection learning as well (Eq. 4)). Such a model must\nhave two outputs for each example: a predictive probability \u02c6YM and a PASS indicator s. We draw\ninspiration from the mixture-of-experts model [21]. One important difference between learning-to-\ndefer and a mixture-of-experts is that one of the \u201cexperts\u201d in this case is the DM, which is out of our\ncontrol; we can only learn the parameters of \u02c6YM .\nIf we interpret the full system as a mixture between the model\u2019s prediction \u02c6YM and the DM\u2019s\npredictions \u02c6YD, we can introduce a mixing coef\ufb01cient \u03c0, where s \u223c Ber(\u03c0). \u03c0 is the probability of\ndeferral, i.e. 
that the DM makes the final decision on an example X, rather than the model; 1 − π is the probability that the model's decision becomes the final output of the system. Recall that Ŷ_M and π are functions of the input X; they are parametrized below by θ. Then, if there is some loss ℓ(Y, Ŷ) we want our system to minimize, we can learn to defer by minimizing an expectation over s:

L_defer(Y, Ŷ_M, Ŷ_D, π; θ) = E_{s∼Ber(π)} L(Y, Ŷ_M, Ŷ_D, s; θ) = Σ_i E_{s_i∼Ber(π_i)}[(1 − s_i)ℓ(Y_i, Ŷ_M,i; θ) + s_i ℓ(Y_i, Ŷ_D,i)]   (9)

or, in the case of rejection learning:

L_reject(Y, Ŷ_M, Ŷ_D, π; θ) = Σ_i E_{s_i∼Ber(π_i)}[(1 − s_i)ℓ(Y_i, Ŷ_M,i; θ) + s_i γ_reject]   (10)

Next, we give two methods of specifying and training such a model, and present our method of learning these models fairly using a regularization scheme.

3.1 Post-hoc Thresholding

One way to formulate an adaptive model with a PASS option is to let π be a function of Ŷ_M alone; i.e. Ŷ_M = f(X) and π = g(Ŷ_M). One such function g is a thresholding function: we can learn two thresholds t0, t1 (see Figure 2) which yield a ternary classifier. The third category is PASS, which can be output when the model prefers not to commit to a positive or negative prediction. A convenience of this method is that the thresholds can be trained post-hoc on an existing model with an output in [0, 1], e.g. many binary classifiers. We use a neural network as our binary classifier, and describe our post-hoc thresholding scheme in Appendix D. At test time, we use the thresholds to partition the examples. 
On each example, the model outputs a score β ∈ [0, 1]. If t0 < β < t1, then we output π = 1 and defer (the value of Ŷ_M becomes irrelevant). Otherwise, if β ≤ t0 we output π = 0, Ŷ_M = 0; if β ≥ t1 we output π = 0, Ŷ_M = 1. Since π ∈ {0, 1} here, the expectation over s ∼ Ber(π) in Eq. 9 is trivial.

Figure 2: Binary classification (one threshold) vs. ternary classification with a PASS option (two thresholds).

3.2 Learning a Differentiable Model

We may wish to use continuous outputs Ŷ_M, π ∈ [0, 1] and train our models with gradient-based optimization. To this end, we consider a method for training a differentiable adaptive model. One could imagine extending the method in Sec. 3.1 to learn smooth thresholds end-to-end on top of a predictor. However, to add flexibility, we can allow π to be a function of X as well as Ŷ_M, i.e. Ŷ_M = f(X) and π = g(Ŷ_M, X). This is advantageous because a DM's actions may depend heterogeneously on the data: the DM's expected loss may change as a function of X, and it may do so differently than the model's. We can parametrize Ŷ_M and π with neural networks, and optimize Eq. 9 or 10 directly using gradient descent. At test time, we defer when π > 0.5.
We estimate the expected value in Eq. 9 by sampling s ∼ Ber(π) during training (once per example). To estimate the gradient through this sampling procedure, we use the Concrete relaxation [28]. Additionally, it can be helpful to stop the gradient of π from backpropagating through Ŷ_M. 
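A sketch of the binary Concrete (Gumbel-sigmoid) sample used to estimate this gradient; the function name and temperature value are our choices, and in an autodiff framework the stop-gradient would be, e.g., a detach applied to Ŷ_M where it feeds into π's network:

```python
import numpy as np

def binary_concrete_sample(pi, temperature=0.5, rng=np.random):
    # Relaxed sample s ~ BinaryConcrete(pi, T): a sigmoid of the deferral
    # logit plus logistic noise, scaled by 1/T. As T -> 0, s approaches a
    # hard Bernoulli(pi) sample, while for T > 0 it stays differentiable in pi.
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(pi))
    logistic_noise = np.log(u) - np.log(1 - u)
    logit_pi = np.log(pi) - np.log(1 - pi)
    return 1.0 / (1.0 + np.exp(-(logit_pi + logistic_noise) / temperature))
```

At low temperature, the samples concentrate near {0, 1} with mean close to π, so the relaxed loss approximates the expectation in Eq. 9.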
Stopping this gradient allows Ŷ_M to remain a good predictor independently of π.
See Appendix F for a brief discussion of a third model we consider, a Bayesian neural network [3].

3.3 Fair Classification through Regularization

We can build a regularized fair loss function which combines error rate with a fairness metric. We extend the loss in Eq. 9 to include a regularization term R, with a coefficient α_fair to balance accuracy and fairness:

L_defer(Y, Ŷ_M, Ŷ_D, π; θ) = E_{s∼Ber(π)}[L(Y, Ŷ_M, Ŷ_D, s; θ) + α_fair R(Y, Ŷ_M, Ŷ_D, s)]   (11)

We now provide some fairness background. In fair binary classification, we have input labels Y, predictions Ŷ, and a sensitive attribute A (e.g., gender, race, age), assuming for simplicity that Y, Ŷ, A ∈ {0, 1}. In this work we assume that A is known. The aim is twofold: firstly, accurate classification, i.e. Y_i = Ŷ_i; and secondly, fairness with respect to A, i.e. Ŷ does not discriminate unfairly against particular values of A. Adding fairness constraints can provably hurt classification error [29]. We thus define a loss function which trades off between these two objectives, yielding a regularizer.
We choose equalized odds as our fairness metric [20], which requires that false positive and false negative rates are equal between the two groups. We will refer to the difference between these rates as disparate impact (DI). Here we define a continuous relaxation of DI, having the model output a probability p and considering Ŷ ∼ Ber(p). The resulting term D acts as our regularizer R in Eq. 
11:

DI_{Y=i}(Y, A, Ŷ) = |E_{Ŷ∼Ber(p)}(Ŷ = 1 − Y | A = 0, Y = i) − E_{Ŷ∼Ber(p)}(Ŷ = 1 − Y | A = 1, Y = i)|

D(Y, A, Ŷ) = (1/2)(DI_{Y=0}(Y, A, Ŷ) + DI_{Y=1}(Y, A, Ŷ))   (12)

Our regularization scheme is similar to those of Bechavod and Ligett [2] and Kamishima et al. [24]; see Appendix B for results confirming the efficacy of this scheme in binary classification. We show experimentally that the equivalence between learning to defer with an oracle DM and rejection learning holds in the fairness case (see Appendix E).

4 Related Work

Notions of Fairness. A challenging aspect of machine learning approaches to fairness is formulating an operational definition. Several works have focused on the goal of treating similar people similarly (individual fairness) and the necessity of fairness-awareness [13, 35].
Some definitions of fairness center around statistical parity [23, 24], calibration [17, 30], or equalized odds [7, 20, 27, 34]. It has been shown that equalized odds and calibration cannot be simultaneously (non-trivially) satisfied [7, 27]. Hardt et al. [20] present the related notion of "equal opportunity". Zafar et al. [34] and Bechavod and Ligett [2] develop and implement learning algorithms that integrate equalized odds into learning via regularization.
Incorporating PASS. Several works have examined the PASS option (cf. rejection learning), beginning with Chow [8, 9], who studies the tradeoff between error rate and rejection rate. Cortes et al. [10] develop a framework for integrating PASS directly into learning. Attenberg et al. [1] discuss the difficulty of a model learning what it doesn't know (particularly rare cases), and analyze how human users can audit such models. Wang et al. [33] propose a cascading model, which can be learned end-to-end. However, none of these works look at the fairness impact of this procedure. 
From the AI safety literature, Hadfield-Menell et al. [18] give a reinforcement-learning algorithm for machines to work with humans to achieve common goals. We also note that the phrase "adaptive rejection" exists independently of this work, but with a different meaning [15].
A few papers have addressed topics related to both of the above sections. Bower et al. [4] describe fair sequential decision-making, but do not have a PASS concept or provide a learning procedure. In Joseph et al. [22], the authors show theoretical connections between KWIK-learning and a proposed method for fair bandit learning. Grgić-Hlaca et al. [16] discuss fairness that can arise out of a mixture of classifiers. Varshney and Alemzadeh [32] propose "safety reserves" and "safe fail" options which combine learning with rejection and fairness/safety, but do not analyze learning procedures or larger decision-making frameworks. The authors of [31] argue that the choice to model "only certain technical parts of the system" is a flaw in many approaches to fair ML; our method is a step towards addressing what that work calls "the framing trap".

5 Experiments

We experiment with three scenarios, each of which represents an important class of real-world decision-makers:

1. High-accuracy DM, ignores fairness: This may occur if the extra information available to the DM is important, yet withheld from the model for privacy or computational reasons.

2. Highly-biased DM, strongly unfair: Particularly in high-stakes/sensitive scenarios, DMs can exhibit many biases.

3. 
Inconsistent DM, ignores fairness (DM\u2019s accuracy varies by subgroup, with total accuracy\nlower than model): Human DMs can be less accurate, despite having extra information [12].\nWe add noise to the DM\u2019s output on some subgroups to simulate human inconsistency.\n\nDue to dif\ufb01culty obtaining and evaluating real-life decision-making data, we use \u201csemi-synthetic\ndata\u201d: real datasets, and simulated DM data by training a separate classi\ufb01er under slightly different\nconditions (see Experiment Details). In each scenario, the simulated DM receives access to extra\ninformation which the model does not see.\nDatasets and Experiment Details. We use two datasets: COMPAS [26], where we\npredict a defendant\u2019s recidivism without discriminating by race, and Heritage Health\n(https://www.kaggle.com/c/hhp), where we predict a patient\u2019s Charlson Index (a comorbidity indica-\ntor) without discriminating by age. We train all models and DMs with a fully-connected two-layer\nneural network 1. See Appendix C for details on datasets and experiments.\nWe found post-hoc and end-to-end models performed qualitatively similarly for high-accuracy DMs,\nso we show results from the simpler model (post-hoc) for those. However, the post-hoc model cannot\nadjust to the case of the inconsistent DM (scenario 3), since it does not take X as an input to \u03c0 (as\ndiscussed in Sec. 3.2), so we show results from the end-to-end model for the inconsistent DM.\nEach DM receives extra information in training. For COMPAS, this is the defendant\u2019s violent\nrecidivism; for Health, this is the patient\u2019s primary condition group. To simulate high-bias DMs (scen.\n2) we train a regularized model with \u03b1f air = \u22120.1 to encourage learning a \u201cDM\u201d with high disparate\nimpact. To create inconsistent DMs (scen. 
3), we flip a subset of the DM's predictions post-hoc with 30% probability: on COMPAS, this subset is people below the mean age; on Health, it is males.
Displaying Results. We show results across various hyperparameter settings (α_fair, γ_defer/γ_reject) to illustrate accuracy and/or fairness tradeoffs. Each plotted line connects several points, which are each a median of 5 runs at one setting. In Fig. 6, we only show the Pareto front, i.e., points for which no other point had both better accuracy and fairness. All results are on held-out test sets.

5.1 Learning to Defer to Three Types of DM

(a) COMPAS, High-Accuracy DM (b) COMPAS, Highly-Biased DM (c) COMPAS, Inconsistent DM
Figure 3: Comparing learning-to-defer, rejection learning, and binary models; COMPAS dataset only, Health dataset results in Appendix A. Each figure is a different DM scenario. In Figs. 3a and 3b, the X-axis is fairness (lower is better); in Fig. 3c, the X-axis is deferral rate. The Y-axis is accuracy for all figures. Square is a baseline binary classifier, trained only to optimize accuracy; dashed line is the fair rejection model; solid line is the fair deferring model. Yellow circle is the DM alone. In Fig. 3a, the green dotted line is a binary model also optimizing fairness. Figs. 3a and 3b are a hyperparameter sweep over γ_reject/defer and α_fair; Fig. 3c sweeps γ_reject/defer only, with α_fair = 0 (for α_fair ≥ 0, see Appendix G).

High-Accuracy DM. In this experiment, we consider the scenario where a DM has higher accuracy than the model we train, due to the DM having access to extra information/resources for security, efficiency, or contextual reasons. However, the DM is not trained to be fair. In Fig. 3a, we show that learning-to-defer achieves a better accuracy-fairness tradeoff than rejection learning. 
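For concreteness, the post-hoc corruption used to simulate an inconsistent DM (scenario 3) amounts to the following sketch (names ours; the paper's released code may differ):

```python
import numpy as np

def make_inconsistent_dm(dm_preds, unreliable_mask, flip_prob=0.3,
                         rng=np.random.default_rng(0)):
    # Flip each binary DM prediction in the unreliable subgroup
    # (e.g. below-mean age on COMPAS, males on Health) with
    # probability flip_prob, leaving the reliable subgroup untouched.
    flips = unreliable_mask & (rng.uniform(size=dm_preds.shape) < flip_prob)
    return np.where(flips, 1 - dm_preds, dm_preds)
```

This leaves the DM more accurate than the model on the reliable subgroup but less accurate overall, which is exactly the structure an adaptive deferral policy can exploit.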
Hence, learning-to-defer can be a valuable fairness tool for anyone who designs or oversees a many-part system: an adaptive first stage can improve the fairness of a more accurate DM. The fair rejection learning model also outperforms binary baselines, by integrating the extra DM accuracy on some examples. For further analysis, we break out the results in Fig. 3a by deferral rate, and find that most of the benefit in this scenario is indeed coming from added fairness by the model (see Appendix H).

1 Code available at https://github.com/dmadras/predict-responsibly.

Highly-Biased DM. In this scenario, we consider the case of a DM which is extremely biased (Fig. 3b). We find that the advantage of a deferring model holds in this case, as it adapts to the DM's extreme bias. For further analysis, we examine the deferral rate of each model in this plot (see Appendix I). We find that the deferring model's adaptivity brings two advantages: it can adaptively defer at different rates for the two sensitive groups to counteract the DM's bias; and it is able to modulate the overall amount that it defers when the DM is biased.
Inconsistent DM. In this experiment, we consider a DM with access to extra information, but which, due to inconsistent accuracy across subgroups, has a lower overall accuracy than the model. In Fig. 3c, we compare deferring and rejecting models, examining their classification accuracy at different deferral rates. We observe that for each deferral rate, the model that learned to defer achieves a higher classification accuracy. 
Furthermore, we \ufb01nd that the best learning-to-defer models outperform both\nthe DM and a baseline binary classi\ufb01er. Note that although the DM is less accurate than the model,\nthe most accurate result is not to replace the DM, but to use a DM-model mixture. Critically, only\nwhen the model is adaptive (i.e. learns to defer) is the potential of this mixture unlocked.\n\n(a) COMPAS, Reliable\n\n(b) COMPAS, Unreliable\n\n(c) Health, Reliable\n\n(d) Health, Unreliable\n\nFigure 4: Each point is some setting of \u03b3reject/def er. X-axis is total deferral rate, Y-axis is deferral\nrate on DM reliable/unreliable subgroup (COMPAS: Old/Young; Health: Female/Male). Gray line\n= 45\u25e6: above is more deferral; below is less. Solid line: learning to defer; dashed line: rejection\nlearning.\n\n(a) COMPAS dataset\n\n(b) Health dataset\n\nTo analyze further how the deferring model in\nFig. 3c achieves its accuracy, we examine two\nsubgroups from the data: where the DM is re-\nliable and unreliable (the unreliable subgroup\nis where post-hoc noise was added to the DM\u2019s\noutput; see Experiment Details). Fig. 4 plots\nthe deferral rate on these subgroups against the\noverall deferral rates. We \ufb01nd that the defer-\nring models deferred more on the DM\u2019s reliable\nsubgroup, and less on the unreliable subgroup,\nparticularly as compared to rejection models.\nThis shows the advantage of learning to defer;\nthe model was able to adapt to the strengths and\nweaknesses of the DM.\nWe also explore how learning-to-defer\u2019s errors\ndistribute across subgroups. We look at accuracy\non four subgroups, de\ufb01ned by the cross-product\nof the sensitive attribute and the attribute de\ufb01n-\ning the DM\u2019s unreliability, both binary. In Fig. 5,\nwe plot the minimum subgroup accuracy (MSA)\nand the overall accuracy. 
We \ufb01nd that the deferring models (which were higher accuracy in general),\ncontinue to achieve higher accuracy even when requiring that models attain a certain MSA. This\nmeans that the improvement we see in the deferring models are not coming at the expense of the least\naccurate subgroups. Instead, we \ufb01nd that the most accurate deferring models also have the highest\nMSA, rather than exhibiting a tradeoff. This is a compelling natural fairness property of learning to\ndefer which we leave to future work for further investigation.\n\nFigure 5: Each point is a single run in sweep over\n\u03b3reject/\u03b3def er. X-axis is the model\u2019s lowest accu-\nracy over 4 subgroups, de\ufb01ned by the cross prod-\nuct of binarized (sensitive attribute, unreliable at-\ntribute), which are (race, age) and (age, gender)\nfor COMPAS and Health respectively. Y-axis is\nmodel accuracy. Only best Y-value for each X-\nvalue shown. Solid line is learning to defer; dashed\nline is rejection learning.\n\n8\n\n0.00.20.40.60.81.0Deferral Rate0.00.20.40.60.81.0Deferral Rate - Subgroup (Reliable)learn-deferlearn-reject0.00.20.40.60.81.0Deferral Rate0.00.20.40.60.81.0Deferral Rate - Subgroup (Unreliable)learn-deferlearn-reject0.00.20.40.60.81.0Deferral Rate0.00.20.40.60.81.0Deferral Rate - Subgroup (Reliable)learn-deferlearn-reject0.00.20.40.60.81.0Deferral Rate0.00.20.40.60.81.0Deferral Rate - Subgroup (Unreliable)learn-deferlearn-reject0.600.620.640.660.68Minimum Subgroup Accuracy0.690.700.710.720.73Accuracylearn-deferlearn-reject0.620.640.660.680.700.72Minimum Subgroup Accuracy0.7850.7900.7950.8000.8050.8100.8150.820Accuracylearn-deferlearn-reject\f6 Conclusion\n\nIn this work, we de\ufb01ne a framework for multi-agent decision-making which describes many practical\nsystems. We propose a method, learning to defer (or adaptive rejection learning), which generalizes\nrejection learning under this framework. 
We give an algorithm for learning to defer in the context of larger systems and explain how to do so fairly. Experimentally, we demonstrate that deferring models can optimize the performance of decision-making pipelines as a whole, beyond the improvement provided by rejection learning. This is a powerful, general framework, with ramifications for many complex domains where automated models interact with other decision-making agents. Through deferring, we show how models can learn to predict responsibly within their surrounding systems, an essential step towards fairer, more responsible machine learning.

References

[1] Josh Attenberg, Panagiotis G Ipeirotis, and Foster J Provost. Beat the machine: Challenging workers to find the unknown unknowns. Human Computation, 11(11), 2011.

[2] Yahav Bechavod and Katrina Ligett. Learning fair classifiers: A regularization-inspired approach. arXiv:1707.00044 [cs, stat], June 2017. URL http://arxiv.org/abs/1707.00044.

[3] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1613–1622, Lille, France, 07–09 Jul 2015. PMLR. URL http://proceedings.mlr.press/v37/blundell15.html.

[4] Amanda Bower, Sarah N. Kitchen, Laura Niss, Martin J. Strauss, Alexander Vargas, and Suresh Venkatasubramanian. Fair pipelines. arXiv:1707.00391 [cs, stat], July 2017. URL http://arxiv.org/abs/1707.00391.

[5] Jenna Burrell. How the machine 'thinks': Understanding opacity in machine learning algorithms. Big Data & Society, 3(1):2053951715622512, 2016.

[6] Lowell W Busenitz and Jay B Barney. Differences between entrepreneurs and managers in large organizations: Biases and heuristics in strategic decision-making.
Journal of Business Venturing, 12(1):9–30, 1997.

[7] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2):153–163, 2017.

[8] C. Chow. An optimum character recognition system using decision functions. IEEE T. C., 1957.

[9] C. Chow. On optimum recognition error and reject trade-off. IEEE T. C., 1970.

[10] Corinna Cortes, Giulia DeSalvo, and Mehryar Mohri. Learning with rejection. In International Conference on Algorithmic Learning Theory, pages 67–82. Springer, 2016.

[11] Shai Danziger, Jonathan Levav, and Liora Avnaim-Pesso. Extraneous factors in judicial decisions. Proceedings of the National Academy of Sciences, 108(17):6889–6892, 2011.

[12] Robyn M Dawes, David Faust, and Paul E Meehl. Clinical versus actuarial judgment. Science, 243(4899):1668–1674, 1989.

[13] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012.

[14] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, 2017.

[15] Lydia Fischer and Thomas Villmann. A probabilistic classifier model with adaptive rejection option. Technical Report 1865-3960, January 2016. URL https://www.techfak.uni-bielefeld.de/~fschleif/mlr/mlr_01_2016.pdf.

[16] Nina Grgić-Hlača, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. On fairness, diversity and randomness in algorithmic decision making. arXiv:1706.10208 [cs, stat], June 2017. URL http://arxiv.org/abs/1706.10208.

[17] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks.
In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1321–1330, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR. URL http://proceedings.mlr.press/v70/guo17a.html.

[18] Dylan Hadfield-Menell, Stuart J Russell, Pieter Abbeel, and Anca Dragan. Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems, pages 3909–3917, 2016.

[19] Kelly Hannah-Moffat. Actuarial sentencing: An "unsettled" proposition. Justice Quarterly, 30(2):270–296, 2013.

[20] Moritz Hardt, Eric Price, Nati Srebro, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pages 3315–3323, 2016.

[21] Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.

[22] Matthew Joseph, Michael Kearns, Jamie H Morgenstern, and Aaron Roth. Fairness in learning: Classic and contextual bandits. In Advances in Neural Information Processing Systems, pages 325–333, 2016.

[23] F. Kamiran and T. Calders. Classifying without discriminating. In 2nd International Conference on Computer, Control and Communication (IC4 2009), pages 1–6, February 2009. doi: 10.1109/IC4.2009.4909197.

[24] Toshihiro Kamishima, Shotaro Akaho, Hideki Asoh, and Jun Sakuma. Fairness-aware classifier with prejudice remover regularizer. Machine Learning and Knowledge Discovery in Databases, pages 35–50, 2012.

[25] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[26] Lauren Kirchner and Jeff Larson. How we analyzed the COMPAS recidivism algorithm.
ProPublica, 2016. URL https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.

[27] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv:1609.05807 [cs, stat], September 2016. URL http://arxiv.org/abs/1609.05807.

[28] Chris J Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712, 2016.

[29] Aditya Krishna Menon and Robert C Williamson. The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, pages 107–118, 2018.

[30] Geoff Pleiss, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. On fairness and calibration. In Advances in Neural Information Processing Systems, pages 5680–5689, 2017.

[31] Andrew D Selbst, Sorelle Friedler, Suresh Venkatasubramanian, Janet Vertesi, et al. Fairness and abstraction in sociotechnical systems. In ACM Conference on Fairness, Accountability, and Transparency (FAT*), 2018.

[32] Kush R Varshney and Homa Alemzadeh. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data, 5(3):246–255, 2017.

[33] Xin Wang, Yujia Luo, Daniel Crankshaw, Alexey Tumanov, and Joseph E. Gonzalez. IDK cascades: Fast deep learning by learning not to overthink. arXiv:1706.00885 [cs], June 2017. URL http://arxiv.org/abs/1706.00885.

[34] Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180.
International World Wide Web Conferences Steering Committee, 2017.

[35] Richard Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In Proceedings of the 30th International Conference on Machine Learning, PMLR, pages 325–333, February 2013. URL http://proceedings.mlr.press/v28/zemel13.html.