{"title": "Rational inference of relative preferences", "book": "Advances in Neural Information Processing Systems", "page_first": 2303, "page_last": 2311, "abstract": "Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them option-specific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function.", "full_text": "Rational inference of relative preferences\n\nNisheeth Srivastava\n\nDept of Computer Science\nUniversity of Minnesota\n\nPaul R Schrater\nDept of Psychology\n\nUniversity of Minnesota\n\nAbstract\n\nStatistical decision theory axiomatically assumes that the relative desirability of\ndifferent options that humans perceive is well described by assigning them option-\nspeci\ufb01c scalar utility functions. However, this assumption is refuted by ob-\nserved human behavior, including studies wherein preferences have been shown\nto change systematically simply through variation in the set of choice options\npresented. 
In this paper, we show that interpreting desirability as a relative com-\nparison between available options at any particular decision instance results in a\nrational theory of value-inference that explains heretofore intractable violations of\nrational choice behavior in human subjects. Complementarily, we also character-\nize the conditions under which a rational agent selecting optimal options indicated\nby dynamic value inference in our framework will behave identically to one whose\npreferences are encoded using a static ordinal utility function.\n\n1\n\nIntroduction\n\nNormative theories of human choice behavior have long been based on how economic theory has\npostulated they should be made. The standard version of the theory states that consumers seek to\nmaximize innate, stable preferences over the options they consume. Preferences are represented by\nnumerical encoding of value in terms of utilities, and subjects are presumed to select the option with\nthe maximum expected utility. The most dif\ufb01cult part of this theory is that preferences must exist\nbefore decisions can be made. The standard response, in both economics and decision theory, to the\nbasic question \u201cWhere do preferences come from?\u201d is \u201cWe\u2019ll leave that one to the philosophers,\nutilities are simply abstractions we assume for the work we do.\u201d, which, while true, is not an answer.\nWhile this question has been studied before in the form of learning utility values from behavior [5,\n14, 10], human preferences exhibit patterns of behavior that are impossible to reconcile with the idea\nthat stable numerical representations of value can be ascribed to each item they choose between.\nBehavioral experiments in the last half century have conclusively demonstrated (see [18] for a\ncomprehensive review) that human choice strongly violates the key axioms that the existence of\nstable utility values depends on. 
A particular subset of these violations, called context effects, wound the utility maximization program the most deeply, since such violations cannot be explained away as systematic distortions of underlying utility and/or probability representations [22]. Consider, for instance, the “frog legs” thought problem, pictured in Figure 1, introduced by Luce and Raiffa in their seminal work [15]. No possible algebraic reformulation of option-specific utility functions can explain preference reversals of the type exhibited in the frog legs example. Preference reversals elicited through choice set variation have been observed in multiple empirical studies, using a variety of experimental tasks, and comprise one of the most powerful criticisms of the use of expected utility as a normative standard in various economic programs, e.g. in public goods theory. However, for all its problems, the mathematical simplicity of the utility framework and the lack of principled alternatives has allowed it to retain its central role in microeconomics [12], machine learning [1], computational cognitive science [7] and neuroscience [11].\n\n(a) When asked to select between just salmon and steak, the diner picks salmon, indicating salmon ≻ steak by his choice\n\n(b) When presented with an additional third menu item, the diner picks steak, indicating steak ≻ salmon\n\nFigure 1: Illustration of Luce’s ‘frog legs’ thought experiment. No possible absolute utility assignation to individual items can account for the choice behavior exhibited by the diner in this experiment. The frog legs example is illustrative of reversals in preference occurring solely through variation in the set of options a subject has to choose from.\n\nOur contribution in this paper is the development of a rational model that infers preferences from limited information about the relative value of options. 
We postulate that there is a value inference\nprocess that predicts the relative goodness of items in enabling the agent to achieve its homeostatic\nand other longer-range needs (e.g. survival and reproductive needs). While this process should be\nfully explicated, we simply don\u2019t know enough to make detailed mathematical models. However,\nwe show that we only have to postulate that feedback from decisions provides limited information\nabout the relative worth of options within the choice set for a decision to retrieve an inductive rep-\nresentation of value that is equivalent to traditional preference relations. Thus, instead of assuming\nutilities as being present in the environment, we learn an equivalent sense of option desirability from\ninformation in a limited format that depends on the set of options in the decision set. This induc-\ntive methodology naturally makes choice sets informative about the value of options, and hence\naffords simple explanations for context effects. We show how to formalize the idea of relative value\ninference, and that it provides a new rational foundation for understanding the origins of human\npreferences.\n\n2 Human Preferences via Value Inference\n\nWe begin by reviewing and formalizing probabilistic decision-making under uncertainty. An agent\nselects between possibilities x in the world represented by the set X . The decision-making problem\ncan be formulated as one wherein the agent forms a belief b(x), x \u2208 X about the relative desirability\nof different possibilities in X and uses this belief to choose an element or subset X \u2217 \u2282 X . When\nthese beliefs satisfy the axioms of utility, the belief function simply the expected utility associated\nwith individual possibilities u(x), u : X \u2192 R.\nWe assume these desirabilities must be learned from experience, suggesting a reinforcement learn-\ning approach. 
The agent’s belief about the relative desirability of the world is constantly updated by information that it receives about the desirability of options in terms of value signals r(x). Belief updating produces transition dynamics on bt(x). Given a sequence of choices, the normative expectation is for agents to select possibilities in a way that maximizes their infinite-horizon cumulative discounted desirability,\n\narg max_{x(t)} Σ_{t}^{∞} γ^t b_t(x).  (1)\n\nThe sequence of choices selected describes the agent’s expected desirability maximizing behavior in a belief MDP-world.\n\nFrom a Bayesian standpoint, it is critical to describe the belief updating about the desirability of different states. Let p(x|r(1:t)) represent the belief that a possibility x is the best option given a sequence of value signals. Since the agent learns this distribution from observing r(x) signals from the environment, an update of the form,\n\np(x|r(t)) = p(r(t)|x) × p(x|{r(1), r(2), ···, r(t−1)}),  (2)\n\nreflects the basic process of belief formation via value signals. When value signals are available for every option, independent of other options, the likelihood term p(r|x) in Equation (2) is a probabilistic representation of observed utility, which remains unaffected in the update by the agent’s history of sampling past possibilities and hence is invariant to transition probabilities. Such separation between utilities and probabilities in statistical decision theory is called probabilistic sophistication, an axiom that underlies almost all existing computational decision theory models [11].\nThe crux of our new approach is that we assume that value signals p(r|x) are not available for every option. Instead, we assume we get partial information about the value of one or more options within the set of options c available in the decision instance t. 
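The fully observed update in Equation (2) above is simple enough to sketch in a few lines. The option names and value-signal likelihoods below are hypothetical, chosen only to illustrate the sequential normalize-and-reweight structure of the update:

```python
# A minimal sketch of the standard belief update in Equation (2), where a
# value signal is observed for every option on every trial, independent of
# the choice set. Option names and likelihood values are hypothetical.

def update_belief(prior, value_likelihood):
    """p(x|r_1..t) is proportional to p(r_t|x) * p(x|r_1..t-1), normalized."""
    unnorm = {x: value_likelihood[x] * p for x, p in prior.items()}
    z = sum(unnorm.values())
    return {x: v / z for x, v in unnorm.items()}

belief = {'salmon': 0.5, 'steak': 0.5}            # uniform prior over options
for r in ({'salmon': 0.7, 'steak': 0.3},          # sequence of value signals
          {'salmon': 0.6, 'steak': 0.4}):
    belief = update_belief(belief, r)
# after both updates the belief favors salmon (0.7*0.6 vs 0.3*0.4, normalized)
```

Note that nothing in this update depends on which other options were on offer; that independence is exactly the probabilistic sophistication assumption the paper relaxes next.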
In this case value signals are hidden for most options x. However, the set of options c ∈ C ⊆ P(X)¹ observed can now potentially be used as auxiliary information to impute values for options whose value has not been observed. In such a scenario, the agent requires a more sophisticated inference process,\n\np(x|r(1:t)) = (1/p(r(1:t))) ∫_c p(x, c, r(1:t)) = (1/p(r(1:t))) ∫_c [p(r(t)|x, c) p(c|x)] × p(x|{r(1), r(2), ···, r(t−1)}).\n\nImportantly, we concentrate on understanding the meaning of utility in this framework. As in the case of value observability for all options, a probabilistic representation of utility under indirect observability must be equivalent to,\n\np(r|x) = p(r, x)/p(x) = ∫_c p(r, x, c) / ∫_c p(x|c) p(c) = ∫_c p(r|x, c) p(x|c) p(c) / ∫_c p(x|c) p(c).  (3)\n\nThe resulting prediction of value of an option couples value signals received across decision instances with different option sets, or contexts. The intuition behind this approach is contained in the frog legs example - the set of options becomes informative about the hidden state of the world, like whether the restaurant has a good chef.\nNaively, one could assume that altering existing theory to include this additional source of information would be an incremental exercise. However, a formidable epistemological difficulty arises as soon as we attempt to incorporate context into utility-based accounts of decision-making. To see this, let us assume that we have defined a measure of utility u(x, c) that is sensitive to the context c of observing possibility x. Now, for such a utility measure, if it is true that for any two possibilities {xi, xj} and any two contexts {ck, cl},\n\nu(xi, ck) > u(xj, ck) ⇒ u(xi, cl) > u(xj, cl),\n\nthen the choice behavior of an agent maximizing u(x, c) would be equivalent to one maximizing u(x). 
Thus, for the inclusion of context to have any effect, there must exist at least some {xi, xj, ck, cl} for which the propositions u(xi, ck) > u(xj, ck) and u(xi, cl) < u(xj, cl) can hold simultaneously.\nNote however, that the context in this operationalization is simply a collection of other possibilities, i.e. c ⊆ X, which ultimately implies u(x, c) = u(X*) = u(X), X* = {x, c} ⊆ X. Such a measure could assign absolute numbers to each of the possibilities, but any such static assignment would make it impossible for the propositions u(x1, X) > u(x2, X) and u(x1, X) < u(x2, X) to hold simultaneously, as is desired of a context-sensitive utility measure. Thus, we see that it is impossible to design a utility function u such that u : X × C → R. If we wish to incorporate the effects of context variation on the desirability of a particular world possibility, we must abandon a foundational premise of existing statistical decision theory - the representational validity of absolute utility.\n\n¹P(·) references the power set operation throughout this paper.\n\n3 Rational decisions without utilities\n\nIn place of the traditional utility framework, we define an alternative conceptual partitioning of the world X as a discrete choice problem. In this new formulation, at any decision instant t, agents observe the feasibility of a subset o(t) ⊆ X of all the possibilities in the world. In the following exposition, we use yt to denote an indicator function on X encoding the possibilities observed as o(t),\n\nyt(x) = Σ_{i∈o(t)} δ(x − i).\n\nAn intelligent agent will encode its understanding of partial observability as a belief over which possibilities of the world likely co-occur. We call an agent’s belief about the co-occurrence of possibilities in the world its understanding of the context of its observation. 
We instantiate contexts c as subsets of X that the agent believes will co-occur based on its history of partial observations of the world and index them with an indicator function z on X, so that for context c(t),\n\nzt(x) = Σ_{i∈c(t)} δ(x − i).\n\nInstead of computing absolute utilities on all x ∈ X, a context-aware agent evaluates the comparable desirability of only those possibilities considered feasible in a particular context c. Hence, instead of using scalar values to indicate which possibility is more preferable, we introduce preference information into our system via a desirability function d that simply ‘points’ to the best option in a given context, i.e. d(c) = B, where B is a binary relation (c, c, m) and mi = 1 iff ci ≻ ci′ ∀ci′ ∈ c \\ {ci} and zero otherwise. The desirability indicated by d(c) can be remapped onto the larger set of options by defining a relative desirability across all possibilities r(x) = m, x ∈ c and zero otherwise.\nRecall now that we have already defined what we mean by utility in our system in Equation 3. Instantiated in the discrete choice setting, this can be restated as a probabilistic definition of relative desirability at decision instant t as,\n\nR(t)(x) = p(r(t)|x) = Σ_c p(r(t)|x, c) p(x|c) p(c) / Σ_c p(x|c) p(c),  (4)\n\nwhere it is understood that p(c) = p(c|{o1, o2, ···, ot−1}) is a distribution on the set of all possible contexts inferred from the agent’s observation history. From the definition of desirability, we can also obtain a simple definition of p(r|x, c) as p(ri|xi, c) = 1 iff rixi = 1 and zero otherwise.\nTo instantiate Eqn (4) concretely, it is finally necessary to define a specific form for the likelihood term p(x|c). While multiple mathematical forms can be proposed for this expression, depending on quantitative assumptions about the amount of uncertainty intrinsic to the observation, the underlying intuition must remain one that obtains the highest possible value for c = o and penalizes mismatches in set membership. Such definitions can be introduced via the element-wise mismatch probability p(¬yt_i|zt_i). Since p(xi|c(t)) = 1 − p(¬yt_i|zt_i), we can use these element-wise probabilities to compute the likelihood of any particular observation o(t) as,\n\nP(o(t)|c(t)) = 1 − p( ∪_{i}^{|o(t)|} {¬yt_i} | ∪_{i}^{|c(t)|} {zt_i} ) = 1 − β Σ_{i}^{|X|} p(¬yt_i|zt_i),\n\nwhere β is a parameter controlling the magnitude of the penalty imposed for each mismatch observed.\nThis likelihood function can then be used to update the agent’s posterior belief about the contexts it considers viable at decision instance t, given its observation history as,\n\np(c(t)|{o(1), o(2), ···, o(t)}) = p(o(t)|c) p(c|{o(1), o(2), ···, o(t−1)}) / Σ_c p(o(t)|c) p(c|{o(1), o(2), ···, o(t−1)}).  (5)\n\nTo outline a decision theory within this framework, observe that, at decision instant t, a Bayesian agent could represent its prior preference for different world possibilities in the form of a probability distribution over the possible outcomes in X, conditioned on desirability information obtained in earlier decisions, p(x|c(t), {r(1), r(2), ··· r(t−1)}). 
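The context-posterior update of Equation (5), combined with a per-element mismatch penalty of the kind just described, can be sketched as follows. The menus, world size, and uniform prior are hypothetical; the mismatch count implements the symmetric-difference intuition with β = 1:

```python
# A sketch of the context-posterior update in Equation (5). Contexts and
# observations are sets of options; the observation likelihood penalizes
# each element on which o and c disagree in membership (a symmetric-
# difference count over a world of size |X|). Menus and prior are made up.

def observation_likelihood(o, c, world_size, beta=1.0):
    """p(o|c) = 1 - beta * (number of membership mismatches) / |X|."""
    mismatches = len(set(o) ^ set(c))  # elements in exactly one of o, c
    return 1.0 - beta * mismatches / world_size

def update_context_posterior(prior, o, world_size):
    """One step of Equation (5): reweight each context by p(o|c), normalize."""
    unnorm = {c: observation_likelihood(o, c, world_size) * p
              for c, p in prior.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Two candidate contexts over a 4-item world (salmon, steak, frog legs, pasta).
prior = {frozenset({'salmon', 'steak'}): 0.5,
         frozenset({'salmon', 'steak', 'frog legs'}): 0.5}
obs = {'salmon', 'steak', 'frog legs'}
posterior = update_context_posterior(prior, obs, world_size=4)
# the fully matching context gains posterior mass: 4/7 versus 3/7
```

The frozenset keys make the contexts hashable so they can index the belief dictionary directly; any other stable context encoding would do.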
New evidence for the desirability of outcomes observed in context c(t) is incorporated using p(r(t)|x, c(t)), a distribution encoding the relative desirability information obtained from the environment at the current time step, conditioned on the context in which the information is obtained. This formulation immediately yields the belief update,\n\np(x|c(t), r(t)) ∝ p(r(t)|c(t), x) × p(x|c(t), {r(1), r(2), ··· r(t−1)}),  (6)\n\nto obtain a posterior probability encoding the desirability of different possibilities x, while also accounting tractably for the context in which desirability information is obtained at every decision instance. Defining a choice function to select the mode of the posterior belief completes a rational context-sensitive decision theory.\n\n4 Demonstrations\n\nTo demonstrate the value of the relative desirability-based encoding of preferences, in Section 4.1, we describe situations in which the influence of context shifting significantly affects human preference behavior in ways that utility-based decision theories have historically been hard-pressed to explain. Complementarily, in Section 4.2 we characterize conditions under which the relative desirability framework yields predictions of choice behavior equivalent to that predicted by ordinal utility theories, and hence, is an equivalent representation for encoding preferences.\n\n4.1 Where context matters ...\n\nIn this section, we show how our inductive theory of context-sensitive value inference leads, not surprisingly, to a simple explanation for the major varieties of context effects seen in behavioral experiments. These are generally enumerated as attraction, similarity, compromise and reference point effects [2]. 
Interestingly, we find that each of these effects can be described as a special case of the frog legs example, with the specialization arising out of additional assumptions made about the relationship of the new option added to the choice set. Table 1, with some abuse of notation, describes this relationship between the effects in set-theoretic terms. Space constraints necessitate an abbreviated description of our results. Detailed descriptions of these effects, supplemented with an explanation of how they may be elicited in our framework, are provided in the SI. We use the available space to completely describe how the most general version of preference reversal, as seen in the frog legs example, emerges from our framework and provide a brief overview of the other results. \n\nEffect name | Description | Assumptions\nFrog legs | c1 ← {X, Y} ⇒ X ≻ Y, c2 ← {X, Y, Z} ⇒ Y ≻ X | -\nSimilarity | c1 ← {X, Y} ⇒ X ≻ Y, c2 ← {X, Y, Z} ⇒ Y ≻ X | Z ≈ X\nAttraction | c1 ← {X, Y} ⇒ X ∼ Y, c2 ← {X, Y, Z} ⇒ X ≻ Y | X ≻ Z\nCompromise | c1 ← {X, Y} ⇒ X ≻ Y, c2 ← {X, Y, Z} ⇒ Y ≻ X | Z ≻ X\nReference point | c1 ← {X, Y} ⇒ X ∼ Y, c2 ← {X, Y, Z} ⇒ X ≻(−) Y | Y ≻(c) X, Z\n\nTable 1: A unified description of context effects. ≻ indicates stochastic preference for one item over another. ≻(c) indicates that the preference in question holds only in some observation contexts. ≻(−) indicates that the preference in question is stochastically weaker than before.\n\n
To instantiate our likelihood definition in (5), we define a specific mismatch probability,\n\np(¬yt_i|zt_i) = (1/|X|) ((1 − zt_i) yt_i + (1 − yt_i) zt_i),  (7)\n\nwith β = 1 for all our demonstrations.\nIn the frog legs example, the reversal in preferences is anecdotally explained by the diner originally forming a low opinion of the restaurant’s chef, given the paucity of choices on the menu, deciding to pick the safe salmon over a possibly burnt steak. However, the waiter’s presenting frog legs as the daily special suddenly raises the diner’s opinion of the chef’s abilities, causing him to favor steak. This intuition maps very easily into our framework of choice selection, wherein the diner’s partial menu observations o1 = {steak, salmon} and o2 = {steak, salmon, frog legs} are associated with two separate contexts c1 and c2 of observing the menu X. Bad experiences related to ordering steak in menus typically observed under context c1 (interpretable as ‘cheap restaurants’) may be encoded by defining the vector m = {1, 0, 0, 0} for c1 and good experiences ordering steak off menus observed in context c2 (interpretable as ‘upscale restaurants’) as m = {0, 1, 0, 0} for c2. Then, by definition, p(r|salmon, c1) > p(r|steak, c1), while p(r|salmon, c2) < p(r|steak, c2). For the purposes of this demonstration, let us assume these probability pairs, obtained through the diner’s past experiences in restaurants, to be {0.7, 0.3} and {0.3, 0.7} respectively. Now, when the waiter first offers the diner a choice between steak or salmon, the diner computes relative desirabilities using (4), where the only context for the observation is {salmon, steak}. Hence, the relative desirabilities of steak and salmon are computed over a single context, and are simply R(salmon) = 0.7, R(steak) = 0.3. 
When the diner is next presented with the possibility of ordering frog legs, he now has two possible contexts to evaluate the desirability of his menu options: {salmon, steak} and {salmon, steak, frog legs}. Based on the sequence of his history of experience with both contexts, the diner will have some posterior belief p(c) = {p, 1 − p} on the two contexts. Then, the relative desirability of salmon, after having observed frog legs on the menu, can be calculated using (4) as,\n\nR(salmon) = [p(r|salmon, c1) p(salmon|c1) p(c1) + p(r|salmon, c2) p(salmon|c2) p(c2)] / [p(salmon|c1) p(c1) + p(salmon|c2) p(c2)] = (0.7 × 1 × p + 0.3 × 1 × (1 − p)) / (1 × p + 1 × (1 − p)) = 0.7p + 0.3(1 − p).\n\nSimilarly, we obtain R(steak) = 0.3p + 0.7(1 − p). Clearly, for 1 − p > p, R(steak) > R(salmon), and the diner would be rational in switching his preference. Thus, through our inferential machinery, we retrieve the anecdotal explanation for the diner’s behavior: if he believes that he is more likely to be in a good restaurant (with probability (1 − p)) than not, he will prefer steak.\nAlong identical lines, making reasonable assumptions about the contexts of past observations, our decision framework accommodates parsimonious explanations for each of the other effects detailed in Table 1. Attraction effects are traditionally studied in market research settings where a consumer is unsure about which of two items to prefer. The introduction of a third item that is clearly inferior to one of the two earlier options leads the consumer towards preferring that particular earlier option. Our framework elicits this behavior through the introduction of additional evidence of the desirability of one of the options from a new context, causing the relative desirability of this particular option to rise. 
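The frog-legs calculation above is small enough to reproduce numerically. A minimal sketch using the illustrative value probabilities {0.7, 0.3} from the text, with the context belief p(c1) = p = 0.3 chosen arbitrarily so that the upscale context dominates:

```python
# Numerical reproduction of the frog-legs calculation, using the
# illustrative value probabilities {0.7, 0.3} from the text. The context
# belief p = 0.3 is an arbitrary choice for this sketch.

def relative_desirability(x, value_prob, membership, p_c):
    """Equation (4): R(x) = sum_c p(r|x,c) p(x|c) p(c) / sum_c p(x|c) p(c)."""
    num = sum(value_prob[c][x] * membership[c][x] * p for c, p in p_c.items())
    den = sum(membership[c][x] * p for c, p in p_c.items())
    return num / den

p_c = {'c1': 0.3, 'c2': 0.7}   # diner thinks the upscale context more likely
membership = {'c1': {'salmon': 1, 'steak': 1},    # p(x|c) = 1 for both options
              'c2': {'salmon': 1, 'steak': 1}}
value_prob = {'c1': {'salmon': 0.7, 'steak': 0.3},   # cheap-restaurant context
              'c2': {'salmon': 0.3, 'steak': 0.7}}   # upscale-restaurant context

R_salmon = relative_desirability('salmon', value_prob, membership, p_c)
R_steak = relative_desirability('steak', value_prob, membership, p_c)
# R_salmon = 0.7p + 0.3(1-p) = 0.42 and R_steak = 0.3p + 0.7(1-p) = 0.58:
# with p < 0.5 the ranking reverses, exactly as derived in the text.
```

Varying p across 0.5 flips the sign of R_steak − R_salmon, which is the preference reversal the frog legs example is built to exhibit.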
Similarity effects arise when, given that a consumer prefers one item to another, giving him further options that resemble his preferred item causes him to subsequently prefer the item he earlier considered inferior. This effect is elicited simply as a property of division of probability among multiple similar options, resulting in reduced desirability of the previously superior option. Compromise effects arise when the introduction of a third option to a choice set where the consumer already prefers one item to another causes the consumer to consider the previously inferior option as a compromise between the formerly superior option and the new option, and hence prefer it. We find that the compromise effect arises through a combination of reduction in the desirability of the superior option through negative comparisons with the new item and increase in the desirability of the formerly inferior item through positive comparisons with the new item, and that this inference occurs automatically in our framework assuming equal history of comparisons between the existing choice set items and the new item. Reference point effects have typically not been associated with explicit studies of context variation, and may in fact be used to reference a number of behavior patterns that do not satisfy the definition we provide in Table 1. Our definition of the reference point effect is particularized to explain data on pain perception collected by [23], demonstrating relativity in evaluation of objectively identical pain conditions depending on the magnitude of alternatively experienced pain conditions. 
In concord with empirical observation, we show that the relative (un)desirability of an intermediate pain option reduces upon the experience of greater pain, a simple demonstration of prospect relativity that utility-based accounts of value cannot match.\nCompeting hypotheses that seek to explain these behaviors are either normative and static (e.g. extended discrete choice models ([13] provides a recent review), componential context theory [21], quantum cognition [8]) or descriptive and dynamic (specifically, decision field theory [3]). In contrast, our approach not only takes a dynamic inductive view of value elicitation, it retains a normativity criterion (Bayes rationality) for falsifying observed predictions, a standard that is expected of any rational model of decision-making [6].\n\n4.2 ... and where it doesn’t\n\nIt could be conjectured that the relative desirability indicator d will be an inadequate representation of preference information compared with scalar utility signals assigned to each world possibility, which would leave open the possibility that we may have retrieved a context-sensitive decision theory at the expense of theoretical assurance of rational choice selection, as has been the case in many previous attempts cited above. Were this conjecture to be true, it would severely limit the scope and applicability of our proposal. To anticipate this objection, we theoretically prove that our framework reduces to the standard utility-based representation of preferences under equivalent epistemic conditions, showing that our theory retains equivalent rational representational ability as utility theory in simple settings, and simply extends this representational ability to explain preference behaviors that utility theory cannot.\nWhat does it mean for a measure to represent preference information? 
To show that a utility function u completely represents a preference relation on X it is sufficient [12] to show that, ∀x1, x2 ∈ X, x1 ≻ x2 ⇔ u(x1) > u(x2). Hence, equivalently, to show that our measure of relative desirability R also completely represents preference information, it should be sufficient to show that, for any two possibilities xi, xj ∈ X, and for any observation context c,\n\nxi ≻ xj ⇔ R(xi) > R(xj).  (8)\n\nIn the SI, we prove that (8) holds at decision instant t under three conditions:\n\n(I) Context consistency: ∃c ∈ C, s.t. xi ≻ xj ⇒ xi ≻ xj ∀c ∈ Cij, {xi, xj} ∈ Cij ⊆ C.\n(II) Transitivity between contexts: if xi ≻ xj in c1 and xj ≻ xk in c2, then ∀c ∈ C, xi ≻ xk.\n(III) Symmetry in context observability: ∀xi, xj ∈ X, lim_{t→∞} |C(t)_{i\\j}| = |C(t)_{j\\i}|.²\n\nOf the three assumptions we need to prove this equivalence result, (I) and (II) simply define a stable preference relation across observation contexts and find exact counterparts in the completeness and transitivity assumptions necessary for representing preferences using ordinal utility functions. (III), the only additional assumption we require, ensures that the agent’s history of partial observations of the environment does not contain any useful information. The restriction of infinite data observability, while stringent and putatively implausible, actually uncovers an underlying epistemological assumption of utility theory, viz. that utility/desirability values can somehow be obtained directly from the environment. Any inference-based preference elicitation procedure will therefore necessarily need infinite data to attain formal equivalence with the utility representation. 
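The equivalence claim in (8) can be probed numerically in a toy setting. The sketch below assumes hypothetical value probabilities that satisfy context consistency (x1 preferred in every context containing both options) and symmetric context observability, and checks that the relative desirability of Equation (4) then reproduces the ordinal ranking under every context belief:

```python
# Toy check of the equivalence result: if x1 is preferred to x2 in every
# context containing both (conditions I and II), and both options appear
# in the same contexts (condition III), the relative desirability of
# Equation (4) recovers the ordinal ranking x1 > x2 under any context
# belief. All numbers are hypothetical.

def relative_desirability(x, value_prob, contexts, p_c):
    """Equation (4), with p(x|c) = 1 when x is a member of context c."""
    num = sum(value_prob[c][x] * p for c, p in p_c.items() if x in contexts[c])
    den = sum(p for c, p in p_c.items() if x in contexts[c])
    return num / den

contexts = {'c1': {'x1', 'x2'}, 'c2': {'x1', 'x2'}}
value_prob = {'c1': {'x1': 0.6, 'x2': 0.4},   # x1 preferred in both contexts
              'c2': {'x1': 0.8, 'x2': 0.2}}

# The ranking is stable for every possible belief over the two contexts.
stable = all(
    relative_desirability('x1', value_prob, contexts, {'c1': p, 'c2': 1 - p})
    > relative_desirability('x2', value_prob, contexts, {'c1': p, 'c2': 1 - p})
    for p in [i / 10 for i in range(11)]
)
```

By contrast, the frog legs configuration in Section 4.1 violates context consistency, which is exactly why its ranking flips with the context belief.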
Finally, we point out that our equivalence result does not require us to assume continuity or the equivalent Archimedean property to encode preferences, as required in ordinal utility definitions. This is because the continuity assumption is required as a technical condition in mapping a discrete mathematical object (a preference relation) to a continuous utility function. Since relative desirability is defined constructively on Q ⊆ Q, |Q| < ∞, a continuity assumption is not needed.\n\n5 Discussion\n\nThroughout this exegesis, we have encountered three different representations of choice preferences: relative (ordinal) utilities, absolute (cardinal) utilities and our own proposal, viz. relative desirability. Each representation leads to a slightly different definition of rationality, so that, assuming a rational set selection function σ in each case we have,\n\n• Economic rationality: x ∈ σ(X) ⇒ ∄y ∈ X, s.t. y ≻ x, predominantly used in human preference modeling in neoclassical economics [12], e.g. discrete choice modeling [9].\n• VNM-rationality: x ∈ σ(X) ⇒ ∄y ∈ X, s.t. u(y) > u(x), predominantly used in studying decision-making under risk [19], e.g. reinforcement learning [1].\n• Bayes rationality: x ∈ σ(X) ⇒ ∄y ∈ X, s.t. R(y, {H}) > R(x, {H}), which we have proposed. The term {H} here is shorthand for {o1, o2, ···, ot−1}, {r1, r2, ··· rt−1}, the entire history of choice set and relative desirability observations made by an agent leading up to the current decision instance.\n\n²The notation Ci\\j references the subset of all observed contexts that contain xi but not xj.\n\nBayes rationality simply claims that value inference with the same history of partial observations will lead to a consistent preference for a particular option in discrete choice settings. 
In Section 4.2, we have shown conditions on choice set observations under which Bayes rationality will be equivalent to economic rationality. VNM-rationality is a further specialization of economic rationality, valid for preference relations that, in addition to being complete, transitive and continuous (as required for economic preferences representable via ordinal utilities), also satisfy an independence of irrelevant attributes (IIA) assumption [16]. Bayes rationality specializes to economic rationality once we instantiate the underlying intuitions behind the completeness and transitivity assumptions in a context-sensitive preference inference theory. Therefore, rational value inference in the form we propose can formally replace static assumptions about preference orderings in microeconomic models that currently exclusively use ordinal utilities [12]. As such, context-sensitive preference elicitation is immediately useful for the nascent agent-based economic modeling paradigm as well as in dynamic stochastic general equilibrium models of economic behavior. Further work is necessary to develop a context-sensitive equivalent of the IIA assumption, which is necessary for our system to be directly useful in modeling decision-making behaviors under uncertainty. However, even in its current form, our inference model can be used in conjunction with existing ‘inverse planning’ models of utility elicitation from choice data [17] that infer absolute utilities from choice data using extraneous constraints on the form of the utility function from the environment. 
In such a synthesis, our model could generate a preference relation sensitive to action set observability, which inverse planning models could use, along with additional information from the environment, to generate absolute utilities that account for observational biases in the agent's history.

A philosophically astute reader will point out a subtle flaw in our inferential definition of rationality. Namely, while we assume an intuitive notion of partial observability of the world, in practice our agents compile desirability statistics on the set of all possibilities, irrespective of whether they have ever been observed, a problem rooted in an inherent limitation of Bayesian epistemology: it is restricted to computing probabilities over a fixed set of hypotheses. How can a desirability representation that assumes that observers maintain probabilistic preferences over all possible states of the world be more epistemologically realistic than one that assumes that observers maintain scalar utility values over the same state space³? As a partial response to this criticism, we point out that we do not require an ontic commitment to the computation of joint probability distributions on all x ∈ X. In practice, it is likely that Bayesian computations are implemented in the brain via sampling schemes that, in hierarchical formulations, allow approximating the joint distribution as a set of the most likely marginals (in our case, relative desirability in typical observation contexts). Neural implementations of such sampling schemes have been proposed in the recent cognitive science literature [20].
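The approximation idea can be illustrated with a toy sketch: rather than storing a full joint over (context, desirability), an agent retrieves contexts from memory in proportion to their frequency and averages desirability over the sample. All numbers and names below are assumed for illustration and do not come from the paper:

```python
# Toy sketch: Monte Carlo marginalization of desirability over sampled
# contexts (illustrative assumption, not the paper's sampling scheme).
import random

random.seed(0)

# Assumed joint model: context frequencies, and how desirable each
# option is within each context.
p_context = {"work": 0.6, "leisure": 0.4}
p_best = {"work":    {"coffee": 0.8, "tea": 0.2},
          "leisure": {"coffee": 0.3, "tea": 0.7}}

def marginal_by_sampling(option, n=10_000):
    # Retrieve contexts "from memory" in proportion to their frequency,
    # then average the option's desirability across the sample.
    contexts = random.choices(list(p_context),
                              weights=list(p_context.values()), k=n)
    return sum(p_best[c][option] for c in contexts) / n

# Exact marginal for comparison: 0.6 * 0.8 + 0.4 * 0.3 = 0.60
exact = sum(p_context[c] * p_best[c]["coffee"] for c in p_context)
print(exact, marginal_by_sampling("coffee"))  # sample mean is close to 0.60
```

With enough retrieved contexts the sampled estimate concentrates on the exact marginal, so the agent never needs to represent the joint distribution explicitly.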
Devising a sampling scheme that matches the intuition of context retrieval from memory to supplement our value-inference scheme presents a promising direction for future research.

Another straightforward extension of our framework would imbue observable world possibilities with attributes, resulting in a more general definition of contexts as clusters in the space of attributes. Such an extension would make it possible to transfer preferences to entirely new possibilities, allowing the set X to be modified dynamically, which would further address the epistemological criticism above. Even further, such an extension maps directly to the intuition of value inference resulting from organisms' monitoring of internal need states, here modeled as attributes. Canini's recent modeling of transfer learning using hierarchical Dirichlet processes [4] provides most of the mathematical apparatus required to perform such an extension, making this a promising direction for future work in our project.

In conclusion, it has long been recognized that state-specific utility representations of the desirability of options are insufficient to capture the rich variety of systematic behavior patterns that humans exhibit. In this paper, we show that reformulating the atomic unit of desirability as a context-sensitive 'pointer' to the best option in the observed set recovers a rational way of representing desirability in a manner sufficiently powerful to describe a broad range of context effects in decisions.
Since it is likely that preferences for options do not exist a priori and are induced via experience, our present proposal is expected to approximate the true mechanisms for the emergence of context-sensitive preference variation better than alternative static theories, while retaining normativity criteria missing in alternative dynamic accounts.

³One could argue that we are essentially observing the state space (to be able to index using its membership), but pretending to not observe it.

References

[1] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[2] J.R. Busemeyer, R. Barkan, S. Mehta, and A. Chaturvedi. Context effects and models of preferential choice: implications for consumer behavior. Marketing Theory, 7(1):39–58, 2007.

[3] J.R. Busemeyer and J.T. Townsend. Decision field theory: A dynamic cognition approach to decision making. Psychological Review, 100:432–459, 1993.

[4] K. Canini, M. Shashkov, and T. Griffiths. Modeling transfer learning in human categorization with the hierarchical Dirichlet process. In ICML, pages 151–158, 2010.

[5] U. Chajewska, D. Koller, and D. Ormoneit. Learning an agent's utility function by observing behavior. In ICML, pages 35–42, 2001.

[6] N. Chater. Rational and mechanistic perspectives on reinforcement learning. Cognition, 113(3):350–364, 2009. Special issue on reinforcement learning and higher cognition.

[7] N. Daw and M. Frank. Reinforcement learning and higher level cognition: Introduction to special issue. Cognition, 113(3):259–261, 2009. Special issue on reinforcement learning and higher cognition.

[8] L. Gabora and D. Aerts. Contextualizing concepts using a mathematical generalization of the quantum formalism. Journal of Experimental and Theoretical Artificial Intelligence, 14(4):327–358, 2002.

[9] D. Hensher, J. Rose, and W. Greene. Applied Choice Analysis: A Primer.
Cambridge University Press, 2005.

[10] A. Jern, C. Lucas, and C. Kemp. Evaluating the inverse decision-making approach to preference learning. In NIPS, pages 2276–2284, 2011.

[11] D. Kahneman. Perception, action and utility: the tangled skein. In M. Rabinovich, K. Friston, and P. Varona, editors, Principles of Brain Dynamics: Global State Interactions. MIT Press, 2012.

[12] D. Kreps. A Course in Microeconomic Theory, pages 17–69. Princeton University Press, 1990.

[13] W. Leong and D. Hensher. Embedding decision heuristics in discrete choice models: A review. Transport Reviews, 32(3):313–331, 2012.

[14] C.G. Lucas, T. Griffiths, F. Xu, and C. Fawcett. A rational model of preference learning and choice prediction by children. In NIPS, pages 985–992, 2008.

[15] R.D. Luce and H. Raiffa. Games and Decisions: Introduction and Critical Survey. Wiley, New York, 1957.

[16] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1953.

[17] A.Y. Ng and S.J. Russell. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pages 663–670, 2000.

[18] M. Rabin. Psychology and economics. Journal of Economic Literature, 36(1):11–46, 1998.

[19] S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 1995.

[20] L. Shi and T. Griffiths. Neural implementation of hierarchical Bayesian inference by importance sampling. In Y. Bengio, D. Schuurmans, J. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1669–1677, 2009.

[21] A. Tversky and I. Simonson. Context-dependent preferences. Management Science, 39(10):1179–1189, 1993.

[22] I. Vlaev, N. Chater, N. Stewart, and G. Brown. Does the brain calculate value? Trends in
Trends in\n\nCognitive Sciences, 15(11):546 \u2013 554, 2011.\n\n[23] I. Vlaev, B. Seymour, R.J. Dolan, and N. Chater. The price of pain and the value of suffering.\n\nPsychological Science, 20(3):309\u2013317, 2009.\n\n9\n\n\f", "award": [], "sourceid": 1123, "authors": [{"given_name": "Nisheeth", "family_name": "Srivastava", "institution": null}, {"given_name": "Paul", "family_name": "Schrater", "institution": null}]}