{"title": "Magnitude-sensitive preference formation", "book": "Advances in Neural Information Processing Systems", "page_first": 1080, "page_last": 1088, "abstract": "Our understanding of the neural computations that underlie the ability of animals to choose among options has advanced through a synthesis of computational modeling, brain imaging and behavioral choice experiments. Yet, there remains a gulf between theories of preference learning and accounts of the real, economic choices that humans face in daily life, choices that are usually between some amount of money and an item. In this paper, we develop a theory of magnitude-sensitive preference learning that permits an agent to rationally infer its preferences for items compared with money options of different magnitudes. We show how this theory yields classical and anomalous supply-demand curves and predicts choices for a large panel of risky lotteries. Accurate replications of such phenomena without recourse to utility functions suggest that the theory proposed is both psychologically realistic and econometrically viable.", "full_text": "Magnitude-sensitive preference formation\n\nNisheeth Srivastava\u2217\nDepartment of Psychology\nUniversity of San Diego\nLa Jolla, CA 92093\nnisheeths@gmail.com\n\nEdward Vul\nDepartment of Psychology\nUniversity of San Diego\nLa Jolla, CA 92093\nedwardvul@gmail.com\n\nPaul R Schrater\nDept of Psychology\nUniversity of Minnesota\nMinneapolis, MN, 55455\nschrater@umn.edu\n\nAbstract\n\nOur understanding of the neural computations that underlie the ability of animals\nto choose among options has advanced through a synthesis of computational modeling,\nbrain imaging and behavioral choice experiments. Yet, there remains a\ngulf between theories of preference learning and accounts of the real, economic\nchoices that humans face in daily life, choices that are usually between some\namount of money and an item. 
In this paper, we develop a theory of magnitude-sensitive\npreference learning that permits an agent to rationally infer its preferences\nfor items compared with money options of different magnitudes. We show\nhow this theory yields classical and anomalous supply-demand curves and predicts\nchoices for a large panel of risky lotteries. Accurate replications of such\nphenomena without recourse to utility functions suggest that the theory proposed\nis both psychologically realistic and econometrically viable.\n\n1 Introduction\n\nWhile value/utility is a useful abstraction for macroeconomic applications, it has little psychological\nvalidity [1]. Valuations elicited in laboratory conditions are known to be extremely variable under\ndifferent elicitation conditions, liable to anchor on arbitrary observations, and extremely sensitive\nto the set of options presented [2]. This last property constitutes the most straightforward refutation\nof the existence of object-specific utilities. Consider, for example, an experiment conducted by [3],\nwhere subjects were endowed with a fixed amount of money, which they could use across multiple\ntrials to buy out of receiving an electric shock of one of three different magnitudes (see left panel\nin Figure 1). The large systematic differences found in the prices for different shock magnitudes\nthat subjects in this study were willing to pay demonstrate the absence of any fixed psychophysical\nmeasurements of value. Thus, while utility maximization is a mathematically useful heuristic in\neconomic applications, it is unlikely that utility functions can represent value in any significant\npsychological sense.\nNeurological studies also demonstrate the existence of neuron populations sensitive not to absolute\nreward values, but to one of the presented options being better relative to the others, a phenomenon\ncalled comparative coding. 
Comparative coding was first reported in [4], who observed activity in\nthe orbitofrontal neurons of monkeys when offered varying juice rewards presented in pairs within\nseparate trial blocks in patterns that depended only on whether a particular juice is preferred within\nits trial. Elliott et al. [5] found similar results using fMRI in the medial orbitofrontal cortex of human\nsubjects, a brain region known to be involved in value coding. Even more strikingly, Plassmann et al.\n[6] found that falsely assigning a high price to a particular item (wine) caused both greater self-reported\nexperienced pleasantness (EP) (see right panel of Figure 1) and greater mOFC activity\nindicative of pleasure. What is causing this pleasure? Where is the \u2018value\u2019 assigned to the pricier\nwine sample coming from?\n\n\u2217Corresponding author: nisheeths@gmail.com\n\nFigure 1: Valuations of options elicited in the lab can be notoriously labile. Left: An experiment\nwhere subjects had to pay to buy out of receiving electric shock saw subjects losing or gaining\nvalue for the price of pain of particular magnitudes both as a function of the amount of money the\nexperimenters initially gave them and the relative magnitude of the pair of shock options they were\ngiven experience with. Right: Subjects asked to rate five (actually three) wines rated artificially\nhighly-priced samples of wine as more preferable. Not only this, imaging data from orbitofrontal\ncortex showed that they actually experienced these samples as more pleasurable.\n\nViewed in light of these various difficulties, making choices for options that involve magnitudes\nappears to be a formidable challenge. However, humans, and even animals [7], are well-known to\nperform such operations easily. 
Therefore, one of two possibilities holds: one, that it is possible,\nnotwithstanding the evidence laid out above, for humans to directly assess value magnitudes (except\nin corner cases like the ones we describe); two, that some alternative set of computations permits\nthem to behave as if they can estimate value magnitudes. This paper formalizes the set of computations\nthat operationalizes this second view.\nWe build upon a framework of preference learning proposed in [8] that avoids the necessity for\nassuming psychophysical access to value and develop a model that can form preferences for quantities\nof objects directly from the history of past choices. Since the most common modality of choices\ninvolving quantities in the modern world is determining the prices of objects, pricing forms the primary\nfocus of our experiments. Specifically, we derive from our theory (i) classical and anomalous\nsupply-demand curves, and (ii) choice predictions for a large panel of risky lotteries. Hence, in this\npaper we present a theory of magnitude-sensitive preference formation that, as an important special\ncase, provides an account of how humans learn to value money.\n\n2 Learning to value magnitudes\n\n2.1 Rational preference formation\n\nTraditional treatments of preference learning (e.g. [9]) assume that there is some hidden state function\nU : X \u2192 R+ such that x \u227b x\u2032 iff U(x) > U(x\u2032) \u2200x\u2032 \u2208 X, where X is the set of all possible options.\nPreference learning, in such settings, is reduced to a task of statistically estimating a monotone\ndistortion of U, thereby making two implicit assumptions: (i) that there exists some psychophysical\napparatus that can compute hedonic utilities and (ii) that there exists some psychophysical apparatus\ncapable of representing absolute magnitudes capable of comparison in the mind. The data we describe\nabove argue against either possibility being true. 
In order to develop a theory of preference\nformation that avoids commitments to psychophysical value estimation, a novel approach is needed.\n\n[Figure 1 panels: left reconstructed from Figure 1(a) in (Vlaev, 2011); right reconstructed from Figure 1, panels B and D in (Plassmann, 2008).]\n\nSrivastava & Schrater [8] provide us with the building blocks for such an approach. They propose\nthat the process of learning preferences can be modeled as an ideal Bayesian observer directly\nlearning \u2018which option among the ones offered is best\u2019, retaining memory of which options were\npresented to it at every choice instance. However, instead of directly remembering option sets, their\nmodel allows for the possibility that option set observations map to latent contexts in memory. In\npractice, this mapping is assumed to be identified in all their demonstrations. Formally, the computation\ncorresponding to utility in this framework is p(r|x, o), which is obtained by marginalizing\nover the set of latent contexts C,\n\n$$D(x) = p(r|x, o) = \\frac{\\sum_{c}^{C} p(r|x, c)\\, p(x|c)\\, p(c|o)}{\\sum_{c}^{C} p(x|c)\\, p(c|o)}, \\tag{1}$$\n\nwhere it is understood that the context probability p(c|o) = p(c|{o_1, o_2, \u00b7\u00b7\u00b7 , o_{t\u22121}}) is a distribution\non the set of all possible contexts incrementally inferred from the agent\u2019s observation history. 
Here,\np(r|x, c) encodes the probability that the item x was preferred to all other items present in choice instances\nlinked with the context c, p(x|c) encodes the probability that the item x was present in choice\nsets indexed by the context c and p(c) encodes the frequency with which the observer encounters\nthese contexts.\nThe observer also continually updates p(c|o) via recursive Bayesian estimation,\n\n$$p(c^{(t)}|o^{(1:t)}) = \\frac{p(o^{(t)}|c)\\, p(c|o^{(1:t-1)})}{\\sum_{c}^{C} p(o^{(t)}|c)\\, p(c|o^{(1:t-1)})}, \\tag{2}$$\n\nwhich, in conjunction with the desirability-based state preference update and a simple decision rule\n(e.g. MAP, softmax), yields a complete decision theory.\nWhile this theory is complete in the formal sense that it can make testable predictions of options\nchosen in the future given options chosen in the past, it is incomplete in its ability to represent\noptions: it will treat a gamble that pays $20 with probability 0.1 against safely receiving $1 and\none that pays $20000 with probability 0.1 against safely receiving $1 as equivalent, which is clearly\nunsatisfactory. This is because it considers only simple cases where options have nominal labels.\nWe now augment it to take the information that magnitude labels\u00b9 provide into account.\n\n2.2 Magnitude-sensitive preference formation\n\nTypically, people will encounter monetary labels m \u2208 M in a large number of contexts, often entirely\noutside the purview of the immediate choice to be made. In the theory of [8], incorporating\ndesirability information related to m will involve marginalizing across all these contexts. Since\nthe set of such contexts across a person\u2019s entire observation history is large, explicit marginalization\nacross all contexts would imply explicit marginalization across every observation involving the\nmonetary label m, which is unrealistic. 
Thus information about contexts must be compressed or\nsummarized\u00b2.\nWe can resolve this by assuming instead that animals generate contexts as clusters of observations,\nthereby creating the possibility of learning higher-order abstract relationships between them.\nSuch models of categorization via clustering are widely accepted in cognitive psychology [10].\nNow, instead of recalling all possible observations containing m, an animal with a set of observation\nclusters (contexts) would simply sample a subset of these that would be representative of all contexts\nwherein observations containing m are statistically typical. In such a setting, p(m|c) would\ncorrespond to the observation likelihood of the label m being seen in the cluster c, p(c) would correspond\nto the relative frequency of context occurrences, and p(r|x, m, c) would correspond to the\ninferred value for item x when compared against monetary label m in the active context c. The\nremaining probability term p(x|m) encodes the probability of seeing transactions involving item x\nand the particular monetary label m. We define r to take the value 1 when x \u227b x\u2032 \u2200x\u2032 \u2208 X \u2212 {x}.\nFollowing a similar probabilistic calculus as in Equation 1, the inferred value of x becomes p(r|x)\nand can be calculated as,\n\n$$p(r|x) = \\frac{\\sum_{m}^{M} \\sum_{c}^{C} p(r|x, m, c)\\, p(x|m)\\, p(m|c)\\, p(c)}{\\sum_{m}^{M} \\sum_{c}^{C} p(x|m)\\, p(m|c)\\, p(c)}, \\tag{3}$$\n\n\u00b9 Note that taking monetary labels into account is not the same as committing to a direct psychophysical\nevaluation of money. In our account, value judgments are linked not with magnitudes, but with labels that just\nhappen to correspond to numbers in common practice.\n\u00b2 Mechanistic considerations of neurobiology also suggest sparse sampling of prior contexts. 
The memory\nand computational burden of recalculating preferences for an ever-increasing C would quickly prove insuperable.\n\nFigure 2: Illustrating a choice problem an animal might face in the wild (left) and how the intermediate\nprobability terms in our proposed model would operationalize different forms of information\nneeded to solve such a problem (right). Marginalizing across situation contexts and magnitude labels\ntells us what the animal will do.\n\nEquation 3 differs from the earlier expression by an additional summation over the set M\nof monetary labels that the agent has experience with.\nFigure 2 illustrates how these computations could be practically instantiated in a general situation\ninvolving magnitude-sensitive value inference that animals could face. Our hunter-gatherer ancestor\nhas to choose which berry bush to forage in, and we must infer the choice he will make based on\nrecorded history of his past behavior. The right panel in this figure illustrates natural interpretations\nfor the intermediate conditional probabilities in Equation 3. The term p(m|c) encodes prior understanding\nof the fertility differential in the soils that characterize each of the three active contexts.\nThe p(r|x, m, c) term records the history of the forager\u2019s choices within each context via empirically\nobserved relative frequencies. What drives the forager to prefer a sparsely laden bush on the\nhill instead of the densely laden bush in the forest in our example, though, is his calculation of the\nunderlying context probability p(c). In our story, because he lives near the hill, he encounters the\nbushes on the hill more frequently, and so they dominate his preference judgment. A wide palette\nof possible behaviors can be similarly interpreted and rationalized within the framework we have\noutlined.\nWhat exactly is this model telling us, though, that we aren\u2019t putting into it ourselves? 
The only strong\nconstraint it imposes on the form of preferences currently is that they will exhibit context-specific\nconsistency, viz. an animal that prefers one option over another in a particular context will continue\nto do so in future trials. While this constraint itself is only valid if we have some way of pinning\ndown particular contexts, it is congruent with results from marketing research that describe the\ngeneral form of human preferences as being \u2018arbitrarily coherent\u2019: consumer preferences are labile\nand sensitive to changes in option sets, framing effects, loss aversion and a host of other treatments\nbut are longitudinally reliable within these treatments [2]. For our model to make more interesting\neconomic predictions, we must further constrain the form of the preferences it can emit to match\nthose seen in typical monetary transactions; we do this by making further assumptions about the\nintermediate terms in Equation 3 in the next three sections that describe economic applications.\n\n3 Living in a world of money\n\nEquation 3 gives us predictions about how people will form preferences for various options that\nco-occur with money labels. Here we specialize this model to make predictions about the value of\noptions that are money labels, viz. fiat currency. The institutional imperatives of legal tender impose\na natural ordering on preferences involving monetary quantities. Ceteris paribus, subjects will\nprefer a larger quantity of money to a smaller quantity of money. 
Thus, while the psychological desirability\npointer could assign preferences to monetary labels capriciously (as an infant who prefers\nthe drawings on a $1 bill to those on a $100 bill might), to model desirability behavior corresponding\nto knowledgeable use of currency, we constrain it to follow arithmetic ordering such that,\n\n$$x_{m^*} \\succ x_m \\Leftrightarrow m^* > m \\;\\; \\forall m \\in M, \\tag{4}$$\n\nwhere the notation x_m denotes an item (currency token) x associated with the money label m. Then,\nEquation 3 reduces to,\n\n$$p(r|x_{m^*}) = \\frac{\\sum_{m}^{M'} \\sum_{c}^{C} p(x|m)\\, p(m|c)\\, p(c)}{\\sum_{m}^{M} \\sum_{c}^{C} p(x|m)\\, p(m|c)\\, p(c)}, \\tag{5}$$\n\nwhere max(M\u2032) \u2264 m\u2217, since the contribution to p(r|x, m, c) for all larger m terms is set to zero\nby the arithmetic ordering condition; the p(x|m) term binds x to all the m\u2019s it has been seen with\nbefore.\nAssuming no uncertainty about which currency token goes with which label, p(x|m) becomes a\nsimple delta function pointing to m that the subject has experience with, and Equation 5 can be\nrewritten as,\n\n$$p(r|x) = \\frac{\\int_{0}^{m^*} \\sum_{c}^{C} p(x|m, c)\\, p(m|c)\\, p(c)\\, dm}{\\int_{0}^{\\infty} \\sum_{c}^{C} p(x|m, c)\\, p(m|c)\\, p(c)\\, dm}. \\tag{6}$$\n\nIf we further assume that the model gets to see all possible money labels, i.e. M = R+, this can be\nfurther simplified as,\n\n$$p(r|x) = \\frac{\\int_{0}^{m^*} \\sum_{c}^{C} p(m|c)\\, p(c)\\, dm}{\\int_{0}^{\\infty} \\sum_{c}^{C} p(m|c)\\, p(c)\\, dm}, \\tag{7}$$\n\nreflecting strong dependence on the shape of p(m), the empirical distribution of monetary outcomes\nin the world.\n
What can we say about the shape of the general frequency distribution of numbers in the world?\nNumbers have historically arisen as ways to quantify, which helps plan resource foraging, consumption\nand conservation. Scarcity of essential resources naturally makes being able to differentiate\nsmall magnitudes important for selection fitness. This motivates the development of number systems\nwhere objects counted frequently (essential resources) are counted with small numbers (for\nbetter discriminability). Thus, it is reasonable to assume that, in general, larger numbers will be\nencountered relatively less frequently than smaller ones in natural environments, and hence, that the\nfunctions p(m) and p(c) will be monotone decreasing\u00b3. For analytical tractability, we formalize this\nassumption by setting p(m|c) to be gamma distributed on the domain of monetary labels, and p(c)\nto be an exponential distribution on the domain of the typical \u2018wealth\u2019 rate of individual contexts.\nThe wealth rate is an empirically accessible index for the set of situation contexts, and represents\nthe typical (average) monetary label we expect to see in observations associated with this context.\nThus, for instance, the wealth rate for \u2018steakhouses\u2019 will be higher than that of \u2018fast food\u2019. For\nany particular value of the wealth rate, the \u2018price\u2019 distribution p(m|c) will reflect the relative frequencies\nof seeing various monetary labels in the world in observations typical to context c. 
The\ngamma/log-normal shape of real-world prices in specific contexts is well-attested empirically. The\nwealth rate distribution p(c) can always be made monotone decreasing simply by shuffling the order\nof presentation of contexts in the measure of the distribution.\nWith these distributional assumptions, the marginalized product p(m) is assured to be a Pareto\ndistribution. Data from [12], as well as supporting indirect observations in [13], suggest that we are\non relatively safe ground by making such assumptions for the general distribution of monetary units\nin the world [14]. This set of assumptions further reduces Equation 7 to,\n\n$$p(r|x) = \\psi(x_{m^*}), \\tag{8}$$\n\nwhere \u03c8(\u00b7) is the Pareto c.d.f.\n\n\u00b3 Convergent evidence may also be found in the Zipfian principle of communication efficiency [11]. While it\nmight appear incongruous to speak of differential efficiency in communicating numbers, recall that the historical\norigins of numbers involved tally marks and other explicit token-based representations of numbers which\nimposed increasing resource costs in representing larger numbers.\n\nReduced experience with monetary options will be reflected in a reduced membership of M. Sampling\nat random from M corresponds to approximating \u03c8 with a limited number of samples. So long\nas the sampling procedure is not systematically biased away from particular x values, the resulting\ncurve will not be qualitatively different from the true one. Systematic differences will arise, though,\nif the sampling is biased by, say, the range of values observers are known to encounter. For instance,\nit is reasonable to assume that the wealth of a person is directly correlated with the upper limit of\nmoney values they will see. 
Substituting this upper limit in Equation 7, we obtain a systematic difference\nin the curvature of the utility function that subjects with different wealth endowments will have\nfor the same monetary labels. The trend we obtain from a simulation (see gray inset in Figure 3) with\nthree different wealth levels ($1000, $10000 and $1 million) matches the empirically documented\nincrease in relative risk aversion (curvature of the utility function) with wealth [15]. Observe that\nthe log concavity of the Pareto c.d.f. has the practical effect of essentially converting our inferred\nvalue for money into a classical utility function. Thus, using two assumptions (number ordering and\nscarcity of essential resources), we have situated economic measurements of preference as a special,\nfixed case of a more general dynamic process of desirability evaluation.\n\n4 Modeling willingness-to-pay\n\nFigure 3: Illustrating derivations of pricing theory predictions for goods of various kinds from our\nmodel.\n\nHaving studied how our model works for choices between items that all have money labels, the\nlogical next step is to study choices involving one item with a money label and one without, i.e.,\npricing. Note that asking how much someone values an option, as we did in the section above, is\ndifferent from asking if they would be willing to buy it at a particular price. The former corresponds\nto the term p(r|x), as defined above. The latter will correspond to p(m|r, x), with m being the price\nthe subject is willing to pay to complete the transaction. Since the contribution of all terms where\nr = 0, i.e. 
the transaction is not completed, is identically zero, this term can be computed as,\n\n$$p(m|x) = \\frac{\\sum_{c}^{C} p(x|m)\\, p(m|c)\\, p(c)}{\\sum_{m}^{M} \\sum_{c}^{C} p(x|m)\\, p(m|c)\\, p(c)}, \\tag{9}$$\n\nfurther replacing the summation over M with an integral over the real line as in Equation 7 for analytical\ntractability when necessary.\n\n[Figure 3 panels: Classical demand curve; Veblen demand curve; Giffen substitution; Price anchoring. Inset: Wealth effect on risk aversion.]\n\nWhat aspects of pricing behavior in the real world can our model explain? Interesting variations\nin pricing arise from assumptions about the money distribution p(m|c) and/or the price distribution\np(x|m). Figure 3 illustrates our model\u2019s explanation for three prominent variations of classical\ndemand curves documented in the microeconomics literature. Consumers typically reduce preference\nfor goods when prices rise, and increase it when prices drop. This fact about the structure of\npreferences involved in money transactions is replicated in our model (see first column in Figure\n3) via the reduction/increase of the contribution of the p(m|c) term to the numerator of Equation 9.\nMarketing research reports anomalous pricing curves that violate this behavior in some cases. 
One\nimportant case comprises Veblen goods, wherein the demand for high-priced exclusive goods\ndrops when prices are lowered. Our model explains this behavior (see second column in Figure 3)\nvia unfamiliarity with the price reflected in a lower contribution from the price distribution p(x|m)\nfor such low values. Such non-monotonic preference behavior is difficult for utility-based models,\nbut sits comfortably within ours, where familiarity with options at typical price points drives desirability.\nAnother category of anomalous demand curves comes from Giffen goods, which rise in\ndemand upon price increases because another substitute item becomes too expensive. Our approach\naccounts for this behavior (see third column in Figure 3) under the assumption that price changes\naffect the Giffen good less because its price distribution has a larger variance, which is in line with\nempirical reports showing greater price inelasticity of Giffen goods [16].\nThe last column in Figure 3 addresses an aspect of the temporal dynamics of our model that potentially\nexplains both (i) why behavioral economists can continually find new anchoring results\n(e.g. [6, 2]) and (ii) why classical economists often consider such results to be marginal and uninteresting\n[17]. Behavioral scientists running experiments in labs ask subjects to exhibit preferences\nfor which they may not have well-formed price and label distributions, which causes them to anchor\nand show other forms of preference instability. Economists fail to find similar results in their\nfield studies, because they collect data from subjects operating in contexts for which their price and\nlabel distributions are well-formed. Both conclusions fall out of our model of sequential preference\nlearning, where initial samples can bias the posterior, but the long-run distribution remains\nstable. 
Parenthetically, this demonstration also renders transparent the mechanisms by which consumers\nprocess rapid inflationary episodes, stock price volatility, and transfers between multiple\ncurrency bases. In all these cases, empirical observations suggest inertia followed by adaptation,\nwhich is precisely what our model would predict.\n\n5 Modeling risky monetary choices\n\nFinally, we ask: how well can our model fit the choice behavior of real humans making economic\ndecisions? The simplest economic setup to perform such a test is in predicting choices between\nrisky lotteries, since the human prediction is always treated as a stochastic choice preference that\nmaps directly onto the output of our model. We use a basic expected utility calculation, where the\ndesirability for lottery options is computed as in Equation 8. For a choice between a risky lottery\nx_1 = {m_h, m_l} and a safe choice x_2 = m_s, with a win probability q and where m_h > m_s > m_l,\nthe value calculation for the risky option will take the form,\n\n$$p(r|x) = \\frac{\\int_{m_s}^{m_h} p(m|c)\\, p(c)\\, dm}{\\int_{0}^{\\infty} p(m|c)\\, p(c)\\, dm}, \\quad \\text{in wins} \\tag{10}$$\n\n$$p(r|x) = \\frac{\\int_{m_s}^{m_l} p(m|c)\\, p(c)\\, dm}{\\int_{0}^{\\infty} p(m|c)\\, p(c)\\, dm}, \\quad \\text{in losses} \\tag{11}$$\n\n$$\\Rightarrow EV(\\text{risky}) = q\\, (\\psi_x(m_h) - \\psi_x(m_s)) + (1 - q)\\, (\\psi_x(m_l) - \\psi_x(m_s)). \\tag{12}$$\n\nwhere \u03c8(\u00b7) is the c.d.f. of the Pareto distribution on monetary labels m and p(x) is the given lottery\nprobability.\nUsing Equation 12, where \u03c8 is the c.d.f. of a Pareto distribution (\u03b8 = {2.9, 0.1, 1}, fitted empirically),\nassuming that subjects distort perceived probabilities [18] via an inverse-S shaped weighting\nfunction\u2074, and using an \u03b5-random utility maximization decision rule\u2075, we obtain choice predictions\n\n\u2074 We use Prelec\u2019s version of this function, with the slope parameter \u03b3 distributed N(0.65, 0.2) across our\nagent population. 
The quantitative values for \u03b3 are taken from (Zhang & Maloney, 2012).\n\n\u2075 \u03b5-random decision utility maximization is a simple way of introducing stochasticity into the decision rule,\nand is a common econometric practice when modeling population-level data. It predicts that subjects pick the\noption with higher computed expected utility with a probability 1 \u2212 \u03b5, and pick randomly with a probability\n\nFigure 4: Comparing proportion of subjects selecting risky options predicted by our theory with data\nobtained in a panel of 35 different risky choice experiments. The x-axis plots the probability of the\nrisky gamble; the y-axis plots the expected value of gambles scaled to the smallest EV gamble. Left:\nChoice probabilities for risky option plotted for 7 p values and 5 expected value levels. Each of the\n35 choice experiments was conducted using between 70 and 100 subjects. Right: Choice probabilities\npredicted by relative desirability computing agents in the same 35 choice experiments. Results are\ncompiled by averaging over 1000 artificial agents.\n\nthat match human performance (see Figure 4) on a large and comprehensive panel of risky choice\nexperiments obtained from [19] to within statistical confidence\u2076.\n\n6 Conclusion\n\nThe idea that preferences about options can be directly determined psychophysically is strongly\nembedded in traditional computational treatments of human preferences, e.g. reinforcement learning\n[20]. Considerable evidence, some of which we have discussed, suggests that the brain does\nnot, in fact, compute value [3]. In search of a viable alternative, we have demonstrated a variety of\nbehaviors typical of value-based theories using a stochastic latent variable model that simply tracks\nthe frequency with which options are seen to be preferred in latent contexts and then compiles this\nevidence in a rational Bayesian manner to emit preferences. 
This proposal, and its success in explaining\nfundamental economic concepts, situates the computation of value (as it is generally measured)\nwithin the range of abilities of neural architectures that can only represent relative frequencies, not\nabsolute magnitudes.\nWhile our demonstrations are computationally simple, they are substantially novel. In fact, computational\nmodels explaining any of these effects even in isolation are difficult to find [1]. While\nthe results we demonstrate are preliminary, and while some of the radical implications of our predictions\nabout the effects of choice history on preferences (\u201cyou will hesitate in buying a Macbook\nfor $100 because that is an unfamiliar price for it\u201d\u2077) remain to be verified, the plain ability to describe\nthese economic concepts within an inductively rational framework without having to invoke\na psychophysical value construct by itself constitutes a non-trivial success and forms the essential\ncontribution of this work.\n\nAcknowledgments\n\nNS and PRS acknowledge funding from the Institute for New Economic Thinking. EV acknowledges\nfunding from NSF CPS Grant #1239323.\n\n\u03b5. The value of \u03b5 is fitted to the data; we used \u03b5 = 0.25, the value that maximized our fit to the endpoints of\nthe data. Since we are computing risk attitudes over a population, we should ideally also model stochasticity in\nutility computation.\n\n\u2076 While [19] do not give standard deviations for their data, we assume that binary choice probabilities can be\nmodeled by a binomial distribution, which gives us a theoretical estimate for the standard deviation expected in\nthe data. Our optimal fits lie within 1 SD of the raw data for 34 of 35 payoff-probability combinations, yielding\na fit in probability.\n\n\u2077 You will! 
You\u2019ll think there\u2019s something wrong with it.\n\nReferences\n[1] M. Rabin. Psychology and economics. Journal of Economic Literature, 36(1):11\u201346, 1998.\n[2] Dan Ariely. Predictably irrational: The Hidden Forces That Shape Our Decisions. Harper Collins, 2009.\n[3] I. Vlaev, N. Chater, N. Stewart, and G. Brown. Does the brain calculate value? Trends in Cognitive Sciences, 15(11):546\u2013554, 2011.\n[4] L. Tremblay and W. Schultz. Relative reward preference in primate orbitofrontal cortex. Nature, 398:704\u2013708, 1999.\n[5] R. Elliott, Z. Agnew, and J. F. W. Deakin. Medial orbitofrontal cortex codes relative rather than absolute value of financial rewards in humans. European Journal of Neuroscience, 27(9):2213\u20132218, 2008.\n[6] Hilke Plassmann, John O\u2019Doherty, Baba Shiv, and Antonio Rangel. Marketing actions can modulate neural representations of experienced pleasantness. Proceedings of the National Academy of Sciences, 105(3):1050\u20131054, 2008.\n[7] M Keith Chen, Venkat Lakshminarayanan, and Laurie R Santos. How basic are behavioral biases? Evidence from capuchin monkey trading behavior. Journal of Political Economy, 114(3):517\u2013537, 2006.\n[8] N Srivastava and PR Schrater. Rational inference of relative preferences. In Proceedings of Advances in Neural Information Processing Systems 25, 2012.\n[9] A. Jern, C. Lucas, and C. Kemp. Evaluating the inverse decision-making approach to preference learning. In NIPS, pages 2276\u20132284, 2011.\n[10] J. Anderson. The Adaptive character of thought. Erlbaum Press, 1990.\n[11] John Z Sun, Grace I Wang, Vivek K Goyal, and Lav R Varshney. A framework for Bayesian optimality of psychophysical laws. Journal of Mathematical Psychology, 56(6):495\u2013501, 2012.\n[12] Neil Stewart, Nick Chater, and Gordon D.A. Brown. 
Decision by sampling. Cognitive Psychology, 53(1):1\u201326, 2006.\n[13] Christian Kleiber and Samuel Kotz. Statistical size distributions in economics and actuarial sciences, volume 470. Wiley-Interscience, 2003.\n[14] Adrian Dragulescu and Victor M Yakovenko. Statistical mechanics of money. The European Physical Journal B-Condensed Matter and Complex Systems, 17(4):723\u2013729, 2000.\n[15] Daniel Paravisini, Veronica Rappoport, and Enrichetta Ravina. Risk aversion and wealth: Evidence from person-to-person lending portfolios. Technical report, National Bureau of Economic Research, 2010.\n[16] Kris De Jaegher. Giffen behaviour and strong asymmetric gross substitutability. In New Insights into the Theory of Giffen Goods, pages 53\u201367. Springer, 2012.\n[17] Faruk Gul and Wolfgang Pesendorfer. The case for mindless economics. The foundations of positive and normative economics, pages 3\u201339, 2008.\n[18] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47:263\u2013291, 1979.\n[19] Pedro Bordalo, Nicola Gennaioli, and Andrei Shleifer. Salience theory of choice under risk. The Quarterly Journal of Economics, 127(3):1243\u20131285, 2012.\n[20] Richard S Sutton and Andrew G Barto. Introduction to reinforcement learning. MIT Press, 1998.", "award": [], "sourceid": 642, "authors": [{"given_name": "Nisheeth", "family_name": "Srivastava", "institution": "University of California San Diego"}, {"given_name": "Ed", "family_name": "Vul", "institution": "University of California, San Diego"}, {"given_name": "Paul", "family_name": "Schrater", "institution": "University of Minnesota"}]}