{"title": "Bayesian Model of Behaviour in Economic Games", "book": "Advances in Neural Information Processing Systems", "page_first": 1345, "page_last": 1352, "abstract": "Classical Game Theoretic approaches that make strong rationality assumptions have difficulty modeling observed behaviour in Economic games of human subjects. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates Game Theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investment game. We invert the generative process for a recognition model that is used to classify 200 subjects playing an Investor-Trustee game against randomly matched opponents.", "full_text": "Bayesian Model of Behaviour in Economic Games\n\nDebajyoti Ray\n\nComputation and Neural Systems\nCalifornia Institute of Technology\n\nPasadena, CA 91125. USA\n\ndray@caltech.edu\n\nBrooks King-Casas\n\nComputational Psychiatry Unit\nBaylor College of Medicine.\nHouston, TX 77030. USA\n\nbkcasas@cpu.bcm.tmc.edu\n\nP. Read Montague\n\nHuman NeuroImaging Lab\nBaylor College of Medicine.\nHouston, TX 77030. USA\n\nmontague@hnl.bcm.tmc.edu\n\nPeter Dayan\n\nGatsby Computational Neuroscience Unit\n\nUniversity College London\nLondon. WC1N 3AR. UK\n\ndayan@gatsby.ucl.ac.uk\n\nAbstract\n\nClassical game theoretic approaches that make strong rationality assumptions have\ndif\ufb01culty modeling human behaviour in economic games. We investigate the role\nof \ufb01nite levels of iterated reasoning and non-sel\ufb01sh utility functions in a Partially\nObservable Markov Decision Process model that incorporates game theoretic no-\ntions of interactivity. Our generative model captures a broad class of characteristic\nbehaviours in a multi-round Investor-Trustee game. 
We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents.\n\n1 Introduction\n\nTrust tasks such as the Dictator, Ultimatum and Investor-Trustee games provide an empirical basis for investigating social cooperation and reciprocity [11]. Even in completely anonymous settings, human subjects show rich patterns of behaviour that can be seen in terms of such personality concepts as charity, envy and guilt. Subjects also behave as if they model these aspects of their partners in games, for instance acting to avoid being taken advantage of. Different subjects express quite different personalities, or types, and also have varying abilities at modelling their opponents.\n\nThe burgeoning interaction between economic psychology and neuroscience requires formal treatments of these issues. From the perspective of neuroscience, such treatments can provide a precise quantitative window into neural structures involved in assessing utilities of outcomes, capturing risk and probabilities associated with interpersonal interactions, and imputing intentions and beliefs to others. In turn, evidence from brain responses associated with these factors should elucidate the neural algorithms of complex interpersonal choices, and thereby illuminate economic decision-making.\n\nHere, we consider a sequence of paradigmatic trust tasks that have been used to motivate a variety of behaviourally-based economic models. In brief, we provide a formalization in terms of partially observable Markov decision processes, approximating type-theoretic Bayes-Nash equilibria [8] using finite hierarchies of belief, where subjects’ private types are construed as parameters of their inequity-averse utility functions [2]. Our inference methods are drawn from machine learning.\n\nFigure 1a shows a simple one-round trust game. In this, an Investor is paired against a randomly assigned Trustee.
The Investor can either choose a safe option with a low payoff for both, or take a risk and pass the decision to the Trustee, who can either choose to defect (and thus keep more for herself) or choose the fair option that leads to more gains for both players (though less profitable for herself alone than if she defected). Figure 1b shows the more sophisticated game we consider, namely a multi-round, sequential, version of the Trust game [15].\n\nFigure 1: (a) In a simple Trust game, the Investor can take a safe option with a payoff of $[Investor=20, Trustee=20] (i.e. the Investor gets $20 and the Trustee gets $20). The game ends if the Investor chooses the safe option; alternatively, he can pass the decision to the Trustee. The Trustee can now choose a fair option $[25,25] or choose to defect $[15,30]. (b) In the multi-round version of the Trust game, the Investor gets $20 at every round. He can invest any (integer) part; this quantity is trebled on the way to the Trustee. In turn, she has the option of repaying any (integer) amount of her resulting allocation to the Investor. The game continues for 10 rounds.\n\nThe fact that, even in a purely anonymized setting, Investors invest at all, and Trustees reciprocate at all, in games such as that of figure 1a, is a challenge to standard, money-maximizing doctrines (which expect to find the Nash equilibrium where neither happens), and poses a problem for modeling. One popular strategy is to retain the notion that subjects attempt to optimize their utilities, but to include in these utilities social factors that penalize cases in which opponents win either more (crudely envy, parameterized by α) or less (guilt, parameterized by β) than themselves [2].
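The trade-off between material payoff and envy/guilt penalties is easy to state in code. The following sketch (our own Python illustration with hypothetical function names, not the authors' implementation) evaluates such a social utility and shows why, in the one-round game of figure 1a, a sufficiently guilty Trustee prefers the fair split to defection:

```python
def inequity_aversion_utility(x_i, x_j, alpha_i, beta_i):
    """Social utility: own payoff minus envy and guilt penalties.

    alpha_i penalizes disadvantageous inequity (opponent earns more);
    beta_i penalizes advantageous inequity (opponent earns less).
    """
    return (x_i
            - alpha_i * max(x_j - x_i, 0)
            - beta_i * max(x_i - x_j, 0))

# Trustee's choice in the one-round game of figure 1a:
fair = inequity_aversion_utility(25, 25, 0.0, 0.7)    # 25.0: no inequity penalty
defect = inequity_aversion_utility(30, 15, 0.0, 0.7)  # 30 - 0.7*15 = 19.5
# A guilty Trustee (beta = 0.7) prefers the fair split; with beta = 0.3
# defection is worth 30 - 0.3*15 = 25.5, and she defects.
```

The same function with alpha = beta = 0 reduces to pure money-maximization, recovering the standard selfish prediction.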
One popular inequity-aversion utility function [2] characterizes player i by the type Ti = (αi, βi) of her utility function:\n\nU(αi, βi) = xi − αi max{(xj − xi), 0} − βi max{(xi − xj), 0}    (1)\n\nwhere xi, xj are the amounts received by players i and j respectively.\n\nIn the multi-round version of figure 1b, reputation formation comes into play [15]. Investors have the possibility of gaining higher rewards from giving money to the Trustee; and, at least until the final round, the Trustee has an incentive to maintain a reputation of trustworthiness in order to coax the Investor to offer more (against any Nash tendencies associated with solipsistic utility functions). Social utility functions such as that of equation 1 mandate probing, belief manipulation and the like.\n\nWe cast such tasks as Bayesian games. As in the standard formulation [8], players know their own types but not those of their opponents; dyads are thus playing games of incomplete information. A player also has prior beliefs about their opponent that are updated in a Bayesian manner after observing the opponent’s actions. Their own actions also influence their opponent’s beliefs. This leads to an infinite hierarchy of beliefs: what the Trustee thinks of the Investor; what the Trustee thinks the Investor thinks of her; what the Trustee thinks the Investor thinks the Trustee thinks of her; and so on. If players have common prior beliefs over the possible types in the game, and this prior is common knowledge, then (at least one) subjective equilibrium known as the Bayes-Nash Equilibrium (BNE) exists [8]. Algorithms to compute BNE solutions have been developed but, in the general case, are NP-hard [6] and thus infeasible for complex multi-round games [9].\n\nOne obvious approach to this complexity is to consider finite rather than infinite belief hierarchies. This has both theoretical and empirical support.
First, a finite hierarchy of beliefs can provably approximate the equilibrium solution that arises in an infinite belief hierarchy arbitrarily closely [10], an idea that has indeed been employed in practice to compute equilibria in a multi-agent setting [5]. Second, based on a whole wealth of games such as the p-Beauty game [11], it has been suggested that human subjects only employ a very restricted number of steps of strategic thinking. According to cognitive hierarchy theory, a celebrated account of this, this number is on average a mere 1.5 [13].\n\nIn order to capture the range of behaviour exhibited by subjects in these games, we built a finite belief hierarchy model, using inequity-averse utility functions in the context of a partially observable hidden Markov model of the ignorance each subject has about its opponent’s type, and in the light of sequential choice. We used inference strategies from machine learning to find approximate solutions to this model. In this paper, we use this generative model to investigate the qualitative classes of behaviour that can emerge in these games.\n\nFigure 2: Each player’s decision-making requires solving a POMDP, which involves solving the opponent’s POMDP. Higher-order beliefs are required as each player’s action influences the opponent’s beliefs, which in turn influence their policy.\n\n2 Partially Observable Markov Games\n\nAs in the framework of Bayesian games, player i’s inequity-aversion type Ti = (αi, βi) is known to it, but not to the opponent. Player i does have a prior distribution over the type of the other player j, b_i^(0)(Tj); and, if suitably sophisticated, can also have higher-order priors over the whole hierarchy of recursive beliefs about types. We denote the collection of priors as b⃗_i^(0) = {b_i^(0), b_i^(0)′, b_i^(0)′′, ...}.\n\nPlay proceeds sequentially, with player i choosing action a_i^(t) at time t according to the expected future value of this choice. In this (hidden) Markovian setting, this value, called a Q-value, depends on the stage (given the finite horizon), the current beliefs of the player b⃗_i^(t) (which are sufficient statistics for the past observations), and the policies P(a_i^(t) = a | D^(t)) (which depend on the observations D^(t)) of both players up to time t:\n\nQ_i^(t)(b⃗_i^(t), a_i^(t)) = U_i^(t)(b⃗_i^(t), a_i^(t)) + Σ_{a_j^(t) ∈ A_j^(t)} P(a_j^(t) | {D^(t), a_i^(t)}) Σ_{a_i^(t+1) ∈ A_i^(t+1)} Q_i^(t+1)(b⃗_i^(t+1), a_i^(t+1)) P(a_i^(t+1) | {D^(t), a_i^(t), a_j^(t)})    (2)\n\nwhere we arbitrarily define the softmax policy,\n\nP(a_i^(t) = a | D^(t)) = exp(φ Q_i^(t)(b⃗_i^(t), a)) / Σ_b exp(φ Q_i^(t)(b⃗_i^(t), b))    (3)\n\nakin to Quantal Response Equilibrium [12], which depends on player i’s beliefs about player j, which are, in turn, updated using Bayes’ rule based on the likelihood function P(a_j^(t) | {D^(t), a_i^(t)}):\n\nb_i^(t+1)(Tj) = P(Tj | a_j^(t), a_i^(t), b_i^(t)) = P(Tj, a_j^(t) | a_i^(t), b_i^(t)) / Σ_{T′_j} P(T′_j, a_j^(t) | a_i^(t), b_i^(t))    (4)\n\nswitching between history-based (D^(t)) and belief-based (b_i^(t)(Tj)) representations. Given the interdependence of beliefs and actions, we expect to see probing (to find out the type and beliefs of one’s opponent) and belief manipulation (being nice now to take advantage of one’s opponent later).\n\nIf the other player’s decisions are assumed to emerge from equivalent softmax choices, then for the subject to calculate this likelihood, they must also solve their opponent’s POMDP. This leads to an infinite recursion (illustrated in fig. 2). In order to break this, we assume that each player has k levels of strategic thinking, as in the Cognitive Hierarchy framework [13]. Thus each k-level player assumes that his opponent is a (k−1)-level player. At the lowest level of the recursion, the 0-level player uses a simple likelihood to update their opponent’s beliefs.\n\nThe utility U_i^(t)(a_i^(t)) is calculated at every round for each player i for action a_i^(t) by marginalizing over the current beliefs b_i^(t). It is extremely challenging to compute with belief states, since they are probability distributions, and are therefore continuous-valued rather than discrete. To make this computationally reasonable, we discretize the values of the types. As an example, if there are only two types for a player, the belief state, which is a continuous probability distribution over the interval [0, 1], is discretized to take K values b_i1 = 0, ..., b_iK = 1. The utility of an action is obtained by marginalizing over the beliefs as:\n\nU_i^(t)(a_i^(t)) = Σ_{k=1:K} b_ik Q_i^(t)(b_ik^(t), a_i^(t))    (5)\n\nFurthermore, we solve the resulting POMDP using a mixture of explicit expansion of the tree from the current start point to three stages ahead, and a stochastic, particle-filter-based scheme (as in [7]), from four stages ahead to the end of the game.\n\nOne characteristic of this explicit process model, or algorithmic approach, is that it is possible to consider what happens when the priors of the players differ. In this case, as indeed also for the case of only a finite belief hierarchy, there is typically no formal Bayes-Nash equilibrium. We also verified our algorithm against the QRE and BNE solutions provided by GAMBIT ([14]) on 1- and 2-round Trust games for k = 1, 2 respectively.
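For intuition, the softmax choice rule of equation 3 and the Bayes’-rule belief update of equation 4 can be sketched over a discretized set of opponent types. This is a minimal Python sketch under our own naming; the likelihood values below are made-up illustrations, not model output:

```python
import math

def softmax_policy(q_values, phi=1.0):
    """Equation 3: choice probabilities proportional to exp(phi * Q)."""
    exps = [math.exp(phi * q) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def update_beliefs(prior, likelihoods):
    """Equation 4: posterior over opponent types after observing an action,
    given the probability of that action under each candidate type."""
    unnorm = [b * l for b, l in zip(prior, likelihoods)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Uniform prior over Trustee types beta_T in {0.3, 0.7}; suppose a generous
# repayment is three times likelier under the guiltier type:
posterior = update_beliefs([0.5, 0.5], [0.25, 0.75])  # -> [0.25, 0.75]
```

The illustrative numbers mirror the round-1 belief shift (uniform to roughly 0.25/0.75) described in section 3.1 below.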
However, unlike the BNE solution in the extensive-form game, our algorithm gives rise to belief manipulation and end-of-game effects.\n\n3 Generative Model for Investor-Trustee Game\n\nReputation formation plays a particularly critical role in the Investor-Trustee game, with even the most selfish players trying to benefit from cooperation, at least in the initial rounds. In order to reduce complexity in analyzing this, we set αI = βI = 0 (i.e., a purely selfish Investor) and consider two values of βT (0.3 and 0.7), such that in the last round the Trustee with type βT = 0.3 will not return any amount to the Investor, and will choose the fair outcome if βT = 0.7. We generate a rich tapestry of behaviour by varying the prior expectations as to βT and the values of the strategic (k) level (0, 1, 2) for the players.\n\n3.1 Factors Affecting Behaviour\n\nAs an example, fig. 3 shows the evolution of the players’ Q-values, the 1st-order beliefs of the Investor, and the 2nd-order beliefs of the Trustee (i.e., her beliefs as to the Investor’s beliefs about her value of βT) over the course of a single game. Here, both players have kI = kT = 1 (i.e. they are strategic players), but the Trustee is actually less guilty (βT = 0.3).\n\nIn the first round, the Investor gives $15, and receives back $30 from the Trustee. This makes the Investor’s beliefs about βT go from being uniform to being about 0.75 for βT = 0.7 and 0.25 for βT = 0.3 (showing the success of the Trustee’s exercise in belief manipulation). This causes the Q-value for the action corresponding to giving $20 to be highest, inspiring the Investor’s generosity in round 2.
Equally, the Trustee’s (2nd-order) beliefs after receiving $15 in the first round peak for the value βT = 0.7, corresponding to thinking that the Investor believes the Trustee is nice. In subsequent rounds, the Trustee’s nastiness limits what she returns, and so the Investor ceases giving high amounts. In response, in rounds 5 and 7, the Trustee tries to coax the Investor. We find this “reciprocal give and take” to be a characteristic behaviour of strategic Investors and Trustees (with k = 1). For naive players with k = 0, a return of a very low amount for a high amount invested would lead to a complete breakdown of trust formation.\n\nFig. 4 shows the statistics of dyadic interactions between Investors and Trustees with uniform priors. The amount given by the Investor varies significantly depending on whether or not he is strategic, and also on his priors. In round 1, Investors with kI = 0 and 1 offer $20 first (the optimal probing action based on uniform prior beliefs), and Investors with kI = 2 offer $15. The corresponding amount returned by the Trustee depends significantly on kT. A Trustee with kT = 0 and low βT will return nothing, whereas an unconditionally cooperative Trustee (high βT) returns roughly the same amount as received. Irrespective of the Trustee’s βT type, the amount returned by strategic Trustees with kT = 1, 2 is higher (between 1.5 and 2 times the amount received).\n\nIn round 2 we find that the low amount received causes trust to break down for Investors with kI = 0. In fact, naive Investors and Trustees do not form trust in this game. Strategic Trustees return more initially and are able to coax naive Investors to give higher amounts in the game. Generally, unconditionally cooperative Trustees return more, and form trust throughout the game if they are strategic or if they are playing against strategic Investors.
Trustees with low βT defect towards the end of the game but coax more investment in the beginning of the game.\n\nFigure 3: The generated game shows the amount given by an Investor with kI = 1 and a Trustee with βT = 0.3 and kT = 1. The red bar indicates the amount given by the Investor and the blue bar is the amount returned by the Trustee (after receiving 3 times the amount given by the Investor). The figures on the right reveal the inner workings of the algorithm: Q-values through the rounds of the game for 5 different actions of the Investor (0, 5, 10, 15, 20) and 5 actions of the Trustee between 0 and 3 times the amount given by the Investor. Also shown are the Investor’s 1st-order beliefs (left bar for βT = 0.3 and right bar for βT = 0.7) and the Trustee’s 2nd-order beliefs over the rounds.\n\nFigure 4: The dyadic interactions between the Investor and Trustee across the 10 rounds of the game. The top half shows the Investor playing against a Trustee with low βT (= 0.3) and the bottom half against a Trustee with high βT (= 0.7): unconditionally cooperative. The top dyad shows the amount given by the Investor and the bottom dyad shows the amount returned by the Trustee. Within each dyad the rows represent the strategic (kI) levels of the Investor (0, 1 or 2) and the columns represent the kT level of the Trustee (0, 1 or 2). The dyads are shown here for the first 2 and final 2 rounds. Two particular examples are highlighted within the dyads: Investor with kI = 0 and Trustee with kT = 2, uncooperative (low βT); and Investor with kI = 1 and Trustee with kT = 2, cooperative (high βT). Lighter colours reveal higher amounts (with the amount given by the Investor in the first round being $15).\n\nThe effect of strategic level is more dramatic for the Investor, since his ability to defect at any point places him in effective charge of the interaction. Strategic Investors give more money in the game than naive Investors.
Consequently, they also get more return on their investment because of the beneficial effects of this on their reputations. A further observation is that strategic Investors are more immune to the Trustee’s actions. While this means that break-downs in the game due to mistakes of the Trustee (or unfortunate choices from her softmax) are more easily corrected by the strategic Investor, he is also more likely to continue investing even if the Trustee doesn’t reciprocate.\n\nIt is also worth noting the differences between k = 1 and k = 2 players. The latter typically offer less in the game and are also less susceptible to the actions of their opponent. Overall in this game, the Investors with kI = 1 make the most money playing against a cooperative Trustee, while kI = 0 Investors make the least. The best dyad consists of a kI = 1 Investor playing with a cooperative Trustee with kT = 0 or 1.\n\nA very wide range of patterns of dyadic interaction, including the main observations of [15], can thus be captured by varying just the limited collection of parameters of our model.\n\n4 Recognition and Classification\n\nOne of the main reasons to build this generative model for play is to have a refined method for classifying individual players on the basis of the dyadic behaviour. We do this by considering the statistical inverse of the generative model as a recognition model. Denote the sequence of plays in the 10-round Investor-Trustee game as D = {[a_1^(1), a_2^(1)], ..., [a_1^(10), a_2^(10)]}.
Since the game is Markovian, we can calculate the probability of player i taking the action sequence {a_i^(t), t = 1, ..., 10}, given his type Ti and prior beliefs b⃗_i^(0), as:\n\nP({a_i^(t)} | Ti, b⃗_i^(0)) = P(a_i^(1) | Ti, b⃗_i^(0)) Π_{t=2}^{10} P(a_i^(t) | D^(t), Ti)    (6)\n\nwhere P(a_i^(1) | Ti, b⃗_i^(0)) is the probability of the initial action a_i^(1) given by the softmax distribution and prior beliefs b⃗_i^(0), and P(a_i^(t) | D^(t), Ti) is the probability of action a_i^(t) after updating beliefs b⃗_i^(t) from previous beliefs b⃗_i^(t−1) upon the observation of the past sequence of moves D^(t). This is a likelihood function for Ti, b⃗_i^(0), and so can be used for posterior inference about type given D. We classify the players for their utility function (βT value for the Trustee), strategic (ToM) levels and prior beliefs using the MAP value (Ti*, b⃗_i^(0)*) = argmax_{Ti, b⃗_i^(0)} P(D | Ti, b⃗_i^(0)).\n\nWe used our recognition model to classify subject pairs playing the 10-round Investor-Trustee game [15]. The data included 48 student pairs playing an Impersonal task, for which the opponents’ identities were hidden, and 54 student pairs playing a Personal task, for which partners met.\n\nEach Investor-Trustee pair was classified for their level of strategic thinking k and the Trustee’s βT type (cooperative/uncooperative; see the table in Figure 5). We are able to capture some characteristic behaviours with our model. The highlighted interactions reveal that many of the pairs in the Impersonal task consisted of strategic Investors and cooperative Trustees, who formed trust in the game, with the levels of investment decreasing towards the end of the game.
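The scoring step behind this classification can be sketched as follows (our own illustrative names and made-up per-round probabilities, not the fitted model): equation 6 multiplies per-round action probabilities, and the MAP classification picks the candidate type/prior setting under which the observed sequence is most probable.

```python
import math

def sequence_log_likelihood(action_probs):
    """Log of equation 6: sum over rounds of log P(a_i^(t) | D^(t), T_i)."""
    return sum(math.log(p) for p in action_probs)

def map_classify(candidates):
    """Pick the (type, prior) label whose per-round action probabilities
    give the observed play the highest likelihood (the MAP value)."""
    return max(candidates, key=lambda c: sequence_log_likelihood(c[1]))[0]

# Hypothetical per-round probabilities of one observed Trustee sequence
# under two candidate settings:
label = map_classify([
    ("beta_T=0.3, k_T=1", [0.9, 0.8, 0.7]),
    ("beta_T=0.7, k_T=1", [0.2, 0.3, 0.4]),
])  # -> "beta_T=0.3, k_T=1"
```

Working in log space simply avoids underflow when multiplying ten small probabilities; the argmax is unchanged.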
We also highlight the difference between strategic and non-strategic Investors. An Investor with kI = 0 will not form trust if the Trustee does not return a significant amount initially, whilst an Investor with kI = 2 will continue offering money in the game even if the Trustee gives back less than fair amounts in return. There is also a strong correlation between the proportion of Trustees classified as being cooperative, estimated as 48% and 30% on the Impersonal and Personal tasks respectively, and the corresponding Return on Investment (how much the Investor receives for the amount invested): 120% and 109%.\n\nAlthough the recognition model captures key characteristics, we do not expect the Trustees to have the specified values of βT^low = 0.3 and βT^high = 0.7. To test the robustness of the recognition model we generated behaviours (450 dyads) with different values of βT (βT^low = [0, 0.1, 0.2, 0.3, 0.4] and βT^high = [0.6, 0.7, 0.8, 0.9, 1.0]) that were classified using the recognition model. Figure 5 shows how confidently players of the given type were classified to have that type.\n\nWe find that the recognition model tends to misclassify Trustees with low βT as having kT = 2. This is because Trustees with those characteristics will offer high amounts to coax the Investor. Investors are shown to be correctly classified in most cases. Overall, the recognition model has a tendency to assign higher kT to the Trustees than their true type, though the model correctly assigns the right cooperative/uncooperative type to the Trustee.\n\nFigure 5: Subject pairs are classified into levels of Theory of Mind for the Investor (rows) and Trustee (columns). The number of subject-pairs with each classification is shown in each entry, along with whether the Trustee was classified as uncooperative / cooperative (βT^low, βT^high). The subjects play an Impersonal game, where they do not know the identities of the opponent, and a Personal game, where identities are revealed. We reveal the dominant or unique behavioural classification within the tables (highlighted): Impersonal (kI = 1, kT = 2, cooperative) group averaged over 10 subjects, Personal group (kI = 0, kT = 0, uncooperative) averaged over 3 subjects, and Personal group with (kI = 2, kT = 0, uncooperative) averaged over 11 subjects. We also show the classification confidence for the types given that the behaviour was generated from our model with other values of βT for the Trustee, as well as, in brackets, the type that the player is most likely to be classified as. (A Trustee with low βT and kT = 1 is very likely to be misclassified as a player with kT = 2, while a player with kT = 2 will mostly be classified with kT = 2.)\n\n5 Discussion\n\nWe built a generative model that captures classes of observed behaviour in multi-round trust tasks. The critical features of the model are a social utility function, with parameters covering different types of subjects; partial observability, accounting for subjects’ ignorance about their opponents; an explicit and finite cognitive hierarchy to make approximate equilibrium calculations marginally tractable; and partly deterministic and partly sample-based evaluation methods.\n\nDespite its descriptive adequacy, we do not claim that it is uniquely competent. We also do not suggest a normative rationale for pieces of the model such as the social utility function. Nevertheless, the separation between the vagaries of utility and the exactness of inference is attractive, not least by providing clearly distinct signals as to the inner workings of the algorithm that can be extremely useful to capture neural findings.
Indeed, the model is relevant to a number of experimental findings, including those due to [15], [18], [19]. The underlying foundation in reinforcement learning is congenial, given the substantial studies of the neural bases of this [20].\n\nThe model does directly license some conclusions. For instance, we postulate that higher activation will be observed in regions of the brain associated with theory of mind for Investors that give more in the game, and for Trustees that can coax more. However, unlike [13], our naive players still build models, albeit unsophisticated ones, of the other player (in contrast to level-0 players who assume the opponent to play a random strategy). So this might lead to an investigation of how sophisticated and naive theory of mind models are built by subjects in the game.\n\nWe also constructed the recognition model, which is the statistical inverse of this generative model. While we showed this to capture a broad class of behaviours, it only explains the coarse features of the behaviour. We need to incorporate some of the other parameters of our model, such as the Investor’s envy and the temperature parameter of the softmax distribution, in order to capture the nuances in the interactions. Further, it would be interesting to use the recognition model in pathological populations, looking at such conditions as autism and borderline personality disorder.\n\nFinally, this computational model provides a guide for designing experiments to probe aspects of social utility, strategic thinking levels and prior beliefs, as well as inviting ready extensions to related tasks such as Public Goods games.
The inference method may also have wider application, for\ninstance to identifying which of a collection of Bayes-Nash equilibria is most likely to arise, given\npsychological factors about human utilities.\n\nAcknowledgments\n\nWe thank Wako Yoshida, Karl Friston and Terry Lohrenz for useful discussions.\n\nReferences\n[1] K.A. McCabe, M.L. Rigdon and V.L. Smith. Positive Reciprocity and Intentions in Trust Games (2003).\n\nJournal of Economic Behaviour and Organization.\n\n[2] E. Fehr and K.M. Schmidt. A Theory of Fairness, Competition and Cooperation (1999). The Quarterly\n\nJournal of Economics.\n\n[3] E. Fehr and S. Gachter. Fairness and Retaliation: The Economics of Reciprocity (2000). Journal of Eco-\n\nnomic Perspectives.\n\n[4] E. Fehr and U. Fischbacher. Social norms and human cooperation (2004). TRENDS in Cog. Sci. 8:4.\n[5] P.J. Gmytrasiewicz and P. Doshi. A Framework for Sequential Planning in Multi-Agent Settings (2005).\n\nJournal of Arti\ufb01cial Intelligence Research.\n\n[6] V. Conitzer and T. Sandholm (2002). Complexity Results about Nash Equilibria. Technical Report CMU-\n\nCS-02-135, School of Computer Science, Carnegie-Mellon University.\n\n[7] S. Thrun. Monte Carlo POMDPs (2000). Advances in Neural Information Processing Systems 12.\n[8] JC Harsanyi (1967). Games with Incomplete Information Played by \u201cBayesian\u201d Players, I-III. Management\n\nScience.\n\n[9] J.F. Mertens and S. Zamir. Formulation of Bayesian analysis for games with incomplete information (1985).\n\nInternational Journal of Game Theory.\n\n[10] Y. Nyarko. Convergence in Economic Models with Bayesian Hierarchies of Beliefs (1997). Journal of\n\nEconomic Theory.\n\n[11] C. Camerer. Behavioural Game Theory: Experiments in Strategic Interaction (2003). Princeton Univ.\n[12] R. McKelvey and T. Palfrey. Quantal Response Equilibria for Extensive Form Games (1998). Experimen-\n\ntal Economics 1:9-41.\n\n[13] C. Camerer, T-H. Ho and J-K. Chong. 
A Cognitive Hierarchy Model of Games (2004). The Quarterly\n\nJournal of Economics.\n\n[14] R.D. McKelvey, A.M. McLennan and T.L. Turocy (2007). Gambit: Software Tools for Game Theory.\n[15] B. King-Casas, D. Tomlin, C. Anen, C.F. Camerer, S.R. Quartz and P.R. Montague (2005). Getting to\n\nknow you: Reputation and Trust in a two-person economic exchange. Science 308:78-83.\n\n[16] D. Tomlin, M.A. Kayali, B. King-Casas, C. Anen, C.F. Camerer, S.R. Quartz and P.R. Montague (2006).\n\nAgent-speci\ufb01c responses in cingulate cortex during economic exchanges. Science 312:1047-1050.\n\n[17] L.P. Kaelbling, M.L. Littman and A.R. Cassandra. Planning and acting in partially observable stochastic\n\ndomains (1998). Arti\ufb01cial Intelligence.\n\n[18] K. McCabe, D. Houser, L. Ryan, V. Smith, T. Trouard. A functional imaging study of cooperation in\n\ntwo-person reciprocal exchange. Proc. Natl. Acad. Sci. USA 98:11832-35.\n\n[19] K. Fliessbach, B. Weber, P. Trautner, T. Dohmen, U. Sunde, C.E. Elger and A. Falk. Social Comparison\n\nAffects Reward-Related Brain Activity in the Human Ventral Striatum (2007). Science 318:1302-1305.\n\n[20] B. Lau and P. W. Glimcher (2008). Representations in the Primate Striatum during Matching Behaviour.\n\nNeuron 58.\n\n8\n\n\f", "award": [], "sourceid": 92, "authors": [{"given_name": "Debajyoti", "family_name": "Ray", "institution": null}, {"given_name": "Brooks", "family_name": "King-casas", "institution": null}, {"given_name": "P.", "family_name": "Montague", "institution": null}, {"given_name": "Peter", "family_name": "Dayan", "institution": null}]}