{"title": "A rational decision making framework for inhibitory control", "book": "Advances in Neural Information Processing Systems", "page_first": 2146, "page_last": 2154, "abstract": "Intelligent agents are often faced with the need to choose actions with uncertain consequences, and to modify those actions according to ongoing sensory processing and changing task demands. The requisite ability to dynamically modify or cancel planned actions is known as inhibitory control in psychology. We formalize inhibitory control as a rational decision-making problem, and apply it to the classical stop-signal task. Using Bayesian inference and stochastic control tools, we show that the optimal policy systematically depends on various parameters of the problem, such as the relative costs of different action choices, the noise level of sensory inputs, and the dynamics of changing environmental demands. Our normative model accounts for a range of behavioral data in humans and animals in the stop-signal task, suggesting that the brain implements statistically optimal, dynamically adaptive, and reward-sensitive decision-making in the context of inhibitory control problems.", "full_text": "A Rational Decision-Making Framework for Inhibitory Control\n\nPradeep Shenoy\n\nDepartment of Cognitive Science\nUniversity of California, San Diego\n\npshenoy@ucsd.edu\n\nRajesh P. N. Rao\n\nDepartment of Computer Science\n\nUniversity of Washington\nrao@cs.washington.edu\n\nAngela J. Yu\n\nDepartment of Cognitive Science\nUniversity of California, San Diego\n\najyu@ucsd.edu\n\nAbstract\n\nIntelligent agents are often faced with the need to choose actions with uncertain\nconsequences, and to modify those actions according to ongoing sensory process-\ning and changing task demands. The requisite ability to dynamically modify or\ncancel planned actions is known as inhibitory control in psychology. 
We formal-\nize inhibitory control as a rational decision-making problem, and apply it to the\nclassical stop-signal task. Using Bayesian inference and stochastic control tools,\nwe show that the optimal policy systematically depends on various parameters of\nthe problem, such as the relative costs of different action choices, the noise level\nof sensory inputs, and the dynamics of changing environmental demands. Our\nnormative model accounts for a range of behavioral data in humans and animals\nin the stop-signal task, suggesting that the brain implements statistically optimal,\ndynamically adaptive, and reward-sensitive decision-making in the context of in-\nhibitory control problems.\n\n1 Introduction\n\nIn natural behavior as well as in engineering applications, there is often the need to choose, under\ntime pressure, an action among multiple options with imprecisely known consequences. For exam-\nple, consider the decision of buying a house. A wise buyer should collect suf\ufb01cient data to make an\ninformed decision, but waiting too long might mean missing out on a dream home. Thus, balanced\nagainst the informational gain afforded by lengthier deliberation is the opportunity cost of inaction.\nFurther complicating matters is the possible occurrence of a rare and unpredictably timed adverse\nevent, such as job loss or serious illness, that would require a dynamic reformulation of one\u2019s plan of\naction. This ability to dynamically modify or cancel a planned action that is no longer advantageous\nor appropriate is known as inhibitory control in psychology.\nIn psychology and neuroscience, inhibitory control has been studied extensively using the stop-\nsignal (or countermanding) task [17]. In this task, subjects perform a simple two-alternative forced\nchoice (2AFC) discrimination task on a go stimulus, whereby one of two responses is required de-\npending on the stimulus. 
On a small fraction of trials, an additional stop signal appears after some\ndelay, which instructs the subject to withhold the discrimination or go response. As might be ex-\npected, the later the stop signal appears, the harder it is for subjects to stop the response [9] (see\nFigure 3). The classical model of the stop-signal task is the race model [11], which posits a race\nto threshold between independent go and stop processes. It also hypothesizes a stopping latency,\nthe stop-signal reaction time (SSRT), which is the delay between stop signal onset and successful\nwithholding of a go response. The (unobservable) SSRT is estimated as shown in Figure 1A, and is\nthought to be longer in patient populations associated with inhibitory de\ufb01cit\n(attention-de\ufb01cit hyperactivity disorder [1], obsessive-compulsive disorder [12], and substance de-\npendence [13]) than in healthy controls. Some evidence suggests a neural correlate of the SSRT [8, 14, 5]. Although the race\nmodel is elegant in its simplicity and captures key experimental data, it is descriptive in nature and\ndoes not address how the stopping latency and other elements of the model depend on various un-\nderlying cognitive factors. Consequently, it cannot explain why behavior and stopping latency vary\nsystematically across different experimental conditions or across different subject populations.\nWe present a normative, optimal decision-making framework for inhibitory control. We formalize\ninteractions among various cognitive components: the continual monitoring of noisy sensory in-\nformation, the integration of sensory inputs with top-down expectations, and the assessment of the\nrelative values of potential actions. 
Our model has two principal components: (1) a monitoring pro-\ncess, based on Bayesian statistical inference, that infers the go stimulus identity within each trial, as\nwell as task parameters across trials, (2) a decision process, formalized in terms of stochastic con-\ntrol, that translates current belief state based on sensory inputs into a moment-by-moment valuation\nof whether to choose one of the two go responses, or to wait longer. Given a certain belief state, the\nrelative values of the various actions depend both on experimental parameters, such as the fraction\nof stop trials and the dif\ufb01culty of go stimulus discrimination, as well as subject-speci\ufb01c parameters,\nsuch as learning rate and subjective valuation of rewards and costs. Within our normative model of\ninhibitory control, stopping latency is an emergent property, arising from interactions between the\nmonitoring and decision processes. We show that our model captures classical behavioral data in\nthe task, makes quantitative behavioral predictions under different experimental manipulations, and\nsuggests that the brain may be implementing near-optimal decision-making in the stop-signal task.\n\n2 Sensory processing as Bayes-optimal statistical inference\n\nWe model sensory processing in the stop-signal task as Bayesian statistical inference. In the gen-\nerative model (see Figure 1B for graphical model), there are two independent hidden variables,\ncorresponding to the identity of the go stimulus, d \u2208 {0, 1}, and whether or not the current trial is\na stop trial, s \u2208 {0, 1}. Priors over d and s re\ufb02ect experimental parameters: e.g. P (d = 1) = .5,\nP (s = 1) = .25 in typical stop signal experiments. Conditioned on d, a stream of iid inputs are\ngenerated on each trial, x1, . . . , xt, . . ., where t indexes small increments of time within a trial,\np(xt|d = 0) = f0(xt), and p(xt|d = 1) = f1(xt). 
For simplicity, we assume f0 and f1 to be\nBernoulli distributions with distinct rate parameters qd and 1\u2212qd, respectively. The dynamic vari-\nable zt denotes the presence/absence of the stop signal: if the stop signal appears at time \u03b8 then\n$z_1 = \\ldots = z_{\\theta-1} = 0$ and $z_\\theta = z_{\\theta+1} = \\ldots = 1$. On a go trial, s = 0, the stop-signal of course\nnever appears, P (\u03b8 = \u221e) = 1. On a stop trial, s = 1, we assume for simplicity that the onset of\nthe stop signal follows a constant hazard rate, i.e. \u03b8 is generated from an exponential distribution:\n$p(\\theta | s = 1) = \\lambda e^{-\\lambda\\theta}$. Conditioned on zt, there is a separate iid stream of observations associated\nwith the stop signal: p(yt|zt = 0) = g0(yt), and p(yt|zt = 1) = g1(yt). Again, we assume for\nsimplicity that g0 and g1 are Bernoulli distributions with distinct rate parameters qs and 1\u2212 qs,\nrespectively.\n\nIn the recognition model, the posterior probability associated with signal identity, $p_d^t \\triangleq P(d = 1 | \\mathbf{x}_t)$,\nwhere $\\mathbf{x}_t \\triangleq \\{x_1, \\ldots, x_t\\}$ denotes all the data observed so far, can be computed via Bayes\u2019 Rule:\n\n$$p_d^t = \\frac{p_d^{t-1} f_1(x_t)}{p_d^{t-1} f_1(x_t) + (1 - p_d^{t-1}) f_0(x_t)} = \\frac{p_d^0 \\prod_{i=1}^t f_1(x_i)}{p_d^0 \\prod_{i=1}^t f_1(x_i) + (1 - p_d^0) \\prod_{i=1}^t f_0(x_i)}$$\n\nInference about the stop signal is slightly more complicated due to the dynamics in zt. First, we\nde\ufb01ne $p_z^t$ as the posterior probability that the stop signal has already appeared, $p_z^t \\triangleq P\\{\\theta \\le t | \\mathbf{y}_t\\}$,\nwhere $\\mathbf{y}_t \\triangleq \\{y_1, \\ldots, y_t\\}$. 
It can also be computed iteratively:\n\n$$p_z^t = \\frac{g_1(y_t)(p_z^{t-1} + (1 - p_z^{t-1}) h(t))}{g_1(y_t)(p_z^{t-1} + (1 - p_z^{t-1}) h(t)) + g_0(y_t)(1 - p_z^{t-1})(1 - h(t))}$$\n\nwhere h(t) is the posterior probability that the stop-signal will appear in the next instant given it has\nnot appeared already, $h(t) \\triangleq P(\\theta = t | \\theta > t-1, \\mathbf{y}_{t-1})$.\n\n$$h(t) = \\frac{r \\cdot P(\\theta = t | s = 1)}{r \\cdot P(\\theta > t-1 | s = 1) + (1 - r)} = \\frac{r \\lambda e^{-\\lambda t}}{r e^{-\\lambda(t-1)} + (1 - r)}$$\n\nFigure 1: Modeling inhibitory control in the stop-signal task. (A) shows the race model for behavior\nin the stop-signal task [11]. Go and stop stimuli, separated by a stop signal delay (SSD), initiate\ntwo independent processes that race to thresholds and determine trial outcome. On go trials, noise\nin the go process results in a broad distribution over threshold-crossing times, i.e., the go reaction\ntime (RT) distribution. The stop process is typically modeled as deterministic, with an associated\nstop signal reaction time or SSRT. The SSRT determines the fraction of go responses successfully\nstopped: the go RT cumulative distribution function evaluated at SSD+SSRT should give the stopping\nerror rate at that SSD. Based on these assumptions, the SSRT is estimated from data given the go RT\ndistribution, and error rate as a function of SSD. (B) Graphical model for sensory input generation in\nour Bayesian model. Two separate streams of observations, {x1, . . . , xt, . . .} and {y1, . . . , yt, . . .},\nare associated with the go and stop stimuli, respectively. xt depends on the identity of the target,\nd \u2208 {0, 1}. yt depends on whether the current trial is a stop trial, s \u2208 {0, 1}, and whether the\nstop-signal has already appeared by time t, zt\u2208{0, 1}.\n\nwhere r = P (s = 1) is the prior probability of a stop trial. 
Note that h(t) does not depend on the\nobservations, since given that the stop signal has not yet appeared, whether it will appear in the next\ninstant does not depend on previous observations.\nIn the stop-signal task, a stop trial is considered a stop trial even if the subject makes the go response\nearly, before the stop signal is presented. Following this convention, we need to compute the pos-\nterior probability that the current trial is a stop trial, denoted $p_s^t$, which depends both on the current\nbelief about the presence of the stop signal, and the expectation that it will appear in the future:\n\n$$p_s^t \\triangleq P(s = 1 | \\mathbf{y}_t) = p_z^t \\cdot 1 + (1 - p_z^t) \\cdot P(s = 1 | \\theta > t, \\mathbf{y}_t)$$\n\nwhere $P(s = 1 | \\theta > t, \\mathbf{y}_t) = P(s = 1 | \\theta > t)$ again does not depend on past observations:\n\n$$P(s = 1 | \\theta > t) = \\frac{P(\\theta > t | s = 1) P(s = 1)}{P(\\theta > t | s = 1) P(s = 1) + P(\\theta > t | s = 0) P(s = 0)} = \\frac{e^{-\\lambda t} \\cdot r}{e^{-\\lambda t} \\cdot r + 1 \\cdot (1 - r)}$$\n\nFinally, we de\ufb01ne the belief state at time t to be the vector $b^t = (p_d^t, p_s^t)$.\nFigure 2A shows the evolution of belief states for various trial types: (1) go trials, where no stop\nsignal appears, (2) stop error (SE) trials, where a stop signal is presented but a response is made,\nand (3) stop success (SS) trials, where the stop signal is successfully processed to cancel the re-\nsponse. For simplicity, only trials where d = 1 are shown, and \u03b8s on stop trials is 17 steps. Due to\nstochasticity in the sensory information, the go stimulus is processed slower and the stop signal is\ndetected faster than average on some trials; these lead to successful stopping, with SE trials showing\nthe opposite trend. 
On all trials, ps shows an initial increase due to anticipation of the stop signal.\nParameters used for the simulation were chosen to approximate typical experimental conditions (see\ne.g., Figure 3), and kept constant throughout except where explicitly noted. The results do not\nchange qualitatively when these settings are varied (data not shown).\n\n3 Decision making as optimal stochastic control\n\nIn order to understand behavior as optimal decision-making, we need to specify a loss function that\ncaptures the reward structure of the task. We assume there is a deadline D for responding on go\ntrials, and an opportunity cost of c per unit time on each trial. In addition, there is a penalty cs\nfor choosing to respond on a stop-signal trial, and a unit cost for making an error on a go trial (by\nchoosing the wrong discrimination response or exceeding the deadline for responding). Because\nonly the relative costs matter in the optimization, we can normalize the coef\ufb01cients associated with\nall the costs such that one of them is unit cost. Let \u03c4 denote the trial termination time, so that \u03c4 = D\nif no response is made before the deadline, and \u03c4 < D if a response is made. On each trial, the policy\n\u03c0 produces a stopping time \u03c4 and a possible binary response \u03b4\u2208{0, 1}. The loss function is:\n\n$$l(\\tau, \\delta; d, s, \\theta, D) = c\\tau + c_s \\mathbf{1}_{\\{\\tau < D, s = 1\\}} + \\mathbf{1}_{\\{\\tau < D, \\delta \\neq d, s = 0\\}} + \\mathbf{1}_{\\{\\tau = D, s = 0\\}}$$\n\nThe optimal decision policy minimizes the expected loss. 
Given the belief state $b^t$, the expected costs\nof going and waiting at time t are the Q-factors $Q_g^t(b^t)$ and $Q_w^t(b^t)$, respectively.\n\n$$Q_g^t(b^t) = ct + c_s p_s^t + (1 - p_s^t) \\min(p_d^t, 1 - p_d^t)$$\n\n$$Q_w^t(b^t) = \\mathbf{1}_{\\{D > t+1\\}} \\langle V^{t+1}(b^{t+1}) | b^t \\rangle_{b^{t+1}} + \\mathbf{1}_{\\{D = t+1\\}} (c(t+1) + 1 - p_s^t)$$\n\n$$V^t(b^t) = \\min(Q_g^t, Q_w^t)$$\n\nThe value function is the smaller of the Q-factors $Q_g^t$ and $Q_w^t$, and the optimal decision policy\nchooses the action corresponding to the smallest Q-factor. Note that the go action results in either\n\u03b4 = 1 or \u03b4 = 0, depending on whether $p_d^\\tau$ is greater or smaller than .5, respectively. 
The dependence\nof $Q_w^t$ on $V^{t+1}$ allows us to recursively compute the value function backwards in time. Given $V^{t+1}$,\nwe can compute $\\langle V^{t+1} \\rangle$ by summing over the uncertainty about the next observations $x_{t+1}, y_{t+1}$,\nsince the belief state $b^{t+1}$ is a deterministic function of $b^t$ and the observations.\n\n$$\\langle V^{t+1}(b^{t+1}) | b^t \\rangle_{b^{t+1}} = \\sum_{x_{t+1}, y_{t+1}} p(x_{t+1}, y_{t+1} | b^t) V^{t+1}(b^{t+1}(b^t, x_{t+1}, y_{t+1}))$$\n\n$$p(x_{t+1}, y_{t+1} | b^t) = p(x_{t+1} | p_d^t) p(y_{t+1} | p_s^t)$$\n\n$$p(x_{t+1} | p_d^t) = p_d^t f_1(x_{t+1}) + (1 - p_d^t) f_0(x_{t+1})$$\n\n$$p(y_{t+1} | p_s^t) = (p_z^t + (1 - p_z^t) h(t+1)) g_1(y_{t+1}) + (1 - p_z^t)(1 - h(t+1)) g_0(y_{t+1})$$\n\nThe initial condition of the value function can be computed exactly at the deadline since there is only\none outcome (subject is no longer allowed to go or stop): $V^D(b^D) = cD + (1 - p_s^D)$. We can then\ncompute $\\{V^t\\}_{t=1}^D$ and the corresponding optimal decision policy backwards in time from t = D\u22121\nto t = 1. In our simulations, we do so numerically by discretizing the probability space for $p_s^t$ into\n1000 bins; $p_d^t$ is represented exactly using its suf\ufb01cient statistics. Note that dynamic programming\nis merely a convenient tool for computing the exact optimal policy. Our results show that humans\nand animals behave in a manner consistent with the optimal policy, indicating that the brain must\nuse computations that are similar in nature. The important question of how such a policy may be\ncomputed or approximated neurally will be explored in future work.\nFigure 2B demonstrates graphically how the Q-factors Qg, Qw evolve over time for the trial types\nindicated in Figure 2A. 
Re\ufb02ecting the sensory processing differences, SS trials show a slower drop\nin the cost of going, and a faster increase after the stop signal is processed; this is the converse\nof stop error trials. 
Note that although the average trajectory Qg does not dip below Qw in the\nnon-canceled (error) stop trials, there is substantial variability in the individual trajectories under a\nBernoulli observation model, and each one of them dips below Qw at some point. The histograms\nshow reaction time distributions for go and SE trials.\n\n4 Results\n\n4.1 Model captures classical behavioral data in the stop-signal task\n\nWe \ufb01rst show that our model captures the basic behavioral results characteristic of the stop-signal\ntask. Figure 3 compares our model predictions to data from Macaque monkeys performing a version\n\n4\n\n\fFigure 2: Mean trajectories of posteriors and Q-factors. (A) Evolution of the average belief states\npd and ps corresponding to go and stop signals, for various trials\u2013GO: go trials, SS: stop trials\nwith successfully canceled response, SE: stop error trials. Stochasticity results in faster or slower\nprocessing of the two sensory input streams; these lead to stop success or error. For simplicity, d = 1\nfor all trials in the \ufb01gure. The stop signal is presented at \u03b8s = 17 time steps (dashed vertical line);\nthe initial rise in ps corresponds to anticipation of a potential stop signal. (B) Go and Wait costs for\nthe same partitioning of trials, along with the reaction time distributions for go and SE trials. On SE\ntrials, the cost of going drops faster, and crosses below the cost of waiting before the stop signal can\nbe adequately processed. Although the average go cost does not drop below the average wait cost,\neach individual trajectory crosses over at various time points, as indicated by the RT histograms.\nSimulation parameters: qd = 0.68, qs = 0.72, \u03bb = 0.1, r = 0.25, D = 50 steps, cs = 50 \u2217 c, where\nc = 0.005 per time step. c is approximately the rate at which monkeys earn rewards in the task,\nwhich is equivalent to assuming that the cost of time (opportunity cost) should be set by the reward\nrate. 
Unless otherwise stated, these parameters are used in all the subsequent simulations. Thickness\nof lines indicates standard errors of the mean.\n\nFigure 3: Optimal decision-making model captures classical behavioral effects in the stop-signal\ntask. (A) Inhibition function: errors on stop trials increase as a function of SSD. (B) Effect repro-\nduced by our model. (C) Discrimination RT is faster on non-canceled stop trials than go trials. (D)\nEffect reproduced by our model. (A,C) Data of two monkeys performing the stopping task (from\n[9]).\n\nof the stop-signal task [9]. One of the basic measures of performance is the inhibition function,\nwhich is the average error rate on stop trials as a function of SSD. Error increases as SSD increases,\nas shown in the monkeys\u2019 behavior and also in our model (Figure 3A;B). Another classical result\nin the stop-signal task is that RT\u2019s on non-canceled (error) stop trials are on average faster than\nthose on go trials (Figure 3C). Our model also reproduces this result (Figure 3D). Intuitively, this\nis because inference about the go stimulus identity can proceed slowly or rapidly on different trials,\ndue to noise in the observation process. Non-canceled trials are those in which pd happens to evolve\nrapidly enough for a go response to be initiated before the stop signal is adequately processed. Go\ntrial RT\u2019s, on the other hand, include all trajectories, whether pd happens to evolve quickly or not\n(see Figure 2).\n\n5\n\n\f4.2 Effect of stop trial frequency on behavior\n\nThe overall frequency of stop signal trials has systematic effects on stopping behavior [6]. As the\nfraction of stop trials is increased, go responses slow down and stop errors decrease in a graded\nfashion (Figure 4A;B). 
In our model (Figure 4C;D), the stop signal frequency, r, in\ufb02uences the\nspeed with which a stop signal is detected, whereby larger r leads to greater posterior belief that\na stop signal is present, and also greater con\ufb01dence that a stop signal will appear soon even if it has\nnot already. It therefore controls the tradeoff between going and stopping in the optimal policy. If\nstop signals are more prevalent, the optimal decision policy can use that information to make fewer\nerrors on stop trials, by delaying the go response, and by detecting the stop signal faster.\nEven in experiments where the fraction of stop trials is held constant, chance runs of stop or go\ntrials may result in \ufb02uctuating local frequency of stop trials, which in turn may lead to trial-by-trial\nbehavioral adjustments due to subjects\u2019 \ufb02uctuating estimate of r. Indeed, subjects speed up after\na chance run of go trials, and slow down following a sequence of stop trials [6] (see Figure 4E).\nWe model these effects by assuming that subjects believe that the stop signal frequency $r_k$ on trial\nk has probability \u03b1 of being the same as $r_{k-1}$ and probability 1 \u2212 \u03b1 of being re-sampled from a\nprior distribution $p_0(r)$, chosen in our simulations to be a beta distribution with a bias toward small\nr (infrequent stop trials). Previous work has shown that this is essentially equivalent to using a\ncausal, exponential window to estimate the current rate of stop trials [20], where the exponential\ndecay constant is monotonically related to the assumed volatility in the environment in the Bayesian\nmodel. 
The probability of trial k being a stop trial, $P(s_k = 1 | \\mathbf{s}_{k-1})$, where $\\mathbf{s}_k \\triangleq \\{s_1, \\ldots, s_k\\}$, is\n\n$$P(s_k = 1 | \\mathbf{s}_{k-1}) = \\int P(s_k = 1 | r_k) p(r_k | \\mathbf{s}_{k-1}) dr_k = \\int r_k p(r_k | \\mathbf{s}_{k-1}) dr_k = \\langle r_k | \\mathbf{s}_{k-1} \\rangle .$$\n\nIn other words, the predictive probability of seeing a stop trial is just the mean of the predictive\ndistribution $p(r_k | \\mathbf{s}_{k-1})$. We denote this mean as $\\hat{r}_k$. The predictive distribution is a mixture of the\nprevious posterior distribution and a \ufb01xed prior distribution, with \u03b1 and 1\u2212\u03b1 acting as the mixing\ncoef\ufb01cients, respectively:\n\n$$p(r_k | \\mathbf{s}_{k-1}) = \\alpha p(r_{k-1} | \\mathbf{s}_{k-1}) + (1 - \\alpha) p_0(r_k)$$\n\nand the posterior distribution is updated according to Bayes\u2019 Rule:\n\n$$p(r_k | \\mathbf{s}_k) \\propto P(s_k | r_k) p(r_k | \\mathbf{s}_{k-1}) .$$\n\nAs shown in Figure 4F, our model successfully explains observed sequential effects in behavioral\ndata. Since the majority of trials (75%) are go trials, a chance run of go trials impacts RT much\nless than a chance run of stop trials. The \ufb01gure also shows results for different values of \u03b1, with all\nother parameters kept constant. These values encode different expectations about volatility in the\nstop trial frequency, and produce slightly different predictions about sequential effects. Thus, \u03b1 may\nbe an important source of individual variability observed in the data, along with the other model\nparameters.\nRecent data show that neural activity in the supplementary eye \ufb01eld is predictive of trial-by-trial\nslowing as a function of the recent stop trial frequency [15]. 
Moreover, microstimulation of supple-\nmentary eye \ufb01eld neurons results in slower responses to the go stimulus and fewer stop errors [16].\nTogether, this suggests that supplementary eye \ufb01eld may encode the local frequency of stop trials,\nand in\ufb02uence stopping behavior in a statistically appropriate manner.\n\n4.3\n\nIn\ufb02uence of reward structure on behavior\n\nThe previous section demonstrated how adjustments to behavior in the face of experimental manip-\nulations can be seen as instances of optimal decision-making in the stop signal task. An important\ncomponent of the race model for stopping behavior [11] is the SSRT, which is thought to be a stable,\nsubject-speci\ufb01c index of stopping ability. In this section, we demonstrate that the SSRT can be seen\nas an emergent property of optimal decision-making, and is consequently modi\ufb01ed in predictable\nways by experimental manipulation.\nLeotti & Wager showed that subjects can be biased toward stopping or going when the relative\npenalties associated with go and stop errors are experimentally manipulated [10]. Figure 5A;B\nshow that as subjects are biased toward stopping, they make fewer stop trial errors and have slower\n\n6\n\n\fFigure 4: Effect of global and local frequency of stop trials on behavior. We compare model predic-\ntions with experimental data from a monkey performing the stop-signal task (adapted from Emeric et\nal., 2007). (A) Go reaction times shift to the right (slower), as the fraction of stop trials is increased.\n(B) Inhibitory function (stop error rate as a function of SSD) shifts to the right (fewer errors), as\nthe fraction of stop trials is increased. Data adapted from [6]. (C;D) Our model predicts similar\neffects. (E) Sequential effects in reaction times from 6 subjects showing faster go RTs following\nlonger sequences of go trials (columns 1-3), and slower RTs following longer sequences of stop\ntrials (columns 4-6, data adapted from [6]). 
(F) Our model reproduces these changes; the parameter\n\u03b1 controls the responsiveness to trial history, and may explain inter-subject differences. Values of\nalpha: low=0.85, med=0.95, high=0.98.\n\ngo responses. Our model reproduces this behavior when cs, the parameter representing the cost of\nstopping, is set to small, medium and high values. Increasing the cost of a stop error induces an\nincrease in reaction time and an associated decrease in the fraction of stop errors. This is a direct\nconsequence of the optimal model attempting to minimize the total expected cost \u2013 with stop errors\nbeing more expensive, there is an incentive to slow down the go response in order to minimize the\npossibility of missing a stop signal.\nCritically, the SSRT in the human data decreases with increasing bias toward stopping (Figure 5C).\nAlthough the SSRT is not an explicit component of our model, we can nevertheless estimate it from\nthe reaction times and fraction of stop errors produced by our model simulations, following the race\nmodel\u2019s prescribed procedure [11]. Essentially, the SSRT is estimated as the difference between\nmean go RT and the SSD at which 50% stop errors are committed (see Figure 1). By reconciling\nthe competing demands of stopping and going in an optimal manner, the estimated SSRT from\nour simulations is automatically adjusted to mimic the observed human behavior (Figure 5F). This\nsuggests that the SSRT emerges naturally out of rational decision-making in the task.\n\n5 Discussion\n\nWe presented an optimal decision-making model for inhibitory control in the stop-signal task. The\nparameters of the model are either set directly by experimental design (cost function, stop frequency\nand timing), or correspond to subject-speci\ufb01c abilities that can be estimated from behavior (sensory\nprocessing); thus, there are no \u201cfree\u201d parameters. 
The model successfully captures classical behav-\nioral results, such as the increase in error rate on stop trials with the increase of SSD, as well as the\ndecreases in average response time from go trials to error stop trials. The model also captures more\nsubtle changes in stopping behavior, when the fraction of stop-signal trials, the penalties for various\ntypes of errors, and the history of experienced trials are manipulated. The classical model for the task\n\n7\n\n\fFigure 5: Effect of reward on stopping. (A-C) Data from human subjects performing a variant of\nthe stop-signal task where the ratio of rewards for quick go responses and successful stopping was\nvaried, inducing a bias towards going or stopping (Data from [10]). An increased bias towards\nstopping (i.e., fewer stop errors, (A)) is associated with an increase in the average reaction time on\ngo trials (B), and a decrease in the stopping latency or SSRT (C). (D-F) Our model captures this\nchange in SSRT as a function of the inherent tradeoff between RT and stop errors. Values of cs:\nlow=0.15, med=0.25, high=0.5.\n\n(the race model) does not directly explain or quantitatively predict these changes in behavior. More-\nover, the stopping latency measure prescribed by the race model (the SSRT) changes systematically\nacross various experimental manipulations, indicating that it cannot be used as a simplistic, global\nmeasure of inhibitory control for each subject. 
Instead, inhibitory control is a multifaceted function\nof factors such as subject-speci\ufb01c sensory processing rates, attentional factors, and internal/external\nbias towards stopping or going, which are explicitly related to parameters in our normative model.\nThe close correspondence of model predictions with human and animal behavior suggests that the\ncomputations necessary for optimal behavior are exactly or approximately implemented by the brain.\nWe used dynamic programming as a convenient tool to compute the optimal monitoring and deci-\nsional computations, but the brain is unlikely to use this computationally expensive method. Recent\nstudies of the frontal eye \ufb01elds (FEF, [8]) and superior colliculus [14] of monkeys show neural re-\nsponses that diverge on go and correct stop trials, indicating that they may encode computations\nleading to the execution or cancellation of movement. It is possible that optimal behavior can be\napproximated by a diffusion process implementing the race model [4, 19], with the rate and thresh-\nold parameters adjusted according to task demands. In future work, we will study more explicitly\nhow optimal decision-making can be approximated by a diffusion model implementation of the\nrace model (see e.g., [18]), and how the parameters of such an implementation may be set to re\ufb02ect\ntask demands. We will also assess alternatives to the race model, in the form of other approximate\nalgorithms, in terms of their ability to capture behavioral data and explain neural data.\nOne major aim of our work is to understand how stopping ability and SSRT arise from various\ncognitive factors, such as sensitivity to rewards, learning capacity related to estimating stop signal\nfrequency, and the rate at which sensory inputs are processed. 
This composite view of stopping\nability and SSRT may help explain group differences in stopping behavior, in particular, differences\nin SSRT observed in a number of psychiatric and neurological conditions, such as substance abuse\n[13], attention-de\ufb01cit hyperactivity disorder [1], schizophrenia [3], obsessive-compulsive disorder\n[12], Parkinson\u2019s disease [7], Alzheimer\u2019s disease [2], et cetera. One of our goals for future research\nis to map group differences in stopping behavior to the parameters of our model, thus gaining insight\ninto exactly which cognitive components go awry in each dysfunctional state.\n\nReferences\n[1] R.M. Alderson, M.D. Rapport, and M.J. Ko\ufb02er. Attention-de\ufb01cit/hyperactivity disorder and\nbehavioral inhibition: a meta-analytic review of the stop-signal paradigm. Journal of Abnormal\nChild Psychology, 35(5):745\u2013758, 2007.\n\n[2] H Amieva, S Lafont, S Auriacombe, N Le Carret, J F Dartigues, J M Orgogozo, and C Fab-\nrigoule. Inhibitory breakdown and dementia of the Alzheimer type: A general phenomenon?\nJournal of Clinical and Experimental Neuropsychology, 24(4):503\u2013516, 2002.\n\n[3] J C Badcock, P T Michie, L Johnson, and J Combrinck. Acts of control in schizophrenia:\nDissociating the components of inhibition. Psychological Medicine, 32(2):287\u2013297, 2002.\n\n[4] L. Boucher, T.J. Palmeri, G.D. Logan, and J.D. Schall. Inhibitory control in mind and brain: an\ninteractive race model of countermanding saccades. Psychological Review, 114(2):376\u2013397,\n2007.\n\n[5] CD Chambers, H Garavan, and MA Bellgrove. Insights into the neural basis of response\ninhibition from cognitive and clinical neuroscience. Neuroscience and Biobehavioral Reviews,\n33(5):631\u2013646, 2009.\n\n[6] E.E. Emeric, J.W. Brown, L. Boucher, R.H.S. Carpenter, D.P. Hanes, R. Harris, G.D. Logan,\nR.N. Mashru, M. Par\u00e9, P. Pouget, V. Stuphorn, T.L. Taylor, and J Schall. 
In\ufb02uence of history\non saccade countermanding performance in humans and macaque monkeys. Vision research,\n47(1):35\u201349, 2007.\n\n[7] S Gauggel, M Rieger, and T Feghoff. Inhibition of ongoing responses in patients with Parkinson\u2019s\ndisease. J. Neurol. Neurosurg. Psychiatry, 75(4):539\u2013544, 2004.\n\n[8] D.P. Hanes, W.F. Patterson, and J.D. Schall. The role of frontal eye \ufb01eld in countermanding\nsaccades: Visual, movement and \ufb01xation activity. Journal of Neurophysiology, 79:817\u2013834,\n1998.\n\n[9] DP Hanes and JD Schall. Countermanding saccades in macaque. Visual Neuroscience,\n12(5):929, 1995.\n\n[10] L.A. Leotti and T.D. Wager. Motivational in\ufb02uences on response inhibition measures. J Exp\nPsychol Hum Percept Perform, 2009.\n\n[11] G.D. Logan and W.B. Cowan. On the ability to inhibit thought and action: A theory of an act\nof control. Psychological Review, 91(3):295\u2013327, 1984.\n\n[12] L. Menzies, S. Achard, S.R. Chamberlain, N. Fineberg, C.H. Chen, N. del Campo, B.J.\nSahakian, T.W. Robbins, and E. Bullmore. Neurocognitive endophenotypes of obsessive-\ncompulsive disorder. Brain, 130(12):3223, 2007.\n\n[13] J.T. Nigg, M.M. Wong, M.M. Martel, J.M. Jester, L.I. Puttler, J.M. Glass, K.M. Adams, H.E.\nFitzgerald, and R.A. Zucker. Poor response inhibition as a predictor of problem drinking and\nillicit drug use in adolescents at risk for alcoholism and other substance use disorders. Journal\nof Amer Academy of Child & Adolescent Psychiatry, 45(4):468, 2006.\n\n[14] M. Pare and D.P. Hanes. Controlled movement processing: superior colliculus activity associ-\nated with countermanded saccades. Journal of Neuroscience, 23(16):6480\u20136489, 2003.\n\n[15] V. Stuphorn, J.W. Brown, and J.D. Schall. Role of Supplementary Eye Field in Saccade Initi-\nation: Executive, Not Direct, Control. Journal of Neurophysiology, 103(2):801, 2010.\n\n[16] V. Stuphorn and J.D. Schall. 
Executive control of countermanding saccades by the supplemen-\n\ntary eye \ufb01eld. Nature neuroscience, 9(7):925\u2013931, 2006.\n\n[17] F. Verbruggen and G.D. Logan. Models of response inhibition in the stop-signal and stop-\n\nchange paradigms. Neuroscience & Biobehavioral Reviews, 33(5):647\u2013661, 2009.\n\n[18] F. Verbruggen and G.D. Logan. Proactive adjustments of response strategies in the stop-\nsignal paradigm. Journal of Experimental Psychology: Human Perception and Performance,\n35(3):835\u2013854, 2009.\n\n[19] K.F. Wong-Lin, P. Eckhoff, P. Holmes, and J.D. Cohen. Optimal performance in a counter-\n\nmanding saccade task. Brain Research, 2009.\n\n[20] AJ Yu and JD Cohen. Sequential effects: Superstition or rational behavior? Advances in\n\nNeural Information Processing Systems, 21:1873\u20131880, 2009.\n\n9\n\n\f", "award": [], "sourceid": 991, "authors": [{"given_name": "Pradeep", "family_name": "Shenoy", "institution": null}, {"given_name": "Angela", "family_name": "Yu", "institution": null}, {"given_name": "Rajesh", "family_name": "Rao", "institution": null}]}