{"title": "Risk Sensitive Particle Filters", "book": "Advances in Neural Information Processing Systems", "page_first": 961, "page_last": 968, "abstract": null, "full_text": "Risk Sensitive Particle Filters\n\nSebastian Thrun, John Langford, Vandi Verma\n\nSchool of Computer Science\nCarnegie Mellon University\n\nPittsburgh, PA 15213\n\n{thrun,jcl,vandi}@cs.cmu.edu\n\nAbstract\n\nWe propose a new particle filter that incorporates a model of costs when\ngenerating particles. The approach is motivated by the observation that\nthe costs of accidentally not tracking hypotheses might be significant in\nsome areas of state space, and next to irrelevant in others. By incorporating\na cost model into particle filtering, states that are more critical to the\nsystem performance are more likely to be tracked. Automatic calculation\nof the cost model is implemented using an MDP value function calculation\nthat estimates the value of tracking a particular state. Experiments in\ntwo mobile robot domains illustrate the appropriateness of the approach.\n\n1 Introduction\n\nIn recent years, particle filters [3, 7, 8] have found widespread application in domains with\nnoisy sensors, such as computer vision and robotics [2, 5]. Particle filters are powerful\ntools for Bayesian state estimation in non-linear systems. The key idea of particle filters is\nto approximate a posterior distribution over unknown state variables by a set of particles,\ndrawn from this distribution.\n\nThis paper addresses a primary deficiency of particle filters: Particle filters are insensitive\nto costs that might arise from the approximate nature of the particle representation. Their\nonly criterion for generating a particle is the posterior likelihood of a state. To illustrate this\npoint, consider the example of a Space Shuttle. Failures of the engine system are extremely\nunlikely, even in the presence of evidence to the contrary. 
Should we therefore not track\nthe possibility of such failures, just because they are unlikely? If failure to track such\nlow-likelihood events may incur high costs, such as a mission failure, these variables should\nbe tracked even when their posterior probability is low. This observation suggests that costs\nshould be taken into consideration when generating particles in the filtering process.\n\nThis paper proposes a particle filter that generates particles according to a distribution that\ncombines the posterior probability with a risk function. The risk function measures the\nimportance of a state location on future cumulative costs. We obtain this risk function via\nan MDP that calculates the approximate future risk of decisions made in a particular state.\nExperimental results in two robotic domains illustrate that our approach yields significantly\nbetter results than a particle filter insensitive to costs.\n\n2 The \u201cClassical\u201d Particle Filter\n\nParticle filters are a popular means of estimating the state of partially observable controllable\nMarkov chains [3], sometimes referred to as dynamical systems [1]. To do so, particle\nfilters require two types of information: data, and a probabilistic generative model of the\nsystem. The data generally comes in two flavors: measurements (e.g., camera images) and\ncontrols (e.g., robot motion commands). The measurement at time $t$ will be denoted $z_t$,\nand $u_t$ denotes the control asserted in the time interval $(t-1, t]$. Thus, the data is given by\n\n$z^t = z_1, z_2, \\ldots, z_t$ and $u^t = u_1, u_2, \\ldots, u_t$.\n\nFollowing common notation in the controls literature, we use the subscript $t$ to refer to an\nevent at time $t$ and the superscript $t$ to denote all events leading up to time $t$.\n\nParticle filters, like any member of the family of Bayes filters such as Kalman filters and\nHMMs, estimate the posterior distribution of the state of the dynamical system conditioned\non the data, $p(x_t \\mid z^t, u^t)$. They do so via the following recursive formula\n\n$p(x_t \\mid z^t, u^t) = \\eta_t \\, p(z_t \\mid x_t) \\int p(x_t \\mid u_t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1}) \\, dx_{t-1}$   (1)\n\nwhere $\\eta_t$ is a normalization constant. To calculate this posterior, three probability distributions\nare required, which together are commonly referred to as the probabilistic model of\nthe dynamical system: (1) A measurement model $p(z_t \\mid x_t)$, which describes the probability\nof measuring $z_t$ when the system is in state $x_t$. (2) A control model $p(x_t \\mid u_t, x_{t-1})$, which\ncharacterizes the effect of controls $u_t$ on the system state by specifying the probability that\nthe system is in state $x_t$ after executing control $u_t$ in state $x_{t-1}$. (3) An initial state distribution\n$p(x_0)$, which specifies the user\u2019s knowledge about the initial system state. See [2, 5]\nfor examples of such models in practical applications.\n\nEqn. 1 is easily derived under the common assumption that the system is Markov:\n\n$p(x_t \\mid z^t, u^t)$\n$\\stackrel{\\mathrm{Bayes}}{=} \\eta_t \\, p(z_t \\mid x_t, z^{t-1}, u^t) \\, p(x_t \\mid z^{t-1}, u^t)$\n$\\stackrel{\\mathrm{Markov}}{=} \\eta_t \\, p(z_t \\mid x_t) \\, p(x_t \\mid z^{t-1}, u^t)$\n$= \\eta_t \\, p(z_t \\mid x_t) \\int p(x_t \\mid z^{t-1}, u^t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^t) \\, dx_{t-1}$\n$\\stackrel{\\mathrm{Markov}}{=} \\eta_t \\, p(z_t \\mid x_t) \\int p(x_t \\mid u_t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1}) \\, dx_{t-1}$   (2)\n\nNotice that this filter, in the general form stated here, is commonly known as a Bayes filter.\nApproximations to Bayes filters include the Kalman filter, the hidden Markov model,\nbinary filters, and of course particle filters. In many applications, the key concern in implementing\nthis probabilistic filter is the continuous nature of the states $x$, controls $u$, and\nmeasurements $z$. Even in discrete applications, the state space is often too large to compute\nthe entire posterior in reasonable time.\n\nThe particle filter addresses these concerns by approximating the posterior using sets of\nstate samples (particles):\n\n$X_t = \\{ x_t^{[i]} \\}_{i = 1, \\ldots, N}$   (3)\n\nThe set $X_t$ consists of $N$ particles $x_t^{[i]}$, for some large number $N$ (e.g., $N = 1{,}000$). Together,\nthese particles approximate the posterior $p(x_t \\mid z^t, u^t)$. The set $X_t$ is calculated recursively.\nInitially, at time $t = 0$, the particles $x_0^{[i]}$ are generated from the initial state distribution\n$p(x_0)$. The $t$-th particle set $X_t$ is then calculated recursively from $X_{t-1}$ as follows:\n\n1   set $X_t = X_t^{\\mathrm{aux}} = \\emptyset$\n2   for $j = 1$ to $N$ do\n3     pick the $j$-th sample $x_{t-1}^{[j]} \\in X_{t-1}$\n4     draw $x_t^{[j]} \\sim p(x_t \\mid u_t, x_{t-1}^{[j]})$\n5     set $w_t^{[j]} = p(z_t \\mid x_t^{[j]})$\n6     add $\\langle x_t^{[j]}, w_t^{[j]} \\rangle$ to $X_t^{\\mathrm{aux}}$\n7   endfor\n8   for $i = 1$ to $N$ do\n9     draw $x_t^{[i]}$ from $X_t^{\\mathrm{aux}}$ with probability proportional to $w_t^{[i]}$\n10    add $x_t^{[i]}$ to $X_t$\n11  endfor\n\nLines 2 through 7 generate a new set of particles that incorporates the control $u_t$. Lines\n8 through 11 apply a technique known as importance-weighted resampling [11] to account\nfor the measurement $z_t$. It is a well-known fact that (for large $N$) the resulting weighted\nparticles are asymptotically distributed according to the desired posterior [12].\n\nIn recent years, researchers have actively developed various extensions of the basic particle\nfilter, capable of coping with degenerate situations that are often relevant in practice [3, 7,\n8]. The common aim of this rich body of literature, however, is to generate samples from\nthe posterior $p(x_t \\mid z^t, u^t)$. If different controls at different states incur drastically different\ncosts, generating samples according to the posterior runs the risk of not capturing important\nevents that warrant action. Overcoming this deficiency is the very aim of this paper.\n\n3 Risk Sensitive Particle Filters\n\nThis section describes a modified particle filter that is sensitive to the risk arising from the\napproximate nature of the particle representation. To arrive at a notion of risk, our approach\nrequires a cost function\n\n$c(x, u) \\longrightarrow \\Re$   (4)\n\nThis function assigns real-valued costs to states and controls. From a decision theoretic\npoint of view, the goal of risk sensitive sampling is to generate particles that minimize\nthe cumulative increase in cost due to the particle approximation. To translate this into a\npractical algorithm, we extend the basic paradigm in two ways. First, we modify the basic\nparticle filter so that particles are generated in a risk-sensitive way, where the risk is a\nfunction of $c$. Second, an appropriate risk function is defined that approximates the cumulative\nexpected costs relative to tracking individual states. This risk function is calculated\nusing value iteration.\n\n3.1 Risk-Sensitive Sampling\n\nRisk-sensitive sampling generates particles factoring in a risk function, $r(x)$. Formally,\nall we have to ask of a risk function $r(x)$ is that it be positive and finite almost everywhere.\nNot all risk functions will be equally useful, however, so deriving the \u201cright\u201d risk function\nis important. Decision theory gives us a framework for deciding what the \u201cright\u201d action\nis in any given state. By considering approximation errors due to Monte Carlo sampling\nin decision theory and making a sequence of rough approximations, we can arrive at the\nchoice of $r(x)$, which is discussed further below. The full derivation is omitted for lack of\nspace. For now, let us simply assume we are given a suitable risk function.\n\nRisk sensitive particle filters generate samples that are distributed according to\n\n$\\gamma_t \\, r(x_t) \\, p(x_t \\mid z^t, u^t)$   (5)\n\nHere $\\gamma_t = \\left[ \\int r(x_t) \\, p(x_t \\mid z^t, u^t) \\, dx_t \\right]^{-1}$ is a normalization constant that ensures that the term in\n(5) is indeed a probability distribution. Thus, the probability that a state sample $x_t^{[i]}$ is part\nof $X_t$ is not only a function of its posterior probability, but also of the risk $r(x_t^{[i]})$ associated\nwith that sample.\n\nSampling from (5) is easily achieved by the following two modifications of the basic particle\nfilter algorithm. First, the initial set of particles $x_0^{[i]}$ is generated from the distribution\n\n$\\gamma_0 \\, r(x_0) \\, p(x_0)$   (6)\n\nSecond, Line 5 of the particle filter algorithm is replaced by the following assignment:\n\nset $w_t^{[j]} = \\frac{r(x_t^{[j]})}{r(x_{t-1}^{[j]})} \\, p(z_t \\mid x_t^{[j]})$   (7)\n\nWe conjecture that this simple modification results in a particle filter with samples distributed\naccording to $\\gamma_t \\, r(x_t) \\, p(x_t \\mid z^t, u^t)$. Our conjecture is obviously true for the base\ncase $t = 0$, since the risk function $r$ was explicitly incorporated in the construction of $X_0$\n(see eqn. 6). By induction, let us assume that the particles in $X_{t-1}$ are distributed\naccording to $\\gamma_{t-1} \\, r(x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1})$. Then Line 3 of the modified algorithm\ngenerates $x_{t-1}^{[j]} \\sim \\gamma_{t-1} \\, r(x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1})$. Line 4 gives us\n$x_t^{[j]} \\sim \\gamma_{t-1} \\, r(x_{t-1}) \\, p(x_t \\mid u_t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1})$. Samples generated in Line 9 are\ndistributed according to\n\n$\\gamma_{t-1} \\, w_t \\, r(x_{t-1}) \\, p(x_t \\mid u_t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1})$   (8)\n\nSubstituting in the modified weight (eqn. 7) we find the final sample distribution:\n\n$\\gamma_{t-1} \\, \\frac{r(x_t)}{r(x_{t-1})} \\, p(z_t \\mid x_t) \\, r(x_{t-1}) \\, p(x_t \\mid u_t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1})$\n$= \\gamma_{t-1} \\, r(x_t) \\, p(z_t \\mid x_t) \\, p(x_t \\mid u_t, x_{t-1}) \\, p(x_{t-1} \\mid z^{t-1}, u^{t-1})$   (9)\n\nThis term is, up to a normalization constant, equivalent to the desired distribution\n(5) (see also eqn. 1), which proves our conjecture. Thus, the risk sensitive particle\nfilter successfully generates samples from a distribution that factors in the risk $r$.\n\n3.2 The Risk Function\n\nThe remaining question is: What is an appropriate risk function $r$? How important is\nit to track a state $x$? 
Our approach rests on the assumption that there are two possible\nsituations, one in which the state is tracked well, and one in which the state is tracked\npoorly. In the first situation, we assume that any controller will basically choose the right\ncontrol, whereas in the second situation, it is reasonable to assume that controls are selected\nanywhere between random and in the worst possible way. To complete this model, we\nassume that with small probability, the state estimator might move from \u201cwell-tracked\u201d to\n\u201clost track\u201d and vice versa.\n\nThese assumptions are sufficient to formulate an MDP that models the effect of tracking\naccuracy on the expected costs. The MDP is defined over an augmented state space $\\langle x, k \\rangle$\n(see also [10]), where $k \\in \\{0, 1\\}$ is a binary state variable that models the event that the\nestimator tracks the state with sufficient ($k = 1$) or insufficient ($k = 0$) accuracy. The various\nprobabilities of the MDP are easily obtained from the known probability distributions via\nthe natural assumption that the variable $k$ is conditionally independent of the system state\n$x$:\n\n$p(x_0, k_0) = p(x_0) \\, p(k_0)$\n$p(x_t, k_t \\mid u_t, x_{t-1}, k_{t-1}) = p(x_t \\mid u_t, x_{t-1}) \\, p(k_t \\mid k_{t-1})$\n$p(z_t \\mid x_t, k_t) = p(z_t \\mid x_t)$\n$c(x_t, k_t, u_t) = c(x_t, u_t)$   (10)\n\nThe expressions on the left hand side define all necessary components of the augmented\nmodel. The only unspecified terms on the right hand side are the initial tracking probability\n$p(k_0)$ and the transition probabilities for the state estimator $p(k_t \\mid k_{t-1})$. The former must be\nset in accordance to the initial knowledge state (e.g., 1 if the initial system state is known, 0\nif it is unknown). For the latter, we adopt a model where with high likelihood the tracking\nstate is retained and with low likelihood it changes.\n\nThe MDP is solved via value iteration. To model the effect of poor tracking on the control\npolicy, our approach uses the following value iteration rule (stated here without discounting\nfor simplicity), in which $V$ denotes the value function, and $Q$ is an auxiliary variable:\n\n$V(x, k) = \\min_u Q(x, k, u)$   if $k = 1$\n$V(x, k) = \\alpha \\max_u Q(x, k, u) + (1 - \\alpha) \\, |U|^{-1} \\sum_u Q(x, k, u)$   if $k = 0$\n$Q(x, k, u) = c(x, u) + \\sum_{k'} \\int V(x', k') \\, p(x' \\mid u, x) \\, p(k' \\mid k) \\, dx'$   (11)\n\nThis value iteration rule considers two cases: When $k = 1$, i.e., the state is estimated sufficiently\naccurately, it is assumed that the controller acts by minimizing costs. If $k = 0$,\nhowever, the controller adopts a mixture of picking the worst possible control and a\nrandom control. These two options are traded off by the gain factor $\\alpha$, which controls the\n\u201cpessimism\u201d of the approach. $\\alpha = 1$ suggests that poor state estimation leads to the worst\npossible control. $\\alpha = 0$ is more optimistic, in that control is assumed to be random. Our\nexperiments have yielded somewhat indifferent results relative to the choice of $\\alpha$, and we\nuse the same value of $\\alpha$ for all experiments reported here.\n\nFinally, the risk $r$ is defined as the difference between the value function that arises from\naccurate versus inaccurate state estimation:\n\n$r(x) = V(x, k = 0) - V(x, k = 1)$   (12)\n\nUnder mild assumptions, $r(x)$ can be shown to be strictly positive.\n\n4 Experimental Results\n\nWe have applied our approach to two complementary real-world robotic domains: robot\nlocalization, and mobile robot diagnostics. 
Both yield superior results using our new risk\nsensitive approach when compared to the standard particle filter.\n\n4.1 Mobile Robot Localization\n\nOur first evaluation domain involves the problem of localizing a mobile robot from sensor\ndata [2]. In our experiments, we focused on the most difficult of all localization problems:\nthe kidnapped robot problem [4]. Here a well-localized robot is \u201ctele-ported\u201d to some\nunknown location and has to recover from this event. This problem plays an important\nrole in evaluating the robustness of a localization algorithm. Figure 1a shows the robot\nPearl, which has recently been deployed in an assisted living facility as an assistant to the\nelderly and cognitively frail. Our study is motivated by the fact that some of the robot\u2019s\noperational area is a densely cluttered dining room, where the robot is not allowed to cross\ncertain boundaries due to the danger of physically harming people. These boundaries are\nillustrated by the black contours shown in Figure 1b, which also depicts an occupancy grid\nmap of the facility. Beyond the boundaries, the robot\u2019s sensors are somewhat insufficient to\navoid collisions, since they can only sense obstacles at one specific height (34 cm).\n\nFigure 1: (a) Robot Pearl, as it interacts with elderly people at an assisted living facility in Oakmont,\nPA. (b) Occupancy grid map. Shown here are also three testing locations labeled A, B, and C, and\nregions of high costs (black contours).\n\nFigure 2: (a) Risk function $r$: the darker a location, the higher the risk. This function, which is\nused in the proposal distribution, is derived from the immediate risk function shown in Figure 1b. (b)\nSample of a uniform distribution, taking into consideration the risk function.\n\n                                               standard filter    risk sensitive filter\nsteps to re-localize when ported to A          120 \u00b1 13.7         89.3 \u00b1 12.3\nsteps to re-localize when ported to B          301 \u00b1 35.2         203 \u00b1 37.6\nsteps to re-localize when ported to C          63.2 \u00b1 6.2         53.2 \u00b1 7.7\nnumber of violations after global kidnapping   96.1 \u00b1 14.1        57.4 \u00b1 10.3\n\nTable 1: Localization results for the kidnapped robot problem, which emulates a total localization\nfailure. Our new approach requires consistently fewer steps for re-localization in high-cost areas, and\ntherefore incurs less cost.\n\nFigure 2a shows the risk function $r$, projected into 2D. The darker a location, the higher\nthe risk. A sample set drawn from this risk function is shown in Figure 2b. This sample\nset represents a uniform posterior. 
Since risk sensitive particle filters incorporate the risk\nfunction into the sampling process, however, the density of samples is proportional to the\nrisk function $r$.\n\nNumerical results are summarized in Table 1, using data collected in the facility at dinner\ntime. We ran two types of experiments: First, we kidnapped the robot to any of the locations\nmarked A, B, and C in Figure 1, and measured the number of sensor readings required to\nrecover from this global failure. All three locations are within the high-risk area so the\nrecovery time is significantly shorter than with plain particle filters. Second, we measured\nthe number of times a simple-minded planner that always looks at the most likely pose\nwould violate the safety constraint. Here we find that our approach is almost twice as\nsafe as the conventional particle filter, at virtually the same computational expense. All\nexperiments were repeated 20 times, and rely on real-world data and operating conditions.\n\nFigure 3: (a) The Hyperion rover, a mobile robot being developed at CMU. (b) Kinematic model. (c)\nRover position at time step 1, 10, 22 and 35.\n\nFigure 4: Tracking curves obtained with (a) plain particle filters, and (b) our new risk sensitive filter.\nThe bottom curves show the error, which is much smaller for our new approach.\n\n4.2 Mobile Robot Diagnosis\n\nIn some domains, particle filters simply cannot be applied in real time because of a large\nnumber of high loss and low probability events. One example is the fault detection domain\nillustrated in Figure 3. Our evaluation involves a data set where a rover is driven with a\nvariety of different control inputs in the normal operation mode. At the\ntime step,\nwheel #3 becomes stuck and locked against a rock. The wheel is then driven in the backward\ndirection, fixing the problem. 
The rover returns to the normal operation mode and\ncontinues to operate normally until the gear on wheel #4 breaks at the\ntime step. This\nfault is not recoverable and the controller just alters its input based on this state. Notice that\nboth failures lead to very similar sensor measurements, despite the fact that they are caused\nby quite different events.\n\nTracking results in Figure 4 show that our approach yields superior results to the standard\nparticle filter. Even though failures are very unlikely, our approach successfully identifies\nthem due to the high risk associated with such a failure while the plain particle filter essentially\nfails to do so. The estimation error is shown in the bottom row of Figure 4, which is\npractically zero for our approach when 1,000 or more samples are used. Vanilla particle\nfilters exhibit non-zero error even with 100,000 samples. However, it is important to notice\nthat these results were obtained using simulated data and a hand-tuned loss function\napproach.\n\n5 Discussion\n\nWe have proposed a particle filter algorithm that considers a cost model when generating\nsamples. The key idea is that particles are generated in proportion to their posterior likelihood\nand to the risk that arises relative to a control goal. An MDP algorithm was developed\nthat computes the risk function as a differential cumulative cost. Experimental results in\ntwo robotic domains show the superior performance of our new approach.\n\nAn alternative approach for solving the problem addressed in this paper would be to analyze\nthe estimation process as a partially observable Markov decision process (POMDP) [6].\nBounds on the performance loss due to the approximate nature of particle filters can be\nfound in [9]. 
Pursuing the problem of risk-sensitive particle generation within the POMDP\nframework might be a promising future line of research.\n\nAcknowledgment\n\nThe authors thank Dieter Fox and Wolfram Burgard, who generously provided some of the\nlocalization software on which this research is built. Financial support by DARPA (TMR,\nMARS, CoABS and MICA programs) and NSF (ITR, Robotics, and CAREER programs)\nis gratefully acknowledged.\n\nReferences\n[1] X. Boyen and D. Koller. Tractable inference for complex stochastic processes. In Proc. UAI-98.\n[2] F. Dellaert, D. Fox, W. Burgard, and S. Thrun. Monte Carlo localization for mobile robots. In Proc. ICRA-99.\n[3] A. Doucet, J.F.G. de Freitas, and N.J. Gordon, editors. Sequential Monte Carlo Methods In Practice. Springer, 2001.\n[4] S. Engelson. Passive Map Learning and Visual Place Recognition. PhD thesis, Computer Science Department, Yale University, 1994.\n[5] M. Isard and A. Blake. CONDENSATION: conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5\u201328, 1998.\n[6] L.P. Kaelbling, M.L. Littman, and A.R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99\u2013134, 1998.\n[7] J. Liu and R. Chen. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association, 93:1032\u20131044, 1998.\n[8] M. Pitt and N. Shephard. Filtering via simulation: auxiliary particle filter. Journal of the American Statistical Association, 94:590\u2013599, 1999.\n[9] P. Poupart, L.E. Ortiz, and C. Boutilier. Value-directed sampling methods for monitoring POMDPs. In Proc. UAI-2001.\n[10] N. Roy and S. Thrun. Coastal navigation with mobile robots. In Proc. NIPS-99.\n[11] D.B. Rubin. Using the SIR algorithm to simulate posterior distributions. In Bayesian Statistics 3. Oxford Univ. Press, 1988.\n[12] M.A. Tanner. Tools for Statistical Inference. 
Springer, 1996.", "award": [], "sourceid": 1948, "authors": [{"given_name": "Sebastian", "family_name": "Thrun", "institution": null}, {"given_name": "John", "family_name": "Langford", "institution": null}, {"given_name": "Vandi", "family_name": "Verma", "institution": null}]}