{"title": "Synthesizing Robust Plans under Incomplete Domain Models", "book": "Advances in Neural Information Processing Systems", "page_first": 2472, "page_last": 2480, "abstract": "Most current planners assume complete domain models and focus on generating correct plans. Unfortunately, domain modeling is a laborious and error-prone task, thus real world agents have to plan with incomplete domain models. While domain experts cannot guarantee completeness, often they are able to circumscribe the incompleteness of the model by providing annotations as to which parts of the domain model may be incomplete. In such cases, the goal should be to synthesize plans that are robust with respect to any known incompleteness of the domain. In this paper, we first introduce annotations expressing the knowledge of the domain incompleteness and formalize the notion of plan robustness with respect to an incomplete domain model. We then show an approach to compiling the problem of finding robust plans to the conformant probabilistic planning problem, and present experimental results with Probabilistic-FF planner.", "full_text": "Synthesizing Robust Plans under Incomplete Domain Models

Tuan A. Nguyen (Arizona State University, natuan@asu.edu), Subbarao Kambhampati (Arizona State University, rao@asu.edu), Minh Do (NASA Ames Research Center, minh.do@nasa.gov)

Abstract

Most current planners assume complete domain models and focus on generating correct plans. Unfortunately, domain modeling is a laborious and error-prone task, so real-world agents have to plan with incomplete domain models. While domain experts cannot guarantee completeness, they are often able to circumscribe the incompleteness of the model by providing annotations as to which parts of the domain model may be incomplete. In such cases, the goal should be to synthesize plans that are robust with respect to any known incompleteness of the domain.
In this paper, we first introduce annotations expressing the knowledge of the domain incompleteness and formalize the notion of plan robustness with respect to an incomplete domain model. We then show an approach to compiling the problem of finding robust plans to the conformant probabilistic planning problem, and present experimental results with the Probabilistic-FF planner.

1 Introduction

In the past several years, significant strides have been made in scaling up plan synthesis techniques. We now have technology to routinely generate plans with hundreds of actions. All this work, however, makes a crucial assumption: that the action models of an agent are completely known in advance. While there are domains where knowledge-engineering such detailed models is necessary and feasible (e.g., mission planning domains in NASA and factory-floor planning), it is increasingly recognized (c.f. [13]) that there are also many scenarios where insistence on correct and complete models renders the current planning technology unusable. The incompleteness in such cases arises because domain writers do not have full knowledge of the domain physics. One tempting idea is to wait until the models become complete, either by manual revision or by machine learning. Alas, the users often don't have the luxury of delaying their decision making. For example, although there exist efforts [1, 26] that attempt to either learn models from scratch or revise existing ones, their operation is contingent on the availability of successful plan traces, or access to execution experience.
There is thus a critical need for planning technology that can get by with partially specified domain models, and yet generate plans that are "robust" in the sense that they are likely to execute successfully in the real world.

This paper addresses the problem of formalizing the notion of plan robustness with respect to an incomplete domain model, and connects the problem of generating a robust plan under such a model to conformant probabilistic planning [15, 11, 2, 4]. Following Garland & Lesh [7], we shall assume that although the domain modelers cannot provide complete models, they are often able to provide annotations on the partial model circumscribing the places where it is incomplete. In our framework, these annotations consist of allowing actions to have possible preconditions and effects (in addition to the standard necessary preconditions and effects).

As an example, consider a variation of the Gripper domain, a well-known planning benchmark domain. The robot has one gripper that can be used to pick up balls, which are of two types, light and heavy, from one room and move them to another room. The modeler suspects that the gripper may have an internal problem, but this cannot be confirmed until the robot actually executes the plan. If it actually has the problem, the execution of the pick-up action succeeds only with balls that are not heavy; if it has no problem, it can always pick up balls of all types. The modeler can express this partial knowledge about the domain by annotating the action with a statement representing the possible precondition that balls should be light.

Incomplete domain models with such possible preconditions and effects implicitly define an exponential set of complete domain models, with the semantics that the real domain model is guaranteed to be one of these.
The robustness of a plan can now be formalized in terms of the cumulative probability mass of the complete domain models under which it succeeds. We propose an approach that compiles the problem of finding robust plans into the conformant probabilistic planning problem. We then present empirical results showing interesting relations between aspects such as the amount of domain incompleteness, solving time and plan quality.

2 Problem Formulation

We define an incomplete domain model D~ as D~ = <F, A>, where F = {p1, p2, ..., pm} is a set of propositions and A is a set of actions a, each of which might be incompletely specified. We denote by T and F the true and false truth values of propositions. A state s ⊆ F is a set of propositions. In addition to the proposition sets that are known to be its preconditions Pre(a) ⊆ F, add effects Add(a) ⊆ F and delete effects Del(a) ⊆ F, each action a ∈ A also contains the following annotations:

• Possible precondition set Pre~(a) ⊆ F \ Pre(a), containing propositions that action a might need as its preconditions.

• Possible add (delete) effect set Add~(a) ⊆ F \ Add(a) (Del~(a) ⊆ F \ Del(a)), containing propositions that the action a might add (delete, respectively) after its execution.

In addition, each possible precondition, add and delete effect p of the action a is associated with a weight w_a^pre(p), w_a^add(p) and w_a^del(p), respectively (each strictly between 0 and 1), representing the domain writer's assessment of the likelihood that p will actually be realized as a precondition, add or delete effect of a during plan execution. Possible preconditions and effects whose likelihood of realization is not given are assumed to have weights of 1/2.
Propositions that are not listed in those "possible lists" of an action are assumed not to affect, or be affected by, the action.[1]

Given an incomplete domain model D~, we define its completion set <<D~>> as the set of complete domain models whose actions have all the necessary preconditions, adds and deletes, and a subset of the possible preconditions, possible adds and possible deletes. Since any subset of Pre~(a), Add~(a) and Del~(a) can be realized as preconditions and effects of action a, there is an exponentially large number of possible complete domain models D_i ∈ <<D~>> = {D_1, D_2, ..., D_{2^K}}, where K = Σ_{a ∈ A} (|Pre~(a)| + |Add~(a)| + |Del~(a)|). For each complete model D_i, we denote the corresponding sets of realized preconditions and effects of each action a as Pre_i(a), Add_i(a) and Del_i(a); equivalently, its complete sets of preconditions and effects are Pre(a) ∪ Pre_i(a), Add(a) ∪ Add_i(a) and Del(a) ∪ Del_i(a).

The projection of a sequence of actions π from an initial state I according to an incomplete domain model D~ is defined in terms of the projections of π from I according to each complete domain model D_i ∈ <<D~>>:

γ(π, I, D~) = {γ(π, I, D_i) | D_i ∈ <<D~>>}    (1)

where the projection over complete models is defined in the usual STRIPS way, with one important difference. Specifically, the result of applying an action a, which is complete in D_i, in a state s is defined as follows:

γ(<a>, s, D_i) = (s \ (Del(a) ∪ Del_i(a))) ∪ (Add(a) ∪ Add_i(a))

if all preconditions of a are satisfied in s, and is taken to be s otherwise (rather than as an undefined state); in other words, actions in our setting have "soft" preconditions and thus are applicable in any state.
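As a concrete illustration (a minimal sketch of ours, not the authors' implementation; the set-based encoding of actions is an assumption), the projection rule for a single complete model D_i can be written as:

```python
from typing import FrozenSet, List, Tuple

# One *complete* action in some completion D_i:
# (Pre(a) ∪ Pre_i(a), Add(a) ∪ Add_i(a), Del(a) ∪ Del_i(a)).
Action = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]

def apply_action(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    """gamma(<a>, s, D_i): if the preconditions do not hold, the state is
    returned unchanged ("soft" preconditions) rather than being undefined."""
    pre, add, delete = action
    if pre <= state:
        return (state - delete) | add
    return state

def project(plan: List[Action], init: FrozenSet[str]) -> FrozenSet[str]:
    """gamma(pi, s, D_i): project a plan action by action under one model."""
    state = init
    for action in plan:
        state = apply_action(state, action)
    return state
```

With this semantics, an action whose preconditions do not hold is simply a no-op, so later actions in the plan may still achieve the goal.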
Such a generous execution semantics (GES) is critical from an application point of view: with incomplete models, failure of actions should be expected, and the plan needs to be "robustified" against them during synthesis. The GES facilitates this by ensuring that the plan as a whole does not have to fail if an individual action fails (without it, failing actions doom the plan and thus cannot be supplanted). The resulting state of applying a sequence of complete actions π = <a_1, ..., a_n> in s with respect to D_i is defined as:

γ(π, s, D_i) = γ(<a_n>, γ(<a_1, ..., a_{n-1}>, s, D_i), D_i).

A planning problem with incomplete domain D~ is P~ = <D~, I, G>, where I ⊆ F is the set of propositions that are true in the initial state (all others being false) and G is the set of goal propositions. An action sequence π is considered a valid plan for P~ if π solves the problem in at least one completion of <<D~>>; specifically, ∃ D_i ∈ <<D~>> : γ(π, I, D_i) |= G. Given that <<D~>> can be exponentially large in the number of possible preconditions and effects, validity is too weak a guarantee of plan quality. What we need is a notion that π succeeds in most of the highly likely completions of D~.

[1] Our incompleteness annotations can therefore also be used to model domains in which the domain writer can only provide the lists of known preconditions/effects of actions, optionally specifying those known not to be in the lists.
We do this in terms of a robustness measure, which will be presented in the next section.

Modeling assumptions underlying our formulation: From the modeling point of view, the possible precondition and effect sets can be specified at either the grounded-action or the action-schema level (and thus apply to all grounded actions sharing the same action schema). From a practical point of view, however, incompleteness annotations at the ground level hugely increase the burden on domain writers. In our formal treatment, we therefore assume that annotations are specified at the schema level.

[Figure 1: Description of the incomplete action schema pick-up in the Gripper domain.]

Since possible preconditions and effects can be represented as random variables, they can in principle be modeled using graphical models such as Markov Logic Networks and Bayesian Networks [14]. Though this appears to be an interesting technical challenge, it would require significant additional knowledge input from the domain writer, and is thus less likely to be helpful in practice. We therefore assume that the possible preconditions and effects are uncorrelated, and thus can be realized independently (both within each action schema and across different ones).

Example: Figure 1 shows the description of the incomplete action pick-up(?b - ball, ?r - room) as described above at the schema level. In addition to the possible precondition (light ?b) on the weight of the ball ?b, we also assume that since the modeler is unsure whether the gripper has been cleaned or not, she models this with a possible add effect (dirty ?b), indicating that the action might make the ball dirty.
Those two possible preconditions and effects can be realized independently, resulting in four possible candidate complete domains (assuming all other action schemas in the domain are completely described).

3 A Robustness Measure for Plans

The robustness of a plan π for the problem P~ = <D~, I, G> is defined as the cumulative probability mass of the completions of D~ under which π succeeds (in achieving the goals). More formally, let Pr(D_i) be the probability distribution representing the modeler's estimate of the probability that a given model in <<D~>> is the real model of the world (such that Σ_{D_i ∈ <<D~>>} Pr(D_i) = 1). The robustness of π is defined as follows:

R(π, P~) := Σ_{D_i ∈ <<D~>> : γ(π, I, D_i) |= G} Pr(D_i)    (2)

It is easy to see that if R(π, P~) > 0, then π is a valid plan for P~.

Note that, given the uncorrelated incompleteness assumption, the probability Pr(D_i) for a model D_i ∈ <<D~>> can be computed as the product of the weights w_a^pre(p), w_a^add(p) and w_a^del(p) for all a ∈ A and each of its possible preconditions/effects p that is realized in the model, and of their "complements" 1 − w_a^pre(p), 1 − w_a^add(p) and 1 − w_a^del(p) for each p that is not realized.

Example: Figure 2 shows an example with an incomplete domain model D~ = <F, A>, with F = {p1, p2, p3} and A = {a1, a2}, and a solution plan π = <a1, a2> for the problem P~ = <D~, I = {p2}, G = {p3}>. The incomplete model is: Pre(a1) = ∅, Pre~(a1) = {p1}, Add(a1) = {p2, p3}, Add~(a1) = ∅, Del(a1) = ∅, Del~(a1) = ∅; Pre(a2) = {p2}, Pre~(a2) = ∅, Add(a2) = ∅, Add~(a2) = {p3}, Del(a2) = ∅, Del~(a2) = {p1}.
Given that the total number of possible preconditions and effects is 3, the total number of completions (|<<D~>>|) is 2^3 = 8, and under each of these completions the plan π may succeed or fail to achieve G, as shown in the table. In the fifth candidate model, for instance, p1 and p3 are realized as a precondition of a1 and an add effect of a2 (respectively), whereas p1 is not realized as a delete effect of action a2. Even though a1 could not execute (and thus p3 remains false in the second state), the goal is eventually achieved by action a2 with respect to this candidate model. Overall, there are two of eight candidate models under which π fails and six under which it succeeds. The robustness value of the plan is R(π) = 3/4 if Pr(D_i) is the uniform distribution. However, if the domain writer thinks that p1 is very likely to be a precondition of a1 and provides w_a1^pre(p1) = 0.9, the robustness of π decreases to R(π) = 2 × (0.9 × 0.5 × 0.5) + 4 × (0.1 × 0.5 × 0.5) = 0.55 (intuitively, the four successful models in which p1 is not a precondition of a1 are now very unlikely to be the real one).

[Figure 2: Example of a set of complete candidate domain models and the corresponding plan status. Circles with solid and dashed boundaries are, respectively, propositions known to be T and propositions that might be F when the plan executes (see text).]
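To make the arithmetic in this example concrete, the following sketch (ours, not the authors' code; the variable names are illustrative) enumerates all 2^3 = 8 completions, projects the plan <a1, a2> under each with the generous execution semantics, and sums the probability mass of the successful ones:

```python
from itertools import product

# The three "possible" items of the example: p1 as a possible precondition of
# a1, p3 as a possible add effect of a2, p1 as a possible delete effect of a2.
POSSIBLES = ("pre_a1_p1", "add_a2_p3", "del_a2_p1")

def apply(state, pre, add, delete):
    # Generous execution semantics: a failed action leaves the state unchanged.
    return (state - delete) | add if pre <= state else state

def robustness(weights):
    """Sum Pr(D_i) over the completions D_i under which <a1, a2> reaches p3."""
    total = 0.0
    for bits in product((True, False), repeat=len(POSSIBLES)):
        realized = dict(zip(POSSIBLES, bits))
        prob = 1.0
        for name in POSSIBLES:
            w = weights[name]
            prob *= w if realized[name] else 1.0 - w
        # a1: Pre = {} plus possibly p1; Add = {p2, p3}; Del = {}
        pre_a1 = {"p1"} if realized["pre_a1_p1"] else set()
        s = apply({"p2"}, pre_a1, {"p2", "p3"}, set())
        # a2: Pre = {p2}; possibly Add = {p3}; possibly Del = {p1}
        add_a2 = {"p3"} if realized["add_a2_p3"] else set()
        del_a2 = {"p1"} if realized["del_a2_p1"] else set()
        s = apply(s, {"p2"}, add_a2, del_a2)
        if "p3" in s:
            total += prob
    return total

uniform = {name: 0.5 for name in POSSIBLES}
print(robustness(uniform))                        # 0.75
print(robustness({**uniform, "pre_a1_p1": 0.9}))  # ~0.55
```

The plan fails exactly when p1 is realized as a precondition of a1 (so a1 is a no-op) and p3 is not realized as an add effect of a2, matching the two failing models in the figure.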
Note that under the standard non-generous execution semantics (non-GES), where action failure causes plan failure, the plan π would mistakenly be considered to fail to achieve G in the first two complete models, since a2 is prevented from executing.

3.1 A Spectrum of Robust Planning Problems

Given this setup, we can now talk about a spectrum of problems related to planning under incomplete domain models:

Robustness Assessment (RA): Given a plan π for the problem P~, assess the robustness of π.

Maximally Robust Plan Generation (RG*): Given a problem P~, generate the maximally robust plan π*.

Generating a Plan with a Desired Level of Robustness (RG_ρ): Given a problem P~ and a robustness threshold ρ (0 < ρ ≤ 1), generate a plan π with robustness greater than or equal to ρ.

Cost-sensitive Robust Plan Generation (RG*_c): Given a problem P~ and a cost bound c, generate a plan π of maximal robustness subject to the cost bound c (where the cost of a plan π is defined as the cumulative cost of the actions in π).

Incremental Robustification (RI_c): Given a plan π for the problem P~, improve the robustness of π, subject to a cost budget c.

The problem of assessing the robustness of plans, RA, can be tackled by compiling it into a weighted model-counting problem. The following theorem shows that RA with a uniform distribution over candidate complete models is complete for the #P complexity class [22], and thus the robustness assessment problem is at least as hard as NP-complete problems.[2]

Theorem 1. The problem of assessing plan robustness with the uniform distribution of candidate complete models is #P-complete.

For plan synthesis problems, we can talk about either generating a maximally robust plan, RG*, or finding a plan with a robustness value above a given threshold, RG_ρ.
A related issue is that of the interaction between plan cost and robustness. Often, increasing robustness involves using additional (or costlier) actions to support the desired goals, and thus comes at the expense of increased plan cost. We can also talk about the cost-constrained robust plan generation problem RG*_c. Finally, in practice, we are often interested in increasing the robustness of a given plan (either during iterative search, or during mixed-initiative planning); we thus also have the incremental variant RI_c. In the next section, we focus on the problem of synthesizing plans with robustness of at least ρ.

[2] The proof is based on a counting reduction from the problem of counting the satisfying assignments of MONOTONE-2SAT [23]. We omit it due to the space limit.

4 Synthesizing Robust Plans

Given a planning problem P~ with an incomplete domain D~, the ultimate goal is to synthesize a plan having a desired level of robustness, or one with maximal robustness value. In this section, we show that the problem of generating a plan with robustness at least ρ (0 < ρ ≤ 1) can be compiled into an equivalent conformant probabilistic planning problem. The most robust plan can then be found with a sequence of increasing threshold values.

4.1 Conformant Probabilistic Planning

Following the formalism in [4], a domain in conformant probabilistic planning (CPP) is a tuple D' = <F', A'>, where F' and A' are the sets of propositions and probabilistic actions, respectively. A belief state b : 2^F' → [0, 1] is a distribution over states s ⊆ F' (we denote s ∈ b if b(s) > 0).
Each action a' ∈ A' is specified by a set of preconditions Pre(a') ⊆ F' and conditional effects E(a'). Each e = (cons(e), O(e)) ∈ E(a') has a condition set cons(e) ⊆ F' and a set of outcomes O(e); each outcome ε = (Pr(ε), add(ε), del(ε)) ∈ O(e) adds the proposition set add(ε) into, and deletes the set del(ε) from, the resulting state with probability Pr(ε) (0 ≤ Pr(ε) ≤ 1, Σ_{ε ∈ O(e)} Pr(ε) = 1). All condition sets of the effects in E(a') are assumed to be mutually exclusive and exhaustive. The action a' is applicable in a belief state b if Pre(a') ⊆ s for all s ∈ b, and the probability of a state s' in the resulting belief state is b_{a'}(s') = Σ_{s ⊇ Pre(a')} b(s) Σ_{ε ∈ O'(e)} Pr(ε), where e ∈ E(a') is the conditional effect such that cons(e) ⊆ s, and O'(e) ⊆ O(e) is the set of outcomes ε such that s' = (s ∪ add(ε)) \ del(ε).

Given the domain D', a problem P' is a quadruple P' = <D', b_I, G', ρ'>, where b_I is an initial belief state, G' is a set of goal propositions and ρ' is the acceptable goal satisfaction probability. A sequence of actions π' = (a'_1, ..., a'_n) is a solution plan for P' if each a'_i is applicable in the belief state b_i (with b_1 ≡ b_I), resulting in b_{i+1} (1 ≤ i ≤ n), and the sequence achieves all goal propositions with probability at least ρ'.

4.2 Compilation

Given an incomplete domain model D~ = <F, A> and a planning problem P~ = <D~, I, G>, we now describe a compilation that translates the problem of synthesizing a solution plan π for P~ such that R(π, P~) ≥ ρ into a CPP problem P'.
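The belief-state progression of Section 4.1 can be sketched as follows (our illustration of the semantics; actual CPP planners such as Probabilistic-FF use far more compact representations):

```python
from collections import defaultdict

# A belief state maps world states (frozensets of propositions) to probabilities.
# A conditional effect is a pair (condition set, outcomes), where outcomes is a
# list of (probability, add set, delete set) triples whose probabilities sum to 1.

def progress(belief, precond, effects):
    """Apply one probabilistic action to a belief state. The action is
    applicable only if its preconditions hold in every state of the belief;
    condition sets are assumed mutually exclusive and exhaustive, so each
    state fires exactly one conditional effect."""
    assert all(precond <= s for s in belief), "action not applicable"
    result = defaultdict(float)
    for s, p in belief.items():
        outcomes = next(o for c, o in effects if c <= s)
        for prob, add, delete in outcomes:
            result[(s | add) - delete] += p * prob
    return dict(result)

# A coin-flip style action: with probability 0.5 the proposition "g" is added.
b0 = {frozenset(): 1.0}
flip = [(frozenset(), [(0.5, frozenset({"g"}), frozenset()),
                       (0.5, frozenset(), frozenset())])]
b1 = progress(b0, frozenset(), flip)
print(b1[frozenset({"g"})])  # 0.5
```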
At a high level, the realization of possible preconditions p ∈ Pre~(a) and effects q ∈ Add~(a), r ∈ Del~(a) of an action a ∈ A can be understood as being determined by the truth values of hidden propositions p_a^pre, q_a^add and r_a^del that are certain (i.e., unchanged in any world state) but unknown. Specifically, the applicability of the action in a state s ⊆ F depends on the possible preconditions p that are realized (i.e., p_a^pre = T) and on their truth values in s. Similarly, the values of q and r are affected by a in the resulting state only if they are realized as add and delete effects of the action (i.e., q_a^add = T, r_a^del = T). There are in total 2^(|Pre~(a)| + |Add~(a)| + |Del~(a)|) realizations of the action a, and all of them should be considered simultaneously in checking the applicability of the action and in defining the corresponding resulting states.

With these observations, we use multiple conditional effects to compile away the incomplete knowledge about the preconditions and effects of the action a. Each conditional effect corresponds to one realization of the action: it can be fired only if p = T whenever p_a^pre = T, and it adds (removes) an effect q (r) into (from) the resulting state depending on the value of q_a^add (r_a^del, respectively) in the realization.

While the partial knowledge can be removed in this way, the hidden propositions introduce uncertainty into the initial state, making it a belief state. Since actions are always applicable in our formulation, resulting in either a new or the same successor state, the preconditions Pre(a) must be modeled as conditions of all conditional effects.
We are now ready to formally specify the resulting domain D' and problem P'.

For each action a ∈ A, we introduce new propositions p_a^pre, q_a^add, r_a^del and their negations np_a^pre, nq_a^add, nr_a^del for each p ∈ Pre~(a), q ∈ Add~(a) and r ∈ Del~(a), to determine whether they are realized as preconditions and effects of a in the real domain.[3] Let F_new be the set of those new propositions; then F' = F ∪ F_new is the proposition set of D'.

Each action a' ∈ A' is made from one action a ∈ A such that Pre(a') = ∅, and E(a') consists of 2^(|Pre~(a)| + |Add~(a)| + |Del~(a)|) conditional effects e. For each conditional effect e:

• cons(e) is the union of the following sets: (i) the certain preconditions Pre(a); (ii) the set of possible preconditions of a that are realized, together with the hidden propositions representing their realization: Pre_r(a) ∪ {p_a^pre | p ∈ Pre_r(a)} ∪ {np_a^pre | p ∈ Pre~(a) \ Pre_r(a)}; (iii) the sets of hidden propositions corresponding to the realization of the possible add (delete) effects of a: {q_a^add | q ∈ Add_r(a)} ∪ {nq_a^add | q ∈ Add~(a) \ Add_r(a)} ({r_a^del | r ∈ Del_r(a)} ∪ {nr_a^del | r ∈ Del~(a) \ Del_r(a)}, respectively);

• the single outcome ε of e is defined by add(ε) = Add(a) ∪ Add_r(a), del(ε) = Del(a) ∪ Del_r(a), and Pr(ε) = 1,

where Pre_r(a) ⊆ Pre~(a), Add_r(a) ⊆ Add~(a) and Del_r(a) ⊆ Del~(a) represent the sets of realized preconditions and effects of the action. In other words, we create a conditional effect for each subset of the union of the possible precondition and effect sets of the action a.
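The construction of E(a') can be sketched as follows (our illustrative encoding; the hidden-proposition naming scheme h_pre_*/nh_pre_* is an assumption, standing in for the paper's p_a^pre and np_a^pre):

```python
from itertools import chain, combinations

def powerset(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def compile_effects(pre, add, dele, poss_pre, poss_add, poss_del):
    """One conditional effect per realization of the possible preconditions
    and effects, guarded by hidden 'realization' propositions and their
    negations; each effect has a single outcome with probability 1."""
    effects = []
    for rp in map(set, powerset(poss_pre)):
        for ra in map(set, powerset(poss_add)):
            for rd in map(set, powerset(poss_del)):
                cond = set(pre) | rp  # certain plus realized preconditions
                cond |= {"h_pre_" + p for p in rp}
                cond |= {"nh_pre_" + p for p in set(poss_pre) - rp}
                cond |= {"h_add_" + q for q in ra}
                cond |= {"nh_add_" + q for q in set(poss_add) - ra}
                cond |= {"h_del_" + r for r in rd}
                cond |= {"nh_del_" + r for r in set(poss_del) - rd}
                effects.append((frozenset(cond),
                                [(1.0, frozenset(set(add) | ra), frozenset(set(dele) | rd))]))
    return effects

# pick-up from the Gripper example: one possible precondition, one possible add effect.
E = compile_effects(pre={"at-b-r"}, add={"holding-b"}, dele={"at-b-r"},
                    poss_pre={"light-b"}, poss_add={"dirty-b"}, poss_del=set())
print(len(E))  # 4 conditional effects, i.e. 2^(1+1+0)
```

Including a positive or negated hidden proposition for every possible precondition/effect is what makes the generated condition sets mutually exclusive.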
Note that the inclusion of the hidden propositions for the realized possible preconditions and effects, together with the negation propositions for the unrealized ones, makes all condition sets of the action a' mutually exclusive. In all remaining cases (including those in which some precondition in Pre(a) does not hold), the action has no effect on the resulting state, so these cases can be ignored; the condition sets are therefore also effectively exhaustive.

The initial belief state b_I consists of the states s' ⊆ F' such that p ∈ s' iff p ∈ I (∀p ∈ F), one for each truth assignment to the hidden propositions in F_new; each such state represents a complete domain model D_i ∈ <<D~>> and has probability Pr(D_i), as defined in Section 3. The specification of b_I includes simple Bayesian networks representing the relation between the variables in F_new, e.g. p_a^pre and np_a^pre, where the weights w(·) and 1 − w(·) are used to define the conditional probability tables. The goal is G' = G, and the acceptable goal satisfaction probability is ρ' = ρ. Theorem 2 shows the correctness of our compilation; it also shows that a plan for P~ with robustness at least ρ can be obtained directly from a solution of the compiled problem P'.

Theorem 2. Given a plan π = (a_1, ..., a_n) for the problem P~, let π' = (a'_1, ..., a'_n), where a'_k is the compiled version of a_k (1 ≤ k ≤ n) in P'. Then R(π, P~) ≥ ρ iff π' achieves all goals with probability at least ρ in P'.

4.3 Experimental Results

In this section, we discuss the results of the compilation with Probabilistic-FF (PFF) on variants of the Logistics and Satellite domains, where domain incompleteness is modeled on the preconditions and the effects of actions, respectively.
Our purpose here is to observe and explain how plan length and synthesis time vary with the amount of domain incompleteness and the robustness threshold.[4]

Logistics: In this domain, each of the two cities C1 and C2 has an airport and a downtown area. Transportation between the two distant cities can only be done by two airplanes A1 and A2. In the downtown area of Ci (i ∈ {1, 2}), there are three heavy containers Pi1, ..., Pi3 that can be moved to the airport by a truck Ti. Loading those containers onto the truck in the city Ci, however, requires moving a team of m robots Ri1, ..., Rim (m ≥ 1), initially located at the airport, to the downtown area. The source of incompleteness in this domain comes from the assumption that each pair of robots R1j and R2j (1 ≤ j ≤ m) is made by the same manufacturer Mj, and both might therefore fail to load a heavy container.[5] The actions loading containers onto trucks using robots made by a particular manufacturer (e.g., the action schema load-truck-with-robots-of-M1 using robots of manufacturer M1) therefore have a possible precondition requiring that containers not be heavy. To simplify the discussion (see below), we assume that robots of different manufacturers may fail to load heavy containers, though independently, with the same probability of 0.7. The goal is to transport all three containers in the city C1 to C2, and vice versa.

[3] These propositions are introduced once, and re-used for all actions sharing the same schema with a.
[4] The experiments were conducted using an Intel Core2 Duo 3.16GHz machine with 4Gb of RAM; the time limit is 15 minutes.
[5] The uncorrelated incompleteness assumption applies to possible preconditions of action schemas specified for different manufacturers. It should not be confused with robots R1j and R2j of the same manufacturer Mj failing independently; they do not.
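Under the uncorrelated incompleteness assumption, the probability that at least one of k different manufacturers' robots can load a heavy container is 1 − 0.7^k (each manufacturer fails independently with probability 0.7); a quick check of ours:

```python
def load_success_prob(k, p_fail=0.7):
    """Probability that the robots of at least one of k manufacturers succeed,
    when each manufacturer's robots fail jointly with probability p_fail."""
    return 1.0 - p_fail ** k

for k in range(1, 6):
    print(k, round(load_success_prob(k), 4))
# k = 1..5 -> 0.3, 0.51, 0.657, 0.7599, 0.8319
```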
For this domain, a plan to ship a container to another city involves a step of loading it onto the truck, which can be done by a robot (after moving the robot from the airport to the downtown area). Plans can be made more robust by using additional robots of different manufacturers after moving them into the downtown areas, at the cost of increased plan length.

Figure 3: The results of generating robust plans in the Logistics domain (plan length / solving time in seconds).

ρ     m=1       m=2       m=3        m=4        m=5
0.1   32/10.9   36/26.2   40/57.8    44/121.8   48/245.6
0.2   32/10.9   36/25.9   40/57.8    44/121.8   48/245.6
0.3   32/10.9   36/26.2   40/57.7    44/122.2   48/245.6
0.4   ⊥         42/42.1   50/107.9   58/252.8   66/551.4
0.5   ⊥         42/42.0   50/107.9   58/253.1   66/551.1
0.6   ⊥         ⊥         50/108.2   58/252.8   66/551.1
0.7   ⊥         ⊥         ⊥          58/253.1   66/551.6
0.8   ⊥         ⊥         ⊥          ⊥          66/550.9
0.9   ⊥         ⊥         ⊥          ⊥          ⊥

Satellite: In this domain, there are two satellites S1 and S2 orbiting the Earth, on each of which there are m instruments Li1, ..., Lim (i ∈ {1, 2}, m ≥ 1) used to take images of modes of interest at some direction in space. For each j ∈ {1, ..., m}, the lenses of the instruments Lij were made from a type of material Mj, which might have an error affecting the quality of the images that they take. If the material Mj actually has the error, all instruments Lij produce mangled images. The knowledge of this incompleteness is modeled as a possible add effect of the action taking images using instruments made from Mj (for instance, the action schema take-image-with-instruments-M1 using instruments of type M1), with a probability of pj, asserting that images taken might be in a bad condition. A typical plan to take an image using an instrument, e.g. L14 of type M4 on the satellite S1, is first to switch on L14, turn the satellite S1 to a ground direction from which L14 can be calibrated, and then take the image.
Plans can be made more robust by using additional instruments, which might be on a different satellite, but should be of a different type of material, and which can also take an image of the mode of interest at the same direction.

Figures 3 and 4 show, respectively, the results in the Logistics and Satellite domains with ρ ∈ {0.1, 0.2, ..., 0.9} and m ∈ {1, 2, ..., 5}. The number of complete domain models in the two domains is 2^m. For the Satellite domain, the probabilities pj range from 0.25, 0.3, ... to 0.45 as m increases from 1, 2, ... to 5. For each specific value of ρ and m, we report l/t, where l is the length of the plan and t is the running time (in seconds). Cases in which no plan is found within the time limit are denoted by "–", and those where it is provable that no plan with the desired robustness exists are denoted by "⊥".

Figure 4: The results of generating robust plans in the Satellite domain (plan length / solving time in seconds).

ρ     m=1      m=2       m=3        m=4        m=5
0.1   10/0.1   10/0.1    10/0.2     10/0.2     10/0.2
0.2   10/0.1   10/0.1    10/0.1     10/0.2     10/0.2
0.3   ⊥        10/0.1    10/0.1     10/0.2     10/0.2
0.4   ⊥        37/17.7   37/25.1    10/0.2     10/0.3
0.5   ⊥        ⊥         37/25.5    37/79.2    37/199.2
0.6   ⊥        ⊥         53/216.7   37/94.1    37/216.7
0.7   ⊥        ⊥         ⊥          53/462.0   –
0.8   ⊥        ⊥         ⊥          ⊥          –
0.9   ⊥        ⊥         ⊥          ⊥          ⊥

As the results indicate, for a fixed amount of domain incompleteness (represented by m), the solution plans in both domains tend to be longer for higher robustness thresholds ρ, and the time to synthesize plans also increases.
For instance, in Logistics with m = 5, the plan returned has 48 actions when ρ = 0.3, whereas a 66-action plan is needed when ρ increases to 0.4. On the other hand, we also note that more actions than needed are used in many solution plans. In the Logistics domain, specifically, it is easy to see that the probability of successfully loading a container onto a truck using robots of k (1 ≤ k ≤ m) different manufacturers is 1 − 0.7^k. However, robots of all five manufacturers are used in a plan when ρ = 0.4, whereas using those of three manufacturers is enough. The relaxation employed by PFF, which ignores all but one condition in the effects of actions, enables an upper-bound computation for plan robustness but is probably too strong, causing unnecessary increases in plan length.

Also, as we would expect, when the amount of domain incompleteness (i.e., m) increases, it takes longer to synthesize plans satisfying a fixed robustness value ρ. As an example, in the Satellite domain with ρ = 0.6, it takes 216.7 seconds to synthesize a 37-length plan when m = 5, whereas it takes only 94.1 seconds when m = 4. Two exceptions can be seen for m = 5 at ρ = 0.7 and 0.8, where no plan is found within the time limit, even though (for ρ = 0.7) a plan with robustness 0.7075 exists in the solution space. A probable explanation for this performance is the costly satisfiability tests and weighted model counting required to compute the resulting belief states during the search.

5 Related Work

There are currently very few research efforts in the automated planning literature that explicitly consider incompletely specified domain models. To the best of our knowledge, Garland and Lesh [7] were the first to discuss incomplete actions and the generation of robust plans under incomplete domain models.
Their notion of plan robustness, however, has only tenuous heuristic connections with the likelihood of successful execution of plans. Weber and Bryce [24] consider a model similar to ours but assume a non-GES formulation during plan synthesis: the plan fails if any action's preconditions are not satisfied. As we mentioned earlier, this semantics is significantly less helpful from an application point of view, and it is arguably easier. Indeed, their method for generating robust plans relies on the propagation of "reasons" for the failure of each action, assuming that every action preceding it executes successfully. Such a propagation is no longer applicable under GES. Morwood and Bryce [16] studied the problem of robustness assessment for the same incompleteness formulation in temporal planning domains, where plan robustness is defined as the number of complete models under which the temporal constraints are consistent. The work by Fox et al. [6] also explores the robustness of plans, but their focus is on temporal plans under unforeseen execution-time variations rather than on incompletely specified domains. Eiter et al. [5] introduce the language K for planning under incomplete knowledge. Their formulation, however, differs from ours in the type of incompleteness (world states vs. action models) and the notion of plans (secure/conformant plans vs. robust plans). Our work can also be categorized as one particular instance of the general model-lite planning problem, as defined in [13], in which the author points out a large class of applications where handling incomplete models is unavoidable due to the difficulty of obtaining a complete model.

As mentioned earlier, there are complementary approaches (cf. [1, 26]) that attempt to either learn models from scratch or revise existing ones, given access to successful plan traces or execution experience; the resulting models can then be used to solve new planning problems.
These works differ from ours in both the additional knowledge about the incomplete model (execution experience vs. incompleteness annotations) and the notion of solutions (correct with respect to the learned model vs. to candidate complete models).

Though not directly addressing a formulation like ours, the work on k-fault plans for non-deterministic planning [12] focuses on reducing the "faults" in plan execution. It is, however, based on the context of stochastic/non-deterministic actions rather than incompletely specified ones. The semantics of the possible preconditions/effects in our incomplete domain models fundamentally differs from that of non-deterministic and stochastic effects (cf. the work by Kushmerick et al. [15]). While the probability of success can be increased by repeatedly executing actions with stochastic effects, the consequence of unknown but deterministic effects is consistent across different executions.

In Markov Decision Processes (MDPs), a fairly rich body of work addresses imprecise transition probabilities [19, 25, 8, 17, 3, 21], using various ways to represent imprecision/incompleteness in the transition models. These works mainly seek max-min or min-max optimal policies, assuming that Nature acts optimally against the agent. Much of this work, however, is done at the atomic level, whereas we focus on factored planning models. Our incompleteness formulation can also be extended to agent modeling, a topic of interest in multi-agent systems (cf. [10, 9, 20, 18]).

6 Conclusion and Future Work

In this paper, we motivated the need for synthesizing robust plans under incomplete domain models. We introduced annotations for expressing domain incompleteness, formalized the notion of plan robustness, and showed an approach to compiling the problem of generating robust plans into conformant probabilistic planning.
We presented empirical results showing interesting relations among the amount of domain incompleteness, solving time, and plan quality. We are working on a direct approach that reasons about correctness constraints of plan prefixes and partial relaxed plans, contrasting it with our compilation method. We also plan to take successful plan traces as a second type of additional input for generating robust plans.

Acknowledgements: This research is supported in part by the ARO grant W911NF-13-1-0023, the ONR grants N00014-13-1-0176, N00014-09-1-0017 and N00014-07-1-1049, and the NSF grant IIS201330813.

References

[1] E. Amir and A. Chang. Learning partially observable deterministic action models. Journal of Artificial Intelligence Research, 33(1):349–402, 2008.

[2] D. Bryce, S. Kambhampati, and D. Smith. Sequential Monte Carlo in probabilistic planning reachability heuristics. In Proceedings of ICAPS 2006, 2006.

[3] K. Delgado, S. Sanner, and L. De Barros. Efficient solutions to factored MDPs with imprecise transition probabilities. Artificial Intelligence, 2011.

[4] C. Domshlak and J. Hoffmann. Probabilistic planning via heuristic forward search and weighted model counting. Journal of Artificial Intelligence Research, 30(1):565–620, 2007.

[5] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres. Planning under incomplete knowledge. In Computational Logic (CL 2000), pages 807–821, 2000.

[6] M. Fox, R. Howey, and D. Long. Exploration of the robustness of plans. In Proceedings of the National Conference on Artificial Intelligence, volume 21, page 834, 2006.

[7] A. Garland and N. Lesh. Plan evaluation with incomplete action descriptions. In Proceedings of the National Conference on Artificial Intelligence, pages 461–467, 2002.

[8] R. Givan, S. Leach, and T. Dean. Bounded-parameter Markov decision processes.
Artificial Intelligence, 122(1-2):71–109, 2000.

[9] P. Gmytrasiewicz and P. Doshi. A framework for sequential planning in multiagent settings. Journal of Artificial Intelligence Research, 24(1):49–79, 2005.

[10] P. Gmytrasiewicz and E. Durfee. Rational coordination in multi-agent environments. Autonomous Agents and Multi-Agent Systems, 3(4):319–350, 2000.

[11] N. Hyafil and F. Bacchus. Conformant probabilistic planning via CSPs. In Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling, pages 205–214, 2003.

[12] R. Jensen, M. Veloso, and R. Bryant. Fault tolerant planning: Toward probabilistic uncertainty models in symbolic non-deterministic planning. In Proceedings of the 14th International Conference on Automated Planning and Scheduling (ICAPS), volume 4, pages 235–344, 2004.

[13] S. Kambhampati. Model-lite planning for the web age masses: The challenges of planning with incomplete and evolving domain models. In Proceedings of the National Conference on Artificial Intelligence, volume 22, page 1601, 2007.

[14] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[15] N. Kushmerick, S. Hanks, and D. Weld. An algorithm for probabilistic planning. Artificial Intelligence, 76(1-2):239–286, 1995.

[16] D. Morwood and D. Bryce. Evaluating temporal plans in incomplete domains. In Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012.

[17] A. Nilim and L. Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, 2005.

[18] F. A. Oliehoek. Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments. PhD thesis, Informatics Institute, University of Amsterdam, Feb. 2010.

[19] J. Satia and R. Lave Jr.
Markovian decision processes with uncertain transition probabilities. Operations Research, pages 728–740, 1973.

[20] S. Seuken and S. Zilberstein. Formal models and algorithms for decentralized decision making under uncertainty. Autonomous Agents and Multi-Agent Systems, 17(2):190–250, 2008.

[21] A. Shapiro and A. Kleywegt. Minimax analysis of stochastic problems. Optimization Methods and Software, 17(3):523–542, 2002.

[22] L. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8(2):189–201, 1979.

[23] L. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410–421, 1979.

[24] C. Weber and D. Bryce. Planning and acting in incomplete domains. In Proceedings of ICAPS 2011, 2011.

[25] C. White III and H. Eldeib. Markov decision processes with imprecise transition probabilities. Operations Research, pages 739–749, 1994.

[26] Q. Yang, K. Wu, and Y. Jiang. Learning action models from plan examples using weighted MAX-SAT. Artificial Intelligence, 171(2):107–143, 2007.