How to Dynamically Merge Markov Decision Processes

Part of Advances in Neural Information Processing Systems 10 (NIPS 1997)

Bibtex Metadata Paper


Satinder Singh, David Cohn


We are frequently called upon to perform multiple tasks that com(cid:173) pete for our attention and resource. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dy(cid:173) namic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem.

Every day, we are faced with the problem of doing mUltiple tasks in parallel, each of which competes for our attention and resource. If we are running a job shop, we must decide which machines to allocate to which jobs, and in what order, so that no jobs miss their deadlines. If we are a mail delivery robot, we must find the intended recipients of the mail while simultaneously avoiding fixed obstacles (such as walls) and mobile obstacles (such as people), and still manage to keep ourselves sufficiently charged up. Frequently we know how to perform each task in isolation; this paper considers how we can take the information we have about the individual tasks and combine it to efficiently find an optimal solution for doing the entire set of tasks in parallel. More importantly, we describe a theoretically-sound algorithm for doing this merging dynamically; new tasks (such as a new job arrival at a job shop) can be assimilated online into the solution being found for the ongoing set of simultaneous tasks.