Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)
Sebastian Thrun
We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sample-based version of nearest neighbor is used to generalize across states. Initial empirical results suggest that our approach works well in practical applications.
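As a rough illustration of the belief representation the abstract describes, the sketch below implements a generic particle-filter belief update: beliefs are represented by weighted samples (importance sampling), and belief propagation is approximated by simulating the transition model and reweighting by the observation likelihood. This is a minimal sketch under assumed interfaces; the function names, models, and parameters here are hypothetical placeholders, not the paper's implementation.

```python
import numpy as np

def belief_update(particles, action, observation,
                  transition_sample, obs_likelihood, rng):
    """One Monte Carlo belief-propagation step (hypothetical interface).

    particles: (n, d) array of state samples approximating the current belief.
    transition_sample(s, a, rng): samples a successor state s' ~ p(s' | s, a).
    obs_likelihood(o, s): returns p(o | s), used as an importance weight.
    """
    # Propagate each particle through the (stochastic) transition model.
    propagated = np.array([transition_sample(s, action, rng) for s in particles])

    # Weight each propagated particle by the observation likelihood
    # (importance sampling), then normalize.
    weights = np.array([obs_likelihood(observation, s) for s in propagated])
    weights /= weights.sum()

    # Resample with replacement to obtain an unweighted particle set
    # representing the posterior belief.
    idx = rng.choice(len(propagated), size=len(propagated), p=weights)
    return propagated[idx]

# Usage with a trivial 1-D linear-Gaussian model (purely illustrative):
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=(500, 1))
step = lambda s, a, rng: s + a + rng.normal(0.0, 0.1, size=s.shape)
lik = lambda o, s: np.exp(-0.5 * ((o - s[0]) / 0.2) ** 2)
particles = belief_update(particles, action=0.5, observation=0.4,
                          transition_sample=step, obs_likelihood=lik, rng=rng)
```

A particle set of this kind can serve as the belief state over which a value function is learned; because the state space is continuous, some form of generalization across belief states (the abstract uses a sample-based nearest-neighbor scheme) is needed on top of it.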