Part of Advances in Neural Information Processing Systems 9 (NIPS 1996)
Eric Hansen, Andrew Barto, Shlomo Zilberstein
Closed-loop control relies on sensory feedback that is usually assumed to be free. But if sensing incurs a cost, it may be cost-effective to take sequences of actions in open-loop mode. We describe a reinforcement learning algorithm that learns to combine open-loop and closed-loop control when sensing incurs a cost. Although we assume reliable sensors, use of open-loop control means that actions must sometimes be taken when the current state of the controlled system is uncertain. This is a special case of the hidden-state problem in reinforcement learning, and to cope, our algorithm relies on short-term memory. The main result of the paper is a rule that significantly limits exploration of possible memory states by pruning memory states for which the estimated value of information is greater than its cost. We prove that this rule allows convergence to an optimal policy.
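The following is a minimal sketch, not the authors' implementation, of the kind of pruning test the abstract describes. It assumes a hypothetical setup in which a memory state is the pair (last sensed state, open-loop action sequence since then), the belief over the current state is propagated through a known transition model, and the value of information is estimated as the gap between acting after sensing and acting on the belief alone. All names (`belief`, `value_of_information`, `prune`, `SENSE_COST`) and the toy dynamics are illustrative assumptions, not from the paper.

```python
"""Sketch of pruning memory states whose estimated value of
information exceeds the sensing cost (illustrative only)."""
import numpy as np

N_STATES, N_ACTIONS = 4, 2
SENSE_COST = 0.1  # assumed fixed cost of one sensing operation
rng = np.random.default_rng(0)

# Toy known transition model P[a][s, s'] and a state-action value
# table Q[s, a]; random values stand in for learned estimates.
P = [rng.dirichlet(np.ones(N_STATES), size=N_STATES)
     for _ in range(N_ACTIONS)]
Q = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

def belief(last_state, action_seq):
    """Belief over the current state after executing action_seq
    open-loop from a known last_state."""
    b = np.zeros(N_STATES)
    b[last_state] = 1.0
    for a in action_seq:
        b = b @ P[a]  # propagate belief through the model
    return b

def value_of_information(b):
    """Estimated value of sensing in belief b: expected value of
    picking the best action for the true state, minus the value of
    the best single action chosen under uncertainty."""
    informed = np.dot(b, Q.max(axis=1))  # sense first, then act
    uninformed = (b @ Q).max()           # act on the belief alone
    return informed - uninformed

def prune(memory_states):
    """Keep only memory states worth extending in open-loop mode:
    where sensing would cost more than the information is worth."""
    return [(s, seq) for (s, seq) in memory_states
            if value_of_information(belief(s, seq)) <= SENSE_COST]

# Usage: enumerate open-loop extensions of length <= 2 from state 0
# and prune those at which an optimal controller would sense instead.
candidates = [(0, (a,)) for a in range(N_ACTIONS)]
candidates += [(0, (a, b)) for a in range(N_ACTIONS)
               for b in range(N_ACTIONS)]
print(prune(candidates))
```

The test keeps a memory state only when the estimated value of information does not exceed the sensing cost, matching the rule stated above: wherever sensing pays for itself, the open-loop sequence is not extended further.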