A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Suematsu, Nobuo; Hayashi, Akira

Advances in Neural Information Processing Systems 11 (NIPS 1998)

Abstract

We describe a Reinforcement Learning algorithm for partially observable environments using short-term memory, which we call BLHT. Since BLHT learns a stochastic model based on Bayesian Learning, the overfitting problem is reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges to one which provides the most accurate predictions of percepts and rewards, given short-term memory.
