Jordan Pollack, Alan Blair
Although TD-Gammon is one of the major successes in machine learning, it has not led to similarly impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4000-parameter feed-forward neural network, without using back-propagation, reinforcement or temporal difference learning methods. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself than with the particular learning method employed.
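The core idea of hill-climbing in a relative fitness environment can be sketched as follows: a challenger is produced by perturbing the champion's weights, the two are compared by playing against each other, and the champion is nudged toward the challenger only if the challenger wins. The sketch below is purely illustrative, not the authors' implementation; the toy match function, the mutation scale `sigma`, and the 0.95/0.05 blending step are assumptions standing in for backgammon self-play.

```python
import random

def play_match(champ, challenger, n_games=50, rng=None):
    # Toy stand-in for backgammon: each "game" is won by whichever
    # parameter vector has the higher (noisy) total quality.
    rng = rng or random.Random(0)
    wins = 0
    for _ in range(n_games):
        if sum(challenger) + rng.gauss(0, 1) > sum(champ) + rng.gauss(0, 1):
            wins += 1
    return wins / n_games

def hill_climb(dim=10, steps=200, sigma=0.05, seed=1):
    rng = random.Random(seed)
    champ = [0.0] * dim  # initial weight vector
    for _ in range(steps):
        # Mutate the champion with Gaussian noise to get a challenger.
        challenger = [w + rng.gauss(0, sigma) for w in champ]
        # Relative fitness: the challenger is evaluated only against
        # the current champion, not against any fixed external target.
        if play_match(champ, challenger, rng=rng) > 0.5:
            # Move a small step toward the winner rather than replacing
            # the champion outright, damping the effect of lucky wins.
            champ = [0.95 * c + 0.05 * n for c, n in zip(champ, challenger)]
    return champ

weights = hill_climb()
```

Because fitness is defined only relative to the current champion, the evaluation criterion moves as the player improves, which is the co-evolutionary structure the abstract credits for TD-Gammon's success.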