NeurIPS 2020

Deep active inference agents using Monte-Carlo methods

Meta Review

This submission presents a new model for active inference, a theory that combines action and perception into a single objective in the form of a free energy. The proposed approach combines several innovations that have not previously been applied to active inference problems, including a habitual network, the use of MC dropout to predict parameter belief updates, and a top-down mechanism that modulates the precision over state transitions. The authors evaluate the proposed model on a newly developed agent environment (Dynamic d-Sprites) and also in Animal AI. Reviewers had mixed opinions on this submission. Three reviewers, who were familiar with the literature on active inference, were positively predisposed. These reviewers noted that the paper is (comparatively) clearly written and makes a reasonable set of technical contributions, whilst acknowledging limitations of the approach. The main criticisms came from the fourth reviewer, who found the exposition difficult to follow without prior knowledge of the many Friston-group papers that are cited. More broadly, reviewers noted problems with notation, found the experimental results somewhat limited, and noted that comparisons to Deep RL baselines would be warranted (even if improved performance on RL tasks is not the primary intended contribution of this paper). All reviewers engaged in discussion after the author response. A substantial component of the discussion focused on whether the experiments include comparisons to baseline RL methods. The reviewers noted that model-based MCTS with only 7a is a reasonable proxy for existing RL methods. The AC would suggest that the authors point this out more explicitly. Reviewers are happy to hear that the authors are planning to include comparisons to DQN, A2C, and PPO in the camera-ready version. Note that the fourth reviewer adjusted their score after the discussion. Based on the reviews and discussion, this submission is just about above the bar for acceptance.
That said, having attempted to understand the proposed work, the AC agrees with the comments about clarity and notation. It is of course infeasible (and not expected) to summarize all relevant existing literature, but the authors should make some attempt at a self-contained exposition. More importantly, all notation that is introduced should be explicit, particularly when it comes to each of the densities in the objective. It is unclear how Q(s_t) and P(o_t, s_t ; θ) in Eq. (1) relate to Q(s_τ, θ | π) and P(o_τ, s_τ, θ | π). Similarly, it is not clear how P and Q in F_t relate to P and Q in G. Certain distributions, such as Q(o_τ, s_τ, θ | π), are never defined in terms of conditionals. Other distributions, such as the encoders, have inputs that should be made notationally explicit (e.g. write Q_φ(s_t | o_t) and Q_φ(a_t | s_t) and not Q_φ(s_t) and Q_φ(a_t)). Finally, the authors mix and match notation for parameterizations (i.e. Q_φ(…) vs P(… ; θ)). In addition to addressing these notational ambiguities, the AC would suggest that the authors begin by defining distributions P and Q over full trajectories (s_1, o_1, a_1, …, s_T, o_T), then express a free energy F in terms of this P and Q, then decompose this energy into a sum over time points F_t, and then explain how the prior over actions is defined in terms of G (and make it explicit how the distributions P and Q in G relate to the ones in F). This should make the exposition much easier to follow.
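To illustrate the kind of exposition suggested above, the following is a minimal sketch and not the authors' formulation. It assumes a first-order Markov factorization, treats θ as a point parameter rather than a random variable, assumes encoders of the form Q_φ(s_t | o_t) and Q_φ(a_t | s_t), and uses a softmax-of-negative-G action prior as one common choice in active inference. The argument structure of G and the conventions at t = 1 and t = T are likewise assumptions made only for this illustration.

```latex
% Sketch under stated assumptions: Markov structure, point-estimate \theta,
% mean-field encoders. Not taken from the submission.
\begin{align*}
% Generative model over a full trajectory
P(s_{1:T}, o_{1:T}, a_{1:T-1} ; \theta)
  &= P(s_1) \prod_{t=1}^{T} P(o_t \mid s_t ; \theta)
     \prod_{t=1}^{T-1} P(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t ; \theta), \\
% Variational distribution over the latent states and actions
Q_\phi(s_{1:T}, a_{1:T-1} \mid o_{1:T})
  &= \prod_{t=1}^{T} Q_\phi(s_t \mid o_t) \prod_{t=1}^{T-1} Q_\phi(a_t \mid s_t), \\
% Free energy of the trajectory
F &= \mathbb{E}_{Q_\phi}\!\left[
       \log Q_\phi(s_{1:T}, a_{1:T-1} \mid o_{1:T})
       - \log P(s_{1:T}, o_{1:T}, a_{1:T-1} ; \theta) \right]
   = \sum_{t=1}^{T} F_t, \\
% Per-time-step term; P(s_1 \mid s_0, a_0 ; \theta) is read as P(s_1),
% and the action term is dropped at t = T
F_t &= - \mathbb{E}_{Q_\phi(s_t \mid o_t)}\!\left[ \log P(o_t \mid s_t ; \theta) \right]
       + \mathbb{E}_{Q_\phi}\!\left[ \mathrm{KL}\!\left(
           Q_\phi(s_t \mid o_t) \,\middle\|\, P(s_t \mid s_{t-1}, a_{t-1} ; \theta) \right) \right] \\
    &\quad + \mathbb{E}_{Q_\phi(s_t \mid o_t)}\!\left[ \mathrm{KL}\!\left(
           Q_\phi(a_t \mid s_t) \,\middle\|\, P(a_t \mid s_t) \right) \right], \\
% One common choice of action prior (an assumption here)
P(a_t \mid s_t) &\propto \exp\!\left( - G(s_t, a_t) \right).
\end{align*}
```

With the densities in F fixed in this way, the predictive distributions that appear in G for future times τ > t could then be written out via the chain rule, e.g. Q(o_τ, s_τ, θ | π) = Q(θ | π) Q(s_τ | θ, π) Q(o_τ | s_τ, θ, π), with each factor tied explicitly to a transition, likelihood, or encoder term above.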