NeurIPS 2020

An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch

Meta Review

Summary: This paper proposes a new technique for transferring optimal policies obtained in a simulator to a real-world environment. The only difference between sim and real is in the state transition probabilities. The main idea is to learn an action grounding function that maps state-action pairs learned in simulation to modified actions that are executed in the real system. The authors notice that this problem is similar to a variant of imitation learning, where the imitator learns to match state trajectories (where the actions are unknown) demonstrated by an expert. Experiments are conducted on MuJoCo, where the "real" environment is obtained by modifying physical properties (such as mass and friction) from their values in simulation.

Pros:
- Nice connection between transfer learning and inverse reinforcement learning
- Theoretical guarantees about the optimality of the transfer

Cons:
- The use of the term "real" is misleading. There are no real experiments here, which makes the evaluation weak, especially since this is supposed to be about sim2real.

Discussion and decision: The discussion centered on the use of the term "real" in the paper, which is highly misleading. Any work on sim2real must be evaluated on real environments. A reviewer raised concerns about the similarities between the objective function in this paper and previous ones such as [43], but this work is concerned with transfer learning, not imitation learning as in previous works. The re-use of previous tools can be seen as an original application to other domains.

NeurIPS does not have a conditional-accept mechanism, but if this paper is accepted, the authors must remove the term "real" from the paper. The area chairs and all the reviewers find the use of this term inaccurate and highly misleading to researchers in robotics. I suggest using terms such as transfer learning, domain adaptation, or even sim2sim.
Potential sim2real experiments can be discussed and used as motivation for this work, but the paper cannot claim to propose a sim2real method without backing up the claim with actual sim2real experiments.