The initial evaluations of this paper were somewhat mixed. However, the authors' response answered many of the questions the reviewers had raised, including those regarding computational cost and the number of additional parameters required. I find that this work opens up new research avenues and should be published. I urge the authors to take the reviewers' comments into account in preparing their camera-ready version. In particular, I would suggest: 1) Adding a discussion of the computational complexity of the method (even if improvements are left to future work), 2) Mentioning and contrasting with the 2019/2020 work suggested by Reviewer #2 (papers in ICRA'19, ECCV'20 to appear, and a CVPR'20 workshop), 3) Including the new results (e.g., the table at the top of your response), 4) Adding a discussion of the number of required parameters, especially as it compares to other methods (this was clearly explained in your author response).

There is still one reviewer who feels relatively strongly that this manuscript is not yet ready to be published. My understanding is that the reviewer's main argument is that the field of continual learning has started to move away from multi-head setups, and should continue to do so. While I agree that practical settings are more likely to require single-head setups and that many recent methods study the single-head setting, I do not think it would be fair for the conference to reject this paper solely on this basis (e.g., papers with multi-head setups continue to be published elsewhere). Further, as pointed out by a reviewer in the discussion and by the authors in their response, many multi-head approaches can be adapted to the single-head setting, so the proposed approach is orthogonal to the question of single-head vs. multi-head. It may be good to say this explicitly in the paper.