This submission proposes a method for 3D reconstruction of objects within a given category from videos, without 3D supervision. It initially received four reviews with three positive scores and one negative score (7, 7, 6, 4). The reviewers highlighted the strong empirical results and the well-motivated, novel methodology. Their main concerns were (1) the absence of quantitative results on 3D reconstruction (only 2D metrics are reported), (2) the simplicity of the evaluation setting (only two object categories, with minor variations in shape and pose), and (3) the lack of ablations for the core contribution (temporal consistency) and of comparisons with simple temporal-smoothing baselines. The rebuttal addressed some of these concerns, after which the scores changed to (6, 7, 7, 4), with one reviewer raising their rating and another lowering theirs. The AC recommends accepting this submission as a poster and asks the authors to carefully revise the manuscript for the camera-ready version to address the reviewers' remaining concerns.