NeurIPS 2020

What Makes for Good Views for Contrastive Learning?


Meta Review

The paper studies contrastive methods for self-supervised representation learning. It examines how multiple views of the same data are used to learn representations, and how the mutual information between these views affects downstream performance. The authors propose a theory that there is a sweet spot in the amount of mutual information shared between two views (neither too little nor too much) at which downstream performance is highest. They empirically verify this theory for two classes of views (patches and color channels). They also propose a method that simply combines existing augmentations from prior work and improves over them.

The paper was reviewed by four reviewers, all of whom were positive and recommended acceptance. The reviewers appreciated the theoretical basis of the paper and the resulting insight about the sweet spot for generalization, as well as the experiments presented to validate the theory. The reviewers raised many technical questions, which were satisfactorily addressed in the author rebuttal. In terms of weaknesses, some reviewers noted that the proposed theory offers limited actionable insight for developing new techniques, and that the paper makes a limited technical contribution in terms of algorithmic advances to contrastive learning. These limitations weaken its impact. Nevertheless, there was consensus that the paper deserves publication.
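
As a rough sketch of the sweet-spot idea discussed above (the notation here is illustrative rather than quoted from the paper): writing $v_1, v_2$ for two views of an input $x$ and $y$ for the downstream label, the paper's argument can be summarized as preferring views that share as little information as possible while still retaining the task-relevant signal,

\[
\min_{v_1, v_2} \; I(v_1; v_2) \quad \text{subject to} \quad I(v_1; y) = I(v_2; y) = I(x; y),
\]

so that downstream accuracy, plotted against $I(v_1; v_2)$, traces a reverse-U curve: too little shared information discards the task signal, while too much retains nuisance factors.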