NeurIPS 2020

Learning Representations from Audio-Visual Spatial Alignment


Meta Review

The paper received mixed reviews. The reviewers found the use of 360 audio for self-supervised representation learning very interesting. At the same time, they noted that the evaluation did not align well with spatial tasks, which is where the benefits of 360 audio would intuitively be expected to appear. The rebuttal appeared to misunderstand several of the points raised by the reviewers (see the individual reviews). In discussion, the reviewers were quite puzzled by the rebuttal, especially because it misrepresented the major points they had raised. Although the rebuttal left much to be desired in the view of both the reviewers and the AC, the reviewers generally agreed that the remaining concerns were not sufficient grounds to reject the paper. Given the other contributions of the paper noted by the reviewers, the AC recommends the paper for acceptance.