NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
Originality: This work extends "Unsupervised learning of visual representations by solving jigsaw puzzles" to 3D point clouds. The design is natural, and the paper shows that the 3D design can be even simpler than its 2D counterpart, since trivial solutions do not easily appear. The paper also shows through various experiments that the learned features are helpful for different downstream tasks and generalize across learning frameworks. The overall contribution is valid yet somewhat incremental. The references are fine as far as I can tell. Quality: The paper is technically sound and provides enough technical detail. The paper requires prealigned Clarity: The paper is clearly written and easy to follow. Significance: Self-supervised learning on 3D point clouds is an important problem in 3D computer vision. This paper adapts an existing approach from the 2D image domain to 3D point clouds. The approach is simple and natural. The change from 2D images seems quite small, and the resulting improvement is marginal as well. The experiments are quite thorough, though, covering different 3D processing tasks and various baselines. Overall, the paper makes a valid yet limited improvement without many inspiring insights, which would limit its influence in the community.
Reviewer 2
Originality: This paper is a novel application of an existing method for 2D images [7,21] to an existing task (point cloud feature learning). Given the success of [21], one would expect it to work for 3D representations as well, where the spatial layout is equally or more important, and the results in this paper confirm this. The citations sufficiently cover related work. Quality: Most of the experimental results appear meaningful and support the claimed advantages of the method: it is architecture-agnostic, avoids a reconstruction metric, and helps supervised downstream tasks. However, the comparison to alternative methods in Table 1 is weakened by the fact that the model architectures used by the baseline methods are not mentioned. Given the significant gap between PointNet + Pre-training and DGCNN + Pre-training, I wonder how much of the improvement simply comes from a better architecture (DGCNN). For example, FoldingNet, which uses a PointNet-like architecture, is actually better than this method + PointNet, as shown in Table 1. There is also a minor problem: the claim on page 4, L 152, that "no limitation is needed on the receptive field size" is not supported by any analysis or results. It would be helpful to mention the receptive field sizes of the two base architectures studied in this paper. Clarity: The paper is well organized and provides enough detail to reproduce the results. Significance: This paper addresses an important problem with potentially big impact. However, I am not confident that it advances the state of the art, due to the concern stated above. Update after rebuttal: Thanks for providing the numbers I requested in the original review. I have changed my overall score accordingly.
Reviewer 3
This paper proposes a self-supervised method for learning representations from unlabelled point clouds. By randomly displacing the point clouds and training a network to reconstruct them, good feature representations can be learned, which can benefit downstream tasks. Self-supervised learning has been a hot topic in recent years. The authors extend the idea of [21] to 3D point clouds and show its effectiveness. The proposed method is simple and easy to implement. The main weakness is that the performance gain is limited according to Tables II and III.
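To make the pretext task summarized above concrete, here is a minimal sketch of one way the random displacement could be set up. It assumes the point cloud is normalized to the unit cube and divided into a k x k x k grid whose cells are shuffled, with each point labelled by its original cell so that a network could be trained to undo the displacement. The function name `shuffle_voxels`, the grid size `k`, and the unit-cube normalization are illustrative assumptions, not details taken from the paper or from these reviews.

```python
import random


def shuffle_voxels(points, k=3, seed=0):
    """Hypothetical helper: displace points by shuffling the cells of a k*k*k grid.

    Assumes every coordinate lies in [0, 1]. Returns (displaced, labels), where
    labels[i] is the index of the cell that point i came from, i.e. the target
    a network would be trained to predict.
    """
    rng = random.Random(seed)
    perm = list(range(k ** 3))
    rng.shuffle(perm)  # perm[v] is the slot that cell v is moved to

    displaced, labels = [], []
    for x, y, z in points:
        # Integer cell coordinates; clamp so a coordinate of exactly 1.0 stays in range.
        ix, iy, iz = (min(int(c * k), k - 1) for c in (x, y, z))
        src = (ix * k + iy) * k + iz
        dst = perm[src]
        jx, rem = divmod(dst, k * k)
        jy, jz = divmod(rem, k)
        # Translate the point by the offset between the two cell origins.
        displaced.append((x + (jx - ix) / k, y + (jy - iy) / k, z + (jz - iz) / k))
        labels.append(src)
    return displaced, labels
```

Under these assumptions, the displaced points would be the network input and the per-point labels the training target; points that start in the same cell move together, so local geometry is preserved while the global layout is scrambled.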