Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This work proposes an optimization scheme for learning a curriculum over classes or training samples. The importance of each sample or class is captured by a learnable parameter that is optimized by gradient descent jointly with the network weights. The empirical results show that the scheme is particularly advantageous on noisy data. All reviewers find their concerns well addressed in the authors' response, and all consider the paper solid and interesting work.
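
To make the mechanism summarized above concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes one particular form for the per-sample parameter: a temperature `sigma_i = exp(log_sigma_i)` that divides the logit of sample `i`. The review does not state the form, so this is an assumption. The toy 1-D dataset, the linear model, the learning rates, and the names `train`, `lr_w`, and `lr_s` are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train(xs, ys, epochs=200, lr_w=0.1, lr_s=0.05):
    """Jointly fit a scalar weight w and one log-temperature per sample.

    Per-sample loss (assumed form): log(1 + exp(-m_i)),
    with margin m_i = y_i * w * x_i / sigma_i and sigma_i = exp(log_sigma_i).
    """
    n = len(xs)
    w = 0.0
    log_sigma = [0.0] * n  # sigma_i starts at 1 for every sample
    for _ in range(epochs):
        grad_w = 0.0
        grad_s = [0.0] * n
        for i in range(n):
            sigma = math.exp(log_sigma[i])
            m = ys[i] * w * xs[i] / sigma
            p = sigmoid(-m)  # equals -dL/dm
            # dL/dw = -p * y_i * x_i / sigma_i, averaged over samples
            grad_w += -p * ys[i] * xs[i] / sigma / n
            # dL/dlog_sigma_i = p * m, because dm/dlog_sigma_i = -m
            grad_s[i] = p * m
        # Both parameter groups take a gradient step in the same iteration.
        w -= lr_w * grad_w
        for i in range(n):
            log_sigma[i] -= lr_s * grad_s[i]
    return w, [math.exp(s) for s in log_sigma]


# Toy data: the label is the sign of x, except the last sample, which is mislabeled.
xs = [1.0, 2.0, 1.5, -1.0, -2.0, -1.5, 2.0]
ys = [1, 1, 1, -1, -1, -1, -1]

w, sigmas = train(xs, ys)
print("w =", w)
print("per-sample sigmas =", sigmas)
```

In this toy setting the mislabeled sample ends with a temperature above 1, which shrinks its contribution to the weight gradient, while the correctly labeled samples end below 1. This is one simple way a learned per-sample parameter can down-weight noisy data; the paper's actual parameterization and regularization may differ.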