The paper aims to improve the robustness of neural network training. Specifically, the authors try to mitigate the poor generalization that can arise from naively training overparameterized neural networks on noisy data/labels. The authors make an interesting observation about the Jacobian matrix of neural networks: it is approximately low rank, with a few large singular values. They further claim that the error for clean labels mostly lies in the subspace corresponding to these large singular values, whereas for noisy labels it lies in the complementary subspace. The authors leverage this observation to propose a training method, called CRUST, that samples a subset of training points in every iteration such that the Jacobian associated with the sampled points forms a low-rank approximation of the original Jacobian matrix. To mitigate the effect of noisy-label points that are selected into the subset, the paper proposes using the mixup technique. The proposed method shows strong empirical performance. Reviewers found the low-rank Jacobian observation to be novel. Furthermore, the theoretical analysis (which significantly relaxes the assumption in ) is quite interesting. Overall, the reviewers reached a consensus to accept the paper, and thus I am happy to recommend acceptance to NeurIPS. However, the authors should thoroughly revise the manuscript for typographical and grammatical errors. They should also add the comparisons from the rebuttal to the main paper.
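
For readers less familiar with the low-rank Jacobian observation summarized above, the subspace split can be illustrated with a small NumPy sketch. This is not the authors' CRUST implementation, only a toy illustration under stated assumptions: the Jacobian is taken to have shape (n_samples, n_params), the matrix is synthetic, and the helper name `split_residual` and the rank `k` are hypothetical choices made here.

```python
import numpy as np

def split_residual(jacobian, residual, k):
    """Split a residual into parts inside and outside the top-k left singular subspace.

    Assumption: jacobian has shape (n_samples, n_params), so its left singular
    vectors live in the same space as the per-sample residual.
    """
    u, s, _ = np.linalg.svd(jacobian, full_matrices=False)
    top = u[:, :k]                      # directions of the k largest singular values
    inside = top @ (top.T @ residual)   # projection onto the top-k subspace
    outside = residual - inside         # component in the complementary subspace
    return inside, outside, s

rng = np.random.default_rng(0)
n, p, k = 50, 200, 3
# Synthetic, approximately low-rank "Jacobian": k strong directions plus small noise.
jac = rng.normal(size=(n, k)) @ rng.normal(size=(k, p)) + 0.01 * rng.normal(size=(n, p))
residual = rng.normal(size=n)           # stand-in for a label error vector

inside, outside, s = split_residual(jac, residual, k)
print("largest singular values:", s[: k + 2])
print("residual norm inside / outside:", np.linalg.norm(inside), np.linalg.norm(outside))
```

In this toy setting the singular values drop sharply after the first `k`, and the two residual components are orthogonal and sum back to the original residual. Whether clean-label error concentrates in the first component is the paper's claim and is not demonstrated by this sketch.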