Regularizing toward permutation invariance is a simple alternative to explicitly building such an invariance into a model. The reviewers raised some good points, which should be addressed in the camera-ready version of the paper.

Most importantly, the main argument of the paper hinges on the assumption that DeepSets uses a particular reduction operator (namely, \sum), whereas the parity example could be handled trivially with an exclusive-or reduction. Put differently, circuit-complexity arguments are inherently tied to architectural assumptions, and the paper's claim that RNNs are more efficient rests on somewhat of a strawman. On the other hand, I disagree with the reviewer's argument regarding parallelization of DeepSets: even a fully parallel reduction still incurs at least logarithmic depth. (It would be interesting to explore whether the RNNs designed in this paper could be modified to perform an associative scan rather than a linear scan, and thereby achieve logarithmic complexity as well.)

The reviewers also questioned the limited experimental results. I consider this a significant concern and encourage the authors to address it in the camera ready, but in my opinion it does not necessarily preclude publication.

Finally, I would like to see more discussion/investigation of whether and where the regularized models fail to achieve permutation invariance. For large set functions, exponentially many samples would be needed to ensure that all invariances are checked, and presumably this is rarely achieved in practice. Do the learned models still show permutation invariance even on examples they never saw in training?
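To make the exclusive-or point concrete, here is a minimal sketch (plain Python, not any model from the paper): parity is a one-line permutation-invariant reduction when XOR is the pooling operator, since XOR is commutative and associative.

```python
from functools import reduce
import operator

def parity_xor(bits):
    """Parity of a collection of bits via an XOR reduction.

    XOR is commutative and associative, so the result is invariant
    to the order of the inputs -- illustrating that parity is trivial
    for a DeepSets-style architecture if the reduction operator is
    exclusive-or rather than sum.
    """
    return reduce(operator.xor, bits, 0)

# Any permutation of the same multiset gives the same parity.
assert parity_xor([1, 0, 1, 1]) == parity_xor([1, 1, 0, 1])
```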
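The logarithmic-depth point can also be sketched: any associative reduction (sum, XOR, max, ...) can be evaluated in O(log n) parallel rounds by combining pairs in a balanced tree. The function below is an illustrative sequential simulation of that schedule, not code from the paper.

```python
import operator

def tree_reduce(xs, op):
    """Reduce a non-empty sequence with an associative op.

    Each pass combines adjacent pairs, halving the number of
    elements; the number of passes (the parallel depth) is
    therefore O(log n), versus O(n) for a linear scan.
    """
    xs = list(xs)
    while len(xs) > 1:
        paired = [op(xs[i], xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:  # carry an unpaired trailing element forward
            paired.append(xs[-1])
        xs = paired
    return xs[0]

# Same result as a linear reduction, but in logarithmic depth.
assert tree_reduce([1, 2, 3, 4, 5], operator.add) == 15
```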