Reviews: Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time

The paper presents a way to incorporate sparse routing networks into the Transformer architecture to reduce the computational cost of attention for long sequences. The reviewers acknowledge that the idea is novel, and the experiments suggest that the proposed architecture is potentially useful. However, the experiments do not demonstrate improved efficiency or accuracy on real-world tasks with long sequences, and a comparison with Transformer architectures that use sparse attention is lacking. On balance, I recommend acceptance as a poster.

Paper ID:	3591
Title:	Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time