This paper achieved a strong accept consensus. The paper puts forward a simple core idea, shows that it is helpful, and provides analysis and insights that justify how it works. The authors ground their approach firmly in theory. The method beats SOTA on various tasks. However, the reviewers reported poor language quality and some missing experimental details, and I encourage the authors to fix these, following the reviewers' recommendations, for the final version of the manuscript.