NeurIPS 2020

Throughput-Optimal Topology Design for Cross-Silo Federated Learning

Meta Review

The paper proposes methods for designing the communication graph for decentralized periodic averaging SGD (DPASGD) in the federated learning setting, focusing on reducing the per-iteration complexity (cycle time). The reviewers were very appreciative of the system and experimental design aspects of the paper, which account for various types of delays in realistic scenarios. I would like to thank the authors for their effort. The reviewers were quite engaged and have provided much useful feedback, and I hope it will be used to improve the paper. In particular, I would like to comment on a few points (please see the full reviews for details):

- Although the authors motivate the need to focus on cycle time over convergence rate in the introduction, based on the reviews I believe it would be useful to include this discussion explicitly as a highlighted paragraph or subsection (see also the comments by R2 on the digraph constraint).
- I would also encourage you to consider the title change suggested by R2 (or something similar), as the other reviewers and I agree that the current title is too generic.
- Technical terms such as edge-capacitated and node-capacitated networks and access delay are used quite prominently but are not precisely or formally defined. For a general audience and some in-field reviewers, this was confusing. Please organize these definitions more systematically.

Relating to the above, the AC had the following question, which was not clarified in the submission and which I hope the authors will address in the final version.

“My understanding is that in edge-capacitated networks the M/min(C_up/N_i, C_dn/N_j, A(i',j')) term is ignored, while in node-capacitated networks it cannot be. This would mean that the MSR and RING topologies suggested in Sec 3.1 for edge-capacitated networks should be optimal when bandwidth is *large* enough. However, Fig 3 and the discussion seem to suggest that these algorithms are more effective for smaller bandwidths. The discussion in lines 270-275 also suggests that RING is faster for slower networks. Isn't this contradictory?”
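
To make the question concrete, the following is a minimal numerical sketch of the transmission term M/min(C_up/N_i, C_dn/N_j, A(i',j')) quoted above. It is not the authors' implementation. It assumes N_i and N_j are the numbers of neighbors sharing the sender's uplink and the receiver's downlink, and the model size, latency, core bandwidth, and the helper names `transmission_time` and `link_delay` are all hypothetical values chosen only for illustration.

```python
# Hypothetical constants, chosen for illustration only (not from the paper).
MODEL_BITS = 40e6   # M: model size in bits
LATENCY_S = 0.05    # fixed link latency in seconds
CORE_BW = 1e9       # A(i',j'): available core bandwidth in bits/s


def transmission_time(model_bits, c_up, n_out, c_dn, n_in, core_bw):
    """Return M / min(C_up/N_i, C_dn/N_j, A(i',j')).

    Assumption: the uplink capacity c_up is shared equally among n_out
    neighbors and the downlink capacity c_dn among n_in neighbors.
    """
    return model_bits / min(c_up / n_out, c_dn / n_in, core_bw)


def link_delay(access_bw, degree):
    """Latency plus transmission time for symmetric access links.

    Assumption: sender and receiver have the same access capacity and
    the same degree, which is a simplification made for this sketch.
    """
    return LATENCY_S + transmission_time(
        MODEL_BITS, access_bw, degree, access_bw, degree, CORE_BW
    )


# Compare a low-degree node (as in a ring) with a high-degree node
# across a range of access bandwidths.
for access_bw in (10e6, 100e6, 1e9, 10e9):
    low = link_delay(access_bw, degree=1)
    high = link_delay(access_bw, degree=9)
    print(f"access={access_bw:.0e} b/s  degree 1: {low:.3f} s  "
          f"degree 9: {high:.3f} s  ratio: {high / low:.2f}")
```

Under these assumed numbers, the node degree affects the delay only while the access links are the bottleneck. Once the access capacity per neighbor exceeds the core bandwidth, the minimum is set by A(i',j') and the degree no longer matters. This may or may not match the regime definitions intended in the paper, which is why precise definitions of the two network types would help.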