Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)


Authors

Chaobing Song, Yong Jiang, Yi Ma

Abstract

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In the general convex and smooth setting, VRADA attains an $O\big(\frac{1}{n}\big)$-accurate solution using $O(n\log\log n)$ stochastic gradient evaluations, where $n$ is the number of samples; this matches the lower bound for this setting up to a $\log\log n$ factor. In the strongly convex and smooth setting, VRADA matches the lower bound in the regime $n \le \Theta(\kappa)$, and improves the rate in the regime $n\gg \kappa$ to $O\big(n +\frac{n\log(1/\epsilon)}{\log(n/\kappa)}\big)$, where $\kappa$ is the condition number. Beyond improving the best known complexity results, VRADA admits a more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. Experiments on real datasets demonstrate the strong performance of VRADA over existing methods on large-scale machine learning problems.
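
As a rough illustration of how the strongly convex bound behaves (a worked instance of the abstract's formula, not an additional result), consider a well-conditioned problem with $\kappa = \Theta(1)$ and target accuracy $\epsilon = 1/n$, so that $n \gg \kappa$:

$$O\Big(n + \frac{n\log(1/\epsilon)}{\log(n/\kappa)}\Big) = O\Big(n + \frac{n\log n}{\log n}\Big) = O(n).$$

In this regime the $\log(1/\epsilon)$ factor is absorbed by the $\log(n/\kappa)$ denominator, which is in line with the $O(n\log\log n)$ evaluation count stated for an $O\big(\frac{1}{n}\big)$-accurate solution in the general convex setting.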