Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

Part of Advances in Neural Information Processing Systems 27 (NIPS 2014)

Bibtex »Metadata »Paper »Reviews »Supplemental »


Deanna Needell, Rachel Ward, Nati Srebro


We improve a recent gurantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence on average smoothness, dominating previous results, and more broadly discus how importance sampling for SGD can improve convergence also in other scenarios. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.