Is Deeper Better only when Shallow is Good?
Decoupling "when to update" from "how to update"
Learning a Metric Embedding for Face Recognition using the Multibatch Method
Beyond Convexity: Stochastic Quasi-Convex Optimization
On the Computational Efficiency of Training Neural Networks
Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
More data speeds up training time in learning halfspaces over sparse vectors