Exploiting Numerical Sparsity for Efficient Learning : Faster Eigenvector Computation and Regression

Gupta, Neha; Sidford, Aaron

Exploiting Numerical Sparsity for Efficient Learning : Faster Eigenvector Computation and Regression

Neha Gupta, Aaron Sidford

Advances in Neural Information Processing Systems 31 (NeurIPS 2018)

Bibtex Metadata Paper Reviews Supplemental

Abstract

In this paper, we obtain improved running times for regression and top eigenvector computation for numerically sparse matrices. Given a data matrix $\mat{A} \in \R^{n \times d}$ where every row $a \in \R^d$ has $\|a\|_2^2 \leq L$ and numerical sparsity $\leq s$, i.e. $\|a\|_1^2 / \|a\|_2^2 \leq s$, we provide faster algorithms for these problems for many parameter settings. For top eigenvector computation, when $\gap > 0$ is the relative gap between the top two eigenvectors of $\mat{A}^\top \mat{A}$ and $r$ is the stable rank of $\mat{A}$ we obtain a running time of $\otilde(nd + r(s + \sqrt{r s}) / \gap^2)$ improving upon the previous best unaccelerated running time of $O(nd + r d / \gap^2)$. As $r \leq d$ and $s \leq d$ our algorithm everywhere improves or matches the previous bounds for all parameter settings. For regression, when $\mu > 0$ is the smallest eigenvalue of $\mat{A}^\top \mat{A}$ we obtain a running time of $\otilde(nd + (nL / \mu) \sqrt{s nL / \mu})$ improving upon the previous best unaccelerated running time of $\otilde(nd + n L d / \mu)$. This result expands when regression can be solved in nearly linear time from when $L/\mu = \otilde(1)$ to when $L / \mu = \otilde(d^{2/3} / (sn)^{1/3})$. Furthermore, we obtain similar improvements even when row norms and numerical sparsities are non-uniform and we show how to achieve even faster running times by accelerating using approximate proximal point \cite{frostig2015regularizing} / catalyst \cite{lin2015universal}. Our running times depend only on the size of the input and natural numerical measures of the matrix, i.e. eigenvalues and $\ell_p$ norms, making progress on a key open problem regarding optimal running times for efficient large-scale learning.

Abstract

Name Change Policy