Part of Advances in Neural Information Processing Systems 22 (NIPS 2009)
Garvesh Raskutti, Bin Yu, Martin J. Wainwright
This paper uses information-theoretic techniques to determine minimax rates for estimating nonparametric sparse additive regression models under high-dimensional scaling. We assume an additive decomposition of the form $f^*(X_1, \ldots, X_p) = \sum_{j \in S} h_j(X_j)$, where each component function $h_j$ lies in some Hilbert space $\mathcal{H}$ and $S \subset \{1, \ldots, p\}$ is an unknown subset with cardinality $s = |S|$. Given $n$ i.i.d. observations of $f^*(X)$ corrupted by white Gaussian noise, where the covariate vectors $(X_1, X_2, \ldots, X_p)$ have i.i.d. components drawn from some distribution $\mathbb{P}$, we determine tight lower bounds on the minimax rate for estimating the regression function with respect to squared $L^2(\mathbb{P})$ error. The main result shows that the minimax rate is $\max\big(\tfrac{s \log (p/s)}{n},\; s\,\epsilon_n^2(\mathcal{H})\big)$. The first term reflects the difficulty of performing \emph{subset selection} and is independent of the Hilbert space $\mathcal{H}$; the second term $s\,\epsilon_n^2(\mathcal{H})$ is an \emph{$s$-dimensional estimation} term, depending only on the low dimension $s$ and not on the ambient dimension $p$, that captures the difficulty of estimating a sum of $s$ univariate functions in the Hilbert space $\mathcal{H}$. As a special case, if $\mathcal{H}$ corresponds to the $m$-th order Sobolev space $\mathcal{H}^m$ of functions that are $m$-times differentiable, the $s$-dimensional estimation term takes the form $s\,\epsilon_n^2(\mathcal{H}) \asymp s \, n^{-2m/(2m+1)}$. We compare these minimax rates with the rates achieved by an $\ell_1$-penalty based approach, and show that a certain $\ell_1$-based method achieves the minimax-optimal rate.
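To make the two regimes concrete, the display below combines the stated general rate with the Sobolev special case. Here $\widehat{f}$ denotes an arbitrary estimator and the maximum is taken over the sparse additive class described above; the crossover condition in the comment is a direct algebraic consequence of equating the two terms, not an additional claim from the abstract.

% Minimax rate for the m-th order Sobolev case, with the two terms labeled.
\[
  \min_{\widehat{f}} \; \max_{f^*} \;
    \mathbb{E}\, \bigl\| \widehat{f} - f^* \bigr\|_{L^2(\mathbb{P})}^2
  \;\asymp\;
  \max\Bigl\{ \underbrace{\tfrac{s \log(p/s)}{n}}_{\text{subset selection}},\;
              \underbrace{s\, n^{-2m/(2m+1)}}_{s\text{-dimensional estimation}} \Bigr\}.
\]
% Equating the two terms, s log(p/s)/n = s n^{-2m/(2m+1)} reduces to
% log(p/s) = n^{1/(2m+1)}: the subset-selection term dominates when
% log(p/s) >= n^{1/(2m+1)}, and the s-dimensional estimation term otherwise.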