NIPS 2018
Sun Dec 2nd through Sat the 8th, 2018 at Palais des Congrès de Montréal
Paper ID: 6832 Analysis of Krylov Subspace Solutions of Regularized Non-Convex Quadratic Problems

### Reviewer 1

This work provides a convergence rate for Krylov subspace solutions to the trust-region and cubic-regularization subproblems, which is of rising interest in solving nonconvex, nonlinear optimization problems recently. The paper is well organized and well-written overall. Below are some detailed comments. 1. Linear rate vs sublinear rate: for the general case (vs ‘hard case’) of subproblems, the authors provide two convergence rate, one is linear, and another sublinear. It seems like the convergence behavior is similar to the tail behavior of sub-exponential r.v., which has two different behaviors when t varies. Here, for better presentation of the result, it is better to present the convergence rates in a similar favor of tail behavior of sub-exponential r.v., that convergence rate of the Krylov subspace solution behaves like sublinear when t \ll \sqrt{kappa}, and linear when t >= \sqrt{kappa}, and make this point clear early in the remark of the result rather than in the experiment part. Otherwise, it is a little bit confusing why the author presents two convergence rate for the same algorithm, for which the linear rate is obviously better than sublinear for large t. 2. ”Hard case": "hard case" is one of the major challenges for solving the trust-region and cubic regularization subproblems. The general analysis of this work does not apply to the hard case’’ neither, and this work proposed a randomization approach to deal with hard case’’. With high probability, the work provides a sublinear convergence rate for the randomization approach. Since it is almost impossible to check the hard case a-priori, how does the proposed method work in practice? For running the trust-region method, do we need to do randomization for each subproblem? If so, does it imply that the convergence rate of Krylov subspace solutions is sublinear in practice? Does it possible to provide a linear rate result for the "hard case" using random perturbations or is there some fundamental challenge for the "hard case”? 3. For the experiment part, the authors need to show some experiments of the convergence behavior of the randomizing approach of the Krylov subspace solutions for the "hard cases".