NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:7788
Title:An Inexact Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints

This paper uses the Augmented Lagrangian method to solve optimization problems for a sum of functions f and g, where f is nonconvex and g is convex but 'proximal-friendly' subject to quite general nonlinear constraints. The proposed method solves the primal problem within some error epsilon_k that is gradually decreased as a penalty schedule beta_k is increasing across iterations. The approximate intermediate problems are solved using first order and second order solvers. The proposed analysis is technically non-trivial and interesting. The presentation of the paper was poor and at times confusing which made this a borderline paper. The algorithmic and convergence results are novel, but the applicability was not fully understood. This is because of the assumptions that the authors require for their framework and convergence analysis to work. Specifically, after discussions and considering the rebuttal of the authors, R4 is still concerned that the assumption (15) is strong. Further, it was confusing to us that the assumption depends on intermediate iterates of the proposed algorithm. We understand that the authors introduce beta_k to provide some slackness in the primal updates and the conditions like the equation in Algorithm 1, step 2 is valid. This could be achieved when the oracle is called sufficiently many times. The problem is that equation (15) is an assumption that one needs to verify for a specific problem. Gladly, the authors did indeed verify it for the cases when function g does not play a role (in appendix E/F), but not for a general function g (or at least an interesting sub-family). In Appendix B, they try to prove corollary 4.2, but they actually did not verify this assumption (15). As far as we understand, in appendix B the authors assume (15) is true (for some unspecified g), and just use the conclusion of Thm 4.1 to show that they could converge to the (epsilon_f,beta_k) approximation. However, in the statements of corollaries 4.2/4,3, this assumption (15) is not explicitly stated, which was confusing. The authors should discuss this point in more detail and make sure all the assumptions are clearly stated in the statement of every theorem or corollary, so that the reader does not have to dig into an appednix to understand how to use a corollary. In conclusion, we are sure there are some families of functions for which the claimed results hold. There are also some statements in the paper that we are not sure if they hold unless specific technical assumptions are true, but they are clarified in the appendices. Still, after discussions, we believe that the mathematical results that are clear (i.e. when g does not play a role) are still sufficiently interesting for publication as a poster. We strongly encourage the authors to clarify these technical issues in the final version of the paper..