NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 4162
Title: Competitive Gradient Descent


Overall, the reviewers appreciated the novelty and simplicity of the algorithm. The paper is also well written and the method is well motivated. In terms of content for revision: the lack of discussion around the Newton-style approximation and the dropping of the diagonal terms of the Jacobian was raised again in the discussions (see updated reviews). Additionally, we sought an expert reviewer during the discussions; the additional review is included below.

----

Additional review:

The paper proposes an interesting new method for solving games

  min_x F(x,y),  min_y G(x,y),

which is, in general, different from running online gradient descent on x and y in parallel, or from other existing methods. The idea is to form a bilinear approximation of F and G, solve for the equilibrium of the resulting game (with regularization), and then add the equilibrium (x^*, y^*) of this local game to the current iterate (x_t, y_t) to get the next iterate. It is an interesting and natural idea (a concrete sketch of the update is given at the end of this review).

The abstract oversells the result in the following ways:
- they claim their method avoids oscillatory behavior
- they claim their method does not need to adapt the step size

However:
- they only show *local convergence results*, and in particular their method may well oscillate if initialized away from the equilibrium (in contrast to other methods such as optimistic gradient descent or extragradient, which have been shown to have last-iterate convergence for convex-concave problems)
- they need to choose their eta appropriately for this to hold, as do other methods

Their empirical findings for bilinear games alpha*x*y with scalar x and y show better stability than other methods for a fixed learning rate as alpha is tuned higher. I would expect that as alpha is tuned higher their method eventually diverges.

Overall, I think it is an interesting proposal for a method to solve bilinear games, but there is a lot of overselling in the paper around non-oscillatory behavior and around the lack of need for hyper-parameter tuning. I would temper those claims and also try to prove global convergence results for convex-concave problems. The experiments are encouraging.
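For concreteness, here is a minimal sketch of the update described above, reconstructed from the review's own description for the zero-sum scalar bilinear game f(x, y) = alpha*x*y (so F = f and G = -f). The closed-form solution of the regularized local game, the function name cgd_step, and the numerical values of alpha and eta are my own illustrative assumptions, not taken from the paper.

# Minimal sketch of the proposed update on the zero-sum scalar bilinear
# game f(x, y) = alpha * x * y (x minimizes f, y maximizes f, i.e. G = -F).
# At the current iterate, the regularized bilinear local game is
#     min_dx   dx*f_x + dx*f_xy*dy + dx^2/(2*eta)
#     min_dy  -dy*f_y - dy*f_xy*dx + dy^2/(2*eta)
# and solving its two first-order conditions simultaneously gives the
# closed-form equilibrium (dx, dy) used below.

def cgd_step(x, y, alpha, eta):
    """One step of the competitive update on f(x, y) = alpha * x * y."""
    f_x, f_y = alpha * y, alpha * x   # first-order partial derivatives
    f_xy = alpha                      # mixed second derivative
    denom = 1.0 + (eta * f_xy) ** 2
    dx = -eta * (f_x + eta * f_xy * f_y) / denom
    dy = eta * (f_y - eta * f_xy * f_x) / denom
    return x + dx, y + dy

# Illustrative (hypothetical) values: on this exact bilinear game, for a
# fixed eta the iterates contract toward the equilibrium (0, 0) at every
# step, whereas simultaneous gradient descent spirals outward.
x, y, alpha, eta = 1.0, 1.0, 10.0, 0.2
for _ in range(50):
    x, y = cgd_step(x, y, alpha, eta)
print(x, y)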