Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
After rebuttal: The authors addressed my concerns, explaining that their choice not to include a discussion of the modified algorithm was motivated by the aim of keeping the paper focused and concise. My recommendation remains accept (9).

The authors consider a version of the convex MF/dictionary learning problem appropriate for datasets with a cluster structure: they constrain the basis elements to be convex combinations of exemplar data points, explicitly encourage sparse combinations via l1 penalties on the coefficients, and design the algorithm to select the exemplar data points from the clusters so that the basis elements lie within the convex hull of the individual clusters. The problem addressed by the paper is well motivated, and the intuitions justifying the design of the algorithm are well explained. The novelty lies in the way the authors use the clustering structure assumed to be present in the dataset to show that their algorithm converges to a stationary point of the convex MF objective. I did not read the proofs in the supplement, but the techniques used seem likely to interest those designing similar non-convex stochastic online algorithms.
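The constraint described above can be made concrete with a minimal numpy sketch of the objective (an illustration only, not the authors' code; all dimensions, variable names, and the penalty weight `lam` are assumptions): the dictionary `D` is formed as convex combinations of exemplar columns of the data matrix, and the coefficients carry an l1 penalty.

```python
import numpy as np

# Illustrative setup: dictionary atoms constrained to be convex
# combinations of exemplar data points, with an l1-penalized code.
rng = np.random.default_rng(0)
n_features, n_samples, n_exemplars, n_atoms = 5, 20, 8, 3

X = rng.normal(size=(n_features, n_samples))   # data matrix
E = X[:, :n_exemplars]                         # exemplar data points (columns of X)
W = rng.random(size=(n_exemplars, n_atoms))
W /= W.sum(axis=0, keepdims=True)              # columns on the simplex -> convex combinations
D = E @ W                                      # dictionary atoms lie in conv(exemplars)
A = rng.normal(size=(n_atoms, n_samples))      # coefficient matrix (the "code")

lam = 0.1                                      # assumed l1 penalty weight
objective = 0.5 * np.linalg.norm(X - D @ A, "fro") ** 2 + lam * np.abs(A).sum()
```

Because each column of `W` is nonnegative and sums to one, every atom of `D` stays inside the convex hull of the exemplars, which is the geometric property the review highlights.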
The authors propose an online version of convex matrix factorization (MF), a problem that cannot be made online simply by adapting the seminal work of Mairal et al., 2000. They provide an unconstrained and a constrained version of online convex MF. The experiments demonstrate the advantages of convex MF over other MF variants, as well as of the online version of convex MF over the batch version. The technical parts of the paper could have been written more clearly for a wider audience, but they should be accessible to experts in the field. The experiments section is clearer and makes a strong case for the proposed method. In general, there is room for improvement in the structuring.
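For context, the classic online dictionary-learning recursion of Mairal et al. (which the paper adapts to the convex setting) maintains running sufficient statistics and updates the dictionary one atom at a time. The sketch below shows that recursion in plain numpy; it is a hedged illustration of the standard method, not the authors' convex variant, and for brevity it uses a least-squares code in place of the l1-regularized coding step.

```python
import numpy as np

# Mairal-style online dictionary learning (illustrative sketch).
rng = np.random.default_rng(1)
d, k, T = 6, 4, 50                    # assumed dimensions and stream length

D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0, keepdims=True)
A_stat = np.zeros((k, k))             # running sum of a a^T
B_stat = np.zeros((d, k))             # running sum of x a^T

for t in range(T):
    x = rng.normal(size=d)            # next sample from the stream
    a = np.linalg.lstsq(D, x, rcond=None)[0]  # code (l1 step omitted for brevity)
    A_stat += np.outer(a, a)
    B_stat += np.outer(x, a)
    # Block-coordinate dictionary update, atom by atom, with renormalization.
    for j in range(k):
        denom = A_stat[j, j]
        if denom > 1e-12:
            D[:, j] += (B_stat[:, j] - D @ A_stat[:, j]) / denom
            D[:, j] /= max(1.0, np.linalg.norm(D[:, j]))
```

The difficulty the review alludes to is that this recursion updates atoms freely in R^d, whereas the convex variant must keep atoms inside the convex hull of (a growing set of) data points, which is why a direct adaptation does not work.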
Overall, this paper is well written. The authors have demonstrated the effectiveness of the proposed algorithm through sufficient analysis. However, I have several concerns regarding the proposed algorithm. First, what is a reasonable choice of N in the initialization phase? You use N samples to construct the initial representative sets as well as the dictionary matrix. Since this is an online algorithm, I would expect the convergence bound to include N as a parameter, but I cannot see it in the theorem presented in the paper. Can you explain this? Second, the number of samples in each cluster is assumed to be constant over time. What is the intuition behind this? Does it mean that at initialization you already have enough samples to span the representation space? Third, for the restricted online cvxMF algorithm, you assume that each new sample is always assigned to the correct cluster, which I think is unrealistic in practice, especially since you only have a limited number of samples with which to initialize your K clusters. How can the assignment always be correct?
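The last concern can be illustrated with a toy example (hypothetical, not taken from the paper): under a simple nearest-centroid assignment rule, a new sample that truly belongs to one cluster but falls just past the midpoint between two estimated centroids gets assigned to the other one, so correct assignment cannot be guaranteed from finitely many initialization samples.

```python
import numpy as np

# Two estimated cluster centroids on a line (assumed positions).
centroids = np.array([[0.0, 0.0], [2.0, 0.0]])
# A new sample just past the midpoint between them.
x_new = np.array([1.1, 0.0])

dists = np.linalg.norm(centroids - x_new, axis=1)  # distances: 1.1 and 0.9
assigned = int(np.argmin(dists))                   # nearest-centroid rule picks cluster 1
```

Even if `x_new` was generated by cluster 0, the rule assigns it to cluster 1, which is exactly the failure mode the restricted algorithm's assumption rules out.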