
Submitted by Assigned_Reviewer_7
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper proposes a new penalty term that can be used in learning a regression model; it's based on the idea that features are grouped into (possibly overlapping) groups and the penalty term discourages the use of nonzero weights for several features from the same group. The paper describes how to solve the resulting optimization problem and presents an evaluation on several synthetic and realistic datasets, demonstrating that the proposed approach works better than several alternatives. The paper is clearly written and well organized, and seems to present an interesting incremental improvement on various earlier LASSO methods (which are mentioned in the related work section).
Comments for the authors:
"exclusive group LASSO" [caption of Fig. 1]  should be "Exclusive" "Eq.(2)" [caption of Fig. 1]  space missing; there are several similar occurrences elsewhere
"G_g \in P({1, 2, ..., p}) denotes a set of group g" [page 2]  I found this sentence unclear. What exactly is the relationship between g and G_g? Earlier you say that a group g is a subset of {1, 2, ..., p}, and later you give an example where G_g = {1, 2, 4}, so it would appear that G_g is also a subset of {1, 2, ..., p}; what then is the difference between g and G_g at all?
"feature index u \in supp(g) will be nonzero" [page 3]  I found this sentence a bit unclear as well. What exactly is this feature index u, how is it chosen and what is its purpose?
On page 3, there are several instances where groups are described using square brackets: G_u = [1, 2], G_v = [2, 3] and so on. Why do you use brackets here whereas on page 2 you used curly braces (e.g. "G_g = {1, 2, 4}")? This makes it seem as if the groups were now realnumber intervals, but probably that's not intended here.
"is a diagonal matrices" [page 4]  should be "matrix"
I was slightly surprised when I saw that your experiments on realistic data all focus on using your improved LASSO (and other methods) for feature selection (and then applying SVM to train a classifier); when I read about the definition of LASSO in [21], I was under the impression that its original purpose was to improve the training of regression models, and your eq. (2) seemed to be along the same lines as well, so I thought that you would use some regression tasks to evaluate the method in your experiments. It might be useful to add some discussion on why you use the method only for feature selection (in the context of a classification task), and not for any regression tasks. Additionally, it might be good to add some of the more traditional featureselection methods for comparison (e.g. information gain etc.).
Above the first table on p. 7, you say "we list the mean and variance of classification accuracy", but it wasn't quite clear to me what the mean and variance are taken over. Did you perform some sort of crossvalidation or something like that?
Q2: Please summarize your review in 12 sentences
I think this is an interesting paper, a nice incremental improvement on existing work, it seems technically sound and presents a reasonable experimental evaluation. Submitted by Assigned_Reviewer_13
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The authors present a novel Lasso regressor formulation called “exclusive group LASSO” that is supposed to enforce the sparsity for features at an intragroup level. The method is to allow sparsity at intragroup level in the context of feature selection and not only intergroup like SoA groupLASSO. In this new formulation, In intragroup sparsity is obtained through a ℓ1 penalization, while intergroup nonsparsity with a ℓ2 penalization term. A second aspect of the paper is the proposition of a proved convex formulation of the associated Loss function. Finally, SoA synthetic and UCI dataset are used to illustrate the capability of the method. Quality : Sufficiently well written and illustrated. Clarity : The paper is quite readable and the propositions are formally justified. More challenging dataset would have been prefered than synthetic one perhaps. Originality and Signifiance : despite the large amount of already existed and studied LASSO algorithm, the exclusive group LASSO model and associated solving technique proposed in this paper is novel and justified enough to be signifiant.
Q2: Please summarize your review in 12 sentences
An interesting original formulation of a well known and studied approach of feature selection, group LASSO. The approach is well presented, justified and theoretically grounded. More challenging experiments would have been prefered. Submitted by Assigned_Reviewer_24
Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper presents a new formulation known as Exclusive Group LASSO for enforcing the sparsity for features in Group LASSO at an intragroup level. Both theoretical convergence analysis and empirical study are given. Q2: Please summarize your review in 12 sentences
The intragroup sparsity are shown to outperform traditional LASSO in several experiments, but this result may be dependent on how closely related these feature groups are. Thus a cauterization of the feature relations may be important to consider. Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however, that reviewers and area chairs are busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point. We thank all reviewers for their valuable comments and strong support.
To reviewer 1:
Thanks for your support on our paper. Your main concern is to include some experiments and comparisons on more ``challenging'' dataset. Due to space limitation, we only report a portion of our results. We will include more experiments (such as ANDI dataset, pie face image dataset) in the supplemental materials of our final version.
To reviewer 2:
Thanks for your strong support and insightful comment. We agree that the results depend on how closely related these feature groups, which has been partially indicated by the correlation graph in Fig. 2c. It is kind of an implicit assumption behind of our formulation (exclusive group LASSO). Thanks for you pointing out this. We will include more examples / graphs to illustrate this point clearer.
To reviewer 3:
We very appreciate your carefully reading. We have fixed typos you pointed out. A few confusions are clarified in the following: G_g is a subset of {1, 2, ..., p}, g=1,2,3, ..., G; G is the total number of the group. G_u = {1, 2}, G_v = {2, 3}, and so on. Square brackets should be brackets. From the definition of supp(g) in Page 2, any feature index u that falls in set supp(g) is corresponding to nonzero feature value.
We also tried experiment on regression tasks. The experiment results also suggest the advantage of the exclusive group lasso also over several other lasso type formulations. Due to space limitation, they were not shown in our original submission, but would be included in our final version (probably in the supplemental material). In addition, comparisons to some traditional feature selection methods (like information gain) would be added in our final version.
We use different methods to select features, and then feed the selected features to linear SVM. In classification task, we use fivefold cross validation, and list the mean and variance of classification accuracy in Page 7.
 