Paper ID: 4552

Title: Multiclass Learning from Contradictions

Originality: The proposed method is a novel combination of existing methods. This combination comes with theoretical guarantees and practical tools to deal with the obvious tuning complexity.

Quality: All claims and propositions are justified and detailed (if not in the paper, then in the supplementary material). I did not find flaws in them. The interest of incorporating universum examples is shown (already done in the binary case), and the motivation for adapting the framework to the multiclass case is established: forcing a universum example to be neutral for all classes makes sense. The paper is a complete piece of work concerning the proposed algorithm. However, I am less convinced by the experimental evaluation: the bibliography points to previous work on universum prescription (at least) that is not taken into account in the experimental part. The experiments only compare the proposed method to the ones it is built on (SVM and binary U-SVM), which is not enough: it is a good start but not a complete study. I could even suspect that the comparison would be in favor of universum prescription, surely in terms of complexity, and maybe in terms of accuracy too. An argument in favor of MU-SVM could be that it can easily be applied to small datasets, but avoiding the comparison is not a good option.

Clarity: The paper is clearly written and quite easy to follow. A few notations are missing, and some figures have insufficient resolution and look poor once printed.

Significance: This is the weak point of the paper. MU-SVM does not apply to large datasets (which could be a positive argument in some contexts), but even for small datasets it has many hyper-parameters and a high training complexity. Obviously the authors have worked hard on alleviating this problem, but I am not sure it will be enough in practice.

Details in the order of appearance in the manuscript:
eq. (2): \Lambda is not defined
eq. (3): \delta_{ij} is defined only for eq. (7)
l. 110: "thet the"
eq. (11): d?
l. 143: I did not catch the definition of z_i; what are the zeros?
sec. 3.4: are we sure that good parameters for M-SVM are good for MU-SVM?
l. 194: the assumptions are very restrictive... (but the experiments seem to validate the approximation)
l. 201: the Kronecker definition should have appeared earlier in the paper (l. 145)

AFTER REBUTTAL

This is an interesting problem setting. The paper covers a large amount of highly technical material and some proofs contain interesting original ideas. Furthermore, many experiments were run and I have the rare feeling that, contrary to many submissions, the results were truly honestly reported rather than cherry-picked. Overall, it is clear that the authors worked very hard on this paper and are mathematically competent.

However, more work is needed to make the paper meet NeurIPS's standards in terms of exposition: the paper is sloppily written, both in terms of grammar and in terms of mathematical accuracy. Most proofs and mathematical statements are littered with typos and the formulae are often awkward. Although I believe the proofs to be correct (in the sense that the flaws are not fatal) and it is in most cases possible to understand the proof after thinking about it for a long time, the mistakes severely impede understanding. The significance of the results in the experiments section does not appear to be very clearly explained either.

In conclusion, the topic and the amount of material covered are amply enough for an excellent submission. However, the work would definitely benefit from substantial revision. Taking everything into account, this is a borderline submission. I am also concerned that the authors did not seem to acknowledge the need for revision in the rebuttal phase. For the good of the community, I hope they will consider making improvements for the camera-ready version.

This article deals with multi-category pattern classification in the framework of universum learning. The authors introduce a new multi-class support vector machine which is an extension of the model of Crammer and Singer. For this machine, they establish a bound on the generalization performance and a model selection algorithm based on an extension of the span bound. Empirical results are provided. They are obtained on three well-known data sets.

This contribution introduces a relevant solution to the problem addressed. It seems to be technically sound, although I could not check all the proofs in detail. Regarding the originality, it is noteworthy that other multi-class extensions of the span bound can be found in the literature, for instance in the PhD manuscript of R. Bonidal.

The major choices made should be better justified. For instance, the M-SVM of Crammer and Singer is not the first option that comes to mind since its loss function is not Fisher consistent. The choice of the capacity measure (Natarajan dimension) is also unusual. Indeed, one would have expected to find a scale-sensitive measure, be it a Rademacher complexity, a metric entropy or a scale-sensitive combinatorial dimension (fat-shattering dimension of the class of margin functions...).

Some minor corrections should be made. For instance, the notation for the Kronecker product is used at line 145 but introduced at line 200. Below is a non-exhaustive list of typos.
Line 42: natarajan -> Natarajan
Line 47: propostion -> proposition
Line 110: the the -> the
Line 247: Table 2 show -> Table 2 shows
Line 249: upto -> up to

*** Update ***

I appreciated the answers to my comments. Rémi Bonidal's PhD manuscript can be found at the following address: http://docnum.univ-lorraine.fr/public/DDOC_T_2013_0066_BONIDAL.pdf

I think that the paper of Maximov and Reshetova is not technically sound (to say the least).