NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:3426
Title:Copula Multi-label Learning

Reviewer 1

This paper proposes copula multi-label learning to explain the statistical properties of multi-label dependency. Copulas have been widely used to model multivariate data in many areas. To use a copula in multi-label learning, the paper first leverages the kernel trick to estimate the multi-label distribution. The proposed model is then estimated semiparametrically. A theoretical analysis is provided, including an error bound for the estimator. The experiments indicate that the proposed method performs better than existing methods.

Originality: The paper is original.

Quality: The paper appears to be of high quality and contains a substantial theoretical advance.

Clarity: The exposition is clear.

Significance: This work is likely to be of significance to the community. I suggest acceptance of the paper.

Reviewer 2

[Reply to authors' feedback] I thank the authors for their answers. After reading them and the other reviews, I maintain that while the approach is original and potentially interesting, I miss a deeper analysis of the joint model. In particular:

* There is no clear theoretical justification for developing a complex, complete joint model if the predictions then rely only on the marginal distributions (whose estimation does not require estimating the dependencies). In my view, the biggest benefit of having a full joint distribution is the ability to optimize arbitrary loss functions, a topic never touched upon in the paper.
* I miss a comparison, theoretical or empirical, with other approaches not based on copulas that also try to model the dependencies between labels completely, such as probabilistic chaining.

Originality: To my knowledge, using copulas to solve the global multi-label problem has not been done before, hence the paper can be considered original.

Quality: The submission appears to be correct, yet I honestly did not delve into the formulas, as the paper is quite heavy in notation, starting with the presentation of copulas, which is very technical and much more convoluted than the classical one. Being familiar with copulas, I could follow most of the explanations, but someone unfamiliar with them would probably have quite a hard time following the paper. It is also surprising that the authors try to identify the whole joint distribution only to make their predictions from the marginals of the labels (L140-L144), which in theory (though not in practice) is no different from binary relevance (BR). There is no discussion of optimizing particular loss functions (in particular the one used in the experiments), nor of the connection with probabilistic chaining, which in principle can minimize any given loss. It is somewhat disappointing to use a fancy model and then make predictions using only the marginals of the labels. What about comparing with methods that try to optimize such loss functions? The experiments are also a bit disappointing, in the sense that the consistent improvement observed is rather small and almost always within the standard deviation interval given in the results. A lot of effort is spent on showing good properties of the estimator; however, it is not clear to me whether a simple BR estimator does not also enjoy similar properties, at least unbiasedness and convergence of the marginal estimates.

Clarity: The paper is well written, but the mathematics is presented in a rather complex way. I think this could be simplified.
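
To make the remark about marginals versus the full joint concrete, here is a minimal sketch. The joint distribution and all function names are hypothetical and chosen only for illustration; nothing here is taken from the paper. It shows that thresholding the label marginals (which is what a BR-style prediction amounts to) and taking the mode of the joint can give different predictions, each better under a different loss.

```python
# Hypothetical joint distribution P(y1, y2) over two binary labels.
# The numbers are made up for illustration only.
joint = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.0, (1, 1): 0.3}


def marginal_prediction(joint, threshold=0.5):
    """Predict each label independently by thresholding its marginal P(y_k = 1)."""
    n_labels = len(next(iter(joint)))
    marginals = [
        sum(p for y, p in joint.items() if y[k] == 1) for k in range(n_labels)
    ]
    return tuple(int(m > threshold) for m in marginals)


def joint_mode(joint):
    """Predict the single most probable label vector under the joint."""
    return max(joint, key=joint.get)


def expected_hamming(joint, pred):
    """Expected fraction of mislabelled positions when predicting `pred`."""
    total = sum(
        p * sum(a != b for a, b in zip(y, pred)) for y, p in joint.items()
    )
    return total / len(pred)


def expected_subset_zero_one(joint, pred):
    """Expected subset 0/1 loss: probability that `pred` is not exactly right."""
    return 1.0 - joint.get(pred, 0.0)


by_marginals = marginal_prediction(joint)
by_mode = joint_mode(joint)

for name, pred in [("marginals", by_marginals), ("joint mode", by_mode)]:
    print(
        name,
        pred,
        "Hamming:", round(expected_hamming(joint, pred), 3),
        "subset 0/1:", round(expected_subset_zero_one(joint, pred), 3),
    )
```

In this toy example the marginal-based prediction has the lower expected Hamming loss, while the joint mode has the lower expected subset 0/1 loss, which is why access to the full joint matters mainly when the target loss is not decomposable over labels.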

Reviewer 3

Due to its wide range of applications, multi-label learning has become one of the most important research areas in machine learning. However, the statistical properties of existing models of multi-label dependency are not well understood. To provide a statistical understanding of multi-label learning, this paper proposes a new copula multi-label learning paradigm for modeling label and feature dependencies. A continuous distribution in the output space is first constructed by leveraging the kernel trick, and the proposed model is then estimated semiparametrically. Moreover, the paper shows that the proposed estimator is unbiased, consistent, and asymptotically normal, and it further provides a bound on the mean squared error of the estimator. The experiments validate the performance of the proposed method.
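
For readers less familiar with copulas, the classical statement behind this kind of dependency modeling is Sklar's theorem, sketched below. The notation (q variables, joint CDF F with marginals F_k, copula C with density c) is assumed here for illustration and is not necessarily the paper's.

```latex
% Sklar's theorem: any joint CDF F with marginals F_1, ..., F_q
% can be written through a copula C on [0,1]^q.
F(y_1, \dots, y_q) = C\bigl(F_1(y_1), \dots, F_q(y_q)\bigr)

% If the marginals are continuous, C is unique; when densities exist,
% the joint density factorizes into a dependence part and the marginals:
f(y_1, \dots, y_q) = c\bigl(F_1(y_1), \dots, F_q(y_q)\bigr) \prod_{k=1}^{q} f_k(y_k)
```

The factorization separates the marginals from the dependence structure. Because uniqueness of C requires continuous marginals, constructing a continuous distribution in the output space is a natural first step when the labels themselves are discrete.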