{"title": "Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms", "book": "Advances in Neural Information Processing Systems", "page_first": 7847, "page_last": 7857, "abstract": "The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels via integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but the theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but the sample complexity is a hurdle for applying such approaches---since the tensor methods hinge on the availability of third-order statistics that are hard to reliably estimate given limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the state-of-art algorithms under a variety of scenarios.", "full_text": "Crowdsourcing via Pairwise Co-occurrences:\n\nIdenti\ufb01ability and Algorithms\n\nSchool of Elect. Eng. & Computer Sci.\n\nSchool of Elect. Eng. 
& Computer Sci.\n\nShahana Ibrahim\n\nOregon State University\n\nCorvallis, OR 97331\n\nibrahish@oregonstate.edu\n\nNikos Kargas\n\nUniversity of Minnesota\nMinneapolis, MN 55455\n\nkaga005@umn.edu\n\nXiao Fu\u2217\n\nOregon State University\n\nCorvallis, OR 97331\n\nxiao.fu@oregonstate.edu\n\nKejun Huang\n\nUniversity of Florida\nGainesville, FL 32611\nkejun.huang@ufl.edu\n\nDepartment of Elect. & Computer Eng.\n\nDepartment of Computing & Info. Sci. & Eng.\n\nAbstract\n\nThe data deluge comes with high demands for data labeling. Crowdsourcing (or,\nmore generally, ensemble learning) techniques aim to produce accurate labels via\nintegrating noisy, non-expert labeling from annotators. The classic Dawid-Skene\nestimator and its accompanying expectation maximization (EM) algorithm have\nbeen widely used, but the theoretical properties are not fully understood. Tensor\nmethods were proposed to guarantee identi\ufb01cation of the Dawid-Skene model, but\nthe sample complexity is a hurdle for applying such approaches\u2014since the tensor\nmethods hinge on the availability of third-order statistics that are hard to reliably\nestimate given limited data. In this paper, we propose a framework using pairwise\nco-occurrences of the annotator responses, which naturally admits lower sample\ncomplexity. We show that the approach can identify the Dawid-Skene model under\nrealistic conditions. We propose an algebraic algorithm reminiscent of convex\ngeometry-based structured matrix factorization to solve the model identi\ufb01cation\nproblem ef\ufb01ciently, and an identi\ufb01ability-enhanced algorithm for handling more\nchallenging and critical scenarios. Experiments show that the proposed algorithms\noutperform the state-of-art algorithms under a variety of scenarios.\n\n1\n\nIntroduction\n\nBackground. The drastically increasing availability of data has successfully enabled many timely\napplications in machine learning and arti\ufb01cial intelligence. 
At the same time, most supervised\nlearning tasks, e.g., the core tasks in computer vision, natural language processing, and speech\nprocessing, heavily rely on labeled data. However, labeling data is not a trivial task\u2014it requires\neducated and knowledgeable annotators (which could be human workers or machine classi\ufb01ers),\nto work under a reliable way. More importantly, it needs an effective mechanism to integrate the\npossibly different labeling from multiple annotators. Techniques addressing this problem in machine\nlearning are called crowdsourcing [24] or more generally, ensemble learning [8].\n\n\u2217The work is supported in part by the National Science Foundation under projects ECCS 1808159 and NSF\nECCS 1608961, and by the Army Research Of\ufb01ce (ARO) under projects ARO W911NF-19-1-0247 and ARO\nW911NF-19-1-0407.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\fCrowdsourcing has a long history in machine learning, which can be traced back to the 1970s [6].\nMany models and methods have appeared since then [22, 23, 21, 34, 38, 28, 37]. Intuitively, if a\nnumber of reliable annotators label the same data samples, then majority voting among the annotators\nis expected to work well. However, in practice, not all the annotators are equally reliable\u2014e.g.,\ndifferent annotators could be specialized for recognizing different classes. In addition, not all the\nannotators are labeling all the data samples, since data samples are often dispatched to different groups\nof annotators in a certain way. 
Under such circumstances, majority voting is not very promising.\nA more sophisticate way is to treat the crowdsourcing problem as a model identi\ufb01cation problem.\nThe arguably most popular generative model in crowdsourcing is the Dawid-Skene model [6], where\nevery annotator is assigned with a \u2018confusion matrix\u2019 that decides the probability of an annotator\ngiving class label (cid:96) when the ground-truth label is g. If such confusion matrices and the probability\nmass function (PMF) of the ground-truth label can be identi\ufb01ed, then a maximum likelihood (ML) or\na maximum a posteriori (MAP) estimator for the true label of any given sample can be constructed.\nThe Dawid-Skene model is quite simple and succinct, and some of the model assumptions (e.g., the\nconditional independence of the annotator responses) are actually debatable. Nonetheless, this model\nhas been proven very useful in practice [31, 37, 14, 23, 28, 39].\nTheoretical aspects for the Dawid-Skene model, however, are less well understood. In particular, it\nhad been unclear if the model could be identi\ufb01ed via the accompanying expectation maximization\n(EM) algorithm proposed in the same paper [6], until some recent works addressing certain special\ncases [23]. The works in [37, 39] put forth tensor methods for learning the Dawid-Skene model.\nThese methods admit model identi\ufb01ability, and also can be used to effectively initialize the classic\nEM algorithm provably [39]. The challenge is that tensor methods utilize third-order statistics of the\ndata samples, which are rather hard to estimate reliably in practice given limited data [19].\nContributions. In this work, we propose an alternative for identifying the Dawid-Skene model,\nwithout using third-order statistics. 
Our approach is based on utilizing the pairwise co-occurrences\nof annotators\u2019 responses to data samples\u2014which are second-order statistics and thus are naturally\nmuch easier to estimate compared to the third-order ones. We show that, by judiciously combining\nthe co-occurrences between different annotator pairs, the confusion matrices and the ground-truth\nlabel\u2019s prior PMF can be provably identi\ufb01ed, under realistic conditions (e.g., when there exists a\nrelatively well-trained annotator among all annotators). This is reminiscent of nonnegative matrix\ntheory and convex geometry [13, 15]. Our approach is also naturally robust to spammers as well as\nscenarios where every annotator only labels partial data. We offer two algorithms under the same\nframework. The \ufb01rst algorithm is algebraic, and thus is ef\ufb01cient and suitable for handling very\nlarge-scale crowdsourcing problems. The second algorithm offers enhanced identi\ufb01ability guarantees,\nand is able to deal with more critical cases (e.g., when no highly reliable annotators exist), with the\nprice of using a computationally more involved iterative optimization algorithm. Experiments show\nthat both approaches outperform a number of competitive baselines.\n2 Background\nThe Dawid-Skene Model. Let us consider a dataset {fn}N\nn=1, where fn \u2208 Rd is a data sample (or,\nfeature vector) and N is the number of samples. Each fn belongs to one of K classes. Let yn be\nthe ground-truth label of the data sample fn. Suppose that there are M annotators who work on\nthe dataset {fn}N\nn=1 and provide labels. Let Xm(fn) represent the response of the annotator m to\nfn. Hence, Xm can be understood as a discrete random variable whose alphabet is {1, . . . , K}. In\ncrowdsourcing or ensemble learning, our goal is to estimate the true label corresponding to each item\nfn from the M annotator responses. 
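The Dawid-Skene generative process and the pairwise statistics described above can be sketched as follows. This is an illustrative simulation with hypothetical parameter values (prior `d`, confusion matrices `A1`, `A2`, sample count `N` are all made up for the sketch; the paper's own code is in Matlab). It draws conditionally independent annotator responses and checks that the empirical co-occurrence matrix approaches the model-implied second-order statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 50_000                      # classes, samples (hypothetical sizes)

# Prior PMF d and two diagonally dominant confusion matrices whose columns
# are the conditional PMFs Pr(X_m = . | Y = k); values are illustrative.
d = np.array([0.5, 0.3, 0.2])
def random_confusion(rng, K, acc=0.7):
    A = acc * np.eye(K) + (1 - acc) * rng.dirichlet(np.ones(K), size=K).T
    return A / A.sum(axis=0)          # re-normalize columns to sum to 1
A1, A2 = random_confusion(rng, K), random_confusion(rng, K)

# Ground-truth labels, then responses drawn independently given the label
# (inverse-CDF sampling, vectorized over samples).
y = rng.choice(K, size=N, p=d)
x1 = (rng.random(N)[:, None] > np.cumsum(A1[:, y], axis=0).T).sum(axis=1)
x2 = (rng.random(N)[:, None] > np.cumsum(A2[:, y], axis=0).T).sum(axis=1)

# Empirical pairwise co-occurrence Rhat[k1, k2] ~ Pr(X1 = k1, X2 = k2)
Rhat = np.zeros((K, K))
np.add.at(Rhat, (x1, x2), 1.0)
Rhat /= N

R = A1 @ np.diag(d) @ A2.T            # model: R_{1,2} = A_1 D A_2^T
```

For moderate `N` the entrywise error of `Rhat` is already small, which is the sample-complexity advantage of second-order statistics the paper exploits.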
Note that in a realistic scenario, an annotator will likely to only\nwork on part of the dataset, since having all annotators work on all the samples is much more costly.\nIn 1979, Dawid and Skene proposed an intuitively pleasing model for estimating the \u2018true response\u2019\nof the patients from recorded answers [6], which is essentially a crowdsourcing/ensemble learning\nproblem. This model has sparked a lot of interest in the machine learning community [31, 37, 14,\n23, 28, 39]. The Dawid-Skene model in essence is a naive Bayesian model [29]. In this model, the\nground-truth label of a data sample is a latent discrete random variable, Y , whose values are different\nclass indices. The ambient variables are the responses given by different annotators, denoted as\nX1, . . . , XM , where M is the number of annotators. The key assumption in the Dawid-Skene model\nis that given the ground-truth label, the responses of the annotators are conditionally independent.\nOf course, the Dawid-Skene model is a simpli\ufb01ed version of reality, but has been proven very\nuseful\u2014and it has been a workhorse for crowdsourcing since its proposal.\n\n2\n\n\fUnder the Dawid-Skene model, one can see that\n\nPr(X1 = k1, . . . , XM = kM ) =\n\nk=1\n\nm=1\n\nK(cid:88)\n\nM(cid:89)\n\nPr(Xm = km|Y = k)Pr(Y = k),\n\n(1)\n\nwhere k \u2208 {1, . . . , K} denotes the index of a given class, and km denotes the response of the m-th\nannotator. If one de\ufb01nes a series of matrices Am \u2208 RK\u00d7K and let\nA(km, k) := Pr(Xm = km|Y = k),\n\n(2)\nthen Am \u2208 RK\u00d7K can be understood as the \u2018confusion matrix\u2019 of annotator m: It contains all the\nconditional probabilities of annotator m labeling a given data sample as from class km while the\nground-truth label is k. Also de\ufb01ne a vector d \u2208 RK such that d(k) := Pr(Y = k); i.e., the prior\nPMF of the ground-truth label Y . Then the crowdsourcing problem boils down to estimating Am for\nm = 1, . . . 
, M and d.\nPrior Art. In the seminal paper [6], Dawid and Skene proposed an EM-based algorithm to estimate\nPr(Xm = km|Y = k) and Pr(Y = k). Their formulation is well-motivated from an ML viewpoint,\nbut also has some challenges. First, it is unknown if the model is identi\ufb01able, especially when there\nis a large number of unrecorded responses (i.e., missing values)\u2014but model identi\ufb01cation plays an\nessential role in such estimation problems [13]. Second, since the ML estimator is a nonconvex\noptimization criterion, the solution quality of the EM algorithm is not easy to characterize in general.\nMore recently, tensor methods were proposed to identify the Dawid-Skene model [39, 37]. Take the\nmost recent work in [37] as an example. The approach considers estimating the joint probability\nPr(Xi = ki, Xj = kj, X(cid:96) = k(cid:96)) for different triples i, j, (cid:96). Such joint PMFs can be regarded as third-\norder tensors, and the confusion matrices and the prior d are latent factors of these tensors. The upshot\nis that identi\ufb01ability of Am and d can be elegantly established leveraging tensor algebra [33, 25]. The\nchallenge, however, is that reliably estimating Pr(Xi = ki, Xj = kj, X(cid:96) = k(cid:96)) is quite hard, since it\nnormally needs a large number of annotator responses. Another tensor method in [39] judiciously\npartitions the data and works with group statistics between three groups, which is reminiscent of the\ngraph statistics proposed in [1]. The method is computationally more tractable, leveraging orthogonal\ntensor decomposition. Nevertheless, the challenge again lies in sample complexity: the group/graph\nstatistics are still third-order statistics.\n3 Proposed Approach\nIn this section, we propose a model identi\ufb01cation approach that only uses second-order statistics, in\nparticular, pairwise co-occurrences Pr(Xi = ki, Xj = kj).\nProblem Formulation. 
Let us consider the following pairwise joint PMF: Pr(Xm = km, X(cid:96) =\nk=1 Pr(Y = k)Pr(Xm = km|Y = k)Pr(X(cid:96) = k(cid:96)|Y = k). Letting Rm,(cid:96)(km, k(cid:96)) =\nPr(Xm = km, X(cid:96) = k(cid:96)), and using the matrix notations that we de\ufb01ned, we have Rm,(cid:96)(km, k(cid:96)) =\n\nk(cid:96)) = (cid:80)K\n(cid:80)K\nk=1 Pr(Y = k)Pr(Xm = km|Y = k)Pr(X(cid:96) = k(cid:96)|Y = k)\u2014or, in a more compact form:\n\nRm,(cid:96)(km, k(cid:96)) =\n\nd(k)Am(km, k)A(cid:96)(k(cid:96), k) \u21d0\u21d2 Rm,(cid:96) := AmDA(cid:62)\n(cid:96) ,\n\nwhere we have D = Diag(d), which is a diagonal matrix. Note that Am is a confusion matrix, i.e.,\nits columns are respectable probability measures. In addition, d is a prior PMF. Hence, we have\n\nk=1\n\n1(cid:62)Am = 1(cid:62), Am \u2265 0, \u2200 m,\n\n1(cid:62)d = 1, d \u2265 0.\n\n(3)\n\n(cid:80)\n\nIn practice, Rm,(cid:96)\u2019s are not available but can be estimated via sample averaging.\nSpeci\ufb01cally,\nif we are given the annotator\nI [Xm(fn) = km, X(cid:96)(fn) = k(cid:96)] , where Sm,(cid:96) is the index set of samples which\n1|Sm,(cid:96)|\nboth annotators m and (cid:96) have worked on. Here, I[\u00b7] is an indicator function: If the event E happens,\nthen I[E] = 1, and I[Ec] = 0 otherwise. It is readily seen that\n\nresponses Xm(fn),\n\nn\u2208Sm,(cid:96)\n\nthen (cid:98)Rm,(cid:96)(km, k(cid:96)) =\n\nE [I(Xm(fn) = km, X(cid:96)(fn) = k(cid:96))] = Rm,(cid:96)(km, k(cid:96)),\n\n(4)\nwhere the expectation is taken over data samples. Note that the sample complexity for reliably\nestimating Rm,(cid:96) is much lower relative to that of estimating Rm,n,(cid:96) [39, 1], and the latter is needed\n\n3\n\nK(cid:88)\n\n\fin tensor based methods, e.g., [37]. To be speci\ufb01c, to achieve |Rm,(cid:96)(km, k(cid:96)) \u2212 (cid:98)Rm,(cid:96)(km, k(cid:96))| \u2264 \u0001\nneeded. 
However, in order to attain the same accuracy for (cid:98)Rm,n,(cid:96)(km, kn, k(cid:96)), the number of joint\n\nwith a probability greater than 1 \u2212 \u03b4, O(\u0001\u22122(log 1\n\n\u03b4 )) joint responses from annotators m and (cid:96) are\n\nresponses from annotators m,n and (cid:96) is required to be atleast O(K\u0001\u22122(log K\nnumber of classes (also see supplementary materials Sec. J for a short discussion).\nAn Algebraic Algorithm. Assume that we have obtained Rm,(cid:96)\u2019s for different pairs of m, (cid:96). We now\nshow how to identify Am\u2019s and d from such second-order statistics. Let us take the estimation of\nAm as an illustrative example. First, we construct a matrix Zm as follows:\n\n\u03b4 )), where K is the\n\n(5)\nwhere mt (cid:54)= m for t = 1, . . . , T (m) denote the indices of annotators who have co-labeled data\nsamples with annotator m, and the integer T (m) denotes the number of such annotators. Due\nto the underlying model of Rm,(cid:96) in (3), we have Zm =\n=\n\n(cid:3) ,\n\nZm =(cid:2)Rm,m1, Rm,m2 , . . . , Rm,mT (m)\n(cid:104)\n(cid:105) \u2208 RK\u00d7KT (m). Let us de\ufb01ne H(cid:62)\n\n(cid:104)\n\nAmDA(cid:62)\nm =\n\nm1\n\n, . . . , AmDA(cid:62)\n, . . . , DA(cid:62)\n\nDA(cid:62)\n\nT (m)\n\nDA(cid:62)\n\n, . . . , DA(cid:62)\n\nm1\n\nT (m)\n\nAm\nRK\u00d7KT (m). This leads to the model Zm = AmH(cid:62)\nm. We propose to identify Am from Zm. The\nkey enabling postulate is that, among all annotators, some A(cid:96)\u2019s should be diagonally dominant\u2014if\nthere exist annotators who are reasonably trained. In other words, for a reasonable annotator (cid:96),\nPr(X(cid:96) = j|Y = j) should be greater than Pr(X(cid:96) = j|Y = k) and Pr(X(cid:96) = j|Y = i) for k, i (cid:54)= j.\nTo see the intuition of the algorithm, consider an ideal case where for each class k, there exists an\nannotator mt(k) \u2208 {m1, . . . 
, mT (m)} such that\n\nT (m)\n\nm1\n\nPr(Xmt(k) = k|Y = k) = 1, Pr(Xmt(k) = k|Y = j) = 0,\n\nj (cid:54)= k.\n\n(6)\n\n(cid:105)\n(cid:105) \u2208\n\n(cid:104)\n\nThis physically means that annotator mt(k) is very good at recognizing class k and never confuses\nother classes with class k. Under such circumstances, one can use the following procedure to\nidentify Am. First, let us normalize the columns of Zm via Zm(:, q) = Zm(:, q)/(cid:107)Zm(:, q)(cid:107)1 for\nq = {1, . . . , KT (m)}. This way, we have a normalized model Zm = AmH\n\n(cid:62)\nm, where\nHm(q, :)(cid:107)Am(:, k)(cid:107)1\n\nAm(:, k)\n(cid:107)Am(:, k)(cid:107)1\n\nAm(:, k) =\n\n(7)\nwhere the second equality above is because (cid:107)Am(:, k)(cid:107)1 = 1 [cf. Eq. (3)]. After normalization, it\ncan be veri\ufb01ed that\n\n= Am(:, k), H m(q, :) =\n\n(cid:107)Zm(:, q)(cid:107)1\n\n.\n\n(8)\ni.e., all the rows of H m reside in the (K \u2212 1)-probability simplex. In addition, by the assumption\nin (6), it is readily seen that there exists \u039bq = {q1, . . . , qK} \u2282 {1, . . . , Lm} where Lm = KT (m)\nsuch that\n\nH m1 = 1, H m \u2265 0,\n\nH m(\u039bq, :) = IK,\n\n(9)\n\ni.e., an identity matrix is a submatrix of H m (after proper row permutations). Consequently, we have\nAm = Zm(:, \u039bq)\u2014i.e., Am can be identi\ufb01ed from Zm up to column permutations. The task also\nboils down to identifying \u039bq. This turns out to be a well-studied task in the context of separable\nnonnegative matrix factorization [16, 15, 13], and an algebraic algorithm exists:\n\n\u22a5(cid:98)Am(:,1:k\u22121)Zm(:, q)\nwhere (cid:98)Am(:, 1 : k \u2212 1) = [Zm(:,(cid:98)q1), . . . 
, Zm(:,(cid:98)qk\u22121)] and P \u22a5(cid:98)Am(:,1:k\u22121) is a projector onto the orthogonal\ncomplement of range((cid:98)Am(:, 1 : k \u2212 1)) and we let P \u22a5(cid:98)Am(:,1:0) := I.\n\nq\u2208{1,...,Lm}\n\n, \u2200k.\n\n(10)\n\n2\n\n(cid:98)qk = arg max\n\n(cid:13)(cid:13)(cid:13)P\n\n(cid:13)(cid:13)(cid:13)2\n\nIt has been shown in [16, 2] that the so-called successive projection algorithm (SPA) in Eq. (10)\nidenti\ufb01es \u039bq in K steps. This is a very plausible result, since the procedure admits Gram-Schmitt-like\nlightweight steps and thus is quite scalable. See more details in Sec. F.1.\nEach of the Am\u2019s can be estimated from the corresponding Zm by repeatedly applying SPA, and we\ncall this simple procedure multiple SPA (MultiSPA) as we elaborate in Algorithm 1.\n\n4\n\n\fOf course, assuming that (6) or (9) holds per-\nfectly may be too ideal. It is more likely that\nthere exist some annotators who are good at\nrecognizing certain classes, but still have some\npossibilities of being confused. It is of interest\nto analyze how SPA can do under such condi-\ntions. Another challenge is that one may not\nhave Rm,(cid:96) perfectly estimated, since only lim-\nited number of samples are available. It is desir-\nable to understand the sample complexity of ap-\nplying SPA to Dawid-Skene identi\ufb01cation. We\nanswer these two key technical questions in the\nfollowing theorem:\nTheorem 1. Assume that annotators m\nleast S samples \u2200t \u2208\nand t co-label at\n\nAlgorithm 1 MultiSPA\n\n(cid:96)1 norm;\n\nfor m = 1 to M do\n\nInput: Annotator Responses {Xm(fn)}.\n\nOutput: (cid:98)Am for m = 1, . . . , M, (cid:98)d.\nestimate second order statistics (cid:98)Rm,(cid:96);\nconstruct (cid:98)Zm and normalize columns to unit\nestimate (cid:98)Am using Eq. 
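The successive projection step in Eq. (10) can be sketched in a few lines. The code below is a minimal, self-contained rendition of SPA (not the authors' Matlab implementation): it greedily selects the column of largest residual norm, extends an orthonormal basis Gram-Schmidt style, and under the separability condition (9) recovers the columns of a synthetic `A` exactly up to permutation:

```python
import numpy as np

def spa(Z, K):
    """Successive projection algorithm, cf. Eq. (10): greedily pick the K
    columns of Z with the largest l2 norm after projecting out the span of
    the previously selected columns."""
    Z = Z / np.abs(Z).sum(axis=0, keepdims=True)   # l1-normalize columns
    idx, U = [], np.zeros((Z.shape[0], 0))
    for _ in range(K):
        resid = Z - U @ (U.T @ Z)                  # apply the projector P-perp
        q = int(np.argmax((resid ** 2).sum(axis=0)))
        idx.append(q)
        v = resid[:, q]                            # Gram-Schmidt extension
        U = np.hstack([U, (v / np.linalg.norm(v))[:, None]])
    return idx, Z[:, idx]                          # estimate of A_m

# Sanity check on synthetic data satisfying the separability condition (9):
# H contains the identity as a submatrix, so Z = A H^T exposes A's columns.
rng = np.random.default_rng(1)
K, T = 4, 10
A = 0.7 * np.eye(K) + 0.3 * rng.random((K, K)); A /= A.sum(axis=0)
H = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=K * T)])
Z = A @ H.T                                        # model Z_m = A_m H_m^T
_, A_est = spa(Z, K)
```

The per-step cost is a pair of matrix products, which is the "Gram-Schmidt-like lightweight" behavior that makes MultiSPA scalable.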
(10);\n\ufb01x permutation mismatch between (cid:98)Am and (cid:98)A(cid:96)\nestimate (cid:98)D = (cid:98)A\u22121\nm Rm,(cid:96)((cid:98)A(cid:62)\nextract the prior (cid:98)d = diag((cid:98)D).\n\nend for\nfor all m (cid:54)= (cid:96);\n\n(cid:96) )\u22121 (and take av-\n\n{m1, . . . , mT (m)}, and that (cid:98)Zm is constructed\nusing (cid:98)Rm,mT (m)\u2019s according to Eq. (5). Also assume that the constructed (cid:98)Zm satis\ufb01es (cid:107)(cid:98)Zm(:\n\n, l)(cid:107)1 \u2265 \u03b7,\u2200l \u2208 {1, . . . KT (m)}, where \u03b7 \u2208 (0, 1]. Suppose that rank(Am) = rank(D) = K\nfor m = 1, . . . , M, and that for every class index k \u2208 {1, . . . , K}, there exists an annotator\nmt(k) \u2208 {m1, . . . , mT (m)} such that\n\nerage over all pairs (m, (cid:96)) if needed).;\n\nK(cid:88)\n\nj=1\n\n(11)\n\n(cid:16)\n\nmax\n\n\u221a\n\n\u221a\n\nPr(Xmt(k) = k|Y = k) \u2265 (1 \u2212 \u0001)\n\nPr(Xmt(k) = k|Y = j),\n\n(cid:16)\n\nmin\n\u03a0\n\n, with\n\nK\u03ba2(Am) max\n\n(cid:107)(cid:98)Am\u03a0 \u2212 Am(cid:107)2,\u221e\n\nK\u22121\u03ba\u22123(Am),(cid:112)ln(1/\u03b4)(\u03c3max(Am)\n\u03c3max(Am)\u0001,(cid:112)ln(1/\u03b4)(\n\nS\u03b7)\u22121(cid:17)(cid:17)\nwhere \u0001 \u2208 [0, 1]. Then, if \u0001 \u2264 O(cid:16)\nprobability greater than 1 \u2212 \u03b4, the SPA algorithm in (10) can estimate an (cid:98)Am such that\nS\u03b7)\u22121(cid:17)(cid:19)\n(cid:18)\u221a\n(cid:16)\n(cid:17) \u2264 O\nIn the above Theorem, the assumption (cid:107)(cid:98)Zm(:, l)(cid:107)1 \u2265 \u03b7 means that the proposed algorithm favors\ncases where more co-occurrences are observed, since (cid:98)Zm\u2019s elements are averaged number of co-\n\n(12)\nwhere \u03a0 \u2208 RK\u00d7K is a permutation matrix, (cid:107)Y (cid:107)2,\u221e = max(cid:96) (cid:107)Y (:, (cid:96))(cid:107)2, \u03c3max(Am) is the largest\nsingular value of Am, and \u03ba(Am) is the condition number of Am.\n\noccurrences\u2014which makes a lot of sense. In addition, Eq. 
(11) relaxes the ideal assumption in (6),\nallowing the \u2018good annotator\u2019 mt(k) to confuse class j (cid:54)= k with class k up to a certain probability,\nthereby being more realistic. The proof of Theorem 1 is reminiscent of the noise robustness of the\nSPA algorithm [16, 2]; see the supplementary materials (Sec. F.1). A direct corollary is as follows:\n\nCorollary 1. Assume that the conditions in Theorem 1 hold for (cid:98)Zm and Am, \u2200m \u2208 {1, . . . , M}.\nThen, the estimation error bound in (12) holds for every MultiSPA-output (cid:98)Am, \u2200m \u2208 {1, . . . , M}.\n\nTheorem 1 and Corollary 1 are not entirely surprising due to the extensive research on SPA-like\nalgorithms [2, 16, 10, 30, 4]. The implication for crowdsourcing, however, is quite intriguing. First,\none can see that if an annotator m does not label all the data samples, it does not necessarily hurt\nthe model identi\ufb01ability\u2014as long as annotator m has co-labeled some samples with a number of\nother annotators, identi\ufb01cation of Am is possible. Second, assume that there exists a well-trained\nannotator m(cid:63) whose confusion matrix is diagonally dominant, then for every annotator m who has\nco-labeled samples with annotator m(cid:63), the matrix H m can easily satisfy (11) by letting mt(k) = m(cid:63)\nfor all k. In practice, one would not know who is m(cid:63)\u2014otherwise the crowdsourcing problem would\nbe trivial. However, one can design a dispatch strategy such that every pair of annotators m and (cid:96)\nco-label a certain amount of data. This way, it guarantees that Am(cid:63) appears in everyone else\u2019s Hm\nand thus ensures identi\ufb01ability of all Am\u2019s for m (cid:54)= m(cid:63). This insight may shed some light on how to\neffectively dispatch data to annotators.\nAnother interesting question to ask is does having more annotators help? 
Intuitively, having more\nannotators should help: If one has more rows in H m, then it is more likely that some rows approach\n\n5\n\n\f\u03c1\n\nK\n\nlog\n\n(cid:17)(cid:17)\n\n(cid:16) K\n\n(cid:16) \u03b5\u22122(K\u22121)\n\nthe vertices of the probability simplex\u2014which can then enable SPA. We use the following simpli\ufb01ed\ngenerative model and theorem to formalize the intuition:\nTheorem 2. Let \u03c1 > 0, \u03b5 > 0, and assume that the rows of H m are generated within the\n(K \u2212 1)-probability simplex uniformly at random. If the number of annotators satis\ufb01es M \u2265\n, then, with probability greater than or equal to 1 \u2212 \u03c1, there exist rows of\n\u2126\nH m indexed by q1, . . . qK such that (cid:107)H m(qk, :) \u2212 e(cid:62)\nNote that Theorem 2 implies (11) under proper \u03b5 and \u0001\u2014and thus having more annotators indeed\nhelps identify the model. The above can be shown by utilizing the Chernoff-Hoeffding inequality,\nand the detailed proof can be found in the supplementary materials (Sec. G).\n\nAfter obtaining (cid:98)Am\u2019s, d can be estimated via various ways\u2014see the supplementary materials in Sec.\nD. Using (cid:98)d and (cid:98)Am\u2019s together, ML and MAP estimators for the true labels can be built up [37].\n\nk(cid:107)2 \u2264 \u03b5, k = 1, . . . , K.\n\n4\n\nIdenti\ufb01ability-enhanced Algorithm\n\nThe MultiSPA algorithm is intuitive and lightweight, and is effective as we will show in the experi-\nments. One concern is that perhaps the assumption in (11) may be violated in some cases. In this\nsection, we propose another model identi\ufb01cation algorithm that is potentially more robust to critical\nscenarios. Speci\ufb01cally, we consider the following feasibility problem:\n\n\ufb01nd {Am}M\n\nm=1, D\n\nsubject to Rm,(cid:96) = AmDA(cid:62)\n\n(cid:96) , \u2200m, (cid:96) \u2208 {1, . . . 
, M}\n\n1(cid:62)Am = 1(cid:62), Am \u2265 0, \u2200m, 1(cid:62)d = 1, d \u2265 0.\n\n(13a)\n(13b)\n(13c)\n\nThe criterion in (13) seeks confusion matrices and a prior PMF that \ufb01t the available second-order\nstatistics. The constraints in (13c) re\ufb02ect the fact that the columns of Am\u2019s are conditional PMFs and\nthe prior d is also a PMF.\nTo proceed, let us \ufb01rst introduce the following notion from convex geometry [13, 27]:\nDe\ufb01nition 1. (Suf\ufb01ciently Scattered) A nonnegative matrix H \u2208 RL\u00d7K is suf\ufb01ciently scattered if 1)\ncone{H(cid:62)} \u2287 C, and 2) cone{H(cid:62)}\u2217 \u2229 bdC\u2217 = {\u03bbek | \u03bb \u2265 0, k = 1, ..., K}. Here, C = {x|x(cid:62)1 \u2265\n\u221a\nK \u2212 1(cid:107)x(cid:107)2}, C\u2217 = {x|x(cid:62)1 \u2265 (cid:107)x(cid:107)2}. In addition, cone{H(cid:62)} = {x|x = H(cid:62)\u03b8, \u2200\u03b8 \u2265 0}\nand cone{H(cid:62)}\u2217 = {y|x(cid:62)y \u2265 0, \u2200x \u2208 cone{H(cid:62)}} are the conic hull of H(cid:62) and its dual cone,\nrespectively, and bd is the boundary of a closed set.\n\nThe suf\ufb01ciently scattered condition has recently emerged in convex geometry-based matrix factoriza-\ntion [27, 12]. This condition models how the rows of H are spread in the nonnegative orthant. In\nprinciple, the suf\ufb01ciently scattered condition is much easier to be satis\ufb01ed relative to the condition as\nin (9), or, the so-called separability condition under the context of nonnegative matrix factorization\n[9, 16]. H satisfying the separability condition is the extreme case, meaning that cone{H(cid:62)} = RK\n+ .\nHowever, the suf\ufb01ciently scattered condition only requires C \u2286 cone{H(cid:62)}\u2014which is naturally much\nmore relaxed; also see [13] and the supplementary materials for detailed illustrations (Sec. E).\nRegarding identi\ufb01ability of A1, . . . , AM and d, we have the following result:\nTheorem 3. Assume that rank(D) = rank(Am) = K for all m = 1, . . . 
, M, and that there\nexist two subsets of the annotators, indexed by P1 and P2, where P1 \u2229 P2 = \u2205 and P1 \u222a P2 \u2286\n{1, . . . , M}. Suppose that from P1 and P2 the following two matrices can be constructed: H (1) =\n(cid:96)|P2|](cid:62), where mt \u2208 P1 and (cid:96)j \u2208 P2. Furthermore,\n[A(cid:62)\nassume that i) both H (1) and H (2) are suf\ufb01ciently scattered; ii) all Rmt,(cid:96)j \u2019s for mt \u2208 P1 and\n(cid:96)j \u2208 P2 are available; and iii) for every m /\u2208 P1 \u222a P2 there exists a Rm,r available, where\nr \u2208 P1 \u222a P2. Then, solving Problem (13) recovers Am for m = 1, . . . , M and D = Diag(d) up to\nidentical column permutation.\n\nm|P1|](cid:62), H (2) = [A(cid:62)\n\n, . . . , A(cid:62)\n\n, . . . , A(cid:62)\n\nm1\n\n(cid:96)1\n\nThe proof of Theorem 3 is relegated to the supplementary results (Sec. H). Note that the theorem\nholds under the the existence of P1 and P2, but there is no need to know the sets a priori. Generally\n\n6\n\n\f(cid:16) K(K\u22121)\n\n(cid:17)(cid:17)\n\n\u03c1\n\nspeaking, a \u2018taller\u2019 matrix H (i) would have a better chance to have its rows suf\ufb01ciently spread in the\nnonnegative orthant under the same intuition of Theorem 2. Thus, having more annotators also helps\nto attain the suf\ufb01ciently scattered condition. Nevertheless, formally showing the relationship between\nthe number of annotators and H (i) for i = 1, 2 being suf\ufb01ciently scattered is more challenging than\nthe case in Theorem 2, since the suf\ufb01ciently scattered condition is a bit more abstract relative to the\nseparability condition\u2014the latter speci\ufb01cally assumes ek\u2019s exist as rows of H (i) while the former\ndepends on the \u2018shape\u2019 of the conic hull of (H (i))(cid:62), which contains an in\ufb01nite number of cases.\nTowards this end, let us \ufb01rst de\ufb01ne the following notion:\n\nDe\ufb01nition 2. 
Assume that there exist(cid:102)H \u2208 RL\u00d7K such that(cid:102)H is suf\ufb01ciently scattered. Also assume\nV is the row index set of(cid:102)H such that(cid:102)H(V, :) collects the extreme rays of cone{(cid:102)H (cid:62)}. If there exist\nrow indices (cid:96)v \u2208 {1, . . . , L} for all v \u2208 V, such that (cid:107)(cid:102)H(v, :) \u2212 H((cid:96)v, :)(cid:107)2 \u2264 \u03b5, then H \u2208 RL\u00d7K\n\nK\u03b12(K\u22122)\u03b52 log\n\n(cid:16) (K\u22121)2\n\nis called \u03b5-suf\ufb01ciently scattered.\nOne can see that an \u03b5-suf\ufb01ciently scattered matrix is suf\ufb01ciently scattered when \u03b5 \u2192 0. With this\nde\ufb01nition, we show the following theorem:\nTheorem 4. Let \u03c1 > 0, \u03b1\n2 > \u03b5 > 0,, and assume that the rows of H (1) and H (2) are generated from\nRK uniformly at random. If the number of annotators satis\ufb01es M \u2265 \u2126\n,\nwhere \u03b1 = 1 for K = 2, \u03b1 = 2/3 for K = 3 and \u03b1 = 1/2 for K > 3, then with probability greater\nthan or equal to 1 \u2212 \u03c1, H (1) and H (2) are \u03b5-suf\ufb01ciently scattered.\nThe proof of Theorem 4 is relegated to the supplementary materials (Sec. I). One can see that to\nsatisfy \u03b5-suf\ufb01ciently scattered condition, M is smaller than that in Theorem 2. Conditions i)-iii)\nin Theorem 3 and Theorem 4 together imply that if we have enough annotators, and if many pairs\nco-label a certain number of data, then it is quite possible that one can identify the Dawid-Skene\nmodel via simply \ufb01nding a feasible solution to (13). This feasibility problem is nonconvex, but can\nbe effectively approximated; see the supplementary materials (Sec. C). In a nutshell, we reformulate\nthe problem as a Kullback-Leibler (KL) divergence-based constrained \ufb01tting problem and handle it\nusing alternating optimization. 
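To make the KL-divergence-based fitting concrete, here is a deliberately simplified sketch of one alternating-minimization step: a single annotator pair, with only one factor updated while the other factor and the prior are held fixed (the paper's MultiSPA-KL alternates over all factors and all available pairs, and initializes from MultiSPA). The update is the standard multiplicative rule for KL-divergence NMF; the column scales are folded back into the diagonal factor so the simplex constraints in (13c) stay satisfied without changing the fit. All parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 3
# Hypothetical ground truth: prior d and two confusion matrices
d = rng.dirichlet(np.ones(K))
A1 = 0.6 * np.eye(K) + 0.4 * rng.dirichlet(np.ones(K), size=K).T
A2 = 0.6 * np.eye(K) + 0.4 * rng.dirichlet(np.ones(K), size=K).T
A1 /= A1.sum(axis=0); A2 /= A2.sum(axis=0)
R = A1 @ np.diag(d) @ A2.T                  # observed pairwise statistic

def kl(R, P):                               # generalized KL divergence
    return float(np.sum(R * np.log(R / P) - R + P))

# Random feasible initialization (MultiSPA output would be used instead)
W = rng.dirichlet(np.ones(K), size=K).T     # running estimate of A1
D = np.diag(rng.dirichlet(np.ones(K)))
M = D @ A2.T                                # A2 and D held fixed here
obj0 = kl(R, W @ M)
for _ in range(200):
    # Multiplicative KL-NMF update for the factor W (monotone in the objective)
    W *= ((R / (W @ M)) @ M.T) / M.sum(axis=1)
    s = W.sum(axis=0)                       # fold column scales into M so that
    W /= s                                  # 1^T W = 1^T (cf. (13c)) while
    M = np.diag(s) @ M                      # the product W @ M is unchanged
obj = kl(R, W @ M)
```

Because the rescaling leaves `W @ M` untouched, the objective is non-increasing across iterations while every iterate remains feasible, which is the property the constrained fitting stage relies on.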
Since nonconvex optimization relies on initialization heavily, we use\nMultiSPA to initialize the \ufb01tting stage\u2014which we will refer to as the MultiSPA-KL algorithm.\n5 Experiments\nBaselines. The performance of the proposed approach is compared with a number of competitive\nbaselines, namely, Spectral-D&S [39], TensorADMM [37], and KOS [22], EigRatio [5], GhoshSVD\n[14] and MinmaxEntropy [40]. The performance of the Majority Voting scheme and the Majority\nVoting initialized Dawid-Skene (MV-D&S) estimator [6] are also presented. We also use MultiSPA to\ninitialize EM algorithm (named as MultiSPA-D&S). Note that KOS, EigRatio and MinmaxEntropy\nwork with more complex models relative to the Dawid-Skene model, but are considered as good\nbaselines for the crowdsourcing/ensemble learning tasks. After identifying the model parameters, we\nconstruct a MAP predictor following [37] and observe the result. The algorithms are coded in Matlab.\nSynthetic-data Simulations. Due to page limitations, synthetic data experiments demonstrating\nmodel identi\ufb01ability of the proposed algorithms are presented in the supplementary materials (Sec. A).\nIntegrating Machine Classi\ufb01ers. We employ different UCI datasets (https://archive.ics.\nuci.edu/ml/datasets.html; details in Sec. B). For each of the datasets under test, we use a\ncollection of different classi\ufb01cation algorithms to annotate the data samples. Different classi\ufb01ca-\ntion algorithms from the MATLAB machine learning toolbox (https://www.mathworks.com/\nproducts/statistics.html) such as various k-nearest neighbour classi\ufb01ers, support vector ma-\nchine classi\ufb01ers, and decision tree classi\ufb01ers are employed to serve as our machine annotators. In\norder to train the annotators, we use 20% of the samples to act as training data. After the data samples\nare trained, we use the annotators to label the unseen data samples. 
In practice, not all samples are labeled by every annotator, due to factors such as annotator capacity, difficulty of the task, and economic constraints. To simulate such a scenario, each of the trained algorithms is allowed to label each data sample with probability p ∈ (0, 1]. We test the performance of all the algorithms under different p's; a smaller p means a more challenging scenario. All results are averaged over 10 random trials.
Table 1 shows the classification error of the algorithms under test. Since GhoshSVD and EigenRatio work only on binary tasks, they are not evaluated on the Nursery dataset, where K = 4.

Table 1: Classification Error (%) on UCI Datasets; see runtime tabulated in Sec. B.

                 |       Nursery         |       Mushroom        |        Adult
Algorithms       | p=1    p=0.5   p=0.2  | p=1    p=0.5   p=0.2  | p=1    p=0.5   p=0.2
MultiSPA         | 2.83   4.54    17.96  | 0.02   0.293   6.35   | 15.71  16.05   17.66
MultiSPA-KL      | 2.72   4.26    13.06  | 0.00   0.152   5.89   | 15.66  15.98   17.63
MultiSPA-D&S     | 2.82   4.44    13.39  | 0.00   0.194   6.17   | 15.74  16.29   23.88
Spectral-D&S     | 3.14   37.2    44.29  | 0.00   0.198   6.17   | 15.72  16.31   23.97
TensorADMM       | 17.97  7.26    19.78  | 0.06   0.237   6.18   | 15.72  16.05   25.08
MV-D&S           | 2.92   66.48   66.61  | 0.00   47.99   48.63  | 15.76  75.21   75.13
Minmax-entropy   | 3.63   26.31   11.09  | 0.00   0.163   8.14   | 16.11  16.92   15.64
EigenRatio       | N/A    N/A     N/A    | 0.06   0.329   5.97   | 15.84  16.28   17.69
KOS              | 4.21   6.07    13.48  | 0.06   0.576   6.42   | 17.19  24.97   38.29
Ghosh-SVD        | N/A    N/A     N/A    | 0.06   0.329   5.97   | 15.84  16.28   17.71
Majority Voting  | 2.94   4.83    19.75  | 0.14   0.566   6.57   | 16.21  15.75   20.57
Single Best      | 3.94   N/A     N/A    | 0.00   N/A     N/A    | 16.23  N/A     N/A
Single Worst     | 15.65  N/A     N/A    | 7.22   N/A     N/A    | 19.27  N/A     N/A

The ‘single
best’ and ‘single worst’ rows correspond to the results of using the classifiers individually when p = 1, as references. The best and second-best performing algorithms are highlighted in the table. One can see that the proposed methods are quite promising in this experiment. Both algorithms largely outperform the tensor-based methods TensorADMM and Spectral-D&S in this case, perhaps because the limited number of available samples makes the third-order statistics hard to estimate. It is also observed that the proposed algorithms enjoy favorable runtime; see the supplementary materials (cf. Table 8 in Sec. B). Using MultiSPA to initialize EM (i.e., MultiSPA-D&S) also works well, offering another viable option that strikes a good balance between runtime and accuracy.
Amazon Mechanical Turk Crowdsourcing Data. In this section, the performance of the proposed algorithms is evaluated using the Amazon Mechanical Turk (AMT) data (https://www.mturk.com), in which human annotators label various classification tasks.

Table 2: Classification Error (%) and Run-time (sec): AMT Datasets

                 |     TREC       |   Bluebird    |      RTE      |      Web       |      Dog
Algorithms       | Error  Time    | Error  Time   | Error  Time   | Error  Time    | Error  Time
MultiSPA         | 31.47  50.68   | 13.88  0.07   | 8.75   0.28   | 15.22  0.54    | 17.09  0.07
MultiSPA-KL      | 29.23  536.89  | 11.11  1.94   | 7.12   17.06  | 14.58  12.34   | 15.48  15.88
MultiSPA-D&S     | 29.84  53.14   | 12.03  0.09   | 7.12   0.32   | 15.11  0.84    | 16.11  0.12
Spectral-D&S     | 29.58  919.98  | 12.03  1.97   | 7.12   6.40   | 16.88  179.92  | 17.84  51.16
TensorADMM       | N/A    N/A     | 12.03  2.74   | N/A    N/A    | N/A    N/A     | 17.96  603.93
MV-D&S           | 30.02  3.20    | 12.03  0.02   | 7.25   0.07   | 16.02  0.28    | 15.86  0.04
Minmax-entropy   | 91.61  352.36  | 8.33   3.43   | 7.50   9.10   | 11.51  26.61   | 16.23  7.22
EigenRatio       | 43.95  1.48    | 27.77  0.02   | 9.01   0.03   | N/A    N/A     | N/A    N/A
KOS              | 51.95  9.98    | 11.11  0.01   | 39.75  0.03   | 42.93  0.31    | 31.84  0.13
GhoshSVD         | 43.03  11.62   | 27.77  0.01   | 49.12  0.03   | N/A    N/A     | N/A    N/A
Majority Voting  | 34.85  N/A     | 21.29  N/A    | 10.31  N/A    | 26.93  N/A     | 17.91  N/A
Data description is given in the supplementary materials (Sec. B). Table 2 shows the classification error and the runtime performance of the algorithms under test. One can see that MultiSPA has a very favorable execution time, because it is a Gram-Schmidt-like algorithm. MultiSPA-KL takes more time because it is an iterative optimization method, but the extra computation pays off in better accuracy. Since the TensorADMM algorithm does not scale well, its results are not reported for the very large datasets (i.e., TREC and RTE). As before, since Web and Dog are multi-class datasets, EigenRatio and GhoshSVD are not applicable. From the results, it can be seen that the proposed algorithms outperform many existing crowdsourcing algorithms in both classification accuracy and runtime. In particular, the algebraic algorithm MultiSPA gives results very similar to those of the computationally much more involved algorithms, which shows its potential for application in big data crowdsourcing.
6 Conclusion
In this work, we have revisited the classic Dawid-Skene model for multi-class crowdsourcing. We have proposed a second-order statistics-based approach that guarantees identifiability of the model parameters, i.e., the confusion matrices of the annotators and the label prior. The proposed method naturally admits lower sample complexity relative to existing methods that utilize tensor algebra to ensure model identifiability. The proposed approach also has an array of favorable features. In particular, our framework enables a lightweight algebraic algorithm, which is reminiscent of the Gram-Schmidt-like SPA algorithm for nonnegative matrix factorization. We have also proposed a coupled and constrained matrix factorization criterion that enjoys enhanced identifiability, as well as an alternating optimization algorithm for handling the identification problem.
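The Gram-Schmidt-like step referred to above is, at its core, the successive projection algorithm (SPA) from the separable NMF literature [16]. A generic sketch (not the exact MultiSPA routine; the function name and interface are ours):

```python
import numpy as np

def spa(X, K):
    """Successive projection algorithm: under the separability assumption
    X = W [I, H'] (up to column permutation, columns of H' on the simplex,
    W of full column rank), greedily pick the K column indices of X that
    correspond to the columns of W."""
    R = np.asarray(X, dtype=float).copy()
    idx = []
    for _ in range(K):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # farthest residual column
        idx.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)   # Gram-Schmidt-like deflation step
    return idx
```

Each iteration costs one pass over the columns plus a rank-one update, which is why such algebraic procedures run orders of magnitude faster than iterative fitting.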
Real-data experiments show that our proposed algorithms are quite promising for integrating crowdsourced labeling.

References

[1] Anandkumar, A., Ge, R., Hsu, D., and Kakade, S. M. A tensor approach to learning mixed membership community models. The Journal of Machine Learning Research, 15(1):2239–2312, 2014.

[2] Arora, S., Ge, R., Halpern, Y., Mimno, D., Moitra, A., Sontag, D., Wu, Y., and Zhu, M. A practical algorithm for topic modeling with provable guarantees. In Proceedings of ICML, 2013.

[3] Bertsekas, D. P. Nonlinear Programming. Athena Scientific, 1999.

[4] Chan, T.-H., Ma, W.-K., Ambikapathi, A., and Chi, C.-Y. A simplex volume maximization framework for hyperspectral endmember extraction. IEEE Trans. Geosci. Remote Sens., 49(11):4177–4193, Nov. 2011.

[5] Dalvi, N., Dasgupta, A., Kumar, R., and Rastogi, V. Aggregating crowdsourced binary ratings. In Proceedings of the 22nd International Conference on World Wide Web, pp. 285–294, New York, NY, USA, 2013. ACM.

[6] Dawid, A. P. and Skene, A. M. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, pp. 20–28, 1979.

[7] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, June 2009.

[8] Dietterich, T. G. Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pp. 1–15. Springer, 2000.

[9] Donoho, D. and Stodden, V. When does non-negative matrix factorization give a correct decomposition into parts? In Advances in Neural Information Processing Systems, volume 16, 2003.

[10] Fu, X., Ma, W.-K., Chan, T.-H., and Bioucas-Dias, J. M. Self-dictionary sparse regression for hyperspectral unmixing: Greedy pursuit and pure pixel search are related. IEEE J. Sel.
Topics Signal Process., 9(6):1128–1141, 2015.

[11] Fu, X., Huang, K., Yang, B., Ma, W.-K., and Sidiropoulos, N. D. Robust volume minimization-based matrix factorization for remote sensing and document clustering. IEEE Trans. Signal Process., 64(23):6254–6268, 2016.

[12] Fu, X., Huang, K., and Sidiropoulos, N. D. On identifiability of nonnegative matrix factorization. IEEE Signal Process. Lett., 25(3):328–332, 2018.

[13] Fu, X., Huang, K., Sidiropoulos, N. D., and Ma, W.-K. Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications. IEEE Signal Process. Mag., 36(2):59–80, March 2019.

[14] Ghosh, A., Kale, S., and McAfee, P. Who moderates the moderators?: Crowdsourcing abuse detection in user-generated content. In Proceedings of the 12th ACM Conference on Electronic Commerce, pp. 167–176. ACM, 2011.

[15] Gillis, N. The why and how of nonnegative matrix factorization. Regularization, Optimization, Kernels, and Support Vector Machines, 12:257, 2014.

[16] Gillis, N. and Vavasis, S. Fast and robust recursive algorithms for separable nonnegative matrix factorization. IEEE Trans. Pattern Anal. Mach. Intell., 36(4):698–714, April 2014.

[17] Huang, K., Sidiropoulos, N., and Swami, A. Non-negative matrix factorization revisited: Uniqueness and algorithm for symmetric decomposition. IEEE Trans. Signal Process., 62(1):211–224, 2014.

[18] Huang, K., Sidiropoulos, N. D., and Liavas, A. P. A flexible and efficient algorithmic framework for constrained matrix and tensor factorization. IEEE Trans. Signal Process., 64(19):5052–5065, 2016.

[19] Huang, K., Fu, X., and Sidiropoulos, N. D. Learning hidden Markov models from pairwise co-occurrences with applications to topic modeling. In Proceedings of ICML 2018, 2018.

[20] Jonker, R. and Volgenant, T. Improving the Hungarian assignment algorithm.
Operations Research Letters, 5(4):171–175, 1986.

[21] Karger, D. R., Oh, S., and Shah, D. Budget-optimal crowdsourcing using low-rank matrix approximations. In 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 284–291, Sep. 2011.

[22] Karger, D. R., Oh, S., and Shah, D. Efficient crowdsourcing for multi-class labeling. ACM SIGMETRICS Performance Evaluation Review, 41(1):81–92, 2013.

[23] Karger, D. R., Oh, S., and Shah, D. Budget-optimal task allocation for reliable crowdsourcing systems. Operations Research, 62(1):1–24, 2014.

[24] Kittur, A., Chi, E. H., and Suh, B. Crowdsourcing user studies with Mechanical Turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 453–456. ACM, 2008.

[25] Kolda, T. G. and Bader, B. W. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[26] Lease, M. and Kazai, G. Overview of the TREC 2011 crowdsourcing track. 2011.

[27] Lin, C.-H., Ma, W.-K., Li, W.-C., Chi, C.-Y., and Ambikapathi, A. Identifiability of the simplex volume minimization criterion for blind hyperspectral unmixing: The no-pure-pixel case. IEEE Trans. Geosci. Remote Sens., 53(10):5530–5546, Oct 2015.

[28] Liu, Q., Peng, J., and Ihler, A. T. Variational inference for crowdsourcing. In Advances in Neural Information Processing Systems, pp. 692–700, 2012.

[29] Murphy, K. P. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

[30] Nascimento, J. and Bioucas-Dias, J. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens., 43(4):898–910, 2005.

[31] Raykar, V. C., Yu, S., Zhao, L. H., Valadez, G. H., Florin, C., Bogoni, L., and Moy, L. Learning from crowds. Journal of Machine Learning Research, 11(Apr):1297–1322, 2010.

[32] Razaviyayn, M., Hong, M., and Luo, Z.-Q.
A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM Journal on Optimization, 23(2):1126–1153, 2013.

[33] Sidiropoulos, N. D., De Lathauwer, L., Fu, X., Huang, K., Papalexakis, E. E., and Faloutsos, C. Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process., 65(13):3551–3582, 2017.

[34] Snow, R., O'Connor, B., Jurafsky, D., and Ng, A. Y. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 254–263. Association for Computational Linguistics, 2008.

[35] Stein, P. A note on the volume of a simplex. The American Mathematical Monthly, 73(3), 1966. doi: 10.2307/2315353.

[36] Boucheron, S., Lugosi, G., and Bousquet, O. Concentration Inequalities, 2004. URL: http://www.econ.upf.edu/~lugosi/mlss_conc.pdf.

[37] Traganitis, P. A., Pages-Zamora, A., and Giannakis, G. B. Blind multiclass ensemble classification. IEEE Trans. Signal Process., 66(18):4737–4752, 2018.

[38] Welinder, P., Branson, S., Perona, P., and Belongie, S. J. The multidimensional wisdom of crowds. In Advances in Neural Information Processing Systems, pp. 2424–2432, 2010.

[39] Zhang, Y., Chen, X., Zhou, D., and Jordan, M. I. Spectral methods meet EM: A provably optimal algorithm for crowdsourcing. In Advances in Neural Information Processing Systems, pp. 1260–1268, 2014.

[40] Zhou, D., Liu, Q., Platt, J., and Meek, C. Aggregating ordinal labels from crowds by minimax conditional entropy. In Proceedings of ICML, volume 32, pp.
262–270, 2014.