{"title": "Is Approval Voting Optimal Given Approval Votes?", "book": "Advances in Neural Information Processing Systems", "page_first": 1801, "page_last": 1809, "abstract": "Some crowdsourcing platforms ask workers to express their opinions by approving a set of k good alternatives. It seems that the only reasonable way to aggregate these k-approval votes is the approval voting rule, which simply counts the number of times each alternative was approved. We challenge this assertion by proposing a probabilistic framework of noisy voting, and asking whether approval voting yields an alternative that is most likely to be the best alternative, given k-approval votes. While the answer is generally positive, our theoretical and empirical results call attention to situations where approval voting is suboptimal.", "full_text": "Is Approval Voting Optimal Given Approval Votes?\n\nAriel D. Procaccia\n\nComputer Science Department\nCarnegie Mellon University\narielpro@cs.cmu.edu\n\nNisarg Shah\n\nComputer Science Department\nCarnegie Mellon University\nnkshah@cs.cmu.edu\n\nAbstract\n\nSome crowdsourcing platforms ask workers to express their opinions by approv-\ning a set of k good alternatives. It seems that the only reasonable way to aggregate\nthese k-approval votes is the approval voting rule, which simply counts the num-\nber of times each alternative was approved. We challenge this assertion by propos-\ning a probabilistic framework of noisy voting, and asking whether approval voting\nyields an alternative that is most likely to be the best alternative, given k-approval\nvotes. While the answer is generally positive, our theoretical and empirical results\ncall attention to situations where approval voting is suboptimal.\n\n1\n\nIntroduction\n\nIt is surely no surprise to the reader that modern machine learning algorithms thrive on large\namounts of data \u2014 preferably labeled. 
Online labor markets, such as Amazon Mechanical Turk\n(www.mturk.com), have become a popular way to obtain labeled data, as they harness the power\nof a large number of human workers, and offer signi\ufb01cantly lower costs compared to expert opin-\nions. But this low-cost, large-scale data may require compromising quality: the workers are often\nunquali\ufb01ed or unwilling to make an effort, leading to a high level of noise in their submitted labels.\nTo overcome this issue, it is common to hire multiple workers for the same task, and aggregate their\nnoisy opinions to \ufb01nd more accurate labels. For example, TurKit [17] is a toolkit for creating and\nmanaging crowdsourcing tasks on Mechanical Turk. For our purposes its most important aspect\nis that it implements plurality voting: among available alternatives (e.g., possible labels), workers\nreport the best alternative in their opinion, and the alternative that receives the most votes is selected.\nMore generally, workers may be asked to report the k best alternatives in their opinion; such a vote\nis known as a k-approval vote. This has an advantage over plurality (1-approval) in noisy situations\nwhere a worker may not be able to pinpoint the best alternative accurately, but can recognize that\nit is among the top k alternatives [23].1 At the same time, k-approval votes, even for k > 1, are\nmuch easier to elicit than, say, rankings of the alternatives, not to mention full utility functions. For\nexample, EteRNA [16] \u2014 a citizen science game whose goal is to design RNA molecules that fold\ninto stable structures \u2014 uses 8-approval voting on submitted designs, that is, each player approves\nup to 8 favorite designs; the designs that received the largest number of approval votes are selected\nfor synthesis in the lab.\nSo, the elicitation of k-approval votes is common practice and has signi\ufb01cant advantages. 
And it\nmay seem that the only reasonable way to aggregate these votes, once collected, is via the approval\nvoting rule, that is, tally the number of approvals for each alternative, and select the most approved\none.2 But is it? In other words, do the k-approval votes contain useful information that can lead to\n\n1k-approval is also used for picking k winners, e.g., various cities in the US such as San Francisco, Chicago,\n\nand New York use it in their so-called \u201cparticipatory budgeting\u201d process [15].\n\n2There is a subtle distinction, which we will not belabor, between k-approval voting, which is the focus of\nthis paper, and approval voting [8], which allows voters to approve as many alternatives as they wish. The latter\n\n1\n\n\fsigni\ufb01cantly better outcomes, and is ignored by approval voting? Or is approval voting an (almost)\noptimal method for aggregating k-approval votes?\nOur Approach. We study the foregoing questions within the maximum likelihood estimation (MLE)\nframework of social choice theory, which posits the existence of an underlying ground truth that pro-\nvides an objective comparison of the alternatives. From this viewpoint, the votes are noisy estimates\nof the ground truth. The optimal rule then selects the alternative that is most likely to be the best\nalternative given the votes. This framework has recently received attention from the machine learn-\ning community [18, 3, 2, 4, 21], in part due to its applications to crowdsourcing domains [20, 21, 9],\nwhere, indeed, there is a ground truth, and individual votes are objective.\nIn more detail, in our model there exists a ground truth ranking over the alternatives, and each voter\nholds an opinion, which is another ranking that is a noisy estimate of the ground truth ranking. The\nopinions are drawn i.i.d. 
from the popular Mallows model [19], which is parametrized by the ground truth ranking, a noise parameter ϕ ∈ [0, 1], and a distance metric d over the space of rankings. We use five well-studied distance metrics: the Kendall tau (KT) distance, the (Spearman) footrule distance, the maximum displacement distance, the Cayley distance, and the Hamming distance. When required to submit a k-approval vote, a voter simply approves the top k alternatives in his opinion. Given the votes, an alternative a is the maximum likelihood estimate (MLE) for the best alternative if the votes are most likely generated by a ranking that puts a first.\nWe can now reformulate our question in slightly more technical terms:\n\nIs approval voting (almost) a maximum likelihood estimator for the best alternative, given votes drawn from the Mallows model? How does the answer depend on the noise parameter ϕ and the distance metric d?\n\nOur results. Our first result (Theorem 1) shows that under the Mallows model, the set of winners according to approval voting coincides with the set of MLE best alternatives under the Kendall tau distance, but under the other four distances there may exist approval winners that are not MLE best alternatives. Our next result (Theorem 2) confirms the intuition that the suboptimality of approval voting stems from the information that is being discarded: when only a single alternative is approved or disapproved in each vote, approval voting — which now utilizes all the information that can be gleaned from the anonymous votes — is optimal under mild conditions.\nGoing back to the general case of k-approval votes, we show (Theorem 3) that even under the four distances for which approval voting is suboptimal, a weaker statement holds: in cases with very high or very low noise, every MLE best alternative is an approval winner (but some approval winners may not be MLE best alternatives). 
And our experiments, using real data, show that the accuracy of approval voting is usually quite close to that of the MLE in pinpointing the best alternative.\nWe conclude that approval voting is a good way of aggregating k-approval votes in most situations. But our work demonstrates that, perhaps surprisingly, approval voting may be suboptimal, and, in situations where a high degree of accuracy is required, exact computation of the MLE best alternative is an option worth considering. We discuss our conclusions in more detail in Section 6.\n\n2 Model\n\nLet [t] ≜ {1, . . . , t}. Denote the set of alternatives by A, and let |A| = m. We use L(A) to denote the set of rankings (total orders) of the alternatives in A. For a ranking σ ∈ L(A), let σ(i) denote the alternative occupying position i in σ, and let σ^{-1}(a) denote the rank (position) of alternative a in σ. With a slight abuse of notation, let σ([t]) ≜ {a ∈ A | σ^{-1}(a) ∈ [t]}. We use σ_{a↔b} to denote the ranking obtained by swapping the positions of alternatives a and b in σ. We assume that there exists an unknown true ranking of the alternatives (the ground truth), denoted σ∗ ∈ L(A). We also make the standard assumption of a uniform prior over the true ranking.\n\nframework of approval voting has been studied extensively, both from the axiomatic point of view [7, 8, 13, 22, 1], and the game-theoretic point of view [14, 12, 6]. However, even under this framework it is a standard assumption that votes are tallied by counting the number of times each alternative is approved, which is why we simply refer to the aggregation rule under consideration as approval voting.\n\nLet N = {1, . . . , n} denote the set of voters. 
Each voter i has an opinion, denoted π_i ∈ L(A), which is a noisy estimate of the true ranking σ∗; the collection of opinions — the (opinion) profile — is denoted π. Fix k ∈ [m]. A k-approval vote is a collection of k alternatives approved by a voter. When asked to submit a k-approval vote, voter i simply submits the vote V_i = π_i([k]), which is the set of alternatives at the top k positions in his opinion. The collection of all votes is called the vote profile, and denoted V = {V_i}_{i∈[n]}. For a ranking σ and a k-approval vote v, we say that v is generated from σ, denoted σ →_k v (or σ → v when the value of k is clear from the context), if v = σ([k]). More generally, for an opinion profile π and a vote profile V, we say π →_k V (or π → V) if π_i →_k V_i for every i ∈ [n].\nLet 𝒜_k = {A′ ⊆ A : |A′| = k} denote the set of all subsets of A of size k. A voting rule operating on k-approval votes is a function (𝒜_k)^n → A that returns a winning alternative given the votes.3 In particular, let us define the approval score of an alternative a, denoted SCAPP(a), as the number of voters that approve a. Then, approval voting simply chooses an alternative with the greatest approval score. Note that we do not break ties. Instead, we talk about the set of approval winners.\nFollowing the standard social choice literature, we model the opinion of each voter as being drawn i.i.d. from an underlying noise model. A noise model describes the probability of drawing an opinion σ given the true ranking σ∗, denoted Pr[σ|σ∗]. We say that a noise model is neutral if the labels of the alternatives do not matter, i.e., renaming alternatives in the opinion σ and in the true ranking σ∗, in the same fashion, keeps Pr[σ|σ∗] intact. 
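As a concrete illustration of the notation above, the following sketch (our own illustration, not code from the paper; the function names are hypothetical) derives k-approval votes from ranked opinions and computes the set of approval winners:

```python
from collections import Counter

def k_approval_vote(ranking, k):
    """A voter's k-approval vote: the set of top-k alternatives in his opinion."""
    return set(ranking[:k])

def approval_winners(votes):
    """Approval voting: all alternatives with the greatest approval score (no tie-breaking)."""
    scores = Counter(a for vote in votes for a in vote)
    top = max(scores.values())
    return {a for a, s in scores.items() if s == top}

# Three opinions (rankings) over A = {a, b, c, d}; each voter approves his top k = 2.
opinions = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["b", "a", "d", "c"]]
votes = [k_approval_vote(pi, 2) for pi in opinions]
print(approval_winners(votes))  # {'a'}: a is approved by all three voters
```

Note that, as in the model, the rule returns a set of winners rather than breaking ties.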
A popular noise model is the Mallows model [19], under which Pr[σ|σ∗] = ϕ^{d(σ,σ∗)}/Z^m_ϕ. Here, d is a distance metric over the space of rankings, and the parameter ϕ ∈ [0, 1] governs the noise level; ϕ = 0 implies that the true ranking is generated with probability 1, and ϕ = 1 implies the uniform distribution. Z^m_ϕ is the normalization constant, which is independent of the true ranking σ∗ given that distance d is neutral, i.e., renaming alternatives in the same fashion in two rankings does not change the distance between them. Below, we review five popular distances used in the social choice literature; they are all neutral.\n\n• The Kendall tau (KT) distance, denoted d_KT, measures the number of pairs of alternatives over which two rankings disagree. Equivalently, it is the number of swaps required by bubble sort to convert one ranking into another.\n• The (Spearman) footrule (FR) distance, denoted d_FR, measures the total displacement (absolute difference between positions) of all alternatives in two rankings.\n• The Maximum Displacement (MD) distance, denoted d_MD, measures the maximum of the displacements of all alternatives between two rankings.\n• The Cayley (CY) distance, denoted d_CY, measures the minimum number of swaps (not necessarily of adjacent alternatives) required to convert one ranking into another.\n• The Hamming (HM) distance, denoted d_HM, measures the number of positions in which two rankings place different alternatives.\n\nSince opinions are drawn independently, the probability of a profile π given the true ranking σ∗ is Pr[π|σ∗] = ∏_{i=1}^{n} Pr[π_i|σ∗] ∝ ϕ^{d(π,σ∗)}, where d(π, σ∗) = ∑_{i=1}^{n} d(π_i, σ∗). 
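To make the five distances concrete, here is a small sketch (our own illustration; function names are ours) computing each metric between two rankings represented as lists:

```python
def kendall_tau(s, t):
    # Number of pairs of alternatives on which the two rankings disagree.
    pos = {a: i for i, a in enumerate(t)}
    return sum(1 for i in range(len(s)) for j in range(i + 1, len(s))
               if pos[s[i]] > pos[s[j]])

def footrule(s, t):
    # Total displacement (absolute difference in positions) over all alternatives.
    pos = {a: i for i, a in enumerate(t)}
    return sum(abs(i - pos[a]) for i, a in enumerate(s))

def max_displacement(s, t):
    # Largest displacement of any single alternative.
    pos = {a: i for i, a in enumerate(t)}
    return max(abs(i - pos[a]) for i, a in enumerate(s))

def cayley(s, t):
    # Minimum number of (not necessarily adjacent) swaps: m minus the number
    # of cycles in the permutation mapping positions in s to positions in t.
    pos = {a: i for i, a in enumerate(t)}
    perm = [pos[a] for a in s]
    seen, cycles = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return len(perm) - cycles

def hamming(s, t):
    # Number of positions holding different alternatives.
    return sum(1 for x, y in zip(s, t) if x != y)

s, t = ["a", "b", "c", "d"], ["b", "a", "d", "c"]
print(kendall_tau(s, t), footrule(s, t), max_displacement(s, t),
      cayley(s, t), hamming(s, t))  # 2 4 1 2 4
```

All five functions are symmetric in their arguments, matching the neutrality property used above.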
Once we fix the noise model, for a fixed k we can derive the probability of observing a given k-approval vote v: Pr[v|σ∗] = ∑_{σ∈L(A):σ→v} Pr[σ|σ∗]. Then, the probability of drawing a given vote profile V is Pr[V|σ∗] = ∏_{i=1}^{n} Pr[V_i|σ∗]. Alternatively, this can also be expressed as Pr[V|σ∗] = ∑_{π∈L(A)^n:π→V} Pr[π|σ∗].\nHereinafter, we omit the domains L(A)^n for π and L(A) for σ∗ when they are clear from the context. Finally, given the vote profile V, the likelihood of an alternative a being the best alternative in the true ranking σ∗ is proportional to (via Bayes’ rule) Pr[V|σ∗(1) = a] = ∑_{σ∗:σ∗(1)=a} Pr[V|σ∗]. Using the two expressions derived earlier for Pr[V|σ∗], and ignoring the normalization constant Z^m_ϕ from the probabilities, we define the likelihood function of a given votes V as\n\nL(V, a) ≜ ∑_{σ∗:σ∗(1)=a} ∑_{π:π→V} ϕ^{d(π,σ∗)} = ∑_{σ∗:σ∗(1)=a} ∏_{i=1}^{n} [ ∑_{π_i:π_i→V_i} ϕ^{d(π_i,σ∗)} ].   (1)\n\nThe maximum likelihood estimate (MLE) for the best alternative is given by arg max_{a∈A} L(V, a). Again, we do not break ties; we study the set of MLE best alternatives.\n\n3Technically, this is a social choice function; a social welfare function returns a ranking of the alternatives.\n\n3 Optimal Voting Rules\n\nAt first glance, it seems natural to use approval voting (that is, returning the alternative that is approved by the largest number of voters) given k-approval votes. 
However, consider the following\nexample with 4 alternatives (A = {a, b, c, d}) and 5 voters providing 2-approval votes:\n\nV1 = {b, c}, V2 = {b, c}, V3 = {a, d}, V4 = {a, b}, V5 = {a, c}.\n\n(2)\n\nNotice that alternatives a, b, and c receive 3 approvals each, while alternative d receives only a\nsingle approval. Approval voting may return any alternative other than alternative d. But is that\nalways optimal? In particular, while alternatives b and c are symmetric, alternative a is qualitatively\ndifferent due to different alternatives being approved along with a. This indicates that under certain\nconditions, it is possible that not all three alternatives are MLE for the best alternative. Our \ufb01rst\nresult shows that this is indeed the case under three of the distance functions listed above, and a\nsimilar example works for a fourth. However, surprisingly, under the Kendall tau distance the MLE\nbest alternatives are exactly the approval winners, and hence are polynomial-time computable, which\nstands in sharp contrast to the NP-hardness of computing them given rankings [5].\nTheorem 1. The following statements hold for aggregating k-approval votes using approval voting.\n1. Under the Mallows model with a \ufb01xed distance d \u2208 {dMD , dCY , dHM , dFR}, there exist a\nvote pro\ufb01le V with at most six 2-approval votes over at most \ufb01ve alternatives, and a choice\nfor the Mallows parameter \u03d5, such that not all approval winners are MLE best alternatives.\n\n2. Under the Mallows model with the distance d = dKT , the set of MLE best alternatives\ncoincides with the set of approval winners, for all vote pro\ufb01les V and all values of the\nMallows parameter \u03d5 \u2208 (0, 1).\n\nProof. 
For the Mallows model with d ∈ {d_MD, d_CY, d_HM} and any ϕ ∈ (0, 1), the profile from Equation (2) is a counterexample: alternatives b and c are MLE best alternatives, but a is not.\nFor the Mallows model with d = d_FR, we could not find a counterexample with 4 alternatives; computer-based simulations generated the following counterexample with 5 alternatives that works for any ϕ ∈ (0, 1): V1 = V2 = {a, b}, V3 = V4 = {c, d}, V5 = {a, e}, and V6 = {b, c}. Here, alternatives a, b, and c have the highest approval score of 3. However, alternative b has a strictly lower likelihood of being the best alternative than alternative a, and hence is not an MLE best alternative. The calculation verifying these counterexamples is presented in the online appendix (specifically, Appendix A).\nIn contrast, for the Kendall tau distance, we show that all approval winners are MLE best alternatives, and vice versa. We begin by simplifying the likelihood function L(V, a) from Equation (1) for the special case of the Mallows model with the Kendall tau distance. In this case, it is well known that the normalization constant satisfies Z^m_ϕ = ∏_{j=1}^{m} T^j_ϕ, where T^j_ϕ = ∑_{i=0}^{j-1} ϕ^i. Consider a ranking π_i such that π_i → V_i. We can decompose d_KT(π_i, σ∗) into three types of pairwise mismatches: i) d_1(π_i, σ∗): the mismatches over pairs (b, c) where b ∈ V_i and c ∈ A \ V_i, or vice versa; ii) d_2(π_i, σ∗): the mismatches over pairs (b, c) where b, c ∈ V_i; and iii) d_3(π_i, σ∗): the mismatches over pairs (b, c) where b, c ∈ A \ V_i.\nNote that every ranking π_i that satisfies π_i → V_i has identical mismatches of type 1. Let us denote the number of such mismatches by d_KT(V_i, σ∗). Also, notice that d_2(π_i, σ∗) = d_KT(π_i|_{V_i}, σ∗|_{V_i}), where σ|_S denotes the ranking of alternatives in S ⊆ A dictated by σ. Similarly, d_3(π_i, σ∗) = d_KT(π_i|_{A\V_i}, σ∗|_{A\V_i}). Now, in the expression for the likelihood function L(V, a),\n\nL(V, a) = ∑_{σ∗:σ∗(1)=a} ∏_{i=1}^{n} [ ∑_{π_i:π_i→V_i} ϕ^{d_KT(V_i,σ∗) + d_KT(π_i|_{V_i},σ∗|_{V_i}) + d_KT(π_i|_{A\V_i},σ∗|_{A\V_i})} ]\n= ∑_{σ∗:σ∗(1)=a} ∏_{i=1}^{n} ϕ^{d_KT(V_i,σ∗)} · [ ∑_{π^1_i∈L(V_i)} ϕ^{d_KT(π^1_i,σ∗|_{V_i})} ] · [ ∑_{π^2_i∈L(A\V_i)} ϕ^{d_KT(π^2_i,σ∗|_{A\V_i})} ]\n= ∑_{σ∗:σ∗(1)=a} ∏_{i=1}^{n} ϕ^{d_KT(V_i,σ∗)} · Z^k_ϕ · Z^{m-k}_ϕ ∝ ∑_{σ∗:σ∗(1)=a} ϕ^{d_KT(V,σ∗)} ≜ L̂(V, a).\n\nThe second equality follows because every ranking π_i that satisfies π_i → V_i can be generated by picking rankings π^1_i ∈ L(V_i) and π^2_i ∈ L(A \ V_i), and concatenating them. The third equality follows from the definition of the normalization constant in the Mallows model. Finally, we denote d_KT(V, σ∗) ≜ ∑_{i=1}^{n} d_KT(V_i, σ∗). It follows that maximizing L(V, a) amounts to maximizing L̂(V, a). Note that d_KT(V, σ∗) counts the number of times alternative a is approved while alternative b is not, for all a, b ∈ A with b ≻_{σ∗} a. That is, let n_V(a, −b) ≜ |{i ∈ [n] | a ∈ V_i ∧ b ∉ V_i}|. Then, d_KT(V, σ∗) = ∑_{a,b∈A: b≻_{σ∗}a} n_V(a, −b). Also, note that for alternatives c, d ∈ A, we have SCAPP(c) − SCAPP(d) = n_V(c, −d) − n_V(d, −c).\nNext, we show that L̂(V, a) is a monotonically increasing function of SCAPP(a); equivalently, L̂(V, a) ≥ L̂(V, b) if and only if SCAPP(a) ≥ SCAPP(b). Fix a, b ∈ A. Consider the bijection between the sets of rankings placing a and b first, which simply swaps a and b (σ ↔ σ_{a↔b}). Then,\n\nL̂(V, a) − L̂(V, b) = ∑_{σ∗:σ∗(1)=a} [ ϕ^{d_KT(V,σ∗)} − ϕ^{d_KT(V,σ∗_{a↔b})} ].   (3)\n\nFix σ∗ such that σ∗(1) = a. Note that σ∗_{a↔b}(1) = b. Let C denote the set of alternatives positioned between a and b in σ∗ (equivalently, in σ∗_{a↔b}). Now, σ∗ and σ∗_{a↔b} have identical disagreements with V on a pair of alternatives (x, y) unless i) one of x and y belongs to {a, b}, and ii) the other belongs to C ∪ {a, b}. Thus, the difference of disagreements of σ∗ and σ∗_{a↔b} with V on such pairs is\n\nd_KT(V, σ∗) − d_KT(V, σ∗_{a↔b}) = ∑_{c∈C} [n_V(c, −a) + n_V(b, −c) − n_V(c, −b) − n_V(a, −c)] + [n_V(b, −a) − n_V(a, −b)] = (|C| + 1) · (SCAPP(b) − SCAPP(a)).\n\nThus, SCAPP(a) = SCAPP(b) implies d_KT(V, σ∗) = d_KT(V, σ∗_{a↔b}) (and thus, L̂(V, a) = L̂(V, b)), and SCAPP(a) > SCAPP(b) implies d_KT(V, σ∗) < d_KT(V, σ∗_{a↔b}) (and thus, L̂(V, a) > L̂(V, b)). 
□\n\nSuboptimality of approval voting for distances other than the KT distance stems from the fact that in counting the number of approvals for a given alternative, one discards information regarding other alternatives approved along with the given alternative in various votes. However, no such information is discarded when only one alternative is approved (or not approved) in each vote. That is, given plurality (k = 1) or veto (k = m − 1) votes, approval voting should be optimal, not only for the Mallows model but for any reasonable noise model. The next result formalizes this intuition.\nTheorem 2. Under a neutral noise model, the set of MLE best alternatives coincides with the set of approval winners\n1. given plurality votes, if p_1 > p_i > 0, ∀i ∈ {2, . . . , m}, where p_i is the probability of the alternative in position i in the true ranking appearing in the first position in a sample, or\n2. given veto votes, if 0 < q_1 < q_i, ∀i ∈ {2, . . . , m}, where q_i is the probability of the alternative in position i in the true ranking appearing in the last position in a sample.\nProof. We show the proof for plurality votes. The case of veto votes is symmetric: in every vote, instead of a single approved alternative, we have a single alternative that is not approved. Note that the probability p_i is independent of the true ranking σ∗ due to the neutrality of the noise model. Consider a plurality vote profile V and an alternative a. Let T = {σ∗ ∈ L(A) | σ∗(1) = a}. The likelihood function for a is given by L(V, a) = ∑_{σ∗∈T} Pr[V|σ∗]. Under every σ∗ ∈ T, the contribution of the SCAPP(a) plurality votes for a to the product Pr[V|σ∗] = ∏_{i=1}^{n} Pr[V_i|σ∗] is (p_1)^{SCAPP(a)}. Note that the alternatives in A \ {a} are distributed among positions in {2, . . . , m} in all possible ways by the rankings in T. Let i_b denote the position of alternative b ∈ A \ {a}. Then,\n\nL(V, a) = (p_1)^{SCAPP(a)} · ∑_{{i_b}_{b∈A\{a}}={2,...,m}} ∏_{b∈A\{a}} (p_{i_b})^{SCAPP(b)} = (p_1)^{n·k} · ∑_{{i_b}_{b∈A\{a}}={2,...,m}} ∏_{b∈A\{a}} (p_{i_b}/p_1)^{SCAPP(b)}.\n\nThe second transition holds because SCAPP(a) = n · k − ∑_{b∈A\{a}} SCAPP(b). Our assumption in the theorem statement implies 0 < p_{i_b}/p_1 < 1 for i_b ∈ {2, . . . , m}. Now, it can be checked that for a, b ∈ A, we have L̂(V, a)/L̂(V, b) = ∑_{i∈{2,...,m}} (p_i/p_1)^{SCAPP(b)−SCAPP(a)}. Thus, SCAPP(a) ≥ SCAPP(b) if and only if L̂(V, a) ≥ L̂(V, b), as required. □\n\nNote that the conditions of Theorem 2 are very mild. In particular, the condition for plurality votes is satisfied under the Mallows model with all five distances we consider, and the condition for veto votes is satisfied under the Mallows model with the Kendall tau, the footrule, and the maximum displacement distances. This is presented as Theorem 4 in the online appendix (Appendix B).\n\n4 High Noise and Low Noise\n\nWhile Theorem 1 shows that there are situations where at least some of the approval winners may not be MLE best alternatives, it does not paint the complete picture. In particular, in both profiles used as counterexamples in the proof of Theorem 1, it holds that every MLE best alternative is an approval winner. That is, the optimal rule choosing an MLE best alternative works as if a tie-breaking scheme is imposed on top of approval voting. Does this hold true for all profiles? Part 2 of Theorem 1 gives a positive answer for the Kendall tau distance. 
In this section, we answer the foregoing question (largely) in the positive under the other four distance functions, with respect to the two ends of the Mallows spectrum: the case of low noise (ϕ → 0), and the case of high noise (ϕ → 1). The case of high noise is especially compelling (because that is when it becomes hard to pinpoint the ground truth), but both extreme cases have received special attention in the literature [24, 21, 11]. In contrast to previous results, which have almost always yielded different answers in the two cases, we show that every MLE best alternative is an approval winner in both cases, in almost every situation.\n\nWe begin with the likelihood function for alternative a: L(V, a) = ∑_{σ∗:σ∗(1)=a} ∑_{π:π→V} ϕ^{d(π,σ∗)}. When ϕ → 0, maximizing L(V, a) requires minimizing the minimum exponent. Ties, if any, are broken using the number of terms achieving the minimum exponent, then the second smallest exponent, and so on. At the other extreme, let ϕ = 1 − ε with ε → 0. Using the first-order approximation (1 − ε)^{d(π,σ∗)} ≈ 1 − ε · d(π, σ∗), maximizing L(V, a) requires minimizing the sum of d(π, σ∗) over all σ∗, π with σ∗(1) = a and π → V. Ties are broken using higher-order approximations. Let\n\nL_0(V, a) = min_{σ∗:σ∗(1)=a} min_{π:π→V} d(π, σ∗)   and   L_1(V, a) = ∑_{σ∗:σ∗(1)=a} ∑_{π:π→V} d(π, σ∗).\n\nWe are interested in minimizing L_0(V, a) and L_1(V, a); this leads to novel combinatorial problems that require detailed analysis. We are now ready for the main result of this section.\nTheorem 3. 
The following statements hold for using approval voting to aggregate k-approval votes drawn from the Mallows model.\n\n1. Under the Mallows model with d ∈ {d_FR, d_CY, d_HM} and ϕ → 0, and under the Mallows model with d ∈ {d_FR, d_CY, d_HM, d_MD} and ϕ → 1, it holds that for every k ∈ [m − 1], and every profile with k-approval votes, every MLE best alternative is an approval winner.\n2. Under the Mallows model with d = d_MD and ϕ → 0, there exists a profile with seven 2-approval votes over 5 alternatives such that no MLE best alternative is an approval winner.\n\nBefore we proceed to the proof, we remark that in part 1 of the theorem, by ϕ → 0 and ϕ → 1, we mean that there exist 0 < ϕ∗_0, ϕ∗_1 < 1 such that the result holds for all ϕ ≤ ϕ∗_0 and ϕ ≥ ϕ∗_1, respectively. In part 2 of the theorem, we mean that for every ϕ∗ > 0, there exists a ϕ < ϕ∗ for which the negative result holds. Due to space constraints, we only present the proof for the Mallows model with d = d_FR and ϕ → 0; the full proof appears in the online appendix (Appendix C).\nProof of Theorem 3 (only for d = d_FR, ϕ → 0). Let ϕ → 0 in the Mallows model with the footrule distance. To analyze L_0(V, ·), we first analyze min_{π:π→V} d_FR(σ∗, π) for a fixed σ∗ ∈ L(A). Then, we minimize it over σ∗, and show that the set of alternatives that appear first in the minimizers (i.e., the set of alternatives minimizing L_0(V, a)) is exactly the set of approval winners. Since every MLE best alternative in the ϕ → 0 case must minimize L_0(V, ·), the result follows.\n\nFix σ∗ ∈ L(A). Imagine a boundary between positions k and k + 1 in all rankings, i.e., between the approved and the non-approved alternatives. 
Now, given a profile π such that π → V, we first apply the following operation repeatedly. For i ∈ [n], let an alternative a ∈ A be in positions t and t′ in σ∗ and π_i, respectively. If t and t′ are on the same side of the boundary (i.e., either both are at most k or both are greater than k) and t ≠ t′, then swap alternatives π_i(t) and π_i(t′) = a in π_i. Note that this decreases the displacement of a in π_i with respect to σ∗ by |t − t′|, and increases the displacement of π_i(t) by at most |t − t′|. Hence, the operation cannot increase d_FR(π, σ∗). Let π∗ denote the profile that we converge to. Note that π∗ satisfies π∗ → V (because we only swap alternatives on the same side of the boundary), d_FR(π∗, σ∗) ≤ d_FR(π, σ∗), and the following condition:\nCondition X: for i ∈ [n], every alternative that is on the same side of the boundary in σ∗ and π∗_i is in the same position in both rankings.\nBecause we started from an arbitrary profile π (subject to π → V), it follows that it is sufficient to minimize d_FR(π∗, σ∗) over all π∗ with π∗ → V satisfying Condition X. However, we show that subject to π∗ → V and Condition X, d_FR(π∗, σ∗) is actually a constant.\nNote that for i ∈ [n], every alternative that is in different positions in π∗_i and σ∗ must be on different sides of the boundary in the two rankings. It is easy to see that in every π∗_i, there is an equal number of alternatives on both sides of the boundary that are not in the same position as they are in σ∗. 
Now, we can divide the total footrule distance dFR(π∗, σ∗) into four parts:

1. Let i ∈ [n] and t ∈ [k] such that σ∗(t) ≠ π∗ᵢ(t). Let a = σ∗(t) and (π∗ᵢ)⁻¹(a) = t′ > k. Then, the displacement t′ − t of a is broken into two parts: (i) t′ − k, and (ii) k − t.

2. Let i ∈ [n] and t ∈ [m] \ [k] such that σ∗(t) ≠ π∗ᵢ(t). Let a = σ∗(t) and (π∗ᵢ)⁻¹(a) = t′ ≤ k. Then, the displacement t − t′ of a is broken into two parts: (i) k − t′, and (ii) t − k.

Because the number of alternatives of type 1 and 2 is equal for every π∗ᵢ, we can see that the total displacements of types 1(i) and 2(ii) are equal, and so are the total displacements of types 1(ii) and 2(i). By observing that there are exactly n − SCAPP(σ∗(t)) instances of type 1 for a given value of t ≤ k, and SCAPP(σ∗(t)) instances of type 2 for a given value of t > k, we conclude that

$$d_{FR}(\pi^*, \sigma^*) = 2 \cdot \left[ \sum_{t=1}^{k} \left(n - \mathrm{SC}_{\mathrm{APP}}(\sigma^*(t))\right) \cdot (k - t) + \sum_{t=k+1}^{m} \mathrm{SC}_{\mathrm{APP}}(\sigma^*(t)) \cdot (t - k) \right].$$

Minimizing this over σ∗ reduces to minimizing $\sum_{t=1}^{m} \mathrm{SC}_{\mathrm{APP}}(\sigma^*(t)) \cdot (t - k)$. By the rearrangement inequality, this is minimized when alternatives are ordered in a non-increasing order of their approval scores. Note that exactly the set of approval winners appear first in such rankings. ∎

Theorem 3 shows that under the Mallows model with d ∈ {dFR, dCY, dHM}, every MLE best alternative is an approval winner for both ϕ → 0 and ϕ → 1. We believe that the same statement holds for all values of ϕ, as we were unable to find a counterexample despite extensive simulations.

Conjecture 1. Under the Mallows model with distance d ∈ {dFR, dCY, dHM}, every MLE best alternative is an approval winner for every ϕ ∈ (0, 1).

5 Experiments

We perform experiments with two real-world datasets — Dots and Puzzle [20] — to compare the performance of approval voting against that of the rule that is MLE for the empirically observed distribution of k-approval votes (and not for the Mallows model). Mao et al. [20] collected these datasets by asking workers on Amazon Mechanical Turk to rank either four images by the number of dots they contain (Dots), or four states of an 8-puzzle by their distance to the goal state (Puzzle). Hence, these datasets contain ranked votes over 4 alternatives in a setting where a true ranking of the alternatives indeed exists. Each dataset has four different noise levels; higher noise was created by increasing the task difficulty [20]. For Dots, ranking images with a smaller difference in the number of dots leads to high noise, and for Puzzle, ranking states farther away from the goal state leads to high noise. Each noise level of each dataset contains 40 profiles with approximately 20 votes each.

In our experiments, we extract 2-approval votes from the ranked votes by taking the top 2 alternatives in each vote. Given these 2-approval votes, approval voting returns an alternative with the largest number of approvals. To apply the MLE rule, however, we need to learn the underlying distribution of 2-approval votes. To that end, we partition the set of profiles in each noise level of each dataset into training (90%) and test (10%) sets.
We use a high fraction of the profiles for training in order to examine the maximum advantage that the MLE rule may have over approval voting.

Given the training profiles (which approval voting simply ignores), the MLE rule learns the probabilities of observing each of the 6 possible 2-subsets of the alternatives given a fixed true ranking.⁴ On the test data, the MLE rule first computes the likelihood of each ranking given the votes. Then, it computes the likelihood of each alternative being the best by adding the likelihoods of all rankings that put the alternative first. It finally returns an alternative with the highest likelihood.

We measure the accuracy of both methods by their frequency of being able to pinpoint the correct best alternative. For each noise level in each dataset, the accuracy is averaged over 1000 simulations with random partitioning of the profiles into training and test sets.

[Fig. 1: The MLE rule (trained on 90% of the profiles) and approval voting for 2-approval votes. Two panels, (a) Dots and (b) Puzzle, plotting accuracy against noise level for the MLE rule and for approval voting.]

Figures 1(a) and 1(b) show that in general the MLE rule does achieve greater accuracy than approval voting. However, the increase is at most 4.5%, which may not be significant in some contexts.

6 Discussion

Our main conclusion from the theoretical and empirical results is that approval voting is typically close to optimal for aggregating k-approval votes. However, the situation is much subtler than it appears at first glance. Moreover, our theoretical analysis is restricted by the assumption that the votes are drawn from the Mallows model.
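As a concrete illustration of the MLE rule used in the experiments, the following sketch (not the authors' implementation; the subset probabilities in `pos_pair_prob` are hypothetical placeholders, not values learned from the Dots or Puzzle data) computes the likelihood of each alternative being best given 2-approval votes over 4 alternatives:

```python
from itertools import permutations

# Hypothetical learned parameters of the neutral noise model: the probability
# of observing an approved 2-subset depends only on the positions (1..4) its
# alternatives occupy in the true ranking. These six values are placeholders.
pos_pair_prob = {
    frozenset({1, 2}): 0.40,  # approving the two best alternatives
    frozenset({1, 3}): 0.20,
    frozenset({1, 4}): 0.10,
    frozenset({2, 3}): 0.15,
    frozenset({2, 4}): 0.10,
    frozenset({3, 4}): 0.05,
}

def vote_prob(ranking, vote):
    """Probability of observing 2-approval `vote` when `ranking` is the truth."""
    pos = {a: i + 1 for i, a in enumerate(ranking)}
    return pos_pair_prob[frozenset(pos[a] for a in vote)]

def mle_best_alternative(alternatives, votes):
    """Sum the likelihood of each ranking (uniform prior) over rankings that
    place an alternative first; return an alternative with maximum total."""
    alt_likelihood = {a: 0.0 for a in alternatives}
    for ranking in permutations(alternatives):
        likelihood = 1.0
        for vote in votes:
            likelihood *= vote_prob(ranking, vote)
        alt_likelihood[ranking[0]] += likelihood
    return max(alt_likelihood, key=alt_likelihood.get)

votes = [{'a', 'b'}, {'a', 'b'}, {'a', 'c'}]
best = mle_best_alternative(['a', 'b', 'c', 'd'], votes)  # 'a', approved most
```

With these placeholder probabilities the MLE winner coincides with the approval winner; the paper's point is that for other learned distributions the two rules can diverge.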
A recent line of work in social choice theory [9, 10] has focused on designing voting rules that perform well — simultaneously — under a wide variety of noise models. It seems intuitive that approval voting would work well for aggregating k-approval votes under any reasonable noise model; an analysis extending to a wide family of realistic noise models would provide a stronger theoretical justification for using approval voting.

On the practical front, it should be emphasized that approval voting is not always optimal. When maximum accuracy matters, one may wish to switch to the MLE rule. However, learning and applying the MLE rule is much more demanding. In our experiments we learn the entire distribution over k-approval votes given the true ranking. While for 2-approval or 3-approval votes over 4 alternatives we only need to learn 6 probability values, in general for k-approval votes over m alternatives one would need to learn $\binom{m}{k}$ probability values, and the training data may not be sufficient for this purpose. This calls for the design of estimators for the best alternative that achieve greater statistical efficiency by avoiding the need to learn the entire underlying distribution over votes.

⁴Technically, we learn a neutral noise model where the probability of a subset of alternatives being observed only depends on the positions of the alternatives in the true ranking.

References

[1] C. Alós-Ferrer. A simple characterization of approval voting. Social Choice and Welfare, 27(3):621–625, 2006.

[2] H. Azari Soufiani, W. Z. Chen, D. C. Parkes, and L. Xia. Generalized method-of-moments for rank aggregation. In Proc. of 27th NIPS, pages 2706–2714, 2013.

[3] H. Azari Soufiani, D. C. Parkes, and L. Xia. Random utility theory for social choice. In Proc. of 26th NIPS, pages 126–134, 2012.

[4] H. Azari Soufiani, D. C. Parkes, and L. Xia.
Computing parametric ranking models via rank-breaking. In Proc. of 31st ICML, pages 360–368, 2014.

[5] J. Bartholdi, C. A. Tovey, and M. A. Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6:157–165, 1989.

[6] D. Baumeister, G. Erdélyi, E. Hemaspaandra, L. A. Hemaspaandra, and J. Rothe. Computational aspects of approval voting. In Handbook on Approval Voting, pages 199–251. Springer, 2010.

[7] S. J. Brams. Mathematics and democracy: Designing better voting and fair-division procedures. Princeton University Press, 2007.

[8] S. J. Brams and P. C. Fishburn. Approval Voting. Springer, 2nd edition, 2007.

[9] I. Caragiannis, A. D. Procaccia, and N. Shah. When do noisy votes reveal the truth? In Proc. of 14th EC, pages 143–160, 2013.

[10] I. Caragiannis, A. D. Procaccia, and N. Shah. Modal ranking: A uniquely robust voting rule. In Proc. of 28th AAAI, pages 616–622, 2014.

[11] E. Elkind and N. Shah. Electing the most probable without eliminating the irrational: Voting over intransitive domains. In Proc. of 30th UAI, pages 182–191, 2014.

[12] G. Erdélyi, M. Nowak, and J. Rothe. Sincere-strategy preference-based approval voting fully resists constructive control and broadly resists destructive control. Math. Log. Q., 55(4):425–443, 2009.

[13] P. C. Fishburn. Axioms for approval voting: Direct proof. Journal of Economic Theory, 19(1):180–185, 1978.

[14] P. C. Fishburn and S. J. Brams. Approval voting, Condorcet's principle, and runoff elections. Public Choice, 36(1):89–114, 1981.

[15] A. Goel, A. K. Krishnaswamy, S. Sakshuwong, and T. Aitamurto. Knapsack voting. In Proc. of Collective Intelligence, 2015.

[16] J. Lee, W. Kladwang, M. Lee, D. Cantu, M. Azizyan, H. Kim, A. Limpaecher, S. Yoon, A. Treuille, and R. Das. RNA design rules from a massive open laboratory.
Proceedings of the National Academy of Sciences, 111(6):2122–2127, 2014.

[17] G. Little, L. B. Chilton, M. Goldman, and R. C. Miller. TurKit: Human computation algorithms on Mechanical Turk. In Proc. of 23rd UIST, pages 57–66, 2010.

[18] T. Lu and C. Boutilier. Learning Mallows models with pairwise preferences. In Proc. of 28th ICML, pages 145–152, 2011.

[19] C. L. Mallows. Non-null ranking models. Biometrika, 44:114–130, 1957.

[20] A. Mao, A. D. Procaccia, and Y. Chen. Better human computation through principled voting. In Proc. of 27th AAAI, pages 1142–1148, 2013.

[21] A. D. Procaccia, S. J. Reddi, and N. Shah. A maximum likelihood approach for selecting sets of alternatives. In Proc. of 28th UAI, pages 695–704, 2012.

[22] M. R. Sertel. Characterizing approval voting. Journal of Economic Theory, 45(1):207–211, 1988.

[23] N. Shah, D. Zhou, and Y. Peres. Approval voting and incentives in crowdsourcing. In Proc. of 32nd ICML, pages 10–19, 2015.

[24] H. P. Young. Condorcet's theory of voting. The American Political Science Review, 82(4):1231–1244, 1988.