{"title": "An Approximate, Efficient LP Solver for LP Rounding", "book": "Advances in Neural Information Processing Systems", "page_first": 2895, "page_last": 2903, "abstract": "Many problems in machine learning can be solved by rounding the solution of an appropriate linear program. We propose a scheme that is based on a quadratic program relaxation which allows us to use parallel stochastic-coordinate-descent to approximately solve large linear programs efficiently. Our software is an order of magnitude faster than Cplex (a commercial linear programming solver) and yields similar solution quality. Our results include a novel perturbation analysis of a quadratic-penalty formulation of linear programming and a convergence result, which we use to derive running time and quality guarantees.", "full_text": "An Approximate, Ef\ufb01cient Solver for LP Rounding\n\nSrikrishna Sridhar1, Victor Bittorf1, Ji Liu1, Ce Zhang1\n\nChristopher R\u00b4e2, Stephen J. Wright1\n\n1Computer Sciences Department, University of Wisconsin-Madison, Madison, WI 53706\n\n2Computer Science Department, Stanford University, Stanford, CA 94305\n{srikris,vbittorf,ji-liu,czhang,swright}@cs.wisc.edu\n\nchrismre@cs.stanford.edu\n\nAbstract\n\nMany problems in machine learning can be solved by rounding the solution of an\nappropriate linear program (LP). This paper shows that we can recover solutions\nof comparable quality by rounding an approximate LP solution instead of the ex-\nact one. These approximate LP solutions can be computed ef\ufb01ciently by applying\na parallel stochastic-coordinate-descent method to a quadratic-penalty formula-\ntion of the LP. We derive worst-case runtime and solution quality guarantees of\nthis scheme using novel perturbation and convergence analysis. 
Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality.\n\n1 Introduction\n\nA host of machine-learning problems can be solved effectively as approximations of such NP-hard combinatorial problems as set cover, set packing, and multiway-cuts [8, 11, 16, 22]. A popular scheme for solving such problems is called LP rounding [22, chs. 12-26], which consists of the following three-step process: (1) construct an integer (binary) linear program (IP) formulation of a given problem; (2) relax the IP to an LP by replacing the constraints x ∈ {0, 1} by x ∈ [0, 1]; and (3) round an optimal solution of the LP to create a feasible solution for the original IP problem. LP rounding is known to work well on a range of hard problems, and comes with theoretical guarantees for runtime and solution quality.\nThe Achilles' heel of LP rounding is that it requires solutions of LPs of possibly extreme scale. Despite decades of work on LP solvers, including impressive advances during the 1990s, commercial codes such as Cplex or Gurobi may not be capable of handling problems of the required scale. In this work, we propose an approximate LP solver suitable for use in the LP-rounding approach for very large problems. Our intuition is that since LP rounding ultimately rounds the LP solution to obtain an approximate solution of the combinatorial problem, a crude solution of the LP may suffice. Hence, an approach that can find approximate solutions of large LPs quickly may be suitable, even if it is inefficient for obtaining highly accurate solutions.\nThis paper focuses on the theoretical and algorithmic aspects of finding approximate solutions to an LP, for use in LP-rounding schemes. 
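To make the three-step recipe concrete, here is a minimal sketch for minimum-cost vertex cover. It assumes SciPy's general-purpose solver `linprog` as a stand-in for any LP solver; the toy graph, costs, and function name are illustrative, not taken from the paper:

```python
# Illustrative sketch (not the paper's software) of the three-step
# LP-rounding recipe for minimum-cost vertex cover, using SciPy's
# linprog as a stand-in for any LP solver. Toy data is assumed.
import numpy as np
from scipy.optimize import linprog

def vertex_cover_lp_round(edges, costs):
    n = len(costs)
    # Step 1 (construct) and Step 2 (relax): min c^T x subject to
    # x_u + x_v >= 1 for each edge and 0 <= x_v <= 1. linprog expects
    # A_ub @ x <= b_ub, so the covering constraints are negated.
    A_ub = np.zeros((len(edges), n))
    for k, (u, v) in enumerate(edges):
        A_ub[k, u] = A_ub[k, v] = -1.0
    res = linprog(costs, A_ub=A_ub, b_ub=-np.ones(len(edges)),
                  bounds=[(0.0, 1.0)] * n)
    # Step 3 (round): keep every vertex whose fractional value is >= 1/2.
    # Feasibility of the LP solution guarantees every edge is covered.
    cover = [v for v in range(n) if res.x[v] >= 0.5]
    return cover, res.fun

# 4-cycle with unit costs: the LP optimum is 2, and the rounded cover
# costs at most twice the LP value.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cover, lp_val = vertex_cover_lp_round(edges, [1.0] * 4)
```

On this 4-cycle the rounded cover costs at most twice the LP value, which is the 2-factor guarantee reviewed in Section 2.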
Our three main technical contributions are as follows. First, we show that one can approximately solve large LPs by forming convex quadratic programming (QP) approximations and then applying stochastic coordinate descent to these approximations. Second, we derive a novel convergence analysis of our method, based on Renegar's perturbation theory for linear programming [17]. Finally, we derive bounds on the runtime as well as the worst-case approximation ratio of our rounding schemes. Our experiments demonstrate that our approach, called Thetis, produces solutions of comparable quality to state-of-the-art approaches on such tasks as noun-phrase chunking and entity resolution. We also demonstrate, on three different classes of combinatorial problems, that Thetis can outperform Cplex (a state-of-the-art commercial LP and IP solver) by up to an order of magnitude in runtime, while achieving comparable solution quality.\n\nRelated Work. Recently, there has been some focus on the connection between LP relaxations and maximum a posteriori (MAP) estimation problems [16, 19]. Ravikumar et al. [16] proposed rounding schemes for iterative LP solvers to facilitate MAP inference in graphical models. In contrast, we propose to use stochastic descent methods to solve a QP relaxation; this allows us to take advantage of recent results on asynchronous parallel methods of this type [12, 14]. Recently, Makari et al. [13] proposed an intriguing parallel scheme for packing and covering problems. In contrast, our results apply to more general LP relaxations, including set-partitioning problems like multiway-cut. Additionally, the runtime of our algorithm is less sensitive to approximation error. 
For an error ε, the bound on the runtime of the algorithm in [13] grows as ε^{-5}, while the bound on our algorithm's runtime grows as ε^{-2}.\n\n2 Background: Approximating NP-hard problems with LP Rounding\n\nIn this section, we review the theory of LP-rounding based approximation schemes for NP-hard combinatorial problems. We use the vertex cover problem as an example, as it is the simplest nontrivial setting that exposes the main ideas of this approach.\n\nPreliminaries. For a minimization problem Π, an algorithm ALG is an α-factor approximation for Π, for some α > 1, if any solution produced by ALG has an objective value at most α times the value of an optimal (lowest-cost) solution. For some problems, such as vertex cover, there is a constant-factor approximation scheme (α = 2). For others, such as set cover, the value of α can be as large as O(log N), where N is the number of sets.\nAn LP-rounding based approximation scheme for the problem Π first constructs an IP formulation of Π, which we denote by P. This step is typically easy to perform, but the IP formulation P is, in theory, as hard to solve as the original problem Π. In this work, we consider applications in which the only integer variables in the IP formulation are binary variables x ∈ {0, 1}. The second step in LP rounding is a relax/solve step: we relax the constraints in P to obtain a linear program LP(P), replacing the binary variables with continuous variables in [0, 1], then solve LP(P). The third step is to round the solution of LP(P) to an integer solution that is feasible for P, thus yielding a candidate solution to the original problem Π. The focus of this paper is on the relax/solve step, which is usually the computational bottleneck in an LP-rounding based approximation scheme.\n\nExample: An Oblivious Rounding Scheme for Vertex Cover. Let G(V, E) denote a graph with vertex set V and undirected edges E ⊆ (V × V). 
Let c_v denote a nonnegative cost associated with each vertex v ∈ V. A vertex cover of a graph is a subset of V such that each edge e ∈ E is incident to at least one vertex in this set. The minimum-cost vertex cover is the one that minimizes the sum of the costs c_v over the vertices v belonging to the cover. Let us review the “construct,” “relax/solve,” and “round” phases of an LP-rounding based approximation scheme applied to vertex cover.\nIn the “construct” phase, we introduce binary variables x_v ∈ {0, 1} for all v ∈ V, where x_v is set to 1 if the vertex v ∈ V is selected in the vertex cover and 0 otherwise. The IP formulation is as follows:\n\nmin_x Σ_{v ∈ V} c_v x_v  s.t.  x_u + x_v ≥ 1 for (u, v) ∈ E  and  x_v ∈ {0, 1} for v ∈ V.  (1)\n\nRelaxation yields the following LP:\n\nmin_x Σ_{v ∈ V} c_v x_v  s.t.  x_u + x_v ≥ 1 for (u, v) ∈ E  and  x_v ∈ [0, 1] for v ∈ V.  (2)\n\nA feasible solution of the LP relaxation (2) is called a “fractional solution” of the original problem. In the “round” phase, we generate a valid vertex cover by simply choosing the vertices v ∈ V whose fractional solution satisfies x_v ≥ 1/2. It is easy to see that the vertex cover generated by such a rounding scheme costs no more than twice the cost of the fractional solution. If the fractional solution chosen for rounding is an optimal solution of (2), then we arrive at a 2-factor approximation scheme for vertex cover. We note here an important property: the rounding algorithm can generate feasible integral solutions while being oblivious of whether the fractional solution is an optimal solution of (2). We formally define the notion of an oblivious rounding scheme as follows.\nDefinition 1. 
For a minimization problem Π with an IP formulation P whose LP relaxation is denoted by LP(P), a β-factor “oblivious” rounding scheme converts any feasible point x_f ∈ LP(P) to an integral solution x_I ∈ P with cost at most β times the cost of LP(P) at x_f.\n\nProblem Family   | Approximation Factor | Machine Learning Applications\nSet Covering     | log(N) [20]          | Classification [3], Multi-object tracking [24]\nSet Packing      | es + o(s) [1]        | MAP inference [19], Natural language [9]\nMultiway-cut     | 3/2 - 1/k [5]        | Computer vision [4], Entity resolution [10]\nGraphical Models | Heuristic            | Semantic role labeling [18], Clustering [21]\n\nFigure 1: LP-rounding schemes considered in this paper. The parameter N refers to the number of sets; s refers to s-column-sparse matrices; and k refers to the number of terminals. e is Euler's constant.\n\nGiven a β-factor oblivious algorithm ALG for the problem Π, one can construct a β-factor approximation algorithm for Π by using ALG to round an optimal fractional solution of LP(P). When we have an approximate solution for LP(P) that is feasible for this problem, rounding can produce an α-factor approximation algorithm for Π for a factor α slightly larger than β, where the difference between α and β accounts for the inexactness in the approximate solution of LP(P). Many LP-rounding schemes (including the scheme for vertex cover discussed in Section 2) are oblivious. We implemented the oblivious LP-rounding algorithms in Figure 1 and report experimental results in Section 4.\n\n3 Main results\n\nIn this section, we describe how we can solve LP relaxations approximately, in less time than traditional LP solvers, while still preserving the formal guarantees of rounding schemes. We first define a notion of approximate LP solution and discuss its consequences for oblivious rounding schemes. 
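The obliviousness in Definition 1 can be checked directly: the threshold rounding of Section 2 needs only a feasible fractional point, never an optimal one. A minimal pure-Python sketch, on an assumed toy instance (graph, costs, and fractional point are all illustrative):

```python
# Minimal check (assumed toy instance, not from the paper) that the
# threshold-1/2 vertex-cover rounding of Section 2 is oblivious in the
# sense of Definition 1: it uses only feasibility of the fractional point.

def round_cover(x_frac):
    """Keep every vertex whose fractional value is at least 1/2."""
    return {v for v, xv in enumerate(x_frac) if xv >= 0.5}

def is_feasible(x_frac, edges):
    """Every edge must be fractionally covered: x_u + x_v >= 1."""
    return all(x_frac[u] + x_frac[v] >= 1.0 for u, v in edges)

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]   # triangle plus a pendant edge
costs = [1.0, 1.0, 1.0, 1.0]
x_frac = [0.6, 0.5, 0.9, 0.2]              # feasible, but not LP-optimal

cover = round_cover(x_frac)                # here: {0, 1, 2}
frac_cost = sum(c * x for c, x in zip(costs, x_frac))
cover_cost = sum(costs[v] for v in cover)
# Since each chosen v has x_v >= 1/2, cost(cover) <= 2 * fractional cost:
# the beta = 2 bound of Definition 1, with no appeal to optimality.
```

Because each selected vertex has x_v ≥ 1/2, its cost c_v is at most 2 c_v x_v, so the integral cost is at most twice the fractional cost whatever feasible point was rounded.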
We show that one can use a regularized quadratic-penalty formulation to compute these approximate LP solutions. We then describe a stochastic-coordinate-descent (SCD) algorithm for obtaining approximate solutions of this QP, and mention enhancements of this approach, specifically, an asynchronous parallel implementation and the use of an augmented Lagrangian framework. Our analysis yields a worst-case complexity bound for the solution quality and runtime of the entire LP-rounding scheme.\n\n3.1 Approximating LP Solutions\n\nConsider the LP in the following standard form\n\nmin c^T x  s.t.  Ax = b, x ≥ 0,  (3)\n\nwhere c ∈ R^n, b ∈ R^m, and A ∈ R^{m×n}, and its corresponding dual\n\nmax b^T u  s.t.  c - A^T u ≥ 0.  (4)\n\nLet x* denote an optimal primal solution of (3). An approximate LP solution x̂ that we use for LP rounding may be infeasible and have objective value different from the optimum c^T x*. We quantify the inexactness in an approximate LP solution as follows.\nDefinition 2. A point x̂ is an (ε, δ)-approximate solution of the LP (3) if x̂ ≥ 0 and there exist constants ε > 0 and δ > 0 such that\n\n||Ax̂ - b||_∞ ≤ ε  and  |c^T x̂ - c^T x*| ≤ δ|c^T x*|.\n\nUsing Definitions 1 and 2, it is easy to see that a β-factor oblivious rounding scheme can round a (0, δ)-approximate solution to produce a feasible integral solution whose cost is no more than β(1 + δ) times the cost of an optimal solution of P. The factor β(1 + δ) arises because the rounding algorithm does not have access to an optimal fractional solution. To cope with the infeasibility, we convert an (ε, δ)-approximate solution to a (0, δ̂)-approximate solution where δ̂ is not too large. For vertex cover (2), we prove the following result in Appendix C. (Here, Π_{[0,1]^n}(·) denotes projection onto the unit hypercube in R^n.)\nLemma 3. Let x̂ be an (ε, δ)-approximate solution to the linear program (2) with ε ∈ [0, 1). 
Then x̃ = Π_{[0,1]^n}((1 - ε)^{-1} x̂) is a (0, δ(1 - ε)^{-1})-approximate solution.\nSince x̃ is a feasible solution for (2), the oblivious rounding scheme in Section 2 results in a 2(1 + δ(1 - ε)^{-1})-factor approximation algorithm. In general, constructing (0, δ̂)- from (ε, δ)-approximate solutions requires reasoning about the structure of the particular LP. In Appendix C, we establish statements analogous to Lemma 3 for packing, covering, and multiway-cut problems.\n\n3.2 Quadratic Programming Approximation to the LP\n\nWe consider the following regularized quadratic-penalty approximation to the LP (3), parameterized by a positive constant β, whose solution is denoted by x(β):\n\nx(β) := arg min_{x ≥ 0} f_β(x) := c^T x - ū^T(Ax - b) + (β/2)||Ax - b||^2 + (1/(2β))||x - x̄||^2,  (5)\n\nwhere ū ∈ R^m and x̄ ∈ R^n are arbitrary vectors. (In practice, ū and x̄ may be chosen as approximations to the dual and primal solutions of (3), or simply set to zero.) The quality of the approximation (5) depends on the conditioning of the underlying linear program (3), a concept that was studied by Renegar [17]. Denoting the data for problem (3) by d := (A, b, c), we consider perturbations Δd := (ΔA, Δb, Δc) such that the linear program defined by d + Δd is primal infeasible. The primal condition number δ_P is the infimum of the ratios ||Δd||/||d|| over all such vectors Δd. The dual condition number δ_D is defined analogously. (Clearly both δ_P and δ_D are in the range [0, 1]; smaller values indicate poorer conditioning.) We have the following result, which is proven in the supplementary material.\nTheorem 4. Suppose that δ_P and δ_D are both positive, and let (x*, u*) be any primal-dual solution pair for (3), (4). 
If we define C* := max(||x* - x̄||, ||u* - ū||), then the unique solution x(β) of (5) satisfies\n\n||Ax(β) - b|| ≤ (1/β)(1 + √2)C*,  ||x(β) - x*|| ≤ √6 C*.\n\nIf in addition the parameter β satisfies β ≥ 25C*/(||d|| min(δ_P, δ_D)), then we have\n\n|c^T x* - c^T x(β)| ≤ (1/β)(10C*^2/(δ_P δ_D) + 6C*^2 + √6||x̄||C*).\n\nIn practice, we solve (5) approximately, using an algorithm whose complexity depends on the threshold ε̄ for which the objective is accurate to within ε̄. That is, we seek x̂ such that\n\n(1/(2β))||x̂ - x(β)||^2 ≤ f_β(x̂) - f_β(x(β)) ≤ ε̄,\n\nwhere the left-hand inequality follows from the fact that f_β is strongly convex with modulus 1/β. If we define\n\nε̄ := C_20^2/β^3,  C_20 := 25C*/(2||d|| δ_P δ_D),  (6)\n\nthen by combining some elementary inequalities with the results of Theorem 4, we obtain the bounds\n\n|c^T x̂ - c^T x*| ≤ (1/β)(25C*^2/(δ_P δ_D) + 6C*^2 + √6||x̄||C*),  ||Ax̂ - b|| ≤ (1/β)(1 + √2)C* + 25C*/(β^2 δ_P δ_D).\n\nThe following result is almost an immediate consequence.\nTheorem 5. Suppose that δ_P and δ_D are both positive and let (x*, u*) be any primal-dual optimal pair. Suppose that C* is defined as in Theorem 4. Then for any given positive pair (ε, δ), we have that x̂ satisfies the inequalities in Definition 2 provided that β satisfies the following three lower bounds:\n\nβ ≥ 10C*/(||d|| min(δ_P, δ_D)),  βδ|c^T x*| ≥ 25C*^2/(δ_P δ_D) + 6C*^2 + √6||x̄||C*,  βε ≥ (1 + √2)C* + 25C*/(β δ_P δ_D).\n\nFor an instance of vertex cover with n nodes and m edges, we can show that 1/δ_P = O(n^{1/2}(m + n)^{1/2}) and 1/δ_D = O((m + n)^{1/2}) (see Appendix D). We therefore obtain β = O(m^{1/2} n^{1/2} (m + n)(min{ε, δ|c^T x*|})^{-1}). 
The values x̄ = 1 and ū = 0 yield C* ≤ √m.\n\nAlgorithm 1 SCD method for (5)\n1: Choose x_0 ∈ R^n; j ← 0;\n2: loop\n3:   Choose i(j) ∈ {1, 2, . . . , n} randomly with equal probability;\n4:   Define x_{j+1} from x_j by setting [x_{j+1}]_{i(j)} ← max(0, [x_j]_{i(j)} - (1/L_max)[∇f_β(x_j)]_{i(j)}), leaving other components unchanged;\n5:   j ← j + 1;\n6: end loop\n\n3.3 Solving the QP Approximation: Coordinate Descent\n\nWe propose the use of a stochastic coordinate descent (SCD) algorithm [12] to solve (5). Each step of SCD chooses a component i ∈ {1, 2, . . . , n} and takes a step in the ith component of x along the partial gradient of (5) with respect to this component, projecting if necessary to retain nonnegativity. This simple procedure depends on the following constant L_max, which bounds the diagonals of the Hessian in the objective of (5):\n\nL_max = β(max_{i=1,2,...,n} A_{:i}^T A_{:i}) + 1/β,  (7)\n\nwhere A_{:i} denotes the ith column of A. Algorithm 1 describes the SCD method. Convergence results for Algorithm 1 can be obtained from [12]. In this result, E(·) denotes expectation over all the random variables i(j) indicating the update indices chosen at each iteration. We need the following quantities:\n\nl := 1/β,  R := sup_{j=1,2,...} ||x_j - x(β)||,  (8)\n\nwhere x_j denotes the jth iterate of the SCD algorithm. (Note that R bounds the maximum distance that the iterates travel from the solution x(β) of (5).)\nTheorem 6. For Algorithm 1 we have\n\nE||x_j - x(β)||^2 + (2/L_max) E(f_β(x_j) - f*_β) ≤ (1 - l/(n(l + L_max)))^j (R^2 + (2/L_max)(f_β(x_0) - f*_β)),\n\nwhere f*_β := f_β(x(β)). 
We obtain high-probability convergence of f_β(x_j) to f*_β in the following sense: for any η ∈ (0, 1) and any small ε̄, we have P(f_β(x_j) - f*_β < ε̄) ≥ 1 - η, provided that\n\nj ≥ (n(l + L_max)/l) log((L_max/(2ηε̄))(R^2 + (2/L_max)(f_β(x_0) - f*_β))).\n\nWorst-Case Complexity Bounds. We now combine the analysis in Sections 3.2 and 3.3 to derive a worst-case complexity bound for our approximate LP solver. Supposing that the columns of A have norm O(1), we have from (7) and (8) that l = 1/β and L_max = O(β). Theorem 6 indicates that we require O(nβ^2) iterations to solve (5) (modulo a log term). For the values of β described in Section 3.2, this translates to a complexity estimate of O(m^3 n^2/ε^2).\nIn order to obtain the desired accuracy in terms of feasibility and function value of the LP (captured by ε) we need to solve the QP to within the different, tighter tolerance ε̄ introduced in (6). Both tolerances are related to the choice of penalty parameter β in the QP. Ignoring here the dependence on the dimensions m and n, we note the relationships β ∼ ε^{-1} (from Theorem 5) and ε̄ ∼ β^{-3} ∼ ε^3 (from (6)). Expressing all quantities in terms of ε, and using Theorem 6, we see an iteration complexity of ε^{-2} for SCD (ignoring log terms). The linear convergence rate of SCD is instrumental to this favorable value. By contrast, standard variants of stochastic-gradient descent (SGD) applied to the QP yield poorer complexity. For diminishing-step or constant-step variants of SGD, we see complexity of ε^{-7}, while for robust SGD, we see ε^{-10}. 
(Besides the inverse dependence on ε̄ or its square in the analysis of these methods, there is a contribution of order ε^{-2} from the conditioning of the QP.)\n\n3.4 Enhancements\n\nWe mention two important enhancements that improve the efficiency of the approach outlined above. The first is an asynchronous parallel implementation of Algorithm 1, and the second is the use of an augmented Lagrangian framework rather than “one-shot” approximation by the QP in (5).\n\nTask    | Formulation    | PV   | NNZ  | Thetis: P, R, F1, Rank | Gibbs Sampling: P, R, F1, Rank\nCoNLL   | Skip-chain CRF | 25M  | 51M  | .87, .90, .89, 10/13   | .86, .90, .88, 10/13\nTAC-KBP | Factor graph   | 62K  | 115K | .79, .79, .79, 6/17    | .80, .80, .80, 6/17\n\nFigure 2: Solution quality of our LP-rounding approach on two tasks. PV is the number of primal variables and NNZ is the number of non-zeros in the constraint matrix of the LP in standard form. The rank indicates where we would have been placed, had we participated in the competition.\n\nAsynchronous Parallel SCD. An asynchronous parallel version of Algorithm 1, described in [12], is suitable for execution on multicore, shared-memory architectures. Each core, executing a single thread, has access to the complete vector x. Each thread essentially runs its own version of Algorithm 1 independently of the others, choosing and updating one component i(j) of x on each iteration. Between the time a thread reads x and performs its update, x usually will have been updated by several other threads. 
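The work each thread performs is exactly the serial update rule of Algorithm 1. A minimal serial sketch of that rule applied to the penalty objective (5), on an assumed toy LP (the instance, β value, and iteration count are illustrative; this is not the paper's C++ implementation):

```python
# Serial sketch of Algorithm 1 (stochastic coordinate descent on the
# quadratic-penalty objective (5)); in the asynchronous variant each thread
# runs this same loop on a shared x. The instance, beta, and iteration
# count are illustrative assumptions.
import numpy as np

def scd_qp_penalty(A, b, c, beta, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x_bar, u_bar = np.zeros(n), np.zeros(m)   # reference points set to zero
    x = np.zeros(n)
    r = A @ x - b                             # maintained residual Ax - b
    # L_max bounds the Hessian diagonal of (5): beta*||A_:i||^2 + 1/beta
    L_max = beta * float((A * A).sum(axis=0).max()) + 1.0 / beta
    for _ in range(iters):
        i = rng.integers(n)                   # uniform random coordinate i(j)
        # ith partial gradient of f_beta at the current x
        g = c[i] - A[:, i] @ u_bar + beta * (A[:, i] @ r) + (x[i] - x_bar[i]) / beta
        new_xi = max(0.0, x[i] - g / L_max)   # projected coordinate step
        r += A[:, i] * (new_xi - x[i])        # keep the residual in sync
        x[i] = new_xi
    return x

# Tiny standard-form LP: min x0 + x1  s.t.  x0 + x1 = 1, x >= 0 (optimum 1).
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 1.0])
x_hat = scd_qp_penalty(A, b, c, beta=100.0)
```

With β = 100 the returned point satisfies the constraint and attains the optimal objective only approximately (both errors of order 1/β), the (ε, δ) behavior of Definition 2 that Theorem 4 quantifies.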
Provided that the number of threads is not too large (according to criteria that depend on n and on the diagonal dominance properties of the Hessian matrix), and the step size is chosen appropriately, the convergence rate is similar to the serial case, and near-linear speedup is observed.\n\nAugmented Lagrangian Framework. It is well known (see for example [2, 15]) that the quadratic-penalty approach can be extended to an augmented Lagrangian framework, in which a sequence of problems of the form (5) is solved, with the primal and dual solution estimates x̄ and ū (and possibly the penalty parameter β) updated between iterations. Such a “proximal method of multipliers” for LP was described in [23]. We omit a discussion of the convergence properties of the algorithm here, but note that the quality of the solution depends on the values of x̄, ū, and β at the last iteration before convergence is declared. By applying Theorem 5, we note that the constant C* is smaller when x̄ and ū are close to the primal and dual solution sets, thus improving the approximation and reducing the need to increase β to a larger value to obtain an approximate solution of acceptable accuracy.\n\n4 Experiments\n\nOur experiments address two main questions: (1) Is our approximate LP-rounding scheme useful in graph analysis tasks that arise in machine learning? and (2) How does our approach compare to a state-of-the-art commercial solver? We give favorable answers to both questions.\n\n4.1 Is Our Approximate LP-Rounding Scheme Useful in Graph Analysis Tasks?\n\nLP formulations have been used to solve MAP inference problems on graphical models [16], but general-purpose LP solvers have rarely been used, for reasons of scalability. We demonstrate that the rounded solutions obtained using Thetis are of comparable quality to those obtained with state-of-the-art systems. 
We perform experiments on two different tasks: entity linking and text chunking. For each task, we produce a factor graph [9], which consists of a set of random variables and a set of factors that describe the correlations between random variables. We then run MAP inference on the factor graph using the LP formulation in [9] and compare the quality of the solutions obtained by Thetis with a Gibbs sampling-based approach [26]. We follow the LP-rounding algorithm in [16] to solve the MAP estimation problem. For entity linking, we use the TAC-KBP 2010 benchmark1. The input graphical model has 12K boolean random variables and 17K factors. For text chunking, we use the CoNLL 2000 shared task2. The factor graph contains 47K categorical random variables (with domain size 23) and 100K factors. We use the training sets provided by TAC-KBP 2010 and CoNLL 2000 respectively. We evaluate the quality of both approaches using the official evaluation scripts and evaluation data sets provided by each challenge. Figure 2 reports the three relevant quality metrics: precision (P), recall (R), and F1-score. Figure 2 demonstrates that our algorithm produces solutions of quality comparable with state-of-the-art approaches for these graph analysis tasks.\n\n4.2 How does our proposed approach compare to a state-of-the-art commercial solver?\n\nWe conducted numerical experiments on three different combinatorial problems that commonly arise in graph analysis tasks in machine learning: vertex cover, independent set, and multiway-cut. For each problem, we compared the performance of our LP solver against the LP and IP solvers of Cplex (v12.5), denoted Cplex-LP and Cplex-IP respectively.\n\n1http://nlp.cs.qc.cuny.edu/kbp/2010/\n2http://www.cnts.ua.ac.be/conll2000/chunking/\n\n
The two main goals of this experiment are to: (1) compare the quality of the integral solutions obtained using LP rounding with the integral solutions from Cplex-IP, and (2) compare the wall-clock times required by Thetis and Cplex-LP to solve the LPs for the purpose of LP rounding.\n\nDatasets. Our tasks are based on two families of graphs. The first family of instances (frb59-26-1 to frb59-26-5) was obtained from Bhoslib3 (Benchmark with Hidden Optimum Solutions); they are considered difficult problems [25]. The instances in this family are similar; the first is reported in the figures of this section, while the remainder appear in Appendix E. The second family of instances consists of social networking graphs obtained from the Stanford Network Analysis Platform (SNAP)4.\n\nSystem Setup. Thetis was implemented using a combination of C++ (for Algorithm 1) and Matlab (for the augmented Lagrangian framework). Our implementation of the augmented Lagrangian framework was based on [6]. All experiments were run on a machine with four Intel Xeon E7-4450 processors (40 cores @ 2 GHz) and 256 GB of RAM, running Linux 3.8.4 with a 15-disk RAID0. Cplex used 32 (of the 40) cores available in the machine, and for consistency, our implementation was also restricted to 32 cores. Cplex implements presolve procedures that detect redundancy, and substitute and eliminate variables, to obtain equivalent, smaller LPs. Since the aim of this experiment is to compare the algorithms used to solve LPs, we ran both Cplex-LP and Thetis on the reduced LPs generated by the presolve procedure of Cplex-LP. Both Cplex-LP and Thetis were run to a tolerance of ε = 0.1. Additional experiments with Cplex-LP run using its default tolerance options are reported in Appendix E. We used the barrier optimizer while running Cplex-LP. 
All codes were provided with a time limit of 3600 seconds, excluding the time taken for preprocessing as well as the runtime of the rounding algorithms that generate integral solutions from fractional solutions.\n\nTasks. We solved the vertex cover problem using the approximation algorithm described in Section 2. We solved the maximum independent set problem using a variant of the es + o(s)-factor approximation in [1], where s is the maximum degree of a node in the graph (see Appendix C for details). For the multiway-cut problem (with k = 3) we used the 3/2 - 1/k-factor approximation algorithm described in [22]. The details of the transformation from approximate infeasible solutions to feasible solutions are provided in Appendix C. Since the rounding schemes for maximum independent set and multiway-cut are randomized, we chose the best feasible integral solution from 10 repetitions.\n\n           | Minimization problems                                 | Maximization problems\nInstance   | VC: PV, NNZ, S, Q       | MC: PV, NNZ, S, Q           | MIS: PV, NNZ, S, Q\nfrb59-26-1 | 0.12, 0.37, 2.8, 1.04   | 0.75, 3.02, 53.3, 1.01      | 0.12, 0.38, 5.3, 0.36\nAmazon     | 0.39, 1.17, 8.4, 1.23   | 5.89, 23.2, -, 0.42         | 0.39, 1.17, 7.4, 0.82\nDBLP       | 0.37, 1.13, 8.3, 1.25   | 6.61, 26.1, -, 0.33         | 0.37, 1.13, 8.5, 0.88\nGoogle+    | 0.71, 2.14, 9.0, 1.21   | 9.24, 36.8, -, 0.83         | 0.71, 2.14, 10.2, 0.82\n\nFigure 3: Summary of wall-clock speedup (in comparison with Cplex-LP) and solution quality (in comparison with Cplex-IP) of Thetis on three graph analysis problems: vertex cover (VC), multiway-cut (MC), and maximum independent set (MIS). Each code is run with a time limit of one hour and parallelized over 32 cores, with '-' indicating that the code reached the time limit. PV is the number of primal variables while NNZ is the number of nonzeros in the constraint matrix of the LP in standard form (both in millions). S is the speedup, defined as the time taken by Cplex-LP divided by the time taken by Thetis. 
Q is the ratio of the solution objective obtained by Thetis to that reported by Cplex-IP. For minimization problems (VC and MC), lower Q is better; for maximization problems (MIS), higher Q is better. For MC, a value of Q < 1 indicates that Thetis found a better solution than Cplex-IP found within the time limit.\n\n3http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm\n4http://snap.stanford.edu/\n\nVC (min)   | Cplex-IP: t (secs), BFS, Gap (%) | Cplex-LP: t (secs), LP, RSol      | Thetis: t (secs), LP, RSol\nfrb59-26-1 | -, 1475, 0.67                    | 2.48, 767, 1534                   | 0.88, 959.7, 1532\nAmazon     | 85.5, 1.60×10^5, -               | 24.8, 1.50×10^5, 2.04×10^5        | 2.97, 1.50×10^5, 1.97×10^5\nDBLP       | 22.1, 1.65×10^5, -               | 22.3, 1.42×10^5, 2.08×10^5        | 2.70, 1.42×10^5, 2.06×10^5\nGoogle+    | -, 1.06×10^5, 0.01               | 40.1, 1.00×10^5, 1.31×10^5        | 4.47, 1.00×10^5, 1.27×10^5\n\nMC (min)   | Cplex-IP: t (secs), BFS, Gap (%) | Cplex-LP: t (secs), LP, RSol      | Thetis: t (secs), LP, RSol\nfrb59-26-1 | 72.3, 346, -                     | 312.2, 346, 346                   | 5.86, 352.3, 349\nAmazon     | -, 12, NA                        | -, -, -                           | 55.8, 7.28, 5\nDBLP       | -, 15, NA                        | -, -, -                           | 63.8, 11.7, 5\nGoogle+    | -, 6, NA                         | -, -, -                           | 109.9, 5.84, 5\n\nMIS (max)  | Cplex-IP: t (secs), BFS, Gap (%) | Cplex-LP: t (secs), LP, RSol      | Thetis: t (secs), LP, RSol\nfrb59-26-1 | -, 50, 18.0                      | 4.65, 767, 15                     | 0.88, 447.7, 18\nAmazon     | 35.4, 1.75×10^5, -               | 23.0, 1.85×10^5, 1.56×10^5        | 3.09, 1.73×10^5, 1.43×10^5\nDBLP       | 17.3, 1.52×10^5, -               | 23.2, 1.75×10^5, 1.41×10^5        | 2.72, 1.66×10^5, 1.34×10^5\nGoogle+    | -, 1.06×10^5, NA                 | 44.5, 1.11×10^5, 9.39×10^4        | 4.37, 1.00×10^5, 8.67×10^4\n\nFigure 4: Wall-clock time and quality of fractional and integral solutions for three graph analysis problems using Thetis, Cplex-IP and Cplex-LP. Each code was given a time limit of one hour, with '-' indicating a timeout. BFS is the objective value of the best integer feasible solution found by Cplex-IP. The gap is defined as (BFS - BB)/BFS, where BB is the best known solution bound found by Cplex-IP within the time limit. A gap of '-' indicates that the problem was solved to within 0.01% accuracy, and NA indicates that Cplex-IP was unable to find a valid solution bound. LP is the objective value of the LP solution, and RSol is the objective value of the rounded solution.\n\nResults. The results are summarized in Figure 3, with additional details in Figure 4. We discuss the results for the vertex cover problem first. On the Bhoslib instances, the integral solutions from Thetis were within 4% of the documented optimal solutions. In comparison, Cplex-IP produced integral solutions that were within 1% of the documented optimal solutions, but required an hour for each of the instances. Although the LP solutions obtained by Thetis were less accurate than those obtained by Cplex-LP, the rounded solutions from Thetis and Cplex-LP are almost exactly the same. In summary, the LP-rounding approaches using Thetis and Cplex-LP obtain integral solutions of comparable quality with Cplex-IP, but Thetis is about three times faster than Cplex-LP.\nWe observed a similar trend on the large social networking graphs. We were able to recover integral solutions of comparable quality to Cplex-IP, but seven to eight times faster than using LP rounding with Cplex-LP. We make two additional observations. First, the difference between the optimal fractional and integral solutions for these instances is much smaller than for frb59-26-1. Second, we recorded unpredictable performance of Cplex-IP on large instances. Notably, Cplex-IP was able to find the optimal solution for the Amazon and DBLP instances, but timed out on Google+, which is of comparable size. 
On some instances, Cplex-IP outperformed even Cplex-LP in wall-clock time, owing to its specialized presolve strategies.

5 Conclusion
We described Thetis, an LP-rounding scheme based on an approximate solver for LP relaxations of combinatorial problems. We derived worst-case runtime and solution-quality bounds for our scheme, and demonstrated that our approach is faster than an alternative based on a state-of-the-art LP solver, while producing rounded solutions of comparable quality.

Acknowledgements
SS is generously supported by ONR award N000141310129. JL is generously supported in part by NSF awards DMS-0914524 and DMS-1216318 and ONR award N000141310129. CR's work on this project is generously supported by NSF CAREER award IIS-1353606, NSF award CCF-1356918, ONR awards N000141210041 and N000141310129, a Sloan Research Fellowship, and gifts from Oracle and Google. SJW is generously supported in part by NSF awards DMS-0914524 and DMS-1216318, ONR award N000141310129, DOE award DE-SC0002283, and Subcontract 3F-30222 from Argonne National Laboratory. Any recommendations, findings or opinions expressed in this work are those of the authors and do not necessarily reflect the views of any of the above sponsors.

References
[1] Nikhil Bansal, Nitish Korula, Viswanath Nagarajan, and Aravind Srinivasan. Solving packing integer programs via randomized rounding with alterations. Theory of Computing, 8(1):533-565, 2012.
[2] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[3] Jacob Bien and Robert Tibshirani. Classification by set cover: The prototype vector machine. arXiv preprint arXiv:0908.2284, 2009.
[4] Yuri Boykov and Vladimir Kolmogorov.
An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26:1124-1137, 2004.
[5] Gruia Calinescu, Howard Karloff, and Yuval Rabani. An improved approximation algorithm for multiway cut. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 48-52. ACM, 1998.
[6] Jonathan Eckstein and Paulo J. S. Silva. A practical relative error criterion for augmented Lagrangians. Mathematical Programming, pages 1-30, 2010.
[7] Dorit S. Hochbaum. Approximation algorithms for the set covering and vertex cover problems. SIAM Journal on Computing, 11(3):555-556, 1982.
[8] V. K. Koval and M. I. Schlesinger. Two-dimensional programming in image analysis problems. USSR Academy of Science, Automatics and Telemechanics, 8:149-168, 1976.
[9] Frank R. Kschischang, Brendan J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498-519, 2001.
[10] Taesung Lee, Zhongyuan Wang, Haixun Wang, and Seung-won Hwang. Web scale entity resolution using relational evidence. Technical report, Microsoft Research, 2011.
[11] Victor Lempitsky and Yuri Boykov. Global optimization for shape fitting. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1-8. IEEE, 2007.
[12] Ji Liu, Stephen J. Wright, Christopher Ré, and Victor Bittorf. An asynchronous parallel stochastic coordinate descent algorithm. Technical report, University of Wisconsin-Madison, October 2013.
[13] F. Manshadi, Baruch Awerbuch, Rainer Gemulla, Rohit Khandekar, Julián Mestre, and Mauro Sozio. A distributed algorithm for large-scale generalized matching. Proceedings of the VLDB Endowment, 2013.
[14] Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J. Wright.
Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. arXiv preprint arXiv:1106.5730, 2011.
[15] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, 2006.
[16] Pradeep Ravikumar, Alekh Agarwal, and Martin J. Wainwright. Message-passing for graph-structured linear programs: Proximal methods and rounding schemes. The Journal of Machine Learning Research, 11:1043-1080, 2010.
[17] J. Renegar. Some perturbation theory for linear programming. Mathematical Programming, Series A, 65:73-92, 1994.
[18] Dan Roth and Wen-tau Yih. Integer linear programming inference for conditional random fields. In Proceedings of the 22nd International Conference on Machine Learning, pages 736-743. ACM, 2005.
[19] Sujay Sanghavi, Dmitry Malioutov, and Alan S. Willsky. Linear programming analysis of loopy belief propagation for weighted matching. In Advances in Neural Information Processing Systems, pages 1273-1280, 2007.
[20] Aravind Srinivasan. Improved approximation guarantees for packing and covering integer programs. SIAM Journal on Computing, 29(2):648-670, 1999.
[21] Jurgen Van Gael and Xiaojin Zhu. Correlation clustering for crosslingual link detection. In IJCAI, pages 1744-1749, 2007.
[22] Vijay V. Vazirani. Approximation Algorithms. Springer, 2004.
[23] Stephen J. Wright. Implementing proximal point methods for linear programming. Journal of Optimization Theory and Applications, 65(3):531-554, 1990.
[24] Zheng Wu, Ashwin Thangali, Stan Sclaroff, and Margrit Betke. Coupling detection and data association for multiple object tracking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1948-1955. IEEE, 2012.
[25] Ke Xu and Wei Li. Many hard examples in exact phase transitions. Theoretical Computer Science, 355(3):291-302, 2006.
[26] Ce Zhang and Christopher Ré.
Towards high-throughput Gibbs sampling at scale: A study across storage managers. In Proceedings of SIGMOD, 2013.