{"title": "Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search", "book": "Advances in Neural Information Processing Systems", "page_first": 539, "page_last": 548, "abstract": "We present a learning-based approach to computing solutions for certain NP-hard problems. Our approach combines deep learning techniques with useful algorithmic elements from classic heuristics. The central component is a graph convolutional network that is trained to estimate the likelihood, for each vertex in a graph, of whether this vertex is part of the optimal solution. The network is designed and trained to synthesize a diverse set of solutions, which enables rapid exploration of the solution space via tree search. The presented approach is evaluated on four canonical NP-hard problems and five datasets, which include benchmark satisfiability problems and real social network graphs with up to a hundred thousand nodes. Experimental results demonstrate that the presented approach substantially outperforms recent deep learning work, and performs on par with highly optimized state-of-the-art heuristic solvers for some NP-hard problems. Experiments indicate that our approach generalizes across datasets, and scales to graphs that are orders of magnitude larger than those used during training.", "full_text": "Combinatorial Optimization with Graph\n\nConvolutional Networks and Guided Tree Search\n\nZhuwen Li\nIntel Labs\n\nQifeng Chen\n\nHKUST\n\nVladlen Koltun\n\nIntel Labs\n\nAbstract\n\nWe present a learning-based approach to computing solutions for certain NP-\nhard problems. Our approach combines deep learning techniques with useful\nalgorithmic elements from classic heuristics. The central component is a graph\nconvolutional network that is trained to estimate the likelihood, for each vertex\nin a graph, of whether this vertex is part of the optimal solution. 
The network\nis designed and trained to synthesize a diverse set of solutions, which enables\nrapid exploration of the solution space via tree search. The presented approach is\nevaluated on four canonical NP-hard problems and \ufb01ve datasets, which include\nbenchmark satis\ufb01ability problems and real social network graphs with up to a\nhundred thousand nodes. Experimental results demonstrate that the presented\napproach substantially outperforms recent deep learning work, and performs on par\nwith highly optimized state-of-the-art heuristic solvers for some NP-hard problems.\nExperiments indicate that our approach generalizes across datasets, and scales to\ngraphs that are orders of magnitude larger than those used during training.\n\nIntroduction\n\n1\nMany of the most important algorithmic problems in computer science are NP-hard. But their\nworst-case complexity does not diminish their practical role in computing. NP-hard problems arise as\na matter of course in computational social science, operations research, electrical engineering, and\nbioinformatics, and must be solved as well as possible, their worst-case complexity notwithstanding.\nThis motivates vigorous research into the design of approximation algorithms and heuristic solvers.\nApproximation algorithms provide theoretical guarantees, but their scalability may be limited and\nalgorithms with satisfactory bounds may not exist [3, 38]. In practice, NP-hard problems are often\nsolved using heuristics that are evaluated in terms of their empirical performance on problems of\nvarious sizes and dif\ufb01culty levels [15].\nRecent progress in deep learning has stimulated increased interest in learning algorithms for NP-hard\nproblems. Convolutional networks and reinforcement learning have been applied with inspiring\nresults to the game Go, which is theoretically intractable [34, 35]. 
Recent work has also considered classic NP-hard problems, such as Satisfiability, Travelling Salesman, Knapsack, Minimum Vertex Cover, and Maximum Cut [37, 6, 10, 32, 25]. The appeal of learning-based approaches is that they may discover useful patterns in the data that may be hard to specify by hand, such as graph motifs that can indicate a set of vertices that belong to an optimal solution.
In this paper, we present a new approach to solving NP-hard problems that can be expressed in terms of graphs. Our approach combines deep learning techniques with useful algorithmic elements from classic heuristics. The central component is a graph convolutional network (GCN) [12, 24] that is trained to predict the likelihood, for each vertex, of whether this vertex is part of the optimal solution. A naive implementation of this idea does not yield good results, because there may be many optimal solutions, and each vertex could participate in some of them. A network trained without provisions that address this can generate a diffuse and uninformative likelihood map. To overcome this problem, we use a network structure and loss that allow the network to synthesize a diverse set of solutions, which enables the network to explicitly disambiguate different modes in the solution space. This trained GCN is used to guide a parallelized tree search procedure that rapidly generates a large number of candidate solutions, one of which is chosen after subsequent refinement.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

We apply the presented approach to four canonical NP-hard problems: Satisfiability (SAT), Maximal Independent Set (MIS), Minimum Vertex Cover (MVC), and Maximal Clique (MC).
The approach\nis evaluated on two SAT benchmarks, an MC benchmark, real-world citation network graphs, and\nsocial network graphs with up to one hundred thousand nodes from the Stanford Large Network\nDataset Collection. The experiments indicate that our approach substantially outperforms recent\nstate-of-the-art (SOTA) deep learning work. For example, on the SATLIB benchmark, our approach\nsolves all of the problems in the test set, while a recent method based on reinforcement learning does\nnot solve any. The experiments also indicate that our approach performs on par with or better than\nhighly-optimized contemporary solvers based on traditional heuristic methods. Furthermore, the\nexperiments indicate that the presented approach generalizes across datasets and scales to graphs that\nare orders of magnitude larger than those used during training.\n2 Background\nApproaches to solving NP-hard problems include approximation algorithms with provable guarantees\nand heuristics tuned for empirical performance [20, 36, 38, 15]. A variety of heuristics are employed\nin practice, including greedy algorithms, local search, genetic algorithms, simulated annealing,\nparticle swarm optimization, and others. By and large, the heuristics are based on extensive manual\ntuning and domain expertise.\nLearning-based approaches have the potential to yield more effective empirical algorithms for NP-\nhard problems by learning from large datasets. The learning procedure can detect useful patterns and\nleverage regularities in real-world data that may escape human algorithm designers. He et al. [19]\nlearned a node selection policy for branch-and-bound algorithms with imitation learning. Silver et al.\n[34, 35] used reinforcement learning to learn strategies for the game Go that achieved unprecedented\nresults. Vinyals et al. 
[37] developed a new neural network architecture called a pointer network, and applied it to small-scale planar Travelling Salesman Problem (TSP) instances with up to 50 nodes. Bello et al. [6] used reinforcement learning to train pointer networks to generate solutions for synthetic planar TSP instances with up to 100 nodes, and also demonstrated their approach on synthetic random Knapsack problems with up to 200 elements.
Most recently, Dai et al. [10] used reinforcement learning to train a deep Q-network (DQN) to incrementally construct solutions to graph-based NP-hard problems, and showed that this approach outperforms prior learning-based techniques. Our work is related, but differs in several key respects. First, we do not use reinforcement learning, which is known to be a particularly challenging optimization problem. Rather, we show that very strong performance and generalization can be achieved with supervised learning, which benefits from well-understood and reliable solvers. Second, we use a different predictive model, a graph convolutional network [12, 24]. Third, we design and train the network to synthesize a diverse set of solutions at once. This is key to our approach and enables rapid exploration of the solution space.
A technical note by Nowak et al. [30] describes an application of graph neural networks to the quadratic assignment problem. The authors report experiments on matching synthetic random 50-node graphs and generating solutions for 20-node random planar TSP instances. Unfortunately, the results did not surpass classic heuristics [9] or the results achieved by pointer networks [37].
3 Preliminaries
NP-complete problems are closely related to each other and all can be reduced to each other in polynomial time. (Of course, not all such reductions are efficient.) In this work we focus on four canonical NP-hard problems [22].
Maximal Independent Set (MIS).
Given an undirected graph, find the largest subset of vertices in which no two are connected by an edge.
Minimum Vertex Cover (MVC). Given an undirected graph, find the smallest subset of vertices such that each edge in the graph is incident to at least one vertex in the selected set.
Maximal Clique (MC). Given an undirected graph, find the largest subset of vertices that form a clique.

Figure 1: Algorithm overview. First, the input graph is reduced to an equivalent smaller graph. Then it is fed into the graph convolutional network f, which generates multiple probability maps that encode the likelihood of each vertex being in the optimal solution. The probability maps are used to iteratively label the vertices until all vertices are labelled. A complete labelling corresponds to a leaf in the search tree. Internal nodes in the search tree represent incomplete labellings that are generated along the way. The complete labellings generated by the tree search are refined by rapid local search. The best result is used as the final output.

Satisfiability (SAT). Consider a Boolean expression that is built from Boolean variables, parentheses, and the following operators: AND (conjunction), OR (disjunction), and NOT (negation). Here a Boolean expression is a conjunction of clauses, where a clause is a disjunction of literals. A literal is a Boolean variable or its negation. The problem is to find a Boolean labeling of all variables such that the given expression is true, or determine that no such label assignment exists.
All these problems can be reduced to each other. In particular, the MVC, MC, and SAT problems can all be represented as instances of the MIS problem, as reviewed in the supplementary material. Thus, Section 4 will focus primarily on the MIS problem, although the basic structure of the approach is more general.
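The reductions reviewed in the supplementary material are not reproduced here, but the textbook SAT-to-MIS construction conveys the idea: create one vertex per literal occurrence, connect the literals of each clause into a clique, and connect every pair of complementary literals; the formula is satisfiable iff the resulting graph has an independent set with one vertex per clause. A minimal sketch (the DIMACS-style clause encoding and the function name are illustrative, not the paper's code):

```python
from itertools import combinations

def sat_to_mis(clauses):
    """Textbook reduction from CNF-SAT to MIS.

    clauses: list of clauses; integer k denotes variable k, -k its negation.
    Returns (num_vertices, edges): one vertex per literal occurrence; edges
    form a clique inside every clause and join complementary literals.
    The formula is satisfiable iff the graph has an independent set of
    size len(clauses)."""
    literals = []                      # vertex id -> the literal it represents
    edges = set()
    for clause in clauses:
        ids = []
        for lit in clause:
            literals.append(lit)
            ids.append(len(literals) - 1)
        edges.update(combinations(ids, 2))   # pick at most one literal per clause
    for u, v in combinations(range(len(literals)), 2):
        if literals[u] == -literals[v]:
            edges.add((u, v))                # x and NOT x cannot both be picked
    return len(literals), sorted(edges)

# (x1 OR x2) AND (NOT x1 OR x2) is satisfiable, so the maximum independent
# set of the reduced 4-vertex graph has size 2, one vertex per clause.
n, edges = sat_to_mis([[1, 2], [-1, 2]])
```

Picking vertex 1 and vertex 3 (both occurrences of x2) is such an independent set, matching the satisfying assignment x2 = true.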
The experiments in Section 5 will be conducted on benchmarks and datasets for all four problems, which we solve by converting each instance to an equivalent MIS problem.
4 Method
Consider a graph G = (V, E, A), where V = {v_1, . . . , v_N} is the set of N vertices in G, E is the set of edges, and A ∈ {0, 1}^{N×N} is the corresponding unweighted symmetric adjacency matrix. Given G, our goal is to produce a binary labelling of the vertices of G, such that label 1 indicates that a vertex is in the independent set and label 0 indicates that it is not.
A natural approach to this problem is to train a deep network of some form to perform the labelling. That is, a network f would take the graph G as input, and the output f(G) would be a binary labelling of the nodes. A natural output representation is a probability map in [0, 1]^N that indicates how likely each vertex is to belong to the MIS. This direct approach did not work well in our experiments. The problem is that converting the probability map f(G) to a discrete assignment generally yields an invalid solution (a set that is not independent). Instead, we will use a network f within a tree search procedure.
We begin in Section 4.1 by describing a basic network architecture for f. This network generates a probability map over the input graph. The network is used in a basic MIS solver that leverages it within a greedy procedure. Then, in Section 4.2 we modify the architecture and training of f to synthesize multiple diverse probability maps, and leverage this within a more powerful tree search procedure. Finally, Section 4.3 describes two ideas adopted from classic heuristics that are complementary to the application of learning and are useful in accelerating computation and refining candidate solutions. The overall algorithm is illustrated in Figure 1.
4.1 Initial approach
We begin by describing a basic approach that introduces the overall network architecture and leads to a basic MIS solver.
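The validity requirement mentioned above is easy to state with the adjacency matrix: a binary labelling l is an independent set exactly when l^T A l = 0. A small illustrative check (not part of the paper's pipeline):

```python
import numpy as np

def is_independent_set(A, labels):
    """A binary labelling is a valid independent set iff no edge joins two
    selected vertices, i.e. l^T A l == 0.
    A: (N, N) symmetric 0/1 adjacency matrix; labels: length-N 0/1 vector."""
    l = np.asarray(labels)
    return int(l @ A @ l) == 0

# Rounding a diffuse probability map easily violates the constraint:
A = np.array([[0, 1], [1, 0]])                      # a single edge
rounded = (np.array([0.6, 0.7]) > 0.5).astype(int)  # both endpoints selected
assert not is_independent_set(A, rounded)
assert is_independent_set(A, [1, 0])
```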
This will be extended into a more powerful solver in Section 4.2.
Let D = {(G_i, l_i)} be a training set, where G_i is a graph as defined above and l_i ∈ {0, 1}^{N×1} is one of the optimal solutions for the NP-hard graph problem. l_i is a binary map that specifies which vertices are included in the solution. The network f(G_i; θ) is parameterized by θ and is trained to predict l_i given G_i.
We use a graph convolutional network (GCN) architecture [12, 24]. This architecture can perform dense prediction over a graph with pairwise edges. (See [7, 14] for overviews of related architectures.)
A GCN consists of multiple layers {H^l}, where H^l ∈ R^{N×C^l} is the feature matrix of the l-th layer and C^l is the number of feature channels in the l-th layer. We initialize the input layer H^0 with all ones, and H^{l+1} is computed from the previous layer H^l with layer-wise convolutions:

    H^{l+1} = σ(H^l θ_0^l + D^{−1/2} A D^{−1/2} H^l θ_1^l),    (1)

where θ_0^l ∈ R^{C^l×C^{l+1}} and θ_1^l ∈ R^{C^l×C^{l+1}} are trainable weights in the convolutions of the network, D is the degree matrix of A with diagonal entries D(i, i) = Σ_j A(j, i), and σ(·) is a nonlinear activation function (ReLU [29]). For the last layer H^L, we do not use ReLU but apply a sigmoid to get a likelihood map.
During training, we minimize the binary cross-entropy loss for each training sample (G_i, l_i):

    ℓ(l_i, f(G_i; θ)) = − Σ_{j=1}^{N} [ l_{ij} log(f_j(G_i; θ)) + (1 − l_{ij}) log(1 − f_j(G_i; θ)) ],    (2)

where l_{ij} is the j-th element of l_i and f_j(G_i; θ) is the j-th element of f(G_i; θ).
The output f(G_i; θ) of a trained network is generally not a binary vector but a real-valued vector in [0, 1]^N.
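For concreteness, the forward pass of Equation 1 can be sketched in NumPy as follows; the weight list stands in for the trained parameters θ, and the names are illustrative:

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} from Equation 1."""
    d = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = d[d > 0] ** -0.5      # isolated vertices keep weight 0
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]

def gcn_forward(A, thetas):
    """H^{l+1} = ReLU(H^l t0 + D^{-1/2} A D^{-1/2} H^l t1), with a sigmoid
    instead of ReLU on the last layer to obtain a likelihood map.
    thetas: list of (t0, t1) weight pairs, one pair per layer."""
    A_norm = normalized_adjacency(A)
    H = np.ones((A.shape[0], thetas[0][0].shape[0]))   # H^0: all-one features
    for l, (t0, t1) in enumerate(thetas):
        Z = H @ t0 + A_norm @ (H @ t1)
        H = 1.0 / (1.0 + np.exp(-Z)) if l == len(thetas) - 1 else np.maximum(Z, 0.0)
    return H   # N x C^L likelihoods in (0, 1)
```

Because H^0 is all ones, the output depends only on graph structure, which is the point made in the network-settings paragraph of Section 5.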
Simply rounding the real values to 0 or 1 may violate the independence constraints. A simple solution is to treat the prediction f(G_i; θ) as a likelihood map over vertices and use the trained network within a greedy growing procedure that makes sure that the constraints are satisfied.
In this setup, f(G; θ) is used as the heuristic function for a greedy search algorithm for MIS. Given G, the algorithm labels a batch of vertices with 1 or 0 recursively. First, we sort all the vertices in descending order based on f(G). Then we iterate over the sorted list in order and label each vertex as 1 and its neighbors as 0. This process stops when the next vertex in the sorted list is already labelled as 0. We remove all the labelled vertices and the incident edges from G and obtain a residual graph G′. We use G′ as input to f, obtain a new likelihood map, and repeat the process. The complete basic algorithm, referred to as BasicMIS, is specified in the supplementary material.
4.2 Diversity and tree search
One weakness of the approach presented so far is that the network can get confused when there are multiple optimal solutions for the same graph. For instance, Figure 2 shows two equivalent optimal solutions that induce completely different labellings. In other words, the solution space is multimodal and there are many different modes that may be encountered during training. Without further provisions, the network may learn to produce a labelling that "splits the difference" between the possible modes. In the setting of Figure 2 this would correspond to a probability assignment of 0.5 to each vertex, which is not a useful labelling.
To enable the network to differentiate between different modes, we extend the structure of f to generate multiple probability maps. Given the input graph G, the revised network f generates M probability maps: ⟨f^1(G_i; θ), . . . , f^M(G_i; θ)⟩.
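Before turning to diversity, one labelling round of the greedy procedure of Section 4.1 can be sketched as follows (the dictionary-based graph representation is illustrative; the full BasicMIS algorithm, with residual graphs and repeated network evaluation, is in the supplement):

```python
def greedy_round(adj, probs):
    """One round of the basic greedy solver: scan vertices by descending
    likelihood, label each visited vertex 1 and its neighbours 0, and stop
    once the best remaining vertex is already labelled 0.
    adj: dict vertex -> set of neighbours; probs: dict vertex -> likelihood.
    Returns (chosen, excluded)."""
    chosen, excluded = set(), set()
    for v in sorted(probs, key=probs.get, reverse=True):
        if v in excluded:
            break                # next-best vertex is ruled out; defer to next round
        chosen.add(v)
        excluded |= adj[v]       # neighbours of a chosen vertex cannot be in the set

    return chosen, excluded

# Path graph 0-1-2-3 with high likelihood on the endpoints of the optimum:
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
chosen, excluded = greedy_round(adj, {0: 0.9, 2: 0.8, 1: 0.3, 3: 0.2})
```

After a round, the labelled vertices and their incident edges are removed, and the network is re-run on the residual graph G′.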
To train f to generate diverse high-quality probability maps, we adopt the hindsight loss [18, 8, 28]:

    L(D, θ) = Σ_i min_m ℓ(l_i, f^m(G_i; θ)),    (3)

where ℓ(·, ·) is the binary cross-entropy loss defined in Equation 2. Note that the loss for a given training sample in Equation 3 is determined solely by the most accurate solution for that sample. This allows the network to spread its bets and generate multiple diverse solutions, each of which can be sharper.

Solution 1 | Solution 2
Figure 2: Two equivalent solutions for MIS on a four-vertex graph. The black vertices indicate the solution.

Another advantage of producing multiple diverse probability maps is that we can explore multiple solutions with each run of f. Naively, we could apply the basic algorithm for each f^m(G_i; θ), generating at least M solutions. We can in principle generate exponentially many solutions, since in each iteration we can get M probability maps for labelling the graph. We do not generate an exponential number of solutions, but leverage the new f within a tree search procedure that generates a large number of solutions.

Ideally, we want to explore a large number of diverse solutions in a limited time and choose the best one. The basic idea of the tree search algorithm is that we maintain a queue of incomplete solutions and randomly choose one of them to expand in each step. When we expand an incomplete solution, we use the M probability maps ⟨f^1(G_i; θ), . . . , f^M(G_i; θ)⟩ to spawn M new, more complete solutions, which are added to the queue. This is akin to breadth-first search, rather than depth-first search. If we expand the tree in depth-first fashion, the diversity of solutions will suffer, as most of them will have the same ancestors. By expanding the tree in breadth-first fashion, we can get higher diversity.
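The randomized queue-based expansion just described can be sketched as follows; the expansion callback stands in for applying the network's M probability maps to an incomplete labelling, and all names are illustrative:

```python
import random

def tree_search(expand, root, n_leaves, rng=None):
    """Randomized breadth-style tree search over partial labellings.
    expand(partial) returns [] if partial is a complete labelling (a leaf),
    otherwise its M children. A random queue entry is expanded at each
    step, which keeps the collected solutions diverse."""
    rng = rng or random.Random(0)
    queue, leaves = [root], []
    while queue and len(leaves) < n_leaves:
        node = queue.pop(rng.randrange(len(queue)))  # random pick, not FIFO/LIFO
        children = expand(node)
        if not children:
            leaves.append(node)    # complete labelling: a leaf of the search tree
        else:
            queue.extend(children)
    return leaves

# Toy expansion with M = 2: extend a partial bit-labelling of 3 vertices.
def expand(bits):
    return [] if len(bits) == 3 else [bits + (0,), bits + (1,)]

leaves = tree_search(expand, (), n_leaves=8)  # collects all 8 complete labellings
```

Parallelizing this amounts to running several workers that pop different queue entries, as described for the multi-threaded variant.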
To this\nend, the expanded tree nodes are kept in a queue and one is selected at random in each iteration for\nexpansion. On a desktop machine used in our experiments, this procedure yields up to 20K diverse\nsolutions in 10 minutes for a graph with 1,000 vertices. The revised algorithm is summarized in the\nsupplement.\nThe presented tree search algorithm is inherently parallelizable, and can thus be signi\ufb01cantly acceler-\nated. The basic idea is to run multiple threads that choose different incomplete solutions from the\nqueue and expand them. The parallelized tree search algorithm is summarized in the supplement.\nOn the same desktop machine, the parallelized procedure yields up to 100K diverse solutions in 10\nminutes for a graph with 1,000 vertices.\n4.3 Classic elements\nLocal search. In the literature on approximation algorithms for NP-hard problems, there are useful\nheuristic strategies that modify a solution locally by simply inserting, deleting, and swapping nodes\nsuch that the solution quality can only improve [5, 16, 2]. We use this approach to re\ufb01ne the candidate\nsolutions produced by tree search. Speci\ufb01cally, we use a 2-improvement local search algorithm [2, 13].\nMore details can be found in the supplement.\nGraph reduction. There are also graph reduction techniques that can rapidly reduce a graph to\na smaller one [1, 26] while preserving the size of the optimal MIS. This accelerates computation\nby only applying f to the \u201ccomplex\u201d part of the graph. The reduction techniques we adopted are\ndescribed in the supplement.\n5 Experiments\n5.1 Experimental setup\nDatasets. For training, we use the SATLIB benchmark [21]. This dataset provides 40,000 synthetic\n3-SAT instances that are all satis\ufb01able; each instance consists of about 400 clauses with 3 literals. We\nconvert these SAT instances to equivalent MIS graphs, which have about 1,200 vertices each. 
We\nwill show that a network trained on these graphs generalizes to other problems, datasets, and to much\nlarger graphs. We partition the dataset at random into a training set of size 38,000, a validation set of\nsize 1,000, and a test set of size 1,000. The network trained on this training set will be applied to all\nother problems and datasets described below.\nWe evaluate on other problems and datasets as follows:\n\u2022 SAT Competition 2017 [4]. The SAT Competition is a competitive event for SAT solvers. It was\norganized in conjunction with an annual conference on Theory and Applications of Satis\ufb01ability\nTesting. We evaluate on the 20 instances with the same scale as those in SATLIB. Note that\nsmall-scale does not necessarily mean easy. We evaluate SAT on this dataset in addition to the\nSATLIB test set.\n\u2022 BUAA-MC [39]. This dataset includes 40 hard synthetic MC instances. These problems are\nspeci\ufb01cally designed to be challenging [39]. The basic idea of generating hard instances is hiding\nthe optimal solutions in random instances. We evaluate MC, MVC, and MIS on this dataset.\n\u2022 SNAP Social Networks [27]. This dataset is part of the Stanford Large Network Dataset Collection.\nIt includes real-world graphs from social networks such as Facebook, Twitter, Google Plus, etc.\n(Nodes are people, edges are interactions between people.) We use all social network graphs with\nless than a million nodes. The largest graph in the dataset we use has roughly 100,000 vertices and\nmore than 10 million edges. We treat all edges as undirected. Details of the graphs can be found in\nthe supplement. We evaluate MVC and MIS on this dataset.\n\u2022 Citation networks [33]. This dataset includes real-world graphs from academic search engines.\nIn these graphs, nodes are documents and edges are citations. We treat all edges as undirected.\nDetails of the graphs can be found in the supplement. We evaluate MVC and MIS on this dataset.\nBaselines. 
We mainly compare the presented approach to the recent deep learning method of Dai et al. [10]. This approach is referred to as S2V-DQN, following their terminology. For a number of experiments, we will also show the results of this approach when it is enhanced by the graph reduction and local search procedures described in Section 4.3. This will be referred to as S2V-DQN+GR+LS. Following Dai et al. [10], we also list the performance of a classic greedy heuristic, referred to as Classic [31], and its enhanced version, Classic+GR+LS. In addition, we calibrate these results against three powerful alternative methods: a Satisfiability Modulo Theories (SMT) solver called Z3 [11], a SOTA MIS solver called ReduMIS [26], and a SOTA integer linear programming (ILP) solver called Gurobi [17].
Network settings. Our network has L = 20 graph convolutional layers, which is deep enough to get a large receptive field for each node in the input graph. Since our input is a graph without any feature vectors on vertices, the input H^0 contains all-one vectors of size C^0 = 32. This input leads the network to treat all vertices equally, and thus the prediction is made based on the structure of the graph only. The widths of the intermediate layers are identical: C^l = 32 for l = 1, . . . , L − 1. The width of the output layer is C^L = M, where M is the number of output maps. We use M = 32. (Experiments indicate that performance saturates at M = 32.)
Training. Since SATLIB consists of synthetic SAT instances, the ground-truth assignments are known. With the ground-truth assignments, we can generate multiple labelling solutions for the corresponding graphs by switching on and off the free variables in a clause. We use Adam [23] with single-graph mini-batches and learning rate 10^−4. Training proceeds for 200 epochs and takes about 16 hours on a desktop with an i7-5960X 3.0 GHz CPU and a Titan X GPU.
S2V-DQN is trained on the same dataset with the same number of iterations.
Testing. For SAT, we report the number of problems that are solved by the evaluated approaches. This is a very important metric, because in applications there is a big difference between finding a satisfying assignment and not finding one. It is a binary success/failure outcome. Since we solve the SAT problems via solving the equivalent MIS problems, we also report the size of the independent set that is found by the evaluated approaches. Note that it usually takes great effort to increase the size by 1 when the solution is close to the optimum, and thus small increases in the average size, on the order of 1, should be regarded as significant. For MVC, MIS, and MC, we report the size of the set identified by the evaluated approaches. On the BUAA-MC dataset, we also report the fraction of MC problems that are solved by the different approaches.
5.2 Results
We test all approaches on the same desktop with an i7-5960X 3.0 GHz CPU and a Titan X GPU. Our tree search algorithm is parallelized with 16 threads. Since Z3, Gurobi, ReduMIS, and our approach will continue searching for as long as they are allowed, we set a time limit. For a fair comparison, we give the other methods 16× running time, though we do not restart them if they terminate earlier based on their stopping criteria. On the SATLIB and SAT Competition 2017 datasets, the time limit is 10 minutes. On the SNAP-SocialNetwork and CitationNetwork datasets with large graphs, the time limit is 30 minutes. There is no time limit for the Classic approach and S2V-DQN, since they only generate one solution. However, note that on SAT problems these approaches can terminate as soon as a satisfying assignment is found.
Thus, on the SAT problems we report the median termination time.

Method             Solved      MIS   Time (s)
Classic              0.0%   403.98       0.31
Classic+GR+LS        7.9%   424.82       0.45
S2V-DQN              0.0%   413.77       2.26
S2V-DQN+GR+LS        8.9%   424.98       2.41
Gurobi              98.5%   426.86     175.83
Z3                 100.0%        –       0.01
ReduMIS            100.0%   426.90      47.79
Ours               100.0%   426.90      11.47

Table 1: Results on the SATLIB test set. Fraction of solved SAT instances, average independent set size, and runtime.

Method             Solved      MIS   Time (s)
Classic              0.0%   453.25       0.30
Classic+GR+LS       75.0%   491.05       0.45
S2V-DQN              0.0%   462.05       2.19
S2V-DQN+GR+LS       80.0%   491.50       2.37
Gurobi              80.0%        –     141.66
Z3                 100.0%        –       0.01
ReduMIS            100.0%   492.85      21.90
Ours               100.0%   492.85      12.20

Table 2: Results on the SAT Competition 2017. Fraction of solved SAT instances, average independent set size, and runtime.

We begin by reporting results on the SAT datasets. For each approach, Table 1 reports the percentage of solved SAT instances and the average independent set size on the test set of the SATLIB dataset. Note that there are 1,000 instances in the test set. The Classic approach cannot solve a single problem. S2V-DQN, though it has been trained on similar graphs in the training set, does not solve a single problem either, possibly because the reinforcement learning procedure did not discover fully satisfying solutions during training. Looking at the MIS sizes reveals that S2V-DQN discovers solutions that are close but struggles to get to the optimum. This observation is consistent with the results reported in the paper [10]. With refinement by the same classic elements we use, S2V-DQN+GR+LS solves 89 SAT instances out of 1,000, and Classic+GR+LS solves 79 SAT instances. In contrast, our approach solves all 1,000 SAT instances, which is slightly better than the SOTA ILP solver (Gurobi), and the same as the modern SMT solver (Z3) and the SOTA MIS solver (ReduMIS).
Note that Z3 solves the SAT problem directly and cannot solve the transformed MIS problems on the SATLIB dataset, so no independent set size is listed for it.
We also analyze the effect of the number M of diverse solutions in our network. Note that this is analyzed on the single-threaded tree search algorithm, since the multi-threaded version solves all instances easily. Figure 3 plots the fraction of solved problems and average size of the computed MIS solution on the SATLIB validation set for M = 1, 4, 32, 128, 256. The results indicate that increasing the number of intermediate solutions helps up to M = 32, at which point the performance plateaus.
Table 2 reports results on SAT Competition 2017 instances. Again, both Classic and S2V-DQN solve 0 problems. When augmented by graph reduction and local search, Classic+GR+LS solves 75% of the problems, and S2V-DQN+GR+LS solves 80%, while our approach solves 100% of the problems. As sophisticated solvers, Z3 and ReduMIS solve 100%, while Gurobi solves 80%. Note that Gurobi cannot return a valid solution for some instances, and thus its independent set size is not listed.
Table 3 reports results on the BUAA-MC dataset. We evaluate MC, MIS, and MVC on this dataset. Since the optimal solutions for MC are given in this dataset, we report the fraction of MC problems solved optimally by each approach. Note that this dataset is designed to be highly challenging [39]. Most baselines, including Gurobi, cannot solve a single instance in this dataset. As a sophisticated MIS solver, ReduMIS solves 25%. S2V-DQN+GR+LS does not find the optimal solution on any problem instance. Our approach solves 62.5% of the instances. Note that our approach was only trained on synthetic SAT graphs from a different dataset. We see that the presented approach generalizes across datasets and problem types. We also evaluate MIS and MVC on these graphs.
As shown in Table 3, our approach outperforms all the baselines on MIS and MVC.

Figure 3: Effect of the hyperparameter M. The blue curve shows the fraction of solved problems on the SATLIB validation set for different settings of M. The orange curve shows the average size of the computed independent set for different settings of M.

Method            Solved      MC     MIS      MVC
Classic             0.0%   30.03   21.53   991.72
Classic+GR+LS       0.0%   42.83   24.64   988.61
S2V-DQN             0.0%   40.40   23.76   989.49
S2V-DQN+GR+LS       0.0%   42.98   24.70   988.55
Gurobi              0.0%   39.75   24.12   989.13
ReduMIS            25.0%   44.95   24.87   988.38
Ours               62.5%   45.55   25.06   988.19

Table 3: Results on the BUAA-MC dataset. The table reports the fraction of solved MC problems and the average size of MC, MIS, and MVC solutions.

Method               Solved      MIS
Basic                 18.8%   425.55
Basic+Tree            59.2%   426.52
No local search       42.4%   426.41
No reduction          91.0%   426.81
Full w/o parallel     98.8%   426.86
Full with parallel   100.0%   426.88

Table 4: Controlled experiment on the SATLIB validation set.
The table shows the fraction of solved SAT instances and the average independent set size.

                        MIS                                     MVC
Name              Classic  S2V-DQN  ReduMIS    Ours     Classic  S2V-DQN  ReduMIS    Ours
ego-Facebook          993    1,020    1,046   1,046       3,046    3,019    2,993   2,993
ego-Gplus          56,866   56,603   57,394  57,394      50,748   51,011   50,220  50,220
ego-Twitter        36,235   36,275   36,843  36,843      45,071   45,031   44,463  44,463
soc-Epinions1      53,457   53,089   53,599  53,599      22,422   22,790   22,280  22,280
soc-Slashdot0811   53,009   52,719   53,314  53,314      24,351   24,641   24,046  24,046
soc-Slashdot0922   56,087   55,506   56,398  56,398      26,081   26,662   25,770  25,770
wiki-Vote           4,730    4,779    4,866   4,866       2,385    2,336    2,249   2,249
wiki-RfA            8,019    7,956    8,131   8,131       2,816    2,879    2,704   2,704
bitcoin-otc         4,330    4,334    4,346   4,346       1,551    1,547    1,535   1,535
bitcoin-alpha       2,703    2,705    2,718   2,718       1,080    1,078    1,065   1,065

Table 5: Results on the SNAP Social Network graphs. The table lists the sizes of solutions for MIS and MVC found by the different approaches.

                        MIS                                     MVC
Name              Classic  S2V-DQN  ReduMIS    Ours     Classic  S2V-DQN  ReduMIS    Ours
Citeseer            1,848    1,705    1,867   1,867       1,508    1,622    1,460   1,460
Cora                1,424    1,381    1,451   1,451       1,284    1,327    1,257   1,257
Pubmed             15,852   15,709   15,912  15,912       3,865    4,008    3,805   3,805

Table 6: Results on the citation networks.

Next we report results on large-scale real-world graphs. We use the different approaches to compute MIS and MVC on the SNAP Social Networks and the Citation Networks. The results are reported in Tables 5 and 6. Our approach and ReduMIS outperform the other baselines on all graphs. ReduMIS works as well as our approach, presumably because both methods find the optimal solutions on these graphs.
Gurobi cannot return any valid solution for these large instances, and thus its results are not listed. One surprising observation is that S2V-DQN does not perform as well as the Classic approach when the graph size is larger than 10,000 vertices; the reason could be that S2V-DQN does not generalize well to large graphs. These results indicate that our approach generalizes well across problem types and datasets. In particular, it generalizes from synthetic graphs to real ones, from SAT graphs to real-world social networks, and from graphs with roughly 1,000 nodes to graphs with roughly 100,000 nodes and more than 10 million edges. This may indicate that there are universal motifs that are present in graphs and occur across datasets and scales, and that the presented approach discovers these motifs.
Finally, we conduct a controlled experiment on the SATLIB validation set to analyze how each component contributes to the presented approach. Note that this analysis uses the single-threaded tree search algorithm, since the multi-threaded version solves all instances easily. The results are summarized in Table 4. First, we evaluate the initial approach presented in Section 4.1, augmented by reduction and local search (but no diversity); we refer to this approach as Basic. Then we evaluate a different version of Basic that generates multiple solutions and conducts tree search via random sampling, but does not utilize the diversity loss presented in Section 4.2; we refer to this version as Basic+Tree. (Basic+Tree is structurally similar to our full pipeline, but does not use the diversity loss.) Finally, we evaluate two ablated versions of our full pipeline, by removing the local search or the graph reduction. Our full approach with and without parallelization is listed for comparison.
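The tree search ablated in this experiment can be sketched in simplified form. Everything below (names, data layout, the stand-in probability maps in place of network outputs) is an illustrative assumption rather than the authors' implementation, which additionally interleaves graph reduction and local search:

```python
import random

def guided_tree_search(nodes, adj, prob_maps, max_leaves=200, seed=0):
    """Simplified sketch of diversity-guided tree search for MIS.
    Each of the M probability maps proposes one child per open leaf:
    fix the most confident remaining vertex, delete it and its
    neighbours, and recurse; every completed leaf is a maximal
    independent set, and the largest one found is returned."""
    rng = random.Random(seed)
    best = set()
    # each queue entry is (partial solution, vertices still undecided)
    queue = [(frozenset(), frozenset(nodes))]
    leaves = 0
    while queue and leaves < max_leaves:
        sol, rem = queue.pop(rng.randrange(len(queue)))  # random open leaf
        if not rem:                    # no vertices left: maximal solution
            leaves += 1
            if len(sol) > len(best):
                best = set(sol)
            continue
        for pm in prob_maps:           # one child per probability map
            v = max(rem, key=lambda u: pm[u])        # most confident vertex
            queue.append((sol | {v}, rem - {v} - adj[v]))
    return best
```

With a single probability map this degenerates to a greedy rollout (roughly the Basic setting); multiple diverse maps give the search its breadth, matching the trend in Figure 3 where larger M solves more instances.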
This experiment demonstrates that all components presented in this paper contribute to the results.
6 Conclusion
We have presented an approach to solving NP-hard problems with graph convolutional networks. Our approach trains a deep network to perform dense prediction over a graph. We showed that training the network to produce multiple solutions enables an effective exploration procedure. Our approach combines deep learning techniques with classic algorithmic ideas. The resulting algorithm convincingly outperforms recent work. A particularly encouraging finding is that the approach generalizes across very different datasets and to problem instances that are larger by orders of magnitude than the ones it was trained on.
We have focused on the maximal independent set (MIS) problem and on problems that can be easily mapped to it. This is not a universal solution. For example, we did not solve Maximal Clique on the large SNAP Social Networks and Citation Networks, because the complement graphs of these large networks are very dense, and all evaluated approaches either run out of memory or cannot return a result in reasonable time (24 hours). This highlights a limitation of only training a network for one task (MIS) and indicates the desirability of applying the presented approach directly to other problems such as Maximal Clique. The structure of the presented approach is quite general and can be leveraged to train networks that predict the likelihood of Maximal Clique participation rather than the likelihood of MIS participation, and likewise for other problems. We see the presented work as a step towards a new family of solvers for NP-hard problems that leverage both deep learning and classic heuristics. We will release code to support future progress along this direction.
References
[1] Takuya Akiba and Yoichi Iwata. Branch-and-reduce exponential/FPT algorithms in practice: A case study of vertex cover.
In ALENEX, 2015.
[2] Diogo Vieira Andrade, Mauricio G. C. Resende, and Renato Fonseca F. Werneck. Fast local search for the maximum independent set problem. J. Heuristics, 18(4), 2012.
[3] Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach. Cambridge University Press, 2009.
[4] Tomáš Balyo, Marijn J. H. Heule, and Matti Järvisalo. SAT competition 2017.
[5] Roberto Battiti and Marco Protasi. Reactive local search for the maximum clique problem. Algorithmica, 29(4), 2001.
[6] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning. arXiv:1611.09940, 2016.
[7] Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4), 2017.
[8] Qifeng Chen and Vladlen Koltun. Photographic image synthesis with cascaded refinement networks. In ICCV, 2017.
[9] Nicos Christofides. Worst-case analysis of a new heuristic for the travelling salesman problem. Technical report, Carnegie Mellon University, 1976.
[10] Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. In NIPS, 2017.
[11] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In TACAS, 2008.
[12] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 2016.
[13] Thomas A. Feo, Mauricio G. C. Resende, and Stuart H. Smith. A greedy randomized adaptive search procedure for maximum independent set. Operations Research, 42(5), 1994.
[14] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
[15] Teofilo F.
Gonzalez. Handbook of Approximation Algorithms and Metaheuristics. Chapman and Hall/CRC, 2007.
[16] Andrea Grosso, Marco Locatelli, and Wayne J. Pullan. Simple ingredients leading to very efficient heuristics for the maximum clique problem. J. Heuristics, 14(6), 2008.
[17] Gurobi Optimization Inc. Gurobi optimizer reference manual, version 8.0, 2018.
[18] Abner Guzmán-Rivera, Dhruv Batra, and Pushmeet Kohli. Multiple choice learning: Learning to produce multiple structured outputs. In NIPS, 2012.
[19] He He, Hal Daumé III, and Jason Eisner. Learning to search in branch and bound algorithms. In NIPS, 2014.
[20] Dorit S. Hochbaum. Approximation algorithms for NP-hard problems. PWS Publishing Co., 1997.
[21] Holger H. Hoos and Thomas Stützle. SATLIB: An online resource for research on SAT. In SAT, 2000.
[22] Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, 1972.
[23] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[24] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
[25] Wouter Kool, Herke van Hoof, and Max Welling. Attention solves your TSP, approximately. arXiv:1803.08475, 2018.
[26] Sebastian Lamm, Peter Sanders, Christian Schulz, Darren Strash, and Renato F. Werneck. Finding near-optimal independent sets at scale. J. Heuristics, 23(4), 2017.
[27] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.
[28] Zhuwen Li, Qifeng Chen, and Vladlen Koltun. Interactive image segmentation with latent diversity. In CVPR, 2018.
[29] Vinod Nair and Geoffrey E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, 2010.
[30] Alex Nowak, Soledad Villar, Afonso S. Bandeira, and Joan Bruna.
A note on learning algorithms for quadratic assignment with graph neural networks. arXiv:1706.07450, 2017.
[31] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982.
[32] Daniel Selsam, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. Learning a SAT solver from single-bit supervision. arXiv:1802.03685, 2018.
[33] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3), 2008.
[34] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 2016.
[35] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676), 2017.
[36] Vijay V. Vazirani. Approximation Algorithms. Springer, 2004.
[37] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In NIPS, 2015.
[38] David P. Williamson and David B. Shmoys. The Design of Approximation Algorithms. Cambridge University Press, 2011.
[39] Ke Xu, Frédéric Boussemart, Fred Hemery, and Christophe Lecoutre. Random constraint satisfaction: Easy generation of hard (satisfiable) instances. Artificial Intelligence, 171(8-9), 2007.