{"title": "Provable Certificates for Adversarial Examples: Fitting a Ball in the Union of Polytopes", "book": "Advances in Neural Information Processing Systems", "page_first": 14082, "page_last": 14092, "abstract": "We propose a novel method for computing exact pointwise robustness of deep\n neural networks for all convex lp norms. Our algorithm, GeoCert, finds the largest\n lp ball centered at an input point x0, within which the output class of a given neural\n network with ReLU nonlinearities remains unchanged. We relate the problem\n of computing pointwise robustness of these networks to that of computing the\n maximum norm ball with a fixed center that can be contained in a non-convex\npolytope. This is a challenging problem in general, however we show that there\n exists an efficient algorithm to compute this for polyhedral complices. Further\nwe show that piecewise linear neural networks partition the input space into a polyhedral complex. Our algorithm has the ability to almost immediately output a\nnontrivial lower bound to the pointwise robustness which is iteratively improved until it ultimately becomes tight. We empirically show that our approach generates\na distance lower bounds that are tighter compared to prior work, under moderate\ntime constraints.", "full_text": "Provable Certi\ufb01cates for Adversarial Examples:\n\nFitting a Ball in the Union of Polytopes\n\nMatt Jordan\u2217\n\nUniversity of Texas at Austin\nmjordan@cs.utexas.edu\n\nJustin Lewis\u2217\n\nUniversity of Texas at Austin\njustin94lewis@utexas.edu\n\nAlexandros G. Dimakis\n\nUniversity of Texas at Austin\n\ndimakis@austin.utexas.edu\n\nAbstract\n\nWe propose a novel method for computing exact pointwise robustness of deep\nneural networks for all convex (cid:96)p norms. Our algorithm, GeoCert, \ufb01nds the largest\n(cid:96)p ball centered at an input point x0, within which the output class of a given neural\nnetwork with ReLU nonlinearities remains unchanged. 
We relate the problem of computing pointwise robustness of these networks to that of computing the maximum norm ball with a fixed center that can be contained in a non-convex polytope. This is a challenging problem in general; however, we show that there exists an efficient algorithm to compute this for polyhedral complices. Further, we show that piecewise linear neural networks partition the input space into a polyhedral complex. Our algorithm can almost immediately output a nontrivial lower bound to the pointwise robustness, which is iteratively improved until it ultimately becomes tight. We empirically show that our approach generates distance lower bounds that are tighter than prior work, under moderate time constraints.

1 Introduction

The problem we consider in this paper is that of finding the ℓp-pointwise robustness of a neural net with ReLU nonlinearities with respect to general ℓp norms. The pointwise robustness of a neural net classifier f, for a given input point x0, is defined as the smallest distance from x0 to the decision boundary [1]. Formally, this is defined as

    ρ(f, x0, p) := inf_x { ε ≥ 0 | f(x) ≠ f(x0) ∧ ||x − x0||_p = ε }.    (1)

Computing the pointwise robustness is the central problem in certifying that neural nets are robust to adversarial attacks. Exactly computing this quantity has been shown to be NP-complete in the ℓ∞ setting [11], with hardness-of-approximation results under the ℓ1 norm [25]. Despite these hardness results, multiple algorithms have been devised to exactly compute the pointwise robustness, though they may require exponential time in the worst case.
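As a contrast to the general hardness, the infimum in (1) has a closed form when f is a single affine classifier: the ℓp distance from x0 to the hyperplane {x : w·x + c = 0} is |w·x0 + c| / ||w||_q, where q is the dual exponent with 1/p + 1/q = 1. The following is a minimal sketch of this special case for intuition only (our own illustration, not part of the paper's method):

```python
import numpy as np

def linear_pointwise_robustness(w, c, x0, p=2):
    """Distance from x0 to the hyperplane {x : w.x + c = 0} in the l_p norm.

    For an affine binary classifier f(x) = sign(w.x + c), this equals the
    pointwise robustness rho(f, x0, p): the l_p distance to a hyperplane is
    measured with the dual norm l_q of w, where 1/p + 1/q = 1.
    """
    if p == np.inf:
        q = 1.0
    elif p == 1:
        q = np.inf
    else:
        q = p / (p - 1.0)
    return abs(np.dot(w, x0) + c) / np.linalg.norm(w, ord=q)

# Example: classifier x1 + x2 - 1 >= 0, queried at the origin.
w, c, x0 = np.array([1.0, 1.0]), -1.0, np.zeros(2)
```

Here the ℓ2 robustness is 1/√2 while the ℓ∞ robustness is |−1| / ||w||₁ = 0.5, showing how the certified radius depends on the chosen norm.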
As a result, efficient algorithms have also been developed to give provable lower bounds to the pointwise robustness, though these lower bounds may be quite loose.

In this work, we propose an algorithm that initially outputs a nontrivial lower bound to the pointwise robustness and continually improves this lower bound until it becomes tight. Although our algorithm

*First two authors have equal contribution

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

has performance which is theoretically poor in the worst case, we find that in practice it provides a fundamental compromise between the two extremes of complete and incomplete verifiers. This is useful when a lower bound to the pointwise robustness is desired under a moderate time budget.

The central mathematical problem we address is how to find the largest ℓp ball with a fixed center contained in the union of convex polytopes. We approach this by decomposing the boundary of such a union into convex components. This boundary may have complexity exponential in the dimension in the general case. However, if the polytopes form a polyhedral complex, an efficient boundary decomposition exists, and we leverage this to develop an efficient algorithm to compute the largest ℓp ball with a fixed center contained in the polyhedral complex. We connect this geometric result to the problem of computing the pointwise robustness of piecewise linear neural networks by proving that the linear regions of piecewise linear neural networks indeed form a polyhedral complex.
Further, we leverage the Lipschitz continuity of neural networks both to initialize at a nontrivial lower bound and to guide our search to tighten this lower bound more quickly.

Our contributions are as follows:

• We provide results on the boundary complexity of polyhedral complices, and use these results to motivate an algorithm to compute the largest interior ℓp ball centered at x0.
• We prove that the linear regions of piecewise linear neural networks partition the input space into a polyhedral complex.
• We incorporate existing incomplete verifiers to improve our algorithm and demonstrate that under a moderate time budget, our approach can provide tighter lower bounds compared to prior work.

2 Related Work

Complete Verifiers: We say that an algorithm is a complete verifier if it exactly computes the pointwise robustness of a neural network. Although this problem is NP-complete in general under an ℓ∞ norm [11], there are two main algorithms to do so. The first leverages formal logic and SMT solvers to generate a certificate of robustness [11], though this approach only works for ℓ∞ norms. The second formulates certification of piecewise linear neural networks as mixed integer programs and relies on fast MIP solvers to be scalable to reasonably small networks trained on MNIST [20, 8, 13, 6, 4]. This approach extends to the ℓ2 domain so long as the mixed integer programming solver utilized can solve linearly-constrained quadratic programs [20]. Both of these approaches are fundamentally different from our proposed method and do not provide a sequence of ever-tightening lower bounds.
Certainly each can be used to certify any given lower bound, or provide a counterexample, but the standard technique to do so is unable to reuse previous computation.

Incomplete Verifiers: There has been a large body of work on algorithms that output a certifiable lower bound on the pointwise robustness. We call these techniques incomplete verifiers. These approaches employ a variety of relaxation techniques. Linear programming approaches admit efficient convex relaxations that can provide nontrivial lower bounds [26, 25, 17, 7]. Exactly computing the Lipschitz constant of neural networks has also been shown to be NP-hard [22], but overestimations of the Lipschitz constant have been shown to provide lower bounds to the pointwise robustness [15, 25, 19, 10, 21]. Other relaxations, such as those leveraging semidefinite programming or abstract representations with zonotopes, are also able to provide provable lower bounds [16, 14]. An equivalent formulation of this problem is providing overestimations on the range of neural nets, for which interval arithmetic has been shown useful [23, 24]. Other approaches generate lower bounds by examining only a single linear region of a PLNN [18, 5], though we extend these results to arbitrarily many linear regions. These approaches, while typically more efficient, may provide loose lower bounds.

3 Centered Chebyshev Ball

Notations and Assumptions
Before we proceed, we introduce some notation. A convex polytope is a bounded subset of R^n that can be described as the intersection of a finite number of halfspaces. The polytopes we study are described succinctly by their linear inequalities (i.e., they are H-polytopes), which means that the number of halfspaces defining the polytope, denoted by m, is at most O(poly(n)), i.e., polynomial in the ambient dimension.
If a polytope P is described as {x | Ax ≤ b}, an (n − k)-face of P is a nonempty subset of P defined as the set {x | x ∈ P ∧ A₌x = b₌}, where A₌ is a matrix of rank k composed of a subset of the rows of A, and b₌ is the corresponding subset of b. We use the term facet to refer to an (n − 1)-face of P. We define the boundary δP of a polytope as the union of the facets of P. We use the term nonconvex polytope to describe a subset of R^n that can be written as a union of finitely many convex polytopes, each with nonempty interior. The ℓp-norm ball of size t centered at point x0 is denoted by B_t^p(x0) := {x | ||x − x0||_p ≤ t}. The results presented hold for ℓp norms with p ≥ 1. When the choice of norm is arbitrary, we use || · || to denote the norm and B_t(x0) to refer to the corresponding norm ball.

Centered Chebyshev Balls: Working towards the case of a union of polytopes, we first consider the simple case of fitting the largest ℓp ball with a fixed center inside a single polytope. The uncentered version of this problem is typically referred to as finding the Chebyshev center of a polytope and can be computed via a single linear program [3, 2]. When the center is fixed, this can be viewed as computing the projection to the boundary of the polytope. In fact, in the case of a single polytope, it suffices to compute the projection onto the hyperplanes containing each facet. See Appendix A for further discussion on computing projections onto polytopes. Ultimately, because of the polytope's geometric structure, the problem's decomposition is straightforward. This theme of efficient boundary decomposition will prove to hold true for polyhedral complices, as shown in the following sections.

Now, we turn our attention to the case of finding a centered Chebyshev ball inside a general nonconvex polytope.
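The single-polytope case above reduces to a minimum over per-facet hyperplane distances: for x0 inside P = {x : Ax ≤ b}, the ℓ2 distance to the hyperplane containing facet i is (b_i − a_i·x0)/||a_i||₂. A minimal sketch of that computation (our own illustration):

```python
import numpy as np

def centered_chebyshev_radius(A, b, x0):
    """Largest r such that the l2 ball B(x0, r) fits inside {x : Ax <= b}.

    Assumes x0 lies in the polytope. The projection distance from x0 to the
    hyperplane {x : a_i.x = b_i} containing facet i is
    (b_i - a_i.x0) / ||a_i||_2, and the centered Chebyshev radius is the
    minimum of these distances over all rows of A.
    """
    slack = b - A @ x0  # nonnegative exactly when x0 is feasible
    assert np.all(slack >= 0), "x0 must lie inside the polytope"
    return np.min(slack / np.linalg.norm(A, axis=1))

# Unit square [0,1]^2 written as Ax <= b.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
```

Queried at the center (0.5, 0.5) this returns 0.5; off-center points shrink the radius accordingly. The same structure with dual norms handles other ℓp balls.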
This amounts to computing the projection to the boundary of the region. The key idea here is that the boundary of a nonconvex polytope can be described as the union of finitely many (n − 1)-dimensional polytopes; however, the decomposition may be quite complex. We define this set formally as follows:

Definition 1. The boundary of a non-convex polytope P is the largest set T ⊆ P such that every point x ∈ T satisfies the following two properties:

(i) There exists an ε0 and a direction u such that for all ε ∈ (0, ε0), there exists a neighborhood centered around x + εu that is contained in P.

(ii) There exists an η0 and a direction v such that for all η ∈ (0, η0), x + ηv ∉ P.

The boundary is composed of finitely many convex polytopes, and computing the projection to a single convex polytope is an efficiently computable convex program. If there exists an efficient decomposition of the boundary of a nonconvex polytope into convex sets, then a viable algorithm is to simply compute the minimal distance from x0 to each component of the boundary and return the minimum. Unfortunately, for general nonconvex polytopes, there may not be an efficient convex decomposition. See Theorem B.1 in Appendix B.

However, there do exist classes of nonconvex polytopes that admit a convex decomposition with size that is no larger than the description of the nonconvex polytope itself. To this end, we introduce the following definition (see also Ch. 5 of [27]):

Definition 2. A nonconvex polytope, described as the union of elements of the set P = {P1, ..., Pk}, forms a polyhedral complex if, for every Pi, Pj ∈ P with nonempty intersection, Pi ∩ Pj is a face of both Pi and Pj. Additionally, for brevity, if a pair of polytopes P, Q form a polyhedral complex, we say they are PC.
(See Figure 1 for examples.)

We can now state our main theorem concerning the computation of the centered Chebyshev ball within polyhedral complices:

Theorem 3.1. Given a polyhedral complex P = {P1, . . . , Pk}, where each Pi is defined as the intersection of mi closed halfspaces, let M = Σ_i mi, and let x0 be a point contained in at least one such Pi. Then the boundary of ∪_{i∈[k]} Pi is represented by at most M (n − 1)-dimensional polytopes, and there exists an algorithm that can compute this boundary in O(poly(n, M, k)) time.

Returning to our desired application, we now prove a corollary about the centered Chebyshev ball contained in a union of polytopes.

Figure 1: Three potential configurations of a nonconvex polytope. Note that only the rightmost nonconvex polytope forms a polyhedral complex.

Corollary 3.2. Given a collection P = {P1, . . . , Pk} that meets all the conditions outlined in Theorem 3.1, with the boundary T of P computed as in Theorem 3.1, the centered Chebyshev ball around x0 has size

    t := inf_{x∈T} ||x − x0||.    (2)

This can be solved by at most M linear programs in the case of the ℓ∞ norm, or at most M linearly constrained quadratic programs in the case of the ℓ2 norm.

Graph Theoretic Formulation:
Theorem 3.1 and its corollary provide a natural algorithm for computing the centered Chebyshev ball of a polyhedral complex: compute the convex components of the boundary and then compute the projection to each component. In the desired application of computing robustness of neural networks, the number of such convex components may be large, and therefore it may be inefficient to even enumerate each component. While we demonstrate in Appendix G that the number of linear regions of ReLU networks tends to be much smaller than their theoretical upper bound, it is of interest to develop algorithms that do not have to compute projections to every boundary facet.
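The per-facet subproblem of Corollary 3.2 under the ℓ∞ norm is a single linear program: minimize t subject to x lying on the facet and −t ≤ x_i − (x0)_i ≤ t for every coordinate. A sketch using scipy (our own illustration; the paper's experiments use Gurobi as the LP solver):

```python
import numpy as np
from scipy.optimize import linprog

def linf_projection_to_facet(A_ub, b_ub, A_eq, b_eq, x0):
    """l_inf projection distance from x0 to the facet
    {x : A_ub x <= b_ub, A_eq x = b_eq}, solved as one linear program.

    Decision variables are (x, t); we minimize t subject to x lying on the
    facet and -t <= x_i - x0_i <= t for every coordinate. This is the
    per-facet subproblem from Corollary 3.2 under the l_inf norm.
    """
    n = len(x0)
    c = np.concatenate([np.zeros(n), [1.0]])  # objective: minimize t
    # |x_i - x0_i| <= t  <=>  x_i - t <= x0_i  and  -x_i - t <= -x0_i
    eye, ones = np.eye(n), np.ones((n, 1))
    box = np.block([[eye, -ones], [-eye, -ones]])
    box_rhs = np.concatenate([x0, -x0])
    # Facet inequality constraints, padded with a zero column for t.
    facet = np.hstack([A_ub, np.zeros((A_ub.shape[0], 1))])
    res = linprog(c,
                  A_ub=np.vstack([box, facet]),
                  b_ub=np.concatenate([box_rhs, b_ub]),
                  A_eq=np.hstack([A_eq, np.zeros((A_eq.shape[0], 1))]),
                  b_eq=b_eq,
                  bounds=[(None, None)] * n + [(0, None)])
    return res.fun

# Facet {x1 = 1, 0 <= x2 <= 1} of the unit square, queried from (0, 0.5).
A_ub = np.array([[0.0, 1.0], [0.0, -1.0]])
b_ub = np.array([1.0, 0.0])
A_eq = np.array([[1.0, 0.0]])
b_eq = np.array([1.0])
x0 = np.array([0.0, 0.5])
```

Taking the minimum of this quantity over the boundary facets T gives the radius t in Eq. (2); swapping the objective for a quadratic one gives the ℓ2 case.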
In the absence of other information, one must at least compute the projection to every facet, boundary or otherwise, intersecting the centered Chebyshev ball.

A more natural way to view this problem is as a local-search problem along a bipartite graph. For a given polyhedral complex P composed of polytopes P1, P2, . . . , we construct a bipartite graph where each right vertex corresponds to an n-dimensional polytope Pi, and each left vertex corresponds to a facet of the polyhedral complex. We abuse notation and let Pi refer to both the right vertex and its corresponding polytope, and similarly for each left vertex and facet Fj. An edge exists between right vertex Pi and left vertex Fj iff polytope Pi contains facet Fj. In other words, the graph of interest is composed of the terminal elements of the face lattice and their direct ancestors. By definition, for any polyhedral complex, the left-degree of this graph is at most 2.

In the context of computing the centered Chebyshev ball around a point x0, we further equip each left vertex/facet Fj in our graph with a value which we refer to as the 'potential.' For now, the potential of vertex Fj can be thought of as the projection distance between x0 and the facet Fj. We will denote the potential of vertex Fj as Φ(Fj). The boundary facets T correspond to a subset of the left vertices, and recall that our goal is to return the left vertex with minimal potential. By the triangle inequality, any ray starting at x0 that intersects multiple facets in order F_{i1}, F_{i2}, . . . will have Φ(F_{i1}) ≤ Φ(F_{i2}) ≤ . . . . Further, one can represent any norm ball B_t(x0) as a subset S_t of left and right vertices of the graph. A left vertex Fj is in S_t iff Φ(Fj) ≤ t.

The local search along this graph can be thought of as follows.
Any point x0 contained inside a polyhedral complex must reside in at least one polytope Pi, and our goal is to find the boundary facet with minimum potential. The idea is similar to Dijkstra's algorithm, where we maintain a set of 'frontier facets' in a priority queue, ordered by their potential Φ, and a set of right vertices/polytopes which have already been explored. At each iteration, we pop the frontier facet with minimal potential and examine its neighbors, which correspond to polytopes containing this facet. Since the left-degree of the graph is at most 2, at most one of these neighboring polytopes has not yet been explored. If such a polytope exists, we compute the potential of each of its neighbors/facets, insert each such facet into the priority queue of 'frontier facets', and also add this new polytope to our set of explored polytopes. At initialization, the set of seen polytopes is composed only of the polytope containing x0, and termination occurs as soon as a boundary facet is popped from the priority queue. Pseudocode for this procedure is outlined in Algorithm 1 and a proof of correctness is provided in Appendix C.

Figure 2: Example of bipartite graph defined over facets F and polytopes P of polyhedral complex P. Note that each facet is shared by at most two polytopes.

Algorithm 1: GeoCert
    Input: point x0, potential Φ
    // Setup priority queue, seen-polytope set
    Q ← [ ]; C ← {P(x0)}
    // Handle first polytope's facets
    for facet F ∈ N(P(x0)) do
        Q.push((Φ(F), F))
    end
    // Loop until a boundary facet is popped
    while Q ≠ ∅ do
        F ← Q.pop()
        if F is boundary then
            return F
        else
            for P ∈ N(F) \ C do
                C ← C ∪ {P}
                for F' ∈ N(P) do
                    Q.push((Φ(F'), F'))
                end
            end
        end
    end

Figure 3: Pseudocode for GeoCert (left) and a pictorial representation of the algorithm's behavior on a simple example (right).
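The search loop of Algorithm 1 can be sketched with a binary heap over a toy facet/polytope adjacency structure (illustrative only; in the real algorithm the facets, potentials, and boundary labels come from convex programs over the network's linear regions):

```python
import heapq

def geocert_search(potential, facets_of, polytopes_of, boundary, start_polytope):
    """Toy sketch of the GeoCert priority-queue search (Algorithm 1).

    potential:    dict facet -> float (e.g. projection distance from x0)
    facets_of:    dict polytope -> list of its facets
    polytopes_of: dict facet -> polytopes containing it (at most 2 by PC-ness)
    boundary:     set of boundary facets
    Returns the reachable boundary facet of minimal potential and its value.
    """
    seen = {start_polytope}
    heap = [(potential[f], f) for f in facets_of[start_polytope]]
    heapq.heapify(heap)
    while heap:
        dist, f = heapq.heappop(heap)
        if f in boundary:
            return f, dist  # first boundary pop is minimal, as in Appendix C
        for p in polytopes_of[f]:
            if p not in seen:  # at most one unexplored side of a facet
                seen.add(p)
                for g in facets_of[p]:
                    heapq.heappush(heap, (potential[g], g))
    return None, float("inf")

# Tiny complex: polytopes A and B share facet 'f_ab'; the rest are boundary.
facets_of = {"A": ["f_ab", "f_a1"], "B": ["f_ab", "f_b1"]}
polytopes_of = {"f_ab": ["A", "B"], "f_a1": ["A"], "f_b1": ["B"]}
potential = {"f_ab": 0.3, "f_a1": 1.0, "f_b1": 0.6}
boundary = {"f_a1", "f_b1"}
```

Starting from A, the shared facet f_ab (potential 0.3) is popped first and opens polytope B, after which the boundary facet f_b1 (0.6) is returned rather than the farther f_a1 (1.0).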
The facets colored belong to the priority queue, with red and black denoting adversarial facets and non-adversarial facets respectively. Once the minimal facet in the queue is adversarial, the algorithm stops.

This alternative phrasing of our problem aids us in two ways. First, we note that the potential of any left vertex Fj can be computed as needed. Indeed, letting t* be the minimum potential of all facets contained in the boundary set, this search procedure requires that the potential be computed only for the facets contained in S_{t*}, as opposed to the entire collection of facets. Second, Φ need not refer to the Euclidean projection distance, and alternative potential functions exist which further reduce the number of potentials that need to be computed while preserving correctness. These will be further discussed in Section 5 and Appendix C.

Iteratively Constructing Polyhedral Complices
Finally, we note an approach by which polyhedral complices may be formed that will become useful when we discuss PLNNs in the following section. We present the following three lemmas, which relate to iterative constructions of polyhedral complices. Informally, they state that given any polytope or pair of polytopes which are PC, a slice with a hyperplane or a global intersection with a polytope generates a set that is still PC.

Lemma 3.3. Given an arbitrary polytope P := {x | Ax ≤ b} and a hyperplane H := {x | c^T x = d} that intersects the interior of P, the two polytopes formed by the intersection of P with each of the closed halfspaces defined by H are PC.

Lemma 3.4. Let P, Q be two PC polytopes and let H_P, H_Q be two hyperplanes that define two closed halfspaces each: H_P^+, H_P^−, H_Q^+, H_Q^−.
If P ∩ Q ∩ H_P = P ∩ Q ∩ H_Q, then the subset of the four resulting polytopes {P ∩ H_P^+, P ∩ H_P^−, Q ∩ H_Q^+, Q ∩ H_Q^−} with nonempty interior forms a polyhedral complex.

And the following will be necessary when we handle the case where we wish to compute the pointwise robustness for the image classification domain, where valid images are typically defined as vectors contained in the hypercube [0, 1]^n.

Lemma 3.5. Let P = {P1, . . . , Pk} be a polyhedral complex and let D be any polytope. Then the set {Pi ∩ D | Pi ∈ P} also forms a polyhedral complex.

4 Piecewise Linear Neural Networks

We now demonstrate an application of the geometric results described above to certifying robustness of neural nets. We only discuss networks with fully connected layers and ReLU nonlinearities, but our results hold for networks with convolutional and skip layers as well as max and average pooling layers. Let f be an arbitrary L-layer feedforward neural net with fully connected layers and ReLU nonlinearities, where each layer f^(i) : R^{n_{i−1}} → R^{n_i} has the form

    f^(i)(x) = { W_i x + b_i,                 if i = 1
               { W_i σ(f^(i−1)(x)) + b_i,     if i > 1    (3)

where σ refers to the element-wise ReLU operator, and we denote the final-layer output f^(L)(x) as f(x). We typically use the capital F(x) to refer to the maximum index of f: F(x) := arg max_i f_i(x). We define the decision region of f at x0 as the set of points for which the classifier returns the same label as it does for x0: {x | F(x) = F(x0)}.

It is important to note that f^(i)(x) refers to the pre-ReLU activations of the ith layer of f. Let m be the number of neurons of f, that is, m = Σ_{i=1}^{L−1} n_i. We describe a neuron configuration as a ternary vector A ∈ {−1, 0, 1}^m, such that each coordinate of A corresponds to a particular neuron in f.
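Reading off such a configuration takes one forward pass, recording the sign of each pre-ReLU activation layer by layer. A toy sketch with hypothetical weights (the precise ternary convention is formalized next):

```python
import numpy as np

def neuron_configuration(weights, biases, x):
    """Ternary neuron configuration A in {-1, 0, +1}^m for input x.

    weights/biases parametrize the hidden layers of a fully connected ReLU
    net as in Eq. (3); an entry is +1 when the pre-ReLU activation is
    positive, -1 when negative, and 0 when exactly zero.
    """
    config, h = [], np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        pre = W @ h + b                  # pre-ReLU activations f^(i)(x)
        config.append(np.sign(pre).astype(int))
        h = np.maximum(pre, 0.0)         # ReLU output feeds the next layer
    return np.concatenate(config)

# Two hidden layers of width 2 each (made-up parameters for illustration).
weights = [np.array([[1.0, -1.0], [0.0, 1.0]]),
           np.array([[1.0, 1.0], [-1.0, 0.0]])]
biases = [np.array([0.0, -0.5]), np.array([0.0, 0.25])]
```

All inputs sharing one configuration see the same affine map, which is exactly why each configuration carves out a polytope P_A.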
In particular, for neuron j,

    A_j = { +1,  if neuron j is 'on'
          { −1,  if neuron j is 'off'
          {  0,  if neuron j is both 'on' and 'off'    (4)

where a neuron being 'on' corresponds to its pre-ReLU activation being at least zero, 'off' corresponds to the pre-ReLU activation being at most zero, and if a neuron is both on and off its pre-ReLU activation is identically zero. Further, each neuron configuration corresponds to a set

    P_A = {x | f(x) has neuron activations consistent with A}

Figure 4: Pictorial reference for the proof of Theorem 4.2. (Top Left) A single ReLU activation partitions the input space into two PC polytopes. (Top Right) As additional activations are added at the first layer, the collection is still PC by Lemma 3.4. (Bottom Left) As the next layer of activations is added, the partitioning is linear within each region created previously and PC at the previous boundaries, thus still PC. (Bottom Right) The partitioning due to all subsequent layers preserves PC-ness by induction.

The following have been proved before, but we include them to introduce notational familiarity:

Lemma 4.1. For a given neuron configuration A, the following are true about P_A:

(i) f^(i)(x) is linear in x for all x ∈ P_A.
(ii) P_A is a polytope.

This lets us connect the polyhedral complex results from the previous section towards computing the pointwise robustness of PLNNs. Letting the potential φ be the ℓp distance, we can apply Algorithm 1 towards this problem.

Theorem 4.2. The collection of P_A, over all A such that P_A has nonempty interior, forms a polyhedral complex.
Further, the decision region of F at x0 also forms a polyhedral complex.

In fact, except for a set of measure zero over the parameter space, the facets of each such linear region correspond to exactly one ReLU flipping its configuration:

Corollary 4.3. If the network parameters are in general position and A, B are neuron configurations such that dim(P_A) = dim(P_B) = n and their intersection is of dimension (n − 1), then A, B have Hamming distance 1 and their intersection corresponds to exactly one ReLU flipping sign.

5 Speedups

While our results in Section 3 hold for general polyhedral complices, we can boost the performance of GeoCert by leveraging additional structure of PLNNs. As the runtime of GeoCert hinges upon the total number of iterations and the time per iteration, we discuss techniques to improve each.

Improving Iteration Speed Via Upper Bounds
At each iteration, GeoCert pops the minimal element from the priority queue of 'frontier facets' and, using the graph-theoretic lens, considers the facets in its two-hop neighborhood. Geometrically this corresponds to popping the minimal-distance facet seen so far, considering the polytope on the opposite side of that facet, and computing the distance to each of its facets. In the worst case, the number of facets of each linear region is the number of ReLUs in the PLNN. While computing the projection requires a linear or quadratic program, as we will show, it is usually not necessary to compute a convex program for every nonlinearity at every iteration.

Figure 5: Piecewise linear regions of a 2D toy network. The dotted line represents the decision boundary.

If we can quickly guarantee that a potential facet is infeasible within the domain of interest, then we avoid computing the projection exactly. In the image classification domain, the domain of valid images is usually the unit hypercube.
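One useful primitive for such quick infeasibility checks is testing whether a facet's hyperplane can intersect an axis-aligned box at all, which needs only a single O(n) pass rather than a linear program. A minimal sketch (our own illustration):

```python
import numpy as np

def hyperplane_intersects_box(a, b, lo, hi):
    """Check whether the hyperplane {x : a.x = b} meets the box [lo, hi].

    Over an axis-aligned box, a.x ranges over an interval whose endpoints
    pick the per-coordinate min or max depending on sign(a_i); the
    hyperplane is feasible iff b falls inside that interval. This is the
    O(n) facet-pruning test described in the text.
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    min_val = np.sum(np.where(a >= 0, a * lo, a * hi))
    max_val = np.sum(np.where(a >= 0, a * hi, a * lo))
    return min_val <= b <= max_val

# Hyperplane x1 + x2 = 3 versus the unit box [0, 1]^2: a.x never exceeds 2.
a = np.array([1.0, 1.0])
```

Here x1 + x2 = 3 misses the unit box entirely, so any facet lying on that hyperplane can be discarded without solving a projection program, while x1 + x2 = 1.5 must be kept.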
If an upper bound on the pointwise robustness, U, is known, then it suffices to restrict our domain to D′ := B_U(x0) ∩ D. This aids us in two ways: (i) if the hyperplane containing a facet does not intersect D′, then the facet also does not intersect D′; (ii) a tighter restriction on the domain allows for tighter bounds on pre-ReLU activations. For point (i), we observe that computing the feasibility of the intersection of a hyperplane and a hyperbox is linear in the dimension, and hence many facets can very quickly be deemed infeasible. For point (ii), if we can guarantee ReLU stability, then by Corollary 4.3 we can deem the facets corresponding to each stable ReLU infeasible. ReLU stability additionally provides tighter upper bounds on the Lipschitz constants of the network.

Any valid adversarial example provides an upper bound on the pointwise robustness. Any point on any facet on the boundary of the decision region also provides an upper bound. In Appendix F, we describe a novel tweak that can be used to generate adversarial examples tailored to be close to the original point. Also, during the runtime of GeoCert, any time a boundary facet is added to the priority queue, we update the upper bound based on the projection magnitude to this facet.

Improving Number of Iterations Via Lipschitz Overestimation
When one uses distance as a potential function, if the true pointwise robustness is ρ, then GeoCert must examine every polytope that intersects B_ρ(x0). This is necessary when no extra information is known about the polyhedral complex of interest. However, one can incorporate the Lipschitz continuity of a PLNN into the potential function φ to reduce the number of linear regions examined.
The main idea is that since the network has some smoothness properties, any facet for which the classifier is very confident in its answer must be very far from the decision boundary.

Theorem 5.1. Letting F(x0) = i, g_j(x) = f_i(x) − f_j(x), and L_j an upper bound on the Lipschitz constant of g_j, using

    φ_lip(y) := ||x0 − y|| + min_{j≠i} g_j(y) / L_j

as a potential for GeoCert maintains its correctness in computing the pointwise robustness.

The intuition behind this choice of potential is that it biases the set of seen polytopes to not expand too much in directions for which the distance to the decision boundary is guaranteed to be large. This effectively reduces the number of polytopes examined, and hence the number of iterations of GeoCert, while still maintaining complete verification. A critical bonus of this approach is that it allows one to 'warm-start' GeoCert with a nontrivial lower bound that will only increase until becoming tight at termination. A more thorough discussion on upper-bounding the Lipschitz constant of each g_j can be found in [25].

6 Experiments

Exactly Computing the Pointwise Robustness: Our first experiment compares the average pointwise robustness bounds provided by two complete verification methods, GeoCert and MIP, as well as an incomplete verifier, Fast-Lip. The average ℓp distance returned by each method and the average required time (in seconds) to achieve this bound are provided in Table 1. Verification for ℓ2 and ℓ∞ robustness was conducted for 1000 random validation images for two networks trained on MNIST. Networks are divided into binary and non-binary examples. Binary networks were trained to distinguish a subset of 1's and 7's from the full MNIST dataset. All networks were trained with ℓ1 weight regularization with λ set to 2 × 10^−3.
All networks are composed of fully connected layers with ReLU activations. The layer sizes for the two networks are as follows: i) [784, 10, 50, 10, 2], termed 70NetBin, and ii) [784, 20, 20, 2], termed 40NetBin. Mixed integer programs and linear programs were solved using Gurobi [9]. The code for reproducing experiments has been made publicly available†.

From Table 1, it is clear that GeoCert and MIP return the exact robustness value while Fast-Lip provides a lower bound. While the runtimes for MIP are faster than those for GeoCert, they are within an order of magnitude. In these experiments, we record the timing when each method is left to run to completion; however, in the experiment to follow we demonstrate that GeoCert provides a non-trivial lower bound faster than other methods.

Table 1: (Left) Times (seconds) to compute exact pointwise robustness on binary MNIST networks for both the ℓ2 and ℓ∞ settings over 1000 random examples. Boldface corresponds to the exact pointwise robustness. (Right) Provable lower bounds for a binary MNIST network under a fixed 300s time limit. Note that GeoCert initializes at the bound provided by Fast-Lip and continually improves. Boldface here corresponds to the tightest lower bound found.
Note that our algorithm outperforms all previous methods for this task.

(Left)
ℓp  | Method   | 70NetBin Dist. | 70NetBin Time | 40NetBin Dist. | 40NetBin Time
ℓ∞  | Fast-Lip | 0.092          | 0.012         | 0.116          | 0.009
ℓ∞  | GeoCert  | 0.175          | 1.453         | 0.190          | 4.924
ℓ∞  | MIP      | 0.175          | 0.771         | 0.190          | 0.797
ℓ2  | Fast-Lip | 0.905          | 0.007         | 1.124          | 0.008
ℓ2  | GeoCert  | 1.414          | 2.816         | 1.533          | 6.958
ℓ2  | MIP      | 1.414          | 1.972         | 1.533          | 4.466

(Right)
Ex. | Fast-Lip | GeoCert | MIP
1   | 1.782    | 2.251   | 2.0
2   | 1.319    | 1.356   | 1.0
3   | 1.501    | 1.620   | 1.0
4   | 1.975    | 2.499   | 2.0
5   | 1.871    | 2.402   | 2.0

†https://github.com/revbucket/geometric-certificates

Figure 6: Original MNIST images (top) compared to their minimal-distance adversarial examples as found by GeoCert (middle) and the minimal-distortion adversarial attacks found by the Carlini-Wagner ℓ2 attack (bottom). The average ℓ2 distortion found by GeoCert is 31.6% less than that found by Carlini-Wagner.

Best Lower Bound Under a Time Limit: To demonstrate the ability of GeoCert to provide a lower bound greater than those generated by incomplete verifiers and other complete verifiers under a fixed time limit, we run the following experiment. On the binary MNIST dataset, we train a network with layer sizes [784, 20, 20, 20, 2] using Adam and a weight decay of 0.02 [12]. We allow a time limit of 5 minutes per example, which is not sufficient for either GeoCert or MIP to complete. As per the codebase associated with [20], for MIP we use a binary search procedure over ε = [0.5, 1.0, 2.0, 4.0, . . . ] to verify increasingly larger lower bounds. We also compare against the lower bounds generated by Fast-Lip [25], noting that using the Lipschitz potential described in Section 5 allows GeoCert to immediately initialize to the bound produced by Fast-Lip. We find that in all examples considered, after 5 minutes, GeoCert is able to generate larger lower bounds compared to MIP.
Table 1 (right) demonstrates these results for 5 randomly chosen examples.

7 Conclusion

This paper presents a novel approach towards both bounding and exactly computing the pointwise robustness of piecewise linear neural networks for all convex ℓp norms. Our technique differs fundamentally from existing complete verifiers in that it leverages local geometric information to continually tighten a provable lower bound. Our technique is built upon the notion of computing the centered Chebyshev ball inside a polyhedral complex. We demonstrate that polyhedral complices have efficient boundary decompositions and that each decision region of a piecewise linear neural network forms such a polyhedral complex. We leverage the Lipschitz continuity of PLNNs to immediately output a nontrivial lower bound to the pointwise robustness and improve this lower bound until it ultimately becomes tight.

We observe that mixed integer programming approaches are typically faster at computing the exact pointwise robustness than our method. However, our method produces intermediate valid lower bounds significantly faster. Hence, under a time constraint, our approach is able to produce distance lower bounds that are typically tighter than those of incomplete verifiers and available faster than those of MIP solvers. An important direction for future work is to optimize our implementation so that our method scales to larger networks. This is a critical challenge for all machine learning verification methods.

8 Acknowledgements

This research has been supported by NSF Grants 1618689, DMS 1723052, CCF 1763702, AF 1901292 and research gifts by Google, Western Digital and NVIDIA.

References

[1] Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. Measuring neural net robustness with constraints. May 2016.

[2] N. D. Botkin and V. L. Turova-Botkina.
An algorithm for finding the Chebyshev center of a convex polyhedron. Appl. Math. Optim., 29(2):211–222, March 1994.

[3] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, March 2004.

[4] Chih-Hong Cheng, Georg Nührenberg, and Harald Ruess. Maximum resilience of artificial neural networks. April 2017.

[5] Francesco Croce, Maksym Andriushchenko, and Matthias Hein. Provable robustness of relu networks via maximization of linear regions. arXiv preprint arXiv:1810.07481, 2018.

[6] Souradeep Dutta, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. Output range analysis for deep feedforward neural networks. In NASA Formal Methods Symposium, pages 121–138. Springer, 2018.

[7] Ruediger Ehlers. Formal verification of piece-wise linear feed-forward neural networks. May 2017.

[8] Matteo Fischetti and Jason Jo. Deep neural networks and mixed integer linear optimization. Constraints, 23:296–309, 2018.

[9] Gurobi Optimization, LLC. Gurobi optimizer reference manual, 2019.

[10] Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. May 2017.

[11] Guy Katz, Clark Barrett, David Dill, Kyle Julian, and Mykel Kochenderfer. Reluplex: An efficient SMT solver for verifying deep neural networks. February 2017.

[12] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[13] Alessio Lomuscio and Lalit Maganti. An approach to reachability analysis for feed-forward relu neural networks. arXiv preprint arXiv:1706.07351, 2017.

[14] Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable abstract interpretation for provably robust neural networks.
In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3578–3586, Stockholmsmässan, Stockholm, Sweden, 2018. PMLR.

[15] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. arXiv preprint arXiv:1801.09344, 2018.

[16] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. January 2018.

[17] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A convex relaxation barrier to tight robustness verification of neural networks. CoRR, abs/1902.08722, 2019.

[18] Sahil Singla and Soheil Feizi. Robustness certificates against adversarial examples for relu networks. CoRR, abs/1902.01235, 2019.

[19] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. December 2013.

[20] Vincent Tjeng, Kai Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. November 2017.

[21] Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. February 2018.

[22] Aladin Virmaux and Kevin Scaman. Lipschitz regularity of deep neural networks: analysis and efficient estimation. In Advances in Neural Information Processing Systems, pages 3835–3844, 2018.

[23] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Efficient formal safety analysis of neural networks. CoRR, abs/1809.08098, 2018.

[24] Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Formal security analysis of neural networks using symbolic intervals.
In 27th USENIX Security Symposium (USENIX Security 18), pages 1599–1614, 2018.

[25] Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, and Luca Daniel. Towards fast computation of certified robustness for relu networks. arXiv preprint arXiv:1804.09699, 2018.

[26] J. Zico Kolter and Eric Wong. Provable defenses against adversarial examples via the convex outer adversarial polytope. November 2017.

[27] Günter M. Ziegler. Lectures on Polytopes. Graduate Texts in Mathematics. Springer-Verlag New York, 1 edition, 1995.