{"title": "Beyond the Single Neuron Convex Barrier for Neural Network Certification", "book": "Advances in Neural Information Processing Systems", "page_first": 15098, "page_last": 15109, "abstract": "We propose a new parametric framework, called k-ReLU, for computing precise\nand scalable convex relaxations used to certify neural networks. The key idea is to\napproximate the output of multiple ReLUs in a layer jointly instead of separately.\nThis joint relaxation captures dependencies between the inputs to different ReLUs\nin a layer and thus overcomes the convex barrier imposed by the single neuron\ntriangle relaxation and its approximations. The framework is parametric in the\nnumber of k ReLUs it considers jointly and can be combined with existing verifiers\nin order to improve their precision. Our experimental results show that k-ReLU en-\nables significantly more precise certification than existing state-of-the-art verifiers\nwhile maintaining scalability.", "full_text": "Beyond the Single Neuron Convex Barrier\n\nfor Neural Network Certi\ufb01cation\n\nGagandeep Singh1, Rupanshu Ganvir2, Markus P\u00fcschel1, Martin Vechev1\n\nDepartment of Computer Science\n\nETH Zurich, Switzerland\n\n1{gsingh,pueschel,martin.vechev}@inf.ethz.ch\n\n2rganvir@student.ethz.ch\n\nAbstract\n\nWe propose a new parametric framework, called k-ReLU, for computing precise\nand scalable convex relaxations used to certify neural networks. The key idea is to\napproximate the output of multiple ReLUs in a layer jointly instead of separately.\nThis joint relaxation captures dependencies between the inputs to different ReLUs\nin a layer and thus overcomes the convex barrier imposed by the single neuron\ntriangle relaxation and its approximations. The framework is parametric in the\nnumber of k ReLUs it considers jointly and can be combined with existing veri\ufb01ers\nin order to improve their precision. 
Our experimental results show that k-ReLU enables significantly more precise certification than existing state-of-the-art verifiers while maintaining scalability.\n\n1 Introduction\n\nNeural networks are increasingly used in many safety-critical domains including autonomous driving, medical devices, and face recognition. Thus, it is important to ensure that they are provably robust and cannot be fooled by adversarial examples [1]: small changes to a given image that can fool the network into making a wrong classification. To address this challenge, a range of verification techniques has recently been introduced, ranging from exact but expensive methods based on SMT solvers [2\u20134], mixed integer linear programming [5], and Lipschitz optimization [6] to approximate and incomplete, but more scalable, methods based on abstract interpretation [7\u20139], duality [10, 11], semidefinite [12, 13] and linear relaxations [14\u201317]. Recently, combinations of approximate methods with solvers have been used to produce more precise results than approximate methods alone while also being more scalable than exact methods [18, 19].\nThe key challenge any verification method must address is computing the output of ReLU assignments where the input can take both positive and negative values. Exact computation must consider two paths per neuron, which quickly becomes infeasible due to a combinatorial explosion, while approximate methods trade precision for scalability via a convex relaxation of the ReLU outputs.\nThe most precise convex relaxation of the ReLU output is based on the convex hull of Polyhedra [20], which is practically infeasible as it requires an exponential number of convex hull computations, each with a worst-case exponential complexity in the number of neurons. The most common convex relaxation of y1:=ReLU(x1) used in practice [17, 5] is the triangle relaxation from [3]. 
We note that other works such as [8, 9, 14\u201316, 11] approximate this relaxation. The triangle relaxation creates constraints only between y1 and x1 and is optimal in the x1y1-plane. Because of this optimality, recent work [17] refers to the triangle relaxation as the convex barrier, meaning the best convex approximation one can obtain when processing each ReLU separately. However, the triangle relaxation is not optimal when one considers multiple neurons at a time, as it ignores all dependencies between x1 and any other neuron x2 in the same layer, and thus loses precision.\n\n33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.\n\n\f[Figure 1: (a) Input shape in the x1x2-plane with vertices (2, 0), (0, 2), (\u22122, 0), (0, \u22122); (b) 1-ReLU and (c) 2-ReLU relaxation shapes plotted in 3D over y1, y2, and z = x1 + x2.]\n\nFigure 1: The input space for the ReLU assignments y1 := ReLU(x1), y2 := ReLU(x2) is shown on the left in blue. Shapes of the relaxations projected to 3D are shown on the right in red.\n\nThis work: beyond the single neuron convex barrier In this work, we address this issue by proposing a novel parameterized framework, called k-ReLU, for generating convex approximations that consider multiple ReLUs jointly. Here, the parameter k determines how many ReLUs are considered jointly, with larger k resulting in more precise output. For example, unlike prior work, our framework can generate a convex relaxation for y1:=ReLU(x1) and y2:=ReLU(x2) that is optimal in the x1x2y1y2-space. We next illustrate this point with an example.\nPrecision gain with k-ReLU on an example Consider the input space of x1x2 as defined by the blue area in Fig. 1 and the ReLU operations y1:=ReLU(x1) and y2:=ReLU(x2). 
The input space is bounded by the relational constraints x2 \u2212 x1 \u2264 2, x1 \u2212 x2 \u2264 2, x1 + x2 \u2264 2 and \u2212x1 \u2212 x2 \u2264 2. The relaxations produced are in the four-dimensional space x1x2y1y2. For simplicity of presentation, we show the feasible shape of y1y2 as a function of z = x1 + x2.\nThe triangle relaxation from [3] is in fact a special case of our framework with k = 1, that is, 1-ReLU. 1-ReLU independently computes two relaxations: one in the x1y1-space and the other in the x2y2-space. The final relaxation is the Cartesian product of the feasible sets of the two individually computed relaxations and is oblivious to any correlations between x1 and x2. The relaxation adds the triangle constraints {y1 \u2265 0, y1 \u2265 x1, y1 \u2264 0.5 \u00b7 x1 + 1} between x1 and y1 as well as {y2 \u2265 0, y2 \u2265 x2, y2 \u2264 0.5 \u00b7 x2 + 1} between x2 and y2.\nIn contrast, 2-ReLU considers the two ReLUs jointly and captures the relational constraints between x1 and x2. 2-ReLU computes the following relaxation:\n\n{y1 \u2265 0, y1 \u2265 x1, y2 \u2265 0, y2 \u2265 x2, 2 \u00b7 y1 + 2 \u00b7 y2 \u2212 x1 \u2212 x2 \u2264 2}\n\nThe polytope produced is shown in Fig. 1c. Note that, unlike for the triangle relaxation, the shape of y1y2 is not independent of x1 + x2. At the same time, it is more precise than Fig. 1b for all values of z.\n\nMain contributions Our main contributions are:\n\n\u2022 A novel framework, called k-ReLU, that computes optimal convex relaxations for the output of k ReLU operations jointly. k-ReLU is generic and can be combined with existing verifiers for improved precision while maintaining scalability. 
Further, k-ReLU is also adaptive and can be tuned to balance precision and scalability by varying k.\n\n\u2022 A method for computing approximations of the optimal relaxations for larger k, which is more precise than simply using l < k.\n\n\u2022 An instantiation of k-ReLU with the recent DeepPoly convex relaxation [9], resulting in a verifier called kPoly.\n\n\u2022 An evaluation showing that kPoly is more precise and scalable than the state-of-the-art verifiers [9, 19] on the task of certifying neural networks of up to 100K neurons against challenging adversarial perturbations (e.g., L\u221e balls with \u03b5 = 0.3).\n\nWe note that the work of [12] computes semidefinite relaxations that consider multiple ReLUs jointly; however, these are not optimal and do not scale to the large networks used in our experiments.\n\n2\n\n\fTable 1: Volume of the output bounding box computed by kPoly on a 9 \u00d7 200 network.\n\nk: 1-ReLU | 2-ReLU | 3-ReLU\nVolume: 4.5272 \u00b7 10^14 | 5.1252 \u00b7 10^7 | 2.9679 \u00b7 10^5\n\nPrecision gain in practice Table 1 quantitatively compares the precision of kPoly instantiated with three relaxations: k = 1, k = 2, and k = 3. We measure the volume of the output bounding box computed after propagating an L\u221e-ball of radius \u03b5 = 0.015 through a deep, fully connected MNIST network with 9 layers containing 200 neurons each. 
We can observe that the volume of the output from 3-ReLU and 2-ReLU is respectively 9 and 7 orders of magnitude smaller than from 1-ReLU.\nWe note that the networks we consider, as for example the 9 \u00d7 200 network above, are especially challenging for state-of-the-art verifiers, as these methods either unnecessarily lose precision [8, 9, 14, 15, 19, 16, 17] or simply do not scale [5, 18, 12, 4, 13, 10].\nFinally, we remark that while we consider robustness certification against norm-based perturbations in our evaluation, our framework can also be used for precise and scalable verification of other network safety properties such as stability [21] or robustness against geometric and semantic perturbations [9, 22, 23].\n\n2 Overview of k-ReLU\n\nWe now show, on a simple example, that the k-ReLU concept can be used to improve the results of state-of-the-art verifiers. In particular, we illustrate how the output of our verifier kPoly instantiated with 1-ReLU is refined by instantiating it with 2-ReLU. This is possible as the 2-ReLU relaxation can capture extra relationships between neurons that 1-ReLU inherently cannot.\nConsider the simple feedforward neural network with ReLU activations shown in Fig. 2. The network has two inputs, each taking values independently in the range [\u22121, 1], one hidden layer, and one output layer, each containing two neurons. For simplicity, we split each layer into two parts: one for the affine transformation and the other for the ReLU. The weights of the affine transformation are shown on the arrows and the biases above or below the respective neuron. The goal is to verify that x9 \u2264 4 holds for the output x9 with respect to all inputs.\nWe first show that 1-ReLU instantiated with the state-of-the-art DeepPoly [9] relaxation fails to verify the property. 
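Before walking through the analysis, the example network of Fig. 2 can be evaluated concretely. The following sketch is our illustration (not part of kPoly), using the affine assignments and ReLUs given in the text; it confirms that the bound x9 \u2264 4 is in fact tight, so no sound relaxation can prove a smaller one.

```python
def relu(v):
    return max(0.0, v)

def forward(x1, x2):
    """Forward pass of the example network from Fig. 2."""
    x3 = x1 + x2          # first affine layer
    x4 = x1 - x2
    x5 = relu(x3)         # first ReLU layer
    x6 = relu(x4)
    x7 = x5 + 2 * x6      # second affine layer
    x8 = x6 + 1.5
    x9 = relu(x7)         # output ReLU layer
    x10 = relu(x8)
    return x9, x10

# The input (1, -1) gives x3 = 0, x4 = 2, hence x7 = 4 and x9 = 4,
# so the property x9 <= 4 holds with no slack at this input.
assert forward(1, -1) == (4.0, 3.5)
# A coarse grid over the input box never exceeds the bound either.
vals = [forward(a / 10.0, b / 10.0)[0]
        for a in range(-10, 11) for b in range(-10, 11)]
assert max(vals) == 4.0
```

This also explains why the 2-ReLU refinement below is optimal for this network: it proves exactly the bound attained by a concrete input.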
DeepPoly, described formally in Section 4, associates two pairs of lower and upper bounds with each neuron xi: (a_i^\u2264, a_i^\u2265) and (li, ui). Here, a_i^\u2264 and a_i^\u2265 have the form \u2211_j aj \u00b7 xj + c, where c, li, ui \u2208 R \u222a {\u2212\u221e, +\u221e} and aj \u2208 R. The bounds computed by the verifier using this instantiation are shown as annotations in Fig. 2.\nFirst Layer The verifier starts by computing the bounds for x1 and x2, which are simply taken from the input specification, resulting in:\n\nx1 \u2265 \u22121, x1 \u2264 1, l1 = \u22121, u1 = 1, and x2 \u2265 \u22121, x2 \u2264 1, l2 = \u22121, u2 = 1.\n\nSecond Layer Next, the affine assignments x3 := x1 + x2 and x4 := x1 \u2212 x2 are handled. DeepPoly handles affine transformations exactly and thus no precision is lost. The affine transformation results in the following bounds for x3 and x4:\n\nx3 \u2265 x1 + x2, x3 \u2264 x1 + x2, l3 = \u22122, u3 = 2,\nx4 \u2265 x1 \u2212 x2, x4 \u2264 x1 \u2212 x2, l4 = \u22122, u4 = 2.\n\nDeepPoly can precisely handle ReLU assignments when the input neuron takes only positive or only negative values; otherwise it loses precision. Since x3 and x4 can take both positive and negative values, an approximation based on the triangle relaxation is applied, which for x5 yields:\n\nx5 \u2265 0, x5 \u2264 1 + 0.5 \u00b7 x3. (1)\n\nNote that DeepPoly discards the other lower bound x5 \u2265 x3 from the triangle relaxation. The lower bound l5 is set to 0, and the relation x3 \u2264 x1 + x2 is substituted for x3 in (1) for computing the upper bound, which yields l5 = 0, u5 = 2. Analogously, for x6 we obtain:\n\nx6 \u2265 0, x6 \u2264 1 + 0.5 \u00b7 x4, l6 = 0, u6 = 2. (2)\n\n3\n\n\f[Figure 2: the example network. Inputs x1, x2 \u2208 [\u22121, 1] feed the affine layer x3 := x1 + x2, x4 := x1 \u2212 x2 (weights 1, 1, 1, \u22121, biases 0), followed by x5 := max(0, x3), x6 := max(0, x4), the affine layer x7 := x5 + 2 \u00b7 x6, x8 := x6 + 1.5, and the output ReLUs x9 := max(0, x7), x10 := max(0, x8). Each neuron is annotated with its DeepPoly bounds. 2-ReLU adds the green constraints x3 + x4 \u2264 2, x3 \u2212 x4 \u2264 2, x4 \u2212 x3 \u2264 2, \u2212x3 \u2212 x4 \u2264 2 and 2 \u00b7 x5 + 2 \u00b7 x6 \u2212 x3 \u2212 x4 \u2264 2, yielding x7 \u2264 4.]\n\nFigure 2: Verification of property x9 \u2264 4. Refining DeepPoly with 1-ReLU fails to prove the property whereas 2-ReLU adds extra constraints (in green) that help in verifying the property.\n\nThird Layer Next, the affine assignments x7 := x5 + 2x6 and x8 := x6 + 1.5 are handled. DeepPoly adds the constraints:\n\nx7 \u2265 x5 + 2 \u00b7 x6, x7 \u2264 x5 + 2 \u00b7 x6, x8 \u2265 x6 + 1.5, x8 \u2264 x6 + 1.5. (3)\n\nTo compute the upper and lower bounds for x7 and x8, DeepPoly substitutes the polyhedral constraints for x5 and x6 from (1) and (2) in (3). It then substitutes the constraints for x5 and x6 in terms of x3 and x4 and iterates until it reaches the input layer, where it substitutes the concrete bounds for x1 and x2. 
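The backsubstitution just described can be reproduced with a few lines of symbolic bookkeeping. The sketch below is our illustration (helper names are ours, not the DeepPoly implementation): each upper bound is a linear expression over earlier neurons, stored as a dict of coefficients with the constant under the key "", and is substituted until only the inputs x1, x2 remain.

```python
def add_scaled(acc, expr, factor):
    # acc += factor * expr; the constant term lives under key ""
    for v, c in expr.items():
        acc[v] = acc.get(v, 0.0) + factor * c
    return acc

def substitute(expr, defs):
    """Replace every defined variable in expr by its linear definition."""
    out = {"": expr.get("", 0.0)}
    for v, c in expr.items():
        if v == "":
            continue
        if v in defs:
            add_scaled(out, defs[v], c)
        else:
            out[v] = out.get(v, 0.0) + c
    return out

def interval_max(expr, box):
    """Upper bound of a linear expression over an interval box."""
    return expr.get("", 0.0) + sum(
        c * (box[v][1] if c > 0 else box[v][0])
        for v, c in expr.items() if v != "")

box = {"x1": (-1.0, 1.0), "x2": (-1.0, 1.0)}
upper = {
    "x3": {"x1": 1.0, "x2": 1.0},    # x3 <= x1 + x2
    "x4": {"x1": 1.0, "x2": -1.0},   # x4 <= x1 - x2
    "x5": {"x3": 0.5, "": 1.0},      # triangle bound x5 <= 0.5*x3 + 1
    "x6": {"x4": 0.5, "": 1.0},      # triangle bound x6 <= 0.5*x4 + 1
}

# u5: substitute x3 and evaluate over the input box.
u5 = interval_max(substitute(upper["x5"], upper), box)
# u7: x7 <= x5 + 2*x6; substitute twice to reach x1, x2.
e7 = substitute(substitute({"x5": 1.0, "x6": 2.0}, upper), upper)
u7 = interval_max(e7, box)
print(u5, u7)  # 2.0 5.0
```

The intermediate expression e7 works out to 3 + 1.5\u00b7x1 \u2212 0.5\u00b7x2, which is exactly why the box bounds give u7 = 5.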
Doing so yields l7 = 0, u7 = 5 and l8 = 1.5, u8 = 3.5.\nRefinement with 1-ReLU fails Because DeepPoly discards one of the lower bounds from the triangle relaxations for the ReLU assignments in the previous layer, it is possible to refine the lower and upper bounds for x7 and x8 by encoding the network up to the final affine transformation using the relatively tighter ReLU relaxations based on the triangle formulation and then computing bounds for x7 and x8 with respect to this formulation. However, this does not improve the bounds and still yields l7 = 0, u7 = 5, l8 = 1.5, u8 = 3.5.\nAs the lower bounds for both x7 and x8 are non-negative, the DeepPoly ReLU approximation simply propagates x7 and x8 to the output layer. The final output is thus:\n\nx9 \u2265 x7, x9 \u2264 x7, l9 = 0, u9 = 5,\nx10 \u2265 x8, x10 \u2264 x8, l10 = 1.5, u10 = 3.5.\n\nBecause the upper bound is u9 = 5, the verifier fails to prove the property x9 \u2264 4.\n\nRefinement with 2-ReLU verifies the property Now we consider refinement with our 2-ReLU relaxation, which considers the two ReLU assignments x5 := ReLU(x3) and x6 := ReLU(x4) jointly. Besides the box constraints for x3 and x4, it also considers the constraints x3 + x4 \u2264 2, x3 \u2212 x4 \u2264 2, \u2212x3 \u2212 x4 \u2264 2, x4 \u2212 x3 \u2264 2 for computing the output of the ReLU. The ReLU output contains the extra constraint 2 \u00b7 x5 + 2 \u00b7 x6 \u2212 x3 \u2212 x4 \u2264 2 that 1-ReLU cannot capture. We again encode the network up to the final affine transformation with the tighter ReLU relaxations obtained using 2-ReLU and refine the bounds for x7, x8. Now, we obtain a better upper bound u7 = 4. 
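A quick numerical check (ours, for illustration only) confirms both that the extra constraint 2\u00b7x5 + 2\u00b7x6 \u2212 x3 \u2212 x4 \u2264 2 is sound over the octagonal input region for (x3, x4) and that the resulting bound u7 = 4 is tight for the exact ReLU semantics:

```python
def in_region(x3, x4):
    # octagonal input region for (x3, x4) from Fig. 2
    return (x3 + x4 <= 2 and x3 - x4 <= 2
            and x4 - x3 <= 2 and -x3 - x4 <= 2)

best = 0.0
for a in range(-20, 21):
    for b in range(-20, 21):
        x3, x4 = a / 10.0, b / 10.0
        if not in_region(x3, x4):
            continue
        x5, x6 = max(0.0, x3), max(0.0, x4)
        # the joint 2-ReLU constraint must hold at every feasible point
        assert 2 * x5 + 2 * x6 - x3 - x4 <= 2 + 1e-9
        best = max(best, x5 + 2 * x6)  # exact value of x7

print(best)  # 4.0, attained at x3 = 0, x4 = 2
```

Algebraically, 2\u00b7ReLU(x3) + 2\u00b7ReLU(x4) \u2212 x3 \u2212 x4 = |x3| + |x4|, which is at most 2 everywhere in the region, so the constraint is indeed sound.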
The better bound for u7 is then propagated to u9 and is sufficient for proving the desired property.\nWe remark that while in this work we instantiate the k-ReLU concept with the DeepPoly relaxation, the idea can be applied to other relaxations [11, 7\u201310, 12, 14, 15, 17, 18].\n\n4\n\n\f3 k-ReLU relaxation framework\n\nIn this section, we formally describe our k-ReLU framework for generating optimal convex relaxations in the input-output space for k ReLU operations jointly. In the next section, we discuss the instantiation of our framework with existing verifiers, which enables more precise results.\nWe consider a ReLU-based feedforward, convolutional or residual neural network with h neurons from a set H (that is, h = |H|) and a bounded input region I \u2286 R^m, where m < h is the number of neural network inputs. In our exposition, we treat the affine transformations and the ReLUs as separate layers. We consider a convex approximation method M that processes network layers in sequence from the input to the output layer, passing the output of predecessor layers as input to the successor layers. Let S \u2286 R^h be a convex set computed via M approximating the set of values that the neurons up to layer l\u22121 can take with respect to I, and let B \u2287 S be the smallest bounding box around S. We use Conv(S1, S2) and S1 \u2229 S2 to denote the convex hull and the intersection of convex sets S1 and S2, respectively.\nLet X, Y \u2286 H be respectively the sets of input and output neurons in the l-th layer, consisting of n ReLU assignments of the form yi:=ReLU(xi) where xi \u2208 X and yi \u2208 Y. In the general case, each input neuron xi takes on both positive and negative values in S. 
We define the polyhedra induced by the two branches of each ReLU assignment yi:=ReLU(xi) as C_i^+ = {xi \u2265 0, yi = xi} \u2286 R^h and C_i^\u2212 = {xi \u2264 0, yi = 0} \u2286 R^h. For J \u2286 [n], let QJ = {\u22c2_{i\u2208J} C_i^{s(i)} | s : J \u2192 {\u2212, +}} be the set of polyhedra Q \u2286 R^h constructed by intersecting, for the neurons xi, yi indexed by J, one polyhedron Ci \u2208 {C_i^+, C_i^\u2212} per index. We next formulate the best convex relaxation of the output after n ReLU assignments.\n\n3.1 Best convex relaxation\n\nThe best convex relaxation after the n ReLU assignments is given by\n\nSbest = Conv_{Q\u2208Q[n]}(Q \u2229 S). (4)\n\nSbest considers all n assignments jointly. Computing it is practically infeasible as it involves computing 2^n convex hulls, each of which has worst-case exponential cost in the number of neurons h [24].\n\n3.2 1-ReLU\n\nWe now describe the prior convex relaxation [3] through triangles (here called 1-ReLU) that handles the n ReLU assignments separately. Here, the input to the i-th assignment yi:=ReLU(xi) is the polyhedron P1-ReLU,i \u2287 S, which for each xi \u2208 X contains only an interval constraint [li, ui] that bounds xi, that is, li \u2264 xi \u2264 ui. The interval bounds are simply obtained from the bounding box B of S. The output of this method after n assignments is\n\nS1-ReLU = S \u2229 \u22c2_{i=1}^{n} Conv(P1-ReLU,i \u2229 C_i^+, P1-ReLU,i \u2229 C_i^\u2212). (5)\n\nThe projection of Conv(P1-ReLU,i \u2229 C_i^+, P1-ReLU,i \u2229 C_i^\u2212) onto the xiyi-plane is the area-minimizing triangle and is the optimal convex relaxation in this plane. However, because the input polyhedron P1-ReLU is a hyperrectangle (when projected to X), it does not capture relational constraints between different xi's in X (meaning it typically has to substantially over-approximate the set S). 
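The 2^|J| branch polyhedra in QJ can be enumerated mechanically. The sketch below is illustrative only (constraints are kept as strings rather than actual polyhedra, and the helper name is ours); it lists one constraint set per sign assignment s for k = 2:

```python
from itertools import product

def branch_polyhedra(indices):
    """One constraint set per sign assignment s : J -> {-, +} (2^|J| in total)."""
    polys = []
    for signs in product("+-", repeat=len(indices)):
        constraints = []
        for i, s in zip(indices, signs):
            if s == "+":
                constraints.append(f"x{i} >= 0, y{i} = x{i}")   # C_i^+
            else:
                constraints.append(f"x{i} <= 0, y{i} = 0")      # C_i^-
        polys.append(constraints)
    return polys

Q = branch_polyhedra([1, 2])
assert len(Q) == 4  # 2^k polyhedra for k = 2
print(Q[0])  # ['x1 >= 0, y1 = x1', 'x2 >= 0, y2 = x2']
```

This enumeration is exactly what makes Sbest infeasible for a full layer: with n neurons there are 2^n such polyhedra to intersect with S and hull.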
Thus, as expected, the computed result S1-ReLU of the 1-ReLU method will incur significant imprecision when compared with the Sbest result.\n\n3.3 k-ReLU relaxations\n\nWe now describe our k-ReLU framework for computing a convex relaxation of the output of n ReLUs in one layer by considering groups of k ReLUs jointly with k > 1. For simplicity, we assume that n > k and that k divides n. Let J be a partition of the set of indices [n] such that each block Ji \u2208 J contains exactly k indices. Let Pk-ReLU,i \u2286 R^h be a polyhedron containing interval and relational constraints over the neurons from X indexed by Ji. In our framework, Pk-ReLU,i is derived via B and S and satisfies S \u2286 Pk-ReLU,i.\n\n5\n\n\f[Figure 3: pipeline from the convex set S (computed via M = SDP [12, 13], abstract interpretation [7\u20139], linear relaxations [14, 15, 17, 18], or duality [10, 11]) and the partition J of [n] to the polyhedra {Pk-ReLU,i}; each Pk-ReLU,i is intersected with every Q \u2208 QJi, the convex hull for Ji, denoted Ki, is formed, and the result S \u2229 \u22c2_{i=1}^{n/k} Ki is built as per (6). Example output constraints: 2x1 + x2 + x3 \u2212 y1 \u2264 0, y2 + x2 \u2212 x3 \u2264 \u22121, y3 \u2212 x1 + x3 \u2264 1.]\n\nFigure 3: Steps to instantiating the k-ReLU framework.\n\nOur k-ReLU framework produces the following convex relaxation of the output:\n\nSk-ReLU = S \u2229 \u22c2_{i=1}^{n/k} Conv_{Q\u2208QJi}(Pk-ReLU,i \u2229 Q). (6)\n\nThe result of (6) is the optimal convex relaxation for the output of the n ReLUs for the given choice of S, k, J, and Pk-ReLU,i.\nTheorem 3.1. For k > 1 and a partition J of indices, if there exists a Ji for which Pk-ReLU,i \u228a \u22c2_{u\u2208Ji} P1-ReLU,u holds, then Sk-ReLU \u228a S1-ReLU.\n\nThe proof of Theorem 3.1 is given in the appendix. Note that P1-ReLU only contains interval constraints whereas Pk-ReLU contains both the same interval constraints and extra relational constraints. 
Thus, any convex relaxation obtained using k-ReLU is typically strictly more precise than a 1-ReLU one.\nPrecise and scalable relaxations for large k For each Ji, computing the optimal convex relaxation Ki = Conv_{Q\u2208QJi}(Pk-ReLU,i \u2229 Q) from (6) requires computing the convex hull of 2^k convex sets, each of which has a worst-case exponential cost in terms of k. Thus, computing Ki via (6) can become computationally expensive for large values of k. We propose an efficient relaxation K\u2032i for each block Ji \u2208 J (where |Ji| = k as described earlier) based on computing relaxations for all subsets of Ji that are of size 2 \u2264 l < k. Let Ri = {{j1, . . . , jl} | j1, . . . , jl \u2208 Ji} be the set containing all subsets of Ji with l indices. For each R \u2208 Ri, let P\u2032l-ReLU,R \u2286 R^h be a polyhedron containing interval and relational constraints between the neurons from X indexed by R with S \u2286 P\u2032l-ReLU,R.\nThe relaxation K\u2032i is computed by applying l-ReLU (k choose l) times as:\n\nK\u2032i = \u22c2_{R\u2208Ri} Conv_{Q\u2208QR}(P\u2032l-ReLU,R \u2229 Q). (7)\n\nThe layerwise convex relaxation S\u2032k-ReLU = S \u2229 \u22c2_{i=1}^{n/k} K\u2032i via (7) is tighter than computing the relaxation Sl-ReLU via (6) with a partition J\u2032 where for each block J\u2032i \u2208 J\u2032 there exists an Rj corresponding to a block of J such that J\u2032i \u2208 Rj and P\u2032l-ReLU,J\u2032i \u2286 Pl-ReLU,J\u2032i, where Pl-ReLU,J\u2032i is the polyhedron in (6) for computing Sl-ReLU. In our instantiations, we ensure that this condition holds for gaining precision.\n\n4 Instantiating the k-ReLU framework\n\nOur k-ReLU framework from Section 3 can be instantiated to produce different relaxations depending on the parameters S, k, J, and Pk-ReLU,i. Fig. 3 shows the steps to instantiate our framework. 
The inputs to the framework are the convex set S and the partition J based on k. These inputs are first used to produce a set containing n/k polyhedra {Pk-ReLU,i}. Each polyhedron Pk-ReLU,i is then intersected with the polyhedra from the set QJi, producing 2^k polyhedra, which are then combined via the convex hull (each hull called Ki). The Ki's are then combined with S to produce the final relaxation that captures the values which the neurons can take after the ReLU assignments. This relaxation is tighter than that produced by applying M directly on the ReLU layer, enabling precision gains.\n\n6\n\n\f[Figure 4: panels (a) and (b) in the xiyi-plane, each with the interval endpoints li, l\u2032i, u\u2032i, ui marked on the xi-axis.]\n\nFigure 4: DeepPoly relaxations for yi:=ReLU(xi) using the original bounds li, ui (in blue) and the refined bounds l\u2032i, u\u2032i (in green) for xi. The refined relaxations have smaller area in the xiyi-plane.\n\n4.1 Computing key parameters\n\nWe next describe the choice of the key parameters S, k, J, Pk-ReLU,i in our framework.\nInput convex set Examples of convex approximation methods M for computing S include [11, 7\u201310, 12, 14, 15, 17, 18]. In this paper, we use the DeepPoly [9] relaxation for computing S, which is a state-of-the-art precise and scalable verifier for neural networks.\nk and partition J We use (6) to compute the output relaxation when k \u2208 {2, 3}. For larger k, we compute the output based on (7). To maximize the precision gain, we group into a block those indices i for which the triangle relaxation for yi:=ReLU(xi) has the larger area in the xiyi-plane.\nComputing Pk-ReLU,i We note that for a fixed block Ji, several polyhedra Pk-ReLU,i are possible that produce convex relaxations with varying degrees of precision. Ideally, one would like Pk-ReLU,i to be the projection of S onto the variables in the set X indexed by the block Ji. 
However, computing this projection exactly is expensive and therefore we compute a relaxation of it.\nWe use the method M to compute Pk-ReLU,i by computing the upper bounds of linear relational expressions of the form \u2211_{u=1}^{k} au \u00b7 xu with respect to S. In our experiments, we found that setting au \u2208 {\u22121, 0, 1} yields maximum precision (excluding the case where all au are zero). Thus, Pk-ReLU,i \u2287 S contains 3^k \u2212 1 constraints, which include the interval constraints for all xu.\n\n4.2 Verification with the k-ReLU framework\n\nLet \u03c8 \u2286 R^h be a convex set defining a safe region for the outputs with respect to the input region I, and let SO \u2286 R^h be the output convex relaxation obtained after processing the affine layers with the convex approximation method M and the ReLU layers with our k-ReLU framework. \u03c8 holds if SO \u2286 \u03c8.\n\n4.3 Instantiation with DeepPoly\n\nWe now show how to instantiate the k-ReLU framework with DeepPoly [9]. DeepPoly is a type of restricted Polyhedra abstraction which balances the scalability and precision of the analysis. It associates four constraints with each neuron hi \u2208 H: (a) a lower polyhedral constraint of the form a_i^\u2264 \u2264 hi, (b) an upper polyhedral constraint hi \u2264 a_i^\u2265, (c) a lower bound constraint li \u2264 hi, and (d) an upper bound constraint hi \u2264 ui. The polyhedral expressions a_i^\u2264, a_i^\u2265 are of the form \u2211_j aj \u00b7 hj + c, where aj, c \u2208 R, and capture relational information ensuring that DeepPoly is exact for affine transformations. The analysis proceeds layer by layer and thus the polyhedral constraints for a neuron in layer l contain only neurons up to layer l\u22121. S here is the set of points satisfying the DeepPoly constraints for all neurons. 
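Two of the ingredients from Section 4.1 are easy to sketch in code: the 3^k \u2212 1 octagon-style bound queries used to build Pk-ReLU,i, and the area-based grouping of ReLUs into blocks. The snippet below is our reading of those descriptions (helper names and the sort-then-chunk grouping are our assumptions, not the authors' code):

```python
from itertools import product

def octagon_queries(k):
    """All coefficient vectors a in {-1, 0, 1}^k except the all-zero one."""
    return [a for a in product((-1, 0, 1), repeat=k) if any(a)]

def group_by_triangle_area(bounds, k):
    """Group neuron indices into blocks of k, largest triangle area first.
    For l_i < 0 < u_i, the triangle in the x_i y_i-plane has area u_i * (-l_i) / 2."""
    order = sorted(range(len(bounds)),
                   key=lambda i: bounds[i][1] * -bounds[i][0] / 2,
                   reverse=True)
    return [order[j:j + k] for j in range(0, len(order), k)]

assert len(octagon_queries(2)) == 3 ** 2 - 1  # 8 upper-bound queries for k = 2
# neurons with bounds (l, u); neuron 1 has the largest triangle, neuron 0 the smallest
blocks = group_by_triangle_area([(-1, 1), (-2, 3), (-2, 2)], 2)
print(blocks)  # [[1, 2], [0]]
```

Each coefficient vector returned by octagon_queries corresponds to one upper-bound computation with respect to S via the method M.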
We next discuss how the k-ReLU framework can be used for improving the precision of the ReLU transformer of DeepPoly and also that of the overall verification procedure.\nImproving the precision of the DeepPoly ReLU relaxation DeepPoly loses precision for ReLU assignments yi:=ReLU(xi) where xi can take both positive and negative values. It computes the convex relaxations shown in Fig. 4 (a) and (b) and keeps the one with smaller area in the xiyi-plane. We note that both of these relaxations depend only on the interval bounds li, ui for xi. DeepPoly uses backsubstitution (see [9] for details) for obtaining precise bounds li, ui.\n\n7\n\n\fTable 2: Neural network architectures and parameters used in our experiments.\n\nDataset | Model | Type | #Neurons | #Layers | Defense | Refine ReLU | k\nMNIST | 6 \u00d7 100 | fully connected | 610 | 6 | None | yes | 3\nMNIST | 9 \u00d7 100 | fully connected | 910 | 9 | None | yes | 2\nMNIST | 6 \u00d7 200 | fully connected | 1 210 | 6 | None | yes | 2\nMNIST | 9 \u00d7 200 | fully connected | 1 810 | 9 | None | yes | 2\nMNIST | ConvSmall | convolutional | 3 604 | 3 | None | no | Adaptive\nMNIST | ConvBig | convolutional | 34 688 | 6 | DiffAI [29] | no | 5\nCIFAR10 | ConvSmall | convolutional | 4 852 | 3 | PGD [30] | no | Adaptive\nCIFAR10 | ConvBig | convolutional | 62 464 | 6 | PGD [30] | no | 5\nCIFAR10 | ResNet | residual | 107 496 | 13 | Wong [11] | no | Adaptive\n\nWe note that the DeepPoly ReLU relaxations are weaker than the 1-ReLU relaxation (as they contain one constraint less than the triangle). This precision loss accumulates as the analysis proceeds deeper into the network. We now show how the k-ReLU framework can recover precision for DeepPoly.\nWe compute refined bounds l\u2032i, u\u2032i for those neurons xi that are inputs to a ReLU and can take positive values. We maximize and minimize xi with respect to the convex relaxation produced by replacing the DeepPoly ReLU relaxation (Fig. 4 (a) and (b)) with our k-ReLU relaxations based on (6). 
Since the constraints from both DeepPoly and k-ReLU are linear, we use an LP solver for the maximization and minimization. l\u2032i and u\u2032i facilitate the two DeepPoly ReLU relaxations shown in green in Fig. 4 (a) and (b). These relaxations are tighter than the original ones and improve the precision of DeepPoly.\nk-ReLU for improving robustness certification When DeepPoly alone cannot prove the target property, we instead check if \u03c8 holds with the tighter ReLU relaxations from k-ReLU via the LP solver.\n\n5 Evaluation\n\nWe instantiated our k-ReLU framework with DeepPoly in the form of a verifier called kPoly. kPoly is written in Python and uses cdd [25, 26] for computing convex hulls, and Gurobi [27] for refining the DeepPoly ReLU relaxations and for proving that \u03c8 holds with the k-ReLU relaxations. We made kPoly publicly available as part of the ERAN [28] framework for neural network verification. We evaluated kPoly on the task of robustness certification of challenging deep neural networks. We compare kPoly against two state-of-the-art verifiers: DeepPoly [9] and RefineZono [19]. DeepPoly has the same precision as [15, 16], whereas RefineZono refines the results of DeepZ [8] and is more precise than [8, 11, 14]. Both DeepPoly and RefineZono are more scalable than [5, 18, 12, 4, 13, 10]; however, we show that kPoly is more precise than DeepPoly and RefineZono while also scaling to large networks.\nWe next describe the neural networks, benchmarks and parameters used in our experiments.\nNeural networks We used 9 MNIST [31] and CIFAR10 [32] fully connected (FNNs), convolutional (CNNs), and residual networks with ReLU activations, shown in Table 2. The first 8 networks in Table 2 are available at https://github.com/eth-sri/eran; the residual network is taken from https://github.com/locuslab/convex_adversarial. 
Five of the networks do not use adversarial training while the rest use different variants of it. The MNIST ConvBig network is trained with DiffAI [29], the two CIFAR10 convolutional networks are trained with PGD [30], and the residual network is trained via [11]. The largest network in our experiments contains > 100K neurons and has 13 layers.\nRobustness property We consider the L\u221e-norm [33] based adversarial region around a correctly classified image from the test set, parameterized by the radius \u03b5 \u2208 R. Our goal is to certify that the network classifies all images in the adversarial region correctly.\nMachines The runtimes of all experiments for the MNIST FNNs were measured on a 3.3 GHz 10-core Intel i9-7900X Skylake CPU with a main memory of 64 GB, whereas the experiments for the rest were run on a 2.6 GHz 14-core Intel Xeon CPU E5-2690 with 512 GB of main memory.\n\n8\n\n\fTable 3: Number of verified adversarial regions and runtime of kPoly vs. DeepPoly and RefineZono.\n\nDataset | Model | #correct | \u03b5 | DeepPoly [9] verified(#) | time(s) | RefineZono [19] verified(#) | time(s) | kPoly verified(#) | time(s)\nMNIST | 6 \u00d7 100 | 960 | 0.026 | 160 | 0.3 | 312 | 310 | 441 | 307\nMNIST | 9 \u00d7 100 | 947 | 0.026 | 182 | 0.4 | 304 | 411 | 369 | 171\nMNIST | 6 \u00d7 200 | 972 | 0.015 | 292 | 0.5 | 341 | 570 | 574 | 187\nMNIST | 9 \u00d7 200 | 950 | 0.015 | 259 | 0.9 | 316 | 860 | 506 | 464\nMNIST | ConvSmall | 980 | 0.12 | 158 | 3 | 179 | 707 | 347 | 477\nMNIST | ConvBig | 929 | 0.3 | 711 | 21 | 648 | 285 | 736 | 40\nCIFAR10 | ConvSmall | 630 | 2/255 | 359 | 4 | 347 | 716 | 399 | 86\nCIFAR10 | ConvBig | 631 | 2/255 | 421 | 43 | 305 | 592 | 459 | 346\nCIFAR10 | ResNet | 290 | 8/255 | 243 | 12 | 243 | 27 | 245 | 91\n\nBenchmarks For each MNIST and CIFAR10 network, we selected the first 1000 images from the respective test set and filtered out incorrectly classified images. The number of images correctly classified by each network is shown in Table 3. 
We chose challenging ε values for defining the adversarial region for each network. We note that our benchmarks (e.g., the 9 × 200 network with ε = 0.015) are quite challenging for state-of-the-art verifiers to handle (as we will see below).
k-ReLU parameters for the experiments We refine both the DeepPoly ReLU relaxation and the verification results for the MNIST FNNs, whereas only the verification results are refined for the rest. All neurons that can take positive values after the affine transformation are selected for refinement. As an optimization, we use the MILP ReLU encoding from [5] when refining the ReLU relaxation for the second ReLU layer. The last column of Table 2 shows the value of k for all networks. For the MNIST and CIFAR10 ConvBig networks, we encode the first 3 ReLU layers with 1-ReLU while the remaining layers are encoded with 5-ReLU. We use l = 3 in (7) for encoding 5-ReLU. For the remaining 3 networks, we encode the first ReLU layer with 1-ReLU while the remaining layers are encoded adaptively: here, we choose a value of k for which the total number of calls to 3-ReLU is ≤ 500. We next discuss our experimental results shown in Table 3.
kPoly vs. DeepPoly and RefineZono Table 3 compares the precision (number of adversarial regions verified) and the average runtime per image in seconds for kPoly, DeepPoly, and RefineZono. We refine the verification results with RefineZono and kPoly only when DeepZ and DeepPoly, respectively, fail to verify. kPoly is more precise than both DeepPoly and RefineZono on all networks. RefineZono is more precise than DeepPoly on the networks trained without adversarial training. On the 9 × 200 and MNIST ConvSmall networks, kPoly verifies 506 and 347 regions respectively, whereas RefineZono verifies 316 and 179 regions respectively.
The precision gain is smaller on networks with adversarial training: kPoly verifies 25, 40, 38, and 2 regions more than DeepPoly on the last 4 networks in Table 3. kPoly is faster than RefineZono on all networks and has an average runtime of < 8 minutes. The largest runtimes are on the MNIST 9 × 200 and ConvSmall networks; in contrast, on the large CIFAR10 ResNet, kPoly has an average runtime of only 91 seconds.
1-ReLU vs k-ReLU We consider the first 100 regions for the MNIST ConvSmall network and compare the number of regions verified by kPoly when run with k-ReLU and with 1-ReLU. We note that kPoly run with 1-ReLU is equivalent to [17]. kPoly with 1-ReLU verifies 20 regions, whereas with k-ReLU it verifies 35. kPoly with 1-ReLU has an average runtime of 9 seconds.
Effect of the heuristic for J We ran kPoly based on k-ReLU with a random partitioning Jr using the same setup as for 1-ReLU. We observed that kPoly produced worse bounds and verified 34 regions.

6 Conclusion

We presented k-ReLU, a novel parametric framework which produces more precise results than the single-neuron triangle convex relaxation. The key idea of k-ReLU is to consider multiple ReLUs jointly. We showed that k-ReLU leads to significantly improved precision, enabling us to prove properties beyond the reach of prior work, while preserving scalability.

References
[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[2] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, and M. J. Kochenderfer, “Reluplex: An efficient SMT solver for verifying deep neural networks,” in Computer Aided Verification (CAV), 2017.
[3] R.
Ehlers, “Formal verification of piece-wise linear feed-forward neural networks,” in Automated Technology for Verification and Analysis (ATVA), 2017.
[4] R. Bunel, I. Turkaslan, P. H. Torr, P. Kohli, and M. P. Kumar, “A unified view of piecewise linear neural network verification,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 4795–4804.
[5] V. Tjeng, K. Y. Xiao, and R. Tedrake, “Evaluating robustness of neural networks with mixed integer programming,” in International Conference on Learning Representations (ICLR), 2019.
[6] W. Ruan, X. Huang, and M. Kwiatkowska, “Reachability analysis of deep neural networks with provable guarantees,” in Proc. International Joint Conference on Artificial Intelligence (IJCAI), 2018.
[7] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, and M. Vechev, “AI2: Safety and robustness certification of neural networks with abstract interpretation,” in Proc. IEEE Symposium on Security and Privacy (SP), 2018, pp. 948–963.
[8] G. Singh, T. Gehr, M. Mirman, M. Püschel, and M. Vechev, “Fast and effective robustness certification,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 10825–10836.
[9] G. Singh, T. Gehr, M. Püschel, and M. Vechev, “An abstract domain for certifying neural networks,” Proc. ACM Program. Lang., vol. 3, no. POPL, pp. 41:1–41:30, 2019.
[10] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli, “A dual approach to scalable verification of deep networks,” in Proc. Uncertainty in Artificial Intelligence (UAI), 2018, pp. 162–171.
[11] E. Wong and J. Z. Kolter, “Provable defenses against adversarial examples via the convex outer adversarial polytope,” arXiv preprint arXiv:1711.00851, 2017.
[12] A. Raghunathan, J.
Steinhardt, and P. S. Liang, “Semidefinite relaxations for certifying robustness to adversarial examples,” in Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 10877–10887.
[13] K. D. Dvijotham, R. Stanforth, S. Gowal, C. Qin, S. De, and P. Kohli, “Efficient neural network verification with exactness characterization,” in Proc. Uncertainty in Artificial Intelligence (UAI), 2019, p. 164.
[14] L. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, L. Daniel, D. Boning, and I. Dhillon, “Towards fast computation of certified robustness for ReLU networks,” in Proc. International Conference on Machine Learning (ICML), vol. 80, 2018, pp. 5276–5285.
[15] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, “Efficient neural network robustness certification with general activation functions,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2018.
[16] A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel, “CNN-Cert: An efficient framework for certifying robustness of convolutional neural networks,” in AAAI Conference on Artificial Intelligence (AAAI), 2019.
[17] H. Salman, G. Yang, H. Zhang, C. Hsieh, and P. Zhang, “A convex relaxation barrier to tight robustness verification of neural networks,” CoRR, vol. abs/1902.08722, 2019.
[18] S. Wang, K. Pei, J. Whitehouse, J. Yang, and S. Jana, “Efficient formal safety analysis of neural networks,” in Proc. Advances in Neural Information Processing Systems (NeurIPS), 2018, pp. 6369–6379.
[19] G. Singh, T. Gehr, M. Püschel, and M. Vechev, “Boosting robustness certification of neural networks,” in International Conference on Learning Representations (ICLR), 2019.
[20] P. Cousot and N.
Halbwachs, “Automatic discovery of linear restraints among variables of a program,” in Proc. Principles of Programming Languages (POPL), 1978, pp. 84–96.
[21] Y. Yang and M. Rinard, “Correctness verification of neural networks,” 2019.
[22] M. Balunovic, M. Baader, G. Singh, T. Gehr, and M. Vechev, “Certifying geometric robustness of neural networks,” in Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 15287–15297.
[23] J. Mohapatra, T.-W. Weng, P.-Y. Chen, S. Liu, and L. Daniel, “Towards verifying robustness of neural networks against semantic perturbations,” 2019.
[24] G. Singh, M. Püschel, and M. Vechev, “Fast polyhedra abstract domain,” in ACM SIGPLAN Notices. ACM, 2017.
[25] “Extended convex hull,” Computational Geometry, vol. 20, no. 1, pp. 13–23, 2001.
[26] “pycddlib,” 2018. [Online]. Available: https://pypi.org/project/pycddlib/
[27] Gurobi Optimization, LLC, “Gurobi optimizer reference manual,” 2018. [Online]. Available: http://www.gurobi.com
[28] “ERAN: ETH Robustness Analyzer for Neural Networks,” 2018. [Online]. Available: https://github.com/eth-sri/eran
[29] M. Mirman, T. Gehr, and M. Vechev, “Differentiable abstract interpretation for provably robust neural networks,” in Proc. International Conference on Machine Learning (ICML), 2018, pp. 3575–3583.
[30] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in Proc. International Conference on Learning Representations (ICLR), 2018.
[31] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” in Proc. of the IEEE, 1998, pp. 2278–2324.
[32] A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech.
Rep., 2009.
[33] N. Carlini and D. A. Wagner, “Towards evaluating the robustness of neural networks,” in Proc. IEEE Symposium on Security and Privacy (SP), 2017, pp. 39–57.

A Appendix

A.1 Proof of Theorem 3.1

Proof. Since P_{k-ReLU,i} ⊊ ⋂_{u∈J_i} P_{1-ReLU,u} for J_i, by monotonicity of intersection and convex hull,

    Conv_{Q∈Q_{J_i}} (P_{k-ReLU,i} ∩ Q) ⊊ Conv_{Q∈Q_{J_i}} ((⋂_{u∈J_i} P_{1-ReLU,u}) ∩ Q).    (8)

For any Q ∈ Q_{J_i}, we have that either Q ⊆ C_u^+ or Q ⊆ C_u^- for u ∈ J_i. Thus, we can replace all Q on the right-hand side of (8) with either C_u^+ or C_u^-, such that for all u ∈ J_i both C_u^+ and C_u^- are used in at least one substitution, and obtain by monotonicity,

    Conv_{Q∈Q_{J_i}} ((⋂_{u∈J_i} P_{1-ReLU,u}) ∩ Q)
        ⊆ Conv_{u∈J_i} ((⋂_{v∈J_i} P_{1-ReLU,v}) ∩ C_u^+, (⋂_{v∈J_i} P_{1-ReLU,v}) ∩ C_u^-)
        ⊆ Conv_{u∈J_i} (P_{1-ReLU,u} ∩ C_u^+, P_{1-ReLU,u} ∩ C_u^-),

where the last step uses ⋂_{v∈J_i} P_{1-ReLU,v} ⊆ P_{1-ReLU,u}.

For the remaining i, Conv_{Q∈Q_i} (P_{k-ReLU,i} ∩ Q) ⊆ Conv_{u∈J_i} (P_{1-ReLU,u} ∩ C_u^+, P_{1-ReLU,u} ∩ C_u^-) holds similarly. Since the ⊊ relation holds for at least one i and ⊆ holds for the others, S_{k-ReLU} ⊊ S_{1-ReLU} holds.
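As a small numeric companion to the proof, the sketch below exhibits a point that satisfies the intersection of single-neuron triangle relaxations but lies outside the joint relaxation. The input region and objective are our own toy example, not the paper's running example: since the objective f is linear, its maximum over the convex hull of the exact ReLU graph equals its maximum over the graph itself (about 1 here), yet the triangle intersection admits f = 1.5.

```python
import itertools

def in_triangle(x, y, l=-1.0, u=1.0):
    """Single-neuron triangle relaxation of y = ReLU(x) for x in [l, u]."""
    return y >= 0 and y >= x and y <= u * (x - l) / (u - l)

def in_region(x1, x2):
    """Toy input region P: the box [-1, 1]^2 intersected with |x1 + x2| <= 1."""
    return -1 <= x1 <= 1 and -1 <= x2 <= 1 and abs(x1 + x2) <= 1

def f(x1, x2, y1, y2):
    """Linear objective whose maximum we want to bound."""
    return y1 + y2 - x1 - x2

# Maximum of f over a dense sample of the exact set {(x, ReLU(x)) : x in P};
# by linearity this also bounds the maximum over its convex hull, i.e. over
# the joint (2-ReLU) relaxation.
grid = [i / 50 - 1 for i in range(101)]
exact_max = max(f(x1, x2, max(x1, 0.0), max(x2, 0.0))
                for x1, x2 in itertools.product(grid, grid)
                if in_region(x1, x2))

# A point in P that satisfies both triangle relaxations but whose f-value
# exceeds exact_max -- hence it lies outside the joint relaxation:
x1 = x2 = -0.5
y1 = y2 = 0.25
assert in_region(x1, x2) and in_triangle(x1, y1) and in_triangle(x2, y2)
print(exact_max, f(x1, x2, y1, y2))   # roughly 1.0 vs exactly 1.5
```

This mirrors the strict containment S_{k-ReLU} ⊊ S_{1-ReLU} of Theorem 3.1: the triangle intersection overestimates the bound on f by 50% on this example, and the joint relaxation removes exactly such points.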