{"title": "Lookahead Bayesian Optimization with Inequality Constraints", "book": "Advances in Neural Information Processing Systems", "page_first": 1890, "page_last": 1900, "abstract": "We consider the task of optimizing an objective function subject to inequality constraints when both the objective and the constraints are expensive to evaluate. Bayesian optimization (BO) is a popular way to tackle optimization problems with expensive objective function evaluations, but has mostly been applied to unconstrained problems. Several BO approaches have been proposed to address expensive constraints but are limited to greedy strategies maximizing immediate reward. To address this limitation, we propose a lookahead approach that selects the next evaluation in order to maximize the long-term feasible reduction of the objective function. We present numerical experiments demonstrating the performance improvements of such a lookahead approach compared to several greedy BO algorithms, including constrained expected improvement (EIC) and predictive entropy search with constraint (PESC).", "full_text": "Lookahead Bayesian Optimization\n\nwith Inequality Constraints\n\nMassachusetts Institute of Technology\n\nMassachusetts Institute of Technology\n\nRemi R. Lam\n\nCambridge, MA\nrlam@mit.edu\n\nKaren E. Willcox\n\nCambridge, MA\n\nkwillcox@mit.edu\n\nAbstract\n\nWe consider the task of optimizing an objective function subject to inequality\nconstraints when both the objective and the constraints are expensive to evaluate.\nBayesian optimization (BO) is a popular way to tackle optimization problems\nwith expensive objective function evaluations, but has mostly been applied to\nunconstrained problems. Several BO approaches have been proposed to address\nexpensive constraints but are limited to greedy strategies maximizing immediate\nreward. 
To address this limitation, we propose a lookahead approach that selects the next evaluation in order to maximize the long-term feasible reduction of the objective function. We present numerical experiments demonstrating the performance improvements of such a lookahead approach compared to several greedy BO algorithms, including constrained expected improvement (EIC) and predictive entropy search with constraint (PESC).\n\n1 Introduction\n\nConstrained optimization problems are often challenging to solve, due to complex interactions between the goals of minimizing (or maximizing) the objective function while satisfying the constraints. In particular, non-linear constraints can result in complicated feasible spaces, sometimes partitioned into disconnected regions. Such feasible spaces can be difficult to explore for a local optimizer, potentially preventing the algorithm from converging to a global solution. Global optimizers, on the other hand, are designed to tackle disconnected feasible spaces and the optimization of multi-modal objective functions. Such algorithms typically require a large number of evaluations to converge. This can be prohibitive when the evaluation of the objective function or the constraints is expensive, or when there is a finite budget of evaluations allocated for the optimization, as is often the case with expensive models. This evaluation budget typically results from resource scarcity such as the restricted availability of a high-performance computer, finite financial resources to build prototypes, or even time when working toward a paper submission deadline.\n\nBayesian optimization (BO) [19] is a global optimization technique designed to address problems with expensive function evaluations. Its constrained extension, constrained Bayesian optimization (CBO), iteratively builds a statistical model for the objective function and the constraints. 
Based on this model, which leverages all past evaluations, a utility function quantifies the merit of evaluating any design under consideration. At each iteration, a CBO algorithm evaluates the expensive objective function and constraints at the design that maximizes this utility function.\n\nIn most existing methods, the utility function only quantifies the reward obtained over the immediate next step, and ignores the gains that could be collected at future steps. This results in greedy CBO algorithms. However, quantifying long-term rewards may be beneficial. For instance, in the presence of constraints, it could be valuable to learn the boundaries of the feasible space. In order to do so, it is likely that an infeasible design would need to be evaluated, bringing no immediate improvement, but leading to long-term benefits. Such a strategy requires planning over several steps. Planning is also required to balance the so-called exploration-exploitation trade-off. Intuitively, in order to improve the statistical model, the beginning of the optimization should mainly be dedicated to exploring the design space, while the end of the optimization should focus on exploiting that statistical model to find the best design. To balance this trade-off in a principled way, the optimizer needs to plan ahead and be aware of the remaining evaluation budget.\n\n31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.\n\nTo address the shortcomings of greedy algorithms, we propose a new lookahead formulation for CBO with a finite budget. This approach is aware of the remaining budget and can balance the exploration-exploitation trade-off in a principled way. In this formulation, the best optimization policy sequentially evaluates the design yielding the maximum cumulative reward over multiple steps. This optimal policy is the solution of an intractable dynamic programming (DP) problem. 
We circumvent this issue by employing an approximate dynamic programming (ADP) algorithm, rollout, building on the unconstrained BO algorithm in [17]. Numerical examples illustrate the benefits of the proposed lookahead algorithm over several greedy ones, especially when the objective function is multi-modal and the feasible space has a complex topology.\n\nThe next section gives an overview of CBO and discusses some of the related work (Sec. 2). Then, we formulate the lookahead approach to CBO as a dynamic programming problem and demonstrate how to approximately solve it by adapting the rollout algorithm (Sec. 3). Numerical results are provided in Sec. 4. Finally, we present our conclusions in Sec. 5.\n\n2 Constrained Bayesian Optimization\n\nWe consider the following optimization problem:\n\n(OPc) x* = argmin_{x ∈ X} f(x) s.t. g_i(x) ≤ 0, ∀i ∈ {1, . . . , I}, (1)\n\nwhere x is a d-dimensional vector of design variables. The design space X is a bounded subset of R^d, f : X → R is an objective function, I is the number of inequality constraints and g_i : X → R is the ith constraint function. The functions f and g_i are considered expensive to evaluate. We are interested in finding the minimizer x* of the objective function f subject to the non-linear constraints g_i ≤ 0 with a finite budget of N evaluations. We refer to this problem as the original constrained problem (OPc).\n\nConstrained Bayesian optimization (CBO) addresses the original constrained problem (OPc) by modeling the objective function f and the constraints g_i as realizations of stochastic processes. Typically, each expensive-to-evaluate function is modeled with an independent Gaussian process (GP). At every iteration n, new evaluations of f and g_i become available and augment a training set S_n = {(x_j, f(x_j), g_1(x_j), · · · , g_I(x_j))}_{j=1}^n. Using Bayes' rule, the statistical model is updated and the posterior quantities of the GPs, conditioned on S_n, reflect the current representation of the unknown expensive functions. In particular, for any design x, the posterior mean μ_n(x; φ) and the posterior variance σ²_n(x; φ) of the GP associated with the expensive function φ ∈ {f, g_1, · · · , g_I} can be computed cheaply using a closed-form expression (see [24] for an overview of GPs). CBO leverages this statistical model to quantify, in a cheap-to-evaluate utility function U_n, the usefulness of any design under consideration. The next design to evaluate is then selected by solving the following auxiliary problem (AP):\n\n(AP) x_{n+1} = argmax_{x ∈ X} U_n(x; S_n). (2)\n\nThe vanilla CBO algorithm is summarized in Algorithm 1.\n\nMany utility functions have been proposed in the literature. To decide which design to evaluate next, [27] proposed the use of the constrained expected improvement EIc, which, in the case of independent GPs, can be computed in closed form as the product of the expected improvement (obtained by considering the GP associated with the objective function) and the probability of feasibility associated with each constraint. This approach was later applied to machine learning applications [6] and extended to the multi-objective case [5]. Note that this method transforms an original constrained optimization problem into an unconstrained auxiliary problem by modifying the utility function. Other attempts to cast the constrained problem into an unconstrained one include [3]. 
Algorithm 1 Constrained Bayesian Optimization\nInput: Initial training set S_1, budget N\nfor n = 1 to N do\n  Construct GPs using S_n\n  Update hyper-parameters\n  Solve AP for x_{n+1} = argmax_{x ∈ X} U_n(x; S_n)\n  Evaluate f(x_{n+1}), g_1(x_{n+1}), · · · , g_I(x_{n+1})\n  S_{n+1} = S_n ∪ {(x_{n+1}, f(x_{n+1}), g_1(x_{n+1}), · · · , g_I(x_{n+1}))}\nend for\n\nThat work uses a penalty method to transform the original constrained problem into an unconstrained problem, to which they apply a radial basis function (RBF) method for global optimization (constrained RBF methods exist as well [25]). Other techniques from local constrained optimization have been leveraged in [10], where the utility function is constructed based on an augmented Lagrangian formulation. This technique was recently extended in [22], where a slack-variables formulation allows the handling of equality and mixed constraints. Another approach is proposed by [1]: at each iteration, a finite set of candidate designs is first generated from a Latin hypercube; then, candidate designs with expected constraint violation higher than a user-defined threshold are rejected; finally, among the remaining candidates, the ones achieving the best expected improvement are evaluated (several designs can be selected simultaneously at each iteration in this formulation). Another method [26] solves a constrained auxiliary optimization problem: the next design is selected to maximize the expected improvement subject to approximated constraints (the posterior mean of the GP associated with a constraint is used in lieu of the constraint itself). Note that the two previous methods solve a constrained auxiliary problem.\n\nAnother method to address constrained BO is proposed by [11], who develop an integrated conditional expected improvement criterion. 
Given a candidate design, this criterion quantifies the expected improvement point-wise (conditioned on the fact that the candidate will be evaluated). This point-wise improvement is then integrated over the entire design space. In the unconstrained case, in the integration phase, equal weight is given to designs throughout the design space. The constrained case is addressed by defining a weight function that depends on the probability of a design being feasible: improvements at designs that are likely to be infeasible receive low weight. The probability of a design being feasible is calculated using a classification GP. The computation of this criterion is more involved, as there is no closed-form formulation available for the integration, and techniques such as Monte Carlo or Markov chain Monte Carlo must be employed. In a similar spirit, [21] introduces a utility function which quantifies the benefit of evaluating a design by integrating its effect over the design space. The proposed utility function computes the expected reduction of the feasible domain below the best feasible value evaluated so far. This results in the expected volume of excursion criterion, which also requires approximation techniques to be computed.\n\nThe former approaches revolve around computing a quantity based on improvement and require having at least one feasible design. Other strategies use information gain as the key element to drive the optimization strategy. [7] proposed a two-step approach for constrained BO when the objective and the constraints can be evaluated independently. The first step chooses the next location by maximizing the constrained EI [27]; the second step chooses whether to evaluate the objective or a constraint using an information gain metric (i.e., entropy search [12]). [13, 14] developed a strategy that simultaneously selects the design to be evaluated and the model to query (the objective or a constraint). 
The criterion used, predictive entropy search with constraints (PESC), is an extension of predictive entropy search (PES) [15]. One of the advantages of information gain-based methods stems from the fact that one does not need to start with a feasible design.\n\nAll the aforementioned methods use myopic utilities to select the next design to evaluate, leading to suboptimal optimization strategies. In the unconstrained BO setting, multiple-step lookahead algorithms have been explored [20, 8, 18, 9, 17] and were shown to improve the performance of BO. To our knowledge, such lookahead strategies for constrained optimization have not yet been addressed in the literature and also have the potential to improve the performance of CBO algorithms.\n\n3 Lookahead Formulation of CBO\n\nIn this section, we formulate CBO with a finite budget as a dynamic programming (DP) problem (Sec. 3.1). This leads to an optimal but computationally challenging optimization policy. To mitigate the cost of computing such a policy, we employ an approximate dynamic programming algorithm, rollout, and demonstrate how it can be adapted to CBO with a finite budget (Sec. 3.2).\n\n3.1 Dynamic Programming Formulation\n\nWe seek an optimization policy which leads, after consumption of the evaluation budget, to the maximum feasible decrease of the objective function. Because the values of the expensive objective function and constraints are not known before their evaluations, it is impossible to quantify such a long-term reward within a cheap-to-evaluate utility function U_n. However, CBO endows the objective function and the constraints with a statistical model that can be interrogated to inform the optimizer of the likely values of f and g_i at a given design. This statistical model can be leveraged to simulate optimization scenarios over multiple steps and quantify their probabilities. 
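This simulation step, drawing plausible outcomes from the GP posteriors rather than evaluating the expensive f and g_i, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the posterior helpers `post_f` and `post_g` are hypothetical closed-form stand-ins for fitted GP posteriors.

```python
import random

def simulate_outcome(x, posteriors, rng):
    """Draw one simulated outcome (f, g_1, ..., g_I) at design x.

    Each entry of `posteriors` returns the GP posterior mean and variance
    at x; no expensive evaluation of f or g_i takes place.
    """
    return tuple(rng.gauss(mu, var ** 0.5)
                 for mu, var in (p(x) for p in posteriors))

# Hypothetical stand-ins for the posteriors of f and of one constraint g_1:
post_f = lambda x: (x ** 2, 0.25)   # posterior mean x^2, variance 0.25
post_g = lambda x: (x - 1.0, 0.01)  # posterior mean x - 1, variance 0.01

rng = random.Random(0)
samples = [simulate_outcome(0.5, [post_f, post_g], rng) for _ in range(10_000)]
mean_f = sum(s[0] for s in samples) / len(samples)
mean_g = sum(s[1] for s in samples) / len(samples)
# The sample means concentrate near the posterior means (0.25 and -0.5),
# as expected for draws from the posterior marginals.
```

Repeating this draw along a simulated trajectory is what lets the method score entire multi-step optimization scenarios without touching the expensive models.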
Using this simulation mechanism, it is possible to quantify, in an average sense, the long-term reward achieved under a given optimization policy. The optimal policy is the solution of the DP problem that we formalize now.\n\nLet n be the current iteration number of the CBO algorithm, and N the total budget of evaluations, or horizon. We refer to the future iterations of the optimization generated by simulation as stages. For any stage k ∈ {n, · · · , N}, all the information collected is contained in the training set S_k. The function f and the I functions g_i are modeled with independent GPs. Their posterior quantities, conditioned on S_k, fully characterize our knowledge of f and g_i. Thus, we define the state of our knowledge at stage k to be the training set S_k ∈ Z_k.\n\nBased on the training set S_k, the simulation makes a decision regarding the next design x_{k+1} ∈ X to evaluate using an optimization policy. An optimization policy π = {π_1, · · · , π_N} is a sequence of rules, π_k : Z_k → X for k ∈ {1, · · · , N}, mapping a training set S_k to a design x_{k+1} = π_k(S_k). In the simulations, the values f(x_{k+1}) and g_i(x_{k+1}) are unknown and are treated as uncertainties. We model those I + 1 uncertain quantities with I + 1 independent Gaussian random variables W^f_{k+1} and W^{g_i}_{k+1} based on the GPs:\n\nW^f_{k+1} ~ N(μ_k(x_{k+1}; f), σ²_k(x_{k+1}; f)), (3)\nW^{g_i}_{k+1} ~ N(μ_k(x_{k+1}; g_i), σ²_k(x_{k+1}; g_i)), (4)\n\nwhere we recall that μ_k(x_{k+1}; φ) and σ²_k(x_{k+1}; φ) are the posterior mean and variance of the GP associated with any expensive function φ ∈ {f, g_1, · · · , g_I}, conditioned on S_k, at x_{k+1}. Then, the simulation generates an outcome. A simulated outcome w_{k+1} = (f_{k+1}, g¹_{k+1}, · · · , g^I_{k+1}) ∈ W ⊂ R^{I+1} is a sample of the (I + 1)-dimensional random variable W_{k+1} = [W^f_{k+1}, W^{g_1}_{k+1}, · · · , W^{g_I}_{k+1}]. Note that simulating an outcome does not require evaluating the expensive f and g_i. In particular, f_{k+1} and g^i_{k+1} are not f(x_{k+1}) and g_i(x_{k+1}).\n\nOnce an outcome w_{k+1} = (f_{k+1}, g¹_{k+1}, · · · , g^I_{k+1}) is simulated, the system transitions to a new state S_{k+1}, governed by the system dynamic F_k : Z_k × X × W → Z_{k+1} given by:\n\nS_{k+1} = F_k(S_k, x_{k+1}, w_{k+1}) = S_k ∪ {(x_{k+1}, f_{k+1}, g¹_{k+1}, · · · , g^I_{k+1})}. (5)\n\nNow that the simulation mechanism is defined, one needs a metric to assess the quality of a given optimization policy. At stage k, a stage-reward function r_k : Z_k × X × W → R quantifies the merit of querying a design if the outcome w_{k+1} = (f_{k+1}, g¹_{k+1}, · · · , g^I_{k+1}) occurs. This stage-reward is defined as the reduction of the objective function satisfying the constraints:\n\nr_k(S_k, x_{k+1}, w_{k+1}) = max{0, f^{S_k}_best − f_{k+1}} if g^i_{k+1} ≤ 0 for all i ∈ {1, · · · , I}, and r_k(·, ·, ·) = 0 otherwise, (6)\n\nwhere f^{S_k}_best is the best feasible value at stage k. Thus, the expected (long-term) reward starting from training set S_n under optimization policy π is:\n\nJ_π(S_n) = E[Σ_{k=n}^N r_k(S_k, π_k(S_k), w_{k+1})], (7)\n\nwhere the expectation is taken with respect to the (correlated) simulated values (w_{n+1}, · · · , w_{N+1}), and the state evolution is governed by Eq. 5. 
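Under this model, the expectation of the one-stage reward coincides with the constrained expected improvement. A Monte Carlo sketch can check this against the closed-form unconstrained EI when the constraint is almost surely feasible; the posterior values below are illustrative, not taken from the paper.

```python
import math
import random

def stage_reward(f_best, f_sim, g_sims):
    """Stage reward: feasible reduction of the objective, zero otherwise."""
    if any(g > 0.0 for g in g_sims):
        return 0.0
    return max(0.0, f_best - f_sim)

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(f_best, mu, sigma):
    """Closed-form unconstrained EI under a N(mu, sigma^2) posterior."""
    z = (f_best - mu) / sigma
    return sigma * (z * norm_cdf(z) + norm_pdf(z))

rng = random.Random(0)
mu_f, sigma_f = 0.0, 1.0    # objective posterior at the candidate (illustrative)
mu_g, sigma_g = -5.0, 0.1   # constraint posterior: almost surely feasible
f_best = 0.5
n = 200_000
mc_reward = sum(
    stage_reward(f_best, rng.gauss(mu_f, sigma_f), [rng.gauss(mu_g, sigma_g)])
    for _ in range(n)
) / n
# With the probability of feasibility ~1, the Monte Carlo average of the
# stage reward approaches the closed-form expected improvement.
```

With a constraint that is likely to be violated, the same Monte Carlo average instead approaches the EI multiplied by the probability of feasibility, which is the closed-form constrained EI for independent GPs.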
An optimal policy, π*, is a policy maximizing this long-term expected reward in the space of admissible policies Π:\n\nJ_{π*}(S_n) = max_{π ∈ Π} J_π(S_n). (8)\n\nThe optimal reward J_{π*}(S_n) is given by Bellman's principle of optimality and can be computed using the DP recursive algorithm, working backward from k = N − 1 to k = n:\n\nJ_N(S_N) = max_{x_{N+1} ∈ X} E[r_N(S_N, x_{N+1}, w_{N+1})] = max_{x_{N+1} ∈ X} EIc(x_{N+1}; S_N),\nJ_k(S_k) = max_{x_{k+1} ∈ X} E[r_k(S_k, x_{k+1}, w_{k+1}) + J_{k+1}(F_k(S_k, x_{k+1}, w_{k+1}))], (9)\n\nwhere each expectation is taken with respect to one simulated outcome vector w_{k+1}, and we have used the fact that E[r_k(S_k, x_{k+1}, w_{k+1})] = EIc(x_{k+1}; S_k) is the constrained expected improvement, known in closed form [27]. The optimal reward is given by J_{π*}(S_n) = J_n(S_n). Thus, at iteration n of the CBO algorithm, the optimal policy selects the next design x_{n+1} that maximizes J_n(S_n) given by Eqs. 9. In other words, the best decision to make at iteration n maximizes, on average, the sum of the immediate reward r_n and the future long-term reward J_{n+1}(S_{n+1}) obtained by making optimal subsequent decisions. This is illustrated in Fig. 1, left panel.\n\nFigure 1: Left: Tree illustrating the intractable DP formulation. Each black circle represents a training set and a design; each white circle is a training set. Dashed lines represent simulated outcomes resulting in expectations. The double arrows represent designs selected with the (unknown) optimal policy, leading to nested maximizations. Double arrows depict the bidirectional way information propagates when the optimal policy is built: each optimal decision depends on the previous steps and relies on the optimality of the future decisions. Right: Single arrows represent designs selected using a heuristic. 
This illustrates the unidirectional propagation of information when a known heuristic drives the simulations: each decision depends on the previous steps but is independent of the future ones. The absence of nested maximizations leads to a tractable formulation.\n\n3.2 Rollout for Constrained Bayesian Optimization\n\nThe best optimization policy evaluates, at each iteration n of the CBO algorithm, the design x_{n+1} maximizing the optimal reward J_{π*}(S_n) (Eq. 8). This requires solving a problem with several nested maximizations and expectations (Eqs. 9), which is computationally intractable. To mitigate the cost of solving the DP algorithm, we employ an approximate dynamic programming (ADP) technique: rollout (see [2, 23] for an overview). Rollout selects the next design by maximizing a (suboptimal) long-term reward J_π. The reward is computed by simulating optimization scenarios over several future steps. However, the simulated steps are not controlled by the optimal policy π*. Instead, rollout uses a suboptimal policy π, i.e., a heuristic, to drive the simulation. This circumvents the need for nested maximizations (as illustrated in Fig. 1, right panel) and simplifies the computation of J_π compared to J_{π*}. We now formalize the rollout algorithm, propose a heuristic π adapted to the context of CBO with a finite budget, and detail further numerical approximations.\n\nLet us consider iteration n of the CBO algorithm. The long-term reward J_π(S_n) induced by a (known) heuristic π = {π_1, · · · , π_N}, starting from state S_n, is defined by Eq. 7. 
This can be rewritten as J_π(S_n) = H_n(S_n), where H_k is recursively defined, from k = N back to k = n, by:\n\nH_{N+1}(S_{N+1}) = 0,\nH_k(S_k) = E[r_k(S_k, π_k(S_k), w_{k+1}) + γ H_{k+1}(F_k(S_k, π_k(S_k), w_{k+1}))], (10)\n\nwhere each expectation is taken with respect to one simulated outcome vector w_{k+1}, and γ ∈ [0, 1] is a discount factor encouraging the early collection of reward. A discount factor γ = 0 leads to a greedy policy, focusing on immediate reward. In that case, the reward J_π simplifies to the constrained expected improvement EIc. A discount factor γ = 1, on the other hand, is indifferent to when the reward is collected.\n\nThe fundamental simplification introduced by the rollout algorithm lies in the absence of nested maximizations in Eqs. 10. This is illustrated in Fig. 1, right panel. By applying a known heuristic, information only propagates forward: every simulated step depends on the previous steps, but is independent of the future simulated steps. This is in contrast to the DP algorithm, illustrated in Fig. 1. Because the optimal policy is not known, it needs to be built by solving a sequence of nested problems. Thus, information propagates both forward and backward.\n\nWhile H_n is simpler to compute than J_n, it still requires computing nested expectations for which there is no closed-form expression. To further alleviate the cost of computing the long-term reward, we introduce two numerical simplifications. First, we use a rolling horizon h ∈ N to decrease the number of future steps simulated. 
A rolling horizon h replaces the horizon N by Ñ = min{N, n + h}. Second, the expectations with respect to the (I + 1)-dimensional Gaussian random variables are numerically approximated using Gauss-Hermite quadrature. We obtain the following formulation:\n\n˜H_{Ñ+1}(S_{Ñ+1}) = 0,\n˜H_k(S_k) = EIc(π_k(S_k); S_k) + γ Σ_{q=1}^{N_q} α^(q) [˜H_{k+1}(F_k(S_k, π_k(S_k), w^(q)_{k+1}))], (11)\n\nwhere N_q is the number of quadrature weights α^(q) ∈ R and points w^(q)_{k+1} ∈ R^{I+1}. For every iteration n ∈ {1, · · · , N} and for all x_{n+1} ∈ X, we define the utility function of our rollout algorithm for CBO with a finite budget to be:\n\nU_n(x_{n+1}; S_n) = EIc(x_{n+1}; S_n) + γ Σ_{q=1}^{N_q} α^(q) [˜H_{n+1}(F_n(S_n, x_{n+1}, w^(q)_{n+1}))]. (12)\n\nThe heuristic π is problem-dependent. A desirable heuristic combines two properties: (1) it is cheap to compute, and (2) it is a good approximation of the optimal policy π*. In the case of CBO with a finite budget, the heuristic π ought to mimic the exploration-exploitation trade-off balanced by the optimal policy π*. To do so, we propose using a combination of greedy CBO algorithms: maximization of the constrained expected improvement (which has an exploratory behavior) and a constrained optimization based on the posterior means of the GPs (which has an exploitative behavior). 
For a given iteration n, we define the heuristic π = {π_{n+1}, · · · , π_Ñ} such that for stages k ∈ {n + 1, · · · , Ñ − 1}, the policy component π_k : Z_k → X maps a state S_k to the design x_{k+1} satisfying:\n\nx_{k+1} = argmax_{x ∈ X} EIc(x; S_k). (13)\n\nThe last policy component, π_Ñ : Z_Ñ → X, maps a state S_Ñ to x_{Ñ+1} such that:\n\nx_{Ñ+1} = argmin_{x ∈ X} μ_Ñ(x; f) s.t. PF(x; S_Ñ) ≥ 0.99, (14)\n\nwhere PF is the probability of feasibility, known in closed form. Every evaluation of the utility function U_n requires O(N_q^h) applications of a heuristic component π_k. The heuristic that we propose optimizes a quantity that requires O(|S_k|²) of work.\n\nTo summarize, the proposed approach sequentially selects the next design to evaluate by maximizing the long-term reward induced by a heuristic. This rollout algorithm is a one-step lookahead formulation (one maximization) and is easier to solve than the N-step lookahead approach (N nested maximizations) presented in Sec. 3.1. Rollout is a closed-loop approach where the information collected at a given stage of the simulation is used to simulate the next stages. The heuristic used in the rollout is problem-dependent, and we proposed using a combination of greedy CBO algorithms to construct such a heuristic. 
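The Gauss-Hermite approximation used for these expectations can be sketched in the one-dimensional case as follows; this is a generic numerical illustration of the quadrature rule, not the paper's implementation.

```python
import math
import numpy as np

def gauss_hermite_expectation(func, mu, sigma, n_q=3):
    """Approximate E[func(W)] for W ~ N(mu, sigma^2) with n_q quadrature nodes."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_q)
    # The change of variables w = mu + sqrt(2)*sigma*t maps the Hermite
    # weight exp(-t^2) onto the Gaussian density of W.
    vals = func(mu + math.sqrt(2.0) * sigma * nodes)
    return float(np.dot(weights, vals) / math.sqrt(math.pi))

# With 3 nodes the rule is exact for polynomials up to degree 5, so for
# W ~ N(1, 0.5^2) it recovers E[W^2] = mu^2 + sigma^2 = 1.25 exactly
# (up to floating-point error).
approx = gauss_hermite_expectation(lambda w: w ** 2, mu=1.0, sigma=0.5)
```

In the paper's setting the integrand is the recursively defined reward rather than a polynomial, and the quadrature is tensorized over the I + 1 independent Gaussians, giving the N_q-point sums of Eqs. 11-12.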
The computation of the utility function is detailed in Algorithm 2.\n\nAlgorithm 2 Rollout Utility Function\nFunction: utility(x, h, S)\nConstruct GPs using S\nif h = 0 then\n  U ← EIc(x; S)\nelse\n  U ← EIc(x; S)\n  Generate N_q Gauss-Hermite quadrature weights α^(q) and points w^(q) associated with x\n  for q = 1 to N_q do\n    S′ ← S ∪ {(x, w^(q))}\n    if h > 1 then\n      x′ ← π(S′) using Eq. 13\n    else\n      x′ ← π(S′) using Eq. 14\n    end if\n    U ← U + γ α^(q) utility(x′, h − 1, S′)\n  end for\nend if\nOutput: U\n\n4 Results\n\nIn this section, we numerically investigate the proposed algorithm and demonstrate its performance on classic test functions and a reacting flow problem.\n\nTo compare the performance of the different CBO algorithms tested, we use the utility gap metric [14]. At iteration n, the utility gap e_n measures the error between the optimum feasible value f* and the value of the objective function at a recommended design x*_n:\n\ne_n = |f(x*_n) − f*| if x*_n is feasible, and e_n = |Ψ − f*| otherwise, (15)\n\nwhere Ψ is a user-defined penalty punishing infeasible recommendations. The recommended design, x*_n, differs from the design selected for evaluation, x_n. It is the design that the algorithm would recommend if the optimization were to be stopped at iteration n, without early notice. We use the same system of recommendation as [14]:\n\nx*_n = argmin_{x ∈ X} μ_n(x; f) s.t. PF(x; S_n) ≥ 0.975. (16)\n\nNote that the utility gap e_n is not guaranteed to decrease, because recommendations x*_n are not necessarily better with iterations. 
In particular, e_n is not the best error achieved in the training set S_n.\n\nIn the following numerical experiments, for the rollout algorithm, we use independent zero-mean GPs with an automatic relevance determination (ARD) squared-exponential kernel to model each expensive-to-evaluate function. In Algorithm 1, when the GPs are constructed, the vector of hyper-parameters θ_i associated with the ith GP kernel is estimated by maximization of the marginal likelihood. However, to reduce the cost of computing U_n, the hyper-parameters are kept constant in the simulated steps (i.e., in Algorithm 2). To compute the expectations of Eqs. 11-12, we employ N_q = 3^{I+1} Gauss-Hermite quadrature weights and points, and we set the discount factor to γ = 0.9. Finally, at iteration n, the best value f^{S_n}_best is set to the minimum posterior mean μ_n(x; f) over the designs x in the training set S_n such that the posterior mean of each constraint is feasible. If no such point can be found, then f^{S_n}_best is set to the maximum of {μ_n(x; f) + 3σ_m} over the designs x in S_n, where σ²_m is the maximum variance of the GP associated with f. The EIC algorithm is computed as a special case of the rollout with rolling horizon h = 0, and we use the Spearmint package¹ to run the PESC algorithm. We additionally run a CBO algorithm that selects the next design to evaluate based on the posterior means of the GPs²:\n\nx_{n+1} = argmin_{x ∈ X} μ_n(x; f) s.t. μ_n(x; g_i) ≤ 0, ∀i ∈ {1, . . . , I}. (17)\n\n1https://github.com/HIPS/Spearmint/tree/PESC\n2As suggested by a reviewer.\n\nFigure 2: Left: Multi-modal objective and single constraint (P1). Right: Linear objective and multiple non-linear constraints (P2). Shaded region indicates the 95% confidence interval of the median statistic.\n\nWe refer to this algorithm as PM. 
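The PM selection rule of Eq. 17 can be sketched as follows; this is an illustrative simplification that searches a finite candidate grid, whereas the paper solves the continuous auxiliary problem, and the posterior-mean functions below are hypothetical stand-ins.

```python
def pm_next_design(candidates, mu_f, mu_gs):
    """PM selection (cf. Eq. 17): among candidate designs, pick the one
    minimizing the posterior mean of f subject to the posterior means of
    all constraints being non-positive."""
    feasible = [x for x in candidates if all(mu_g(x) <= 0.0 for mu_g in mu_gs)]
    return min(feasible, key=mu_f)

# Hypothetical posterior means (not fitted GPs):
mu_f = lambda x: (x - 2.0) ** 2   # objective posterior mean, minimized at x = 2
mu_g = lambda x: 1.0 - x          # constraint posterior mean, feasible iff x >= 1
grid = [i / 10.0 for i in range(41)]  # candidate designs 0.0, 0.1, ..., 4.0
best = pm_next_design(grid, mu_f, [mu_g])
# best is 2.0: the unconstrained minimizer already satisfies the constraint
```

Because this rule ignores the posterior variance entirely, it is purely exploitative, which is the behavior the experiments below examine.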
We also compare the CBO algorithms to three local algorithms (SLSQP, MMA and COBYLA) and to one global evolutionary algorithm (ISRES).\n\nWe now consider four problems with different design space dimensions d, several numbers of constraints I, and various topologies of the feasible space. The first three problems, P1-3, are analytic functions, while the last one, P4, uses a reacting flow model that requires solving a set of partial differential equations (PDEs) [4]. For P1 and P2, we use N = 40 evaluations (as in [6, 10]). For P3 and P4, we use a small number of iterations, N = 60, which corresponds to situations where the functions are very expensive to evaluate (e.g., solving large systems of PDEs can take over a day on a supercomputer). The full description of the problems is available in the appendix. In Figs. 2-3, we show the median of the utility gap; the shadings represent the 95% confidence interval of the median computed by bootstrap. Other statistics of the utility gap are shown in the appendix.\n\nFor P1, the median utility gap for EIC, PESC, PM and the rollout algorithm with h ∈ {1, 2, 3} is shown in Fig. 2 (left panel). The PM algorithm does not improve its recommendations. This is not surprising because PM focuses on exploitation (PM does not depend on the posterior variance), which can result in the algorithm failing to make further progress. Such behavior has already been reported in [16] (Sec. 3). The three other CBO algorithms perform similarly in the first 10 iterations. PESC is the first to converge, to a utility gap ≈ 10^{-2.7}. The rollout performs similarly to or better than EIC. In the first 15 iterations, longer rolling horizons lead to slightly lower utility gaps. This is likely due to the more exploratory behavior associated with lookahead, which helps differentiate the global solution from the local ones. 
For the remaining iterations, the shorter rolling horizons reduce the utility gap faster than the longer ones before reaching a plateau. EIC and rollout outperform PESC after 25 iterations. We note that EIC and rollout have essentially converged.
For P2, the median performance of EIC, PESC, PM and rollout with rolling horizon h ∈ {1, 2, 3} is shown in Fig. 2 (right panel). The PM algorithm reduces the utility gap in the first 10 iterations, but reaches a plateau at 10^{-1.7}. The three other CBO algorithms perform similarly up to iteration 15, where PESC reaches a plateau3. This similarity may be explained by the fact that the local solutions are easily distinguished from the global one, leading to no advantage for exploratory behavior. In this example, the rollout algorithms reach the same plateau at 10^{-3}, with longer horizons h taking more iterations to converge. EIC performs better than rollout with h = 2 before its performance slightly decreases, reaching a plateau at a larger utility gap of 10^{-2.6} (note that the utility gap is not computed with the best value observed so far and thus is not guaranteed to decrease). This increase of the median utility gap can be explained by the fact that a few runs change their recommendation from one local minimum to another. This is also reflected in the 95% confidence interval of the median, which further indicates that the statistic is sensitive to a few runs.
For P3, the median utility gap for the four CBO algorithms is shown in Fig. 3 (left panel). PM is rapidly outperformed by the other algorithms. The PESC algorithm is outperformed by EIC and rollout after 25 iterations. Again, we note that rollout with h = 1 obtains a lower utility gap than EIC at every iteration.
The rollout with h ∈ {2, 3} exhibits a different behavior: it starts decreasing the utility gap later in the optimization but achieves a better performance when the evaluation budget is consumed. Note that none of the algorithms has converged to the global solution, and the strong multi-modality of the objective and constraint functions seems to favor exploratory behaviors.

3 Results obtained for the PESC mean utility gap are consistent with [13].

Figure 3: Left: Multi-modal 4-d objective and constraint (P3). Right: Reacting flow problem (P4). The awareness of the remaining budget explains the sharp decrease in the last iterations for the rollout.

For the reacting flow problem P4, the median performances are shown in Fig. 3 (right panel). PM rapidly reaches a plateau at en ≈ 10^{1.3}. PESC rapidly reduces the utility gap, outperforming the other algorithms after 15 iterations. EIC and rollout perform similarly and slowly decrease the utility gap up to iteration 40, where EIC reaches a plateau and rollout continues to improve performance, slightly outperforming PESC at the end of the optimization.
The results are summarized in Table 1, and show that the rollout algorithm with different rolling horizons h (R-h) performs similarly or favorably compared to the other algorithms.

Table 1: Log median utility gap log10(eN).
Statistics computed over m independent runs.

Prob  d  N   I  m     SLSQP   MMA    COBYLA  ISRES   PESC    PM      EIC     R-1     R-2     R-3
P1    2  40  1  500    0.59    0.59  -0.05   -0.19   -2.68    0.30   -4.45   -4.59   -4.52   -4.42
P2    2  40  2  500   -0.40   -0.40  -0.82   -0.70   -2.43   -1.76   -2.62   -2.99   -2.99   -2.99⁴
P3    4  60  1  500    2.15    3.06   3.06    1.68    1.66    1.79    1.60    1.48    1.31    1.35
P4    4  60  1  50     0.80    0.80   0.80    0.13    0.09    1.26    0.57   -0.10   -0.10    0.19

Based on the four previous examples, we notice that increasing the rolling horizon h does not necessarily improve the performance of the rollout algorithm. One possible reason stems from the fact that lookahead algorithms rely more on the statistical model than greedy algorithms do. Because this model is learned as the optimization unfolds, it is an imperfect model (in particular, the hyper-parameters of the GPs are updated after each iteration, but not after each stage of a simulated scenario). By simulating too many steps with the GPs, one may be over-confidently using the model. In some sense, the rolling horizon h, as well as the discount factor γ, can be interpreted as a form of regularization. The effect of a larger rolling horizon is problem-dependent, and experiment P3 suggests that multimodal problems in higher dimension may benefit from longer rolling horizons.

5 Conclusions

We proposed a new formulation for constrained Bayesian optimization with a finite budget of evaluations. The best optimization policy is defined as the one maximizing, on average, the cumulative feasible decrease of the objective function over multiple steps. This optimal policy is the solution of a dynamic programming problem that is intractable due to the presence of nested maximizations. To circumvent this difficulty, we employed the rollout algorithm.
Rollout uses a heuristic to simulate optimization scenarios over several steps, thereby computing an approximation of the long-term reward. This heuristic is problem-dependent and, in this paper, we proposed to use a combination of cheap-to-evaluate greedy CBO algorithms to construct such a heuristic. The proposed algorithm was numerically investigated and performed similarly or favorably compared to constrained expected improvement (EIC) and predictive entropy search with constraint (PESC).

This work was supported in part by the AFOSR MURI on multi-information sources of multi-physics systems under Award Number FA9550-15-1-0038, program manager Dr. Jean-Luc Cambier.

4 For cost reasons, the median for h = 3 was computed with m = 100 independent runs instead of 500.

References

[1] C. Audet, A. J. Booker, J. E. Dennis Jr, P. D. Frank, and D. W. Moore. A surrogate-model-based method for constrained optimization. AIAA paper, 4891, 2000.

[2] D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 1995.

[3] M. Björkman and K. Holmström. Global optimization of costly nonconvex functions using radial basis functions. Optimization and Engineering, 4(1):373–397, 2000.

[4] M. Buffoni and K. E. Willcox. Projection-based model reduction for reacting flows. In 40th Fluid Dynamics Conference and Exhibit, page 5008, 2010.

[5] P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained single- and multi-objective optimization. Journal of Global Optimization, 67(1-2):97–133, 2017.

[6] J. Gardner, M. Kusner, K. Q. Weinberger, J. Cunningham, and Z. Xu. Bayesian optimization with inequality constraints. In T. Jebara and E. P.
Xing, editors, Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 937–945. JMLR Workshop and Conference Proceedings, 2014.

[7] M. A. Gelbart, J. Snoek, and R. P. Adams. Bayesian optimization with unknown constraints. arXiv preprint arXiv:1403.5607, 2014.

[8] D. Ginsbourger and R. Le Riche. Towards Gaussian process-based optimization with finite time horizon. In mODa 9 – Advances in Model-Oriented Design and Analysis, pages 89–96. Springer, 2010.

[9] J. González, M. Osborne, and N. D. Lawrence. GLASSES: Relieving the myopia of Bayesian optimisation. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 790–799, 2016.

[10] R. B. Gramacy, G. A. Gray, S. Le Digabel, H. K. H. Lee, P. Ranjan, G. Wells, and S. M. Wild. Modeling an augmented Lagrangian for blackbox constrained optimization. Technometrics, 58(1):1–11, 2016.

[11] R. B. Gramacy and H. K. H. Lee. Optimization under unknown constraints. arXiv preprint arXiv:1004.4027, 2010.

[12] P. Hennig and C. J. Schuler. Entropy search for information-efficient global optimization. The Journal of Machine Learning Research, 13(1):1809–1837, 2012.

[13] J. M. Hernández-Lobato, M. A. Gelbart, R. P. Adams, M. W. Hoffman, and Z. Ghahramani. A general framework for constrained Bayesian optimization using information-based search. arXiv preprint arXiv:1511.09422, 2015.

[14] J. M. Hernández-Lobato, M. A. Gelbart, M. W. Hoffman, R. P. Adams, and Z. Ghahramani. Predictive entropy search for Bayesian optimization with unknown constraints. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015.

[15] J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani. Predictive entropy search for efficient global optimization of black-box functions.
In Advances in Neural Information Processing Systems, pages 918–926, 2014.

[16] D. R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4):345–383, 2001.

[17] R. R. Lam, K. E. Willcox, and D. H. Wolpert. Bayesian optimization with a finite budget: An approximate dynamic programming approach. In Advances in Neural Information Processing Systems, pages 883–891, 2016.

[18] C. K. Ling, K. H. Low, and P. Jaillet. Gaussian process planning with Lipschitz continuous reward functions: Towards unifying Bayesian optimization, active learning, and beyond. arXiv preprint arXiv:1511.06890, 2015.

[19] J. Mockus, V. Tiesis, and A. Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2(117-129):2, 1978.

[20] M. A. Osborne, R. Garnett, and S. J. Roberts. Gaussian processes for global optimization. In 3rd International Conference on Learning and Intelligent Optimization (LION3), pages 1–15, 2009.

[21] V. Picheny. A stepwise uncertainty reduction approach to constrained global optimization. In AISTATS, pages 787–795, 2014.

[22] V. Picheny, R. B. Gramacy, S. Wild, and S. Le Digabel. Bayesian optimization under mixed constraints with a slack-variable augmented Lagrangian. In Advances in Neural Information Processing Systems, pages 1435–1443, 2016.

[23] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality, volume 842. John Wiley & Sons, 2011.

[24] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

[25] R. G. Regis. Constrained optimization by radial basis function interpolation for high-dimensional expensive black-box problems with infeasible initial points. Engineering Optimization, 46(2):218–243, 2014.

[26] M. J. Sasena, P. Y. Papalambros, and P. Goovaerts.
The use of surrogate modeling algorithms to exploit disparities in function computation time within simulation-based optimization. Constraints, 2:5, 2001.

[27] M. Schonlau, W. J. Welch, and D. R. Jones. Global versus local search in constrained optimization of computer models. Lecture Notes-Monograph Series, pages 11–25, 1998.