Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)
Andrea Lecchini-visintini, John Lygeros, Jan Maciejowski
Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete com- binatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated an- nealing which allows one to guarantee finite-time performance in the optimiza- tion of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
Optimization is the general problem of finding a value of a vector of variables θ that maximizes (or minimizes) some scalar criterion U (θ). The set of all possible values of the vector θ is called the optimization domain. The elements of θ can be discrete or continuous variables. In the first case the optimization domain is usually finite, such as in the well-known traveling salesman problem; in the second case the optimization domain is a continuous set. An important example of a continuous optimization domain is the set of 3-D configurations of a sequence of amino-acids in the problem of finding the minimum energy folding of the corresponding protein [1].
In principle, any optimization problem on a finite domain can be solved by an exhaustive search. However, this is often beyond computational capacity: the optimization domain of the traveling salesman problem with 100 cities contains more than 10155 possible tours. An efficient algorithm to solve the traveling salesman and many similar problems has not yet been found and such prob- lems remain reliably solvable only in principle [2]. Statistical mechanics has inspired widely used methods for finding good approximate solutions in hard discrete optimization problems which defy efficient exact solutions [3, 4, 5, 6]. Here a key idea has been that of simulated annealing [3]: a random search based on the Metropolis-Hastings algorithm, such that the distribution of the ele- ments of the domain visited during the search converges to an equilibrium distribution concentrated around the global optimizers. Convergence and finite-time performance of simulated annealing on finite domains has been evaluated in many works, e.g. [7, 8, 9, 10].
On continuous domains, most popular optimization methods perform a local gradient-based search and in general converge to local optimizers; with the notable exception of convex criteria where convergence to the unique global optimizer occurs [11]. Simulated annealing performs a global search and can be easily implemented on continuous domains. Hence it can be considered a powerful complement to local methods. In this paper, we introduce for the first time rigorous guarantees on the finite-time performance of simulated annealing on continuous domains. We will
show that it is possible to derive simulated annealing algorithms which, with an arbitrarily high level of confidence, find an approximate solution to the problem of optimizing a function of continuous variables, within a specified tolerance to the global optimal solution after a known finite number of steps. Rigorous guarantees on the finite-time performance of simulated annealing in the optimiza- tion of functions of continuous variables have never been obtained before; the only results available state that simulated annealing converges to a global optimizer as the number of steps grows to infin- ity, e.g. [12, 13, 14, 15].
The background of our work is twofold. On the one hand, our notion of approximate solution to a global optimization problem is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory [16, 17]. We actually maintain an important aspect of statistical learning theory which is that we do not introduce any particular assumption on the optimization criterion, i.e. our results hold regardless of what U is. On the other hand, we ground our results on the theory of convergence, with quantitative bounds on the distance to the target dis- tribution, of the Metropolis-Hastings algorithm and Markov Chain Monte Carlo (MCMC) methods, which has been one of the main achievements of recent research in statistics [18, 19, 20, 21].
In this paper, we will not develop any ready-to-use optimization algorithm. We will instead in- troduce a general formulation of the simulated annealing method which allows one to derive new simulated annealing algorithms with rigorous finite-time guarantees on the basis of existing theory. The Metropolis-Hastings algorithm and the general family of MCMC methods have many degrees of freedom. The choice and comparison of specific algorithms goes beyond the scope of the paper. In Simulated annealing we introduce the method and fix the notation. In Convergence we recall the reasons why finite-time guarantees for simulated annealing on continuous domains have not been obtained before. In Finite-time guaran- tees we present the main result of the paper. In Conclusions we state our findings and conclude the paper.
The paper is organized in the following sections.
1 Simulated annealing
The original formulation of simulated annealing was inspired by the analogy between the stochastic evolution of the thermodynamic state of an annealing material towards the configurations of minimal energy and the search for the global minimum of an optimization criterion [3]. In the procedure, the optimization criterion plays the role of the energy and the state of the annealed material is simulated by the evolution of the state of an inhomogeneous Markov chain. The state of the chain evolves according to the Metropolis-Hastings algorithm in order to simulate the Boltzmann distribution of thermodynamic equilibrium. The Boltzmann distribution is simulated for a decreasing sequence of temperatures (“cooling”). The target distribution of the cooling procedure is the limiting Boltzmann distribution, for the temperature that tends to zero, which takes non-zero values only on the set of global minimizers [7].
The original formulation of the method was for a finite domain. However, simulated anneal- ing can be generalized straightforwardly to a continuous domain because the Metropolis-Hastings algorithm can be used with almost no differences on discrete and continuous domains The main difference is that on a continuous domain the equilibrium distributions are specified by probability densities. On a continuous domain, Markov transition kernels in which the distribution of the el- ements visited by the chain converges to an equilibrium distribution with the desired density can be constructed using the Metropolis-Hastings algorithm and the general family of MCMC methods [22].
We point out that Boltzmann distributions are not the only distributions which can be adopted as equilibrium distributions in simulated annealing [7]. In this paper it is convenient for us to adopt a different type of equilibrium distribution in place of Boltzmann distributions.