Authors

Marina Meila, Michael Jordan

Abstract

When triangulating a belief network we aim to obtain a junction tree of minimum state space. According to (Rose, 1970), searching for the optimal triangulation can be cast as a search over all the permutations of the graph's vertices. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continous nonlinear optimization task we hope to obtain a good triangulation with respect to the aformentioned cost. This paper presents two ways of embedding the triangulation problem into continuous domain and shows that they perform well compared to the best known heuristic.

1

INTRODUCTION. WHAT IS TRIANGULATION?

Belief networks are graphical representations of probability distributions over a set of variables. In what follows it will be always assumed that the variables take values in a finite set and that they correspond to the vertices of a graph. The graph's arcs will represent the dependencies among variables. There are two kinds of representations that have gained wide use: one is the directed acyclic graph model, also called a Bayes net, which represents the joint distribution as a product of the probabilities of each vertex conditioned on the values of its parents; the other is the undirected graph model, also called a Markov field, where the joint distribution is factorized over the cliques! of an undirected graph. This factorization is called a junction tree and optimizing it is the subject of the present paper. The power of both models lies in their ability to display and exploit existent marginal and conditional independencies among subsets of variables. Emphasizing independencies is useful

1 A clique is a fully connected set of vertices and a maximal clique is a clique that is

not contained in any other clique.

558

M. Meilii and M. /. Jordan

from both a qualitative point of view (it reveals something about the domain under study) and a quantitative one (it makes computations tractable). The two models differ in the kinds of independencies they are able to represent and often times in their naturalness in particular tasks. Directed graphs are more convenient for learning a model from data; on the other hand, the clique structure of undirected graphs organizes the information in a way that makes it immediately available to inference algorithms. Therefore it is a standard procedure to construct the model of a domain as a Bayes net and then to convert it to a Markov field for the purpose of querying it.

This process is known as decomposition and it consists of the following stages: first, the directed graph is transformed into an undirected graph by an operation called moralization. Second, the moralized graph is triangulated. A graph is called triangulated if any cycle of length> 3 has a chord (i.e. an edge connecting two nonconsecutive vertices). If a graph is not triangulated it is always possible to add new edges so that the resulting graph is triangulated. We shall call this procedure triangulation and the added edges the fill-in. In the final stage, the junction tree (Kjrerulff, 1991) is constructed from the maximal cliques of the triangulated graph. We define the state space of a clique to be the cartesian product of the state spaces of the variables associated to the vertices in the clique and we call weight of the clique the size of this state space. The weight of the junction tree is the sum of the weights of its component cliques. All further exact inference in the net takes place in the junction tree representation. The number of computations required by an inference operation is proportional to the weight of the tree.

For each graph there are several and usually a large number of possible triangu(cid:173) lations, with widely varying state space sizes. Moreover, triangulation is the only stage where the cost of inference can be influenced. It is therefore critical that the triangulation procedure produces a graph that is optimal or at least "good" in this respect.

Unfortunately, this is a hard problem. No optimal triangulation algorithm is known to date. However, a number of heuristic algorithms like maximum cardinality search (Tarjan and Yannakakis, 1984), lexicographic search (Rose et al., 1976) and the minimum weight heuristic (MW) (Kjrerulff, 1990) are known. An optimization method based on simulated annealing which performs better than the heuristics on large graphs has been proposed in (Kjrerulff, 1991) and recently a "divide and conquer" algorithm which bounds the maximum clique size of the triangulated graph has been published (Becker and Geiger, 1996). All but the last algorithm are based on Rose's (Rose, 1970) elimination procedure: choose a node v of the graph, connect all its neighbors to form a clique, then eliminate v and all the edges incident to it and proceed recursively. The resulting filled-in graph is triangulated.

It can be proven that the optimal triangulation can always be obtained by applying Rose's elimination procedure with an appropriate ordering of the nodes. It follows then that searching for an optimal triangulation can be cast as a search in the space of all node permutations. The idea of the present work is the following: embed the discrete search space of permutations of n objects (where n is the number of vertices) into a suitably chosen continuous space. Then extend the cost to a smooth function over the continuous domain and thus transform the discrete optimization problem into a continuous nonlinear optimization task. This allows one to take advantage of the thesaurus of optimization methods that exist for continuous cost functions. The rest of the paper will present this procedure in the following sequence: the next section introduces and discusses the objective function; section 3 states the continuous version of the problem; section 4 discusses further aspects of the optimization procedure and presents experimental results and section 5 concludes

Triangulation by Continuous Embedding