Unified Inference for Variational Bayesian Linear Gaussian State-Space Models

Part of Advances in Neural Information Processing Systems 19 (NIPS 2006)


Authors

David Barber, Silvia Chiappa

Abstract

Linear Gaussian State-Space Models are widely used and a Bayesian treatment of parameters is therefore of considerable interest. The approximate Variational Bayesian method applied to these models is an attractive approach, used successfully in applications ranging from acoustics to bioinformatics. The most challenging aspect of implementing the method is performing inference on the hidden state sequence of the model. We show how to convert the inference problem so that standard Kalman Filtering/Smoothing recursions from the literature may be applied. This is in contrast to previously published approaches based on Belief Propagation. Our framework both simplifies and unifies the inference problem, so that future applications may be more easily developed. We demonstrate the elegance of the approach on Bayesian temporal ICA, with an application to finding independent dynamical processes underlying noisy EEG signals.

1 Linear Gaussian State-Space Models

Linear Gaussian State-Space Models (LGSSMs)¹ are fundamental in time-series analysis [1, 2, 3]. In these models the observations $v_{1:T}$² are generated from an underlying dynamical system on $h_{1:T}$ according to:

$v_t = B h_t + \eta^v_t, \quad \eta^v_t \sim \mathcal{N}(0_V, \Sigma_V), \qquad h_t = A h_{t-1} + \eta^h_t, \quad \eta^h_t \sim \mathcal{N}(0_H, \Sigma_H),$

where $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian with mean $\mu$ and covariance $\Sigma$, and $0_X$ denotes an $X$-dimensional zero vector. The observation $v_t$ has dimension $V$ and the hidden state $h_t$ has dimension $H$. Probabilistically, the LGSSM is defined by:

$p(v_{1:T}, h_{1:T}|\theta) = p(v_1|h_1)\,p(h_1) \prod_{t=2}^{T} p(v_t|h_t)\,p(h_t|h_{t-1}),$

with $p(v_t|h_t) = \mathcal{N}(B h_t, \Sigma_V)$, $p(h_t|h_{t-1}) = \mathcal{N}(A h_{t-1}, \Sigma_H)$, $p(h_1) = \mathcal{N}(\mu, \Sigma)$, and where $\theta = \{A, B, \Sigma_H, \Sigma_V, \mu, \Sigma\}$ denotes the model parameters. Because of the widespread use of these models, a Bayesian treatment of parameters is of considerable interest [4, 5, 6, 7, 8]. An exact implementation of the Bayesian LGSSM is formally intractable [8], and recently a Variational Bayesian (VB) approximation has been studied [4, 5, 6, 7, 9]. The most challenging part of implementing the VB method is performing inference over $h_{1:T}$, and previous authors have developed their own specialized routines, based on Belief Propagation, since standard LGSSM inference routines appear, at first sight, not to be applicable.

¹ Also called Kalman Filters/Smoothers, Linear Dynamical Systems.
² $v_{1:T}$ denotes $v_1, \ldots, v_T$.
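To make the generative model concrete, the following is a minimal sketch of ancestral sampling from an LGSSM. The function and variable names are ours, not from the paper; `Sigma_H`, `Sigma_V`, `mu`, and `Sigma` stand for the parameters $\Sigma_H$, $\Sigma_V$, $\mu$, $\Sigma$ in $\theta$.

```python
# Illustrative sketch (ours): ancestral sampling from the LGSSM above.
import numpy as np

def sample_lgssm(A, B, Sigma_H, Sigma_V, mu, Sigma, T, seed=0):
    rng = np.random.default_rng(seed)
    H, V = A.shape[0], B.shape[0]
    h = np.zeros((T, H))
    v = np.zeros((T, V))
    # h_1 ~ N(mu, Sigma)
    h[0] = rng.multivariate_normal(mu, Sigma)
    v[0] = rng.multivariate_normal(B @ h[0], Sigma_V)
    for t in range(1, T):
        # h_t = A h_{t-1} + eta_t^h,  eta_t^h ~ N(0_H, Sigma_H)
        h[t] = rng.multivariate_normal(A @ h[t - 1], Sigma_H)
        # v_t = B h_t + eta_t^v,      eta_t^v ~ N(0_V, Sigma_V)
        v[t] = rng.multivariate_normal(B @ h[t], Sigma_V)
    return v, h
```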

A key contribution of this paper is to show how the Variational Bayesian treatment of the LGSSM can be implemented using standard LGSSM inference routines. Based on the insight we provide, any standard inference method may be applied, including those specifically designed to improve numerical stability [2, 10, 11]. In this article we describe the predictor-corrector and Rauch-Tung-Striebel recursions [2], and also suggest a small modification that reduces the computational cost. The Bayesian LGSSM is of particular interest when strong prior constraints are needed to find adequate solutions. One such case is EEG signal analysis, where we wish to extract sources that evolve independently through time. Since EEG is particularly noisy [12], a prior that encourages sources to have preferential dynamics is advantageous. This application is discussed in Section 4, and demonstrates the ease of applying our VB framework.
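For reference, the predictor-corrector recursion propagates the filtered posterior $p(h_t|v_{1:t}) = \mathcal{N}(f_t, F_t)$ forward in time. Below is a minimal sketch of a single standard Kalman filter step in the notation of Section 1; the function and variable names are ours, and production code should prefer the numerically stabler variants cited above [2, 10, 11].

```python
# Hedged sketch of one predictor-corrector (Kalman filter) step.
# (f, F) parameterize the filtered posterior p(h_{t-1} | v_{1:t-1}).
import numpy as np

def kalman_step(f, F, v_t, A, B, Sigma_H, Sigma_V):
    # Predictor: push the filtered density through the linear dynamics.
    f_pred = A @ f                        # mean of p(h_t | v_{1:t-1})
    F_pred = A @ F @ A.T + Sigma_H        # cov  of p(h_t | v_{1:t-1})
    # Corrector: condition on the new observation v_t.
    S = B @ F_pred @ B.T + Sigma_V        # innovation covariance
    K = np.linalg.solve(S, B @ F_pred).T  # Kalman gain F_pred B^T S^{-1}
    f_new = f_pred + K @ (v_t - B @ f_pred)
    F_new = F_pred - K @ B @ F_pred       # (I - K B) F_pred
    return f_new, F_new
```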

2 Bayesian Linear Gaussian State-Space Models

In the Bayesian treatment of the LGSSM, instead of considering the model parameters $\theta$ as fixed, we define a prior distribution $p(\theta|\hat{\theta})$, where $\hat{\theta}$ is a set of hyperparameters. Then:

$p(v_{1:T}|\hat{\theta}) = \int p(v_{1:T}|\theta)\, p(\theta|\hat{\theta})\, d\theta .$   (1)

In a full Bayesian treatment we would define additional prior distributions over the hyperparameters $\hat{\theta}$. Here we take instead the ML-II (`evidence') framework, in which the optimal set of hyperparameters $\hat{\theta}$ is found by maximizing $p(v_{1:T}|\hat{\theta})$ with respect to $\hat{\theta}$ [6, 7, 9]. For the parameter priors, here we define Gaussians on the columns of $A$ and $B$³:

$p(A|\alpha, \Sigma_H) \propto \prod_{j=1}^{H} e^{-\frac{\alpha_j}{2}\, A_j^{\top} \Sigma_H^{-1} A_j},$
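Reading this as a zero-mean Gaussian on each column $A_j$ of $A$ with precision $\alpha_j \Sigma_H^{-1}$ (the source expression is truncated here, so the exact exponent is our assumption), its unnormalized log density could be evaluated as in the hypothetical sketch below; `log_prior_A` is our name, not the paper's.

```python
# Assumed form (our reconstruction): log p(A | alpha, Sigma_H) up to an
# additive constant, with a zero-mean Gaussian on each column A_j of A
# whose precision is alpha_j * Sigma_H^{-1}.
import numpy as np

def log_prior_A(A, alpha, Sigma_H):
    Sigma_H_inv = np.linalg.inv(Sigma_H)
    # sum over columns:  -(alpha_j / 2) * A_j^T Sigma_H^{-1} A_j
    return sum(-0.5 * a_j * col @ Sigma_H_inv @ col
               for a_j, col in zip(alpha, A.T))
```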