{"title": "Blind Separation of Filtered Sources Using State-Space Approach", "book": "Advances in Neural Information Processing Systems", "page_first": 648, "page_last": 656, "abstract": null, "full_text": "Blind Separation of Filtered Sources \n\nUsing State-Space Approach \n\nLiqing Zhang\u00b7  and Andrzej  Cichockit \nLaboratory for  Open Information Systems, \n\nBrain Science Institute,  RIKEN \n\nSaitama 351-0198, Wako shi,  JAPAN \n\nEmail:  {zha.cia}@open.brain.riken.go.jp \n\nAbstract \n\nIn  this  paper  we  present  a  novel  approach  to  multichannel  blind \nseparation/generalized deconvolution,  assuming that  both mixing \nand demixing models are described by stable linear state-space sys(cid:173)\ntems.  We  decompose  the  blind  separation  problem  into  two  pro(cid:173)\ncess:  separation and state estimation.  Based on  the minimization \nof Kullback-Leibler  Divergence,  we  develop  a  novel  learning algo(cid:173)\nrithm to train the matrices in the output equation.  To estimate the \nstate  of the  demixing  model,  we  introduce  a  new  concept,  called \nhidden  innovation,  to  numerically  implement  the  Kalman  filter. \nComputer simulations are given  to show  the validity  and  high  ef(cid:173)\nfectiveness of the state-space approach. \n\n1 \n\nIntrod uction \n\nThe field  of blind separation and deconvolution has grown dramatically during re(cid:173)\ncent years due to its similarity to the separation feature in human brain, as well as its \nrapidly  growing applications in  various fields,  such  as telecommunication  systems, \nimage enhancement and biomedical signal processing.  The blind source separation \nproblem  is  to  recover  independent  sources from  sensor  outputs  without  assuming \nany priori knowledge of the original signals besides certain statistic features.  Refer \nto review papers [lJ  and [5J  for  the current state of theory and methods in the field. \nAlthough  there exist  a  number of models  and methods,  such  as  the infomax,  nat(cid:173)\nural gradient approach and equivariant adaptive algorithms, for  separating blindly \nindependent sources, there still are several challenges in generalizing mixture to dy-\n\n\u00b7On leave from  South China University of Technology,  China \ntan leave from  Warsaw  University of Technology,  Poland \n\n\fBlind Separation of Filtered Sources \n\n649 \n\nnamic  and  nonlinear systems,  as  well  as in  developing  more rigorous and  effective \nalgorithms with general convergence.[1-9],  [11-13] \nThe  state-space  description  of systems  is  a  new  model  for  blind  separation  and \ndeconvolution[9,12].  There are several reasons why we use linear state-space systems \nas  blind  deconvolution  models.  Although  transfer function  models  are  equivalent \nto  the  state-space  ones,  it  is  difficult  to  exploit  any  common  features  that  may \nbe  present  in  the  real  dynamic  systems.  The  main  advantage of the  state space \ndescription for  blind deconvolution is that it not only gives the internal description \nof a  system,  but there are various equivalent  types of state-space realizations for  a \nsystem,  such as  balanced realization and observable canonical forms.  In  particular \nit is known how to parameterize some specific classes of models which are of interest \nin applications.  Also  it  is  much  easy to tackle the stability problem of state-space \nsystems  using  the  Kalman  Filter.  
Moreover, the state-space model enables a much more general description than standard finite impulse response (FIR) convolutive filtering. All known filtering (dynamic) models, such as AR, MA, ARMA, ARMAX and Gamma filters, can be considered as special cases of the flexible state-space model.

2 Formulation of Problem

Assume that the source signals are stationary zero-mean i.i.d. processes and mutually statistically independent. Let s(t) = (s_1(t), ..., s_n(t)) be an unknown vector of independent i.i.d. sources. Suppose that the mixing model is described by a stable linear discrete-time state-space system

    x(k+1) = A x(k) + B s(k) + L e_P(k),    (1)
    u(k) = C x(k) + D s(k) + ξ(k),          (2)

where x ∈ R^r is the state vector of the system, s(k) ∈ R^n is the vector of source signals and u(k) ∈ R^m is the vector of sensor signals. A, B, C and D are the mixing matrices of the state-space model with consistent dimensions, e_P(k) is the process noise and ξ(k) is the sensor noise of the mixing system. If we ignore the noise terms in the mixing model, its transfer function matrix is described by the m x n matrix

    H(z) = C(zI - A)^{-1} B + D,            (3)

where z^{-1} is the delay operator.

We formulate the blind separation problem as the task of recovering the original signals from the observations u(t) without prior knowledge of the source signals or the state-space matrices [A, B, C, D], besides certain statistical features of the source signals. We propose that the demixing model is another linear state-space system, described as follows (see Fig. 1):

    x(k+1) = A x(k) + B u(k) + L e_R(k),    (4)
    y(k) = C x(k) + D u(k),                 (5)

where the input u(k) of the demixing model is the output (sensor signals) of the mixing model and e_R(k) is the reference model noise. A, B, C and D are the demixing matrices of consistent dimensions. In general, the matrices W = [A, B, C, D, L] are the parameters to be determined in the learning process.

Figure 1: General state-space model for blind deconvolution

For simplicity, we do not consider, at this moment, the noise terms in either the mixing or the demixing model. The transfer function of the demixing model is W(z) = C(zI - A)^{-1} B + D. The output y(k) is designed to recover the source signals in the following sense:

    y(k) = W(z)H(z)s(k) = P Λ(z) s(k),      (6)

where P is a permutation matrix and Λ(z) is a diagonal matrix with λ_i z^{-τ_i} in diagonal entry (i,i); here λ_i is a nonzero constant and τ_i is a nonnegative integer. It is easy to see that the linear state-space mixture is an extension of the instantaneous mixture. When the matrices A, B, C in both the mixing model and the demixing model are null matrices, the problem reduces to the standard ICA problem [1-8].

The question here is whether there exist matrices [A, B, C, D] in the demixing model (4) and (5) such that its transfer function W(z) satisfies (6). It is proven in [12] that if the matrix D in the mixing model is of full rank, rank(D) = n, then there exist matrices [A, B, C, D] such that the output signal y of the state-space system (4) and (5) recovers the independent source signals s in the sense of (6).
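As an illustration of the demixing system (4)-(5), the following minimal Python sketch (our own, not part of the paper; the function name, the NumPy representation and the omission of the reference noise e_R(k) are assumptions) runs the demixer forward over a sequence of sensor vectors:

    # Minimal sketch of the demixing state-space model (4)-(5), noise term omitted.
    import numpy as np

    def demix(u, A, B, C, D, x0=None):
        """Run y(k) = C x(k) + D u(k), x(k+1) = A x(k) + B u(k) over u of shape (T, m)."""
        r = A.shape[0]
        x = np.zeros(r) if x0 is None else x0.copy()
        ys = []
        for u_k in u:
            y_k = C @ x + D @ u_k        # output equation (5)
            x = A @ x + B @ u_k          # state update, eq. (4) without noise
            ys.append(y_k)
        return np.asarray(ys)

With A, B and C set to null matrices this reduces to y(k) = D u(k), consistent with the instantaneous ICA special case noted above.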
3 Learning Algorithm

Assume that p(y, W) and p_i(y_i, W) are the joint probability density function of y and the marginal pdf of y_i (i = 1, ..., n), respectively. We employ the mutual information of the output signals, which measures the mutual independence of the output signals y_i(k), as a risk function [1,2]:

    l(W) = -H(y, W) + Σ_{i=1}^{n} H(y_i, W),    (7)

where

    H(y, W) = -∫ p(y, W) log p(y, W) dy,  H(y_i, W) = -∫ p_i(y_i, W) log p_i(y_i, W) dy_i.

In this paper we do not directly develop learning algorithms to update all the parameters W = [A, B, C, D] of the demixing model. We separate the blind deconvolution problem into two procedures: separation and state estimation. In the separation procedure we develop a novel learning algorithm, using a new search direction, to update the matrices C and D in the output equation (5). Then we define a hidden innovation of the output and use the Kalman filter to estimate the state vector x(k).

For simplicity we suppose that the matrix D in the demixing model (5) is a nonsingular n x n matrix. From the risk function (7), we obtain a cost function for on-line learning:

    l(y, W) = -(1/2) log det(D^T D) - Σ_{i=1}^{n} log p_i(y_i, W),    (8)

where det(D^T D) is the determinant of the symmetric positive definite matrix D^T D. For the gradient of l with respect to W, we calculate the total differential dl of l(y, W) when we take a differential dW of W:

    dl(y, W) = l(y, W + dW) - l(y, W).    (9)

Following Amari's derivation for natural gradient methods [1-3], we have

    dl(y, W) = -tr(dD D^{-1}) + φ^T(y) dy,    (10)

where tr is the trace of a matrix and φ(y) is a vector of nonlinear activation functions

    φ_i(y_i) = -d log p_i(y_i)/dy_i = -p_i'(y_i)/p_i(y_i).    (11)

Taking the differential of the output equation (5), we have the approximation

    dy = dC x(k) + dD u(k).    (12)

On the other hand, from (5) we have

    u(k) = D^{-1}(y(k) - C x(k)).    (13)

Substituting (13) into (12), we obtain

    dy = (dC - dD D^{-1} C) x + dD D^{-1} y.    (14)

In order to improve the computational efficiency of the learning algorithm, we introduce a new search direction

    dX_1 = dC - dD D^{-1} C,    (15)
    dX_2 = dD D^{-1}.           (16)

Then the total differential dl can be expressed as

    dl = -tr(dX_2) + φ^T(y)(dX_1 x + dX_2 y).    (17)

It is easy to obtain the derivatives of the cost function l with respect to the matrices X_1 and X_2 as

    dl/dX_1 = φ(y(k)) x^T(k),        (18)
    dl/dX_2 = φ(y(k)) y^T(k) - I.    (19)

From (15) and (16), we derive a novel learning algorithm to update the matrices C and D:

    ΔC(k) = η (-φ(y(k)) x^T(k) + (I - φ(y(k)) y^T(k)) C(k)),    (20)
    ΔD(k) = η (I - φ(y(k)) y^T(k)) D(k).                         (21)

The equilibrium points of the learning algorithm satisfy the following equations:

    E[φ(y(k)) x^T(k)] = 0,        (22)
    E[I - φ(y(k)) y^T(k)] = 0.    (23)

This means that the separated signals y become as mutually independent as possible if the nonlinear activation functions φ(y) are suitably chosen and the state vector x(k) is well estimated. From (20) and (21), we see that the natural gradient learning algorithm [2] is recovered as a special case of this learning algorithm when the mixture is simplified to the instantaneous case.
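For concreteness, a minimal sketch of one on-line step of the updates (20)-(21) is given below. It is illustrative only; the function name, the default learning rate and the choice φ(y) = y^3 (the activation used later in the simulations) are assumptions rather than part of the derivation.

    # One on-line step of updates (20)-(21) for the output matrices C and D.
    import numpy as np

    def update_CD(C, D, x, y, eta=0.01, phi=lambda y: y**3):
        f = phi(y)
        I = np.eye(len(y))
        G = I - np.outer(f, y)                        # I - phi(y) y^T, shared by (20), (21)
        C_new = C + eta * (-np.outer(f, x) + G @ C)   # eq. (20)
        D_new = D + eta * G @ D                       # eq. (21)
        return C_new, D_new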
The learning algorithm derived above solves the blind separation problem under the assumption that the state matrices A and B are known or designed appropriately. In the next section, instead of adjusting the state matrices A and B directly, we propose new approaches to estimate the state vector x.

4 State Estimator

From the output equation (5) it is observed that if we can accurately estimate the state vector x(k) of the system, then we can separate the mixed signals using the learning algorithm (20) and (21).

4.1 Kalman Filter

The Kalman filter is a useful technique for estimating the state vector of a state-space model. Its function here is to generate, on line, an estimate of the state x(k). The Kalman filter dynamics are given as follows:

    x(k+1) = A x(k) + B u(k) + K r(k) + e_R(k),    (24)

where K is the Kalman filter gain matrix and r(k) is the innovation or residual vector, which measures the error between the measured (or expected) output y(k) and the predicted output C x(k) + D u(k). There is a variety of algorithms to update the Kalman filter gain matrix K as well as the state x(k); refer to [10] for more details.

However, in the blind deconvolution problem there exists no explicit residual r(k) with which to estimate the state vector x(k), because the expected output would be the unavailable source signals. In order to solve this problem, we present a new concept, called hidden innovation, to implement the Kalman filter in the blind deconvolution case. Since updating the matrices C and D produces an innovation at each learning step, we introduce a hidden innovation as follows:

    r(k) = Δy(k) = ΔC x(k) + ΔD u(k),    (25)

where ΔC = C(k+1) - C(k) and ΔD = D(k+1) - D(k). The hidden innovation represents the adjusting direction of the output of the demixing system and is used to generate an a posteriori state estimate. Once we define the hidden innovation, we can employ the commonly used Kalman filter to estimate the state vector x(k), as well as to update the Kalman gain matrix K. The updating rule used in this paper is described as follows:

(1) Compute the Kalman gain matrix

    K(k) = P(k) C(k)^T (C(k) P(k) C(k)^T + R(k))^{-1}.

(2) Update the state vector with the hidden innovation

    x(k) = x(k) + K(k) r(k).

(3) Update the error covariance matrix

    P(k) = (I - K(k) C(k)) P(k).

(4) Evaluate the state vector ahead

    x(k+1) = A(k) x(k) + B(k) u(k).

(5) Evaluate the error covariance matrix ahead

    P(k+1) = A(k) P(k) A(k)^T + Q(k).

The initial condition is P(0) = I, where Q(k) and R(k) are the covariance matrices of the noise vector e_R and of the output measurement noise n_k. The theoretical problems, such as convergence and stability, remain to be elaborated. Simulation experiments show that the proposed algorithm, based on the Kalman filter, separates the convolved signals well.
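A minimal sketch of one cycle of this updating rule, driven by the hidden innovation (25), might look as follows. This is our own illustration, not the authors' code; dC and dD stand for the most recent increments ΔC, ΔD produced by (20)-(21), and P, Q, R are the usual Kalman covariance matrices.

    # One hidden-innovation Kalman step, following steps (1)-(5) above.
    import numpy as np

    def kalman_step(x, P, u, A, B, C, D, dC, dD, Q, R):
        r = dC @ x + dD @ u                        # hidden innovation, eq. (25)
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)             # (1) Kalman gain
        x = x + K @ r                              # (2) a posteriori state estimate
        P = (np.eye(len(x)) - K @ C) @ P           # (3) updated error covariance
        x_next = A @ x + B @ u                     # (4) state prediction ("ahead")
        P_next = A @ P @ A.T + Q                   # (5) covariance prediction
        return x_next, P_next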
4.2 Information Back-propagation

Another way to estimate the state of the system is to propagate the mutual information backward. If we consider the cost function to be a function of the state vector x as well, then the partial derivative of l(y, W) with respect to x is

    ∂l(y, W)/∂x = C^T φ(y).    (26)

We then adjust the state vector x(k) according to the rule

    x(k) = x(k) - η C(k)^T φ(y(k)),    (27)

and the adjusted state vector is used as the new state of the system.

5 Numerical Implementation

Several numerical simulations have been carried out to demonstrate the validity and effectiveness of the proposed algorithm. Here we give a typical example.

Example 1. Consider the following MIMO mixing model:

    u(k) + Σ_{i=1}^{10} A_i u(k-i) = s(k) + Σ_{i=1}^{10} B_i s(k-i) + v(k),

where u, s, v ∈ R^3, the nonzero coefficients A_2, A_8, A_10, B_2, B_8 and B_10 are fixed 3 x 3 numerical matrices, and all other matrices are set to the null matrix. The sources s are chosen to be i.i.d. signals uniformly distributed in the range (-1,1), and v is Gaussian noise with zero mean and covariance matrix 0.1 I. We employ the state-space approach to separate the mixed signals. The nonlinear activation function is chosen as φ(y) = y^3. The initial values of the matrices A and B in the state equation are chosen in controller canonical form. The initial value of the matrix C is set to the null matrix or given randomly in the range (-1,1), and D = I_3. A large number of simulations show that the state-space method easily recovers the source signals in the sense of W(z)H(z) = PΛ(z). Figure 2 illustrates the coefficients of the global transfer function G(z) = W(z)H(z) after 3000 iterations, where the (i,j)-th sub-figure plots the coefficients of the transfer function G_ij(z) = Σ_{k=0}^{∞} g_ijk z^{-k} up to order 50.

Figure 2: The coefficients of the global transfer function after 3000 iterations
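To make the overall procedure concrete, the following self-contained toy sketch (our own construction, not the experiment above) mixes three uniform i.i.d. sources with a short two-tap FIR filter and trains a small state-space demixer with the updates (20)-(21), using the simpler information back-propagation rule (27) in place of the Kalman estimator. The mixing filter, dimensions, step size and random seed are arbitrary assumptions; it is meant only to show how equations (4), (5), (20), (21) and (27) fit together in one loop.

    # Toy end-to-end sketch: convolutive mixing, then state-space demixing.
    import numpy as np

    rng = np.random.default_rng(0)
    n, T, eta = 3, 20000, 0.002
    phi = lambda y: y ** 3                       # activation used in Example 1

    # Toy convolutive mixture: u(k) = H0 s(k) + H1 s(k-1)  (H0, H1 arbitrary)
    s = rng.uniform(-1.0, 1.0, size=(T, n))
    H0 = np.eye(n) + 0.5 * rng.uniform(-1.0, 1.0, size=(n, n))
    H1 = 0.3 * rng.uniform(-1.0, 1.0, size=(n, n))
    u = s @ H0.T
    u[1:] += s[:-1] @ H1.T

    # Demixing state-space model with fixed (assumed known) state matrices A, B
    r = n
    A = 0.5 * np.eye(r)                          # simple leaky-integrator state dynamics
    B = np.eye(r, n)
    C = np.zeros((n, r))
    D = np.eye(n)
    x = np.zeros(r)

    for k in range(T):
        y = C @ x + D @ u[k]                     # output equation (5)
        f = phi(y)
        G = np.eye(n) - np.outer(f, y)
        C += eta * (-np.outer(f, x) + G @ C)     # update (20)
        D += eta * G @ D                         # update (21)
        x = x - eta * C.T @ f                    # information back-propagation, eq. (27)
        x = A @ x + B @ u[k]                     # state propagation, eq. (4) without noise

How well this toy model separates depends on the step size and the chosen state dimension; it makes no claim about the performance reported in Example 1.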
References

[1] S. Amari and A. Cichocki, "Adaptive blind signal processing - neural network approaches", Proceedings of the IEEE, 86(10):2026-2048, 1998.

[2] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation", Advances in Neural Information Processing Systems 1995 (Boston, MA: MIT Press, 1996), pp. 752-763.

[3] S. Amari, "Natural gradient works efficiently in learning", Neural Computation, Vol. 10, pp. 251-276, 1998.

[4] A. J. Bell and T. J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution", Neural Computation, Vol. 7, pp. 1129-1159, 1995.

[5] J.-F. Cardoso, "Blind signal separation: statistical principles", Proceedings of the IEEE, 86(10):2009-2025, 1998.

[6] J.-F. Cardoso and B. Laheld, "Equivariant adaptive source separation", IEEE Trans. Signal Processing, vol. SP-43, pp. 3017-3029, Dec. 1996.

[7] A. Cichocki and R. Unbehauen, "Robust neural networks with on-line learning for blind identification and blind separation of sources", IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, vol. 43, no. 11, pp. 894-906, Nov. 1996.

[8] P. Comon, "Independent component analysis: a new concept?", Signal Processing, vol. 36, pp. 287-314, 1994.

[9] A. Gharbi and F. Salam, "Algorithm for blind signal separation and recovery in static and dynamic environments", IEEE Symposium on Circuits and Systems, Hong Kong, June 1997, pp. 713-716.

[10] O. L. R. Jacobs, "Introduction to Control Theory", Second Edition, Oxford University Press, 1993.

[11] T. W. Lee, A. J. Bell, and R. Lambert, "Blind separation of delayed and convolved sources", NIPS 9, MIT Press, Cambridge, MA, 1997, pp. 758-764.

[12] L.-Q. Zhang and A. Cichocki, "Blind deconvolution/equalization using state-space models", Proc. '98 IEEE Signal Processing Society Workshop on NNSP, Cambridge, 1998, pp. 123-131.

[13] S. Choi, A. Cichocki and S. Amari, "Blind equalization of SIMO channels via spatio-temporal anti-Hebbian learning rule", Proc. '98 IEEE Signal Processing Society Workshop on NNSP, Cambridge, 1998, pp. 93-102.
", "award": [], "sourceid": 1568, "authors": [{"given_name": "Liqing", "family_name": "Zhang", "institution": null}, {"given_name": "Andrzej", "family_name": "Cichocki", "institution": null}]}