{"title": "Multi Dimensional ICA to Separate Correlated Sources", "book": "Advances in Neural Information Processing Systems", "page_first": 993, "page_last": 1000, "abstract": null, "full_text": "Multi  Dimensional ICA to  Separate \n\nCorrelated  Sources \n\nRoland  Vollgraf,  Klaus  Obermayer \n\nDepartment of Electrical Engineering and  Computer Science \n\nTechnical  University of Berlin Germany \n\n{ vro, oby} @cs.tu-berlin.de \n\nAbstract \n\nWe present a new method for the blind separation of sources, which \ndo not fulfill the independence assumption.  In contrast to standard \nmethods  we  consider  groups  of  neighboring  samples  (\"patches\") \nwithin the observed mixtures. \nFirst  we  extract independent  features  from  the  observed  patches. \nIt turns out that the average dependencies  between these features \nin  different  sources  is  in  general  lower  than  the  dependencies  be(cid:173)\ntween  the  amplitudes  of different  sources.  We  show  that it  might \nbe  the  case  that  most  of  the  dependencies  is  carried  by  only  a \nsmall  number  of features. \nIs  this  case  - provided  these  features \ncan  be  identified  by  some  heuristic  - we  project  all  patches  into \nthe subspace  which  is  orthogonal to  the subspace  spanned by  the \n\"correlated\"  features. \nStandard ICA is then performed on the elements of the transformed \npatches  (for  which  the  independence  assumption  holds)  and  ro(cid:173)\nbustly yields a  good estimate of the mixing matrix. \n\n1 \n\nIntroduction \n\nICA  as  a  method for  blind source separation has been proven  very useful  in a  wide \nrange  of  statistical  data  analysis.  A  strong  criterion,  that  allows  to  detect  and \nseparate linearly  mixed  source signals from  the observed mixtures, is  the indepen(cid:173)\ndence of the source signals amplitude distribution.  Many contrast functions rely on \nthis  assumption,  e.g.  in  the way,  that they  estimate the  Kullback-Leibler  distance \nto  a  (non-Gaussian)  factorizing  multivariate distribution  [1 ,  2, 3].  Others consider \nhigher  order  moments  of the  source  estimates  [4,  5].  Naturally  these  algorithms \nfail  when the independence assumption does  not hold.  In such situations it can be \nvery useful  to consider temporal/spatial statistical properties of the  source  signals \nas well.  This has been done in form of suitable linear filtering  [6]  to achieve a sparse \nand independent  representation of the signals.  In  [7]  the author suggests to model \nthe  sources  as  a  stochastic  process  and  to  do  the  ICA  on  the  innovations  rather \nthan on the signals them self. \nIn this work we extend the ICA to multidimensional channels of neighboring realiza(cid:173)\ntions.  The used data model is explained in detail in the following section.  In section \n3  it  will  be  shown,  that there  are optimal  features,  that  carry lower  dependencies \n\n\fbetween  the  sources  and  can  be  used  for  source  separation.  A  heuristic  is  intro(cid:173)\nduced,  that  allows  to  discard  those features,  that carry most  of the  dependencies. \nThis leads to the Two-Step algorithm described in section 4.  Our method requires \n(i)  sources  which  exhibit  correlations  between  neighboring  pixels  (e.g.  continuous \nsources like images or sound signals), and (ii)  sources from which sparse and almost \nindependent features can be extracted.  
In section 5 we show separation results and benchmarks for linearly mixed passport photographs. The method is fast and provides good separation results even for sources whose correlation coefficient is as large as 0.9.

2 Sources and observations

Let us consider a set of N source signals S_i(r), i = 1, ..., N of length L, where r is a discrete sample index. The sample index could be of arbitrary dimension, but we assume that it belongs to some metric space so that neighborhood relations can be defined. The sample index might be a scalar for sources which are time series and a two-dimensional vector for sources which are images.¹ The sources are linearly combined by an unknown mixing matrix A of full rank to produce a set of N observations X_i(r),

    X_i(r) = \sum_{j=1}^{N} A_{ij} S_j(r) ,    (1)

and we assume that the mixing process is stationary, i.e. that the mixing matrix A is independent of r. In the following we refer to the vectors S(r) = (S_1(r), ..., S_N(r))^T and X(r) = (X_1(r), ..., X_N(r))^T as a source and an observation stack. The goal is to find an appropriate demixing matrix W which - when applied to the observations X(r) - recovers good estimates \hat{S}(r),

    \hat{S}(r) = W X(r) \approx S(r)    (2)

of the original source signals (up to a permutation and scaling of the sources). Since the mixing matrix A is not known, its inverse W has to be detected blindly, i.e. only properties of the sources which are detectable in the mixtures can be exploited. For a large class of ICA algorithms one assumes that the sources are non-Gaussian and independent, i.e. that the random vector S which is sampled by L realizations

    S : {S(r_l), l = 1, ..., L}    (3)

has a factorizing and non-Gaussian joint probability distribution.² In situations, however, where the independence assumption does not hold, it can be helpful to take into account spatial dependencies, which can be very prominent for natural signals and have been the subject of a number of blind source separation algorithms [8, 9, 6]. Let us now consider patches s_i(r),

    s(r) = ( s_1(r)^T, ..., s_N(r)^T )^T ,   s_i(r) = ( S_i(r + \Delta r_1), ..., S_i(r + \Delta r_M) ) ,    (4)

of M \ll L neighboring source samples, where \Delta r_1, ..., \Delta r_M enumerate the positions within a patch. s_i(r) could be a sequence of M adjacent samples of an audio signal or a rectangular patch of M pixels in an image. Instead of L realizations of a random N-vector S (cf. eq. (3)) we now obtain a little less than L realizations of a random N x M matrix s,

    s : {s(r)} .    (5)

Because of the stationarity of the mixing process we obtain

    x = A s   and   \hat{s} = W x ,    (6)

where x is an N x M matrix of neighboring observations and where the matrices A and W operate on every column vector of s and x.

¹ In the following we will mostly consider images, hence we will refer to the abovementioned neighborhood relations as spatial relations.

² In the following, symbols without sample index will refer to the random variable rather than to the particular realization.
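As an illustration of this data model, the following Python sketch generates observation stacks according to eq. (1) and extracts N x M patch matrices as in eqs. (4)-(6). It is not part of the method itself; numpy, the function names mix_sources and extract_patch, and the row-major ordering of patch pixels are assumptions made for this sketch.

import numpy as np

def mix_sources(S, A):
    # Eq. (1): X_i(r) = sum_j A_ij S_j(r).
    # S: sources stacked along axis 0, shape (N, height, width) for images.
    return np.einsum('ij,jhw->ihw', A, S)

def extract_patch(X, r, shape=(6, 6)):
    # Eqs. (4)/(6): stack the M = h*w neighboring samples at position r
    # of every channel into an N x M matrix.
    (i, j), (h, w) = r, shape
    return X[:, i:i + h, j:j + w].reshape(X.shape[0], -1)

# Usage: N = 8 placeholder sources, a random full-rank mixing matrix.
rng = np.random.default_rng(0)
S = rng.normal(size=(8, 128, 128))   # stand-in sources
A = rng.normal(size=(8, 8))          # mixing matrix, zero mean, unit variance
X = mix_sources(S, A)                # observation stack
x = extract_patch(X, (10, 20))       # one N x M observation patch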
3 Optimal spatial features

Let us now consider a set of sources which are not statistically independent, i.e. for which

    p(s_{1k}, ..., s_{Nk}) \neq \prod_{i=1}^{N} p(s_{ik})   for all k = 1, ..., M.    (7)

Our goal is to find in a first step a linear transformation Ω \in R^{M x M} which - when applied to every patch - yields transformed sources u = s Ω^T for which the independence assumption, p(u_{1k}, ..., u_{Nk}) = \prod_{i=1}^{N} p(u_{ik}), does hold for all k = 1, ..., M, at least approximately. When Ω is applied to the observations x, v = x Ω^T, we obtain a modified source separation problem

    v = x Ω^T = A s Ω^T = A u ,    (8)

where the demixing matrix W can be estimated from the transformed observations v in a second step using standard ICA. Eq. (7) is tantamount to positive trans-information of the source amplitudes,

    I(s_{1k}, ..., s_{Nk}) = D_{KL}( p(s_{1k}, ..., s_{Nk}) || \prod_{i=1}^{N} p(s_{ik}) ) > 0 ,    (9)

where D_{KL} is the Kullback-Leibler distance. As all elements of the patches are identically distributed, this quantity is the same for all k. Clearly, the dependencies that are carried by single elements of the patches are also present between whole patches, i.e. I(s_1, s_2, ..., s_N) > 0. However, since neighboring samples are correlated, it holds that

    I(s_1, s_2, ..., s_N) < \sum_{k=1}^{M} I(s_{1k}, s_{2k}, ..., s_{Nk}) .    (10)

Only if the sources were spatially white, so that s consisted of independent column vectors, would this hold with equality. When Ω is applied to the source patches, the trans-information between patches is not changed, provided Ω is a non-singular transformation. Neither information is introduced nor discarded by this transformation, and it holds that

    I(u_1, u_2, ..., u_N) = I(s_1, s_2, ..., s_N) .    (11)

For the optimal Ω the column vectors of u = s Ω^T shall now be independent. From (10) and (11) it follows that

    I(u_1, u_2, ..., u_N) = \sum_{k=1}^{M} I(u_{1k}, u_{2k}, ..., u_{Nk}) < \sum_{k=1}^{M} I(s_{1k}, s_{2k}, ..., s_{Nk}) .    (12)

The column vectors of u are in general not identically distributed anymore; however, the average trans-information has decreased to the level of information carried between the patches. In the experiments we shall see that this can be sufficiently small to reliably estimate the de-mixing matrix W.

So it remains to estimate a matrix Ω that provides a matrix u with independent columns. We approach this by estimating Ω so that it provides row vectors of u that have independent elements, i.e. p(u_i) = \prod_{k=1}^{M} p(u_{ik}) for all i. With that, and under the assumption that all sources may come from the same distribution and that there are no "cross dependencies" in u (i.e. u_{ik} is independent of u_{jl} for k \neq l), independence is guaranteed also for whole column vectors of u. Thus, standard ICA can be applied to patches of sources, which yields Ω as the de-mixing matrix. For real world applications, however, Ω has to be estimated from the observations, x Ω^T = v. The relation v = A u holds, i.e. A only mixes along the rows. So the column vectors of u are independent of each other if, and only if, the columns of v are independent.³ Thus, Ω can be computed from x as well.

According to eq. (12) the trans-information of the elements of columns of u has decreased on average, but not necessarily uniformly. One can expect some columns to have more independent elements than others. Thus, it may be advantageous to detect these columns, resp. the corresponding rows of Ω, and discard them prior to the second ICA step.

³ We assume non-Gaussian distributions for u and v.
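The trans-information terms of eqs. (9)-(12) can be estimated from samples; in section 5 a histogram estimator is used for the pairwise case. A minimal Python sketch of such a pairwise estimator is given below; the bin count and the function name mutual_information are choices made here, as the paper does not specify the estimator in detail.

import numpy as np

def mutual_information(a, b, bins=64):
    # Histogram estimate of I(a; b) in nats, i.e. the Kullback-Leibler
    # distance between the joint density and the product of the
    # marginals (cf. eq. (9) for N = 2).
    p_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab /= p_ab.sum()          # joint probabilities
    p_a = p_ab.sum(axis=1)      # marginal of a
    p_b = p_ab.sum(axis=0)      # marginal of b
    nz = p_ab > 0               # skip empty bins to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / np.outer(p_a, p_b)[nz])))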
Each source patch s_i can be considered as a linear combination of independent components, given by the columns of Ω^{-1}, where the elements of u_i are the coefficients. In the result of the ICA, the coefficients have normalized variance. Therefore, those components that have a large Euclidean norm occur as features with high entropy in the source patches. At the same time it is clear that, if there are features that are responsible for the source dependencies, these features have to be present with large entropy, otherwise the source dependencies would have been low. Accordingly, we propose a heuristic that discards the rows of Ω with the smallest Euclidean norm prior to the second ICA step. How many rows have to be discarded, and whether this type of heuristic is applicable at all, depends on the statistical nature of the sources. In section 5 we show that for the test data this heuristic is well applicable and almost all dependencies are contained in one feature.

4 The Two-Step algorithm

The considerations of the previous section give rise to a Two-Step algorithm. In the first step the transformation Ω has to be estimated. Standard ICA [1, 2, 5] is performed on M-dimensional patches, which are chosen with equal probability from all of the observed mixtures and at random positions. The patches may overlap each other, but they must not overlap the boundaries of the signals.

The resulting "demixing matrix" Ω is applied to the patches of observations, generating a matrix v(r) = x(r) Ω^T, the columns of which are candidates for the input of the second ICA. A number of M_D columns that belong to rows of Ω with small norm are discarded, as they very likely represent features that carry dependencies between the sources. M_D is chosen as a model parameter, or it can be determined empirically, given the data at hand (for instance by detecting a major jump in the increase of the row norms of Ω). For the remaining columns it is not obvious which one represents the most sparse and independent feature. So any of them, with equal probability, now serves as an input sample for the second ICA, which estimates the demixing matrix W.

When the number N of sources is large, the first ICA may fail to extract the independent source features because, according to the central limit theorem, the distribution of their coefficients in the mixtures may be close to a Gaussian distribution. In such a situation we recommend applying the abovementioned two steps repeatedly. The source estimates W x(r) are used as input for the first ICA to achieve a better Ω, which in turn allows a better estimate of W.
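The following Python sketch summarizes the Two-Step algorithm, with scikit-learn's FastICA standing in for the FastICA Matlab package used in the paper. The function name two_step_ica, the sample counts, and the pooling of all remaining columns of v into one sample set are one possible reading of the procedure described above, not the authors' implementation.

import numpy as np
from sklearn.decomposition import FastICA

def two_step_ica(X, patch=(6, 6), n_patches=100000, n_second=25000, M_D=1, seed=0):
    # X: observation stack of N images, shape (N, height, width).
    rng = np.random.default_rng(seed)
    N, height, width = X.shape
    h, w = patch
    M = h * w
    # Step 1: standard ICA on single-channel patches drawn with equal
    # probability from random positions of all mixtures -> Omega.
    ii = rng.integers(0, height - h + 1, n_patches)
    jj = rng.integers(0, width - w + 1, n_patches)
    cc = rng.integers(0, N, n_patches)
    P = np.stack([X[c, i:i + h, j:j + w].ravel() for c, i, j in zip(cc, ii, jj)])
    Omega = FastICA(n_components=M, random_state=seed).fit(P).components_
    # Heuristic of section 3: drop the M_D rows with the smallest norm.
    keep = np.argsort(np.linalg.norm(Omega, axis=1))[M_D:]
    Omega = Omega[keep]
    # Step 2: v = x Omega^T for N x M observation patches; every column
    # of v is one N-dimensional input sample for the second ICA.
    samples = []
    for i, j in zip(ii[:n_second], jj[:n_second]):
        x = X[:, i:i + h, j:j + w].reshape(N, M)
        samples.append((x @ Omega.T).T)
    W = FastICA(n_components=N, random_state=seed).fit(np.concatenate(samples)).components_
    return W, Omega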
Figure 1: Results of standard and multidimensional ICA performed on a set of 8 correlated passport images. Top row: source images; second row: linearly mixed sources; third row: separation results using kurtosis optimization (FastICA Matlab package); bottom row: separation results using multidimensional ICA (for explanation see text).

5 Numerical experiments

We applied our method to a linear mixture of 8 passport photographs, which are shown in Fig. 1, top row. The images were mixed (cf. Fig. 1, second row) using a matrix whose elements were chosen randomly from a normal distribution with mean zero and variance one. The mixing matrix had a condition number of 80. The correlation coefficients of the source images were between 0.4 and 0.9, so that standard ICA methods failed to recover the sources: Fig. 1, third row, shows the results of a kurtosis optimization using the FastICA Matlab package.⁴

Fig. 1, bottom row, shows the result of the Two-Step multidimensional ICA described in section 4. For better comparison, images were inverted manually to appear positive. In the first step Ω was estimated using FastICA on 10^5 patches, 6 x 6 pixels in size, which were taken with equal probability from random positions in all mixtures. The result of the first ICA is displayed in Fig. 2. The top row shows the row vectors of Ω sorted by the logarithm of their norm. The second row shows the features (the corresponding columns of Ω^{-1}) which are extracted by Ω. In the diagram below, the stars indicate the logarithm of the row norm, \log \sqrt{ \sum_{k=1}^{M} Ω_{lk}^2 }, and the squares indicate the mutual information I(u_{1k}, u_{7k}) between the k-th features in sources 1 and 7,⁵ calculated using a histogram estimator. It is quite prominent that (i) a small norm of a row vector corresponds to a strongly correlated feature, and (ii) there is only one feature which carries most of the dependencies between the sources. Thus, the first column of v was discarded. The second ICA was applied to any of the remaining components, chosen randomly and with equal probability. A comparison between Fig. 1, top and bottom rows, shows that all sources were successfully recovered.

Figure 2: Result of an ICA (kurtosis optimization) performed on patches of observations (cf. Fig. 1, second row), 6 x 6 pixels in size. Top row: row vectors of the demixing matrix Ω. Second row: corresponding column vectors of Ω^{-1}. Vectors are sorted by increasing norm of the row vectors; dark and bright pixels indicate positive and negative values. Bottom diagram: logarithm of the norm of the row vectors (stars) and mutual information I(u_{1k}, u_{7k}) (squares) between the coefficients of the corresponding features in the source images 1 and 7.

In the next experiment we examined the influence of selecting columns of v prior to the second ICA. In Fig. 3 we show the reconstruction error (cf. appendix A) that could be achieved with the second ICA when only a single column of v served as input. From the previous experiment we have seen that only the first component has considerable dependencies. As expected, only the first column yields a poor reconstruction error. Fig. 4 shows the reconstruction error vs. M_D when the M_D smallest-norm rows of Ω (resp. columns of v) are discarded. We see that for all values a good reconstruction is achieved (r_e < 0.6). Even if no row is discarded, the result is only slightly worse than for one or two discarded rows. The dependencies of the first component are "averaged out" by the vast majority of components that carry no dependencies in this case. The conspicuously large variance of the error for larger numbers M_D might be due to convergence instabilities or to close-to-Gaussian distributed columns of u. In either case, this suggests discarding as few components as possible.

⁴ http://www.cis.hut.fi/projects/ica/fastica/

⁵ Images no. 1 and 7 were chosen exemplarily as the two most strongly correlated sources.
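The choice of M_D amounts to ranking the rows of Ω by their Euclidean norm and deciding where to cut. The sketch below implements this ranking; cutting at the largest jump of the log row norms is one possible interpretation of the "major jump" criterion mentioned in section 4, and the function names are chosen here.

import numpy as np

def rank_rows_by_norm(Omega):
    # Row indices and norms of Omega, sorted by increasing Euclidean norm
    # (the ordering used on the horizontal axes of Figs. 2-4).
    norms = np.linalg.norm(Omega, axis=1)
    order = np.argsort(norms)
    return order, norms[order]

def suggest_M_D(Omega):
    # Heuristic cut: discard all rows below the largest jump of the
    # log row norms. (Fig. 2 suggests a single small-norm row, i.e.
    # M_D = 1, for the passport images.)
    _, norms = rank_rows_by_norm(Omega)
    jumps = np.diff(np.log(norms))
    return int(np.argmax(jumps)) + 1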
To evaluate the influence of the patch size M, the Two-Step algorithm was applied to 9 different mixtures of the sources shown in Fig. 1, top row, using patch sizes between M = 2 x 2 and M = 6 x 6. Table 1 shows the mean and standard deviation of the achieved reconstruction error. The mixing matrix A was randomly chosen from a normal distribution with mean zero and variance one. FastICA was used for both steps, where 5 \cdot 10^5 sample patches were used to extract the optimal features and 2.5 \cdot 10^4 samples were used to estimate W. The smallest-norm row of Ω was always discarded. The algorithm shows quite robust performance, and even for patch sizes of 2 x 2 pixels a fairly good separation result is achieved (note, for comparison, that the reconstruction error of the separation in Fig. 1, bottom row, was 0.2).

[Figures 3 and 4: bar plots of the second-step reconstruction error, over the component index ordered from large to small row norm (Fig. 3) and over M_D (Fig. 4).]

Figure 3: Every single row of Ω used to generate input for the second ICA. Only the first (smallest norm) row causes a bad reconstruction error in the second ICA step.

Figure 4: M_D rows with the smallest norm discarded. All values of M_D provide a good reconstruction error in the second step. Note the slightly worse result for M_D = 0!

    patch size M | \mu_{r_e} | \sigma_{r_e}
    2 x 2        | 0.4361    | 0.0383
    3 x 3        | 0.2322    | 0.0433
    4 x 4        | 0.1667    | 0.0263
    5 x 5        | 0.1408    | 0.0270
    6 x 6        | 0.1270    | 0.0460

Table 1: Separation result of the Two-Step algorithm performed on a set of 8 correlated passport images (cf. Fig. 1, top row). The table shows the average reconstruction error \mu_{r_e} and its standard deviation \sigma_{r_e} calculated from 9 different mixtures.

6 Summary and outlook

We extended the source separation model to multidimensional channels (image patches). There are two linear transformations to be considered, one operating inside the channels (Ω) and one operating between the different channels (W). The two transformations are estimated in two adjacent ICA steps. There are mainly two advantages to be gained from the first transformation: (i) by arranging independence among the columns of the transformed patches, the average trans-information between different channels is decreased; (ii) a suitable heuristic can be applied to discard those columns of the transformed patches that carry most of the dependencies between different channels. A heuristic that identifies the dependence-carrying components by a small norm of the corresponding rows of Ω has been introduced. It shows that for the image data only one component carries most of the dependencies. Due to this fact, the described method works well even when all components are taken into account.
In future work, we are going to establish a Maximum Likelihood model for both transformations. We expect a performance gain due to the mutual improvement of the estimates of W and Ω during the iterations. It remains to examine what the model has to be in case some rows of Ω are discarded. In this case the transformations do not preserve the dimensionality of the observation patches.

A Reconstruction error

The reconstruction error r_e is a measure for the success of a source separation. It compares the estimated de-mixing matrix W with the inverse of the original mixing matrix A with respect to the indeterminacies: scalings and permutations. It is always nonnegative and equals zero if, and only if, P = W A is a nonsingular permutation matrix. This is the case when for every row of P exactly one element is different from zero and the rows of P are orthogonal, i.e. P P^T is a diagonal matrix. The reconstruction error is the sum of measures for both aspects,

    r_e = 2 \sum_{i=1}^{N} \log \sum_{j=1}^{N} P_{ij}^2 - \sum_{i=1}^{N} \log \sum_{j=1}^{N} P_{ij}^4 + \sum_{i=1}^{N} \log \sum_{j=1}^{N} P_{ij}^2 - \log \det( P P^T )
        = 3 \sum_{i=1}^{N} \log \sum_{j=1}^{N} P_{ij}^2 - \sum_{i=1}^{N} \log \sum_{j=1}^{N} P_{ij}^4 - \log \det( P P^T ) .    (13)

The first two terms vanish if, and only if, every row of P contains exactly one nonzero element; the remaining two vanish if, and only if, the rows of P are orthogonal (by Hadamard's inequality).
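For reference, eq. (13) transcribes directly into numpy; the function name reconstruction_error is chosen here, and np.linalg.slogdet replaces an explicit determinant for numerical robustness.

import numpy as np

def reconstruction_error(W, A):
    # Eq. (13): zero iff P = W A is a nonsingular permutation matrix,
    # i.e. one nonzero element per row and mutually orthogonal rows.
    P = W @ A
    P2 = P ** 2
    row2 = np.log(P2.sum(axis=1))           # log sum_j P_ij^2
    row4 = np.log((P2 ** 2).sum(axis=1))    # log sum_j P_ij^4
    logdet = np.linalg.slogdet(P @ P.T)[1]  # log det(P P^T)
    return 3 * row2.sum() - row4.sum() - logdet

# Sanity check: a rescaled, row-permuted inverse of A gives r_e = 0.
A = np.random.default_rng(1).normal(size=(8, 8))
W = 2.0 * np.linalg.inv(A)[::-1]
assert abs(reconstruction_error(W, A)) < 1e-6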
Acknowledgment: This work was funded by the German Science Foundation (grants DFG SE 931/1-1 and DFG OB 102/3-1) and Wellcome Trust 061113/Z/00.

References

[1] Anthony J. Bell and Terrence J. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, no. 6, pp. 1129-1159, 1995.

[2] S. Amari, A. Cichocki, and H. H. Yang, "A new learning algorithm for blind signal separation," in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds., 1995, vol. 8.

[3] J. F. Cardoso, "Infomax and maximum likelihood for blind source separation," IEEE Signal Processing Lett., 1997.

[4] Jean-François Cardoso, Sandip Bose, and Benjamin Friedlander, "On optimal source separation based on second and fourth order cumulants," in Proc. IEEE Workshop on SSAP, Corfou, Greece, 1996.

[5] A. Hyvärinen and E. Oja, "A fast fixed point algorithm for independent component analysis," Neural Comput., vol. 9, pp. 1483-1492, 1997.

[6] M. Zibulevski and B. A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary," Neural Computation, vol. 12, no. 3, pp. 863-882, April 2001.

[7] A. Hyvärinen, "Independent component analysis for time-dependent stochastic processes," in Proc. Int. Conf. on Artificial Neural Networks (ICANN'98), 1998, pp. 541-546.

[8] L. Molgedey and H. G. Schuster, "Separation of a mixture of independent signals using time delayed correlations," Phys. Rev. Lett., vol. 72, pp. 3634-3637, 1994.

[9] H. Attias and C. E. Schreiner, "Blind source separation and deconvolution: The dynamic component analysis algorithm," Neural Comput., vol. 10, pp. 1373-1424, 1998.

[10] Anthony J. Bell and Terrence J. Sejnowski, "The 'independent components' of natural scenes are edge filters," Vision Res., vol. 37, pp. 3327-3338, 1997.
", "award": [], "sourceid": 2046, "authors": [{"given_name": "Roland", "family_name": "Vollgraf", "institution": null}, {"given_name": "Klaus", "family_name": "Obermayer", "institution": null}]}