{"title": "Connectionist Speaker Normalization with Generalized Resource Allocating Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 865, "page_last": 874, "abstract": null, "full_text": "Connectionist  Speaker Normalization \n\nwith Generalized \n\nResource Allocating Networks \n\nCesare  Furlanello \nIstituto per  La Ricerca \nScientifica e Tecnologica \n\nPovo  (Trento),  Italy \nfurlan\u00ablirst. it \n\nDiego Giuliani \n\nIstituto per La Ricerca \nScientifica e Tecnologica \n\nPovo  (Trento), Italy \ngiuliani\u00ablirst.it \n\nEdmondo Trentin \nIstituto per La Ricerca \nScientifica e Tecnologica \n\nPovo (Trento),  Italy \ntrentin\u00ablirst.it \n\nAbstract \n\nThe paper presents  a  rapid speaker-normalization technique based \non  neural  network  spectral  mapping.  The  neural  network  is  used \nas a front-end  of a  continuous speech  recognition system  (speaker(cid:173)\ndependent,  HMM-based) to normalize the input acoustic data from \na  new  speaker.  The  spectral  difference  between  speakers  can  be \nreduced  using  a  limited amount  of new  acoustic  data (40  phonet(cid:173)\nically  rich  sentences).  Recognition  error  of phone  units  from  the \nacoustic-phonetic  continuous  speech  corpus  APASCI  is  decreased \nwith an adaptability ratio of 25%.  We used  local basis networks of \nelliptical  Gaussian  kernels,  with  recursive  allocation  of units  and \non-line  optimization of parameters  (GRAN model).  For  this  ap(cid:173)\nplication,  the  model included  a  linear  term.  The results  compare \nfavorably  with  multivariate linear  mapping based  on  constrained \northonormal transformations. \n\n1 \n\nINTRODUCTION \n\nSpeaker  normalization methods are  designed  to minimize inter-speaker  variations, \none of the principal error sources in automatic speech recognition.  Training a speech \nrecognition  system  on  a  particular speaker  (speaker-dependent  or SD  mode)  gen(cid:173)\nerally  gives better performance than using  a  speaker-independent  system, which  is \n\n\f868 \n\nCesare  Furlanello.  Diego Giuliani.  Edmondo  Trentin \n\ntrained  to  recognize  speech  from  a  generic  user  by  averaging  over  individual  dif(cid:173)\nferences.  On the  other  hand,  performance may be dramatically worse  when  a  SD \nsystem \"tailored\" on the acoustic characteristics of a speaker (the reference speaker) \nis  used  by  another one  (the  new or  target speaker).  Training a  SD  system for  any \nnew  speaker  may  be  unfeasible:  collecting  a  large  amount  of new  training  data \nis  time consuming for  the  speaker  and  unacceptable  in  some  applications.  Given \na  pre-trained  SD  speech  recognition  system,  the  goal  of normalization methods is \nthen to reduce  to a few  sentences  the amount of training data required from a  new \nspeaker to achieve acceptable recognition performance.  The inter-speaker variation \nof the  acoustic  data is  reduced  by  estimating a  feature  vector  transformation be(cid:173)\ntween  the  acoustic  parameter space  of the  new  speaker  and  that  of the  reference \nspeaker  (Montacie  et  al.,  1989;  Class  et  al.,  1990;  Nakamura and  Shikano,  1990; \nHuang,  1992;  Matsukoto and Inoue,  1992).  This multivariate transformation,  also \ncalled  spectral  mapping  given  the  type  of features  considered  in  the  parameteri(cid:173)\nzation  of speech  data,  provides  an  acoustic  front-end  to  the  recognition  system. 
\nSupervised speaker normalization methods require that the text of the training utterances collected from the new speaker is known, while arbitrary utterances can be used by unsupervised methods (Furui and Sondhi, 1991). Good performance has been achieved with spectral mapping techniques based on MSE optimization (Class et al., 1990; Matsukoto and Inoue, 1992). Alternative approaches estimated the spectral normalization mapping with Multi-Layer Perceptron neural networks (Montacie et al., 1989; Nakamura and Shikano, 1990; Huang, 1992; Watrous, 1994). \n\nThis paper introduces a supervised speaker normalization method based on neural network regression with a generalized local basis model of elliptical kernels (Generalized Resource Allocating Network: GRAN model). Kernels are recursively allocated by introducing the heuristic procedure of (Platt, 1991) within the generalized RBF schema proposed in (Poggio and Girosi, 1989). The model includes a linear term, and efficient on-line optimization of parameters is achieved by an automatic differentiation technique. Our results compare favorably with normalization by affine linear transformations based on orthonormal constrained pseudoinverse. In this paper, the normalization module was integrated and tested as an acoustic front-end for speaker-dependent continuous speech recognition systems. Experiments concerned the recognition of phone units with Hidden Markov Model (HMM) recognition systems. \n\nThe diagram in Figure 1 outlines the general structure of the experiments with GRAN normalization modules. The architecture is independent of the specific speech recognition system and allows comparisons between different normalization techniques. The GRAN model and a general procedure for data standardization are described in Sections 2 and 3. After a discussion of the spectral mapping problem in Section 4, the APASCI corpus used in the experiments and the characteristics of the acoustic data are described in Section 5. The recognition system and the experiment set-up are detailed in Sections 6-8. Results are presented and discussed in Section 9. \n\nFigure 1: System overview. (Block diagram: the reference utterance of a phrase S from the database and the feature vectors extracted from the same phrase uttered by a new speaker are aligned by Dynamic Time Warping; the resulting pairs (X_i(t), Y_j(t)) drive supervised neural network training; at test time the trained GRAN normalization module transforms the new speaker's features before recognition.) \n\n2 THE GRAN MODEL \n\nFeedforward artificial neural networks can be regarded as a convenient realization of general functional superpositions in terms of simpler kernel functions (Barron and Barron, 1988). With one hidden layer we can implement a multivariate superposition $f(z) = \sum_{j=0}^{n} \alpha_j K_j(z, w_j)$, where $K_j$ is a function depending on an input vector $z$ and a parameter vector $w_j$; this general structure makes it possible to realize flexible models for multivariate regression. 
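\nAs a sketch of this superposition (under our own naming, with isotropic Gaussian kernels as one concrete choice of $K_j$; array shapes and parameter names are assumptions, not the paper's): \n

```python
# Sketch: evaluate f(z) = sum_j alpha_j K_j(z, w_j) with Gaussian kernels,
# taking each parameter vector w_j to be a center c_j and a bandwidth s_j.
import numpy as np

def kernel_superposition(z, alphas, centers, widths):
    # alphas: (n, d_out), centers: (n, d_in), widths: (n,), z: (d_in,)
    sq_dist = np.sum((centers - z) ** 2, axis=1)   # ||z - c_j||^2 for each j
    k = np.exp(-sq_dist / (2.0 * widths ** 2))     # K_j(z, w_j), shape (n,)
    return alphas.T @ k                            # (d_out,) estimated output
```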
We are interested in the schema $y = HK(x) + Ax + b$, with input vector $x \in R^{d_1}$ and estimated output vector $y \in R^{d_2}$. $K = (K_j)$ is an $n$-dimensional vector of local kernels, $H$ is the $d_2 \times n$ real matrix of kernel coefficients, $b \in R^{d_2}$ is an offset term and $A$ is a $d_2 \times d_1$ linear term. Implemented kernels are Gaussian, Hardy multiquadrics, inverse Hardy multiquadrics and Epanechnikov kernels, also in the Nadaraya-Watson normalized form (Härdle, 1990). The kernel allocation is based on a recursive procedure: if appropriate novelty conditions are satisfied for the example $(x', y')$, a new kernel $K_{n+1}$ is allocated and the new estimate $y_{n+1}$ becomes $y_{n+1}(x) = y_n(x) + K_{n+1}(\|x - x'\|_W)(y' - y_n(x))$ (Härdle, 1990). Global properties and rates of convergence for recursive kernel regression estimates are given in (Krzyzak, 1992). The heuristic mechanism suggested by (Platt, 1991) has been extended to include the optimization of the weighted metrics as required in the generalized versions of RBF networks of (Poggio and Girosi, 1989). Optimization regards kernel coefficients, locations and bandwidths, the offset term, the coefficient matrix $A$ if considered, and the $W$ matrix defining the weighted metrics in the input space: $\|x\|_W^2 = x^t W^t W x$. Automatic differentiation is used for an efficient on-line gradient-descent procedure w.r.t. different error functions (L2, L1, entropy fit), with different learning rates for each type of parameter. \n\nFigure 2: Commutative diagram for the speaker normalization problem. The spectral mapping $\varphi$ between original spaces $X$ and $Y$ is estimated by $\psi = \eta_Y^{-1} \circ \hat{\varphi} \circ \eta_X$, obtained by composition of the neural GRAN mapping $\hat{\varphi}$ between PCA spaces $\hat{X}$ and $\hat{Y}$ with the two invertible PCA transformations $\eta_X$ and $\eta_Y$. \n\n3 NETWORKS AND PCA TRANSFORMATIONS \n\nThe normalization module is designed to estimate a spectral mapping between the acoustic spaces of two different speakers. Inter-speaker variability is reflected by significant differences in data distribution in these multidimensional spaces (we considered 8 dimensions); in particular it is important to take into account global data anisotropy. More generally, it is also crucial to decorrelate the features describing the data. A general recipe is to apply the well-known Principal Component Analysis (PCA) to the data, in this case implemented with standard numerical routines based on Singular Value Decomposition of the data covariance matrices. The network was applied to perform a mapping between the new feature spaces obtained from the PCA transformations, mean translation included (Figure 2). 
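\nThe composition of Figure 2 can be sketched as follows (our illustration, assuming scikit-learn's PCA, which includes the mean translation; any fitted regressor `phi_hat` stands in for the GRAN mapping, and all names here are ours): \n

```python
# Sketch of psi = eta_Y^{-1} o phi_hat o eta_X from Figure 2: map a frame into
# the new speaker's PCA space, regress it to the reference PCA space, then
# invert the reference-side PCA transform. `phi_hat` is a placeholder.
import numpy as np
from sklearn.decomposition import PCA

def fit_pca_pair(X, Y, dim=8):
    # eta_X and eta_Y: decorrelating transforms, invertible up to truncation
    return PCA(n_components=dim).fit(X), PCA(n_components=dim).fit(Y)

def spectral_map(x, eta_x, phi_hat, eta_y):
    x_hat = eta_x.transform(x.reshape(1, -1))      # eta_X
    y_hat = phi_hat(x_hat)                         # neural mapping in PCA space
    return eta_y.inverse_transform(y_hat).ravel()  # eta_Y^{-1}
```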
4 THE SPECTRAL MAPPING PROBLEM \n\nA sound uttered by a speaker is generally described by a sequence of feature vectors obtained from the speech signal via short-time spectral analysis (Sec. 5). The spectral representations of the same sequence of sounds uttered by two speakers are subject to significant variations (e.g. differences between male and female speakers, regional accents, ...). To deal with acoustic differences, a suitable transformation (the spectral mapping) is sought which performs the \"best\" mapping between the corresponding spectra of two speakers. Let $Y = (y_1, y_2, \ldots, y_J)$ and $X = (x_1, x_2, \ldots, x_I)$ be the spectral feature vector sequences of the same sentence uttered by two speakers, called respectively the reference and the new speaker. The desired mapping is performed by a function $\varphi(x_i)$ such that the transformed vector sequence obtained from $X = (x_i)$ approximates as closely as possible the spectral vector sequence $Y = (y_j)$. To eliminate time differences between the two acoustic realizations, a time warping function has to be determined, yielding pairs $C(k) = (i(k), j(k))$, $k = 1, \ldots, K$, of corresponding indexes of feature vectors in $X$ and $Y$, respectively. The desired spectral mapping $\varphi(x_i)$ is the one which minimizes $\sum_{k=1}^{K} d(y_{j(k)}, \varphi(x_{i(k)}))$, where $d(\cdot, \cdot)$ is a distortion measure in the acoustic feature space. To estimate the transformation, a set of supervised pairs $(x_{i(k)}, y_{j(k)})$ is considered. In summary, the training material considered in the experiments consisted of a set of vector pairs obtained by applying the Dynamic Time Warping (DTW) algorithm (Sakoe and Chiba, 1978) to a set of phrases uttered by the reference and the new speaker. \n\n5 THE APASCI CORPUS \n\nThe experiments reported in this paper were performed on a portion of APASCI, an Italian acoustic-phonetic continuous speech corpus. For each utterance, text and phonetic transcriptions were automatically generated (Angelini et al., 1994). The corpus consists of two portions. The first part, for the training and validation of speaker independent recognition systems, consists of a training set (2140 utterances), a development set (900 utterances) and a test set (860 utterances). The sets contain, respectively, speech material from 100 speakers (50 males and 50 females), 36 speakers (18 males and 18 females) and 40 speakers (20 males and 20 females). The second portion of the corpus is for training and validation of speaker dependent recognition systems. It consists of speech material from 6 speakers (3 males and 3 females). Each speaker uttered 520 phrases, 400 for training and 120 for test. Speech material in the test set was acquired on different days than the training set. A subset of 40 utterances from the training material forms the adaptation training set, to be used for speaker adaptation/normalization purposes. For this application, each signal in the corpus was processed to obtain its parametric representation. The signal was preemphasized using a filter with transfer function $H(z) = 1 - 0.95 z^{-1}$, and a 20 ms Hamming window was then applied every 10 ms. 
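\nA sketch of this preprocessing step follows (the 16 kHz sampling rate is an assumption, not restated in the paper; the subsequent Mel filter-bank and cepstrum computation are omitted): \n

```python
# Sketch of the described front-end: preemphasis H(z) = 1 - 0.95 z^{-1},
# followed by 20 ms Hamming windows taken every 10 ms.
import numpy as np

def preemphasize(x, a=0.95):
    return np.append(x[0], x[1:] - a * x[:-1])  # y[t] = x[t] - 0.95 x[t-1]

def windowed_frames(x, fs=16000, win_ms=20.0, hop_ms=10.0):
    win, hop = int(fs * win_ms / 1000), int(fs * hop_ms / 1000)
    n = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n)])
    return frames * np.hamming(win)  # one Hamming-windowed frame per row
```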
For each frame, the normalized log-energy as well as 8 Mel Scaled Cepstral Coefficients (MSCC) based on a 24-channel filter-bank were computed. Normalization of log-energy was performed by subtracting the maximum log-energy value in the sentence; for each Mel coefficient, normalization was performed by subtracting the mean value over the whole utterance. For both the MSCC and the log-energy, the first order derivatives as well as the second order derivatives were computed. For each frame, all the computed acoustic parameters were combined in a single feature vector with 27 components. \n\n6 THE RECOGNITION SYSTEM \n\nFor each of the 6 speakers, a SD HMM recognition system was trained with the 400 utterances available in the APASCI corpus; the systems were bootstrapped with gender dependent models trained on the gender dependent speech material (1000 utterances for males and 1140 utterances for females). A set of 38 context independent acoustic-phonetic units was considered. Left-to-right HMMs with three and four states were adopted for short (i.e. p, t, k, b, d, g) and long (e.g. a, i, u, o, e) sounds respectively. Silence, pause and breath were modeled with a single state ergodic model. The output distribution probabilities were modeled with mixtures of 16 Gaussian probability densities with diagonal covariance matrices. Transitions leaving the same state shared the same output distribution probabilities. \n\nTable 1: Phone Recognition Rate (Unit Accuracy %) without normalization. \n\n7 TRAINING THE NORMALIZATION MODULES \n\nA set of 40 phrases was considered for each (new, reference) pair of speakers to train the normalization modules. In order to take into account alternative pronunciations, insertion or deletion of phonemes, pauses between words and other phenomena, the automatic phonetic transcription and segmentation available in APASCI was used for each utterance. Given two utterances corresponding to the same phrase, we considered only their segments having the same phonetic transcription. To determine these segments the DTW algorithm was applied to the phonetic transcriptions of the two utterances. The DTW algorithm was then applied a second time to the obtained segments, and the resulting optimal alignment paths gave the desired set of vector pairs. The DTW algorithm was applied only to the 8 MSCC; the other acoustic parameters were left unmodified. \n\nWe trained networks with 8 inputs and 8 outputs. The model included a linear term: first the linear term was fit to the data, and then the rest of the expansion was estimated by fitting the residuals of the linear regression. The networks grew up to 50 elliptical Gaussian kernels using dynamic allocation. Kernel coefficients, locations and bandwidths were optimized using different learning rates for 10 epochs w.r.t. the L1 norm, which proved to be more efficient than the usual L2 norm. 
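\nThe alignment step can be sketched as follows (our simplified implementation, with a symmetric step pattern and a Euclidean local distance; the paper's exact DTW constraints follow Sakoe and Chiba, 1978): \n

```python
# Sketch: DTW over two MSCC sequences X (I x 8) and Y (J x 8); returns the
# index pairs (i(k), j(k)) along the optimal path, which serve as supervised
# training pairs for the normalization network.
import numpy as np

def dtw_pairs(X, Y):
    I, J = len(X), len(Y)
    D = np.full((I + 1, J + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])  # local distortion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, (i, j) = [], (I, J)
    while i > 0 and j > 0:                              # backtrack best path
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)], key=lambda t: D[t])
    return path[::-1]
```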
8 THE RECOGNITION EXPERIMENTS \n\nExperiments concerned continuous phone recognition without any lexical or phonetic constraint (no phone statistics were used). For every (new, reference) pair of speakers in the database, a recognition experiment was performed using 90 (of the 120 available) test utterances from the new speaker with the SD recognition system previously trained for the reference speaker. On average the test sets consisted of 4770 phone units. The experiments were repeated transforming the test data with different normalization modules, and performance was compared. Results are expressed in terms of insertions (Ins), deletions (Del) and substitutions (Sub) of phone units made by the recognizer. The Unit Accuracy (UA) and Percent Correct (PC) performance indicators are respectively defined w.r.t. the total number of units $n_{units}$ as $UA = 100 (1 - (Ins + Del + Sub)/n_{units})$ and $PC = 100 (1 - (Del + Sub)/n_{units})$. In Table 1 the baseline performance for the 6 speaker dependent systems is reported. Row labels indicate the reference speaker model while column labels identify whose target acoustic data are used. Thus UA and PC entries on the main diagonal are for the same speaker who trained the system, while the remaining entries relate to performance obtained with new speakers. We also considered the adaptability ratios for $a = UA$ and $p = PC$ (Montacie et al., 1989): $\rho_a = (a_{RT}^n - a_{RT})/(a_{RR} - a_{RT})$ and $\rho_p = (p_{RT}^n - p_{RT})/(p_{RR} - p_{RT})$, where $a_{RT}$ indicates accuracy for reference speaker $R$ and target $T$ without normalization, $a_{RR}$ is the speaker dependent baseline accuracy and the superscript $n$ indicates normalization. The same notation applies to the percent correct adaptability ratio $\rho_p$. 
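\nFor concreteness, these measures can be computed as below; the example numbers are invented placeholders, not results from the tables. \n

```python
# Worked sketch of UA, PC and the adaptability ratio rho defined above.
def unit_accuracy(ins, dels, subs, n_units):
    return 100.0 * (1.0 - (ins + dels + subs) / n_units)

def percent_correct(dels, subs, n_units):
    return 100.0 * (1.0 - (dels + subs) / n_units)

def adaptability_ratio(a_rt_norm, a_rt, a_rr):
    # (a_RT^n - a_RT) / (a_RR - a_RT): 0 = no gain, 1 = SD accuracy recovered
    return (a_rt_norm - a_rt) / (a_rr - a_rt)

# Invented example: cross-speaker UA 45%, normalized UA 55%, SD baseline UA
# 85% -> adaptability_ratio(55.0, 45.0, 85.0) = (55-45)/(85-45) = 0.25
```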
\n\n9 RESULTS AND CONCLUSIONS \n\nNormalization experiments have been performed with the set-up described in the previous section. The phone recognition rates obtained with normalization modules based on the GRAN model are reported in Table 2 in terms of Unit Accuracy (see Table 1 for the baseline performance). \n\nTable 2: Phone Recognition Rate (Unit Accuracy %) with NN normalization. \n\nIn Table 3 the performance of the GRAN model (NN) and of the constrained orthonormal linear mapping (LIN) are compared with the baseline performance (SD: no adaptation) in terms of both Unit Accuracy and Percent Correct. The network shows an improvement, as evidenced by the variation in the $\rho_a$ and $\rho_p$ values. Results are reported averaging performance over all the (new, reference) pairs of speakers (Total column), and considering pairs of speakers of the same gender and of different genders (Female: only female subjects, Male: only males, Diff: different genders). An analysis of the adaptability ratios shows that the effect of the network normalization is higher than that of the linear mapping for all 3 subgroups of pairs: $\rho_a^{NN} = 0.20$ vs $\rho_a^{LIN} = 0.16$ for the female pairs and $\rho_a^{NN} = 0.16$ vs $\rho_a^{LIN} = 0.15$ for the male pairs. The improvement is higher ($\rho_a^{NN} = 0.28$, $\rho_a^{LIN} = 0.24$) for speakers of different genders. Although these preliminary experiments show only a minor improvement in performance achieved by the network with respect to linear mappings, we expect that the selectivity of the network could be exploited using acoustic contexts and code dependent neural networks. \n\nTable 3: Phone Recognition Rate (%) in terms of Unit Accuracy, Percent Correct, and adaptability ratio $\rho$. \n\nAcknowledgements \n\nThis work has been developed within a grant of the \"Programma Nazionale di Ricerca per la Bioelettronica\" assigned by the Italian Ministry of University and Technological Research to Elsag Bailey. The authors would like to thank B. Angelini, F. Brugnara, B. Caprile, R. De Mori, D. Falavigna, G. Lazzari and P. Svaizer. \n\nReferences \n\nAngelini, B., Brugnara, F., Falavigna, D., Giuliani, D., Gretter, R., and Omologo, M. (September 1994). Speaker Independent Continuous Speech Recognition Using an Acoustic-Phonetic Italian Corpus. In Proc. of ICSLP, pages 1391-1394. \nBarron, A. R. and Barron, R. L. (1988). Statistical learning networks: a unifying view. In Symp. on the Interface: Statistics and Computing Science, Reston, VA. \nClass, F., Kaltenmeier, A., Regel, P., and Troller, K. (1990). Fast speaker adaptation for speech recognition systems. In Proc. of ICASSP 90, pages 1-133-136. \nFurui, S. and Sondhi, M. M., editors (1991). Advances in Speech Signal Processing. Marcel Dekker, Inc. \nHärdle, W. (1990). Applied Nonparametric Regression, volume 19 of Econometric Society Monographs. Cambridge University Press, New York. \nHuang, X. D. (1992). Speaker normalization for speech recognition. In Proc. of ICASSP 92, pages 1-465-468. \nKrzyzak, A. (1992). Global convergence of the recursive kernel regression estimates with applications in classification and nonlinear system estimation. IEEE Transactions on Information Theory, 38(4):1323-1338. \nMatsukoto, H. and Inoue, H. (1992). A piecewise linear spectral mapping for supervised speaker adaptation. In Proc. of ICASSP 92, pages 1-449-452. \nMontacie, C., Choukri, K., and Chollet, G. (1989). Speech recognition using temporal decomposition and multi-layer feed-forward automata. In Proc. of ICASSP 89, pages 1-409-412. \nNakamura, S. and Shikano, K. (1990). A comparative study of spectral mapping for speaker adaptation. In Proc. of ICASSP 90, pages 1-157-160. \nPlatt, J. (1991). A resource-allocating network for function interpolation. Neural Computation, 3(2):213-225. \nPoggio, T. and Girosi, F. (1989). A theory of networks for approximation and learning. A.I. Memo No. 1140, MIT. \nSakoe, H. and Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43-49. \nWatrous, R. (1994). Speaker normalization and adaptation using second-order connectionist networks. IEEE Transactions on Neural Networks, 4(1):21-30. \n", "award": [], "sourceid": 1016, "authors": [{"given_name": "Cesare", "family_name": "Furlanello", "institution": null}, {"given_name": "Diego", "family_name": "Giuliani", "institution": null}, {"given_name": "Edmondo", "family_name": "Trentin", "institution": null}]}