{"title": "On the Use of Projection Pursuit Constraints for Training Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 3, "page_last": 10, "abstract": null, "full_text": "On the Use of Projection Pursuit Constraints for Training Neural Networks\n\nNathan Intrator*\n\nComputer Science Department\n\nTel-Aviv University\n\nRamat-Aviv, 69978 ISRAEL\n\nand\n\nInstitute for Brain and Neural Systems,\n\nBrown University\n\nnin@math.tau.ac.il\n\nAbstract\n\nWe present a novel classification and regression method that combines exploratory projection pursuit (unsupervised training) with projection pursuit regression (supervised training), to yield a new family of cost/complexity penalty terms. Some improved generalization properties are demonstrated on real world problems.\n\n1 Introduction\n\nParameter estimation becomes difficult in high-dimensional spaces due to the increasing sparseness of the data. Therefore, when a low dimensional representation is embedded in the data, dimensionality reduction methods become useful. One such method, projection pursuit regression (PPR) (Friedman and Stuetzle, 1981), is capable of performing dimensionality reduction by composition, namely, it constructs an approximation to the desired response function using a composition of lower dimensional smooth functions. These functions depend on low dimensional projections through the data.\n\n* Research was supported by the National Science Foundation, the Army Research Office, and the Office of Naval Research. 
\n\nWhen the dimensionality of the problem is in the thousands, even projection pursuit methods are almost always over-parametrized; therefore, additional smoothing is needed for low variance estimation. Exploratory Projection Pursuit (EPP) (Friedman and Tukey, 1974; Friedman, 1987) may be useful for that. It searches in a high dimensional space for structure in the form of (semi) linear projections with constraints characterized by a projection index. The projection index may be considered as a universal prior for a large class of problems, or may be tailored to a specific problem based on prior knowledge.\n\nIn this paper, the general form of exploratory projection pursuit is formulated to be an additional constraint for projection pursuit regression. In particular, a hybrid combination of supervised and unsupervised artificial neural network (ANN) is described as a special case. In addition, a specific projection index that is particularly useful for classification (Intrator, 1990; Intrator and Cooper, 1992) is introduced in this context. A more detailed discussion appears in Intrator (1993).\n\n2 Brief Description of Projection Pursuit Regression\n\nLet $(X, Y)$ be a pair of random variables, $X \\in R^d$, and $Y \\in R$. The problem is to approximate the $d$ dimensional surface\n\n$$f(x) = E[Y \\mid X = x]$$\n\nfrom $n$ observations $(X_1, Y_1), \\ldots, (X_n, Y_n)$.\n\nPPR tries to approximate a function $f$ by a sum of ridge functions (functions that are constant along lines)\n\n$$f(x) \\approx \\sum_{j=1}^{l} g_j(a_j^T x).$$\n\nThe fitting procedure alternates between an estimation of a direction $a$ and an estimation of a smooth function $g$, such that at 
iteration $j$, the average of the squared residuals\n\n$$r_{ij}(x_i) = r_{i,j-1}(x_i) - g_j(a_j^T x_i)$$\n\nis minimized. This process is initialized by setting $r_{i0} = y_i$. Usually, the initial values of $a_j$ are taken to be the first few principal components of the data.\n\nEstimation of the ridge functions can be achieved by various nonparametric smoothing techniques such as locally linear functions (Friedman and Stuetzle, 1981), k-nearest neighbors (Hall, 1989b), splines or variable degree polynomials. The smoothness constraint imposed on $g$ implies that the actual projection pursuit is achieved by minimizing, at iteration $j$, the sum\n\n$$\\sum_{i=1}^{n} r_{ij}^2(x_i) + C(g_j)$$\n\nfor some smoothness measure $C$.\n\nAlthough PPR converges to the desired response function (Jones, 1987), the use of non-parametric function estimation is likely to lead to overfitting. Recent results (Hornik, 1991) suggest that a feed forward network architecture with a single hidden layer and a rather general fixed activation function is a universal approximator. Therefore, the use of a non-parametric single ridge function estimation can be avoided. It is thus appropriate to concentrate on the estimation of good projections. In the next section we present a general framework of PPR architecture, and in section 4 we restrict it to a feed-forward architecture with sigmoidal hidden units. 
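As an illustration of the stagewise fitting loop described in Section 2, the following is a minimal sketch and not the procedure of Friedman and Stuetzle (1981). It assumes a fixed-degree polynomial as the smoother g_j, a random starting direction instead of a principal component, and a crude random-perturbation search over directions in which the smooth is refit for every candidate. The names `fit_poly`, `stage_rss`, `unit` and `ppr_fit`, the toy data, and all parameter values are hypothetical.

```python
import math
import random


def fit_poly(t, r, degree):
    # Least-squares polynomial fit of residuals r on projections t,
    # solved through the normal equations with Gaussian elimination.
    m = degree + 1
    A = [[sum(ti ** (p + q) for ti in t) for q in range(m)] for p in range(m)]
    b = [sum(ri * ti ** p for ti, ri in zip(t, r)) for p in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for k in range(c + 1, m):
            f = A[k][c] / A[c][c]
            for q in range(c, m):
                A[k][q] -= f * A[c][q]
            b[k] -= f * b[c]
    coef = [0.0] * m
    for c in reversed(range(m)):
        s = b[c] - sum(A[c][q] * coef[q] for q in range(c + 1, m))
        coef[c] = s / A[c][c]
    return coef


def stage_rss(a, X, r, degree):
    # Fit the ridge function for direction a, then form the new residuals
    # r_ij = r_i,j-1 - g_j(a_j . x_i) and their sum of squares.
    t = [sum(ai * xi for ai, xi in zip(a, x)) for x in X]
    coef = fit_poly(t, r, degree)
    res = [ri - sum(c * ti ** p for p, c in enumerate(coef))
           for ti, ri in zip(t, r)]
    return sum(e * e for e in res), coef, res


def unit(a):
    # Rescale a direction to unit length.
    n = math.sqrt(sum(v * v for v in a))
    return [v / n for v in a]


def ppr_fit(X, y, n_terms=2, degree=3, n_tries=200, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    r = list(y)  # initialization r_i0 = y_i
    terms, history = [], [sum(v * v for v in r)]
    for _ in range(n_terms):
        # Assumption: random start, not the principal-component start.
        a = unit([rng.gauss(0, 1) for _ in range(d)])
        best, coef, res = stage_rss(a, X, r, degree)
        for _ in range(n_tries):
            # Perturb the current direction; keep it only if the fit improves.
            cand = unit([ai + 0.3 * rng.gauss(0, 1) for ai in a])
            val, c2, res2 = stage_rss(cand, X, r, degree)
            if val < best:
                a, best, coef, res = cand, val, c2, res2
        terms.append((a, coef))
        r = res
        history.append(best)
    return terms, history


# Toy data: the response depends on x only through one projection.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [((x[0] + x[1]) / math.sqrt(2)) ** 2 for x in X]
terms, history = ppr_fit(X, y)
print('residual sum of squares per stage:', history)
```

Because every polynomial fit can reproduce the zero function, the residual sum of squares recorded in `history` cannot increase from one stage to the next. The sketch omits the smoothness measure C(g_j); the polynomial degree plays that role only implicitly.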
\n\n3 Estimating The Projections Using Exploratory Projection Pursuit\n\nExploratory projection pursuit is based on seeking interesting projections of high dimensional data points (Kruskal, 1969; Switzer, 1970; Kruskal, 1972; Friedman and Tukey, 1974; Friedman, 1987; Jones and Sibson, 1987; Hall, 1988; Huber, 1985, for review). The notion of interesting projections is motivated by an observation that for most high-dimensional data clouds, most low-dimensional projections are approximately normal (Diaconis and Freedman, 1984). This finding suggests that the important information in the data is conveyed in those directions whose single dimensional projected distribution is far from Gaussian. Various projection indices (measures for the goodness of a projection) differ on the assumptions about the nature of deviation from normality, and in their computational efficiency. They can be considered as different priors motivated by specific assumptions on the underlying model.\n\nTo partially decouple the search for a projection vector from the search for a nonparametric ridge function, we propose to add a penalty term, which is based on a projection index, to the energy minimization associated with the estimation of the ridge functions and the projections. Specifically, let $\\rho(a)$ be a projection index which is minimized for projections with a certain deviation from normality. At the $j$'th iteration, we minimize the sum\n\n$$\\sum_i r_{ij}^2(x_i) + C(g_j) + \\rho(a_j).$$\n\nWhen a concurrent minimization over several projections/functions is practical, we get a penalty term of the form\n\n$$B(f) = \\sum_j [C(g_j) + \\rho(a_j)].$$ 
\n\nSince $C$ and $\\rho$ may not be linear, the more general measure that does not assume a stepwise approach, but instead seeks $l$ projections and ridge functions concurrently, is given by\n\n$$B(f) = C(g_1, \\ldots, g_l) + \\rho(a_1, \\ldots, a_l).$$\n\nIn practice, $\\rho$ depends implicitly on the training data (the empirical density) and is therefore replaced by its empirical measure $\\hat{\\rho}$.\n\n3.1 Some Possible Measures\n\nSome applicable projection indices are discussed in (Huber, 1985; Jones and Sibson, 1987; Friedman, 1987; Hall, 1989a; Intrator, 1990). Probably, all the possible measures should emphasize some form of deviation from normality, but the specific type may depend on the problem at hand. For example, a measure based on the Karhunen-Lo\u00e8ve expansion (Mougeot et al., 1991) may be useful for image compression with autoassociative networks, since in this case one is interested in minimizing the L2 norm of the distance between the reconstructed image and the original one, and under mild conditions, the Karhunen-Lo\u00e8ve expansion gives the optimal solution.\n\nA different type of prior knowledge is required for classification problems. The underlying assumption then is that the data is clustered (when projecting in the right directions) and that the classification may be achieved by some (nonlinear) mapping of these clusters. In such a case, the projection index should emphasize multi-modality as a specific deviation from normality. 
A projection index that emphasizes multimodalities in the projected distribution (without relying on the class labels) has recently been introduced (Intrator, 1990) and implemented efficiently using a variant of a biologically motivated unsupervised network (Intrator and Cooper, 1992). Its integration into a back-propagation classifier will be discussed below.\n\n3.2 Adding EPP constraints to back-propagation network\n\nOne way of adding some prior knowledge into the architecture is by minimizing the effective number of parameters using weight sharing, in which a single weight is shared among many connections in the network (Waibel et al., 1989; Le Cun et al., 1989). An extension of this idea is the \"soft weight sharing\" which favors irregularities in the weight distribution in the form of multimodality (Nowlan and Hinton, 1992). This penalty improved on the generalization results obtained with a weight elimination penalty. Both these methods make an explicit assumption about the structure of the weight space, but with no regard to the structure of the input space.\n\nAs described in the context of projection pursuit regression, a penalty term may be added to the energy functional minimized by error back propagation, for the purpose of measuring directly the goodness of the projections sought by the network. Since our main interest is in reducing overfitting for high dimensional problems, our underlying assumption is that the surface function to be estimated can be faithfully represented using a low dimensional composition of sigmoidal functions, namely, using a back-propagation network in which the number of hidden units is much smaller than the number of input units. 
Therefore, the penalty term may be added only to the hidden layer. The synaptic modification equations of the hidden units' weights become\n\n$$\\frac{\\partial w_{ij}}{\\partial t} = -\\epsilon \\left[ \\frac{\\partial \\mathcal{E}(w, x)}{\\partial w_{ij}} + \\frac{\\partial \\rho(w_1, \\ldots, w_n)}{\\partial w_{ij}} + (\\text{Contribution of cost/complexity terms}) \\right].$$\n\nAn approach of this type has been used in image compression, with a penalty aimed at minimizing the entropy of the projected distribution (Bichsel and Seitz, 1989). This penalty certainly measures deviation from normality, since for a given variance entropy is maximized by a Gaussian distribution.\n\n4 Projection Index for Classification: The Unsupervised BCM Neuron\n\nIntrator (1990) has recently shown that a variant of the Bienenstock, Cooper and Munro neuron (Bienenstock et al., 1982) performs exploratory projection pursuit using a projection index that measures multi-modality. This neuron version allows theoretical analysis of some visual deprivation experiments (Intrator and Cooper, 1992), and is in agreement with the vast experimental results on visual cortical plasticity (Clothiaux et al., 1991). A network implementation which can find several projections in parallel while retaining its computational efficiency was found to be applicable for extracting features from very high dimensional vector spaces (Intrator and Gold, 1993; Intrator et al., 1991; Intrator, 1992).\n\nThe activity of neuron $k$ in the network is $c_k = \\sum_i x_i w_{ik} + w_{0k}$. The inhibited activity and threshold of the $k$'th neuron are given by\n\n$$\\tilde{c}_k = \\sigma\\Big(c_k - \\eta \\sum_{j \\neq k} c_j\\Big), \\qquad \\tilde{\\Theta}_m^k = E[\\tilde{c}_k^2].$$\n\nThe threshold $\\tilde{\\Theta}_m^k$ is the point at 
which the modification function $\\phi$ changes sign (see Intrator and Cooper, 1992 for further details). The function $\\phi$ is given by\n\n$$\\phi(c, \\Theta_m) = c(c - \\Theta_m).$$\n\nThe risk (projection index) for a single neuron is given by\n\n$$R(w_k) = -\\left\\{ \\frac{1}{3} E[\\tilde{c}_k^3] - \\frac{1}{4} E^2[\\tilde{c}_k^2] \\right\\}.$$\n\nThe total risk is the sum of each local risk. The negative gradient of the risk that leads to the synaptic modification equations is given by\n\n$$-\\frac{\\partial R}{\\partial w_{ij}} = E\\Big[ \\phi(\\tilde{c}_j, \\tilde{\\Theta}_m^j) \\sigma'(\\tilde{c}_j) x_i - \\eta \\sum_{k \\neq j} \\phi(\\tilde{c}_k, \\tilde{\\Theta}_m^k) \\sigma'(\\tilde{c}_k) x_i \\Big].$$\n\nThis last equation is an additional penalty to the energy minimization of the supervised network. Note that there is an interaction between adjacent neurons in the hidden layer. In practice, the stochastic version of the differential equation can be used as the learning rule.\n\n5 Applications\n\nWe have applied this hybrid classification method to various speech and image recognition problems in high dimensional space. In one speech application we used voiceless stop consonants extracted from the TIMIT database as training tokens (Intrator and Tajchman, 1991). A detailed biologically motivated speech representation was produced by Lyon's cochlear model (Lyon, 1982; Slaney, 1988). This representation produced 5040 dimensions (84 channels x 60 time slices). In addition to an initial voiceless stop, each token contained a final vowel from the set [aa, ao, er, iy]. 
Classification of the voiceless stop consonants using a test set that included 7 vowels [uh, ih, eh, ae, ah, uw, ow] produced an average error of 18.8%, while on the same task classification using a back-propagation network produced an average error of 20.9% (a significant difference, P < .0013). Additional experiments on vowel tokens appear in Tajchman and Intrator (1992).\n\nAnother application is in the area of face recognition from gray level pixels (Intrator et al., 1992). After aligning and normalizing the images, the input was set to 37 x 62 pixels (total of 2294 dimensions). The recognition performance was tested on a subset of the MIT Media Lab database of face images made available by Turk and Pentland (1991), which contained 27 face images of each of 16 different persons. The images were taken under varying illumination and camera location. Of the 27 images available, 17 randomly chosen ones served for training and the remaining 10 were used for testing. Using an ensemble average of hybrid networks (Lincoln and Skrzypek, 1990; Pearlmutter and Rosenfeld, 1991; Perrone and Cooper, 1992) we obtained an error rate of 0.62% as opposed to 1.2% using a similar ensemble of back-prop networks. A single back-prop network achieves an error between 2.5% and 6% on this data. The experiments were done using 8 hidden units.\n\n6 Summary\n\nA penalty that allows the incorporation of additional prior information on the underlying model was presented. This prior was introduced in the context of projection pursuit regression, classification, and in the context of back-propagation network. 
It achieves partial decoupling of estimation of the ridge functions (in PPR) or the regression function in back-propagation net from the estimation of the projections. Thus it is potentially useful in reducing problems associated with overfitting which are more pronounced in high dimensional data.\n\nSome possible projection indices were discussed and a specific projection index that is particularly useful for classification was presented in this context. This measure, which emphasizes multi-modality in the projected distribution, was found useful in several very high dimensional problems.\n\n6.1 Acknowledgments\n\nI wish to thank Leon Cooper, Stu Geman and Michael Perrone for many fruitful conversations and the referee for helpful comments. The speech experiments were performed using the computational facilities of the Cognitive Science Department at Brown University. Research was supported by the National Science Foundation, the Army Research Office, and the Office of Naval Research.\n\nReferences\n\nBichsel, M. and Seitz, P. (1989). Minimum class entropy: A maximum information approach to layered networks. Neural Networks, 2:133-141.\n\nBienenstock, E. L., Cooper, L. N., and Munro, P. W. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2:32-48.\n\nClothiaux, E. E., Cooper, L. N., and Bear, M. F. (1991). Synaptic plasticity in visual cortex: Comparison of theory with experiment. Journal of Neurophysiology, 66:1785-1804.\n\nDiaconis, P. and Freedman, D. (1984). Asymptotics of graphical projection pursuit. 
Annals of Statistics, 12:793-815.\n\nFriedman, J. H. (1987). Exploratory projection pursuit. Journal of the American Statistical Association, 82:249-266.\n\nFriedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76:817-823.\n\nFriedman, J. H. and Tukey, J. W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C(23):881-889.\n\nHall, P. (1988). Estimating the direction in which a data set is most interesting. Probab. Theory Rel. Fields, 80:51-78.\n\nHall, P. (1989a). On polynomial-based projection indices for exploratory projection pursuit. The Annals of Statistics, 17:589-605.\n\nHall, P. (1989b). On projection pursuit regression. The Annals of Statistics, 17:573-588.\n\nHornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4:251-257.\n\nHuber, P. J. (1985). Projection pursuit (with discussion). The Annals of Statistics, 13:435-475.\n\nIntrator, N. (1990). Feature extraction using an unsupervised neural network. In Touretzky, D. S., Elman, J. L., Sejnowski, T. J., and Hinton, G. E., editors, Proceedings of the 1990 Connectionist Models Summer School, pages 310-318. Morgan Kaufmann, San Mateo, CA.\n\nIntrator, N. (1992). Feature extraction using an unsupervised neural network. Neural Computation, 4:98-107.\n\nIntrator, N. (1993). Combining exploratory projection pursuit and projection pursuit regression with application to neural networks. Neural Computation. In press.\n\nIntrator, N. and Cooper, L. N. (1992). 
Objective function formulation of the BCM theory of visual cortical plasticity: Statistical connections, stability conditions. Neural Networks, 5:3-17.\n\nIntrator, N. and Gold, J. I. (1993). Three-dimensional object recognition of gray level images: The usefulness of distinguishing features. Neural Computation. In press.\n\nIntrator, N., Gold, J. I., B\u00fclthoff, H. H., and Edelman, S. (1991). Three-dimensional object recognition using an unsupervised neural network: Understanding the distinguishing features. In Feldman, Y. and Bruckstein, A., editors, Proceedings of the 8th Israeli Conference on AICV, pages 113-123. Elsevier.\n\nIntrator, N., Reisfeld, D., and Yeshurun, Y. (1992). Face recognition using a hybrid supervised/unsupervised neural network. Preprint.\n\nIntrator, N. and Tajchman, G. (1991). Supervised and unsupervised feature extraction from a cochlear model for speech recognition. In Juang, B. H., Kung, S. Y., and Kamm, C. A., editors, Neural Networks for Signal Processing - Proceedings of the 1991 IEEE Workshop, pages 460-469. IEEE Press, New York, NY.\n\nJones, L. (1987). On a conjecture of Huber concerning the convergence of projection pursuit regression. Annals of Statistics, 15:880-882.\n\nJones, M. C. and Sibson, R. (1987). What is projection pursuit? (with discussion). J. Roy. Statist. Soc., Ser. A(150):1-36.\n\nKruskal, J. B. (1969). Toward a practical method which helps uncover the structure of the set of multivariate observations by finding the linear transformation which optimizes a new 'index of condensation'. In Milton, R. C. and Nelder, J. A., editors, Statistical Computation, pages 427-440. Academic Press, New York.\n\nKruskal, J. B. (1972). 
Linear transformation of multivariate data to reveal clustering. In Shepard, R. N., Romney, A. K., and Nerlove, S. B., editors, Multidimensional Scaling: Theory and Application in the Behavioral Sciences, I, Theory, pages 179-191. Seminar Press, New York and London.\n\nLe Cun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., and Jackel, L. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1:541-551.\n\nLincoln, W. P. and Skrzypek, J. (1990). Synergy of clustering multiple back-propagation networks. In Touretzky, D. S. and Lippmann, R. P., editors, Advances in Neural Information Processing Systems, volume 2, pages 650-657. Morgan Kaufmann, San Mateo, CA.\n\nLyon, R. F. (1982). A computational model of filtering, detection, and compression in the cochlea. In Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing. Paris, France.\n\nMougeot, M., Azencott, R., and Angeniol, B. (1991). Image compression with back propagation: Improvement of the visual restoration using different cost functions. Neural Networks, 4:467-476.\n\nNowlan, S. J. and Hinton, G. E. (1992). Simplifying neural networks by soft weight-sharing. Neural Computation. In press.\n\nPearlmutter, B. A. and Rosenfeld, R. (1991). Chaitin-Kolmogorov complexity and generalization in neural networks. In Lippmann, R. P., Moody, J. E., and Touretzky, D. S., editors, Advances in Neural Information Processing Systems, volume 3, pages 925-931. Morgan Kaufmann, San Mateo, CA.\n\nPerrone, M. P. and Cooper, L. N. (1992). When networks disagree: Generalized ensemble method for neural networks. In Mammone, R. J. and Zeevi, Y., editors, Neural 
Networks: Theory and Applications, volume 2. Academic Press.\n\nSlaney, M. (1988). Lyon's cochlear model. Technical report, Apple Corporate Library, Cupertino, CA 95014.\n\nSwitzer, P. (1970). Numerical classification. In Barnett, V., editor, Geostatistics. Plenum Press, New York.\n\nTajchman, G. N. and Intrator, N. (1992). Phonetic classification of TIMIT segments preprocessed with Lyon's cochlear model using a supervised/unsupervised hybrid neural network. In Proceedings International Conference on Spoken Language Processing, Banff, Alberta, Canada.\n\nTurk, M. and Pentland, A. (1991). Eigenfaces for recognition. J. of Cognitive Neuroscience, 3:71-86.\n\nWaibel, A., Hanazawa, T., Hinton, G., Shikano, K., and Lang, K. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on ASSP, 37:328-339.", "award": [], "sourceid": 646, "authors": [{"given_name": "Nathan", "family_name": "Intrator", "institution": null}]}