{"title": "Computer Recognition of Wave Location in Graphical Data by a Neural Network", "book": "Advances in Neural Information Processing Systems", "page_first": 706, "page_last": 713, "abstract": null, "full_text": "Computer Recognition of Wave Location \nin Graphical Data by a Neural Network \n\nDonald T. Freeman \nSchool of Medicine \nUniversity of Pittsburgh \nPittsburgh, PA 15261 \n\nAbstract \n\nFive experiments were performed using several neural network architectures to \nidentify the location of a wave in the time ordered graphical results from a \nmedical test. Baseline results from the first experiment found correct \nidentification of the target wave in 85% of cases (n=20). Other experiments \ninvestigated the effect of different architectures and preprocessing the raw data on \nthe results. The methods used seem most appropriate for time oriented graphical \ndata which has a clear starting point such as electrophoresis or spectrometry \nrather than continuous tests such as ECGs and EEGs. \n\n1 INTRODUCTION \n\nComplex wave form recognition is generally considered to be a difficult task for \nmachines. Analytical approaches to this problem have been described and they work with \nreasonable accuracy (Gabriel et al. 1980, Valdes-Sosa et al. 1987). The use of these \ntechniques, however, requires substantial mathematical training and the process is often \ntime consuming and labor intensive (Boston 1987). Mathematical modeling also requires \nsubstantial knowledge of the particular details of the wave forms in order to determine \nhow to apply the models and to determine detection criteria. Rule-based expert systems \nhave also been used for the recognition of wave forms (Boston 1989). They require that a \nknowledge engineer work closely with a domain expert to extract the rules that the expert \nuses to perform the recognition. 
If the rules are ad hoc or if it is difficult for experts to \narticulate the rules they use, then rule-based expert systems are cumbersome to \nimplement. \nThis paper describes the use of neural networks to recognize the location of peak V from \nthe wave-form recording of brain stem auditory evoked potential tests. General \ndiscussions of connectionist networks can be found in (Rumelhart and McClelland 1986). \nThe main features of neural networks that are relevant for our purposes revolve around \ntheir ease of use as compared to other modeling techniques. Neural networks provide \nseveral advantages over modeling with differential equations or rule-based systems. First, \nthere is no knowledge engineering phase. The network is trained automatically using a \nseries of examples along with the \"right answer\" to each example. Second, the resulting \nnetwork typically has significant predictive power when novel examples are presented. \nSo, neural network technology allows expert performance to be mimicked without \nrequiring that expert knowledge be codified in a traditional fashion. In addition, neural \nnetworks, when used to perform signal analysis, require vastly less restrictive \nassumptions about the structure of the input signal than analytical techniques (Gorman \nand Sejnowski 1988). Still, neural nets have not yet been widely applied to problems of \nthis sort (DeRoach 1989). Nevertheless, it seems that interest is growing in using \ncomputers, especially neural networks, to solve advanced problems in medical decision \nmaking (Stubbs 1988). 
\n\n1.1 BRAIN STEM AUDITORY EVOKED POTENTIAL (BAEP) \n\nSensory evoked potentials are electric signals from the brain that occur in response to \ntransient auditory, somatosensory, or visual stimuli such as a click, pinprick, or flash of \nlight. The signals, recorded from electrodes placed on a subject's scalp, are a measure of \nthe electrical activity in the subject's brain both from response to the stimulus and from \nthe spontaneous electroencephalographic (EEG) activity of the brain. One way of \ndiscerning the response to the stimulus from the background EEG noise is to average the \nindividual responses from many identical stimuli. When \"cortical noise\" has been \nremoved in this way, evoked potentials can be an important noninvasive measure of \ncentral nervous system function. They are used in studies of physiology and psychology \nand for the diagnosis of neurologic disorders (Greenberg et al. 1981). Recently attention has \nfocused on continuous automated monitoring of the BAEP intraoperatively as well as \npost-operatively for evaluation of central nervous system function (Moulton et al. 1991). \nBrain stem auditory evoked potentials (BAEP) are generated in the auditory pathways of \nthe brain stem. They can be used to assess hearing and brain stem function even in \nunresponsive or uncooperative patients. \nThe BAEP test involves placing headphones on the patient, flooding one ear with white \nnoise, and delivering clicks into the other ear. Electrodes on the scalp both on the same \nside (ipsilateral) and opposite side (contralateral) of the clicks record the electric potentials \nof brain activity for 10 msec. following each click. In the protocol used at the University \nof Pittsburgh Presbyterian University Hospital (PUH), a series of 2000 clicks is delivered \nand the results from each click - a graph of electrode activity over the 10 msec. 
- are \naveraged into a single graph. Results from the stimulation of one ear with the clicks are \nreferred to as \"one ear of data\". \nA graph of the wave form which results from the averaging of many stimuli appears as a \nseries of peaks following the stimulus (Figure 1). The resulting graph typically has 7 \nimportant peaks but often includes other peaks resulting from the noise which remains \nafter averaging. Each important peak represents the firing of a group of neurons in the \nauditory neural pathway[1]. The time of arrival of the peaks (the peak latencies) and the \namplitudes of the peaks are used to characterize the response. The latencies of peaks I, III, \nand V are typically used to determine if there is evidence of slowed central nervous system \nconduction which is of value in the diagnosis of multiple sclerosis and other disease \nstates[2]. Conduction delay may be seen in the left, right, or both BAEP pathways. It is \nof interest that the time of arrival of a wave on the ipsilateral and contralateral sides may \nbe slightly different. This effect becomes more exaggerated the more distant the correlated \npeaks are from the origin (Durrant, Boston, and Martin 1990). \nTypically there are several issues in the interpretation of the graphs. First, it must be \nclear that some neural response to the auditory stimulus is represented in the wave form. \nIf a response is present, the peaks which correspond to normal and abnormal responses \nmust be distinguished from noise which remains in the signal even after averaging. Wave \nIV and wave V occasionally fuse, forming a wave IV/V complex, confounding this \n\n[1] Putative generators are: I-Acoustic nerve; II-Cochlear nucleus; III-Superior olivary \nnucleus; IV-Lateral lemniscus; V-Inferior colliculus; VI-Medial geniculate nucleus; \nVII-Auditory radiations. \n\n[2] Other disorders include brain edema, acoustic neuroma, 
gliomas, and central pontine \nmyelinolysis. \n\nprocess. In these cases we say that wave V is absent. Finally, the latencies and possibly \nthe amplitudes of the identified peaks are measured and a diagnostic explanation for \nthem is developed. \n\nFigure 1. BAEP chart with the time of arrival for waves I to V identified. \n\n2 METHODS AND PROCEDURES \n\n2.1 DATA \n\nPlots of BAEP tests were obtained from the evoked potential files from the last 4 years at \nPUH. A preliminary group of training cases consisting of 13 patients or 26 ears was \nselected by traversing the files alphabetically from the beginning of the alphabet. This \ngroup was subsequently extended to 25 patients or 50 ears, 39 normals and 11 abnormals. \nMost BAEP tests show no abnormalities: only 1 of the first 40 ears was abnormal. In \norder to create a training set with an adequate number of abnormal cases we included only \npatients with abnormal ears after these first 40 had been selected. Ten abnormal ears were \nobtained from a search of 60 patient files. Test cases were selected from files starting at \nthe end of the alphabet, moving toward the beginning, the opposite of the process used \nfor the training cases. Unlike the training set - where some cases were selected over \nothers - all cases were included in the test set without bias. No cases were common to \nboth sets. A total of 10 patients or 20 ears were selected. 
Table I summarizes the input \ndata. \nFor one of the experiments, another data set was made using the ipsilateral data for 80 \ninputs and the derivative of the curve for the other 80 inputs. The derivative was \ncomputed by subtracting the amplitude of the point's successor from the amplitude of the \npoint and dividing by 0.1. \nThe ipsilateral and contralateral wave recordings were transformed to machine readable \nformat by manual tracing with a BitPad Plus digitizer. A formal protocol was followed \nto ensure that a high fidelity transcription had been effected. The approximately 400 \npoints which resulted from the digitization of each ear were graphed and compared to the \noriginal tracings. If the tracings did not match, then the transcription was performed \nagain. In addition, the originally recorded latency values for peak V were corrected for any \ndistortion in the digitizing process. The distortion was judged by a neurologist to be \nminimal. \n\nTable I: Composition of Input Data \n\nCases     Normal Ears  Abnormal Ears                 Total Ears \n                       Prolonged V  Absent V  Total \nTraining  39           8            3         11     50 \nTesting   18           0            2         2      20 \n\nA program was written to process the digital wave forms, creating an output file readable \nby the neural network simulator. The program discarded the first and last 1 msec. of the \nrecordings. The remaining points were sampled at 0.1 msec. intervals using linear \ninterpolation to estimate an amplitude if a point had not been recorded within 0.01 msec. \nof the desired time. These points were then normalized to the range [-1, 1]. The \nresulting 80 points for the ipsilateral wave and 80 points for the contralateral wave (a \ntotal of 160 points) were used as the initial activations for the input layer of processing \nelements. 
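
The resampling and normalization step described above can be sketched roughly as follows. This is only an illustration and not the program used in the study: the function name resample_and_normalize is hypothetical, the sketch interpolates at every grid time (the text interpolates only when no recorded point lies within 0.01 msec.), and per-trace min-max scaling is an assumed way of reaching the range [-1, 1].

```python
def resample_and_normalize(times_ms, amplitudes, start=1.0, step=0.1, n_points=80):
    '''Resample a digitized trace onto a fixed grid and scale it to [-1, 1].

    times_ms must be strictly increasing and should cover the whole grid.
    With the defaults the grid runs from 1.0 to 8.9 ms, which drops the
    first and last 1 msec. of a 10 msec. recording.
    '''
    grid = [round(start + i * step, 10) for i in range(n_points)]
    resampled = []
    j = 0
    for t in grid:
        # Advance to the recorded segment that contains the grid time t.
        while j < len(times_ms) - 2 and times_ms[j + 1] < t:
            j += 1
        t0, t1 = times_ms[j], times_ms[j + 1]
        a0, a1 = amplitudes[j], amplitudes[j + 1]
        # Linear interpolation between the two neighbouring recorded points.
        resampled.append(a0 + (a1 - a0) * (t - t0) / (t1 - t0))
    lo, hi = min(resampled), max(resampled)
    if hi == lo:
        # A flat trace has no range to scale; map it to zero (assumption).
        return grid, [0.0] * n_points
    # Assumed min-max scaling of each trace to the range [-1, 1].
    return grid, [2.0 * (a - lo) / (hi - lo) - 1.0 for a in resampled]
```

Applying the function once to the ipsilateral trace and once to the contralateral trace, then concatenating the two results, would give the 160 input activations.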
\n\n2.2 ARCHITECTURES \n\nEach of the four network architectures had 160 input nodes. Each node represented the \namplitude of the wave at each sample time (1.0 to 8.9 ms, every 0.1 ms). Each \narchitecture also had 80 output nodes with a similar temporal interpretation (Figure 2). \nArchitecture 1 (A1) had 30 hidden units connected only to the ipsilateral input units, 5 \nhidden units connected only to the contralateral input units and 5 hidden units connected \nto all the input units. The hidden units for all architectures were fully connected to the \noutput units. Architecture 2 (A2) reversed these proportions. Architecture 3 (A3) was \nfully connected to the inputs. Architecture 4 (A4) preserved the proportions of A1 but \nhad 16 ipsilateral hidden units, 3 contralateral, and 3 connected to both. All architectures \nused the sigmoid transfer function at both the hidden and output layers and all units were \nattached to a bias unit. \nThe distribution of the hidden units was chosen with the knowledge that human experts \nusually use information from the ipsilateral side but refer to the contralateral side only \nwhen features in the ipsilateral side are too obscure to resolve. The selection of the \nnumber of hidden units in neural network models remains an art. In order to determine \nwhether the size of the hidden unit layer could be changed, we repeated the experiments \nusing Architecture 2 where the number of hidden units was reduced to 16, with 10 \nconnected to the ipsilateral inputs, 3 to the contralateral inputs, and 3 connected to all the \ninputs. 
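
One way to picture the partial connectivity of these architectures is as a 0/1 mask over the input-to-hidden weights. The sketch below is only illustrative: the helper hidden_input_mask is hypothetical, and placing the 80 ipsilateral inputs before the 80 contralateral inputs is an assumption. The networks in this work were built in a commercial simulator, not with this code.

```python
def hidden_input_mask(n_ipsi_only, n_contra_only, n_both, n_per_side=80):
    '''Return one 0/1 row per hidden unit marking which inputs it may see.

    Assumed input order: ipsilateral samples first, contralateral second.
    '''
    ipsi_row = [1] * n_per_side + [0] * n_per_side
    contra_row = [0] * n_per_side + [1] * n_per_side
    both_row = [1] * (2 * n_per_side)
    return ([list(ipsi_row) for _ in range(n_ipsi_only)]
            + [list(contra_row) for _ in range(n_contra_only)]
            + [list(both_row) for _ in range(n_both)])


# Architecture A1: 30 ipsilateral-only, 5 contralateral-only, 5 seeing all inputs.
a1_mask = hidden_input_mask(30, 5, 5)
# Number of input-to-hidden connections, not counting bias connections.
n_connections = sum(sum(row) for row in a1_mask)
```

Multiplying a full weight matrix elementwise by such a mask, before each forward pass and after each weight update, keeps the excluded connections at zero.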
\n\n2.3 TRAINING \n\nFor training, target values for the output layer were all 0.0 except for the output nodes \nrepresenting the time of arrival for wave V (reported on the BAEP chart) and one node on \neach side of it. The peak node target was 0.95 and the two adjacent nodes had targets of \n0.90. For cases in which wave V was absent, the target for all the output nodes was 0.0. \n\nA neural network simulator (NeuralWorks Professional II version 3.5) was used to \nconstruct the networks and run the simulations. The back-propagation learning algorithm \nwas used to train the networks. The random number generator was initialized with \nrandom number seeds taken from a random number table. Then network weights were \ninitialized to random values between -0.2 and 0.2 and the training begun. Since our \nrandom number generator is deterministic - given the random number seed - these trials \nare replicable. \n\n[Figure 2 labels: output; hidden; input (ipsilateral, contralateral)] \n\nFigure 2. Diagram of Architecture 1 with representation of input and output data shown. \n\nEach of the 50 ears of data in the training set was presented using a randomize, shuffle, \nand deal technique. Network weights were saved at various stages of learning, usually \nafter every 1000 presentations (20 epochs) until the cumulative RMS error for an epoch \nfell below 0.01. The contribution of each training example to the total error was \nexamined to determine whether a few examples were the source of most of the error. If \nso, training was continued until these examples had been learned to an error level \ncomparable to the rest of the cases. After training, the 20 ears in the test set were \npresented to each of the saved networks and the output nodes of the net were examined for \neach test case. 
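
The target encoding given at the start of this section can be sketched as below. The helper make_target is hypothetical, and mapping a latency to an output node by rounding (latency - 1.0) / 0.1 is an assumption based on the 1.0 to 8.9 ms, 0.1 ms output grid described in Section 2.2.

```python
def make_target(latency_ms=None, start=1.0, step=0.1, n_out=80):
    '''Build the 80-element training target for one ear.

    latency_ms is the wave V latency read from the BAEP chart, or None
    when wave V is absent (the target is then all zeros).
    '''
    target = [0.0] * n_out
    if latency_ms is None:
        return target
    # Assumed mapping from latency to the nearest output node index.
    peak = int(round((latency_ms - start) / step))
    if not 0 <= peak < n_out:
        raise ValueError('latency outside the output window')
    target[peak] = 0.95
    # The node on each side of the peak gets the slightly lower target.
    for k in (peak - 1, peak + 1):
        if 0 <= k < n_out:
            target[k] = 0.90
    return target
```

For example, make_target(5.6) marks node 46 as the peak and nodes 45 and 47 as its neighbours, while make_target() returns the all-zero target used for absent wave V.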
\n\n2.4 ANALYSIS OF RESULTS \n\nA threshold method was used to analyze the data. For each of the test cases the actual \nlocation of the maximum valued output unit was compared to the expected location of the \nmaximum valued output unit. For a network result to be classified as a correct \nidentification of wave V being present (true positive), we require that the maximum valued \noutput unit have an activation which is over an activity-threshold (0.50) and that the unit \nbe within a distance-threshold (0.2 msec.) of the expected location of wave V. For a true \nnegative identification of wave V - a correct identification of wave V being absent - we \nrequire that all the output activities be below the activity threshold and that the case have \nno wave V to find. The network makes a false positive prediction of the location of wave \nV if some activity is above the activity threshold for a case which has no wave V. \nFinally, there are two ways for the network to make a false negative identification of \nwave V. In both instances, wave V must be present in the case. In one instance, some \noutput node has activity above the activity threshold, but it is outside of the distance \nthreshold. This corresponds to the identification of a wave V but in the wrong place. In \nthe other instance, no node attains activity over the activity threshold, corresponding to a \nfailure to find a wave V when there exists a wave V in the case to find. \n\n2.5 EXPERIMENTS \n\nFive experiments were performed. The first four used different architectures on the same \ndata set and the last used architecture A1 on the derivatives data set. Each of the network \narchitectures was trained from different random starting positions. For each trial, a \nnetwork was randomized and trained as described above. 
The networks were sampled as \nlearning progressed. \n\nExperiment 1 determined how well architecture A1 could identify wave V and provided \nbaseline results for the remaining experiments. Experiments 2 and 3 tested whether our \nuse of more hidden units attached to ipsilateral data made sense by reversing the \nproportion of hidden units allotted to ipsilateral data processing (experiment 2) and by \ntrying a fully connected network (experiment 3). Experiment 4 determined whether fewer \nhidden units could be used. Experiment 5 investigated whether preprocessing of the input \ndata to make derivative information available would facilitate network identification of \npeak location. \n\n3 RESULTS \n\nResults from the best network found for each of five experiments are shown in Table 2. \n\nTable 2: Results from presentation of 20 test cases to various network architectures. \n\nExperiment  Network  TP  TN  Total (TP+TN)  FP  FN  Total (FP+FN) \n1           A1       16  1   17             1   2   3 \n2           A2       16  0   16             2   2   4 \n3           A3       16  0   16             2   2   4 \n4           A4       15  0   15             3   2   5 \n5           A1       15  1   16             1   3   4 \n\n4 DISCUSSION \n\nIn Experiment 1, the three cases which were incorrectly identified were examined closely. \nIt is not evident from inspection why the net failed to identify the peaks or identified \npeaks where there were none to identify. Where peaks are present, they are not unusually \nlocated or surrounded by noise. The appearance of their shape seems similar to the cases \nwhich were identified correctly. We believe that more training examples which are \n\"similar\" to these 3 test cases, as well as examples with greater variety, will improve \nrecognition of these cases. 
This improvement comes not from better generalization but \nrather from a reduced requirement for generalization. If the net is trained with cases which \nare increasingly similar to the cases which will be used to test it, then recognition of the \ntest cases becomes easier at any given level of generalization. \n\nThe distribution of hidden units in A1 was chosen with the knowledge that human experts \nuse information primarily from the ipsilateral side, referring to the contralateral side only \nwhen ipsilateral features are too obscure to resolve. Experiments 2 and 3 investigate \nwhether this reliance on ipsilateral data suggests that there should be more hidden units \nfor the ipsilateral side or for the contralateral side. The identical results from these \nexperiments are similar to those of Experiment 1. One interpretation is that it is possible \nto make diagnoses of BAEPs using very few features from the ipsilateral side. Another \ninterpretation is that it is possible to use the contralateral data as the chief information \nsource, contrary to our expert's belief. \n\nExperiment 4 investigates whether fewer features are needed by restricting the hidden layer \nto 20 hidden units. The slight degradation of performance indicates that it is possible to \nmake BAEP diagnoses with fewer ipsilateral features. Experiment 5 utilized the \nipsilateral waveform and its derivative to determine whether this pre-processing would \nimprove the results. Surprisingly, the results did not improve, but it is possible that a \nbetter estimator of the derivative will prove this method useful. \n\nFinally, when the weights from all the networks above were examined, we found that \namplitudes from only the area where wave V falls were used. 
This suggests that it is not \nnecessary to know the location of wave III before determining the location of wave V, in \nsharp contrast to the expert's intuition. We believe the networks form a \"local expert\" for the \nidentification of wave V which does not need to interact with data from other parts of the \ngraph, and that other such local experts will be formed as we expand the project's scope. \n\n5 CONCLUSIONS \n\nAutomated wave form recognition is considered to be a difficult task for machines and an \nespecially difficult task for neural networks. Our results offer some encouragement that \nin some domains neural networks may be applied to perform wave form recognition and \nthat the technique will be extensible as problem complexity increases. \nStill, the accuracy of the networks we have discussed is not high enough for clinical use. \nSeveral extensions have been attempted and others considered including 1) increasing the \nsampling rate to decrease the granularity of the input data, 2) increasing the training set \nsize, 3) using a different representation of the output for wave V absent cases, 4) using a \ndifferent representation of the input, such as the derivative of the amplitudes, and 5) \narchitectures which allow hybrids of these ideas. \n\nFinally, since many other tests in medicine as well as other fields require the \ninterpretation of graphical data, it is tempting to consider extending this method to other \ndomains. One distinguishing feature of the BAEP is that there is no difficulty with \nthe time registration of the data; we always know where to start looking for the wave. \nThis is in contrast to an EKG, for example, which may require substantial effort just to \nidentify the beginning of a QRS complex. 
Our results indicate that the interpretation of \ngraphs where the time registration of data is not an issue is possible using neural \nnetworks. Medical tests for which this technique would be appropriate include: other \nevoked potentials, spectrometry, and gel electrophoresis. \n\nAcknowledgements \n\nThe author wishes to thank Dr. Scott Shoemaker of the Department of Neurology for his \nexpertise, encouragement, constructive criticism, patience, and collaboration throughout \nthe progress of this work. This research has been supported by NLM Training grant T15 \nLM-07059. \n\nReferences \n\nBoston, J.R. 1987. Detection criteria for sensory evoked potentials. Proceedings of 9th \nAnn. IEEE/EMBS Conf., Boston, MA. \n\nBoston, J.R. 1989. Automated interpretation of brainstem auditory evoked potentials: a \nprototype system. IEEE Trans. Biomed. Eng. 36 (5): 528-532. \n\nDeRoach, J.N. 1989. Neural networks - an artificial intelligence approach to the analysis \nof clinical data. Austral. Phys. & Eng. Sci. in Med. 12 (2): 100-106. \n\nDurrant, J.D., J.R. Boston, and W.H. Martin. 1990. Correlation study of two-channel \nrecordings of the brain stem auditory evoked potential. Ear and Hearing 11 (3): 215-221. \n\nGabriel, S., J.D. Durrant, A.E. Dickter, and J.E. Kephart. 1980. Computer identification \nof waves in the auditory brain stem evoked potentials. EEG and Clin. Neurophys. 49: \n421-423. \n\nGorman, R. Paul, and Terrence J. Sejnowski. 1988. Analysis of hidden units in a layered \nnetwork trained to classify sonar targets. Neural Networks 1: 75-89. \n\nGreenberg, R.P., P.G. Newlon, M.S. Hyatt, R.K. Narayan, and D.P. Becker. 1981. \nPrognostic implications of early multimodality evoked potentials in severely head-injured \npatients. J. Neurosurg 5: 227-236. 
\n\nMoulton, Richard, Peter Kresta, Mario Ramirez, and William Tucker. 1991. Continuous \nautomated monitoring of somatosensory evoked potentials in posttraumatic coma. Journal \nof Trauma 31 (5): 676-685. \n\nRumelhart, David E., and James L. McClelland. 1986. Parallel distributed processing. \nCambridge, Mass.: MIT Press. \n\nStubbs, D.F. 1988. Neurocomputers. MD Comput 5 (3): 14-24. \n\nValdes-Sosa, M.J., M.A. Bobes, M.C. Perez-Abalo, M. Perra, J.A. Carballo, and P. \nValdes-Sosa. 1987. Comparison of auditory evoked potential detection methods using \nsignal detection theory. Audiology 26: 166-178. \n\n", "award": [], "sourceid": 556, "authors": [{"given_name": "Donald", "family_name": "Freeman", "institution": null}]}