{"title": "A Neural Network for Real-Time Signal Processing", "book": "Advances in Neural Information Processing Systems", "page_first": 248, "page_last": 255, "abstract": null, "full_text": "248  MalkofT \n\nA Neural Network for Real-Time Signal Processing \n\nDonald B.  Malkoff \n\nGeneral Electric  /  Advanced Technology Laboratories \n\nMoorestown Corporate Center \n\nBuilding  145-2,  Route 38 \nMoorestown,  NJ  08057 \n\nABSTRACT \n\nThis paper describes a  neural network algorithm that (1)  performs \ntemporal pattern matching in real-time, (2)  is trained on-line, with \na single pass,  (3)  requires only a single template for training of each \nrepresentative  class,  (4)  is  continuously  adaptable  to  changes  in \nbackground noise, (5) deals with transient signals having low signal(cid:173)\nto-noise ratios,  (6) works in the presence of non-Gaussian noise,  (7) \nmakes use of context dependencies and (8) outputs Bayesian proba(cid:173)\nbility estimates.  The algorithm has been adapted to the problem of \npassive sonar signal detection and classification.  It runs on a  Con(cid:173)\nnection  Machine  and  correctly  classifies,  within  500  ms  of onset, \nsignals embedded in noise and subject to considerable uncertainty. \n\nINTRODUCTION \n\n1 \nThis paper describes a  neural network algorithm, STOCHASM, that was developed \nfor  the  purpose  of real-time  signal  detection and  classification.  Of prime  concern \nwas  capability  for  dealing  with  transient  signals  having  low  signal-to-noise  ratios \n(SNR). \nThe algorithm was first developed in 1986 for real-time fault detection and diagnosis \nof malfunctions  in  ship  gas  turbine  propulsion  systems  (Malkoff,  1987).  It subse(cid:173)\nquently was adapted for  passive sonar signal detection and classification.  Recently, \nversions for  information fusion  and radar classification  have  been developed. \n\nCharacteristics of the algorithm that are of particular merit include  the following: \n\n\fA Neural Network for Real-Time Signal Processing \n\n249 \n\n\u2022  It performs  well  in  the  presence  of either  Gaussian  or  non-Gaussian  noise, \n\neven where the  noise  characteristics are changing. \n\n\u2022  Improved classifications result  from  temporal  pattern matching in  real-time, \n\nand by taking advantage of input data context dependencies. \n\n\u2022  The  network  is  trained  on-line.  Single  exposures  of target  data require  one \npass  through  the  network.  Target  templates,  once  formed,  can  be  updated \non-line. \n\n\u2022  Outputs consist  of numerical estimates of closeness  for  each of the  template \n\nclasses,  rather than nearest-neighbor  \"all-or-none\"  conclusions. \n\n\u2022  The algorithm is  implemented in  parallel code  on a  Connection Machine. \n\nSimulated signals,  embedded  in  noise  and  subject  to considerable  uncertainty,  are \nclassified  within 500  ms of onset. \n\n2  GENERAL  OVERVIEW OF THE NETWORK \n2.1  REPRESENTATION  OF THE  INPUTS \n\nSonar  signals  used  for  training  and  testing  the  neural  network  consist  of pairs  of \nsimulated  chirp signals  that  are  superimposed  and  bounded  by  a  Gaussian  enve(cid:173)\nlope.  The signals are subject to random fluctuations and  embedded in  white noise. \nThere  is  considerable  overlapping  (similarity)  of the  signal  templates.  Real  data \nhas recently become available for  the radar domain. 
\n\nOnce generated, the time series of the sonar signal is subject to special transformations. The outputs of these transformations are the values that are input to the neural network. In addition, several higher-level signal features, for example zero-crossing data, may be simultaneously input to the same network for purposes of information fusion. The transformations differ from those used in traditional signal processing. They contribute to the real-time performance and temporal pattern matching capabilities of the algorithm by possessing all of the following characteristics: \n\n\u2022 Time-Origin Independence: The sonar input signal is transformed so that the resulting time-frequency representation is independent of the starting time of the transient with respect to its position within the observation window (Figure 1). \"Observation window\" refers to the most recent segment of the sonar time series that is currently under analysis. \n\u2022 Translation Independence: The time-frequency representation obtained by transforming the sonar input transient does not shift from one network input node to another as the transient signal moves across most of the observation window (Figure 1). In other words, not only does the representation remain the same while the transient moves, but its position relative to specific network nodes also does not change. Each given node continues to receive its usual kind of information about the sonar transient, regardless of the relative position of the transient in the window. For example, where the transform is an FFT, a specific input layer node will always receive the output of one specific frequency bin, and none other. Where the SNR is high, translation independence could be accomplished by a simple time-transformation of the representation before sending it to the neural network. This is not possible in conditions where the SNR is sufficiently low that segmentation of the transient becomes impossible using traditional methods such as auto-regressive analysis; it cannot be determined at what time the transient signal originated or where it is in the observation window. \n\u2022 The representation gains time-origin and translation independence without sacrificing knowledge about the signal's temporal characteristics or its complex infrastructure. This is accomplished by using (1) the absolute value of the Fourier transform (with respect to time) of the spectrogram of the sonar input, or (2) the radar Woodward Ambiguity Function; a brief sketch of the first of these follows Figure 1. The derivation and characterization of these methods for representing data is discussed in a separate paper (Malkoff, 1990). \n\n[Figure 1 graphic: different aspects of the transformation outputs always enter the same spatial nodes of the network and result in the same classification.] \n\nFigure 1: Despite passage of the transient, encoded data enters the same network input nodes (translation independence) and has the same form and output classification (time-origin independence).
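\n\nAs a brief sketch of the first of these representations (an illustration under assumed parameters, not the paper's implementation), one can take the magnitude of a discrete Fourier transform along the time axis of a short-time spectrogram. Because a time shift of the transient only multiplies that time-direction transform by a unit-magnitude phase factor, the magnitude is unchanged, which yields the time-origin and translation independence described above. The window and hop sizes below are arbitrary choices. \n\nimport numpy as np\n\ndef invariant_representation(x, n_fft=128, hop=32):\n    # Short-time spectrogram: |STFT| with frames along axis 0 and frequency bins along axis 1.\n    window = np.hanning(n_fft)\n    frames = np.array([x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)])\n    spec = np.abs(np.fft.rfft(frames, axis=1))\n    # Fourier transform with respect to time (down the frame axis), then the absolute\n    # value; a circular shift of the frames leaves the result unchanged.\n    return np.abs(np.fft.fft(spec, axis=0))\n\nEach element of the resulting array would then feed one fixed input-layer node, so the representation's position relative to specific network nodes does not change as the transient moves across the observation window.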
\n\n2.2  THE NETWORK ARCHITECTURE \n\nSonar data, suitably transformed, enters the network input layer. The input layer serves as a noise filter, or discriminator. The network has two additional layers, the hidden and output layers (Figure 2). Learning of target templates, as well as classification of unknown targets, takes place in a single \"feed-forward\" pass through these layers. Additional exposures to the same target lead to further enhancement of the template, if training, or refinement of the classification probabilities, if testing. \n\nThe hidden layer deals only with data that passes through the input filter. This data predominantly represents a target. Some degree of context-dependency evaluation of the data is achieved. Hidden layer data and its permutations are distributed and maintained intact, separate, and transparent. Because of this, credit (error) assignment is easily performed. \n\nIn the output layer, evidence is accumulated, heuristically evaluated, and transformed into figures of merit for each possible template class. \n\n[Figure 2 graphic: input layer, hidden layer, and output layer.] \n\nFigure 2: STOCHASM network architecture. \n\n2.2.1  The Input Layer \n\nEach input layer node receives a succession of samples of a unique part of the sonar representation. This series of samples is stored in a first-in, first-out queue. \n\nWith the arrival of each new input sample, the mean and standard deviation of the values in the queue are recomputed at every node. These statistical parameters are used to detect and extract a signal from the background noise by computing a threshold for each node. Arriving input values that exceed the threshold are passed to the hidden layer and not entered into the queues. Passed values are expressed in terms of z-values (the number of standard deviations by which the input value differs from the mean of the queued values). Hidden layer nodes receive only data exceeding thresholds; they are otherwise inactive. \n\n2.2.2  The Hidden Layer \n\nThere are three basic types of hidden layer nodes: \n\n\u2022 Nodes of the first type receive values from only a single input layer node; they reflect absolute changes in an input layer parameter. \n\u2022 Nodes of the second type receive values from a pair of inputs whose values simultaneously deviate from normal in the same direction. \n\u2022 Nodes of the third type receive values from a pair of inputs whose values simultaneously deviate from normal in opposite directions. \n\nFor N data inputs, there are a total of N^2 hidden layer nodes: N single-input nodes plus N(N-1)/2 pair nodes of each of the two pair types. \n\nValues are passed to the hidden layer only when they exceed the threshold levels determined by the input node queue. The hidden layer values are stored in first-in, first-out queues, like those of the input layer. If the network is in the testing mode, these values represent signals awaiting classification. The mean and standard deviation are computed for each of these queues and used for subsequent pattern matching. If, instead, the network is in the training mode, the passed values and their statistical descriptors are stored as templates at their corresponding nodes.
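\n\nA minimal sketch of this input-layer mechanism follows; the queue length and z-value threshold are illustrative assumptions (the paper delegates such parameters to a control system), and the class name is invented. \n\nfrom collections import deque\nimport math\n\nclass InputNode:\n    # One input-layer node: a FIFO queue of background samples whose running mean\n    # and standard deviation set a detection threshold. Samples exceeding the\n    # threshold are emitted as z-values and kept out of the queue, so the\n    # background statistics keep adapting to the current noise.\n    def __init__(self, queue_len=64, z_threshold=3.0):\n        self.queue = deque(maxlen=queue_len)\n        self.z_threshold = z_threshold\n\n    def step(self, x):\n        if len(self.queue) < 2:\n            self.queue.append(x)\n            return None\n        mean = sum(self.queue) / len(self.queue)\n        var = sum((v - mean) ** 2 for v in self.queue) / len(self.queue)\n        std = math.sqrt(var) or 1e-12\n        z = (x - mean) / std\n        if abs(z) > self.z_threshold:\n            return z  # passed to the hidden layer; the queue is not updated\n        self.queue.append(x)  # background sample: update the noise estimate\n        return None\n\nHidden-layer nodes of the second and third types would then combine the z-values emitted by pairs of such nodes, responding only when both deviate simultaneously in the same or in opposite directions.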
\n\n2.2.3  Pattern Matching Output Layer \n\nPattern matching consists of computing Bayesian likelihoods for the undiagnosed input relative to each template class. The computation assumes a normal distribution of the values contained within the queue of each hidden layer node. The statistical parameters of the queue representing undiagnosed inputs are matched with those of each of the templates. For example, the number of standard deviations of distance between the means of the \"undiagnosed\" queue and a template queue may be used to demarcate an area under a normal probability distribution. This area is then used as a weight, or measure, of their closeness of match. Note that this computation has a non-linear, sigmoid-shaped output. \n\nThe weights for each template are summed across all nodes. Likelihood values are computed for each template. A priori data is used where available, and the results are normalized for the final outputs. The number of computations is minimal and done in parallel; the computations scale linearly with the number of templates per node. If more processing hardware were available, separate processors could be assigned to each template of every node, and computational time would be of constant complexity.
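\n\nUnder the stated normality assumption, this matching step can be sketched as follows: the z-distance between an undiagnosed queue mean and a template mean is mapped through the two-sided normal tail area (a sigmoid-shaped function of that distance) to a per-node weight; the weights are summed per template; and the totals are combined with any priors and normalized. The paper does not give STOCHASM's exact heuristics or normalization, so this is an interpretation rather than the original code. \n\nimport math\n\ndef match_probabilities(unknown, templates, priors=None):\n    # unknown: {node_id: (mean, std)} statistics of the undiagnosed queues.\n    # templates: {class_name: {node_id: (mean, std)}} stored template statistics.\n    # Returns normalized figures of merit for every template class.\n    def tail_weight(z):\n        # Two-sided normal tail area beyond |z|: 1.0 when the means coincide,\n        # falling off in a sigmoid-like fashion as they separate.\n        return math.erfc(abs(z) / math.sqrt(2.0))\n    scores = {}\n    for cls, tmpl in templates.items():\n        total = 0.0\n        for node, (mu_u, sd_u) in unknown.items():  # sd_u unused in this simple variant\n            if node not in tmpl:\n                continue\n            mu_t, sd_t = tmpl[node]\n            z = (mu_u - mu_t) / max(sd_t, 1e-12)  # distance in template std units\n            total += tail_weight(z)\n        prior = priors.get(cls, 1.0) if priors else 1.0\n        scores[cls] = prior * total\n    norm = sum(scores.values()) or 1.0\n    return {cls: s / norm for cls, s in scores.items()}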
\n\n3  PERFORMANCE \n\nThe sonar version was tested against three sets of totally overlapping double chirp signals, the worst possible case for this algorithm. Where training and testing SNRs differed by a factor of anywhere from 1 to 8, 46 of 48 targets were correctly recognized. \n\nIn extensive simulated testing against radar and jet engine modulation data, classifications were better than 95% correct down to -25 dB using the unmodified sonar algorithm. \n\n4  DISCUSSION \n\nDistinguishing features of this algorithm include the following capabilities: \n\n\u2022 Information fusion. \n\u2022 Improved classifications. \n\u2022 Real-time performance. \n\u2022 Explanation of outputs. \n\n4.1  INFORMATION FUSION \n\nIn STOCHASM, normalization of the input data facilitates the comparison of separate data items that are diverse in type. This is followed by the fusion, or combination, of all possible pairs of the set of inputs. The resulting combinations are transferred to the hidden layer, where they are evaluated and matched with templates. This allows the combining of different features derived either from the same sensor suite or from several different sensor suites. The latter is often one of the most challenging tasks in situation assessment. \n\n4.2  IMPROVED CLASSIFICATIONS \n\n4.2.1  Multiple Output Weights per Node \n\nIn STOCHASM, each hidden layer node receives a single piece of data representing some key feature extracted from the undiagnosed target signal. In contrast, the node has many separate output weights, one for every target template. Each of those output weights represents an actual correlation between the undiagnosed feature data and one of the individual target templates. STOCHASM optimizes the correlations of an unknown input with each possible class. In so doing, it also generates figures of merit (numerical estimates of closeness of match) for ALL the possible target classes, instead of a single \"all-or-none\" classification. \n\nIn more popularized networks, there is only one output weight for each node. Its effectiveness is diluted by having to contribute to the correlation between one undiagnosed feature data item and MANY different templates. In order to achieve reasonable classifications, an extra set of input connection weights is employed. The connection weights provide a somewhat watered-down numerical estimate of the contribution of their particular input data feature to the correct classification, ON THE AVERAGE, of targets representing all possible classes. Such networks employ iterative procedures to compute values for those weights, which prevents real-time training and generates sub-optimal correlations. Moreover, because all of this results in only a single output for each hidden layer node, another set of connection weights between the hidden layer node and each node of the output layer is required to complete the classification process. Since these tend to be fully connected layers, the number of weights and computations is prohibitively large. \n\n4.2.2  Avoidance of Nearest-Neighbor Techniques \n\nSome popular networks are sensitive to initial conditions. The determination of the final values of their weights is influenced by the initial values assigned to them. These networks require that, before the onset of training, the values of the weights be randomly assigned. Moreover, the classification outcomes of these networks are often altered by changing the order in which training samples are submitted to the network. Networks of this type may be unable to express their conclusions in figures of merit for all possible classes. When inputs to the network share characteristics of more than one target class, these networks tend to gravitate to the classification that initially most closely resembles the input, for an \"all-or-none\" classification. STOCHASM has none of these drawbacks. \n\n4.2.3  Noisy Data \n\nThe algorithm handles SNRs lower than one and situations where training and testing SNRs differ. Segmentation of one-dimensional patterns buried in noise is done automatically. Even the noise itself can be classified. The algorithm can adapt on-line to changing background noise patterns. \n\n4.3  REAL-TIME PERFORMANCE \n\nThere is no need for back-propagation/gradient-descent methods to set the weights during training. Therefore, no iterations or recursions are required. Only a single feed-forward pass of data through the network is needed for either training or classification. Since the number of nodes, connections, layers, and weights is relatively small, and the algorithm is implemented in parallel, the compute time is fast enough to keep up with real-time in most application domains. \n\n4.4  EXPLANATION OF OUTPUTS \n\nThere is strict separation of target classification evidence in the nodes of this network. In addition, the evidence is maintained so that positive and negative correlation data is separate and easily accessible. This enables improved credit (error) assignment that leads to more effective classifications and the potential for making available to the operator real-time explanations of program behavior.
\n\n4.5  FUTURE DIRECTIONS \n\nPrevious versions of the algorithm dynamically created, destroyed, or re-arranged nodes and their linkages to optimize the network, minimize computations, and eliminate unnecessary inputs. This algorithm also employed a multi-level hierarchical control system. The control system, on-line and in real-time, adjusted sampling rates and queue lengths, governing when the background noise template is permitted to adapt to current noise inputs and the rate at which it does so. Future versions of the Connection Machine implementation will be able to effect the same procedures. \n\nEfforts are now underway to: \n\n1. Improve the temporal pattern matching capabilities. \n2. Provide better heuristics for the computation of final figures of merit from the massive amount of positive and negative correlation data resident within the hidden layer nodes. \n3. Adapt the algorithm to radar domains where time and spatial warping problems are prominent. \n4. Simulate more realistic and complex sonar transients, with the expectation that the algorithm will perform better on those targets. \n5. Apply the algorithm to information fusion tasks. \n\nReferences \n\nMalkoff, D.B., \"The Application of Artificial Intelligence to the Handling of Real-Time Sensor Based Fault Detection and Diagnosis,\" Proceedings of the Eighth Ship Control Systems Symposium, Volume 3, Ministry of Defence, The Hague, pp. 264-276. Also presented at The Hague, Netherlands, October 8, 1987. \n\nMalkoff, D.B., \"A Framework for Real-Time Fault Detection and Diagnosis Using Temporal Data,\" The International Journal for Artificial Intelligence in Engineering, Volume 2, No. 2, pp. 97-111, April 1987. \n\nMalkoff, D.B. and L. Cohen, \"A Neural Network Approach to the Detection Problem Using Joint Time-Frequency Distributions,\" Proceedings of the IEEE 1990 International Conference on Acoustics, Speech, and Signal Processing, Albuquerque, New Mexico, April 1990 (to appear).", "award": [], "sourceid": 284, "authors": [{"given_name": "Donald", "family_name": "Malkoff", "institution": null}]}