{"title": "A Boundary Hunting Radial Basis Function Classifier which Allocates Centers Constructively", "book": "Advances in Neural Information Processing Systems", "page_first": 139, "page_last": 146, "abstract": null, "full_text": "A Boundary Hunting Radial Basis Function \n\nClassifier Which Allocates Centers \n\nConstructively \n\nEric I. Chang and Richard P. Lippmann \n\nMIT Lincoln Laboratory \n\nLexington, MA02173-0073, USA \n\nAbstract \n\nA  new  boundary  hunting  radial  basis  function  (BH-RBF)  classifier \nwhich  allocates  RBF  centers  constructively  near  class  boundaries  is \ndescribed.  This  classifier creates complex  decision  boundaries  only  in \nregions  where  confusions  occur  and  corresponding  RBF  outputs  are \nsimilar.  A  predicted  square  error  measure  is  used  to  determine  how \nmany centers to add and to determine when to stop adding centers. Two \nexperiments are presented which demonstrate the advantages of the BH(cid:173)\nRBF classifier.  One uses artificial data with  two classes and  two  input \nfeatures  where each class contains  four clusters but only  one cluster is \nnear a decision region boundary. The other uses a large seismic database \nwith  seven  classes and  14  input features.  In  both  experiments  the  BH(cid:173)\nRBF classifier provides a lower error rate  with  fewer  centers  than  are \nrequired  by  more  conventional  RBF,  Gaussian  mixture,  or  MLP \nclassifiers. \n\n1 \n\nINTRODUCTION \n\nRadial  basis  function  (RBF)  classifiers  have  been  successfully  applied to many  pattern \nclassification problems (Broomhead,  1988, Ng,  1991). These classifiers have the advan(cid:173)\ntages  of short  training  times  and  high  classification  accuracy.  In  addition,  RBF outputs \nestimate minimum-error Bayesian a posteriori probabilities  (Richard,  1991). Performing \nclassification  with  RBF outputs  requires  selecting  the  output which  is  highest  for  each \ninput. In regions where one class dominates, the Bayesian a posteriori probability for that \nclass  will  be uniformly  \"high\"  and  near  1.0.  Detailed modeling  of the  variation  of the \nBayesian a posteriori probability in these regions is not necessary for classification. Only \n\n139 \n\n\f140 \n\nChang and Lippmann \n\nat the boundary between different classes is accurate estimation of the Bayesian a posteri(cid:173)\nori probability necessary for high classification accuracy. If the boundary between differ(cid:173)\nent classes can be located in  the input space, RBF centers can be judiciously allocated in \nthose  regions  without wasting  RBF  centers  in  regions  where accurate estimation  of the \nBayesian a posteriori probability does not improve classification perfonnance. \n\nIn  general,  having  more RBF centers allows  better approximation  of the  desired output. \nWhile training a RBF classifier,  the number of RBF centers must be selected.  The tradi(cid:173)\ntional approach has been to randomly choose patterns from the training set as centers, or to \nperfonn K-means clustering on the data and then to use these centers as the RBF centers. \nFrequently  the correct number of centers to use is not known a priori and the number of \ncenters has to be tuned. Also, with K-means clustering, the centers are distributed without \nconsidering  their usefulness in classification. 
Many algorithms have been proposed for constructively building up the structure of an RBF network (Mel, 1991). However, the algorithms proposed have all been designed for training an RBF network to perform function mapping. For mapping tasks, accuracy is important throughout the input region, and the mean squared error is the criterion that is minimized. In classification tasks, only boundaries between different classes are important, and the overall mean squared error matters less than the error near class boundaries.

2 ALGORITHM DESCRIPTION

A block diagram of the new boundary hunting RBF (BH-RBF) classifier, which adds centers constructively near class boundaries, is presented in Figure 1. A simple unimodal Gaussian classifier is first formed by clustering the training patterns from a randomly selected class and assigning a center to that class. The confusion matrix generated by this simple classifier is then examined to determine the pair of classes A and B which have the most mutual confusion. Training patterns that are close to the boundary between these two classes are determined by examining the outputs of the RBF classifier. Boundary patterns which produce similar "high" outputs for both classes, differing by less than a "closecall" threshold, are used to produce new cluster centers.

Figure 1: Block Diagram of Training of BH-RBF Network (one RBF center -> initial RBF network -> add new RBF centers to the class pair responsible for most errors and overlap -> calculate predicted squared error score -> intermediate RBF networks -> final network).

Figure 2 shows RBF outputs corresponding to classes A and B as the input varies over a small range. This figure illustrates how network outputs are used to determine the "closecall" region between classes. Network outputs are high in regions dominated by a particular class, and these regions therefore lie outside the boundary between different classes. Network outputs are close in the region where the absolute difference of the two highest network outputs is less than the closecall threshold. Training patterns which fall into this closecall region, plus all the patterns that are misclassified as the other class in the class pair, are considered to be points in the boundary. For example, a pattern in class A which is misclassified as class B would be considered to be in the boundary between classes A and B. On the other hand, a pattern in class A which is misclassified as class C would not be placed in the boundary between classes A and B. (A sketch of this selection rule follows.)

Figure 2: Using the Network Output to Determine Closecall Regions (network outputs F(A) and F(B) plotted against the input; the closecall region is where the two outputs differ by less than the closecall threshold).
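A minimal sketch of the boundary-pattern selection just described, assuming outputs is the N x C matrix of network outputs for the N training patterns and labels holds the true classes; the function and variable names are ours, not the paper's, and the default threshold of 0.75 is the value used in the experiments below.

import numpy as np

def boundary_patterns(outputs, labels, class_a, class_b, closecall=0.75):
    # Patterns of the pair whose two class outputs differ by less than
    # the closecall threshold fall in the closecall region.
    in_pair = (labels == class_a) | (labels == class_b)
    diff = np.abs(outputs[:, class_a] - outputs[:, class_b])
    closecall_mask = in_pair & (diff < closecall)
    # Plus all patterns of one class in the pair that the current
    # network misclassifies as the other class of the pair (an A pattern
    # predicted as C is excluded, matching the example in the text).
    predicted = np.argmax(outputs, axis=1)
    pair_confusion = ((labels == class_a) & (predicted == class_b)) | \
                     ((labels == class_b) & (predicted == class_a))
    return closecall_mask | pair_confusion   # boolean mask over patterns

K-means clustering would then be run separately on the selected patterns of each class to propose candidate centers, as described next.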
After the patterns which belong in the boundary are determined, K-means clustering is performed separately on the boundary patterns from each class, with the number of centers ranging from zero to a preset maximum. After the centers are found, new RBF classifiers are trained using the new sets of centers plus the original set of centers. The combined set of centers that provides the best performance is saved, and the cycle repeats by finding the next class pair which accounts for the most remaining confusions. Overfitting by adding too many centers at a time is avoided by using the predicted squared error (PSE) as the criterion for choosing new centers (Barron, 1984):

PSE = RMS + (C σ²) / N

In this equation, RMS is the root mean squared error on the training set, σ² estimates the variance of the error, C is the total number of centers in the RBF classifier, and N is the total number of patterns in the training set. The error variance σ² is selected empirically using left-out evaluation data: different values of σ² are tried, and the value which provides the best performance on the evaluation data is chosen. On each cycle, different numbers of centers are tried for each class of the selected class pair, and the PSE is used to select the best subset of centers. The best PSE on each cycle is used to determine when training should be stopped to prevent overfitting. Training stops after the PSE has not decreased for five consecutive cycles.
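A short sketch of the PSE criterion and the stopping rule exactly as stated above; the helper names are hypothetical.

def predicted_squared_error(rms, n_centers, n_patterns, sigma2):
    # PSE = RMS + (C * sigma^2) / N, with RMS the training-set root mean
    # squared error, C the total number of centers, N the number of
    # training patterns, and sigma2 chosen on left-out evaluation data.
    return rms + (n_centers * sigma2) / n_patterns

def should_stop(best_pse_per_cycle, patience=5):
    # Stop once the best PSE has not decreased for `patience` cycles.
    if len(best_pse_per_cycle) <= patience:
        return False
    return min(best_pse_per_cycle[-patience:]) >= min(best_pse_per_cycle[:-patience])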
3 EXPERIMENTAL RESULTS

Two experiments were performed using the new BH-RBF classifier, a more conventional RBF classifier, a Gaussian mixture classifier (Ng, 1991), and an MLP classifier. Five regular RBF classifiers (RBF) were trained by assigning 1, 2, 3, 4, or 5 centers to each class. Similarly, five Gaussian mixture classifiers (GMIX) were trained with 1, 2, 3, 4, or 5 centers in each class. The means of the centers were trained using K-means clustering on the patterns from each class. The diagonal covariance of each center was set using all the patterns assigned to that cluster during the last pass of K-means clustering. The structures of the regular RBF classifier and the Gaussian mixture classifier are identical when the number of centers is the same; the only difference between the classifiers is the method used to train parameters.

MLP classifiers were trained for 10 independent trials on each data set. The number of hidden nodes was varied from 2 to 30 in increments of 2. The goal of the experiment was to explore the relationship between the complexity of a classifier and its classification accuracy. Training was stopped using cross validation to avoid overfitting.

3.1 FOUR-CLUSTER DATABASE

The first problem is an artificial data set designed to illustrate the difference between BH-RBF and other classifiers. There are two classes; each class consists of one large Gaussian cluster with 700 random points and three smaller clusters with 100 points each. Figure 3 shows the distribution of the data and the ideal decision boundary obtained when the actual centers and variances are used to train a Bayesian minimum error classifier. There were 2000 training patterns, 2000 evaluation patterns, and 2000 test patterns. The BH-RBF classifier was trained with the closecall threshold set to 0.75, σ² set to 0.5, and a maximum of two extra centers per class between each pair of classes. The theoretically optimal Bayesian classifier for this database provides an error rate of 1.95% on the test set. This optimal Bayesian classifier is obtained by using the actual centers, variances, and a priori probabilities that generated the data in a Gaussian mixture classifier. In a real classification task, these parameters are not known and have to be estimated from training data.

Figure 3: The Artificially Generated Four-Cluster Problem (scatter of the two classes in the two-dimensional input plane, with the ideal Bayesian decision boundary).
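Since the paper does not list the generating means and variances, the following sketch shows one way such a four-cluster data set could be produced; the specific cluster locations and unit variances below are placeholders, not the actual parameters used by the authors.

import numpy as np

def four_cluster_class(rng, big_mean, small_means, big_n=700, small_n=100):
    # One class: a large Gaussian cluster plus three smaller ones,
    # matching the 700 + 3 x 100 pattern counts given in the text.
    parts = [rng.normal(big_mean, 1.0, size=(big_n, 2))]
    parts += [rng.normal(m, 1.0, size=(small_n, 2)) for m in small_means]
    return np.vstack(parts)

rng = np.random.default_rng(0)
X0 = four_cluster_class(rng, [0.0, 0.0], [[6, 6], [0, 8], [8, 0]])      # class A
X1 = four_cluster_class(rng, [12.0, 12.0], [[6, 7], [12, 4], [4, 12]])  # class B
X = np.vstack([X0, X1])                      # 2000 patterns in total
y = np.repeat([0, 1], [len(X0), len(X1)])    # integer class labels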
Figure 4 shows the testing error rate of the three classifiers. The BH-RBF classifier achieved a 2.35% error rate with only 5 centers, and the error rate gradually decreased to 2.15% with 15 centers. The BH-RBF classifier performed well with few centers because it allocated those centers near the boundary between the two classes. On the other hand, the performance of the RBF classifier and the Gaussian mixture classifier was worse with few centers. These classifiers performed worse because they allocated centers in regions that had many patterns: the training algorithm did not distinguish between patterns that are easily confusable between classes (i.e., near the class boundary) and patterns that clearly belong to a given class. Furthermore, adding more centers did not monotonically decrease the error rate. For example, the RBF classifier had 5% error using two centers, but when the number of centers was increased to four, the error rate jumped to 11%. Only when the number of centers increased above 14 did the error rates of the RBF classifier and the Gaussian mixture classifier converge. The RBF and Gaussian mixture classifiers performed poorly with few centers because the centers were concentrated away from the decision boundary, due to the high concentration of data far from the boundary; there were not enough centers left to model the decision boundary accurately. The BH-RBF classifier added centers near the boundary and thus was able to define an accurate boundary with fewer centers.

Figure 4: Testing Error Rate of the BH-RBF Classifier, the Gaussian Mixture Classifier, and the Regular RBF Classifier on the Four-Cluster Problem (percent error versus number of centers).

Figure 5 presents the results from training MLP classifiers on the same data set using different numbers of hidden nodes. The learning rate was set to 0.001, the momentum term was set to 0.6, and each classifier was trained for 100 epochs. The error rate on a left-out evaluation set was checked to ensure that the net had not overfitted the training data. As the number of hidden nodes increased, the MLP classifier generally performed better. However, the testing error rate did not decrease monotonically as the number of hidden nodes increased. Furthermore, the random initial conditions set by different random seeds affected the classification error rate of each classifier. In comparison, the training algorithms used for the BH-RBF, RBF, and GMIX classifiers do not exhibit such sensitivity to initial conditions.

Figure 5: Testing Error Rate of the MLP Classifiers on the Four-Cluster Problem (maximum and minimum percent error over the 10 trials versus number of hidden nodes).
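The authors trained their MLPs with LNKnet; as a rough modern stand-in, the configuration described above (stochastic gradient descent, learning rate 0.001, momentum 0.6, 100 epochs, overfitting checked on held-out data) might be reproduced as follows. The scikit-learn class is our substitution, not the original tool.

from sklearn.neural_network import MLPClassifier

def make_mlp(hidden_nodes):
    # One hidden layer, SGD with momentum, hyperparameters from the text;
    # early stopping on a held-out split stands in for the paper's
    # evaluation-set check against overfitting.
    return MLPClassifier(hidden_layer_sizes=(hidden_nodes,),
                         solver="sgd",
                         learning_rate_init=0.001,
                         momentum=0.6,
                         max_iter=100,
                         early_stopping=True,
                         validation_fraction=0.2)

# Sweep hidden layer sizes as in the experiments:
# for h in range(2, 31, 2):
#     make_mlp(h).fit(X_train, y_train)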
3.2 SEISMIC DATABASE

The second problem consists of data for the classification of seismic events. The input consists of 14 continuous and binary measurements derived from seismic waveform signals. These features are used to classify a waveform as belonging to one of 7 classes which represent different seismic phases. There were 3038 training, 3033 evaluation, and 3034 testing patterns. Once again, the number of centers per class was varied from 1 to 5 for the regular RBF classifier and the Gaussian mixture classifier, while the BH-RBF classifier was started with 1 center in the first class and more centers were then assigned automatically. The BH-RBF classifier was trained with the closecall threshold set to 0.75, σ² set to 0.5, and a maximum of one extra center per class at each boundary. The parameters were chosen according to the performance of the classifier on the left-out evaluation data. For this problem, the closecall threshold and σ² turned out to be the same as the values used in the four-cluster problem.

Figure 6 shows the error rate on the testing patterns for all three classifiers. The BH-RBF classifier clearly performed better than the regular RBF classifier and the Gaussian mixture classifier. The BH-RBF classifier added centers only at the boundary regions where they improved discrimination. Also, the diagonal covariances of the added centers are local in their influence and can improve discrimination at a particular boundary without affecting other decision region boundaries.

Figure 6: Error Rate Comparison Between the BH-RBF Classifier, the Regular RBF Classifier, and the Gaussian Mixture Classifier on the Seismic Problem (percent error versus number of centers).

MLP classifiers were also trained on this data set, with the number of hidden nodes varying from 2 to 32 in increments of 2. The learning rate was set to 0.001, the momentum term was set to 0.6, and each classifier was trained for 100 epochs. The classification error rate on the left-out evaluation set showed that the networks had not overfitted the training data. Once more, the MLP classifiers exhibited great sensitivity to initial conditions, especially when the number of hidden nodes was small. Also, for this high-dimensionality classification task, even the best performance of the MLP classifier (15.5% error) did not match the best performance of the BH-RBF classifier. This result suggests that for this high-dimensionality data, the radially symmetric boundaries formed by local basis functions such as those of the RBF classifier are more appropriate than the ridge-like boundaries formed by the MLP classifier.

4 CONCLUSION

A new boundary-hunting RBF classifier was developed which adds RBF centers constructively near the boundaries of classes that produce classification confusions. Experimental results from two problems differing in input dimension, number of classes, and difficulty show that the BH-RBF classifier performed better than traditional training algorithms used for RBF, Gaussian mixture, and MLP classifiers. Experiments have also been conducted on other problems, such as Peterson and Barney's vowel database and the disjoint database used by Ng (Peterson, 1952; Ng, 1990). In all experiments, the BH-RBF constructive algorithm performed at least as well as the traditional RBF training algorithm. These results, and the experiments described above, confirm the hypothesis that better discrimination performance can be achieved by training a classifier to perform discrimination instead of probability density function estimation.

Acknowledgments

This work was supported by DARPA. The views expressed are those of the authors and do not reflect the official policy or position of the U.S. Government. Experiments were conducted using LNKnet, a general purpose classifier program developed at Lincoln Laboratory by Richard Lippmann, Dave Nation, and Linda Kukolich.
References

A. Barron. (1984) Predicted squared error: a criterion for automatic model selection. In S. Farlow (Ed.), Self-Organizing Methods in Modeling. New York: Marcel Dekker.

D. S. Broomhead and D. Lowe. (1988) Radial Basis Functions, multi-variable functional interpolation and adaptive networks. Technical Report, RSRE Memorandum No. 4148, Royal Signals and Radar Establishment, Malvern, Worcester, Great Britain.

B. W. Mel and S. M. Omohundro. (1991) How Receptive Field Parameters Affect Neural Learning. In R. Lippmann, J. Moody and D. Touretzky (Eds.), Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan Kaufmann.

K. Ng and R. Lippmann. (1991) A Comparative Study of the Practical Characteristics of Neural Networks and Conventional Pattern Classifiers. In R. Lippmann, J. Moody and D. Touretzky (Eds.), Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan Kaufmann.

G. E. Peterson and H. L. Barney. (1952) Control Methods Used in a Study of the Vowels. The Journal of the Acoustical Society of America 24:2, 175-184.

M. D. Richard and R. P. Lippmann. (1991) Neural Network Classifiers Estimate Bayesian a posteriori Probabilities. Neural Computation 3:4.
", "award": [], "sourceid": 715, "authors": [{"given_name": "Eric", "family_name": "Chang", "institution": null}, {"given_name": "Richard", "family_name": "Lippmann", "institution": null}]}