{"title": "Mapping Classifier Systems Into Neural Networks", "book": "Advances in Neural Information Processing Systems", "page_first": 49, "page_last": 56, "abstract": null, "full_text": "Mapping Classifier Systems Into Neural Networks \n\nLawrence Davis \nBBN Laboratories \nBBN Systems and Technologies Corporation \n10 Moulton Street \nCambridge, MA 02238 \n\nJanuary 16, 1989 \n\nAbstract \n\nClassifier systems are machine learning systems incorporating a genetic algorithm as the learning mechanism. Although they respond to inputs that neural networks can respond to, their internal structure, representation formalisms, and learning mechanisms differ markedly from those employed by neural network researchers in the same sorts of domains. As a result, one might conclude that these two types of machine learning formalisms are intrinsically different. This is one of two papers that, taken together, prove instead that classifier systems and neural networks are equivalent. In this paper, half of the equivalence is demonstrated through the description of a transformation procedure that will map classifier systems into neural networks that are isomorphic in behavior. Several alterations on the commonly-used paradigms employed by neural network researchers are required in order to make the transformation work. These alterations are noted and their appropriateness is discussed. The paper concludes with a discussion of the practical import of these results, and with comments on their extensibility. \n\n1 Introduction \n\nClassifier systems are machine learning systems that have been developed since the 1970s by John Holland and, more recently, by other members of the genetic algorithm research community as well.1 Classifier systems are varieties of genetic algorithms: algorithms for optimization and learning. 
Genetic algorithms employ techniques inspired by the process of biological evolution in order to \"evolve\" better and better individuals that are taken to be solutions to problems such as optimizing a function, traversing a maze, etc. (For an explanation of genetic algorithms, the reader is referred to [Goldberg 1989].) Classifier systems receive messages from an external source as inputs and organize themselves using a genetic algorithm so that they will \"learn\" to produce responses for internal use and for interaction with the external source. \n\n1This paper has benefited from discussions with Wayne Mesard, Rich Sutton, Ron Williams, Stewart Wilson, Craig Shaefer, David Montana, Gil Syswerda and other members of BARGAIN, the Boston Area Research Group in Genetic Algorithms and Inductive Networks. \n\nThis paper is one of two papers exploring the question of the formal relationship between classifier systems and neural networks. As normally employed, the two sorts of algorithms are probably distinct, although a procedure for translating the operation of neural networks into isomorphic classifier systems is given in [Belew and Gherrity 1988]. The technique Belew and Gherrity use does not include the conversion of the neural network learning procedure into the classifier system framework, and it appears that the technique will not support such a conversion. Thus, one might conjecture that the two sorts of machine learning systems employ learning techniques that cannot be reconciled, although if there were a subsumption relationship, Belew and Gherrity's result suggests that the set of classifier systems might be a superset of the set of neural networks. \n\nThe reverse conclusion is suggested by consideration of the inputs that each sort of learning algorithm processes. 
When viewed as \"black boxes\", both mechanisms for learning receive inputs, carry out self-modifying procedures, and produce outputs. The class of inputs that are traditionally processed by classifier systems, the class of bit strings of a fixed length, is a subset of the class of inputs that have been traditionally processed by neural networks. Thus, it appears that classifier systems operate on a subset of the inputs that neural networks can process, when viewed as mechanisms that can modify their behavior. \n\nIn fact, both these impressions are correct. One can translate classifier systems into neural networks, preserving their learning behavior, and one can translate neural networks into classifier systems, again preserving learning behavior. In order to do so, however, some specializations of each sort of algorithm must be made. This paper deals with the translation from classifier systems to neural networks and with those specializations of neural networks that are required in order for the translation to take place. The reverse translation uses quite different techniques, and is treated in [Davis 1989]. \n\nThe following sections contain a description of classifier systems, a description of the transformation operator, discussions of the extensibility of the proof, comments on some issues raised in the course of the proof, and conclusions. \n\n2 Classifier Systems \n\nA classifier system operates in the context of an environment that sends messages to the system and provides it with reinforcement based on the behavior it displays. A classifier system has two components: a message list and a population of rule-like entities called classifiers. 
Each message on the message list is composed of bits, and each has a pointer to its source (messages may be generated by the environment or by a classifier). Each classifier in the population of classifiers has three components: a match string made up of the characters 0, 1, and # (for \"don't care\"); a message made up of the characters 0 and 1; and a strength. The top-level description of a classifier system is that it contains a population of production rules that attempt to match some condition on the message list (thus \"classifying\" some input) and post their message to the message list, thus potentially affecting the environment or other classifiers. Reinforcement from the environment is used by the classifier system to modify the strengths of its classifiers. Periodically, a genetic algorithm is invoked to create new classifiers, which replace certain members of the classifier set. (For an explanation of classifier systems, their potential as machine learning systems, and their formal properties, the reader is referred to [Holland et al. 1986].) \n\nLet us specify these processing stages more precisely. A classifier system operates by cycling through a fixed list of procedures. In order, these procedures are: \n\nMessage List Processing. 1. Clear the message list. 2. Post the environmental messages to the message list. 3. Post messages to the message list from classifiers in the post set of the previous cycle. 4. Implement environmental reinforcement by analyzing the messages on the message list and altering the strength of classifiers in the post set of the previous cycle. \n\nForm the Bid Set. 1. Determine which classifiers match a message in the message list. 
A classifier matches a message if each bit in its match field matches its corresponding message bit. A 0 matches a 0, a 1 matches a 1, and a # matches either bit. The set of all matching classifiers forms the current bid set. 2. Implement bid taxes by subtracting a portion of the strength of each classifier c in the bid set. Add the strength taken from c to the strength of the classifier or classifiers that posted messages matched by c in the prior step. \n\nForm the Post Set. 1. If the bid set is larger than the maximum post set size, choose classifiers stochastically to post from the bid set, weighting them in proportion to the magnitude of their bid taxes. The set of classifiers chosen is the post set. \n\nReproduction. Reproduction generally does not occur on every cycle. When it does occur, these steps are carried out: 1. Create n children from parents. Use crossover and/or mutation, choosing parents stochastically but favoring the strongest ones. (Crossover and mutation are two of the operators used in genetic algorithms.) 2. Set the strength of each child to equal the average of the strength of that child's parents. (Note: this is one of many ways to set the strength of a new classifier. The transformation will work in analogous ways for each of them.) 3. Remove n members of the classifier population and add the n new children to the classifier population. \n\n3 Mapping Classifiers Into Classifier Networks \n\nThe mapping operator that I shall describe maps each classifier into a classifier network. Each classifier network has links to environmental input units, links to other classifier networks, and match, post, and message units. 
The weights on the links leading to a match node and leaving a post node are related to the match and message fields of the classifier. An additional link is added to provide a bias term for the match node. (Note: it is assumed here that the environment posts at most one message per cycle. Modifications to the transformation operator to accommodate multiple environmental messages are described in the final comments of this paper.) \n\nGiven a classifier system CS with n classifiers, each matching and sending messages of length m, we can construct an isomorphic neural network composed of n classifier networks in the following way. For each classifier c in CS, we construct its corresponding classifier network, composed of n match nodes, 1 post node, and m message nodes. One match node (the environmental match node) has links to inputs from the environment. Each of the other match nodes is linked to the message and post node of another classifier network. The reader is referred to Figure 1 for an example of such a transformation. \n\nEach match node in a classifier network has m + 1 incoming links. The weights on the first m links are derived by applying the following transformation to the m elements of c's match field: 0 is associated with weight -1, 1 is associated with weight 1, and # is associated with weight 0. The weight of the final link is set to m + 1 - l, where l is the number of links with weight = 1. Thus, a classifier with match field (1 0 # 0 1) would have an associated network with weights on the links leading to its match node of 1, -1, 0, -1, 1, and 4. A classifier with match field (1 0 #) would have weights of 1, -1, 0, and 3. 
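The weight transformation just described can be sketched in a few lines of Python (my own illustrative sketch; the function name is an assumption, not part of the paper):

```python
def match_weights(match_field):
    # Map each match character to a link weight:
    # '0' -> -1, '1' -> 1, '#' (don't care) -> 0.
    weights = [{'0': -1, '1': 1, '#': 0}[c] for c in match_field]
    # The final (bias) link gets weight m + 1 - l, where m is the
    # match-field length and l is the number of weight-1 links.
    m = len(match_field)
    l = weights.count(1)
    return weights + [m + 1 - l]

# The paper's two worked examples:
print(match_weights('10#01'))  # [1, -1, 0, -1, 1, 4]
print(match_weights('10#'))    # [1, -1, 0, 3]
```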
\n\nThe weights on the links to each message node in the classifier network are set to equal the corresponding element of the classifier's message field. Thus, if the message field of the classifier were (0 1 0), the weights on the links leading to the three message nodes in the corresponding classifier network would be 0, 1, and 0. The weights on all other links in the classifier network are set to 1. \n\nEach node in a classifier network uses a threshold function to determine its activation level. Match nodes have thresholds = m + .9. All other nodes have thresholds = .9. If a node's threshold is exceeded, the node's activation level is set to 1. If not, it is set to 0. \n\nEach classifier network has an associated quantity called strength that may be altered when the network is run, during the processing cycle described below. \n\nA cycle of processing of a classifier system CS maps onto the following cycle of processing in a set of classifier networks: \n\nMessage List Processing. 1. Compute the activation level of each message node in each classifier network. 2. If the environment supplies reinforcement on this cycle, divide that reinforcement by the number of post nodes that are currently active, plus 1 if the environment posted a message on the preceding cycle, and add the quotient to the strength of each active post node's classifier network. 3. If there is a message on this cycle from the environment, map it onto the first m environment nodes so that each node associated with a 0 is off and each node associated with a 1 is on. Turn the final environmental node on. If there is no environmental message, turn all environmental nodes off. \n\nForm the Bid Set. 1. 
Compute the activation level of each match node in each classifier network. 2. Compute the activation level of each bid node in each classifier network (the set of classifier networks with an active bid node is the bid set). 3. Subtract a fixed proportion of the strength of each classifier network cn in the bid set. Add this amount to the strength of those networks connected to an active match node in cn. (Strength given to the environment passes out of the system.) \n\nForm the Post Set. 1. If the bid set is larger than the maximum post set size, choose networks stochastically to post from the bid set, weighting them in proportion to the magnitude of their bid taxes. The set of networks chosen is the post set. (This might be viewed as a stochastic n-winners-take-all procedure.) \n\nReproduction. If this is a cycle on which reproduction would occur in the classifier system, carry out its analog in the neural network in the following way. 1. Create n children from parents. Use crossover and/or mutation, choosing parents stochastically but favoring the strongest ones. The ternary alphabet composed of -1, 1, and 0 is used instead of the classifier alphabet of 0, 1, and #. After each operator is applied, the final member of the match weight list is set to m + 1 - l. 2. Write over the weights on the match links and the message links of n classifier networks to match the weights in the children. Choose networks to be re-weighted stochastically, so that the weakest ones are most likely to be chosen. Set the strength of each re-weighted classifier network to be the average of the strengths of its parents. \n\nIt is simple to show that a classifier network match node will match a message in just those cases in which its associated classifier matched a message. 
There are three cases to consider. If the original match character was a #, then it matched any message bit. The corresponding link weight is set to 0, so the state of the node it comes from will not affect the activation of the match node it goes to. If the original match character was a 1, then its message bit had to be a 1 for the message to be matched. The corresponding link weight is set to 1, and we see by inspection of the weight on the final link, the match node threshold, and the fact that no other type of link has a positive weight, that every link with weight 1 must be connected to an active node for the match node to be activated. Finally, the link weight corresponding to a 0 is set to -1. If any of these links is connected to a node that is active, then the effect is that of turning off a node connected to a link with weight 1, and we have just seen that this will cause the match node to be inactive. \n\nGiven this correspondence in matching behavior, one can verify that a set of classifier networks associated with a classifier system has the following properties: During each cycle of processing of the classifier system, a classifier is in the bid set in just those cases in which its associated network has an active bid node. Assuming that both systems use the same randomizing technique, initialized in the same way, the classifier is in the post set in just those cases when the network is in the post set. Finally, the parents that are chosen for reproduction are the transforms of those chosen in the classifier system, and the children produced are the transformations of the classifier system parents. The two systems are isomorphic in operation, assuming that they use the same random number generator. 
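The three-case argument above can also be checked exhaustively for small m. The following sketch (my own illustration; the function names are assumptions, not from the paper) compares the symbolic match rule against the thresholded weighted sum, including the always-on bias input, over every possible message:

```python
from itertools import product

def classifier_matches(match_field, message):
    # Symbolic rule: 0 matches 0, 1 matches 1, # matches either bit.
    return all(c == '#' or int(c) == b for c, b in zip(match_field, message))

def match_node_fires(match_field, message):
    # Link weights: 0 -> -1, 1 -> 1, # -> 0; the final link has weight
    # m + 1 - l and is attached to a node that is on whenever a message
    # is present, so it contributes its full weight here.
    weights = [{'0': -1, '1': 1, '#': 0}[c] for c in match_field]
    m = len(match_field)
    bias = m + 1 - weights.count(1)
    activation = sum(w * b for w, b in zip(weights, message)) + bias
    return activation > m + 0.9  # the paper's match-node threshold

# Exhaustive agreement check over all 2^m messages for sample fields.
for field in ['10#01', '10#', '####', '0000']:
    for msg in product([0, 1], repeat=len(field)):
        assert classifier_matches(field, msg) == match_node_fires(field, msg)
```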
\n\nFigure 1: Result of mapping a classifier system with two classifiers into a neural network. Classifier 1 has match field (0 1 #), message field (1 1 0), and strength 49.3. Classifier 2 has match field (1 1 #), message field (0 1 1), and strength 21.95. (The figure shows, for each classifier network, its message nodes with threshold .9, its post node with threshold .9, its match nodes with threshold 3.9, and the environment input nodes.) \n\n4 Concluding Comments \n\nThe transformation procedure described above will map a classifier system into a neural network that operates in the same way. There are several points raised by the techniques used to accomplish the mapping. In closing, let us consider four of them. \n\nFirst, there is some excess complexity in the classifier networks as they are shown here. In fact, one could eliminate all non-environmental match nodes and their links, since one can determine whenever a classifier network is reweighted whether it matches the message of each other classifier network in the system. If so, one could introduce a link directly from the post node of the other classifier network to the post node of the new network. The match nodes to the environment are necessary, as long as one cannot predict what messages the environment will post. Message nodes are necessary as long as messages must be sent out to the environment. If not, they and their incoming links could be eliminated as well. These simplifications have not been introduced here because the extensions discussed next require the complexity of the current architecture. \n\nSecond, on the genetic algorithm side, the classifier system considered here is an extremely simple one. 
There are many extensions and refinements that have been used by classifier system researchers. I believe that such refinements can be handled by expanded mapping procedures and by modifications of the architecture of the classifier networks. To give an indication of the way such modifications would go, let us consider two sample cases. The first is the case of an environment that may produce multiple messages on each cycle. To handle multiple messages, an additional link must be added to each environmental match node with weight set to the match node's threshold. This link will latch the match node. An additional match node with links to the environment nodes must be added, and a latched counting node must be attached to it. Given these two architectural modifications, the cycle is modified as follows: During the message matching cycle, a series of subcycles is carried out, one for each message posted by the environment. In each subcycle, an environmental message is input and each environmental match node computes its activation. The environmental match nodes are latched, so that each will be active if it matched any environmental message. The count nodes will record how many were matched by each classifier network. When bid strength is paid from a classifier network to the posters of messages that it matched, the divisor is the number of environmental messages matched as recorded by the count node, plus the number of other messages matched. Finally, when new weights are written onto a classifier network's links, they are written onto the match node connected to the count node as well. 
A second sort of complication is that of pass-through bits: bits that are passed from a message that is matched to the message that is posted. This sort of mechanism can be implemented in an obvious fashion by complicating the structure of the classifier network. Similar complications are produced by considering multiple-message matching, negation, messages to effectors, and so forth. It is an open question whether all such cases can be handled by modifying the architecture and the mapping operator, but I have not yet found one that cannot be so handled. \n\nThird, the classifier networks do not use the sigmoid activation functions that support hill-climbing techniques such as back-propagation. Further, they are recurrent networks rather than strict feed-forward networks. Thus, one might wonder whether the fact that one can carry out such transformations should affect the behavior of researchers in the field. This point is one that is taken up at greater length in the companion paper. My conclusion there is that several of the techniques imported into the neural network domain by the mapping appear to improve the performance of neural networks. These include tracking strength in order to guide the learning process, using genetic operators to modify the network makeup, and using population-level measurements in order to determine what aspects of a network to use in reproduction. The reader is referred to [Montana and Davis 1989] for an example of the benefits to be gained by employing these techniques. \n\nFinally, one might wonder what the import of this proof is intended to be. In my view, this proof and the companion proof suggest some exciting ways in which one can hybridize the learning techniques of each field. 
One such approach and its successful application to a real-world problem is characterized in [Montana and Davis 1989]. \n\nReferences \n\n[1] Belew, Richard K. and Michael Gherrity, \"Back Propagation for the Classifier System\", in preparation. \n\n[2] Davis, Lawrence, \"Mapping Neural Networks into Classifier Systems\", submitted to the 1989 International Conference on Genetic Algorithms. \n\n[3] Goldberg, David E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989. \n\n[4] Holland, John H., Keith J. Holyoak, Richard E. Nisbett, and Paul R. Thagard, Induction, MIT Press, 1986. \n\n[5] Montana, David J. and Lawrence Davis, \"Training Feedforward Neural Networks Using Genetic Algorithms\", submitted to the 1989 International Joint Conference on Artificial Intelligence. \n", "award": [], "sourceid": 162, "authors": [{"given_name": "Lawrence", "family_name": "Davis", "institution": null}]}