{"title": "Bach in a Box - Real-Time Harmony", "book": "Advances in Neural Information Processing Systems", "page_first": 957, "page_last": 963, "abstract": "", "full_text": "Bach in a Box - Real-Time Harmony\n\nRandall R. Spangler and Rodney M. Goodman*\n\nComputation and Neural Systems\n\nCalifornia Institute of Technology, 136-93\n\nPasadena, CA 91125\n\nJim Hawkins†\n88B Milton Grove\n\nStoke Newington, London N16 8QY, UK\n\nAbstract\n\nWe describe a system for learning J. S. Bach's rules of musical harmony. These rules are learned from examples and are expressed as rule-based neural networks. The rules are then applied in real-time to generate new accompanying harmony for a live performer. Real-time functionality imposes constraints on the learning and harmonizing processes, including limitations on the types of information the system can use as input and the amount of processing the system can perform. We demonstrate algorithms for generating and refining musical rules from examples which meet these constraints. We describe a method for incorporating a priori knowledge into the rules which yields significant performance gains. We then describe techniques for applying these rules to generate new music in real-time. We conclude the paper with an analysis of experimental results.\n\n1 Introduction\n\nThe goal of this research is the development of a system to learn musical rules from examples of J. S. Bach's music, and then to apply those rules in real-time to generate new music in a similar style. These algorithms would take as input a melody such as Figure 1 and produce a complete harmony such as Figure 2. Performance of this harmonization in real-time is a challenging problem. It also provides insight into the nature of composing music.\n\n*rspangle@micro.caltech.edu, rogo@micro.caltech.edu\n†jhawkins@cix.compulink.co.uk\n\nFigure 1: Melody for Chorale #1 \"Aus meines Herzens Grunde\"\n\nFigure 2: J. S. Bach's Harmony For Chorale #1\n\nWe briefly review the representation of input data and the process of rulebase generation. Then we focus on methods of increasing the performance of rule-based systems. Finally we present our data on learning the style of Bach.\n\n1.1 Constraints Imposed by Real-Time Functionality\n\nA program which is to provide real-time harmony to accompany musicians at live performances faces two major constraints.\n\nFirst, the algorithms must be fast enough to generate accompaniment without detectable delay between the musician playing the melody and the algorithm generating the corresponding harmony. For musical instrument sounds with sharp attacks (plucked and percussive instruments, such as the harp or piano), delays of even a few tens of milliseconds between the start of the melody note and the start of the harmony notes are noticeable and distracting. This limits the complexity of the algorithm and the amount of information it can process for each timestep.\n\nSecond, the algorithms must base their output only on information from previous timesteps. This differentiates our system from HARMONET (Hild, Feulner and Menzel, 1992) which required knowledge of the next note in the future before generating harmony for the current note.\n\n1.2 Advantages of a Rule-Based Algorithm\n\nA rule-based neural network algorithm was chosen over a recurrent network or a non-linear feed-forward network. 
Neural networks have previously been used for harmonizing music with some success (Mozer, 1991; Todd, 1989). However, rule-based algorithms have several advantages when dealing with music. Almost all music has some sort of rhythm and is tonal, meaning both pitch and duration of individual notes are quantized. This presents problems in the use of continuous networks, which must be overtrained to reasonably approximate discrete behavior. Rule-based systems are inherently discrete, and do not have this problem.\n\nFurthermore, it is very difficult to determine why a non-linear multi-layer network makes a given decision or to extract the knowledge contained in such a network. However, it is straightforward to determine why a rule-based network produced a given result by examining the rules which fired. This aids development of the algorithm, since it is easier to determine where mistakes are being made. It allows comparison of the results to existing knowledge of music theory as shown below, and may provide insight into the theory of musical composition beyond that currently available.\n\nRule-based neural networks can also be modified via segmentation to take advantage of additional a priori knowledge.\n\n2 Background\n\n2.1 Representation of Input Data\n\nThe choice of input representation greatly affects the ability of a learning algorithm to generate meaningful rules. The learning and inferencing algorithms presented here speak an extended form of the classical figured bass representation common in Bach's time. Paired with a melody, figured bass provides a sufficient amount of information to reconstruct the harmonic content of a piece of music.
\n\nFigured bass has several characteristics which make it well-suited to learning rules. It is a symbolic format which uses a relatively small alphabet of symbols. It is also hierarchical: it specifies first the chord function that is to be played at the current note/timestep, then the scale step to be played by the bass voice, then additional information as needed to specify the alto and tenor scale steps. This allows our algorithm to fire sets of rules sequentially, to first determine the chord function which should be associated with a new melody note, and then to use that chord function as an input attribute to subsequent rulebases which determine the bass, alto, and tenor scale steps. In this way we can build up the final chord from simpler pieces, each governed by a specialized rulebase.\n\n2.2 Generation of Rulebases\n\nOur algorithm was trained on a set of 100 harmonized Bach chorales. These were translated from MIDI format into our figured bass format by a preprocessing program which segmented them into chords at points where any voice changed pitch. Chord function was determined by simple table lookup in a table of 120 common Bach chords based on the scale steps played by each voice in the chord. The algorithm was given information on the current timestep (Mel0-Te0), and the previous two timesteps (Mel1-Func2). This produced a set of 7630 training examples, a subset of which are shown below:\n\nMel0  Func0  So0  Ba0  Al0  Te0  Mel1  Func1  So1  Ba1  Al1  Te1  Mel2  Func2\nD     V      S2   B1   A2   T0   E     I      S1   B0   A0   T2   C     I\nE     I7     S1   B3   A0   T2   D     V      S2   B1   A2   T0   E     I\nF     IV     S0   B1   A2   T1   E     I7     S1   B3   A0   T2   D     V\nG     V      S0   B0   A1   T2   F     IV     S0   B1   A2   T1   E     I7\n\nA rulebase is a collection of rules which predict the same right hand side (RHS) attribute (for example, Function0). All rules have the form IF Y=y... THEN X=x. A rule's order is the number of terms on its left hand side (LHS).\n\nRules are generated from examples using a modified version of the ITRULE algorithm (Goodman et al., 1992). All possible rules are considered and ranked by a measure of the information contained in each rule defined as\n\nJ(X; Y = y) = p(y) [ p(x|y) log( p(x|y) / p(x) ) + (1 - p(x|y)) log( (1 - p(x|y)) / (1 - p(x)) ) ]    (1)\n\nThis measure trades off the amount of information a rule contains against the probability of being able to use the rule. Rules are less valuable if they contain little information. Thus, the J-measure is low when p(x|y) is not much higher than p(x). Rules are also less valuable if they fire only rarely (p(y) is small) since those rules are unlikely to be useful in generalizing to new data.\n\nA rulebase generated to predict the current chord's function might start with the following rules:\n\n1. IF Melody0 = E THEN Function0 = I  [p(corr) = 0.621, J-meas = 0.095]\n2. IF Function1 = V AND Melody1 = D AND Melody0 = D THEN Function0 = V7  [p(corr) = 0.624, J-meas = 0.051]\n3. IF Function1 = V AND Melody0 = D THEN Function0 = V7  [p(corr) = 0.662, J-meas = 0.049]\n\n2.3 Inferencing Using Rulebases\n\nRule-based nets are a form of probabilistic graph model. When a rulebase is used to infer a value, each rule in the rulebase is checked in order of decreasing rule J-measure. A rule can fire if it has not been inhibited and all the clauses on its LHS are true. 
When a rule fires, its weight is added to the weight of the value which it predicts. After all rules have had a chance to fire, the result is an array of weights for all predicted values.\n\n2.4 Process of Harmonizing a Melody\n\nInput is received a note at a time as a musician plays a melody on a MIDI keyboard. The algorithm initially knows the current melody note and the data for the last two timesteps. The system first uses a rulebase to determine the chord function which should be played for the current melody note. For example, given the melody note \"C\", it might play a chord function \"IV\", corresponding to an F-Major chord. The program then uses additional rulebases to specify how the chord will be voiced. In the example, the bass, alto, and tenor notes might be set to \"B0\", \"A1\", and \"T2\", corresponding to the notes \"F\", \"A\", and \"C\". The harmony notes are then converted to MIDI data and sent to a synthesizer, which plays them in real-time to accompany the melody.\n\n3 Improvement of Rulebases\n\nThe J-measure is a good measure for determining the information-theoretic worth of rules. However, it is unable to take into account any additional a priori knowledge about the nature of the problem: for example, that harmony rules which use the current melody note as input are more desirable because they avoid dissonance between the melody and harmony.\n\n3.1 Segmentation\n\nA priori knowledge of this nature is incorporated by segmenting rulebases into more- and less-desirable rules based on the presence or absence of a desired LHS attribute such as the current melody note (Melody0). Rules lacking the attribute are removed from the primary set of rules and placed in a second \"fallback\" set. 
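As a minimal sketch of this partitioning step (an illustration only, not the authors' implementation), assume each rule is a dictionary with a hypothetical 'lhs' field mapping attribute names to required values. The function name segment_rules and the rule layout are assumptions made for this example.

```python
def segment_rules(rules, required_attr):
    # Split rules into a primary set (LHS mentions required_attr)
    # and a fallback set (LHS does not), preserving the input order.
    primary = [r for r in rules if required_attr in r['lhs']]
    fallback = [r for r in rules if required_attr not in r['lhs']]
    return primary, fallback


# Hypothetical rules, loosely modelled on the examples in Section 2.2.
rules = [
    {'lhs': {'Melody0': 'E'}, 'rhs': 'I'},
    {'lhs': {'Function1': 'V'}, 'rhs': 'V7'},
    {'lhs': {'Function1': 'V', 'Melody0': 'D'}, 'rhs': 'V7'},
]
primary, fallback = segment_rules(rules, 'Melody0')
```

Keeping the fallback rules in a separate list, rather than deleting them, mirrors the point made in the text that the less desirable rules are set aside but not lost.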
Only in the event that no primary rules are able to fire is the secondary set allowed to fire. This gives greater impact to the primary rules (since they are used first) without the loss of domain size (since the less desirable rules are not actually deleted).\n\nRulebase segmentation provides substantial improvements in the speed of the algorithm in addition to improving its inferencing ability. When an unsegmented rulebase is fired, the algorithm has to compare the current input data with the LHS of every rule in the rulebase. However, processing for a segmented rulebase stops after the first segment which fires a rule on the input data. The algorithm does not need to spend time examining rules in lower-priority segments of that rulebase. This increase in efficiency allows segmented rulebases to contain more rules without impacting performance. The greater number of rules provides a richer and more robust knowledge base for generating harmony.\n\n3.2 Realtime Dependency Pruning\n\nWhen rules are used to infer a value, the rules' weights are summed to generate probabilities. This requires that all rules which are allowed to fire must be independent of one another. Otherwise, one good rule could be overwhelmed by the combined weight of twenty mediocre but virtually identical rules. To prevent this problem, each segment of a rulebase is analyzed to determine which rules are dependent with other rules in the same segment. Two rules are considered dependent if they fire together on more than half the training examples where either rule fires.\n\nFor each rule, the algorithm maintains a list of lower rank rules which are dependent with the rule. This list is used in real-time dependency pruning. 
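The dependency criterion and the per-rule lists described above can be sketched as follows. This is an illustration under stated assumptions: each rule's firing record is a hypothetical set of training-example indices, rules are listed from highest to lowest rank, and the names are_dependent and dependency_lists are invented for the example.

```python
def are_dependent(fires_a, fires_b):
    # fires_a, fires_b: sets of training-example indices where each rule fires.
    either = fires_a | fires_b
    both = fires_a & fires_b
    # Dependent if the rules fire together on more than half of the
    # examples where either rule fires.
    return 2 * len(both) > len(either)


def dependency_lists(firings):
    # firings: list of sets, ordered from highest to lowest rank.
    # Returns, for each rule, the indices of lower-ranked dependent rules.
    return [
        [j for j in range(i + 1, len(firings))
         if are_dependent(firings[i], firings[j])]
        for i in range(len(firings))
    ]


# Hypothetical firing records for three rules.
firings = [{0, 1, 2, 3}, {1, 2, 3}, {7, 8}]
deps = dependency_lists(firings)
```

Because the lists can be computed once from the training data, the real-time step only has to look up a stored list, which fits the speed constraint discussed in Section 1.1.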
Whenever a rule fires on a given input, all rules dependent on it are inhibited for the duration of the input. This ensures that all rules which are able to fire for an input are independent.\n\n3.3 Conflict Resolution\n\nWhen multiple rules fire and predict different values, an algorithm must be used to resolve the conflict. Simply picking the value with the highest weight, while most likely to be correct, leads to monotonous music since a given melody then always produces the same harmony.\n\nTable 1: Rulebase Segments\nRHS        REQUIRED LHS FOR SEGMENT            RULES\nFunction0  Melody0, Function1, Function2       110\n           Melody0, Function1                  380\n           Melody0                             346\nSoprano0   Melody0, Function0                  74\nBass0      Function0, Soprano0                 125\n           (none)                              182\nAlto0      Soprano0, Bass0                     267\n           (none)                              533\nTenor0     Soprano0, Bass0, Alto0, Function0   52\n           Soprano0, Bass0, Alto0              164\n           (none)                              115\n\nTable 2: Rulebase Performance\nRHS        RULEBASE         RULES  AVG EVAL  CORRECT\nFunction0  unsegmented      1825   1825      55%\n           segmented        816    428       56%\n           unsegmented #2   428    428       50%\nSoprano0   unsegmented      74     74        95%\nBass0      unsegmented      307    307       70%\n           segmented        307    162       70%\n           unsegmented #2   162    162       65%\nAlto0      unsegmented      800    800       63%\n           segmented        800    275       63%\n           unsegmented #2   275    275       59%\nTenor0     unsegmented      331    331       73%\n           segmented        331    180       74%\n           unsegmented #2   180    180       67%\n\nTo provide a more varied harmony, our system exponentiates the accumulated rule weights for the possible outcomes to produce probabilities for each value, and the final outcome is chosen randomly based on those probabilities. 
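A minimal sketch of this randomized choice follows. It is an illustration rather than the authors' code: the text says only that the weights are exponentiated to produce probabilities, so the normalization step, the example weights, and the names outcome_probabilities and choose_outcome are assumptions of this sketch.

```python
import math
import random


def outcome_probabilities(weights):
    # weights: dict mapping each predicted value to its accumulated rule weight.
    # Exponentiate, then normalize so the probabilities sum to 1
    # (the normalization is an assumption of this sketch).
    # Subtracting the maximum leaves the result unchanged and avoids overflow.
    top = max(weights.values())
    exps = {v: math.exp(w - top) for v, w in weights.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}


def choose_outcome(weights, rng=random):
    # Draw one value at random according to the computed probabilities.
    probs = outcome_probabilities(weights)
    values = list(probs)
    return rng.choices(values, weights=[probs[v] for v in values], k=1)[0]


# Hypothetical accumulated weights for three candidate chord functions.
weights = {'I': 1.2, 'V7': 0.4, 'IV': -0.3}
probs = outcome_probabilities(weights)
choice = choose_outcome(weights, random.Random(0))
```

The highest-weight value remains the most likely choice, but lower-weight values are still selected some of the time, which is what lets the same melody receive different harmonizations.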
It is because we use the accumulated rule weights to determine these probabilities that all rules which are allowed to fire must be independent of each other.\n\nIf no rules at all fire, the system uses a first-order Bayes classifier to determine the RHS value based on the current melody note. This ensures that the system will always return an outcome compatible with the melody.\n\n4 Results\n\nRulebases were generated for each attribute. Up to 2048 rules were kept in each rulebase. Rules were retained if they were correct at least 30% of the time they fired, and had a J-measure greater than 0.001. The rulebases were then segmented.\n\nThese rulebases were tested on 742 examples derived from 27 chorales not used in the training set. The number of examples correctly inferenced is shown for each rulebase before and after segmentation (Table 2). Also shown is the average number of rules evaluated per test example; inferencing time is proportional to this number.\n\nTo determine whether segmentation was in effect only removing lower J-measure rules, we removed low-order rules from the unsegmented rulebases until they had the same average number of rules evaluated as the segmented rulebases.\n\nIn all cases, segmenting the rulebases reduced the average number of rules evaluated per example without lowering the accuracy of the rulebases (in some cases, segmentation even increased accuracy). Speed gains from segmentation ranged from 80% for Tenor0 up to 320% for Function0. In comparison, simply reducing the size of the unsegmented rulebase to match the speed of the segmented rulebase reduced the number of correctly inferred examples by 4% to 6%.\n\nThe generated rules for harmony have a great deal of similarity to accepted harmonic transitions (Ottman, 1989). 
For example, high-priority rules specify common chord transitions such as V-V7-I (a classic way to end a piece of music).\n\n5 Remarks\n\nThe system described in this paper meets the basic objectives described in Section 1. It learns harmony rules from examples of the music of J. S. Bach. The system is then able to harmonize melodies in real-time. The generated harmonies are sometimes surprising (such as the diminished 7th chord near the end of \"Happy Birthday\"), yet are consistent with Bach harmony.\n\nFigure 3: Algorithm's Bach-Like Harmony for \"Happy Birthday\"\n\nRulebase segmentation is an effective method for incorporating a priori knowledge into learned rulebases. It can provide significant speed increases over unsegmented rulebases with no loss of accuracy.\n\nAcknowledgements\n\nRandall R. Spangler is supported in part by an NSF fellowship.\n\nReferences\n\nJ. Bach (Ed.: A. Riemenschneider). (1941) 371 Harmonized Chorales and 96 Chorale Melodies. Milwaukee, WI: G. Schirmer.\n\nH. Hild, J. Feulner & W. Menzel. (1992) HARMONET: A Neural Net for Harmonizing Chorales in the Style of J. S. Bach. In J. Moody (ed.), Advances in Neural Information Processing Systems 4, 267-274. San Mateo, CA: Morgan Kaufmann.\n\nM. Mozer, T. Soukup. (1991) Connectionist Music Composition Based on Melodic and Stylistic Constraints. In R. Lippmann (ed.), Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan Kaufmann.\n\nP. Todd. (1989) A Connectionist Approach to Algorithmic Composition. Computer Music Journal 13(4):27-43.\n\nR. Goodman, P. Smyth, C. Higgins, J. Miller. (1992) Rule-Based Neural Networks for Classification and Probability Estimation. Neural Computation 4(6):781-804.\n\nR. Ottman. (1989) Elementary Harmony. Englewood Cliffs, NJ: Prentice Hall.
", "award": [], "sourceid": 1470, "authors": [{"given_name": "Randall", "family_name": "Spangler", "institution": null}, {"given_name": "Rodney", "family_name": "Goodman", "institution": null}, {"given_name": "Jim", "family_name": "Hawkins", "institution": null}]}