{"title": "Harmonising Chorales by Probabilistic Inference", "book": "Advances in Neural Information Processing Systems", "page_first": 25, "page_last": 32, "abstract": null, "full_text": "Harmonising Chorales by Probabilistic Inference\n\n\n\n                     Moray Allan and Christopher K. I. Williams\n                      School of Informatics, University of Edinburgh\n                                   Edinburgh EH1 2QL\n           moray.allan@ed.ac.uk, c.k.i.williams@ed.ac.uk\n\n\n\n\n                                        Abstract\n         We describe how we used a data set of chorale harmonisations composed\n         by Johann Sebastian Bach to train Hidden Markov Models. Using a prob-\n         abilistic framework allows us to create a harmonisation system which\n         learns from examples, and which can compose new harmonisations. We\n         make a quantitative comparison of our system's harmonisation perfor-\n         mance against simpler models, and provide example harmonisations.\n\n\n\n1    Introduction\n\nChorale harmonisation is a traditional part of the theoretical education of Western classical\nmusicians. Given a melody, the task is to create three further lines of music which will\nsound pleasant when played simultaneously with the original melody. A good chorale\nharmonisation will show an understanding of the basic `rules' of harmonisation, which\ncodify the aesthetic preferences of the style. Here we approach chorale harmonisation as\na machine learning task, in a probabilistic framework. We use example harmonisations\nto build a model of harmonic processes. This model can then be used to compose novel\nharmonisations.\n\nSection 2 below gives an overview of the musical background to chorale harmonisation.\nSection 3 explains how we can create a harmonisation system using Hidden Markov Mod-\nels. Section 4 examines the system's performance quantitatively and provides example\nharmonisations generated by the system. 
In section 5 we compare our system to related work, and in section 6 we suggest some possible enhancements.\n\n\n2    Musical Background\n\nSince the sixteenth century, the music of the Lutheran church had been centred on the `chorale'. Chorales were hymns, poetic words set to music: a famous early example is Martin Luther's \"Ein' feste Burg ist unser Gott\". At first chorales had only relatively simple melodic lines, but soon composers began to arrange more complex music to accompany the original tunes. In the pieces by Bach which we use here, the chorale tune is taken generally unchanged in the highest voice, and three other musical parts are created alongside it, supporting it and each other. By the eighteenth century, a complex system of rules had developed, dictating what combinations of notes should be played at the same time or following previous notes. The added lines of music should not fit too easily with the melody, but should not clash with it too much either. Dissonance can improve the music, if it is resolved into a pleasant consonance.\n\nFigure 1: Hidden state representations (a) for harmonisation, (b) for ornamentation.\n\nThe training and test chorales used here are divided into two sets: one for chorales in `major' keys, and one for chorales in `minor' keys. Major and minor keys are based around different sets of notes, and musical lines in major and minor keys behave differently.\n\nThe representation we use to model harmonisations divides up chorales into discrete time-steps according to the regular beat underlying their musical rhythm. At each time-step we represent the notes in the various musical parts by counting how far apart they are in terms of all the possible `semitone' notes.\n\n\n3    Harmonisation Model\n\n3.1    HMM for Harmonisation\n\nWe construct a Hidden Markov model in which the visible states are melody notes and the hidden states are chords. 
A sequence of observed events makes up a melody line, and a sequence of hidden events makes up a possible harmonisation for a melody line. We denote the sequence of melody notes as Y and the harmonic motion as C, with y_t representing the melody at time t, and c_t the harmonic state.\n\nHidden Markov Models are generative models: here we model how a visible melody line is emitted by a hidden sequence of harmonies. This makes sense in musical terms, since we can view a chorale as having an underlying harmonic structure, and the individual notes of the melody line as chosen to be compatible with this harmonic state at each time step. We will create separate models for chorales in major and minor keys, since these groups have different harmonic structures.\n\nFor our model we divide each chorale into time steps of a single beat, making the assumption that the harmonic state does not change during a beat. (Typically there are three or four beats in a bar.) We want to create a model which we can use to predict three further notes at each of these time steps, one for each of the three additional musical lines in the harmonisation.\n\nThere are many possible hidden state representations from which to choose. Here we represent a choice of notes by a list of pitch intervals. By using intervals in this way we represent the relationship between the added notes and the melody at a given time step, without reference to the absolute pitch of the melody note. These interval sets alone would be harmonically ambiguous, so we disambiguate them using harmonic labels, which are included in the training data set. Adding harmonic labels means that our hidden symbols not only identify a particular chord, but also the harmonic function that the chord is serving. Figure 1(a) shows the representation used for some example notes. Here (an A major chord) the alto, tenor and bass notes are respectively 4, 9, and 16 semitones below the soprano melody. 
The harmonic label is `T', labelling this as functionally a `tonic' chord. Our representation of both melody and harmony distinguishes between a note which is continued from the previous beat and a repeated note.\n\nWe make a first-order Markov assumption concerning the transition probabilities between the hidden states, which represent choices of chord on an individual beat:\n\n    P(c_t | c_{t-1}, c_{t-2}, ..., c_0) = P(c_t | c_{t-1}).\n\nWe make a similar assumption concerning emission probabilities to model how the observed event, a melody note, results from the hidden state, a chord:\n\n    P(y_t | c_t, ..., c_0, y_{t-1}, ..., y_0) = P(y_t | c_t).\n\nIn the Hidden Markov Models used here, the `hidden' states of chords and harmonic symbols are in fact visible in the data during training. This means that we can learn transition and emission probabilities directly from observations in our training data set of harmonisations. We use additive smoothing (adding 0.01 to each bin) to deal with zero counts in the training data.\n\nUsing a Hidden Markov Model framework allows us to conduct efficient inference over our harmonisation choices. In this way our harmonisation system will `plan' over an entire harmonisation rather than simply making immediate choices based on the local context. This means, for example, that we can hope to compose appropriate `cadences' to bring our harmonisations to pleasant closes rather than finishing abruptly.\n\nGiven a new melody line, we can use the Viterbi algorithm to find the most likely state sequence, and thus harmonisation, given our model. We can also provide alternative harmonisations by sampling from the posterior [see 1, p. 
156], as explained below.\n\n\n3.2    Sampling Alternative Harmonisations\n\nUsing α_{t-1}(j), the probability of seeing the observed events of a sequence up to time t-1 and finishing in state j, we can calculate the probability of seeing the first t-1 events, finishing in state j, and then transitioning to state k at the next step:\n\n    P(y_0, y_1, ..., y_{t-1}, c_{t-1} = j, c_t = k) = α_{t-1}(j) P(c_t = k | c_{t-1} = j).\n\nWe can use this to calculate β_t(j|k), the probability that we are in state j at time t-1 given the observed events up to time t-1, and given that we will be in state k at time t:\n\n    β_t(j|k) = P(c_{t-1} = j | y_0, y_1, ..., y_{t-1}, c_t = k) = α_{t-1}(j) P(c_t = k | c_{t-1} = j) / Σ_l α_{t-1}(l) P(c_t = k | c_{t-1} = l).\n\nTo sample from P(C|Y) we first choose the final state by sampling from its probability distribution according to the model:\n\n    P(c_T = j | y_0, y_1, ..., y_T) = α_T(j) / Σ_l α_T(l).\n\nOnce we have chosen a value for the final state c_T, we can use the variables β_t(j|k) to sample backwards through the sequence:\n\n    P(c_t = j | y_0, y_1, ..., y_T, c_{t+1}) = β_{t+1}(j | c_{t+1}).\n\n\n3.3    HMM for Ornamentation\n\nThe chorale harmonisations produced by the Hidden Markov Model described above harmonise the original melody according to beat-long time steps. 
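The forward-filtering, backward-sampling recursion of section 3.2 can be sketched in a few lines of code. This is a minimal illustration under our own naming, not the system's actual implementation: the variables pi, A, B (initial, transition, and emission distributions) and the per-step renormalisation of the forward variables are our own choices.

```python
import numpy as np

def sample_harmonisation(pi, A, B, obs, rng=None):
    """Draw a hidden chord sequence from P(C | Y) for a discrete HMM.

    pi  : (S,) initial state distribution
    A   : (S, S) transitions, A[j, k] = P(c_t = k | c_{t-1} = j)
    B   : (S, V) emissions, B[j, y] = P(y_t = y | c_t = j)
    obs : observed melody symbols y_0 .. y_T
    """
    rng = rng or np.random.default_rng()
    T, S = len(obs), len(pi)

    # Forward pass: alpha[t, j] proportional to P(y_0..y_t, c_t = j),
    # renormalised at each step for numerical stability.
    alpha = np.zeros((T, S))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: sample c_T from alpha[T], then each earlier state
    # with probability proportional to alpha[t, j] * A[j, c_{t+1}].
    states = np.zeros(T, dtype=int)
    states[-1] = rng.choice(S, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        w = alpha[t] * A[:, states[t + 1]]
        states[t] = rng.choice(S, p=w / w.sum())
    return states
```

Calling the function repeatedly yields alternative harmonisations of the same melody, each drawn from the posterior P(C|Y).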
Table 1: Comparison of predictive power achieved by different models of harmonic sequences on training and test data sets (nats).\n\n                                      Training (maj)   Test (maj)   Training (min)   Test (min)\n    -(1/T) ln P(C|Y)                       2.56            4.90          2.66            5.02\n    -(1/T) Σ_t ln P(c_t | y_t)             3.00            3.22          3.52            4.33\n    -(1/T) Σ_t ln P(c_t | c_{t-1})         5.41            7.08          5.50            7.21\n    -(1/T) Σ_t ln P(c_t)                   6.43            7.61          6.57            7.84\n\nChorale harmonisations are not limited to this rhythmic form, so here we add a secondary ornamentation stage which can add passing notes to decorate these harmonisations. Generating a harmonisation and adding the ornamentation as a second stage greatly reduces the number of hidden states in the initial harmonisation model: if we went straight to fully-ornamented hidden states then the data available to us concerning each state would be extremely limited. Moreover, since the passing notes do not change the harmonic structure of a piece but only ornament it, adding these passing notes after first determining the harmonic structure for a chorale is a plausible compositional process.\n\nWe conduct ornamentation by means of a second Hidden Markov Model. The notes added in this ornamentation stage generally smooth out the movement between notes in a line of music, so we set up the visible states in terms of how much the three harmonising musical lines rise or fall from one time-step to the next. The hidden states describe ornamentation of this motion in terms of the movement made by each part during the time step, relative to its starting pitch. This relative motion is described at a time resolution four times as fine as the harmonic movement. 
On the first of the four quarter-beats we always leave notes\nas they were, so we have to make predictions only for the final three quarter-beats. Figure\n1(b) shows an example of the representation used. In this example, the alto and tenor lines\nremain at the same pitch for the second quarter-beat as they were for the first, and rise by\ntwo semitones for the third and fourth quarter-beats, so are both represented as `0,0,2,2',\nwhile the bass line does not change pitch at all, so is represented as `0,0,0,0'.\n\n\n4     Results\n\nOur training and test data are derived from chorale harmonisations by Johann Sebastian\nBach.1 These provide a relatively large set of harmonisations by a single composer, and are\nlong established as a standard reference among music theorists. There are 202 chorales in\nmajor keys of which 121 were used for training and 81 used for testing; and 180 chorales\nin minor keys (split 108/72).\n\nUsing a probabilistic framework allows us to give quantitative answers to questions about\nthe performance of the harmonisation system. There are many quantities we could com-\npute, but here we will look at how high a probability the model assigns to Bach's own\nharmonisations given the respective melody lines. We calculate average negative log prob-\nabilities per symbol, which describe how predictable the symbols are under the model.\nThese quantities provide sample estimates of cross-entropy. 
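Concretely, given the probability the model assigned to each of Bach's chord symbols, the reported quantity is just the mean negative log probability. A minimal sketch (the function name and the example values are ours):

```python
import math

def avg_neg_log_prob(probs):
    """Average negative log probability per symbol, in nats: a sample
    estimate of the cross-entropy between Bach's choices and the model."""
    return -sum(math.log(p) for p in probs) / len(probs)

# A model that assigned probability exp(-2.56) ~ 0.077 to every chord
# would score 2.56 nats, matching the best training figure in Table 1.
```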
Whereas verbal descriptions of harmonisation performance are unavoidably vague and hard to compare, these figures allow our model's performance to be directly compared with that of any future probabilistic harmonisation system.\n\n    1We used a computer-readable edition of Bach's chorales downloaded from ftp://i11ftp.ira.uka.de/pub/neuro/dominik/midifiles/bach.zip\n\nFigure 2: Most likely harmonisation under our model of chorale K4, BWV 48\n\nFigure 3: Most likely harmonisation under our model of chorale K389, BWV 438\n\nTable 1 shows the average negative log probability per symbol of Bach's chord symbol sequences given their respective melodic symbol sequences, -(1/T) ln P(C|Y), on training and test data sets of chorales in major and minor keys. As a comparison we give analogous negative log probabilities for a model predicting chord states from their respective melody notes, -(1/T) Σ_t ln P(c_t | y_t), for a simple Markov chain between the chord states, -(1/T) Σ_t ln P(c_t | c_{t-1}), and for a model which assumes that the chord states are independently drawn, -(1/T) Σ_t ln P(c_t). The Hidden Markov Model here has 5046 hidden chord states and 58 visible melody states.\n\nThe Hidden Markov Model finds a better fit to the training data than the simpler models: to choose a good chord for a particular beat we need to take into account both the melody note on that beat and the surrounding chords. Even the simplest model of the data, which assumes that each chord is drawn independently, performs worse on the test data than the training data, showing that we are suffering from sparse data. There are many chords, chord to melody note emissions, and especially chord to chord transitions, that are seen in the test data but never occur in the training data. 
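The additive smoothing described in section 3.1 amounts to adding a small pseudo-count to every bin before normalising, so that unseen transitions receive small but non-zero probability. A minimal sketch of the transition estimate (the function and variable names are ours, not the authors'):

```python
from collections import Counter

def smoothed_transitions(chord_sequences, states, eps=0.01):
    """Estimate P(c_t | c_{t-1}) by counting bigrams over the training
    harmonisations, adding eps to every bin before normalising."""
    counts = Counter()
    for seq in chord_sequences:
        counts.update(zip(seq, seq[1:]))  # consecutive chord pairs
    probs = {}
    for prev in states:
        total = sum(counts[(prev, nxt)] + eps for nxt in states)
        for nxt in states:
            probs[(prev, nxt)] = (counts[(prev, nxt)] + eps) / total
    return probs
```

With 5046 chord states there are roughly 25 million possible transitions, most of which are never observed, so almost all entries of the table come from the pseudo-counts alone.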
The models' performance with unseen data could be improved by using a more sophisticated smoothing method, for example taking into account the overall relative frequencies of harmonic symbols when assigning probabilities to unseen chord transitions. However, this lower performance with unseen test data is not a problem for the task we approach here, of generating new harmonisations, as long as we can learn a large enough vocabulary of events from the training data to be able to find good harmonisations for new chorale melodies.\n\nFigures 2 and 3 show the most likely harmonisations under our model for two short chorales. The system has generated reasonable harmonisations. We can see, for example, passages of parallel and contrary motion between the different parts. There is an appropriate harmonic movement through the harmonisations, and they come to plausible cadences.\n\nThe generated harmonisations suffer somewhat from not taking into account the flow of the individual musical lines which we add. There are large jumps, especially in the bass line, more often than is desirable; the bass line suffers most since it has the greatest variance with respect to the soprano melody. This excessive jumping also feeds through to reduce the performance of the ornamentation stage, creating visible states which are unseen in the training data. The model structure means that the most likely harmonisation leaves these states unornamented. Nevertheless, where ornamentation has been added it fits with its context and enhances the harmonisations.\n\nThe authors will publish further example harmonisations, including MIDI files, online at http://www.tardis.ed.ac.uk/~moray/harmony/.\n\n\n5    Relationship to previous work\n\nEven while Bach was still composing chorales, music theorists were catching up with musical practice by writing treatises to explain and to teach harmonisation. 
Two famous ex-\namples, Rameau's Treatise on Harmony [2] and the Gradus ad Parnassum by Fux [3],\nshow how musical style was systematised and formalised into sets of rules. The traditional\nformulation of harmonisation technique in terms of rules suggests that we might create an\nautomatic harmonisation system by finding as many rules as we can and encoding them as\na consistent set of constraints. Pachet and Roy [4] provide a good overview of constraint-\nbased harmonisation systems. For example, one early system [5] takes rules from Fux and\nassigns penalties according to the seriousness of each rule being broken. This system then\nconducts a modified best-first search to produce harmonisations. Using standard constraint-\nsatisfaction techniques for harmonisation is problematic, since the space and time needs of\nthe solver tend to rise extremely quickly with the length of the piece.\n\nSeveral systems have applied genetic programming techniques to harmonisation, for ex-\nample McIntyre [6]. These are similar to the constraint-based systems described above,\nbut instead of using hard constraints they encode their rules as a fitness function, and try\nto optimise that function by evolutionary techniques. Phon-Amnuaisuk and Wiggins [7]\nare reserved in their assessment of genetic programming for harmonisation. They make a\ndirect comparison with an ordinary constraint-based system, and conclude that the perfor-\nmance of each system is related to the amount of knowledge encoded in it rather than the\nparticular technique it uses. In their comparison the ordinary constraint-based system actu-\nally performs much better, and they argue that this is because it possesses implicit control\nknowledge which the system based on the genetic algorithm lacks.\n\nEven if they can be made more efficient, these rule-based systems do not perform the full\ntask of our harmonisation system. 
They take a large set of rules written by a human and attempt to find a valid solution, whereas our system learns its rules from examples.\n\nHild et al. [8] use neural networks to harmonise chorales. Like the Hidden Markov Models in our system, these neural networks are trained using example harmonisations. However, while two of their three subtasks use only neural networks trained on example harmonisations, their second subtask, where chords are chosen to instantiate more general harmonies, includes constraint satisfaction. Rules written by a human penalise undesirable combinations of notes, so that they will be filtered out when the best chord is chosen from all those compatible with the harmony already decided. In contrast, our model learns all its harmonic `rules' from its training data.\n\nPonsford et al. [9] use n-gram Markov models to generate harmonic structures. Unlike in chorale harmonisation, there is no predetermined tune with which the harmonies need to fit. The data set they use is a selection of 84 saraband dances, by 15 different seventeenth-century French composers. An automatically annotated corpus is used to train Markov models using contexts of different lengths, and the weighted sum of the probabilities assigned by these models is used to predict harmonic movement. Ponsford et al. create new pieces first by random generation from their models, and secondly by selecting those randomly-generated pieces which match a given template. Using templates gives better results, but the great majority of randomly-generated pieces will not match the template and so will have to be discarded. Using a Hidden Markov Model rather than simple n-grams allows this kind of template to be included in the model as the visible state of the system: the chorale tunes in our system can be thought of as complex templates for harmonisations. Ponsford et al. note that even with their longest context length, the cadences are poor. 
In our system the `planning' ability of Hidden Markov Models, using the combination of chords and harmonic labels encoded in the hidden states, produces cadences which bring the chorale tunes to harmonic closure.\n\nThis paper stems from work described in the first author's MSc thesis [10] carried out in 2002. We have recently become aware that similar work has been carried out independently in Japan by a team led by Prof S. Sagayama [11, 12]. To our knowledge this work has been published only in Japanese.2 The basic frameworks are similar, but there are several differences. First, their system only describes the harmonisation in terms of the harmonic label (e.g. T for tonic) and does not fully specify the voicing of the three harmony lines or ornamentation. Secondly, they do not give a quantitative evaluation of the harmonisations produced as in our Table 1. Thirdly, in [12] a Markov model on blocks of chord sequences rather than on individual chords is explored.\n\n\n6    Discussion\n\nUsing the framework of probabilistic inference allows us to perform efficient inference to generate new chorale harmonisations, avoiding the computational scaling problems suffered by constraint-based harmonisation systems. We described above neural network and genetic algorithm techniques which were less compute-intensive than straightforward constraint satisfaction, but the harmonisation systems using these techniques retain a pre-programmed knowledge base, whereas our model is able to learn its harmonisation constraints from training data.\n\nDifferent forms of graphical model would allow us to take into account more of the dependencies in harmonisation. For example, we could use a higher-order Markov structure, although this by itself would be likely to greatly increase the problems already seen here with sparse data. 
An alternative might be to use an Autoregressive Hidden Markov Model [13], which models the transitions between visible states as well as the hidden state transitions modelled by an ordinary Hidden Markov Model.\n\nNot all of Bach's chorale harmonisations are in the same style. Some of his harmonisations are intentionally complex, and others intentionally simple. We could improve our harmonisations by modelling this stylistic variation, either by manually annotating training chorales according to their style or by training a mixture of HMMs.\n\nAs we only wish to model the hidden harmonic state given the melody, rather than construct a full generative model of the data, Conditional Random Fields (CRFs) [14] provide a related but alternative framework. However, note that training such models (e.g. using iterative scaling methods) is more difficult than the simple counting methods that can be applied to the HMM case. On the other hand the use of the CRF framework would have some advantages, in that additional features could be incorporated. For example, we might be able to make better predictions by taking into account the current time step's position within its musical bar. Music theory recognises a hierarchy of stressed beats within a bar, and harmonic movement should correlate with these stresses. The ornamentation process especially might benefit from a feature-based approach.\n\n    2We thank Yoshinori Shiga for explaining this work to us.\n\nOur system described above only considers chords as sets of intervals, and thus does not have a notion of the key of a piece (other than major or minor). However, voices have a preferred range and thus the notes that should be used do depend on the key, so the key signature could also be used as a feature in a CRF. Taking into account the natural range of each voice would prevent the bass line from descending too low and keep the three parts closer together. 
In general more interesting harmonies result when musical lines are closer together and their movements are more constrained. Another dimension that could be explored with CRFs would be to take into account the words of the chorales, since Bach's own harmonisations are affected by the properties of the texts as well as of the melodies.\n\n\nAcknowledgments\n\nMA gratefully acknowledges support through a research studentship from Microsoft Research Ltd.\n\n\nReferences\n\n [1] R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis. Cambridge University Press, 1998.\n\n [2] J.-P. Rameau. Traité de l'Harmonie réduite à ses principes naturels. Paris, 1722.\n\n [3] J. J. Fux. Gradus ad Parnassum. Vienna, 1725.\n\n [4] F. Pachet and P. Roy. Musical harmonization with constraints: A survey. Constraints, 6(1):7-19, 2001.\n\n [5] B. Schottstaedt. Automatic species counterpoint. Technical report, Stanford University CCRMA, 1989.\n\n [6] R. A. McIntyre. Bach in a box: The evolution of four-part baroque harmony using the genetic algorithm. In Proceedings of the IEEE Conference on Evolutionary Computation, 1994.\n\n [7] S. Phon-Amnuaisuk and G. A. Wiggins. The four-part harmonisation problem: a comparison between genetic algorithms and a rule-based system. In Proceedings of the AISB'99 Symposium on Musical Creativity, 1999.\n\n [8] H. Hild, J. Feulner, and W. Menzel. HARMONET: A neural net for harmonizing chorales in the style of J. S. Bach. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 4, pages 267-274. Morgan Kaufmann, 1992.\n\n [9] D. Ponsford, G. Wiggins, and C. Mellish. Statistical learning of harmonic movement. Journal of New Music Research, 1999.\n\n[10] M. M. Allan. Harmonising Chorales in the Style of Johann Sebastian Bach. Master's thesis, School of Informatics, University of Edinburgh, 2002.\n\n[11] T. Kawakami. 
Hidden Markov Model for Automatic Harmonization of Given Melodies. Master's thesis, School of Information Science, JAIST, 2000. In Japanese.\n\n[12] K. Sugawara, T. Nishimoto, and S. Sagayama. Automatic harmonization for melodies based on HMMs including note-chain probability. Technical Report 2003-MUS-53, Acoustic Society of Japan, December 2003. In Japanese.\n\n[13] P. C. Woodland. Hidden Markov Models using vector linear prediction and discriminative output distributions. In Proc ICASSP, volume I, pages 509-512, 1992.\n\n[14] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional Random Fields: probabilistic models for segmenting and labeling sequence data. In Proc ICML, pages 282-289, 2001.\n", "award": [], "sourceid": 2714, "authors": [{"given_name": "Moray", "family_name": "Allan", "institution": null}, {"given_name": "Christopher", "family_name": "Williams", "institution": null}]}