{"title": "MELONET I: Neural Nets for Inventing Baroque-Style Chorale Variations", "book": "Advances in Neural Information Processing Systems", "page_first": 887, "page_last": 893, "abstract": null, "full_text": "MELONET I: Neural Nets for Inventing Baroque-Style Chorale Variations\n\nDominik H\u00f6rnel\ndominik@ira.uka.de\n\nInstitut f\u00fcr Logik, Komplexit\u00e4t und Deduktionssysteme\nUniversit\u00e4t Fridericiana Karlsruhe (TH)\nAm Fasanengarten 5\nD-76128 Karlsruhe, Germany\n\nAbstract\n\nMELONET I is a multi-scale neural network system producing baroque-style melodic variations. Given a melody, the system invents a four-part chorale harmonization and a variation of any chorale voice, after being trained on music pieces of composers like J. S. Bach and J. Pachelbel. Unlike earlier approaches to the learning of melodic structure, the system is able to learn and reproduce high-order structure like harmonic, motif and phrase structure in melodic sequences. This is achieved by using mutually interacting feedforward networks operating at different time scales, in combination with Kohonen networks to classify and recognize musical structure. The results are chorale partitas in the style of J. Pachelbel. Their quality has been judged by experts to be comparable to improvisations invented by an experienced human organist.\n\n1 INTRODUCTION\n\nThe investigation of neural information structures in music is a rather new, exciting research area bringing together different disciplines such as computer science, mathematics, musicology and cognitive science. One of its aims is to find out what determines the personal style of a composer. 
It has been shown that neural network models - better than other AI approaches - are able to learn and reproduce style-dependent features from given examples, e.g., chorale harmonizations in the style of Johann Sebastian Bach (Hild et al., 1992). However, when dealing with melodic sequences, e.g., folk-song style melodies, all of these models have considerable difficulty learning even simple structures. The reason is that they are unable to capture high-order structure such as harmonies, motifs and phrases simultaneously occurring at multiple time scales. To overcome this problem, Mozer (1994) proposes context units that learn reduced descriptions of a sequence of individual notes. A similar approach in MELONET (Feulner and H\u00f6rnel, 1994) uses delayed update units that do not fire each time their input changes but rather at discrete time intervals. Although these models perform well on artificial sequences, they produce melodies that suffer from a lack of global coherence.\n\nThe art of melodic variation has a long tradition in Western music. Almost every great composer has written pieces inventing variations of a given melody, e.g., Mozart's famous variations KV 265 on the melody \"Ah! Vous dirai-je, Maman\", also known as \"Twinkle twinkle little star\". At the beginning of this tradition stands the baroque type of chorale variations. These are organ or harpsichord variations of a chorale melody composed for use in the Protestant church. A prominent representative of this kind of composition is J. Pachelbel (1653 - 1706), who wrote about 50 chorale variations or partitas on various chorale melodies.\n\n2 TASK DESCRIPTION\n\nGiven a chorale melody, the learning task is achieved in two steps:\n\n1. A chorale harmonization of the melody is invented.\n2. One of the voices of the resulting chorale is chosen and provided with melodic variations. 
\n\nBoth subtasks are directly learned from music examples composed by J. Pachelbel and performed in an interactive composition process which results in a chorale variation of the given melody. The first task is performed by HARMONET, a neural network system which is able to harmonize melodies in the style of various composers like J. S. Bach. The second task is performed by the neural network system MELONET I, presented in the following. For simplicity we have considered melodic variations consisting of 4 sixteenth notes for each melody quarter note. This is the most common variation type used by baroque composers and presents a good starting point for even more complex variation types, since there are enough music examples for training and testing the networks, and because it allows the representation of higher-scale elements in a rather straightforward way.\n\nHARMONET is a system producing four-part chorales in various harmonization styles, given a one-part melody. It solves a musical real-world problem on a performance level appropriate for musical practice. Its power is based on a coding scheme capturing musically relevant information, and on the integration of neural networks and symbolic algorithms in a hierarchical system, combining the advantages of both. The details are not discussed in this paper. See (Hild et al., 1992) or (H\u00f6rnel and Ragg, 1996a) for a detailed account.\n\n3 A MULTI-SCALE NEURAL NETWORK MODEL\n\nThe learning goal is twofold. On the one hand, the results produced by the system should conform to musical rules. These are melodic and harmonic constraints such as the correct resolving of dissonances or the appropriate use of successive interval leaps. On the other hand, the system should be able to capture stylistic features from the learning examples, e.g., melodic shapes preferred by J. Pachelbel. 
The observation of musical rules and the aesthetic conformance to the learning set can be achieved by a multi-scale neural network model. The complexity of the learning task is reduced by decomposition into three subtasks (see Figure 1):\n\n[Figure 1 diagram; recoverable labels: Harmony, k = t mod 4, Melodic Variation]\n\nFigure 1: Structure of the system and process of composing a new melodic variation. A melody (previously harmonized by HARMONET) is passed to the supernet, which predicts the current motif class MC_T from a local window given by melody notes M_T to M_{T+2} and the preceding motif class MC_{T-1}. A similar procedure is performed at a lower time scale by the subnet, which predicts the next motif note N_t based on MC_T, the current harmony H_T and the preceding motif note N_{t-1}. The result is then returned to the supernet through the motif classifier to be considered when computing the next motif class MC_{T+1}.\n\n1. A melody variation is considered at a higher time scale as a sequence of melodic groups, so-called motifs. Each quarter note of the given melody is varied by one motif. Before training the networks, motifs are classified according to their similarity.\n\n2. One neural network is used to learn the abstract sequence of motif classes. Motif classes are represented in a 1-of-n coding form where n is a fixed number of classes. The question it solves is: What kind of motif 'fits' a melody note depending on melodic context and the motif that has occurred before? No concrete notes are fixed by this network. It works at a higher scale and will therefore be called supernet in the following.\n\n3. Another neural network learns the implementation of abstract motif classes into concrete notes depending on a given harmonic context. It produces a sequence of sixteenth notes - four notes per motif - that results in a melodic variation of the given melody. 
Because it works one scale below the supernet, it is called subnet.\n\n4. The subnet sometimes invents a sequence of notes that does not coincide with the motif class determined by the supernet. This motif will be considered when computing the next motif class, however, and should therefore match the notes previously formed by the subnet. It is therefore reclassified by the motif classifier before the supernet determines the next motif class.\n\nThe motivation for this separation into supernet and subnet arose from the following consideration: Having a neural network that learns sequences of sixteenth notes, it would be easier for this network to predict notes given a contour of each motif, i.e., a sequence of interval directions to be produced for each quarter note. Consider a human organist who improvises a melodic variation of a given melody in real time. Because he has to make his decisions in a fraction of a second, he must at least have some rough idea in mind about what kind of melodic variation should be applied to the next melody note to obtain a meaningful continuation of the variation. Therefore, a neural network was introduced at a higher time scale, the training of which really improved the overall behavior of the system and did not just shift the learning problem to another time scale.\n\n4 MOTIF CLASSIFICATION AND RECOGNITION\n\nIn order to realize learning at different time scales as described above, we need a recognition component to find a suitable classification of motifs. This can be achieved using unsupervised learning, e.g., agglomerative hierarchical clustering or Kohonen's topological feature maps (Kohonen, 1990). 
The former has the disadvantage, however, that an appropriate distance measure is needed to determine the similarity between short sequences of notes or intervals, whereas the latter yields appropriate motif classes through self-organization within a two-dimensional surface. Figure 2 displays the motif representation and distribution of motif contours over a 10x10 Kohonen feature map. In MELONET I, the Kohonen algorithm is applied to all motifs contained in the training set. Afterwards a corresponding motif classification tree is recursively built from the Kohonen map. By cutting this classification tree at lower levels, we obtain more and more classes. One important remaining problem is to find an appropriate number of classes for the given learning task. This will be discussed in section 6.\n\n[Figure 2 diagram; recoverable labels: Winner, 1st interval, 2nd interval, 3rd interval]\n\nFigure 2: Motif representation example (left) and motif contour distribution (right) over a 10x10 Kohonen feature map developed from one Pachelbel chorale variation (initial update area 6x6, initial adaptation height 0.95, decrease factor 0.995). Each cell corresponds to one unit in the KFM. One can see the arrangement of regions responding to motifs having different motif contours.\n\n5 REPRESENTATION\n\nIn general one can distinguish two groups of motifs: melodic motifs prefer small intervals, mainly seconds; harmonic motifs prefer leaps and harmonizing notes (chord notes). Both motif groups heavily rely on harmonic information. In melodic motifs dissonances should be correctly resolved; in harmonic motifs notes must fit the given harmony. Small deviations may have a significant effect on the quality of musical results. 
Thus our idea was to integrate musical knowledge about interval and harmonic relationships into an appropriate interval representation. Each note is represented by its interval to the first motif note, the so-called reference note. This is an important element contributing to the success of MELONET I. A similar idea for Jazz improvisation was followed in (Baggi, 1992).\n\nThe interval coding shown in Table 1 considers several important relationships: neighboring intervals are realized by overlapping bits, octave invariance is represented using a special octave bit. The activation of the overlapping bit was reduced from 1 to 0.5 in order to allow a better distinction of the intervals. 3 bits are used to distinguish the direction of the interval, 7 bits represent interval size. Complementary intervals such as ascending thirds and descending sixths have similar representations because they lead to the same note and can therefore be regarded as harmonically equivalent. A simple rhythmic element was then added using a tenuto bit (not shown in Table 1) which is set when a note is tied to its predecessor. This final 3+1+7+1=12 bit coding gave the best results in our simulations.\n\nTable 1: Complementary Interval Coding\n\ninterval      direction  octave  interval size\nninth down    1 0 0      1       0    0    0    0    0    0.5  1\noctave down   1 0 0      1       1    0    0    0    0    0    0.5\nseventh down  1 0 0      0       0.5  1    0    0    0    0    0\nsixth down    1 0 0      0       0    0.5  1    0    0    0    0\nfifth down    1 0 0      0       0    0    0.5  1    0    0    0\nfourth down   1 0 0      0       0    0    0    0.5  1    0    0\nthird down    1 0 0      0       0    0    0    0    0.5  1    0\nsecond down   1 0 0      0       0    0    0    0    0    0.5  1\nprime         0 1 0      0       1    0    0    0    0    0    0.5\nsecond up     0 0 1      0       0.5  1    0    0    0    0    0\nthird up      0 0 1      0       0    0.5  1    0    0    0    0\nfourth up     0 0 1      0       0    0    0.5  1    0    0    0\nfifth up      0 0 1      0       0    0    0    0.5  1    0    0\nsixth up      0 0 1      0       0    0    0    0    0.5  1    0\nseventh up    0 0 1      0       0    0    0    0    0    0.5  1\noctave up     0 0 1      1       1    0    0    0    0    0    0.5\nninth up      0 0 1      1       0.5  1    0    0    0    0    0\n\nNow we still need a representation for harmony. It can be encoded as a harmonic field which is a vector of chord notes of the diatonic scale. The tonic T in C major for example contains 3 chord notes - C, E and G - which correspond to the first, third and fifth degree of the C major scale (1010100). This representation may be further improved. We have already mentioned that each note is represented by the interval to the first motif note (reference note). We can now encode the harmonic field starting with the first motif note instead of the first degree of the scale. This is equivalent to rotating the bits of the harmonic field vector. An example is displayed in Figure 3. The harmony of the motif is the dominant D, the first motif note is B which corresponds to the seventh degree of the C major scale. Therefore the harmonic field for D (0100101) is rotated by one position to the right resulting in (1010010). Starting with the first note B, the harmonic field indicates the intervals that lead to harmonizing notes B, D and G. 
In the right part of Figure 3 one can see a correspondence between bits activated in the harmonic field and bits set to 1 in the three interval codings. This kind of representation helps the neural network to directly establish a relationship between intervals and given harmony.\n\n[Figure 3, left: notation of the example motif over harmony D]\n\n                direction  octave  interval size\nthird up        0 0 1      0       0    0.5  1    0    0    0    0\nsixth up        0 0 1      0       0    0    0    0    0.5  1    0\nprime           0 1 0      0       1    0    0    0    0    0    0.5\nharmonic field                     1    0    1    0    0    1    0\n\nFigure 3: Example illustrating the relationship between interval coding and rotated harmonic field. Each note is represented by its interval to the first note.\n\n6 PERFORMANCE\n\nWe carried out several simulations to evaluate the performance of the system. Many improvements could be found however by just listening to the improvisations produced by the neural organist. One important problem was to find an appropriate number of classes for the given learning task. The following table lists the classification rate on the learning and validation set of the supernet and the subnet using 5, 12 and 20 motif classes. The learning set was automatically built from 12 Pachelbel chorale variations corresponding to 2220 patterns for the subnet and 555 for the supernet. The validation set includes 6 Pachelbel variations corresponding to 1396 patterns for the subnet and 349 for the supernet. Supernet and subnet were then trained independently with the RPROP learning algorithm.\n\n                 supernet                           subnet\n                 5 classes  12 classes  20 classes  5 classes  12 classes  20 classes\nlearning set     91.17%     86.85%      87.57%      86.31%     93.92%      95.68%\nvalidation set   49.85%     40.69%      37.54%      79.15%     83.38%      86.96%\n\nThe classification rate of both networks strongly depends on the number of classes, especially on the validation set of the supernet. 
The smaller the number of classes, the better the classification of the supernet, because there are fewer alternatives to choose from. For the subnet we observe the opposite trend: the larger the number of classes, the more easily the subnet can determine concrete motif notes for a given motif class. One can imagine that the optimal number of classes lies somewhere in the middle. Another idea is to form a committee of networks, each of which is trained with a different number of classes.\n\nWe have also tested MELONET I on melodies that do not belong to the baroque era. Figure 4 shows a harmonization and variation of the melody \"Twinkle twinkle little star\" used by Mozart in his famous piano variations. It was produced by a network committee formed by 3*2=6 networks trained with 5, 12 and 20 classes.\n\nFigure 4: Melodic variation on \"Twinkle twinkle little star\"\n\n7 CONCLUSION\n\nWe have presented a neural network system inventing baroque-style variations on given melodies whose qualities are similar to those of an experienced human organist. The complex musical task could be learned by introducing a multi-scale network model with two neural networks cooperating at different time scales, together with an unsupervised learning mechanism able to classify and recognize relevant musical structure.\n\nWe are about to test this multi-scale approach on learning examples of other epochs, e.g., on compositions of classical composers like Haydn and Mozart or on Jazz improvisations. First results confirm that the system is able to reproduce style-specific elements of other kinds of melodic variation as well. 
Another interesting question is whether the global coherence of the musical results may be further improved by adding another network working at a higher level of abstraction, e.g., at a phrase level. In summary, we believe that this approach presents an important step towards the learning of complete melodies.\n\nReferences\n\nDenis L. Baggi. NeurSwing: An Intelligent Workbench for the Investigation of Swing in Jazz. In: Readings in Computer-Generated Music, IEEE Computer Society Press, pp. 79-94, 1992.\nJohannes Feulner, Dominik H\u00f6rnel. MELONET: Neural networks that learn harmony-based melodic variations. In: Proceedings of the 1994 International Computer Music Conference. ICMA \u00c5rhus, pp. 121-124, 1994.\nHermann Hild, Johannes Feulner, Wolfram Menzel. HARMONET: A Neural Net for Harmonizing Chorales in the Style of J. S. Bach. In: Advances in Neural Information Processing 4 (NIPS 4), pp. 267-274, 1992.\nDominik H\u00f6rnel, Thomas Ragg. Learning Musical Structure and Style by Recognition, Prediction and Evolution. In: Proceedings of the 1996 International Computer Music Conference. ICMA Hong Kong, pp. 59-62, 1996.\nDominik H\u00f6rnel, Thomas Ragg. A Connectionist Model for the Evolution of Styles of Harmonization. In: Proceedings of the 1996 International Conference on Music Perception and Cognition. Montreal, 1996.\nTeuvo Kohonen. The Self-Organizing Map. In: Proceedings of the IEEE, Vol. 78, no. 9, pp. 1464-1480, 1990.\nMichael C. Mozer. Neural Network music composition by prediction. In: Connection Science 6(2,3), pp. 247-280, 1994.", "award": [], "sourceid": 1370, "authors": [{"given_name": "Dominik", "family_name": "H\u00f6rnel", "institution": null}]}