{"title": "Synchronized Auditory and Cognitive 40 Hz Attentional Streams, and the Impact of Rhythmic Expectation on Auditory Scene Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 3, "page_last": 9, "abstract": null, "full_text": "Synchronized Auditory and Cognitive 40 Hz Attentional Streams, and the Impact of Rhythmic Expectation on Auditory Scene Analysis \n\nBill Baird \n\nDept. Mathematics, U.C. Berkeley, Berkeley, CA 94720. \nbaird@math.berkeley.edu \n\nAbstract \n\nWe have developed a neural network architecture that implements a theory of attention, learning, and trans-cortical communication based on adaptive synchronization of 5-15 Hz and 30-80 Hz oscillations between cortical areas. Here we present a specific higher order cortical model of attentional networks, rhythmic expectancy, and the interaction of higher-order and primary cortical levels of processing. It accounts for the \"mismatch negativity\" of the auditory ERP and the results of psychological experiments of Jones showing that auditory stream segregation depends on the rhythmic structure of inputs. The timing mechanisms of the model allow us to explain how relative timing information, such as the relative order of events between streams, is lost when streams are formed. The model suggests how the theories of auditory perception and attention of Jones and Bregman may be reconciled. \n\n1  Introduction \n\nAmplitude patterns of synchronized \"gamma band\" (30 to 80 Hz) oscillation have been observed in the ensemble activity (local field potentials) of vertebrate olfactory, visual, auditory, motor, and somatosensory cortex, and in the retina, thalamus, hippocampus, reticular formation, and EMG. Such activity has been found not only in primates, cats, rabbits, and rats, but also in insects, slugs, fish, amphibians, reptiles, and birds. 
This suggests that gamma oscillation may be as fundamental to neural processing at the network level as action potentials are at the cellular level. \nWe have shown how oscillatory associative memories may be coupled to recognize and generate sequential behavior, and how a set of novel mechanisms utilizing these complex dynamics can be configured to solve attentional and perceptual processing problems. For pointers to a full treatment with mathematics and complete references see [Baird et al., 1994]. \nAn important element of intra-cortical communication in the brain, and between modules in this architecture, is the ability of a module to detect and respond to the proper input signal from a particular module when inputs from other modules which are irrelevant to the present computation are contributing crosstalk noise. We have demonstrated that selective control of synchronization, which we hypothesize to be a model of \"attention\", can be used to solve this coding problem and control program flow in an architecture with dynamic attractors [Baird et al., 1994]. \nUsing dynamical systems theory, the architecture is constructed from recurrently interconnected oscillatory associative memory modules that model higher order sensory and motor areas of cortex. The modules learn connection weights between themselves which cause the system to evolve under a 5-20 Hz clocked sensory-motor processing cycle by a sequence of transitions of synchronized 30-80 Hz oscillatory attractors within the modules. The architecture employs selective \"attentional\" control of the synchronization of the 30-80 Hz gamma band oscillations between modules to direct the flow of computation to recognize and generate sequences. 
The 30-80 Hz attractor amplitude patterns code the information content of a cortical area, whereas phase and frequency are used to \"softwire\" the network, since only the synchronized areas communicate by exchanging amplitude information. The system works like a broadcast network where the unavoidable crosstalk to all areas from previous learned connections is overcome by frequency coding to allow the moment-to-moment operation of attentional communication only between selected task-relevant areas. \nThe behavior of the time traces in different modules of the architecture models the temporary appearance and switching of the synchronization of 5-20 and 30-80 Hz oscillations between cortical areas that is observed during sensorimotor tasks in monkeys and humans. The architecture models the 5-20 Hz evoked potentials seen in the EEG as the control signals which determine the sensory-motor processing cycle. The 5-20 Hz clocks which drive these control signals in the architecture model thalamic pacemakers, which are thought to control the excitability of neocortical tissue through similar nonspecific biasing currents that cause the cognitive and sensory evoked potentials of the EEG. The 5-20 Hz cycles \"quantize time\" and form the basis of derived somato-motor rhythms with periods up to seconds that entrain to each other in motor coordination and to external rhythms in speech perception [Jones et al., 1981]. \n\n1.1  Attentional Streams of Synchronized 40 Hz Activity \n\nThere is extensive evidence for the claim of the model that the 30-80 Hz gamma band activity in the brain accomplishes attentional processing, since 40 Hz appears in cortex when and where attention is required. For example, it is found in somatosensory, motor and premotor cortex of monkeys when they must pick a raisin out of a small box, but not when a habitual lever press delivers the reward. 
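As a toy illustration of the frequency-coding idea above (ours, not part of the paper's simulation), a receiver tuned to one gamma frequency accumulates amplitude only from senders inside its passband, so crosstalk from modules at other frequencies is ignored; the `passband` width and the frequency/amplitude pairs are illustrative assumptions:

```python
def received_amplitude(receiver_freq, senders, passband=2.0):
    """Sum amplitude only from senders whose oscillation frequency lies
    inside the receiver's passband -- crosstalk from modules broadcasting
    at other frequencies contributes nothing (the 'softwiring' idea)."""
    total = 0.0
    for freq, amplitude in senders:
        if abs(freq - receiver_freq) <= passband:
            total += amplitude
    return total

# Three modules broadcast at different gamma frequencies; a receiver tuned
# to 40 Hz reads only the synchronized, task-relevant module.
senders = [(40.0, 0.9), (35.0, 0.7), (55.0, 0.5)]
print(received_amplitude(40.0, senders))  # 0.9: only the 40 Hz module gets through
```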
In human attention experiments, 30-80 Hz activity goes up in the contralateral auditory areas when subjects are instructed to pay attention to one ear and not the other. Gamma activity declines in the dominant hemisphere along with errors in a learnable target-and-distractors task, but not when the distractors and target vary at random on each trial. Anesthesiologists use the absence of 40 Hz activity as a reliable indicator of unconsciousness. Recent work has shown that cats with convergent and divergent strabismus who fail on tasks where perceptual binding is required also do not exhibit cortical synchrony. This is evidence that gamma synchronization is perceptually functional and not epiphenomenal. \nThe architecture illustrates the notion that synchronization of gamma band activity not only \"binds\" the features of inputs in primary sensory cortex into \"objects\", but further binds the activity of an attended object to oscillatory activity in associational and higher-order sensory and motor cortical areas to create an evolving attentional network of intercommunicating cortical areas that directs behavior. The binding of sequences of attractor transitions between modules of the architecture by synchronization of their activity models the physiological mechanism for the formation of perceptual and cognitive \"streams\" investigated by Bregman [Bregman, 1990], Jones [Jones et al., 1981], and others. In audition, according to Bregman's work, successive events of a sound source are bound together into a distinct sequence or \"stream\" and segregated from other sequences so that one pays attention to only one sound source at a time (the cocktail party problem). Higher order cortical or \"cognitive\" streams are in evidence when subjects are unable to recall the relative order of the telling of events between two stories told in alternating segments. 
\nMEG tomographic observations show large scale rostral-to-caudal motor-sensory sweeps of coherent thalamo-cortical 40 Hz activity across the entire brain, the phase of which is reset by sensory input in waking, but not in dream states [Llinas and Ribary, 1993]. This suggests an inner higher order \"attentional stream\" is constantly cycling between motor (rostral) and sensory (caudal) areas in the absence of input. It may be interrupted by input \"pop out\" from primary areas, or it may reach down as a \"searchlight\" to synchronize with particular ensembles of primary activity to be attended. \n\n2  Jones' Theory of Dynamic Attention \n\nJones [Jones et al., 1981] has developed a psychological theory of attention, perception, and motor timing based on the hypothesis that these processes are organized by neural rhythms in the range of 10 to .5 Hz - the range within which subjects perceive periodic events as a rhythm. These rhythms provide a multiscale representation of time and selectively synchronize with the prominent periodicities of an input to provide a temporal expectation mechanism for attention to target particular points in time. \nFor example, some work suggests that the accented parts of speech create a rhythm to which listeners entrain. Attention can then be focused on these expected locations as recognition anchor points for inference of less prominent parts of the speech stream. This is the temporal analog of the body centered spatial coordinate frame and multiscale covert attention window system in vision. Here the body centered temporal coordinates of the internal time base orient by entrainment to the external rhythm, and the window of covert temporal attention can then select a level of the multiscale temporal coordinates. 
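The entrainment step can be caricatured in a few lines (a minimal sketch of ours, not Jones' actual model): estimate the external rhythm's period from the observed accent onsets and phase-lock to the last one, so attention can target the next expected point in time:

```python
def entrain(onsets):
    """Estimate the period of an external rhythm from its onset times and
    phase-lock to it: the next attended point in time is one period after
    the last observed onset. A crude stand-in for an entrained time base."""
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    period = sum(intervals) / len(intervals)
    return onsets[-1] + period

# Accents every 240 ms; attention is directed to the next expected accent.
print(entrain([0.0, 0.24, 0.48, 0.72]))  # next expected accent near 0.96 s
```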
\nIn this view, just as two cortical areas must synchronize to communicate, so must two nervous systems. Work using frame-by-frame film analysis of human verbal interaction shows evidence of \"interactional synchrony\" of gesture and body movement changes and EEG of both speaker and listener with the onsets of phonemes in speech at the level of a 10 Hz \"microrhythm\" - the base clock rate of our models. Normal infants synchronize their spontaneous body flailings at this 10 Hz level to the mother's voice accents, while autistic and schizophrenic children fail to show interactional synchrony. Autistics are unable to tap in time to a metronome. \nNeural expectation rhythms that support Jones' theory have been found in the auditory EEG. In experiments where the arrival time of a target stimulus is regular enough to be learned by an experimental subject, it has been shown that the 10 Hz activity in advance of the stimulus becomes phase locked to that expected arrival time. This fits our model of rhythmic expectation, where the 10 Hz rhythm is a fast base clock that is shifted in phase and frequency to produce a match in timing between the stimulus arrival and the output of longer period cycles derived from this base clock. \n\n2.1  Mismatch Negativity \n\nThe \"mismatch negativity\" (MMN) [Naatanen, 1992] of the auditory evoked potential appears to be an important physiological indicator of the action of a neural expectancy system like that proposed by Jones. It has been localized to areas within primary auditory cortex by MEG studies [Naatanen, 1992], and it appears as an increased negativity of the ERP in the region of the N200 peak whenever a psychologically discriminable deviation of a repetitive auditory stimulus occurs. 
Mismatch is caused by deviations in onset or offset time, rise time, frequency, loudness, timbre, phonetic structure, or spatial location of a tone in the sequence. The mismatch is abolished by blockers of the action of NMDA channels [Naatanen, 1992], which are important for the synaptic changes underlying the kind of Hebbian learning which is used in the model. \nMMN is not a direct function of echoic memory, because it takes several repetitions for the expectancy to begin to develop, and it decays in 2-4 seconds. It appears only for repetition periods greater than 50-100 msec and less than 2-4 seconds. Thus the time scale of its operation is in the appropriate range for Jones' expectancy system. Stream formation also takes several cycles of stimulus repetition to build up over 2-4 seconds and decays away within 2-4 seconds in the absence of stimulation. Those auditory stimulus features which cause streaming are also features which cause mismatch. This supports the hypothesis in the model that these phenomena are functionally related. \nFinally, MMN can occur independent of attention - while a subject is reading or doing a visual discrimination task. This implies that the auditory system at least must have its own timing system that can generate timing and expectancies independent of other behavior. We can talk or do internal verbal thinking while doing other tasks. A further component of this negativity appears in prefrontal cortex and is thought by Naatanen to initiate attentional switching toward the deviant event causing perceptual \"pop out\" [Naatanen, 1992]. \nStream formation is known to affect rhythm perception. The galloping rhythm of high H and low L tones - HLH-HLH-HLH, for example - becomes two separate isochronous rhythmic streams of H-H-H-H and L-L-L-L when the H and L tones are spread far enough apart [Bregman, 1990]. 
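The qualitative time course just described - an expectancy that needs several repetitions to build and decays within a few seconds of silence - can be captured by a leaky saturating trace. This is our illustrative sketch, with arbitrary (not fitted) constants:

```python
import math

def expectancy_trace(event_times, t_end, gain=0.4, tau=1.5, dt=0.01):
    """Leaky trace that builds over several stimulus repetitions and decays
    with time constant tau (~seconds), qualitatively matching the build-up
    and 2-4 s decay of the mismatch-negativity expectancy. All parameter
    values here are illustrative."""
    trace, t, i = 0.0, 0.0, 0
    while t < t_end:
        trace *= math.exp(-dt / tau)            # passive decay
        if i < len(event_times) and event_times[i] <= t:
            trace += gain * (1.0 - trace)       # saturating increment per repetition
            i += 1
        t += dt
    return trace

events = [0.0, 0.5, 1.0, 1.5]                   # four repetitions, 500 ms apart
print(expectancy_trace(events, 1.6))            # strong expectancy after the train
print(expectancy_trace(events, 5.5))            # mostly decayed after ~4 s of silence
```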
Evidence for the effect of input rhythms on stream formation, however, is more sparse, and we focus here on the simulation of a particular set of experiments by Jones [Jones et al., 1981] and Bregman [Bregman, 1990] where this effect has been demonstrated. \n\n2.2  Jones-Bregman Experiment \n\nJones [Jones et al., 1981] replicated and altered a classic streaming experiment of Bregman and Rudnicky [Bregman, 1990], and found that their result depended on a specific choice of the rhythm of presentation. The experiment required human subjects to determine the order of presentation of a pair of high target tones AB or BA of slightly different frequencies. Also presented before and after the target tones were a series of identical much lower frequency tones called the capture tones CCC, and two identical tones of intermediate frequency before and after the target tones called the flanking tones F - CCCFABFCCC. Bregman and Rudnicky found that target order determination performance was best when the capture tones were near to the flanking tones in frequency, and deteriorated as the captor tones were moved away. Their explanation was that the flanking tones were captured by the background capture tone stream when close in frequency, leaving the target tones to stand out by themselves in the attended stream. When the captor tones were absent or far away in frequency, the flanking tones were included in the attended stream and obscured the target tones. \nJones noted that the flanking tones and the capture stream were presented at a stimulus onset rate of one per 240 ms and the targets appeared at 80 ms intervals. In her experiments, when the captor and flanking tones were given a rhythm in common with the targets, no effect of the distance of captor and flanking tones appeared. 
This suggested that rhythmic distinction of targets and distractors was necessary, in addition to the frequency distinction, to allow selective attention to segregate out the target stream. Because performance in the single rhythm case was worse than that for the control condition without captors, it appeared that no stream segregation of targets and captors and flanking tones was occurring until the rhythmic difference was added. From this evidence we make the assumption in the model that the distance of a stimulus in time from a rhythmic expectancy acts like the distance between stimuli in pitch, loudness, timbre, or spatial location as a factor for the formation of separate streams. \n\n3  Architecture and Simulation \n\nTo implement Jones's theory in the model and account for her data, subsets of the oscillatory modules are dedicated to form a rhythmic temporal coordinate frame or time base by dividing down a thalamic 10 Hz base clock rate in steps from 10 to .5 Hz. Each derived clock is created by an associative memory module that has been specialized to act stereotypically as a counter or shift register by repeatedly cycling through all its attractors at the rate of one for each time step of its clock. Its overall cycle time is therefore determined by the number of attractors. Each cycle is guaranteed to be identical, as required for clocklike function, because of the strong attractors that correct the perturbing effect of noise. Only one step of the cycle can send output back to primary cortex - the one with the largest weight from receiving the most match to incoming stimuli. Each clock derived in this manner from a thalamic base clock will therefore phase reset itself to get the best match to incoming rhythms. The match can be further refined by frequency and phase entrainment of the base clock itself. 
\nThree such counters are sufficient to model the rhythms in Jones' experiment, as shown in the architecture of figure 1. The three counters divide the 12.5 Hz clock down to 6.25 and 4.16 Hz. The first contains one attractor at the base clock rate, which has adapted to entrain to the 80 msec period of target stimulation (12.5 Hz). The second cycles at 12.5/2 = 6.25 Hz, alternating between two attractors, and the third steps through three attractors, to cycle at 12.5/3 = 4.16 Hz, which is the slow rhythm of the captor tones. \nThe modules of the time base send their internal 30-80 Hz activity to primary auditory cortex in 100 msec bursts at these different rhythmic rates through fast adapting connections (which would use NMDA channels in the brain) that continually attempt to match incoming stimulus patterns using an incremental Hebbian learning rule. The weights decay to zero over 2-4 sec to simulate the data on the rise and fall of the mismatch negativity. These weights effectively compute a low frequency discrete Fourier transform over a sliding window of several seconds, and the basic periodic structure of rhythmic patterns is quickly matched. This serves to establish a quantized temporal grid of expectations against which expressive timing deviations in speech and music can be experienced. \nFollowing Jones [Jones et al., 1981], we hypothesize that this happens automatically as a constant adaptation to environmental rhythms, as suggested by the mismatch negativity. Retained in these weights of the timebase is a special kind of short term memory of the activity which includes temporal information, since the timebase will partially regenerate the previous activity in primary cortex at the expected recurrence time. This top-down input causes enhanced sensitivity in target units by increasing their gain. 
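The "decaying weights as a sliding-window Fourier transform" idea can be made concrete with a toy score (our sketch, not the model's actual Hebbian rule): each onset contributes a unit phase vector at a candidate clock period, weighted by an exponential decay standing in for the 2-4 s weight fade; the decay constant is an illustrative assumption:

```python
import cmath, math

def rhythm_match(onsets, period, tau=2.0):
    """Score how well a candidate clock period fits recent onsets: each onset
    adds a unit phase vector exp(2*pi*i*t/period), weighted by an exponential
    decay (tau ~ 2 s) mimicking fast Hebbian weights fading over 2-4 s. A
    score near 1 means the onsets are phase locked to the period; this is one
    low-frequency Fourier component over a sliding window."""
    now = max(onsets)
    num, den = 0 + 0j, 0.0
    for t in onsets:
        w = math.exp(-(now - t) / tau)                  # older evidence decays
        num += w * cmath.exp(2j * math.pi * t / period)
        den += w
    return abs(num) / den

captors = [0.0, 0.24, 0.48, 0.72, 0.96]                 # 240 ms captor rhythm
print(rhythm_match(captors, 0.24))   # near 1.0: the period matches
print(rhythm_match(captors, 0.17))   # much lower: no phase locking
```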
Those patterns which meet these established rhythmic expectancy signals in time are thereby boosted in amplitude and pulled into synchrony with the 30-80 Hz attentional searchlight stream to become part of the attentional network sending input to higher areas. In accordance with Jones' theory, voluntary top-down attention can probe input at different hierarchical levels of periodicity by selectively synchronizing a particular cortical column in the time base set to the 40 Hz frequency of the inner attention stream. Then the searchlight into primary cortex is synchronizing and reading in activity occurring at the peaks of that particular time base rhythm. \n\n[Figure 1 diagram] \n\nFigure 1: Horizontally arrayed units at the top model higher order auditory and motor cortical columns which are sequentially clocked by the (thalamic) base clock on the right to alternate attractor transitions between upper hidden (motor) and lower context (sensory) layers to act as an Elman net. Three cortical regions are shown - sequence representation memory, attentional synchronization control, and a rhythmic timebase of three counters. The hidden and context layers consist of binary \"units\" composed of two oscillatory attractors. Activity levels oscillate up and down through the plane of the paper. Dotted lines show frequency shifting outputs from the synchronization (attention) control modules. The lower vertical set of units is a sample of primary auditory cortex frequency channels at the values used in the Jones-Bregman experiment. The dashed lines show the rhythmic pattern of the target, flanking, and captor tones moving in time from left to right to impact on auditory cortex. \n\n3.1  Cochlear and Primary Cortex Model \n\nAt present, we have modeled only the minimal aspects of primary auditory cortex sufficient to qualitatively simulate the Jones-Bregman experiment, but the principles at work allow expansion to larger scale models with more stimulus features. We simulate four sites in auditory cortex corresponding to the four frequencies of stimuli used in the experiment, as shown in figure 1. There are two close high frequency target tones, one high flanking frequency location, and the low frequency location of the captor stream. 
These cortical locations are modeled as oscillators with the same equations used for associative memory modules [Baird et al., 1994], with full linear cross coupling weights. This lateral connectivity is sufficient to promote synchrony among simultaneously activated oscillators, but insufficient to activate them strongly in the absence of external input. This makes full synchrony of activated units the default condition in the model cortex, as in Brown's model [Brown and Cooke, 1996], so that the background activation is coherent and can be read into higher order cortical levels which synchronize with it. The system assumes that all input is due to the same environmental source in the absence of evidence for segregation [Bregman, 1990]. \nBrown and Cooke [Brown and Cooke, 1996] model the cochlear and brain stem nuclear output as a set of overlapping bandpass (\"gammatone\") filters consistent with auditory nerve responses and psychophysical \"critical bands\". A tone can excite several filter outputs at once. We approximate this effect of the gammatone filters as a lateral fan-out of input activations with weights that spread the activation in the same way as the overlapping gammatone filters do. \nExperiments show that the intrinsic resonant or \"natural\" frequencies or \"eigenfrequencies\" of cortical tissue within the 30-80 Hz gamma band vary within individuals on different trials of a task, and that neurotransmitters can quickly alter these resonant frequencies of neural clocks. Following the evidence that the oscillation frequency of binding in vision goes up with the speed of motion of an object, we assume that unattended activity in auditory cortex synchronizes at a default background frequency of 35 Hz, while the higher order attentional stream is at a higher frequency of 40 Hz. 
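The lateral fan-out approximation of overlapping gammatone filters described above can be sketched with a Gaussian fall-off in frequency; the channel frequencies and bandwidth here are illustrative assumptions, not the experiment's actual values:

```python
import math

def fan_out(channel_freqs, tone_freq, bandwidth=200.0):
    """Approximate overlapping gammatone filters as a lateral fan-out of
    input activation: a tone activates nearby frequency channels with a
    weight that falls off with distance in frequency. The Gaussian shape
    and bandwidth are illustrative, not a fitted gammatone response."""
    return [math.exp(-((f - tone_freq) / bandwidth) ** 2)
            for f in channel_freqs]

# Captor, flanking, and two target channels (arbitrary Hz values).
channels = [400.0, 2000.0, 2100.0, 2200.0]
acts = fan_out(channels, 2100.0)
print([round(a, 3) for a in acts])   # distant captor channel is barely excited
```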
Just as fast motion in vision can cause stimulus driven capture of attention, we hypothesize that expectancy mismatch in audition causes the deviant activity to be boosted above the default background frequency to facilitate synchronization with the attentional stream at 40 Hz. This models the mechanism of involuntary stimulus driven attentional \"pop out\". Multiple streams of primary cortex activity synchronized at different eigenfrequencies can be selectively attended by uniformly sweeping the eigenfrequencies of all primary ensembles through the passband of the 40 Hz higher order attentional stream to \"tune in\" each in turn as a radio receiver does. \nFollowing, but modifying, the approach of Brown and Cooke [Brown and Cooke, 1996], the core of our primary cortex stream forming model is a fast learning rule that reduces the lateral coupling and (in our model) spreads apart the intrinsic cortical frequencies of sound frequency channels that do not exhibit the same amplitude of activity at the same time. This coupling and eigenfrequency difference recovers between onsets. In the absence of lateral synchronizing connections or coherent top down driving, synchrony between cortical streams is rapidly lost because of their distant resonant frequencies. Activity not satisfying the Gestalt principle of \"common fate\" [Bregman, 1990] is thus decorrelated. \nThe trade-off of the effect of temporal and sound frequency proximity on stream segregation follows because close stimulus frequencies excite each other's channel filters. Each produces a similar output in the other, and their activities are not decorrelated by coupling reduction and resonant frequency shifts. 
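The fast decorrelation rule just described can be sketched as a discrete update (our schematic, not the model's continuous dynamics): channel pairs that are not active together lose coupling and have their eigenfrequencies pushed apart, while co-active pairs recover. The learning/recovery constants, and the fixed direction of the frequency split, are arbitrary illustrative choices:

```python
def update_streams(coupling, eigfreq, active, lr=0.5, recovery=0.1, spread=1.0):
    """One step of the fast decorrelation rule (sketch): pairs of channels
    violating 'common fate' lose lateral coupling and have their
    eigenfrequencies spread apart; otherwise coupling recovers toward 1."""
    n = len(active)
    for i in range(n):
        for j in range(i + 1, n):
            if active[i] != active[j]:               # not co-active
                c = max(0.0, coupling[i][j] - lr)
                coupling[i][j] = coupling[j][i] = c
                eigfreq[i] -= spread                  # push the pair apart
                eigfreq[j] += spread
            else:                                     # recovery between onsets
                c = min(1.0, coupling[i][j] + recovery)
                coupling[i][j] = coupling[j][i] = c
    return coupling, eigfreq

coupling = [[1.0, 1.0], [1.0, 1.0]]
eigfreq = [35.0, 35.0]                  # both start at the 35 Hz background
# Alternating onsets in two distant channels: coupling collapses and the
# eigenfrequencies split, so the channels form separate streams.
for step in range(4):
    update_streams(coupling, eigfreq, [step % 2 == 0, step % 2 == 1])
print(coupling[0][1], eigfreq)          # coupling cut; frequencies spread apart
```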
On the other hand, to the extent that they are distant enough in sound frequency, each tone onset weakens the weights and shifts the eigenfrequencies of the other channels that are not simultaneously active. This effect is greater, the faster the presentation rate, because the weight recovery rate is overcome. This recovery rate can then be adjusted to yield stream segregation at the rates reported by van Noorden [Bregman, 1990] for given sound frequency separations. \n\n3.2  Sequential Grouping by Coupling and Resonant Frequency Labels \n\nIn the absence of rhythmic structure in the input, the temporary weights and resonant frequency \"labels\" serve as a short term \"stream memory\" to bridge time (up to 4 seconds) so that the next nearby input is \"captured\" or \"sequentially bound\" into the same ensemble of synchronized activity. This pattern of synchrony in primary cortex has been made into a temporary attractor by the temporary weight and eigenfrequency changes from the previous stimulation. This explains the single tone capture experiments where a series of identical tones captures later nearby tones. For two points in time to be sequentially grouped by this mechanism, there is no need for activity to continue between onsets as in Brown's model [Brown and Cooke, 1996], or to be held in multiple spatial locations as Wang [Wang, 1995] does. Since the gamma band response to a single auditory input onset lasts only 100-150 ms, there is no 40 Hz activity available in primary cortex (at most stimulus rates) for successive inputs to synchronize with for sequential binding by these mechanisms. 
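The capture behavior of this stream memory can be caricatured as a simple assignment rule (our sketch; the frequency threshold and 4 s memory span are illustrative stand-ins for the coupling and eigenfrequency labels):

```python
def assign_streams(tones, max_df=200.0, memory=4.0):
    """Sequential grouping sketch: each incoming tone (time, freq) joins an
    existing stream whose frequency 'label' is close enough and whose last
    onset is within the ~4 s stream memory; otherwise it starts a new
    stream. Thresholds are illustrative, not fitted to van Noorden's data."""
    streams = []                       # each entry: [label_freq, last_time, members]
    for t, f in tones:
        for s in streams:
            if abs(f - s[0]) <= max_df and t - s[1] <= memory:
                s[0], s[1] = f, t      # update the label and capture the tone
                s[2].append((t, f))
                break
        else:
            streams.append([f, t, [(t, f)]])
    return [s[2] for s in streams]

# A galloping HLH sequence with a large H-L separation splits in two.
tones = [(0.0, 2000), (0.1, 500), (0.2, 2000),
         (0.4, 2000), (0.5, 500), (0.6, 2000)]
print([len(s) for s in assign_streams(tones)])   # [4, 2]: H and L streams
```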
\nFurthermore, the decorrelation rule, when added to the mechanism of timing expectancies, explains the loss of relative timing (order) between streams, since the lateral connections that normally broadcast actual and expected onsets across auditory cortex are cut between two streams by the decorrelating weight reduction. Expected and actual onset events in different streams can no longer be directly (locally) compared. Experimental evidence for the broadcast of expectancies comes from the fast generalization to other frequencies of a learned expectancy for the onset time of a tone of a particular frequency (Schreiner lab - personal communication). \nWhen rhythmic structure is present, the expectancy system becomes engaged, and this becomes an additional feature dimension along which stimuli can be segregated. Distance from expected timing as well as sound quality is now an added factor causing stream formation by decoupling and eigenfrequency shift. Feedback of expected input can also partially \"fill in\" missing input for a cycle or two, so that the expectancy protects the binding of features of a stimulus and stabilizes a perceptual stream across seconds of time. \n\n3.3  Simulation of the Jones-Bregman Experiment \n\nFigure 2 shows the architecture used to simulate the Jones-Bregman experiment. The case shown is where the flanking tones are in the same stream as the targets, because the captor stream is at the lower sound frequency channel. At the particular point in time shown here, the first flanking tone has just finished, and the first target tone has arrived. Both channels are therefore active, and synchronized with the attentional stream into the higher order sequence recognizer. 
\nOur mechanistic explanation of the Bregman result is that the early standard target tones arriving at the 80 msec rate first prime the dynamic attention system by setting the 80 msec clock to oscillate at 40 Hz and depressing the oscillation frequency of other auditory cortex background activity. Then the slow captor tones at the 240 msec period establish a background stream at 30 Hz with a rhythmic expectancy that is later violated by the appearance of the fast target tones. These now fall outside the correlation attractor basin of the background stream, because the mismatch increases their cortical oscillation frequency. They are explicitly brought into the 40 Hz foreground frequency by the mismatch pop out mechanism. This allows the attentional stream into the Elman sequence recognition units to synchronize and read in activity due to the target tones for order determination. It is assisted by the timebase searchlight at the 80 msec period, which synchronizes and enhances activity arriving at that rhythm. In the absence of a rhythmic distinction for the target tones, their sound frequency difference alone is insufficient to separate them from the background stream, and the targets cannot be reliably discriminated. \nIn this simulation, the connections to the first two Elman associative memory units are hand wired to the A and B primary cortex oscillators to act as a latching, order determining switch. If synchronized to the memory unit at the attentional stream frequency, the A target tone oscillator will drive the first memory unit into the 1 attractor, which then inhibits the second unit from being driven to 1 by the B target tone. The second unit has similar wiring from the B tone oscillator, so that the particular higher order (intermediate term) memory unit which is left in the 1 state after a trial indicates to the rest of the brain which tone came first. 
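The latching switch logic above reduces to a few lines (a schematic of the wiring's function only, not the attractor dynamics): the first target tone to arrive while inside the attentional passband latches its unit, which blocks the other:

```python
def order_latch(tones):
    """Sketch of the hand-wired latching switch: the first target tone that
    arrives (assumed synchronized to the 40 Hz attentional stream) drives
    its memory unit into the 1 attractor, which inhibits the other unit;
    the unit left at 1 after the trial reports which tone came first.
    Non-target tones outside the passband are simply ignored here."""
    units = {"A": 0, "B": 0}
    for tone in tones:
        if tone in units and not any(units.values()):
            units[tone] = 1            # latch; inhibits the other unit
    return "AB" if units["A"] else ("BA" if units["B"] else None)

# Captor tones C are outside the receiving passband and cause no transition.
print(order_latch(["C", "A", "B", "C"]), order_latch(["C", "B", "A", "C"]))
```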
The flanking and high captor tone oscillator is connected equally to both memory units, so that a random attractor transition occurs before the targets arrive when it is interfering at the 40 Hz attentional frequency, and poor order determination results. If the flanking tone oscillator is in a separate stream along with the captor tones at the background eigenfrequency of 35 Hz, it is outside the receiving passband of the memory units and cannot cause a spurious attractor transition. \nThis architecture demonstrates mechanisms that integrate the theories of Jones and Bregman about auditory perception. Stream formation is a preattentive process that works well on non-rhythmic inputs, as Bregman asserts, but an equally primary and preattentive rhythmic expectancy process is also at work, as Jones asserts and the mismatch negativity indicates. This becomes a factor in stream formation when rhythmic structure is present in stimuli, as demonstrated by Jones. \n\nReferences \n\n[Baird et al., 1994] Baird, B., Troyer, T., and Eeckman, F. H. (1994). Grammatical inference by attentional control of synchronization in an oscillating Elman network. In Hanson, S., Cowan, J., and Giles, C., editors, Advances in Neural Information Processing Systems 6, pages 67-75. Morgan Kaufmann. \n[Bregman, 1990] Bregman, A. S. (1990). Auditory Scene Analysis. MIT Press, Cambridge. \n[Brown and Cooke, 1996] Brown, G. and Cooke, M. (1996). A neural oscillator model of auditory stream segregation. In IJCAI Workshop on Computational Auditory Scene Analysis. To appear. \n[Jones et al., 1981] Jones, M., Kidd, G., and Wetzel, R. (1981). Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance, 7:1059-1073. \n[Llinas and Ribary, 1993] Llinas, R. and Ribary, U. (1993). Coherent 40-Hz oscillation characterizes dream state in humans. Proc. Natl. Acad. Sci. USA, 90:2078-2081. 
\n[Naatanen, 1992] Naatanen, R. (1992). Attention and Brain Function. Erlbaum, New Jersey. \n[Wang, 1995] Wang, D. (1995). An oscillatory correlation theory of temporal pattern segmentation. In Covey, E., Hawkins, H., McMullen, T., and Port, R., editors, Neural Representations of Temporal Patterns. Plenum. To appear. \n", "award": [], "sourceid": 1400, "authors": [{"given_name": "Bill", "family_name": "Baird", "institution": null}]}