{"title": "A Neural Edge-Detection Model for Enhanced Auditory Sensitivity in Modulated Noise", "book": "Advances in Neural Information Processing Systems", "page_first": 301, "page_last": 308, "abstract": null, "full_text": " \n\n \n\n \n\nA Neural Edge-Detection Model for \nEnhanced Auditory Sensitivity in \n\nModulated Noise \n\nAlon Fishbach and Bradford J. May \n\nDepartment of Biomedical Engineering and Otolaryngology-HNS \n\nJohns Hopkins University \n\nBaltimore, MD 21205 \n\nfishbach@northwestern.edu \n\nAbstract \n\nPsychophysical data suggest that temporal modulations of stimulus \namplitude envelopes play a prominent role in the perceptual \nsegregation of concurrent sounds. In particular, the detection of an \nunmodulated signal can be significantly improved by adding \namplitude modulation to the spectral envelope of a competing \nmasking noise. This perceptual phenomenon \nis known as \n\u201cComodulation Masking Release\u201d (CMR). Despite the obvious \ninfluence of temporal structure on the perception of complex \nauditory scenes, the physiological mechanisms that contribute to \nCMR and auditory streaming are not well known. A recent \nphysiological study by Nelken and colleagues has demonstrated an \nenhanced cortical representation of auditory signals in modulated \nnoise. Our study evaluates these CMR-like response patterns from \nthe perspective of a hypothetical auditory edge-detection neuron. It \nis shown that this simple neural model for the detection of \namplitude transients can reproduce not only the physiological data \nof Nelken et al., but also, in light of previous results, a variety of \nphysiological and psychoacoustical phenomena that are related to \nthe perceptual segregation of concurrent sounds. \n\n1 Introduction \n\nThe temporal structure of a complex sound exerts strong influences on auditory \nphysiology (e.g. [10, 16]) and perception (e.g. [9, 19, 20]). 
In particular, studies of \nauditory scene analysis have demonstrated the importance of the temporal structure \nof amplitude envelopes in the perceptual segregation of concurrent sounds [2, 7]. \nCommon amplitude transitions across frequency serve as salient cues for grouping \nsound energy into unified perceptual objects. Conversely, asynchronous amplitude \ntransitions enhance the separation of competing acoustic events [3, 4]. \nThese general principles are manifested in perceptual phenomena as diverse as \ncomodulation masking release (CMR) [13], modulation detection interference [22] \nand synchronous onset grouping [8]. \nDespite the obvious importance of timing information in psychoacoustic studies of \nauditory masking, the way in which the CNS represents the temporal structure of an \namplitude envelope is not well understood. Certainly, many physiological studies \nhave demonstrated neural sensitivities to envelope transitions, but this sensitivity is \nonly beginning to be related to the variety of perceptual experiences that are evoked \nby signals in noise. \nNelken et al. [15] have suggested a correspondence between neural responses to \ntime-varying amplitude envelopes and psychoacoustic masking phenomena. In their \nstudy of neurons in primary auditory cortex (A1), adding temporal modulation to \nbackground noise lowered the detection thresholds of unmodulated tones. This \nenhanced signal detection is similar to the perceptual phenomenon that is known as \ncomodulation masking release [13]. \nFishbach et al. [11] have recently proposed a neural model for the detection of \n\u201cauditory edges\u201d (i.e., amplitude transients) that can account for numerous \nphysiological [14, 17, 18] and psychoacoustical [3, 21] phenomena. The \nencompassing utility of this edge-detection model suggests a common mechanism \nthat may link the auditory processing and perception of auditory signals in a \ncomplex auditory scene. 
Here, it is shown that the auditory edge-detection model \ncan accurately reproduce the cortical CMR-like responses previously described by \nNelken and colleagues. \n\n2 The Model \n\nThe model is described in detail elsewhere [11]. In short, the basic operation of the \nmodel is the calculation of the first-order time derivative of the log-compressed \nenvelope of the stimulus. A computational model [23] is used to convert the \nacoustic waveform to a physiologically plausible auditory nerve representation (Fig. \n1a). The simulated neural response has a medium spontaneous rate and a \ncharacteristic frequency that is set to the frequency of the target tone. To allow \ncomputation of the time derivative of the stimulus envelope, we hypothesize the \nexistence of a temporal delay dimension, along which the stimulus is progressively \ndelayed. The intermediate delay layer (Fig. 1b) is constructed from an array of \nneurons with ascending membrane time constants (\u03c4); each neuron is modeled by a \nconventional integrate-and-fire model (I&F, [12]). A higher membrane time constant \ninduces a greater delay in the neuron\u2019s response [1]. \nThe output of the delay layer converges to a single output neuron (Fig. 1c) via a set \nof connections with various efficacies that reflect a receptive field of a Gaussian \nderivative. This combination of excitatory and inhibitory connections carries out the \ntime-derivative computation. Implementation details and parameters are given in \n[11]. The model has 2 adjustable and 6 fixed parameters; the former were used to fit \nthe responses of the model to single-unit responses to a variety of stimuli [11]. The \nresults reported here are not sensitive to these parameters. 
\n\n[Figure 1 panel labels: (a) AN model; (b) delay-layer; (c) edge-detector neuron. Block labels: bandpass, RMS, log; I&F neuron; d/dt. Time constants shown: \u03c4 = 3 ms, 4 ms, 6 ms.] \n\nFigure 1: Schematic diagram of the model and a block diagram of the basic \noperation of each model component (shaded area). The stimulus is converted to a \nneural representation (a) that approximates the average firing rate of a medium \nspontaneous-rate AN fiber [23]. The operation of this stage can be roughly \ndescribed as the log-compressed rms output of a bandpass filter. The neural \nrepresentation is fed to a series of neurons with ascending membrane time constants \n(b). The kernel functions that are used to simulate these neurons are plotted for a \nfew neurons along with the time constants used. The outputs of the delay-layer \nneurons converge to a single I&F neuron (c) using a set of connections with weights \nthat reflect the shape of a Gaussian derivative. Solid arrows represent excitatory \nconnections and white arrows represent inhibitory connections. The absolute \nefficacy is represented by the width of the arrows. \n\n3 Results \n\nNelken et al. [15] report that amplitude modulation can substantially modify the \nnoise-driven discharge rates of A1 neurons in halothane-anesthetized cats. Many \ncortical neurons show only a transient onset response to unmodulated noise but fire \nin synchrony (\u201clock\u201d) to the envelope of modulated noise. A significant reduction in \nenvelope-locked discharge rates is observed if an unmodulated tone is added to \nmodulated noise. As summarized in Fig. 2, this suppression of envelope locking can \nreveal the presence of an auditory signal at sound pressure levels that are not \ndetectable in unmodulated noise. It has been suggested that this pattern of neural \nresponse may represent a physiological equivalent of CMR. 
\nReproduction of CMR-like cortical activity can be illustrated by a simplified case in \nwhich the analytical amplitude envelope of the stimulus is used as the input to the \nedge-detector model. In keeping with the actual physiological approach of Nelken et \nal., the noise envelope is shaped by a trapezoid modulator for these simulations. \nEach cycle of modulation, $E_N(t)$, is given by: \n\n$$E_N(t) = \\begin{cases} \\frac{P}{D} t, & 0 \\le t < D \\\\\\\\ P, & D \\le t < 3D \\\\\\\\ P - \\frac{P}{D}(t - 3D), & 3D \\le t < 4D \\\\\\\\ 0, & 4D \\le t < 8D \\end{cases}$$ \n\nwhere $P$ is the peak pressure level and $D$ is set to 12.5 ms. \n\n[Figure 2 panels: (a) Unmodulated noise, (b) Modulated noise; abscissa: Time (ms), 0-300; ordinate: Tone level (dB SPL), 26-76; response scale: Spikes/sec.] \n\nFigure 2: Responses of an A1 unit to a combination of noise and tone at many tone \nlevels, replotted from Nelken et al. [15]. (a) Unmodulated noise and (b) modulated \nnoise. The noise envelope is illustrated by the thick line above each figure. Each \nrow shows the response of the neuron to the noise plus the tone at the level specified \non the ordinate. The dashed line in (b) indicates the detection threshold level for the \ntone. The detection threshold (as defined and calculated by Nelken et al.) in the \nunmodulated noise was not reached. \n\nSince the basic operation of the model is the calculation of the rectified time-derivative \nof the log-compressed envelope of the stimulus, the expected noise-driven \nrate of the model can be approximated by: \n\n$$M_N(t) = \\max\\left(0, \\frac{d}{dt}\\left[A \\ln\\left(1 + \\frac{E_N(t)}{P_0}\\right)\\right]\\right)$$ \n\nwhere $A = 20/\\ln(10)$ and $P_0 = 2 \\times 10^{-5}$ Pa. The expected firing rate in response to the noise \nplus an unmodulated signal (tone) can be similarly approximated by: \n\n$$M_{N+S}(t) = \\max\\left(0, \\frac{d}{dt}\\left[A \\ln\\left(1 + \\frac{E_N(t) + P_S}{P_0}\\right)\\right]\\right)$$ \n\nwhere $P_S$ is the peak pressure level of the tone. Clearly, both $M_N(t)$ and $M_{N+S}(t)$ are \nidentically zero outside the interval [0 D]. 
Within this interval it holds that: \n\n$$M_N(t) = \\frac{A P / D}{P_0 + \\frac{P}{D} t} \\quad \\text{and} \\quad M_{N+S}(t) = \\frac{A P / D}{P_0 + P_S + \\frac{P}{D} t}, \\qquad 0 \\le t < D$$ \n\nand the ratio of the firing rates is: \n\n$$\\frac{M_N(t)}{M_{N+S}(t)} = 1 + \\frac{P_S}{P_0 + \\frac{P}{D} t}, \\qquad 0 \\le t < D$$ \n\nClearly, $M_{N+S}(t) < M_N(t)$ for the interval [0 D] of each modulation cycle. That is, the \naddition of a tone reduces the responses of the model to the rising part of the \nmodulated envelope. Higher tone levels ($P_S$) cause greater reduction in the model\u2019s \nfiring rate. \n\n[Figure 3 panels: (a), (b), (c), (d); ordinates: Level (dB SPL) for (a) and (c), Level derivative (dB SPL/ms) for (b) and (d); abscissa: Time (ms).] \n\nFigure 3: An illustration of the basic operation of the model on various amplitude \nenvelopes. The simplified operation of the model includes log compression of the \namplitude envelope (a and c) and rectified time-derivative of the log-compressed \nenvelope (b and d). (a) A 30 dB SPL tone is added to a modulated envelope (peak \nlevel of 70 dB SPL) 300 ms after the beginning of the stimulus (as indicated by the \nhorizontal line). The addition of the tone causes a great reduction in the time \nderivative of the log-compressed envelope (b). When the envelope of the noise is \nunmodulated (c), the time-derivative of the log-compressed envelope (d) shows a \ntiny spike when the tone is added (marked by the arrow). \n\nFig. 
3 demonstrates the effect of a low-level tone on the time-derivative of the log-compressed \nenvelope of a noise. When the envelope is modulated (Fig. 3a) the \naddition of the tone greatly reduces the derivative of the rising part of the \nmodulation (Fig. 3b). In the absence of modulations (Fig. 3c), the tone presentation \nproduces a negligible effect on the level derivative (Fig. 3d). \n\nModel simulations of neural responses to the stimuli used by Nelken et al. are \nplotted in Fig. 4. As illustrated schematically in Fig. 3d, the presence of the tone \ndoes not cause any significant change in the responses of the model to the \nunmodulated noise (Fig. 4a). In the modulated noise, however, tones of relatively \nlow levels reduce the responses of the model to the rising part of the envelope \nmodulations. \n\n[Figure 4 panels: (a) Unmodulated noise, (b) Modulated noise; abscissa: Time (ms), 0-300; ordinate: Tone level (dB SPL), 26-76; response scale: Spikes/sec.] \n\nFigure 4: Simulated responses of the model to a combination of a tone and \nunmodulated noise (a) and modulated noise (b). All conventions are as in Fig. 2. \n\n4 Discussion \n\nThis report uses an auditory edge-detection model to simulate the actual \nphysiological consequences of amplitude modulation on neural sensitivity in \ncortical area A1. The basic computational operation of the model is the calculation \nof the smoothed time-derivative of the log-compressed stimulus envelope. The \nability of the model to reproduce cortical response patterns in detail across a variety \nof stimulus conditions suggests that similar time-sensitive mechanisms may contribute to \nthe physiological correlates of CMR. 
\nThese findings augment our previous observations that the simple edge-detection \nmodel can successfully predict a wide range of physiological and perceptual \nphenomena [11]. Previous applications of the model to perceptual phenomena have \nbeen mainly related to auditory scene analysis, or more specifically the ability of the \nauditory system to distinguish multiple sound sources. In these cases, a sharp \namplitude transition at stimulus onset (\u201cauditory edge\u201d) was critical for sound \nsegregation. Here, it is shown that the detection of acoustic signals also may be \nenhanced through the suppression of ongoing responses to the concurrent \nmodulations of competing background sounds. Interestingly, these temporal \nfluctuations appear to be a common property of natural soundscapes [15]. \nThe model provides testable predictions regarding how signal detection may be \ninfluenced by the temporal shape of amplitude modulation. Carlyon et al. [6] \nmeasured CMR in human listeners using three types of noise modulation: square-wave, \nsine-wave and multiplied noise. From the perspective of the edge-detection \nmodel, these psychoacoustic results are intriguing because the different modulator \ntypes represent manipulations of the time derivative of masker envelopes. Square-wave \nmodulation had the most sharply edged time derivative and produced the \ngreatest masking release. \nFig. 5 plots the responses of the model to a pure-tone signal in square-wave and \nsine-wave modulated noise. As in the psychoacoustical data of Carlyon et al., the \nsimulated detection threshold was lower in the context of square-wave modulation. \nOur modeling results suggest that the sharply edged square wave evoked higher \nlevels of noise-driven activity and therefore created a sensitive background for the \nsuppressing effects of the unmodulated tone. 
\n\n[Figure 5 panels: (a), (b); abscissa: Time (ms), 0-600; ordinate: Tone level (dB SPL), 10-60; response scale: Spikes/sec.] \n\nFigure 5: Simulated responses of the model to a combination of a tone at various \nlevels and a sine-wave modulated noise (a) or a square-wave modulated noise (b). \nEach row shows the response of the model to the noise plus the tone at the level \nspecified on the ordinate. The shape of the noise modulator is illustrated above each \nfigure. The 100 ms tone starts 250 ms after the noise onset. Note that the tone \ndetection threshold (marked by the dashed line) is 10 dB lower for the square-wave \nmodulator than for the sine-wave modulator, in accordance with the \npsychoacoustical data of Carlyon et al. [6]. \n\nAlthough the physiological basis of our model was derived from studies of neural \nresponses in the cat auditory system, the key psychoacoustical observations of \nCarlyon et al. have been replicated in recent behavioral studies of cats (Budelis et \nal. [5]). \nThese data support the generalization of human perceptual processing to other \nspecies and enhance the possible correspondence between the neuronal CMR-like \neffect and the psychoacoustical masking phenomena. \nClearly, the auditory system relies on information other than the time derivative of \nthe stimulus envelope for the detection of auditory signals in background noise. \nFurther physiological and psychoacoustic assessments of CMR-like masking effects \nare needed not only to refine the predictive abilities of the edge-detection model but \nalso to reveal the additional sources of acoustic information that influence signal \ndetection in constantly changing natural environments. \n\nAcknowledgments \n\nThis work was supported in part by NIDCD grant R01 DC004841. \n\nReferences \n\n[1] Agmon-Snir H., Segev I. (1993). 
\u201cSignal delay and input synchronization in passive \ndendritic structure\u201d, J. Neurophysiol. 70, 2066-2085. \n\n[2] Bregman A.S. (1990). \u201cAuditory scene analysis: The perceptual organization of sound\u201d, \nMIT Press, Cambridge, MA. \n\n[3] Bregman A.S., Ahad P.A., Kim J., Melnerich L. (1994) \u201cResetting the pitch-analysis \nsystem. 1. Effects of rise times of tones in noise backgrounds or of harmonics in a complex \ntone\u201d, Percept. Psychophys. 56 (2), 155-162. \n\n[4] Bregman A.S., Ahad P.A., Kim J. (1994) \u201cResetting the pitch-analysis system. 2. Role of \nsudden onsets and offsets in the perception of individual components in a cluster of \noverlapping tones\u201d, J. Acoust. Soc. Am. 96 (5), 2694-2703. \n\n[5] Budelis J., Fishbach A., May B.J. (2002) \u201cBehavioral assessments of comodulation \nmasking release in cats\u201d, Abst. Assoc. for Res. in Otolaryngol. 25. \n\n[6] Carlyon R.P., Buus S., Florentine M. (1989) \u201cComodulation masking release for three \ntypes of modulator as a function of modulation rate\u201d, Hear. Res. 42, 37-46. \n\n[7] Darwin C.J. (1997) \u201cAuditory grouping\u201d, Trends in Cog. Sci. 1(9), 327-333. \n\n[8] Darwin C.J., Ciocca V. (1992) \u201cGrouping in pitch perception: Effects of onset \nasynchrony and ear of presentation of a mistuned component\u201d, J. Acoust. Soc. Am. 91, 3381-3390. \n\n[9] Drullman R., Festen H.M., Plomp R. (1994) \u201cEffect of temporal envelope smearing on \nspeech reception\u201d, J. Acoust. Soc. Am. 95 (2), 1053-1064. \n\n[10] Eggermont J.J. (1994). \u201cTemporal modulation transfer functions for AM and FM stimuli \nin cat auditory cortex. Effects of carrier type, modulating waveform and intensity\u201d, Hear. \nRes. 74, 51-66. \n\n[11] Fishbach A., Nelken I., Yeshurun Y. (2001) \u201cAuditory edge detection: a neural model \nfor physiological and psychoacoustical responses to amplitude transients\u201d, J. Neurophysiol. \n85, 2303\u20132323. 
\n\n[12] Gerstner W. (1999) \u201cSpiking neurons\u201d, in Pulsed Neural Networks, edited by W. Maass, \nC. M. Bishop, (MIT Press, Cambridge, MA). \n\n[13] Hall J.W., Haggard M.P., Fernandes M.A. (1984) \u201cDetection in noise by spectro-\ntemporal pattern analysis\u201d, J. Acoust. Soc. Am. 76, 50-56. \n\n[14] Heil P. (1997) \u201cAuditory onset responses revisited. II. Response strength\u201d, J. \nNeurophysiol. 77, 2642-2660. \n\n[15] Nelken I., Rotman Y., Bar-Yosef O. (1999) \u201cResponses of auditory cortex neurons to \nstructural features of natural sounds\u201d, Nature 397, 154-157. \n\n[16] Phillips D.P. (1988). \u201cEffect of Tone-Pulse Rise Time on Rate-Level Functions of Cat \nAuditory Cortex Neurons: Excitatory and Inhibitory Processes Shaping Responses to Tone \nOnset\u201d, J. Neurophysiol. 59, 1524-1539. \n\n[17] Phillips D.P., Burkard R. (1999). \u201cResponse magnitude and timing of auditory response \ninitiation in the inferior colliculus of the awake chinchilla\u201d, J. Acoust. Soc. Am. 105, 2731-\n2737. \n\n[18] Phillips D.P., Semple M.N., Kitzes L.M. (1995). \u201cFactors shaping the tone level \nsensitivity of single neurons in posterior field of cat auditory cortex\u201d, J. Neurophysiol. 73, \n674-686. \n\n[19] Rosen S. (1992) \u201cTemporal information in speech: acoustic, auditory and linguistic \naspects\u201d, Phil. Trans. R. Soc. Lond. B 336, 367-373. \n\n[20] Shannon R.V., Zeng F.G., Kamath V., Wygonski J, Ekelid M. (1995) \u201cSpeech \nrecognition with primarily temporal cues\u201d, Science 270, 303-304. \n\n[21] Turner C.W., Relkin E.M., Doucet J. (1994). \u201cPsychophysical and physiological \nforward masking studies: probe duration and rise-time effects\u201d, J. Acoust. Soc. Am. 96 (2), \n795-800. \n\n[22] Yost W.A., Sheft S. (1994) \u201cModulation detection interference \u2013 across-frequency \nprocessing and auditory grouping\u201d, Hear. Res. 79, 48-58. \n\n[23] Zhang X., Heinz M.G., Bruce I.C., Carney L.H. (2001). 
\u201cA phenomenological model for \nthe responses of auditory-nerve fibers: I. Nonlinear tuning with compression and \nsuppression\u201d, J. Acoust. Soc. Am. 109 (2), 648-670. \n\n \n\n\f", "award": [], "sourceid": 2329, "authors": [{"given_name": "Alon", "family_name": "Fishbach", "institution": null}, {"given_name": "Bradford", "family_name": "May", "institution": null}]}