Spectro-temporal information distinguishes between speech and music

Poster A41 in Poster Session A, Thursday, October 6, 10:15 am - 12:00 pm EDT, Millennium Hall
This poster is part of the Sandbox Series.

Yike Li1, Andrew Chang1, David Poeppel1; 1New York University

Speech and music are two forms of auditory signals that are related yet highly specialized. Their similarities have been characterized at multiple levels, including sound elements, temporal organization, syntax, and even semantics (Patel, 2007), but the fundamental question of how the brain treats speech and music as two distinct forms with distinct functions remains open. Given that previous neuroimaging and lesion studies showed that speech and music are processed differently in the auditory cortex (e.g., Norman-Haignere et al., 2015), we hypothesize that the brain makes this distinction based on low- to mid-level acoustic properties. Consistent with this hypothesis, previous studies showed that the amplitude modulation rate may be a crucial acoustic feature for separating speech from music (Ding et al., 2017). However, it is not well understood to what extent spectral information also serves as a crucial acoustic distinction between speech and music, given its essential role in pitch in music and in formants in speech. We applied signal processing techniques to speech and music recordings from standardized corpora to extract their spectro-temporal modulations. Sound waveforms were transformed into spectrograms using a filter-Hilbert method and then decomposed into the modulation domain using a 2D FFT (Flinker et al., 2019). We found that speech and music have distinct modulation patterns: speech has higher temporal resolution, whereas music has higher spectral (frequency) resolution. These distinct patterns are consistent with past studies on the functional asymmetry of the auditory cortex, which have shown that temporal modulations are dominantly processed by the left hemisphere and are crucial for speech intelligibility, whereas spectral modulations are primarily processed by the right hemisphere and are critical for pitch-related tasks (Albouy et al., 2020; Flinker et al., 2019). A study of the primary and secondary auditory cortices indicates that early auditory cortex is selectively tuned to combined spectro-temporal modulations (Schönwiesner & Zatorre, 2009), suggesting that this information may play an essential role in the perception and recognition of different sound categories. New behavioral and neuroimaging studies are needed to investigate whether the spectro-temporal modulation pattern is crucial for perceptually distinguishing speech from music, and how the brain implements this computation. This study extends our understanding of the fundamental cognitive and neural principles of human auditory processing and communication, and it could ultimately benefit individuals with auditory and speech-language disorders, such as persons with aphasia and cochlear implant users.
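
For illustration, the minimal Python sketch below outlines the kind of analysis pipeline described above: a filter-Hilbert spectrogram followed by a 2D FFT into the modulation domain. The filter-bank spacing, bandwidths, and normalization shown here are illustrative assumptions, not the exact parameters used in this study.

    # Minimal sketch: filter-Hilbert spectrogram + 2D FFT to estimate the
    # spectro-temporal modulation spectrum of a sound. Filter spacing,
    # bandwidths, and normalization are illustrative assumptions.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def modulation_spectrum(waveform, fs, n_bands=32, fmin=100.0, fmax=6000.0):
        # 1) Filter-Hilbert spectrogram: band-pass the waveform in log-spaced
        #    frequency bands (kept below Nyquist) and take the Hilbert
        #    envelope of each band.
        edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bands + 1)
        envelopes = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
            band = sosfiltfilt(sos, waveform)
            envelopes.append(np.abs(hilbert(band)))
        spectrogram = np.array(envelopes)            # (frequency bands, time)

        # 2) 2D FFT of the mean-removed log spectrogram gives the modulation
        #    domain: one axis is temporal modulation (Hz), the other spectral
        #    modulation (cycles/octave, since the bands are log-spaced).
        log_spec = np.log(spectrogram + 1e-10)
        log_spec -= log_spec.mean()
        return np.abs(np.fft.fftshift(np.fft.fft2(log_spec)))

    # Example usage: mod = modulation_spectrum(audio, fs=16000)
    # Per the results above, speech should show relatively more energy at fast
    # temporal modulations, and music at fine spectral modulations.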

Topic Areas: Perception: Auditory, Computational Approaches