eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Invariant Recognition of Vocal Features

Abstract

Animals and humans are able to communicate vocally in very challenging acoustic conditions. Background noise, especially from other individuals of the same species, may mask the relevant signal, and propagation can introduce significant distortions to the sound waveform. While our brains are able to extract meaningful information from heavily degraded communication sounds, the mechanisms by which the auditory system performs this task are not well understood. This thesis shows how neural systems can and do handle signal degradations by examining how auditory neurons in an animal model of communication, the Zebra Finch (Taeniopygia guttata), process degraded and undegraded signals, and demonstrates that these principles can be used to perform noise reduction on voice recordings. I discuss how the notion of invariance, common in studies of sensory perception for roughly a century, has more recently been helpful in the analysis of sensory systems at the neural level. I discuss how to characterize the invariant properties of vocal sounds, and how to connect this analysis to the mathematical theory of invariants.

To characterize invariance at the neural level, I construct a novel metric using spike-train cross-correlation between neural responses to the same signal obtained under various conditions. Using this measure, I show that a subset of neurons in the avian secondary auditory forebrain area NCM can extract a representation of birdsong that is robust to background noise interference. Spectro-temporal receptive field (STRF) and modulation transfer function (MTF) analyses show that these invariant neurons are sensitive to slowly changing pitch features. Then, using stimuli that have been degraded systematically along spectral and temporal features, I further characterize the nature and origin of invariant response properties in neurons throughout the avian auditory forebrain. The response of auditory neurons to spectral degradation is well explained by their MTF, but results in the temporal domain indicate that some neurons exhibit invariance properties beyond those expected from this model.
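
As a rough illustration of a correlation-based invariance measure of the kind described above, the following sketch compares a neuron's response to a clean stimulus with its response to a degraded version of the same stimulus. This is not the metric constructed in the thesis: the function names, the Gaussian smoothing, the bin width `dt`, and the kernel width `sigma` are all assumptions chosen for illustration.

```python
import math

def smoothed_rate(spike_times, duration, dt=0.001, sigma=0.005):
    """Bin spike times (in seconds) and smooth the counts with a Gaussian kernel.

    dt and sigma are illustrative defaults, not values from the thesis.
    """
    n_bins = int(round(duration / dt))
    counts = [0.0] * n_bins
    for t in spike_times:
        if 0.0 <= t < duration:
            counts[min(int(t / dt), n_bins - 1)] += 1.0

    # Gaussian kernel truncated at +/- 4 sigma and normalized to unit sum.
    half = int(round(4 * sigma / dt))
    kernel = [math.exp(-0.5 * (k * dt / sigma) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)

    rate = [0.0] * n_bins
    for i, c in enumerate(counts):
        if c:
            for k, w in enumerate(kernel):
                j = i + k - half
                if 0 <= j < n_bins:
                    rate[j] += c * w / norm
    return rate

def invariance_index(spikes_clean, spikes_degraded, duration, dt=0.001, sigma=0.005):
    """Zero-lag correlation coefficient between two smoothed responses.

    Values near 1 mean the response to the degraded stimulus closely
    follows the response to the clean stimulus (hypothetical index).
    """
    a = smoothed_rate(spikes_clean, duration, dt, sigma)
    b = smoothed_rate(spikes_degraded, duration, dt, sigma)
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a = [x - mean_a for x in a]
    b = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / denom if denom > 0 else 0.0

# Hypothetical spike times (seconds) for one neuron and one 1-second stimulus.
clean = [0.1, 0.25, 0.4, 0.7]
jittered = [t + 0.002 for t in clean]
print(invariance_index(clean, clean, 1.0))
print(invariance_index(clean, jittered, 1.0))
```

In this toy setup, a response that keeps its spike timing under degradation scores close to 1, while a response whose timing is unrelated to the clean response scores near 0. A practical measure would also need to account for trial-to-trial variability within a single condition.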

Finally, I use the insights from these experiments to construct a noise-reduction algorithm that can be implemented in real time on digital systems. The system performs well when compared to state-of-the-art algorithms for noise reduction, and I discuss how these systems interrelate in terms of processing the statistics of vocal sounds. Using these comparisons and interpretations, I show how we might improve the performance of such noise-reduction algorithms.
