Abstract

Some of the most important sounds humans hear, including speech and music, can be defined in part by their pitch. Acoustically, sounds said to have pitch repeat at a regular rate in time, called the fundamental frequency, or f0, and contain overtones (‘harmonics’) whose frequencies are integer multiples of the f0. Pitch is traditionally construed as the perceptual correlate of f0, and a longstanding goal of hearing research has been to determine how listeners estimate f0 from harmonic sounds. But the settings in which harmonic sounds have been studied are impoverished relative to the immense variety of situations in which we encounter such sounds, including natural auditory scenes containing music, speech, and noise. Consequently, our understanding of how the auditory system processes harmonic sounds remains limited. This dissertation examines the representations involved in hearing harmonic sounds, both when extracting pitch information and when segregating concurrent sounds. One method used across studies is to compare task performance with harmonic sounds to performance with inharmonic sounds, whose frequencies are inconsistent with any single f0. These comparisons reveal the conditions in which listeners rely on representations of the f0. We also use a broad array of tasks, individual-differences approaches, and cross-cultural experiments. The dissertation makes five main contributions:

1. Chapter 1: We first test humans on a large battery of tasks thought to depend on pitch, and find that performance on some tasks, such as discriminating musical intervals or recognizing voices, is impaired when sounds are inharmonic. But other tasks, such as judging the direction of a pitch change, are performed equally well with inharmonic sounds. Listeners appear to estimate pitch changes in these cases by tracking the frequency spectrum without estimating f0. This suggests that the classic view of pitch as f0 estimation is incomplete—at least two representations are involved, one of which does not involve the f0. Chapters 2–4 build from this initial finding, each testing a different hypothesis about when listeners might rely on f0-based pitch.

2. Chapter 2: We demonstrate that pitch discrimination is better for harmonic than for inharmonic stimuli when the stimuli are separated in time, even though discrimination is comparably accurate for back-to-back sounds. Listeners appear to use the f0 as an efficient representation for memory, a novel form of abstraction within hearing. We also provide further evidence that listeners have two distinct representations of pitch: they compare frequency spectra for sounds nearby in time and the f0 for sounds separated in time.

3. Chapter 3: F0-based pitch is traditionally envisioned as being invariant to spectral shape (timbre). We demonstrate that while pitch judgments show some degree of invariance to spectral shape, the invariance observed for natural sounds like speech and music does not depend on representations of f0, being comparable for harmonic and inharmonic sounds.

4. Chapter 4: We demonstrate that harmonic frequency relations aid hearing in noise: it is easier to detect and discriminate sounds in noise when they are harmonic rather than inharmonic. A noise-robust f0-based pitch signal from harmonic sounds like music and speech may help such sounds stand out in noisy backgrounds. This benefit of harmonicity is a previously undocumented aspect of auditory scene analysis.

5. Chapters 5–6: In a parallel line of research, we examine representations of concurrent harmonic musical notes, focusing on ‘fusion’, whereby pairs of notes are misperceived as a single note. In Chapter 5 we survey factors that might influence fusion in Western listeners. In Chapter 6 we compare fusion of, and preferences for, note pairs between listeners in the US and the Tsimane’, an indigenous population of hunter-agriculturalists living in the Bolivian Amazon who have limited exposure to Western culture and music. We find cross-cultural similarity in the tendency to fuse canonically ‘consonant’ intervals, despite cross-cultural differences in preferences for those intervals. This result suggests universal perceptual mechanisms that could contribute to cross-cultural regularities in musical systems, but shows that these perceptual regularities do not directly determine aesthetic associations, which appear to be culturally determined.
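The contrast between harmonic and inharmonic sounds that runs through these chapters can be made concrete with a small sketch. Harmonic components sit at integer multiples of f0. Here an inharmonic tone is made by randomly jittering each component above the first by up to ±30% of f0. That jitter scheme, the function names, and the parameter values are assumptions for illustration, not necessarily how the dissertation's stimuli were generated. The deviation check also only tests consistency with the nominal f0, not with every conceivable f0.

```python
import random


def harmonic_freqs(f0, n_harmonics):
    """Component frequencies (Hz) of a harmonic tone: integer multiples of f0."""
    return [k * f0 for k in range(1, n_harmonics + 1)]


def inharmonic_freqs(f0, n_harmonics, max_jitter=0.3, seed=0):
    """Hypothetical inharmonic tone: keep the first component at f0 and shift
    every higher harmonic k*f0 by a random offset in [-max_jitter, +max_jitter] * f0.
    With max_jitter < 0.5, components stay in ascending order and never coincide."""
    rng = random.Random(seed)
    freqs = [f0]
    for k in range(2, n_harmonics + 1):
        freqs.append((k + rng.uniform(-max_jitter, max_jitter)) * f0)
    return freqs


def max_harmonic_deviation(freqs, f0):
    """Largest distance, in units of f0, from any component to the nearest
    integer multiple of f0 (0.0 means perfectly harmonic with respect to f0)."""
    return max(abs(f / f0 - round(f / f0)) for f in freqs)


if __name__ == "__main__":
    f0 = 200.0
    print(max_harmonic_deviation(harmonic_freqs(f0, 10), f0))  # prints 0.0
    print(max_harmonic_deviation(inharmonic_freqs(f0, 10), f0))
```

Only the component frequencies are modeled here. Synthesizing audio would mean summing sinusoids at these frequencies, which is left out to keep the sketch short.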

This work provides evidence for two distinct representations underlying pitch perception and reveals several constraints that determine when each is used. More broadly, this dissertation shows how representations of harmonic sounds are critical to auditory perception and cognition, and opens the door to a better understanding of auditory memory, the relationship between pitch in speech and music, and the role of experience in shaping sensory perception.
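The f0-free representation described in Chapter 1, in which pitch changes are judged by tracking the frequency spectrum, can be illustrated with a toy comparison. The sketch below slides one tone's components along a log-frequency (semitone) axis, scores how well they line up with a second tone's components, and returns the best-aligning shift. Its sign gives the direction of the pitch change. The Gaussian scoring rule, the function name `best_spectral_shift`, and all parameter values are hypothetical choices for illustration, not a model proposed in the dissertation.

```python
import math


def best_spectral_shift(freqs_a, freqs_b, max_shift_st=2.0, step_st=0.01, width_st=0.1):
    """Hypothetical spectral-tracking comparison: return the shift (in semitones)
    that best aligns tone A's components with tone B's on a log-frequency axis.
    No f0 is estimated; only component positions are compared."""
    log_a = [12.0 * math.log2(f) for f in freqs_a]  # component positions in semitones
    log_b = [12.0 * math.log2(f) for f in freqs_b]
    n_steps = int(round(max_shift_st / step_st))
    best_shift, best_score = 0.0, float("-inf")
    for i in range(-n_steps, n_steps + 1):
        shift = i * step_st
        score = 0.0
        for a in log_a:
            # Distance from the shifted A component to the nearest B component,
            # turned into a Gaussian similarity (1.0 for a perfect match).
            d = min(abs(a + shift - b) for b in log_b)
            score += math.exp(-0.5 * (d / width_st) ** 2)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


if __name__ == "__main__":
    # Hand-picked inharmonic tone: roughly harmonics 1-5 of 200 Hz, each mistuned.
    tone_a = [200.0, 410.0, 577.0, 812.0, 1013.0]
    tone_b = [f * 2 ** (1 / 12) for f in tone_a]  # every component up 1 semitone
    shift = best_spectral_shift(tone_a, tone_b)
    print("up" if shift > 0 else "down", round(shift, 2))
```

Because the comparison uses only the relative positions of components, it does not require the tones to be harmonic. With many closely spaced components, however, several shifts can align almost equally well, so the search range and matching width matter.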

Details

Title
The Perception of Harmonic Sounds
Author
McPherson, Malinda Jeanette
Publication year
2022
Publisher
ProQuest Dissertations & Theses
ISBN
979-8-209-89947-1
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2644367202
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.