Introduction
Emotions play an important part in daily life, facilitating cognition, attention, stress recovery, and well-being [1–6]. Investigations using a variety of paradigms have shown that perception of emotion is atypical in several populations, including those with brain damage [7,8], neurodegenerative diseases [e.g., 9–11], pragmatic language deficits [e.g., 12–14], and hearing loss [4,15,16]. Therefore, it is important to understand the conceptual and methodological factors that affect measurement of emotion perception.
The dimensional view of emotion provides a framework for studying emotion. Under the dimensional view, emotions are conceptualized as a combination of two or more continua, most often hedonic valence (pleasant, unpleasant) and arousal (exciting, calming); valence represents the direction of the emotion and arousal represents the degree of activation of the emotion [17–19]. One example of how this dimensional view can inform our understanding of emotion’s role in communication disorders is evident in listeners with hearing loss. Within this framework, several investigations have examined emotional responses to sounds for adults with and without hearing loss using ratings of valence and arousal. The results consistently indicate that people with hearing loss demonstrate a reduced range of valence, with valence ratings that are less extreme (less pleasant and less unpleasant) than those of their peers with normal hearing; similar effects are not often found for arousal [20–23].
In addition to their usefulness in studying experiences of individuals with hearing loss, emotionally evocative stimuli also have the potential to interfere with communication by distracting listeners. Novel sounds, even irrelevant ones, are processed automatically and rapidly [24]. These irrelevant sounds can capture attention, distracting listeners from their intended auditory target and impairing target recognition [25–29]. For example, a barking dog can distract from a conversation, affecting communication. Importantly, the extent to which irrelevant emotional stimuli distract from target speech depends upon the characteristics of those stimuli, including their perceived valence and arousal. For example, pleasant and unpleasant sounds have been shown to be more distracting than neutral sounds [30].
While this dimensional framework is being used to evaluate groups of individuals and to target specific processing mechanisms, surprisingly little is known about how stimulus calibration choices influence the subjective ratings of these emotional dimensions. To better answer questions surrounding emotional auditory distraction, or even to continue investigating emotion perception for people with hearing loss or other communication disorders, it is important to evaluate the degree to which methodological choices result in generalizable results. The focus of this project is a comparison of the effects of two amplitude-standardization approaches on ratings of valence and arousal for non-speech sounds.
Amplitude scaling
The choice of sound standardization, or amplitude scaling, could be critically important in the study of auditory emotion perception. Recent literature suggests that differences in level can influence environmental sound recognition [31] and emotional responses to non-speech sounds [20,21]. For example, Picou et al. [21] found that a 20 dB increase in overall level reduced ratings of valence in adults with and without hearing loss. However, the corpus of sounds often used for evaluation of emotional responses to non-speech sounds, the International Affective Digitized Sounds (IADS-2) corpus [32], consists of sounds that vary in peak and mean amplitudes. Furthermore, some studies of auditory distraction lack methodological descriptions surrounding stimulus amplitude, with no mention of amplitude scaling [33–35], as do many published studies on the emotional responses to non-speech sounds for people with hearing loss or tinnitus [36–39].
While some investigators choose not to control stimulus amplitude, others manipulate the sounds by scaling them based on their root-mean-square (rms) level [24,40–46]. This rms amplitude-scaling approach has a longstanding tradition in hearing science with speech sounds as experimental stimuli, where investigators match the rms level of all speech sounds (or segments or sentences) in a given condition [47–52]. This approach is well-suited for speech sounds, where the speech segments have predictable and relatively stable temporal dynamics owing to the shared sound source (the vocal tract) [53]. However, non-speech sounds are heterogeneous in terms of temporal dynamics [54], with no shared sound source and spanning a wide range of sound types (e.g., music, animal sounds, body noises) [53].
Despite the increased variability in temporal dynamics of non-speech sounds, some investigators have reported using the rms amplitude-scaling approach for studies with non-speech stimuli [e.g., 55–57]. A potential disadvantage of rms-scaling across sounds is a homogenization of level across stimuli that ignores their temporal dynamics. For example, if a non-speech sound has several long, low-intensity portions interspersed between periods of higher intensity (e.g., clapping), the low-amplitude segments will strongly lower the rms amplitude, so scaling that sound to a target rms yields a sound whose peaks may be perceptually louder than those of a reference sound with the same rms amplitude but a more constant amplitude (e.g., a steady vocalization; see left panel of Fig 1).
[Figure omitted. See PDF.]
The figure visually highlights differences between methodologies. Within each plot, constant (“ah” vocalization; left) and intermittent (two hand claps; right) time waveforms are displayed.
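To make this point concrete, the following minimal sketch in R is illustrative only; the study stimuli were recorded sounds scaled with a custom MATLAB script, and the tone and burst train here are synthetic stand-ins. The sketch rms-scales a steady tone and an intermittent burst train to the same target and compares their resulting peak levels; in this toy example, the intermittent sound's bursts end up roughly 4 dB higher.

```r
# Illustrative R sketch (synthetic signals; study stimuli were real recordings
# scaled with a custom MATLAB script): how rms-scaling treats a steady versus
# an intermittent sound.
fs <- 44100
t  <- seq(0, 0.5, length.out = fs * 0.5)

steady       <- 0.5 * sin(2 * pi * 220 * t)                 # constant-amplitude, vocalization-like
envelope     <- rep(c(1, 0, 1, 0, 0), each = length(t) / 5) # two bursts separated by silence
intermittent <- steady * envelope                           # clap-like, intermittent

rms <- function(x) sqrt(mean(x^2))

# Scale both signals to the same target rms level (-26.9 dB re full scale)
target_rms <- 10^(-26.9 / 20)
steady_rms_scaled       <- steady * target_rms / rms(steady)
intermittent_rms_scaled <- intermittent * target_rms / rms(intermittent)

# The intermittent sound requires a larger gain, so its bursts end up with a
# higher peak level and may be perceived as louder despite the matched rms.
20 * log10(max(abs(steady_rms_scaled)))        # approx. -23.9 dB re 1
20 * log10(max(abs(intermittent_rms_scaled)))  # approx. -19.9 dB re 1
```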
Another popular approach to controlling stimulus amplitude is peak scaling, where the peak outputs are matched across sounds (see right panel of Fig 1). Maintaining variability in level across sounds might be useful for non-speech sounds, where real-world variability is expected. For example, the IADS corpus includes nature sounds, social sounds, and machine noises, which would naturally be expected to vary in rms amplitude in the real world. Maintaining some variability in rms level might also be useful because temporal dynamics are useful for categorizing non-speech sounds [54,58]. Furthermore, overall level has been shown to be related to emotional responses, specifically ratings of valence [20,21] and arousal [59–63]. Therefore, it is possible that an rms-scaling approach would reduce the usability of one of the cues (level) that supports emotion perception.
[Figure omitted. See PDF.]
Each dot represents an individual sound. Shaded squares indicate ratings that did not change nominal category between scaling approaches. Yellow dots indicate sounds that were nominally ‘unpleasant’ with the peak-scaling approach but were nominally ‘neutral’ with the rms-scaling approach. Blue dots indicate sounds that were nominally ‘neutral’ with the peak-scaling approach but were ‘pleasant’ with the rms-scaling approach. Red dots indicate tokens that were nominally ‘pleasant’ with the peak-scaling approach but were nominally ‘neutral’ with the rms-scaling approach.
The use of peak scaling has been reported in several investigations of vocal emotion recognition [e.g., 64, 65], as well as studies using non-speech sounds for auditory distraction [25,66] and for the study of emotional responses [20–23,67]. It is not clear what effect, if any, the difference in amplitude-scaling strategies across studies might have on emotional responses to non-speech sounds or whether the amplitude-scaling approaches reduce level cues used for emotion perception. Among the studies where amplitude-scaling approach is described, none of the authors offer justification for their scaling approach [e.g., 21, 55–57], except to say their approach is consistent with previous work [20,22,67]. Although there exist accepted scaling approaches in other domains, such as the loudness unit referenced to full scale (LUFS) used in broadcasting [68], there are not currently recommendations for standardizing non-speech sounds for the purpose of evaluating emotion perception.
Purpose
The purpose of this study was to evaluate the effects of amplitude-scaling approach (rms, peak) on emotional responses, specifically ratings of valence and arousal. The results will inform the choice of amplitude-scaling approach in future studies of emotional responses to sounds and provide additional context for interpreting previous work that used either approach. Should one of the amplitude-scaling approaches result in significantly different ratings, it would provide evidence that the choice of scaling approach might affect the interpretation of results and therefore confound the synthesis of findings across studies with differing amplitude-scaling approaches. In addition, the relationship between ratings of valence (or arousal) and level within an amplitude-scaling approach was explored to determine whether one of the scaling approaches allowed level to serve as a cue for emotion perception. Evidence that one scaling approach reduced or preserved the relationship between ratings and level would also be informative for researchers considering study designs. Finally, relationships between ratings and level are generally informative because they could provide insight into an acoustic cue underlying ratings of valence and/or arousal.
Materials and methods
Participants
Twenty-two adults (3 male and 19 female) participated. An additional 2 potential participants started but did not finish the study because they failed the headphone or browser requirements (detailed below). Participants reported being white (n = 16), Asian (n = 2), Hispanic (n = 1), or more than one race (n = 3). They were recruited through word of mouth within undergraduate and graduate school communities at three universities. Participants were included if they self-reported being native English speakers, having no diagnosed psychological disorders (e.g., clinical depression), and not taking psychotropic medications (e.g., antidepressants). They reported normal hearing sensitivity, and their median self-reported hearing ability was 9 in response to the question, “On a scale of 1 to 10, how would you rate your hearing? 10 indicates excellent hearing.” All responses to this question were 8 or higher. Participants completed the study anonymously via on-line experiment building and hosting software [Gorilla; 69] and were paid via e-mailed gift certificate. The study was conducted with the approval of the Vanderbilt University Medical Center Institutional Review Board (#211513).
Stimuli
The valence and arousal dimensions of the self-assessment manikin [17] were used to facilitate ratings of emotional response to sounds. The dimensions each include five figures displaying a range of emotion along the dimension, specifically a smiling face to a frowning face for valence and an excited person to a sleepy person for arousal. Under each set of pictures was a slider, where whole numbers appeared when the participant moved the slider (with 1 and 9 at the extremes indicating low or high rating on either dimension). The numerical rating options (numbers 1–9) were equally spaced under the 5 manikin images. The sliders always started in the middle (rating of 5) and both had to be moved (or at least clicked) before advancing to the next trial.
Two sets of non-speech sounds were included: 1) a set of sounds that were the primary focus of this study (hereafter referred to as ‘study sounds’), which were previously used to evaluate auditory distraction [30] and 2) a previously validated set with published normative data (hereafter referred to as ‘validation sounds’). The previously validated stimuli served as a control for our listeners; they allowed us to verify that performance was consistent with established data and, if so, lend support that participants’ ratings of the study sounds reflect those of a typical population of young adults with normal hearing. The validation sounds consisted of sounds from the International Affective Digitized Sounds corpus [IADS-2; 32]. The validation sounds in the current study were a subset of nine sounds from those used previously [23]. The validation sounds had normative ratings of valence ranging from 2.06 to 7.78, reflecting sounds that elicit a wide range of ratings of valence. The IADS-2 tokens used were: 226 (laughing), 351 (applause), 817 (bongos), 120 (rooster), 410 (helicopter), 425 (train), 255 (vomit), 295 (sobbing), and 296 (crying). The IADS-2 sounds were each 6.0 seconds in duration.
The study sounds included 120 environmental sounds [33], such as a throat clearing and an aluminum can opening. The sounds were previously trimmed to 0.5 s in duration by Marcell and colleagues [24]. The study sounds were chosen because they were used previously in a study of auditory distraction [29,30] and because they were expected to elicit a range of valence and arousal responses.
For both sets of sounds (study sounds and validation sounds), two sets of stimuli were created: one in which the sounds were all matched to have the same peak level (−3.01 dB relative to a maximum value of 1) and one in which the sounds were all matched to have the same rms level (−26.9 dB relative to a maximum value of 1). Consistent with work in the area [e.g., 21, 23, 67], the peak level of −3.01 dB was chosen to preserve sufficient headroom to provide a clear signal without risk of peak-clipping and its associated distortion. All amplitude scaling was accomplished via custom MATLAB script (version 2021a). Relative to the maximum of 1, the mean rms level for the peak-scaled sounds was −14.7 dB (standard deviation [SD] rms dB = 2.29; mean rms voltage = .19, SD rms voltage = .06). Relative to a maximum of 1, the mean peak level for the rms-scaled sounds was −15.2 dB (SD peak dB = 2.29; mean peak voltage = .18, SD peak voltage = .04). The actual stimulus presentation level and the relative level for the two amplitude-scaling approaches depended on the volume adjustments made by individual participants during testing (see below for details).
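For reference, the two scaling operations described above amount to a simple per-sound gain. A minimal sketch in R follows (the study itself used a custom MATLAB script, and the file name in the usage example is hypothetical).

```r
# Minimal sketch of the two amplitude-scaling approaches in R (the study used a
# custom MATLAB script; the file name below is hypothetical).
library(tuneR)   # only needed for reading .wav files in the usage example below

peak_target_db <- -3.01   # peak-scaling target, dB relative to full scale (1.0)
rms_target_db  <- -26.9   # rms-scaling target, dB relative to full scale (1.0)

scale_sound <- function(x, approach = c("peak", "rms")) {
  approach <- match.arg(approach)
  rms <- function(v) sqrt(mean(v^2))
  if (approach == "peak") {
    x * 10^(peak_target_db / 20) / max(abs(x))  # match peak level across sounds
  } else {
    x * 10^(rms_target_db / 20) / rms(x)        # match rms level across sounds
  }
}

# Example usage with a hypothetical stimulus file:
# w <- readWave("study_sound_001.wav")
# x <- w@left / (2^(w@bit - 1))                 # samples normalized to +/- 1
# x_peak <- scale_sound(x, "peak")
# x_rms  <- scale_sound(x, "rms")
```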
Procedures
Due to the ongoing COVID-19 pandemic and because tests of emotion perception can be reasonably accomplished using remote procedures [23,70,71], data collection in this study occurred remotely. Participants who indicated interest were sent a link to an on-line survey hosted in REDCap [72], which included brief demographic questions, an informed consent document, and contact information for research personnel. Once participants provided electronic, written consent, they were directed to a new web address for the study procedures (hosted by Gorilla). Participants were required to use personal headphones or earphones during the experiment and to complete the experiment on a laptop or desktop computer with an HTML-5 compatible browser. HTML-5 compatibility was required to ensure the browser would be able to play sound files in an uncompressed (lossless) format (i.e., *.wav). These three requirements were verified by the Gorilla software. Participants were also asked to perform the experiment in a quiet place, free from distractions, and to complete the experiment within one sitting. These requirements were not verified by the software; participants were trusted to comply with them.
The study procedures began with a brief overview and introduction to the experiment.
Participants then performed a headphone test to ensure they were listening with one earphone in each ear. The headphone test was based on the work of Woods and colleagues [73], in which participants are asked to discriminate the intensity of several tones. The tones were presented with a phase difference of 180 degrees between the two channels. The task is designed to be easily achieved with headphones, but difficult through loudspeakers because the antiphase channels partially cancel acoustically when played over loudspeakers. The headphone check included six trials, and participants were required to pass five of six trials for participation to continue.
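For illustration, the antiphase manipulation at the core of this screen can be generated as in the sketch below (written in R with the tuneR package; the actual screen was implemented within Gorilla, and the tone parameters here are placeholders rather than the original test values).

```r
# Sketch of the antiphase-tone manipulation underlying the headphone screen [73]
# (illustrative only; the actual screen ran in Gorilla, and these tone
# parameters are placeholders rather than the original test values).
library(tuneR)

fs   <- 44100
dur  <- 1.0
t    <- seq(0, dur - 1 / fs, by = 1 / fs)
tone <- sin(2 * pi * 200 * t)

# In-phase tone: identical waveform in both channels.
in_phase  <- Wave(left = tone, right = tone, samp.rate = fs, bit = 16)

# Antiphase tone: right channel inverted (180-degree phase difference), so the
# two channels largely cancel acoustically over loudspeakers but not over headphones.
antiphase <- Wave(left = tone, right = -tone, samp.rate = fs, bit = 16)

writeWave(normalize(in_phase,  unit = "16"), "tone_in_phase.wav")
writeWave(normalize(antiphase, unit = "16"), "tone_antiphase.wav")
```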
Participants were then given instructions for the task (“For this experiment you will hear a number of sounds. We would like you to rate how excited/calm and how pleasant/unpleasant each sound makes you feel”) and a single practice trial consisting of a pleasant IADS-2 sound that was not later used for testing (a sample of music, IADS sound number 810). Prior to each testing block (described below), participants were asked to play the same IADS sound used in the practice trial and to adjust their computer volume to a comfortable listening level (“You should be able to hear the sound, but it should not be too loud. Choose a level that is ‘just right’”). The sound used for volume adjustment was scaled to match the test sounds in the upcoming block (peak- or rms-scaled). The calibration sound had approximately the same peak and rms levels as the average of all the tokens in the peak- and rms-scaled conditions (rms level = −26.9 dB and peak level = −16.1 dB in the rms-scaling approach; rms level = −13.9 dB and peak level = −3.01 dB in the peak-scaling approach, all relative to a maximum of 1).
After calibration, participants were asked not to change the volume for the remainder of the testing block. Participants then rated 129 non-speech sounds that were all rms- or peak-scaled (120 study sounds, 9 validation sounds). The sounds were presented in a random order, which effectively mixed stimuli of various durations (0.5 s for study sounds and 6 s for validation sounds) and of varying expected emotional valence and arousal within each block. At the halfway point of each testing block, participants were given a brief “attention check” where they were asked to adjust the sliders to a specific position (e.g., “Please move the Excitement slider all the way to the left and the Pleasantness slider all the way to the right”). Note that participants rated an additional 9 validation sounds in each amplitude-scaling condition to explore a different methodological question outside the scope of this project. Those data are not reported here.
Sounds were blocked by amplitude-scaling approach (i.e., peak-scaling vs rms-scaling). Blocks were counterbalanced across participants to account for possible block-order effects. Between blocks, participants were instructed to take a break. Breaks of at least 1 minute were enforced by locking the experimental program and not allowing participants to proceed until the break time had passed. Before starting the next block, participants re-adjusted their volume to find a comfortable level using a sound that had the same amplitude-scaling approach as the upcoming test sounds (e.g., rms-scaling). After rating all sounds in both blocks, participants were asked to disclose whether they had adjusted their volume after the calibration for each block. It was stressed that their honest answer would not affect compensation and would only be used to better understand the data. Finally, participants were directed back to REDCap and were invited to provide their e-mail address for receiving study compensation. Demographic information and contact information were not linked. Total test time was less than 60 minutes.
Data analysis
All analyses were conducted in R [v. 4.3.0; 74]. Three participants failed attention checks put in place during the remote testing; their data were excluded from analyses. All remaining participants (n = 19) affirmed that they had not adjusted their volume after calibration in each testing block. Data analysis included multiple steps. First, a normative data check was completed to verify that participants’ responses were consistent with normative data for the validation sounds, thereby validating their understanding of the task and the reliability of their responses to the other study sounds (which lack normative validation). To perform this check, participants’ mean ratings of valence and arousal in response to the 9 validation sounds were compared to the mean normative ratings of valence and arousal available for the same sounds. One participant’s ratings of valence fell more than 1 standard deviation away from the normative values; their mean valence rating was 3.17, whereas the mean valence rating in the normative data was 5.04 (SD = 1.65). Because their data differed from what would be expected based on the normative data, this participant’s data were removed from the remaining analyses.
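As an illustration, the normative-data check amounts to the following comparison (a sketch in R; the data frame and column names are hypothetical).

```r
# Sketch of the normative-data check (data frame `ratings` and its column names
# are hypothetical). A participant is flagged when their mean valence rating of
# the 9 validation sounds falls more than 1 normative SD from the normative mean.
norm_mean_valence <- 5.04
norm_sd_valence   <- 1.65

validation <- subset(ratings, sound_set == "validation")
participant_means <- aggregate(valence ~ participant, data = validation, FUN = mean)
participant_means$flagged <-
  abs(participant_means$valence - norm_mean_valence) > norm_sd_valence
```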
The effects of amplitude-scaling approach on ratings of valence and arousal were examined for the remaining 18 participants. To do this, ratings of valence and arousal were analyzed using linear-mixed effects modelling. Specifically, for each type of rating (valence, arousal), a linear mixed-effects model was constructed using the lmer function of the lme4 package [75]. Each model included a single fixed factor, amplitude-scaling approach (rms, peak), in addition to random intercepts of participant and sound token. Models were analyzed and partial eta squared values were extracted using the anova_stats function of the sjstats package [76]. Significant main effects and interactions were explored using the emmeans function of the emmeans package [77] and controlling for false discovery rates [78].
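A condensed sketch of this modelling pipeline in R follows (the data frame and column names are hypothetical; the study extracted partial eta squared with the anova_stats function of sjstats, whereas the sketch uses a plain F-test from lmerTest as a stand-in for that step).

```r
# Sketch of the mixed-effects analysis (data frame `ratings` and its column
# names are hypothetical). The study extracted partial eta squared with
# sjstats::anova_stats; here a plain F-test via lmerTest stands in for that step.
library(lme4)
library(lmerTest)
library(emmeans)

m_valence <- lmer(valence ~ scaling + (1 | participant) + (1 | token), data = ratings)
anova(m_valence)                                        # main effect of scaling approach
emmeans(m_valence, pairwise ~ scaling, adjust = "fdr")  # follow-up comparisons, FDR-controlled

# The arousal model is identical with `arousal` as the outcome.
```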
To evaluate the effect of amplitude-scaling approach on ratings of valence or arousal for individual sounds, the number of sounds whose ratings of valence (or arousal) changed nominal category between the two scaling approaches was calculated. The purpose of this analysis was to evaluate if any changes in ratings would affect the interpretation of the rating; does a sound that elicits a ‘pleasant’ rating with an rms-scaling approach now elicit an ‘unpleasant’ rating with a peak-scaling approach? To evaluate nominal changes in ratings, the 1–9 rating scale was divided into 3 categories, where the ‘low’ category was defined as scores less than 4, ‘neutral’ category was defined as scores 4–6, and the ‘high’ category was defined as a score of greater than 6. For ratings of valence, low scores indicate unpleasantness and high scores indicate pleasantness. For ratings of arousal, low scores indicate calmness and high scores indicate excitedness. Mean responses to each sound were assigned to one of three categories for valence and one of three categories for arousal. The number of sounds whose category assignment was different for rms- and peak-level scaling approaches was calculated for ratings of valence and arousal separately.
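The category assignment and change count can be expressed compactly, as in the sketch below (`sound_means` and its columns are hypothetical names for per-sound mean ratings under each scaling approach).

```r
# Sketch of the nominal-category comparison (`sound_means` holds one row per
# sound with mean ratings under each scaling approach; names are hypothetical).
categorize <- function(r) ifelse(r < 4, "low", ifelse(r <= 6, "neutral", "high"))

sound_means$cat_peak <- categorize(sound_means$valence_peak)
sound_means$cat_rms  <- categorize(sound_means$valence_rms)

table(peak = sound_means$cat_peak, rms = sound_means$cat_rms)  # cross-tabulate categories
sum(sound_means$cat_peak != sound_means$cat_rms)               # number of sounds that changed
```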
Finally, to evaluate the relationship between a sound’s acoustical properties and subjective ratings, correlations were conducted between ratings of valence (or arousal) and a sound’s rms level (within peak-scaled stimuli) or a sound’s peak level (within rms-scaled stimuli). This approach allows exploration of the role of the alternative level cue (rms or peak level) in subjective ratings of valence or arousal when one level cue has been normalized. Correlations were conducted using the cor.test function in base R; each of four relationships was explored using a separate correlation: 1) ratings of valence and rms level within peak-scaled sounds, 2) ratings of valence and peak level within rms-scaled sounds, 3) ratings of arousal and rms level within peak-scaled sounds, and 4) ratings of arousal and peak level within rms-scaled sounds.
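These four analyses reduce to four calls to cor.test, as sketched below (column names are hypothetical, and expressing the levels in dB re full scale is an assumption).

```r
# Sketch of the four correlations (column names are hypothetical; levels are
# assumed to be expressed in dB re full scale).
cor.test(sound_means$valence_peak, sound_means$rms_level_db)   # 1) valence vs rms level, peak-scaled
cor.test(sound_means$valence_rms,  sound_means$peak_level_db)  # 2) valence vs peak level, rms-scaled
cor.test(sound_means$arousal_peak, sound_means$rms_level_db)   # 3) arousal vs rms level, peak-scaled
cor.test(sound_means$arousal_rms,  sound_means$peak_level_db)  # 4) arousal vs peak level, rms-scaled
```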
Results
Ratings of valence
Analysis revealed a significant main effect of scaling approach (F[1, 4182] = 23.39, p < .00001, ηp2 = .01). Ratings of valence of the rms-scaled sounds were approximately 0.20 points higher than the ratings of valence of the peak-scaled sounds (estimated marginal mean = 4.83 [rms-scaled] and 4.63 [peak-scaled]). Ratings of valence in the rms-scaled condition as a function of ratings of valence in the peak-scaled condition are displayed in Fig 2. This trend is visible in the figure, where many of the points fall above the diagonal line, indicating that the same stimuli were given different ratings depending on the scaling approach.
Also shown in Fig 2 are the changes in nominal category. Colored symbols indicate sounds that changed nominal categories between rms- and peak-scaling approaches. Of the 120 tokens, 103 were in the same valence category regardless of the amplitude-scaling approach used to present the sounds. However, 9 tokens that were categorized as ‘unpleasant’ with the peak-scaling approach were categorized as ‘neutral’ with the rms-scaling approach. Furthermore, 6 tokens categorized as ‘neutral’ with the peak-scaling approach were categorized as ‘pleasant’ with the rms-scaling approach, and 2 tokens categorized as ‘pleasant’ with the peak-scaling approach were categorized as ‘neutral’ with the rms-scaling approach. Combined, these results indicate that there was a main effect of amplitude-scaling approach on ratings of valence; ratings were higher with the rms-scaling approach than with the peak-scaling approach. However, the effect was small enough that only 14% of tokens (17 out of 120) would be interpreted differently in the rms- and peak-scaling approaches. Note that none of the tokens changed by two nominal categories (e.g., from pleasant to unpleasant or vice versa).
Correlation analysis revealed rms level within peak-scaled sounds was significantly related to ratings of valence (Pearson correlation = −.24, t[118] = −2.70, p = .008; left panel Fig 3). However, there was no significant relationship between a sound’s peak level and ratings of valence among tokens that were rms-scaled (Pearson correlation = .10, t[118] = 1.05, p = .294; right panel Fig 3). These findings indicate that rms level is related to ratings of valence with a peak-scaling approach, but peak level is not related to ratings of valence with an rms-scaling approach.
[Figure omitted. See PDF.]
The relationship between ratings of valence and rms level is statistically significant, whereas the relationship between ratings of valence and peak level is not statistically significant.
Ratings of arousal
Analysis revealed a significant main effect of scaling approach (F[1, 4421] = 23.41.56, p < .00001, ηp2 = .01). Ratings of arousal of the rms-scaled sounds were approximately 0.26 points lower than the ratings of arousal of the peak-scaled sounds (estimated marginal mean = 5.18 [rms-scaled] and 5.44 [peak-scaled]). Ratings of arousal in the rms-scaled condition as a function of ratings of arousal in the peak-scaled condition are displayed in Fig 4. This trend is visible in the figure, where many of the points fall below the diagonal line. Also evident in Fig 4 is that the lower ratings of arousal changed the nominal arousal category for 22 tokens; 18 changed from ‘exciting’ to ‘neutral’ and 4 changed from ‘neutral’ to ‘calming.’ These data indicate that 18% of the tokens would be interpreted differently (as less arousing) with the rms-scaling approach than with the peak-scaling approach.
[Figure omitted. See PDF.]
Each dot represents an individual sound. Shaded regions indicate ratings that did not change nominal category between scaling approaches. Yellow dots indicate sounds that were nominally ‘neutral’ with the peak-scaling approach but were nominally ‘calming’ with the rms-scaling approach. Blue dots indicate sounds that were nominally ‘exciting’ with the peak-scaling approach but were ‘neutral’ with the rms-scaling approach.
Correlation analysis revealed rms level within peak-scaled sounds was significantly related to ratings of arousal (Pearson correlation = 0.20, t[118] = 2.22, p = .028; left panel Fig 5). However, there was no significant relationship between a sound’s peak level and ratings of arousal among sounds that were rms-scaled (Pearson correlation = −.08, t[118] = −0.86, p = .351; right panel Fig 5). These findings indicate that rms level was related to ratings of arousal within a peak-scaling approach, but peak level was not related to ratings of arousal within an rms-scaling approach.
[Figure omitted. See PDF.]
The relationship between ratings of arousal and rms level was statistically significant, whereas the relationship between ratings of arousal and peak level was not statistically significant.
Discussion
The purpose of this study was to evaluate the effects of amplitude-scaling approach (rms, peak) on emotional responses, specifically ratings of valence and arousal. The data analyzed in this study were from adults with self-reported normal hearing who passed attention checks during the online testing; they also provided ratings of valence and arousal of validated non-speech sounds that were consistent with normative data for those sounds. Within this group of participants, the choice of amplitude-scaling approach affected both ratings of valence and ratings of arousal. Specifically, with an rms-scaling approach, ratings of valence were approximately 0.20 points higher than ratings of valence with the peak-scaling approach. In addition, ratings of arousal were approximately 0.26 points lower with an rms-scaling approach compared to a peak-scaling approach. These findings indicate that, in general, tokens were rated as more pleasant and less exciting when they were rms-scaled than when they were peak-scaled. However, the change in amplitude-scaling approach nominally affected fewer than 20% of tokens. Out of 120 sounds, 103 and 98 sounds had ratings of valence and arousal, respectively, that did not change nominal category between amplitude-scaling approaches.
Role of acoustics
In order to better understand the role of acoustics in ratings of valence and arousal, correlations were conducted between ratings of valence (or arousal) and the alternative level cue within each scaling approach. With the peak-scaled sounds, the amplitude naturally varied, and the rms level could be higher for some sounds than for others. As displayed in Fig 1, a sound with relatively steady amplitude has a higher overall rms value when peak-scaled than when rms-scaled. Correlation analysis revealed that the rms level of a sound was related to ratings of valence and arousal, but only within the peak-scaled sounds. It is not surprising that ratings are related to rms level. Previous work has demonstrated that increasing the overall level results in lower ratings of valence for peak-scaled sounds [20,21]. The current work extends previous findings to show that the relationship between rms level and ratings of valence is statistically significant across individual sounds. Also consistent with previous literature [59–63], and with the notion that arousal reflects the degree of activation of an emotion [17–19], sounds with higher rms levels were rated as more exciting than those with lower rms levels.
These rms-level related differences could account for the general differences in ratings between the two scaling approaches, depending on the level-setting strategy of participants. The overall rms was approximately 12 dB higher for the peak-scaled sounds (−14.7 dB relative to a maximum of 1) than the rms-scaled sounds in this study (−26.9 dB relative to a maximum of 1). If participants used the same presentation level setting for both sets of tokens, it would explain the lower ratings of valence and higher ratings of arousal for the peak-scaled than the rms-scaled sounds. Participants were encouraged to individually set the level for each block prior to testing, but the absolute level of the system, and their volume settings, were not accessible for analysis.
Research implications
The results of this study suggest that valence ratings in response to sounds used for emotion perception and distraction tasks were affected by amplitude-scaling approach. This indicates that the methodological choice of scaling sounds by peak or rms level might affect interpretation of emotional responses to non-speech sounds. This is especially important considering that use of peak- and rms-scaling strategies is mixed in the literature, with some studies using rms-scaling [e.g., 55–57] and others using peak-scaling [20,21]. Because the amplitude-scaling approach has the potential to affect ratings of valence and arousal, synthesizing data across studies that used different amplitude-scaling approaches may be difficult.
However, it is important to note that the scale of the differences in the current study is small. The 0.2-point rating difference represents approximately a 2-percentage-point difference on the 9-point rating scale. Furthermore, only 14% and 18% of sounds changed nominal categories, and none of the sounds changed by more than one nominal category. Therefore, although the effect of amplitude-scaling approach was statistically significant and affected some sounds, the effects are generally small.
Importantly, the choice of amplitude-scaling approach not only affects overall ratings of valence and arousal (~0.2 points lower and higher, respectively, with peak-scaled sounds), but also the extent to which amplitude is related to emotion perception. With the peak-scaled sounds, the sounds were matched on one dimension (peak level), but the rms level varied between sounds and served as a cue for ratings of valence and arousal. Conversely, the rms-scaled sounds were matched on one dimension (rms level), but the variability in peak level across sounds did not serve as a cue for ratings of valence and arousal. Therefore, in order to preserve some of the individual level variability that contributes to emotion perception of non-speech sounds, it might be beneficial to use a peak-scaling approach. Although ratings of valence and arousal might be generally lower and higher, respectively, than if an rms-scaling approach were used, peak-scaling allows level to serve as a cue for ratings of valence and arousal. Conversely, if eliminating level differences is desirable and a researcher wants to explore other cues that might contribute to emotion perception of non-speech sounds, then an rms-scaling approach might be preferable.
Study limitations
There are several study limitations worth mentioning. First, the testing occurred remotely. The unsupervised testing precludes full confidence that participants were not distracted, although attention checks and forced rest breaks serve to improve the chance that study participants were authentic in their responses. Because it was an online study, it is also possible some of the participants had audiometric hearing loss, despite self-reported normal hearing. This could potentially be problematic because people with hearing loss have been shown to provide reduced range of ratings in emotion perception tasks [20]. Fortunately, previous work demonstrates that online testing can be a viable alternative to laboratory testing for hearing-related tasks generally [79] and for measuring emotional responses to non-speech sounds specifically [23,70]. Future work is warranted to evaluate if the results of the current study replicate in a laboratory setting.
Future work is also warranted to determine how these findings translate to different stimuli. These results are specific to the 120 tokens from a corpus of non-speech sounds, all of them 0.5 seconds in duration. The emotion perception literature often uses an rms-scaling approach with speech stimuli [e.g., 40, 46], although speech corpora are not exclusively rms-scaled [e.g., 80]. It is possible the effects of rms- versus peak-scaling would be smaller with speech tokens because speech tends to have a narrower amplitude dynamic range than other sounds, such as music [e.g., 81]. However, non-speech sounds are inherently variable [53]; therefore, the current results may only apply to sounds with temporal dynamics similar to those of the stimuli in the current study. Future work is warranted to evaluate amplitude-scaling approach differences in the study of emotion perception with speech stimuli.
Finally, the study focused on only two amplitude-scaling approaches. Other standardized approaches could have been used, such as the European Broadcasting Union Recommendation R128, which has been suggested to be superior to either peak- or rms-level scaling [68]. Although such scaling approaches have not been reported in the study of emotional responses to non-speech sounds, it is possible the use of a widely accepted standard would be beneficial in the field. Importantly, neither peak- nor rms-level scaling accounts for the auditory sensitivity of listeners. The conclusions about level as a cue for ratings of valence and arousal are not based on predicted loudness models [82–84]. Both scaling approaches ignore the spectrotemporal characteristics of human hearing and thus equate only level, not loudness. Future work is warranted to evaluate the extent to which loudness cues for emotional responses contribute to the conclusions of this study.
Conclusions
The purpose of this study was to evaluate the effects of amplitude-scaling approach (rms, peak) on emotional responses, specifically ratings of valence and arousal. The results reveal that ratings of valence were lower, and ratings of arousal were higher, in response to peak-scaled sounds than in response to rms-scaled sounds, although the effects were generally small and nominally affected fewer than 20% of the tokens. Within the peak-scaled sounds, those with higher rms values had ratings of valence that were lower (less pleasant) and ratings of arousal that were higher (more exciting) than sounds with lower rms values. This finding suggests that peak-scaling approaches preserve some of the individual variability in level that can affect ratings of valence and arousal. Investigators studying emotional responses to non-speech sounds should consider a scaling approach that fits their research agenda, specifically whether variability in level across scaled tokens is important (preserved with peak-scaling) or whether reducing overall level as a cue is important (achieved with rms-scaling). In addition, care should be used when synthesizing valence and arousal findings across studies with different amplitude-scaling approaches because the chosen approach affects not only overall ratings, but also the acoustic cues listeners use to make ratings of valence and arousal.
References
1. Picou EM, Buono GH. Emotional Responses to Pleasant Sounds Are Related to Social Disconnectedness and Loneliness Independent of Hearing Loss. Trends Hear. 2018;22:2331216518813243. pmid:30482108
2. Luo X, Kern A, Pulling KR. Vocal emotion recognition performance predicts the quality of life in adult cochlear implant users. J Acoust Soc Am. 2018;144(5):EL429. pmid:30522282
3. Arthaud-day ML, Rode JC, Mooney CH, Near JP. The Subjective Well-being Construct: A Test of its Convergent, Discriminant, and Factorial Validity. Soc Indic Res. 2005;74(3):445–76.
4. Singh G, Liskovoi L, Launer S, Russo F. The Emotional Communication in Hearing Questionnaire (EMO-CHeQ): Development and Evaluation. Ear Hear. 2019;40(2):260–71. pmid:29894380
5. Taylor SE. Asymmetrical effects of positive and negative events: the mobilization-minimization hypothesis. Psychol Bull. 1991;110(1):67–85. pmid:1891519
6. Baumeister RF, Bratslavsky E, Finkenauer C, Vohs KD. Bad is Stronger than Good. Review of General Psychology. 2001;5(4):323–70.
7. Yuvaraj R, Murugappan M, Norlinah MI, Sundaraj K, Khairiyah M. Review of emotion recognition in stroke patients. Dement Geriatr Cogn Disord. 2013;36(3–4):179–96. pmid:23899462
8. Zupan B, Babbage D, Neumann D, Willer B. Recognition of facial and vocal affect following traumatic brain injury. Brain Inj. 2014;28(8):1087–95. pmid:24701988
9. Gothwal M, Arumugham SS, Yadav R, Pal PK, Hegde S. Deficits in Emotion Perception and Cognition in Patients with Parkinson’s Disease: A Systematic Review. Ann Indian Acad Neurol. 2022;25(3):367–75. pmid:35936598
10. Keane J, Calder AJ, Hodges JR, Young AW. Face and emotion processing in frontal variant frontotemporal dementia. Neuropsychologia. 2002;40(6):655–65. pmid:11792405
11. Misiewicz S, Brickman AM, Tosto G. Prosodic Impairment in Dementia: Review of the Literature. Curr Alzheimer Res. 2018;15(2):157–63. pmid:29086698
12. Fujiki M, Spackman MP, Brinton B, Illig T. Ability of children with language impairment to understand emotion conveyed by prosody in a narrative passage. Int J Lang Commun Disord. 2008;43(3):330–45. pmid:17852516
13. Taylor LJ, Maybery MT, Grayndler L, Whitehouse AJO. Evidence for shared deficits in identifying emotions from faces and from voices in autism spectrum disorders and specific language impairment. Int J Lang Commun Disord. 2015;50(4):452–66. pmid:25588870
14. Philip RCM, Whalley HC, Stanfield AC, Sprengelmeyer R, Santos IM, Young AW, et al. Deficits in facial, body movement and vocal emotional processing in autism spectrum disorders. Psychol Med. 2010;40(11):1919–29. pmid:20102666
15. Christensen JA, Sis J, Kulkarni AM, Chatterjee M. Effects of Age and Hearing Loss on the Recognition of Emotions in Speech. Ear Hear. 2019;40(5):1069–83. pmid:30614835
16. Kalathottukaren RT, Purdy SC, Ballard E. Prosody perception and musical pitch discrimination in adults using cochlear implants. Int J Audiol. 2015;54(7):444–52. pmid:25634773
17. Bradley MM, Lang PJ. Measuring emotion: the Self-Assessment Manikin and the Semantic Differential. J Behav Ther Exp Psychiatry. 1994;25(1):49–59. pmid:7962581
18. Russell JA. Evidence of convergent validity on the dimensions of affect. Journal of Personality and Social Psychology. 1978;36(10):1152–68.
19. Russell JA, Mehrabian A. Evidence for a three-factor theory of emotions. Journal of Research in Personality. 1977;11(3):273–94.
20. Picou EM, Rakita L, Buono GH, Moore TM. Effects of Increasing the Overall Level or Fitting Hearing Aids on Emotional Responses to Sounds. Trends Hear. 2021;25:23312165211049938. pmid:34866509
21. Picou EM. How Hearing Loss and Age Affect Emotional Responses to Nonspeech Sounds. J Speech Lang Hear Res. 2016;59(5):1233–46. pmid:27768178
22. Tawdrous MM, D’Onofrio KL, Gifford R, Picou EM. Emotional Responses to Non-Speech Sounds for Hearing-aid and Bimodal Cochlear-Implant Listeners. Trends Hear. 2022;26:23312165221083091. pmid:35435773
23. Picou EM, Singh G, Russo FA. A Comparison between a remote testing and a laboratory test setting for evaluating emotional responses to non-speech sounds. Int J Audiol. 2022;61(10):799–808. pmid:34883031
24. Murray MM, Camen C, Gonzalez Andino SL, Bovet P, Clarke S. Rapid brain discrimination of sounds of objects. J Neurosci. 2006;26(4):1293–302. pmid:16436617
25. Escera C, Yago E, Corral M-J, Corbera S, Nuñez MI. Attention capture by auditory significant stimuli: semantic analysis follows attention switching. Eur J Neurosci. 2003;18(8):2408–12. pmid:14622204
26. Parmentier FBR. Towards a cognitive model of distraction by auditory novelty: the role of involuntary attention capture and semantic processing. Cognition. 2008;109(3):345–62. pmid:19007926
27. Parmentier FBR, Maybery MT, Elsley J. The involuntary capture of attention by novel feature pairings: a study of voice-location integration in auditory sensory memory. Atten Percept Psychophys. 2010;72(2):279–84. pmid:20139445
28. Lavie N. Attention, Distraction, and Cognitive Control Under Load. Curr Dir Psychol Sci. 2010;19(3):143–8.
29. Gustafson SJ, Nelson L, Silcox JW. Effect of Auditory Distractors on Speech Recognition and Listening Effort. Ear Hear. 2023;44(5):1121–32. pmid:36935395
30. Morgan SD, Picou EM, Young ED, Gustafson SJ. Relationship Between Auditory Distraction and Emotional Dimensionality for Non-Speech Sounds. Ear Hear. 2025;46(4):983–96. pmid:40001269
31. Traer J, Norman-Haignere SV, McDermott JH. Causal inference in environmental sound recognition. Cognition. 2021;214:104627. pmid:34044231
32. Bradley MM, Lang PJ. The International Affective Digitized Sounds (IADS-2): Affective ratings of sounds and instruction manual. Tech Rep B-3. Gainesville, FL: University of Florida. 2007.
33. Marcell MM, Borella D, Greene M, Kerr E, Rogers S. Confrontation naming of environmental sounds. J Clin Exp Neuropsychol. 2000;22(6):830–64. pmid:11320440
34. Ruhnau P, Wetzel N, Widmann A, Schröger E. The modulation of auditory novelty processing by working memory load in school age children and adults: a combined behavioral and event-related potential study. BMC Neurosci. 2010;11:126. pmid:20929535
35. Marois A, Pozzi A, Vachon F. Assessing the Role of Stimulus Novelty in the Elicitation of the Pupillary Dilation Response to Irrelevant Sound. Auditory Perception & Cognition. 2020;3(1–2):1–17.
36. Husain FT, Carpenter-Thompson JR, Schmidt SA. The effect of mild-to-moderate hearing loss on auditory and emotion processing networks. Front Syst Neurosci. 2014;8:10. pmid:24550791
37. Carpenter-Thompson JR, Akrofi K, Schmidt SA, Dolcos F, Husain FT. Alterations of the emotional processing system may underlie preserved rapid reaction time in tinnitus. Brain Res. 2014;1567:28–41. pmid:24769166
38. Carpenter-Thompson JR, Schmidt SA, Husain FT. Neural Plasticity of Mild Tinnitus: An fMRI Investigation Comparing Those Recently Diagnosed with Tinnitus to Those That Had Tinnitus for a Long Period of Time. Neural Plast. 2015;2015:161478. pmid:26246914
39. Durai M, O’Keeffe MG, Searchfield GD. Examining the short term effects of emotion under an Adaptation Level Theory model of tinnitus perception. Hear Res. 2017;345:23–9. pmid:28027920
40. Morgan SD. Categorical and Dimensional Ratings of Emotional Speech: Behavioral Findings From the Morgan Emotional Speech Set. J Speech Lang Hear Res. 2019;62(11):4015–29. pmid:31652413
41. Stevenson RA, James TW. Affective auditory stimuli: characterization of the International Affective Digitized Sounds (IADS) by discrete emotional categories. Behav Res Methods. 2008;40(1):315–21. pmid:18411555
42. Davies JE, Gander PE, Hall DA. Does Chronic Tinnitus Alter the Emotional Response Function of the Amygdala?: A Sound-Evoked fMRI Study. Front Aging Neurosci. 2017;9:31. pmid:28270764
43. Bahadori M, Barumerli R, Geronazzo M, Cesari P. Action planning and affective states within the auditory peripersonal space in normal hearing and cochlear-implanted listeners. Neuropsychologia. 2021;155:107790. pmid:33636155
44. Gingras B, Marin MM, Fitch WT. Beyond intensity: Spectral features effectively predict music-induced subjective arousal. Q J Exp Psychol (Hove). 2014;67(7):1428–46. pmid:24215647
45. Wetzel N, Schröger E, Widmann A. The dissociation between the P3a event-related potential and behavioral distraction. Psychophysiology. 2013;50(9):920–30. pmid:23763292
46. Oron Y, Levy O, Avivi-Reich M, Goldfarb A, Handzel O, Shakuf V, et al. Tinnitus affects the relative roles of semantics and prosody in the perception of emotions in spoken language. Int J Audiol. 2020;59(3):195–207. pmid:31663391
47. Dupuis K, Pichora-Fuller MK. Aging Affects Identification of Vocal Emotions in Semantically Neutral Sentences. J Speech Lang Hear Res. 2015;58(3):1061–76. pmid:25810032
48. Globerson E, Amir N, Golan O, Kishon-Rabin L, Lavidor M. Psychoacoustic abilities as predictors of vocal emotion recognition. Atten Percept Psychophys. 2013;75(8):1799–810. pmid:23893469
49. Morgan SD. Comparing Emotion Recognition and Word Recognition in Background Noise. J Speech Lang Hear Res. 2021;64(5):1758–72. pmid:33830784
50. Morgan SD, Garrard S, Hoskins T. Emotion and Word Recognition for Unprocessed and Vocoded Speech Stimuli. Ear Hear. 2022;43(2):398–407. pmid:34310412
51. Morgan SD, Ferguson SH, Crain AD, Jennings SG. Perceived Anger in Clear and Conversational Speech: Contributions of Age and Hearing Loss. Brain Sci. 2022;12(2):210. pmid:35203973
52. de Boer MJ, Jürgens T, Cornelissen FW, Başkent D. Degraded visual and auditory input individually impair audiovisual emotion recognition from speech-like stimuli, but no evidence for an exacerbated effect from combined degradation. Vision Res. 2021;180:51–62. pmid:33360918
53. Stilp CE, Shorey AE, King CJ. Nonspeech sounds are not all equally good at being nonspeech. J Acoust Soc Am. 2022;152(3):1842. pmid:36182316
54. Reddy RK, Ramachandra V, Kumar N, Singh NC. Categorization of environmental sounds. Biol Cybern. 2009;100(4):299–306. pmid:19259694
55. Max C, Widmann A, Kotz SA, Schröger E, Wetzel N. Distraction by emotional sounds: Disentangling arousal benefits and orienting costs. Emotion. 2015;15(4):428–37. pmid:26053245
56. Harrison NR, Davies SJ. Modulation of spatial attention to visual targets by emotional environmental sounds. Psychology & Neuroscience. 2013;6(3):247–51.
57. Wang Y, Tang Z, Zhang X, Yang L. Auditory and cross-modal attentional bias toward positive natural sounds: Behavioral and ERP evidence. Front Hum Neurosci. 2022;16:949655. pmid:35967006
58. Gygi B, Kidd GR, Watson CS. Similarity and categorization of environmental sounds. Percept Psychophys. 2007;69(6):839–55. pmid:18018965
59. Ilie G, Thompson WF. A Comparison of Acoustic Cues in Music and Speech for Three Dimensions of Affect. Music Perception. 2006;23(4):319–30.
60. Goudbeek M, Scherer K. Beyond arousal: valence and potency/control cues in the vocal expression of emotion. J Acoust Soc Am. 2010;128(3):1322–36. pmid:20815467
61. Laukka P, Juslin P, Bresin R. A dimensional approach to vocal expression of emotion. Cognition & Emotion. 2005;19(5):633–53.
62. Ma W, Thompson WF. Human emotions track changes in the acoustic environment. Proc Natl Acad Sci U S A. 2015;112(47):14563–8. pmid:26553987
63. Weninger F, Eyben F, Schuller BW, Mortillaro M, Scherer KR. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common. Front Psychol. 2013;4:292. pmid:23750144
64. Pell MD, Kotz SA. On the time course of vocal emotion recognition. PLoS One. 2011;6(11):e27256. pmid:22087275
65. Oliva M, Anikin A. Pupil dilation reflects the time course of emotion recognition in human vocalizations. Sci Rep. 2018;8(1):4871. pmid:29559673
66. Escera C, Alho K, Winkler I, Näätänen R. Neural mechanisms of involuntary attention to acoustic novelty and change. J Cogn Neurosci. 1998;10(5):590–604. pmid:9802992
67. Buono GH, Crukley J, Hornsby BWY, Picou EM. Loss of high- or low-frequency audibility can partially explain effects of hearing loss on emotional responses to non-speech sounds. Hear Res. 2021;401:108153. pmid:33360158
68. European Broadcasting Union. Loudness normalisation and permitted maximum level of audio signals (EBU Recommendation R 128). 2011.
69. Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK. Gorilla in our midst: An online behavioral experiment builder. Behav Res Methods. 2020;52(1):388–407. pmid:31016684
70. Seow TXF, Hauser TU. Reliability of web-based affective auditory stimulus presentation. Behav Res Methods. 2022;54(1):378–92. pmid:34240338
71. Ben-David BM, Mentzel M, Icht M, Gilad M, Dor YI, Ben-David S, et al. Challenges and opportunities for telehealth assessment during COVID-19: iT-RES, adapting a remote version of the test for rating emotions in speech. Int J Audiol. 2021;60(5):319–21. pmid:33063553
72. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81. pmid:18929686
73. Woods KJP, Siegel MH, Traer J, McDermott JH. Headphone screening to facilitate web-based auditory experiments. Atten Percept Psychophys. 2017;79(7):2064–72. pmid:28695541
74. R Core Team. R: A language and environment for statistical computing. 2022.
75. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. J Stat Soft. 2015;67(1).
76. Lüdecke D, Lüdecke MD. Package ‘sjstats’: Statistical functions for regression models. 2019.
77. Lenth R. Emmeans: Estimated Marginal Means, aka Least-Squares Means. 2019.
78. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1995;57(1):289–300.
79. Paglialonga A, Polo EM, Zanet M, Rocco G, van Waterschoot T, Barbieri R. An Automated Speech-in-Noise Test for Remote Testing: Development and Preliminary Evaluation. Am J Audiol. 2020;29(3S):564–76. pmid:32946249
80. Livingstone SR, Russo FA. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One. 2018;13(5):e0196391. pmid:29768426
81. Kirchberger M, Russo FA. Dynamic Range Across Music Genres and the Perception of Dynamic Compression in Hearing-Impaired Listeners. Trends Hear. 2016;20:2331216516630549. pmid:26868955
82. Moore BC, Glasberg BR, Baer T. A model for the prediction of thresholds, loudness, and partial loudness. Journal of the Audio Engineering Society. 1997;45(4):224–40.
83. Moore BCJ, Glasberg BR. A revised model of loudness perception applied to cochlear hearing loss. Hear Res. 2004;188(1–2):70–88. pmid:14759572
84. Moore BCJ, Glasberg BR, Varathanathan A, Schlittenlacher J. A Loudness Model for Time-Varying Sounds Incorporating Binaural Inhibition. Trends Hear. 2016;20:2331216516682698. pmid:28215113
Citation: Picou EM, Morgan SD, Young ED, Gustafson SJ (2025) Effects of stimulus amplitude-scaling approach on emotional responses to non-speech sounds. PLoS One 20(7): e0328659. https://doi.org/10.1371/journal.pone.0328659
About the Authors:
Erin M. Picou
Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: Department of Hearing and Speech, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
ORCID: https://orcid.org/0000-0003-3083-0809
Shae D. Morgan
Roles: Conceptualization, Investigation, Methodology, Supervision, Visualization, Writing – original draft, Writing – review & editing
Affiliation: Program in Audiology, University of Louisville, Louisville, Kentucky, United States of America
Elizabeth D. Young
Roles: Data curation, Investigation, Methodology, Project administration, Software, Validation, Writing – original draft, Writing – review & editing
Affiliation: Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, Utah, United States of America
ORCID: https://orcid.org/0000-0002-1633-8802
Samantha J. Gustafson
Roles: Funding acquisition, Project administration, Resources, Supervision, Writing – original draft, Writing – review & editing, Data curation
Affiliation: Department of Communication Sciences and Disorders, University of Utah, Salt Lake City, Utah, United States of America
1. Picou EM, Buono GH. Emotional Responses to Pleasant Sounds Are Related to Social Disconnectedness and Loneliness Independent of Hearing Loss. Trends Hear. 2018;22:2331216518813243. pmid:30482108
2. Luo X, Kern A, Pulling KR. Vocal emotion recognition performance predicts the quality of life in adult cochlear implant users. J Acoust Soc Am. 2018;144(5):EL429. pmid:30522282
3. Arthaud-day ML, Rode JC, Mooney CH, Near JP. The Subjective Well-being Construct: A Test of its Convergent, Discriminant, and Factorial Validity. Soc Indic Res. 2005;74(3):445–76.
4. Singh G, Liskovoi L, Launer S, Russo F. The Emotional Communication in Hearing Questionnaire (EMO-CHeQ): Development and Evaluation. Ear Hear. 2019;40(2):260–71. pmid:29894380
5. Taylor SE. Asymmetrical effects of positive and negative events: the mobilization-minimization hypothesis. Psychol Bull. 1991;110(1):67–85. pmid:1891519
6. Baumeister RF, Bratslavsky E, Finkenauer C, Vohs KD. Bad is Stronger than Good. Review of General Psychology. 2001;5(4):323–70.
7. Yuvaraj R, Murugappan M, Norlinah MI, Sundaraj K, Khairiyah M. Review of emotion recognition in stroke patients. Dement Geriatr Cogn Disord. 2013;36(3–4):179–96. pmid:23899462
8. Zupan B, Babbage D, Neumann D, Willer B. Recognition of facial and vocal affect following traumatic brain injury. Brain Inj. 2014;28(8):1087–95. pmid:24701988
9. Gothwal M, Arumugham SS, Yadav R, Pal PK, Hegde S. Deficits in Emotion Perception and Cognition in Patients with Parkinson’s Disease: A Systematic Review. Ann Indian Acad Neurol. 2022;25(3):367–75. pmid:35936598
10. Keane J, Calder AJ, Hodges JR, Young AW. Face and emotion processing in frontal variant frontotemporal dementia. Neuropsychologia. 2002;40(6):655–65. pmid:11792405
11. Misiewicz S, Brickman AM, Tosto G. Prosodic Impairment in Dementia: Review of the Literature. Curr Alzheimer Res. 2018;15(2):157–63. pmid:29086698
12. Fujiki M, Spackman MP, Brinton B, Illig T. Ability of children with language impairment to understand emotion conveyed by prosody in a narrative passage. Int J Lang Commun Disord. 2008;43(3):330–45. pmid:17852516
13. Taylor LJ, Maybery MT, Grayndler L, Whitehouse AJO. Evidence for shared deficits in identifying emotions from faces and from voices in autism spectrum disorders and specific language impairment. Int J Lang Commun Disord. 2015;50(4):452–66. pmid:25588870
14. Philip RCM, Whalley HC, Stanfield AC, Sprengelmeyer R, Santos IM, Young AW, et al. Deficits in facial, body movement and vocal emotional processing in autism spectrum disorders. Psychol Med. 2010;40(11):1919–29. pmid:20102666
15. Christensen JA, Sis J, Kulkarni AM, Chatterjee M. Effects of Age and Hearing Loss on the Recognition of Emotions in Speech. Ear Hear. 2019;40(5):1069–83. pmid:30614835
16. Kalathottukaren RT, Purdy SC, Ballard E. Prosody perception and musical pitch discrimination in adults using cochlear implants. Int J Audiol. 2015;54(7):444–52. pmid:25634773
17. Bradley MM, Lang PJ. Measuring emotion: the Self-Assessment Manikin and the Semantic Differential. J Behav Ther Exp Psychiatry. 1994;25(1):49–59. pmid:7962581
18. Russell JA. Evidence of convergent validity on the dimensions of affect. Journal of Personality and Social Psychology. 1978;36(10):1152–68.
19. Russell JA, Mehrabian A. Evidence for a three-factor theory of emotions. Journal of Research in Personality. 1977;11(3):273–94.
20. Picou EM, Rakita L, Buono GH, Moore TM. Effects of Increasing the Overall Level or Fitting Hearing Aids on Emotional Responses to Sounds. Trends Hear. 2021;25:23312165211049938. pmid:34866509
21. Picou EM. How Hearing Loss and Age Affect Emotional Responses to Nonspeech Sounds. J Speech Lang Hear Res. 2016;59(5):1233–46. pmid:27768178
22. Tawdrous MM, D’Onofrio KL, Gifford R, Picou EM. Emotional Responses to Non-Speech Sounds for Hearing-aid and Bimodal Cochlear-Implant Listeners. Trends Hear. 2022;26:23312165221083091. pmid:35435773
23. Picou EM, Singh G, Russo FA. A Comparison between a remote testing and a laboratory test setting for evaluating emotional responses to non-speech sounds. Int J Audiol. 2022;61(10):799–808. pmid:34883031
24. Murray MM, Camen C, Gonzalez Andino SL, Bovet P, Clarke S. Rapid brain discrimination of sounds of objects. J Neurosci. 2006;26(4):1293–302. pmid:16436617
25. Escera C, Yago E, Corral M-J, Corbera S, Nuñez MI. Attention capture by auditory significant stimuli: semantic analysis follows attention switching. Eur J Neurosci. 2003;18(8):2408–12. pmid:14622204
26. Parmentier FBR. Towards a cognitive model of distraction by auditory novelty: the role of involuntary attention capture and semantic processing. Cognition. 2008;109(3):345–62. pmid:19007926
27. Parmentier FBR, Maybery MT, Elsley J. The involuntary capture of attention by novel feature pairings: a study of voice-location integration in auditory sensory memory. Atten Percept Psychophys. 2010;72(2):279–84. pmid:20139445
28. Lavie N. Attention, Distraction, and Cognitive Control Under Load. Curr Dir Psychol Sci. 2010;19(3):143–8.
29. Gustafson SJ, Nelson L, Silcox JW. Effect of Auditory Distractors on Speech Recognition and Listening Effort. Ear Hear. 2023;44(5):1121–32. pmid:36935395
30. Morgan SD, Picou EM, Young ED, Gustafson SJ. Relationship Between Auditory Distraction and Emotional Dimensionality for Non-Speech Sounds. Ear Hear. 2025;46(4):983–96. pmid:40001269
31. Traer J, Norman-Haignere SV, McDermott JH. Causal inference in environmental sound recognition. Cognition. 2021;214:104627. pmid:34044231
32. Bradley MM, Lang PJ. The International Affective Digitized Sounds (IADS-2): Affective ratings of sounds and instruction manual. Tech Rep B-3. Gainesville, FL: University of Florida. 2007.
33. Marcell MM, Borella D, Greene M, Kerr E, Rogers S. Confrontation naming of environmental sounds. J Clin Exp Neuropsychol. 2000;22(6):830–64. pmid:11320440
34. Ruhnau P, Wetzel N, Widmann A, Schröger E. The modulation of auditory novelty processing by working memory load in school age children and adults: a combined behavioral and event-related potential study. BMC Neurosci. 2010;11:126. pmid:20929535
35. Marois A, Pozzi A, Vachon F. Assessing the Role of Stimulus Novelty in the Elicitation of the Pupillary Dilation Response to Irrelevant Sound. Auditory Perception & Cognition. 2020;3(1–2):1–17.
36. Husain FT, Carpenter-Thompson JR, Schmidt SA. The effect of mild-to-moderate hearing loss on auditory and emotion processing networks. Front Syst Neurosci. 2014;8:10. pmid:24550791
37. Carpenter-Thompson JR, Akrofi K, Schmidt SA, Dolcos F, Husain FT. Alterations of the emotional processing system may underlie preserved rapid reaction time in tinnitus. Brain Res. 2014;1567:28–41. pmid:24769166
38. Carpenter-Thompson JR, Schmidt SA, Husain FT. Neural Plasticity of Mild Tinnitus: An fMRI Investigation Comparing Those Recently Diagnosed with Tinnitus to Those That Had Tinnitus for a Long Period of Time. Neural Plast. 2015;2015:161478. pmid:26246914
39. Durai M, O’Keeffe MG, Searchfield GD. Examining the short term effects of emotion under an Adaptation Level Theory model of tinnitus perception. Hear Res. 2017;345:23–9. pmid:28027920
40. Morgan SD. Categorical and Dimensional Ratings of Emotional Speech: Behavioral Findings From the Morgan Emotional Speech Set. J Speech Lang Hear Res. 2019;62(11):4015–29. pmid:31652413
41. Stevenson RA, James TW. Affective auditory stimuli: characterization of the International Affective Digitized Sounds (IADS) by discrete emotional categories. Behav Res Methods. 2008;40(1):315–21. pmid:18411555
42. Davies JE, Gander PE, Hall DA. Does Chronic Tinnitus Alter the Emotional Response Function of the Amygdala?: A Sound-Evoked fMRI Study. Front Aging Neurosci. 2017;9:31. pmid:28270764
43. Bahadori M, Barumerli R, Geronazzo M, Cesari P. Action planning and affective states within the auditory peripersonal space in normal hearing and cochlear-implanted listeners. Neuropsychologia. 2021;155:107790. pmid:33636155
44. Gingras B, Marin MM, Fitch WT. Beyond intensity: Spectral features effectively predict music-induced subjective arousal. Q J Exp Psychol (Hove). 2014;67(7):1428–46. pmid:24215647
45. Wetzel N, Schröger E, Widmann A. The dissociation between the P3a event-related potential and behavioral distraction. Psychophysiology. 2013;50(9):920–30. pmid:23763292
46. Oron Y, Levy O, Avivi-Reich M, Goldfarb A, Handzel O, Shakuf V, et al. Tinnitus affects the relative roles of semantics and prosody in the perception of emotions in spoken language. Int J Audiol. 2020;59(3):195–207. pmid:31663391
47. Dupuis K, Pichora-Fuller MK. Aging Affects Identification of Vocal Emotions in Semantically Neutral Sentences. J Speech Lang Hear Res. 2015;58(3):1061–76. pmid:25810032
48. Globerson E, Amir N, Golan O, Kishon-Rabin L, Lavidor M. Psychoacoustic abilities as predictors of vocal emotion recognition. Atten Percept Psychophys. 2013;75(8):1799–810. pmid:23893469
49. Morgan SD. Comparing Emotion Recognition and Word Recognition in Background Noise. J Speech Lang Hear Res. 2021;64(5):1758–72. pmid:33830784
50. Morgan SD, Garrard S, Hoskins T. Emotion and Word Recognition for Unprocessed and Vocoded Speech Stimuli. Ear Hear. 2022;43(2):398–407. pmid:34310412
51. Morgan SD, Ferguson SH, Crain AD, Jennings SG. Perceived Anger in Clear and Conversational Speech: Contributions of Age and Hearing Loss. Brain Sci. 2022;12(2):210. pmid:35203973
52. de Boer MJ, Jürgens T, Cornelissen FW, Başkent D. Degraded visual and auditory input individually impair audiovisual emotion recognition from speech-like stimuli, but no evidence for an exacerbated effect from combined degradation. Vision Res. 2021;180:51–62. pmid:33360918
53. Stilp CE, Shorey AE, King CJ. Nonspeech sounds are not all equally good at being nonspeech. J Acoust Soc Am. 2022;152(3):1842. pmid:36182316
54. Reddy RK, Ramachandra V, Kumar N, Singh NC. Categorization of environmental sounds. Biol Cybern. 2009;100(4):299–306. pmid:19259694
55. Max C, Widmann A, Kotz SA, Schröger E, Wetzel N. Distraction by emotional sounds: Disentangling arousal benefits and orienting costs. Emotion. 2015;15(4):428–37. pmid:26053245
56. Harrison NR, Davies SJ. Modulation of spatial attention to visual targets by emotional environmental sounds. Psychology & Neuroscience. 2013;6(3):247–51.
57. Wang Y, Tang Z, Zhang X, Yang L. Auditory and cross-modal attentional bias toward positive natural sounds: Behavioral and ERP evidence. Front Hum Neurosci. 2022;16:949655. pmid:35967006
58. Gygi B, Kidd GR, Watson CS. Similarity and categorization of environmental sounds. Percept Psychophys. 2007;69(6):839–55. pmid:18018965
59. Ilie G, Thompson WF. A Comparison of Acoustic Cues in Music and Speech for Three Dimensions of Affect. Music Perception. 2006;23(4):319–30.
60. Goudbeek M, Scherer K. Beyond arousal: valence and potency/control cues in the vocal expression of emotion. J Acoust Soc Am. 2010;128(3):1322–36. pmid:20815467
61. Laukka P, Juslin P, Bresin R. A dimensional approach to vocal expression of emotion. Cognition & Emotion. 2005;19(5):633–53.
62. Ma W, Thompson WF. Human emotions track changes in the acoustic environment. Proc Natl Acad Sci U S A. 2015;112(47):14563–8. pmid:26553987
63. Weninger F, Eyben F, Schuller BW, Mortillaro M, Scherer KR. On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common. Front Psychol. 2013;4:292. pmid:23750144
64. Pell MD, Kotz SA. On the time course of vocal emotion recognition. PLoS One. 2011;6(11):e27256. pmid:22087275
65. Oliva M, Anikin A. Pupil dilation reflects the time course of emotion recognition in human vocalizations. Sci Rep. 2018;8(1):4871. pmid:29559673
66. Escera C, Alho K, Winkler I, Näätänen R. Neural mechanisms of involuntary attention to acoustic novelty and change. J Cogn Neurosci. 1998;10(5):590–604. pmid:9802992
67. Buono GH, Crukley J, Hornsby BWY, Picou EM. Loss of high- or low-frequency audibility can partially explain effects of hearing loss on emotional responses to non-speech sounds. Hear Res. 2021;401:108153. pmid:33360158
68. European Broadcasting Union. EBU Recommendation R 128: Loudness normalisation and permitted maximum level of audio signals. 2011.
69. Anwyl-Irvine AL, Massonnié J, Flitton A, Kirkham N, Evershed JK. Gorilla in our midst: An online behavioral experiment builder. Behav Res Methods. 2020;52(1):388–407. pmid:31016684
70. Seow TXF, Hauser TU. Reliability of web-based affective auditory stimulus presentation. Behav Res Methods. 2022;54(1):378–92. pmid:34240338
71. Ben-David BM, Mentzel M, Icht M, Gilad M, Dor YI, Ben-David S, et al. Challenges and opportunities for telehealth assessment during COVID-19: iT-RES, adapting a remote version of the test for rating emotions in speech. Int J Audiol. 2021;60(5):319–21. pmid:33063553
72. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81. pmid:18929686
73. Woods KJP, Siegel MH, Traer J, McDermott JH. Headphone screening to facilitate web-based auditory experiments. Atten Percept Psychophys. 2017;79(7):2064–72. pmid:28695541
74. R Core Team. R: A language and environment for statistical computing. 2022.
75. Bates D, Mächler M, Bolker B, Walker S. Fitting Linear Mixed-Effects Models Using lme4. J Stat Softw. 2015;67(1):1–48.
76. Lüdecke D. sjstats: Statistical functions for regression models. R package. 2019.
77. Lenth R. emmeans: Estimated Marginal Means, aka Least-Squares Means. R package. 2019.
78. Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1995;57(1):289–300.
79. Paglialonga A, Polo EM, Zanet M, Rocco G, van Waterschoot T, Barbieri R. An Automated Speech-in-Noise Test for Remote Testing: Development and Preliminary Evaluation. Am J Audiol. 2020;29(3S):564–76. pmid:32946249
80. Livingstone SR, Russo FA. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One. 2018;13(5):e0196391. pmid:29768426
81. Kirchberger M, Russo FA. Dynamic Range Across Music Genres and the Perception of Dynamic Compression in Hearing-Impaired Listeners. Trends Hear. 2016;20:2331216516630549. pmid:26868955
82. Moore BC, Glasberg BR, Baer T. A model for the prediction of thresholds, loudness, and partial loudness. Journal of the Audio Engineering Society. 1997;45(4):224–40.
83. Moore BCJ, Glasberg BR. A revised model of loudness perception applied to cochlear hearing loss. Hear Res. 2004;188(1–2):70–88. pmid:14759572
84. Moore BCJ, Glasberg BR, Varathanathan A, Schlittenlacher J. A Loudness Model for Time-Varying Sounds Incorporating Binaural Inhibition. Trends Hear. 2016;20:2331216516682698. pmid:28215113
© 2025 Picou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
In the study of auditory emotion perception, it is important to calibrate test sounds so their presentation level during testing is known. It is also often desirable to standardize amplitude across sounds so that each sound is presented at approximately the same level. However, the existing literature on auditory emotion perception includes a mixture of amplitude-standardization techniques. The purpose of this study was to compare the effects of two amplitude-scaling approaches on emotional responses to non-speech sounds, specifically standardization based on peak level or root-mean-square (rms) level. Nineteen young adults provided ratings of valence and arousal via an online testing program. Stimuli were non-speech sounds scaled in two ways, based on each stimulus's peak level or rms level. Ratings were analyzed using linear mixed-effects modeling to compare scaling methods; correlations between ratings and level within each scaling method were also explored. Analysis revealed that peak-scaled sounds were rated as less pleasant and more exciting than rms-scaled sounds, although the effects were small in magnitude (~0.2 points on a 1–9 scale). Within rms-scaled sounds, peak level was not related to ratings of valence or arousal. However, within peak-scaled sounds, rms level was related to ratings of both valence and arousal. Combined, these data suggest that the choice of amplitude standardization has a small effect on ratings overall, but investigators might be motivated to choose one approach over the other depending on the research question: rms scaling reduces overall level as a cue for emotional responses, while peak scaling maintains some natural, level-related variability in responses. Finally, these results are specific to this stimulus set; the effects of amplitude scaling would be expected to be negligible for a stimulus set in which the sounds have homogeneous temporal dynamics.
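To make the two amplitude-scaling approaches concrete, the sketch below, written in R (the language used for the study's statistical analysis), standardizes toy waveforms by peak level and by rms level. The target values, helper functions, and example waveforms are illustrative assumptions and are not taken from the study's materials.

```r
# Minimal sketch of peak- versus rms-based amplitude scaling.
# Waveforms are plain numeric vectors in [-1, 1]; target levels are illustrative.

peak_scale <- function(x, target_peak = 0.5) {
  # Scale so the largest absolute sample equals the target peak amplitude
  x * (target_peak / max(abs(x)))
}

rms_scale <- function(x, target_rms = 0.1) {
  # Scale so the root-mean-square amplitude equals the target rms
  x * (target_rms / sqrt(mean(x^2)))
}

# Two toy "sounds": an impulsive one (brief high peak, little sustained energy)
# and a steady one (constant amplitude).
impulsive <- c(rep(0.01, 999), 1.0)
steady    <- rep(0.30, 1000)

# After peak scaling, the sounds share a peak level but differ in rms level;
# after rms scaling, they share an rms level but differ in peak level.
round(c(rms_of_peak_scaled_impulsive = sqrt(mean(peak_scale(impulsive)^2)),
        rms_of_peak_scaled_steady    = sqrt(mean(peak_scale(steady)^2)),
        peak_of_rms_scaled_impulsive = max(abs(rms_scale(impulsive))),
        peak_of_rms_scaled_steady    = max(abs(rms_scale(steady)))), 3)
```

Because the residual level differences under each approach depend on the temporal dynamics of the individual sounds, a stimulus set with similar crest factors would show little difference between the two methods, consistent with the closing caveat of the abstract.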