With the increase in the use of face masks as a health precaution in the post-pandemic era, the effects of such masks on language learners’ speech perception, which depends heavily on visual cues, remain uncertain. This study explores the impact of face masks on the listening comprehension and word recognition of 254 learners of English as a foreign language at intermediate and lower-intermediate levels at a university in Saudi Arabia. Participants listened to a passage read by a native English speaker in one of three randomly assigned conditions: audio-visual-face (AV-F), audio-visual-mask (AV-M), or audio-only (A-O). A series of multiple-choice questions measured their comprehension of the passage and their recognition of words. Findings revealed significant differences in accuracy across conditions: accuracy was highest in AV-F and significantly lower in both AV-M and A-O, which did not differ significantly from each other. Additionally, higher-proficiency learners outperformed lower-proficiency learners in all three conditions, with the largest gap in AV-F, possibly reflecting greater experience in using facial cues. This study supports other findings on the negative impact of face masks on second language learners’ listening comprehension and word recognition, emphasizing the importance of seeing facial and lip movements for language learners. Implications and recommendations are provided for educators and researchers working with language learners to support their listening comprehension and perception skills.
Introduction
Speakers of a second language (L2) or foreign language (FL) often rely on visual cues to enhance their comprehension of spoken language and mitigate potential misunderstandings (Barrós-Loscertales et al., 2013; Lesnov, 2022). Such cues, including facial expressions and lip movements, significantly improve learners' ability to recognize sounds, words, and sentences (Hardison, 2010; Navarra & Soto-Faraco, 2007). Research has demonstrated that the absence of these visual cues due to mask-wearing negatively affects listening comprehension accuracy (Lee & Hart, 2022) and overall speech perception among L2 learners (Kitikanan & Leung, 2024).
The COVID-19 pandemic introduced a distinct set of challenges for L2 learners by substantially limiting access to visual cues that are critical for effective communication. Two challenges stand out. First, learners increasingly receive auditory input through digital screens (e.g., mobile phones, tablets, and laptops), sometimes accompanied by a video of the speaker and sometimes not. Second, with the return to in-person classes in the post-pandemic era, the adoption of face masks by instructors and students introduced a new barrier to effective communication. Although mask-wearing was necessary to prevent the spread of the disease, it also removed vital visual cues such as lip and facial movements, potentially impairing L2 learners’ speech perception and listening comprehension (Lee & Hart, 2022).
Despite the prevalence of these obstacles in the post-pandemic world, research exploring the effects of wearing face masks on the listening comprehension and word recognition of L2 and FL learners remains limited. This study aims to address this gap in the literature by examining how wearing facial masks as a safety measure during the COVID-19 pandemic affected the listening skills of Arabic-speaking learners of English as a foreign language (EFL). Moreover, the study examines whether language proficiency moderates the extent to which learners rely on visual signals. This study hypothesized that learners with higher proficiency would make more use of visual cues, as they are likely to have had greater exposure to and experience with multimodal input in the target language (Algana & Hardison, 2024; Sueyoshi & Hardison, 2005).
To address these aims, this study analyzed data from a sample of university students in Saudi Arabia. By shedding light on language learning during and after the pandemic, this research seeks to inform instructional practices and support the development of effective language learning strategies amid ongoing global challenges. The results offer new insights into the impact of masks on L2 listening comprehension and word perception and contribute to a deeper understanding of the challenges faced by EFL learners during and after the COVID-19 pandemic. In addition, the findings may generalize to similar learner groups following homogeneous FL curricula and can thus serve as a resource for language instructors and specialists seeking to understand the effects of face masks on L2 learners’ achievement and to develop targeted interventions.
Literature review
Visual cues in speech processing
Effective communication involves more than auditory input; it also incorporates visual cues that are crucial for language perception. Understanding speech draws on multiple sources of input, including phonetic segments, prosodic elements such as stress and intonation, and visual signals such as lip movements and hand gestures (Hirata & Kelly, 2010; McGurk & MacDonald, 1976; Stam & Tellier, 2022). Research highlighting the limitations of relying solely on auditory input has made the complexity of speech perception evident, emphasizing the multimodal nature of language communication (Erdener, 2020; Hirata & Kelly, 2010; Stam & Tellier, 2022). Nonverbal cues such as facial expressions, lip movements, and hand gestures can significantly enhance learners' grasp of the speaker's emotions, intentions, and attitudes, facilitating a deeper understanding of the message and increasing the likelihood of an appropriate response.
Moreover, visual information serves as a critical complement to auditory input, particularly in challenging listening environments such as noisy or unfamiliar settings, which are frequently encountered in educational and public spaces (see Blanco-Elorrieta et al., 2020; Summerfield, 1979). Beyond lexical and grammatical knowledge, visual cues, including hand and facial gestures, are instrumental in aiding L2 comprehension (Hardison & Pennington, 2021). These cues not only sharpen L2 learners’ attention and memory but also heighten their awareness of prosodic features and improve their ability to identify segmental sounds (Hardison, 2010).
Speechreading, or lip reading, further aids L2 learners in recognizing words and phrases by allowing them to interpret face, lip, and tongue movements visually (Navarra & Soto-Faraco, 2007). These movements are particularly helpful for deciphering accented speech, which is common in L2 learning contexts (Reisberg et al., 1987). In addition, audio-visual input has been shown to significantly improve language learners' listening and speaking proficiency: learners who received visual feedback during speech training outperformed those who did not (Mehta, 2020).
Neuroimaging research further indicates that L2 learners rely more heavily than native speakers on visual cues when processing speech. For example, Barrós-Loscertales et al. (2013) used functional magnetic resonance imaging (fMRI) to show increased activation of the occipital lobe, where visual information is processed, in non-native speakers presented with congruent audio-visual stimuli. In other words, L2 speakers attended closely to visual information while listening, highlighting the significant role of visual cues in L2 speech processing. Multimodal input may be especially beneficial for beginner and intermediate learners, whose vocabulary or grammar knowledge may be insufficient to deduce meaning from context alone.
Face masks
Against this background, the COVID-19 pandemic introduced a new variable into language learning: the widespread use of face masks, which can muffle speech sounds and obscure visual cues (Hardison & Pennington, 2021). While the effects of face masks have been investigated extensively in fields such as communication and hearing impairment (Mantzikos & Lappa, 2020; Saunders et al., 2021), limited attention has been paid to their impact on language learners, particularly in EFL settings. Studies comparing mask types in classroom settings indicate that cloth face masks may impede speech and language comprehension more than surgical masks (Bottalico et al., 2020).
Emerging studies have also underlined the challenges that mask-wearing poses for oral communication. In particular, listeners communicating in their native language (L1) have difficulty understanding speech and identifying words when conversing with people wearing masks, especially in noisy environments (Giovanelli et al., 2021; Rahne et al., 2021; Yi et al., 2021). This difficulty can impair communication and engagement, leading to increased anxiety and listening fatigue (Saunders et al., 2021). These studies focus predominantly on native speakers and L1 listeners, with implications for individuals with hearing impairment. However, not all research has found negative effects of face masks on speech perception. For example, Choi et al. (2025) investigated native Cantonese speakers in L1 listening conditions and found no significant decrease in speech intelligibility or in the accuracy of vowel and consonant identification when surgical face masks were worn. This suggests that the impact of face masks may vary across speaker-listener populations and contexts, highlighting the complexity of the issue and the need for context-specific investigations.
The effect of face masks is not confined to native language contexts; it also extends to L2 learners, who often rely more heavily on visual cues. Lee and Hart (2022) investigated how face masks affect the listening comprehension of Japanese L1 speakers learning EFL. Their findings revealed that face mask usage had a significant negative impact on listening comprehension, underscoring the need for further investigation into the implications of face masks for L2 learning and teaching strategies.
Furthermore, Kitikanan and Leung (2024) conducted a study on Thai L2 English learners, examining the effects of different types of masks, presentation modes, and speaking styles on speech intelligibility. Their findings indicated that when speakers employed a “clear speech” style—characterized by speaking more slowly, loudly, and with exaggerated articulation—speech intelligibility was significantly enhanced, even while wearing a disposable face mask, compared to when speakers used a conversational speech style with a face mask. This suggests that some communication strategies, such as clear speech, can alleviate some of the adverse effects associated with mask-wearing. Moreover, research by Hansen Edwards and Zampini (2025) expands this understanding by examining how face masks influence speech intelligibility and listener ratings of different varieties of Asian English. Their study found that wearing face masks can significantly impact not only the intelligibility of speech but also the listeners’ perceptions of accentedness and comprehensibility in multilingual settings. The study highlights that some speakers were perceived as more accented and less comprehensible when wearing face masks, especially when the listener was unfamiliar with the speaker’s variety of English.
Wearing face masks can also have a negative impact on word recognition and memory for L2 speakers, especially in noisy environments. Smiljanic et al. (2021) found that, in quiet conditions, masks did not disrupt speech perception for either native or non-native speakers of a language. However, with even a small amount of background noise, intelligibility decreased for non-native speakers but not for native speakers. The study also revealed that masks had a more adverse effect on non-native speech than on native speech, an important consideration in second-language learning environments where many individuals speak the language as an additional language. Even among native listeners, both children and adults showed reduced sentence recognition when listening to a speaker wearing a face mask, as the masked speech was characterized as slower and spectrally flat (Calandruccio et al., 2020).
As the world continues to navigate the challenges of post-pandemic education, it is crucial to understand how face masks affect language learning and how educators can support learners in adapting to this new reality. This is particularly important given that face masks remain common in several Asian countries, including China, India, and Japan (Kitikanan & Leung, 2024). More research is needed to understand how visual cues affect language comprehension and word recognition, especially in L2 and EFL learning environments, and what face mask usage implies for the effectiveness of L2 learning and teaching strategies.
Saudi Arabian higher education context
Understanding the Saudi higher education context and the dynamics of listening skills among Saudi university language learners is central to this study. English is taught as a foreign language in Saudi Arabia, and many subjects in higher education use English as the medium of instruction (EMI). Although English is prevalent in many areas of life, especially business and academia, it remains a foreign language in which many students struggle to achieve proficiency due to a lack of practice (Alshammari, 2022).
University EFL learners in Saudi Arabia have been shown to face challenges with listening comprehension, as indicated in a qualitative study by Otair and Abd Aziz (2017). The learners in that study experienced high levels of anxiety when listening to English, which they attributed to difficulty deciphering what was being said and to losing concentration. Their low English proficiency made listening even more difficult and heightened their anxiety, and noisy classrooms further hindered their listening. In another study, Saudi university students were found to rely on problem-solving strategies during listening activities, such as actively addressing barriers as they arose (Al-Khresheh & Alruwaili, 2024). However, neither study investigated the role of visual cues in improving or hindering listening comprehension.
In addition, speakers of Arabic have been found to struggle with perceiving certain sounds that do not exist in their native language. Evans and Alshangiti’s (2018) study of Arabic-speaking learners of English in Saudi Arabia revealed that low-proficiency learners had difficulty identifying sounds, particularly in background noise, with vowels and specific consonants proving most problematic. For example, consonant distinctions such as /p/–/b/ (e.g., tap–tab) are challenging because Arabic lacks the voiceless bilabial stop /p/ (Flege & Port, 1981), while vowel contrasts such as /ɛ/–/æ/ (e.g., bed–bad) reflect Arabic’s reduced vowel inventory (Alshangiti & Evans, 2024). These disparities may amplify the challenges of masked speech for Arabic-speaking EFL learners, as visual cues can help distinguish vowel contrasts that are less prominent in their L1. It is therefore important to investigate whether face masks exacerbate these issues, particularly for phoneme pairs that rely on visual cues for disambiguation.
Moreover, the Saudi educational context, the focal point of this study, presents a compelling case for examining facial covering practices. Given the cultural tradition of face veiling among women in Saudi society (Long, 2005), this context offers a unique perspective on the interaction between traditional customs and audio-visual cues in language acquisition. It remains unclear to what extent the prevalent use of face veils may have accustomed learners to comprehending speakers whose faces are obscured. Notably, interactions with veiled speakers predominantly occur in the L1 and are not typically encountered in EFL or other language learning environments. It is therefore necessary to investigate whether the post-pandemic use of face masks affects EFL learners’ word recognition and listening comprehension skills in Saudi Arabia.
Current research on the effects of face masks on L2 learners is limited, with the few existing studies focusing primarily on learners in Chinese, Japanese, or Thai contexts. While these investigations provide foundational insights, the broader implications of L1 phonological and communicative differences for mask-mediated speech perception remain underexplored. For example, Arabic differs markedly from the East Asian languages studied to date: Japanese is a pitch-accent language and Chinese and Thai are tonal, whereas Arabic, like English, is a stress-timed, non-tonal language, and it is further characterized by emphatic consonants and a reduced vowel system (Alghamdi, 1998; Flege & Port, 1981; Kager, 2012; Watson, 2002). These phonological differences mean that Arabic-speaking learners may rely on different auditory and visual cues when processing spoken English. For instance, Arabic speakers may depend more on vowel quality and emphatic consonants, many of which involve visible articulatory movements. As a result, masks that obscure facial and lip movements may interfere more directly with the perception of contrasts that Arabic speakers already find difficult because they are absent or less distinct in Arabic, such as /p/–/b/ or /ɛ/–/æ/. Such cross-linguistic disparities suggest that the challenges posed by obscured visual cues may vary significantly across L1–L2 pairings, underscoring the need to study these effects in a broader range of linguistic contexts. This has important implications for both theory and pedagogy, as it highlights the specific perceptual needs of learners from different L1 backgrounds.
By examining Saudi EFL learners as a case, this study addresses both global and local gaps, extending the emerging discourse on face masks and L2 processing while offering novel insights into how Arabic-speaking learners of English cope with the absence of visual cues. Such insights hold practical significance across linguistic contexts, informing language pedagogy in environments where masks are common. Specifically, this study investigates Saudi EFL learners’ auditory-visual processing through the following research questions:
How does wearing facial masks that cover visual input (e.g., lip movements and facial gestures) affect the listening comprehension of EFL learners in Saudi Arabia?
How does wearing facial masks that cover visual input (e.g., lip movements and facial gestures) affect word discrimination in English for EFL learners in Saudi Arabia?
Does language proficiency play a role in the need for visual signals for EFL learners in Saudi Arabia?
Methods
Participants
Participants were 254 first-year EFL learners enrolled at a public university in Saudi Arabia (mean age = 19.04, SD = 1.59). As first-year students, they were required to take 18 hours of English courses per week. Based on institutional placement, their proficiency levels were categorized as lower-intermediate (CEFR A2 to B1) or intermediate (CEFR B1 to B2). Participants were briefed on the study’s procedures, and informed consent was obtained from all participants, confirming their voluntary participation; a statement was also included asking participants to report any hearing impairments.
Study design
The study employed a quasi-experimental design where participants were randomly assigned to one of three listening conditions:
Audio-visual-face (AV-F): Speaker’s face was in full view.
Audio-visual-mask (AV-M): Speaker wore a surgical mask, covering the lower part of the face, including the mouth and nose.
Audio-only (A-O): Speaker could be heard with no visual input (blank screen).
Figure 1 below is a visual representation of the three listening conditions.
[See PDF for image]
Fig. 1
Listening conditions. Note: AV-F shows the speaker’s entire face, AV-M shows the speaker wearing a surgical face mask that covers the lower part of the face, and A-O shows a blank dark screen.
During the task, participants were instructed to look at the screen while listening to the speaker and then answer the questions. To avoid biasing the results, participants were informed of the purpose of the masked-speaker condition only after completing the task. The study received approval from the Scientific Research and Ethics Committee at the English Language Institute at the University of Jeddah before participants were recruited.
Stimuli
The study incorporated two tasks: a listening comprehension task and a word discrimination task. The stimuli for both tasks were developed in collaboration with a female native English speaker with a General American accent. The stimuli were recorded twice: once with the speaker wearing a surgical mask (AV-M) and once without the mask (AV-F). The audio-only condition (A-O) was created from the AV-F recording by removing the visual input and retaining only the audio. All recordings were edited with the noise reduction tool in Audacity to remove background noise, such as low-frequency hum, in order to enhance speech clarity and intelligibility across experimental conditions.
Tasks
Listening comprehension task. The listening comprehension content was a passage from the students’ curriculum that they had not previously read or heard, at CEFR level B1 so that it did not exceed the students’ proficiency. All three recordings were exactly 3 min 40 s long. The stimuli were piloted with English language instructors and students to check sound quality and clarity. During the task, participants watched or listened to the recording (according to their assigned condition) and answered 10 multiple-choice questions that tested main ideas, details, and inferences and required no prior knowledge. Each correct answer earned 1 point, for a maximum score of 10 points.
Word discrimination task. The stimuli for the word discrimination task were created from 40 minimal pairs, that is, pairs of similar-sounding words that differ in only one sound, either a consonant or a vowel (e.g., bed–bad or tap–tab). The selected minimal pairs consisted of simple, high-frequency words with which participants were likely familiar. Half of the pairs contained contrasts documented as challenging for Arabic-speaking learners of English. For example, consonant pairs such as /p/–/b/ (tap–tab) target the absence of the voiceless bilabial stop /p/ in Arabic (Flege & Port, 1981), whereas vowel pairs such as /ɛ/–/æ/ (bed–bad) reflect Arabic’s reduced vowel inventory (Alshangiti & Evans, 2024). These challenging pairs were included to investigate the extent to which visual cues may enhance perceptual accuracy. During the task, participants heard (and, depending on the condition, saw) the speaker say an isolated word (e.g., “bed”) and selected the word they perceived from two on-screen options (e.g., “bed” vs. “bad”). Each recorded word lasted 2 to 3 s. Each correct response was awarded 1 point, for a maximum score of 40 points.
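Scoring in both tasks reduces to counting matches against an answer key. The sketch below illustrates this under stated assumptions: the responses and key are hypothetical lists of word labels, and the helper names `score_responses` and `chance_score` are illustrative rather than part of the study's materials. Because each word-discrimination item offers two options, random guessing would be expected to yield about half the items correct, a useful reference point for interpreting scores out of 40.

```python
def score_responses(responses, answer_key):
    """Award 1 point per response that matches the answer key."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must be the same length")
    # True counts as 1 and False as 0 when summed.
    return sum(given == correct for given, correct in zip(responses, answer_key))


def chance_score(n_items, n_options):
    """Expected score from random guessing when each item has n_options choices."""
    return n_items / n_options


# Hypothetical responses to three word-discrimination items (not study data).
key = ["bed", "tap", "bad"]
answers = ["bed", "tab", "bad"]
print(score_responses(answers, key))  # 2
print(chance_score(40, 2))            # 20.0
```

The same `score_responses` helper would apply to the 10-item comprehension task; a chance level for that task would depend on the number of options per question, which is not reported here.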
Procedure
The study was conducted in April 2022, when mask-wearing was still mandatory in all Saudi Arabian public institutions. Each participant was exposed to one of the three conditions (AV-F, AV-M, or A-O) and completed the ten multiple-choice comprehension questions, followed by the 40 minimal-pair items. Each condition was played once using an Epson projector, a 120-inch screen, and JBL speakers. Participants were instructed to look at the screen while listening, regardless of their assigned condition. The experimental procedure lasted between five and six minutes, excluding the instructions given beforehand. This short duration was chosen deliberately to minimize fatigue and maintain participants’ focus, thereby supporting the collection of reliable data.
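As a rough consistency check on the reported five-to-six-minute duration, the stimulus lengths given in the task descriptions can be summed. This sketch assumes back-to-back playback with no pauses between items, which the text does not state, and it counts playback time only, not response time; the constant names are illustrative.

```python
# Stimulus playback time only; assumes no gaps between items (an assumption).
PASSAGE_SECONDS = 3 * 60 + 40              # listening passage: 3 min 40 s
N_WORDS = 40                               # minimal-pair items
WORD_SECONDS_MIN, WORD_SECONDS_MAX = 2, 3  # length of each recorded word


def mmss(seconds):
    """Format a whole number of seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


shortest = PASSAGE_SECONDS + N_WORDS * WORD_SECONDS_MIN
longest = PASSAGE_SECONDS + N_WORDS * WORD_SECONDS_MAX
print(mmss(shortest), mmss(longest))  # 5:00 5:40
```

Playback alone therefore accounts for roughly 5:00 to 5:40, which falls within the reported range.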
Analysis
Participants’ accuracy scores were the number of correct responses to the comprehension questions (range 0 to 10) and to the discrimination items (range 0 to 40). To answer the first research question, a one-way analysis of variance (ANOVA) was conducted to determine whether listening comprehension accuracy differed significantly across listening conditions, followed by post-hoc tests where necessary. Listening mode served as the independent variable, with three levels (AV-F, AV-M, and A-O), and the comprehension test score was the dependent variable. For the second research question, a one-way ANOVA was similarly conducted on word discrimination scores. Levene's test of homogeneity of variances was non-significant, F(2, 251) = 0.19, p = .830, indicating that the assumption of equal variances was met.
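The one-way ANOVA described above partitions total variability into between-condition and within-condition sums of squares. The following is a minimal sketch on hypothetical toy data, not the study's scores, using only the Python standard library; `one_way_anova` is an illustrative helper, and it also returns eta squared (SS_between / SS_total), the effect size reported in the Results.

```python
from statistics import fmean


def one_way_anova(groups):
    """One-way ANOVA on raw scores: returns F, (df_between, df_within), eta squared."""
    means = [fmean(g) for g in groups]
    scores = [x for g in groups for x in g]
    grand_mean = fmean(scores)
    # Between groups: size-weighted squared deviations of group means from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: squared deviations of each score from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, (df_between, df_within), eta_squared


# Hypothetical scores for three small groups (not the study's data).
f_value, dfs, eta_sq = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(f_value, 2), dfs, round(eta_sq, 2))  # 27.0 (2, 6) 0.9
```

The p-value would then come from the F distribution with (df_between, df_within) degrees of freedom, for example via `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available.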
Results
Listening comprehension
Mean scores and standard deviations of the EFL learners' listening comprehension test results were calculated for the three experimental groups (AV-F, n = 92; AV-M, n = 80; and A-O, n = 82) and are presented in Fig. 2.
RQ1. How does wearing facial masks that cover visual input (e.g., lip movements and facial gestures) affect the listening comprehension of EFL learners in Saudi Arabia?
[See PDF for image]
Fig. 2
Mean scores of EFL students' responses in the listening comprehension test
In the listening comprehension test, the ANOVA yielded statistically significant results [F(2, 251) = 19.20, p < 0.001, η² = 0.13], a medium-to-large effect by conventional benchmarks, with listening mode explaining approximately 13% of the variance in comprehension scores. In other words, Saudi EFL learners' listening comprehension scores differed significantly depending on their listening mode (Table 1).
Table 1. Differences in mean listening comprehension scores between conditions
| Test | Source of variance | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| Listening comprehension | Between groups | 71.5 | 2 | 35.76 | 19.20 | < 0.001* |
| | Within groups | 467.7 | 251 | 1.86 | | |
| | Total | 539.2 | 253 | | | |
*Indicates significant at p < 0.05
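The entries in an ANOVA summary table are linked by simple identities: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, the total df equals N − 1, and η² is the between-groups sum of squares divided by the total. As a consistency check, the sketch below recomputes these quantities from the sums of squares reported for listening comprehension and word discrimination, assuming N = 254 participants in k = 3 conditions; `anova_summary` is a hypothetical helper.

```python
def anova_summary(ss_between, ss_within, k, n_total):
    """Derive df, mean squares, F, and eta squared from the two sums of squares."""
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Sums of squares from the listening comprehension and word discrimination ANOVAs.
listening = anova_summary(71.5, 467.7, k=3, n_total=254)
word = anova_summary(446.2, 2368.3, k=3, n_total=254)
for name, row in (("Listening comprehension", listening), ("Word discrimination", word)):
    print(name, row["df_total"], round(row["F"], 2), round(row["eta_squared"], 2))
```

With these inputs, the listening comprehension row gives a total df of 253, F ≈ 19.19, and η² ≈ 0.13; the word discrimination row gives a between-groups mean square of 223.1 (446.2 / 2), F ≈ 23.64, and η² ≈ 0.16, matching the reported F and η² values to rounding.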
To compare the impact of wearing a face mask with that of not seeing the speaker’s face at all, a post-hoc Scheffé test was used to compare mean listening comprehension scores across the three listening modes (AV-F, AV-M, and A-O) while controlling for Type I error inflation across all pairwise comparisons. AV-F scores were significantly higher than both AV-M and A-O scores, making AV-F the condition with the highest mean accuracy. However, there was no significant difference between learners' scores in the AV-M and A-O conditions (see Table 2), meaning that performance when the speaker wore a mask was comparable to performance when the speaker's face could not be seen at all.
Table 2. Scheffé test comparing mean scores in listening conditions
| Test | Condition (I) | Condition (J) | Mean difference (I − J) | Sig. |
|---|---|---|---|---|
| Listening comprehension test | AV-F | AV-M | 1.09 | < 0.001* |
| | AV-F | A-O | 1.12 | < 0.001* |
| | AV-M | A-O | 0.03 | 0.990 |
*Indicates significant at p < 0.05
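For a single pairwise comparison, the Scheffé procedure forms the statistic F_S = (ȳ_i − ȳ_j)² / [MS_within (1/n_i + 1/n_j)(k − 1)] and refers it to the F(k − 1, N − k) distribution. The sketch below applies this to summary statistics: MS_within comes from the ANOVA sums of squares, group sizes from the Results (AV-F n = 92, AV-M n = 80, A-O n = 82), and mean differences from the post-hoc tables. It assumes k = 3, for which the F(2, d) survival function has the exact closed form (1 + 2x/d)^(−d/2), so no statistics library is needed; `scheffe_pair` is an illustrative helper, not the software used in the study.

```python
def scheffe_pair(mean_diff, n_i, n_j, ms_within, df_within, k=3):
    """Scheffé test for one pairwise contrast among k = 3 groups: returns (F_S, p)."""
    if k != 3:
        raise ValueError("the closed-form p-value below only holds for k = 3")
    # Squared difference over its estimated variance, divided by k - 1.
    f_s = mean_diff ** 2 / (ms_within * (1 / n_i + 1 / n_j)) / (k - 1)
    # Survival function of F(2, d): P(F > x) = (1 + 2x / d) ** (-d / 2).
    p_value = (1 + 2 * f_s / df_within) ** (-df_within / 2)
    return f_s, p_value


MS_W_LISTENING = 467.7 / 251  # within-groups mean square, listening comprehension
MS_W_WORD = 2368.3 / 251      # within-groups mean square, word discrimination

# AV-M vs A-O (n = 80 and 82) in both tasks.
_, p_listening = scheffe_pair(0.03, 80, 82, MS_W_LISTENING, 251)
_, p_word = scheffe_pair(-1.09, 80, 82, MS_W_WORD, 251)
# AV-F vs AV-M (n = 92 and 80) in listening comprehension.
_, p_avf_avm = scheffe_pair(1.09, 92, 80, MS_W_LISTENING, 251)
print(round(p_listening, 3), round(p_word, 3), p_avf_avm < 0.001)
```

Run on the rounded table values, this gives p ≈ 0.99 for AV-M vs A-O in listening comprehension and p ≈ 0.08 in word discrimination, close to the reported 0.990 and 0.081, and p < 0.001 for AV-F vs AV-M in listening comprehension.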
Word discrimination
The mean scores and standard deviations of the EFL learners' word discrimination test results, calculated for the three experimental groups, are presented in Fig. 3 below.
[See PDF for image]
Fig. 3
Mean scores of EFL students’ word discrimination test
RQ2. How does wearing facial masks that cover visual input (e.g., lip movements and facial gestures) affect word discrimination in English for EFL learners in Saudi Arabia?
The word discrimination ANOVA yielded statistically significant results [F(2, 251) = 23.64, p < 0.001, η² = 0.16], indicating a large effect size: listening mode explains approximately 16% of the variance in word discrimination scores. In other words, Saudi EFL learners' word discrimination scores differed significantly depending on their listening mode (Table 3).
Table 3. Differences in mean word discrimination scores between conditions
| Test | Source of variance | Sum of squares | df | Mean square | F | Sig. |
|---|---|---|---|---|---|---|
| Word discrimination | Between groups | 446.2 | 2 | 223.1 | 23.64 | < 0.001* |
| | Within groups | 2368.3 | 251 | 9.43 | | |
| | Total | 2814.5 | 253 | | | |
*Indicates significant at p < 0.05
A follow-up Scheffé test was used to compare mean word discrimination scores across the listening modes while controlling for Type I error inflation across all pairwise comparisons. AV-F scores were significantly higher than both AV-M and A-O scores, making AV-F the condition with the highest mean accuracy. However, there was no significant difference between learners' scores in the AV-M and A-O conditions (see Table 4). Thus, masks had a negative impact on word discrimination, and performance when the speaker wore a mask was comparable to performance when the speaker's face could not be seen.
Table 4. Scheffé test comparing word discrimination scores in listening conditions
| Test | Condition (I) | Condition (J) | Mean difference (I − J) | Sig. |
|---|---|---|---|---|
| Word discrimination test | AV-F | AV-M | 3.16 | < 0.001* |
| | AV-F | A-O | 2.07 | < 0.001* |
| | AV-M | A-O | −1.09 | 0.081 |
*Indicates significant at p < 0.05
RQ3. Does language proficiency play a role in the need for visual signals for EFL learners in Saudi Arabia?
Further analyses evaluated the effects of proficiency level on listening comprehension and word recognition scores within each condition. For listening comprehension, independent-samples t-tests (α = 0.05) were conducted separately for each condition to determine whether the mean scores of the lower-intermediate and intermediate proficiency groups differed significantly. Learners in the higher proficiency group achieved significantly higher accuracy in all three conditions, and both proficiency groups scored highest in the AV-F condition (Table 5).
Table 5. Differences in the listening comprehension test scores
| Condition | Proficiency level | N | Mean | SD | t | Sig. |
|---|---|---|---|---|---|---|
| AV-F | Lower-intermediate | 57 | 6.16 | 0.92 | 7.90 | < 0.001* |
| | Intermediate | 35 | 7.91 | 1.19 | | |
| AV-M | Lower-intermediate | 48 | 5.40 | 1.48 | 2.80 | 0.006* |
| | Intermediate | 32 | 6.25 | 1.08 | | |
| A-O | Lower-intermediate | 57 | 5.49 | 1.15 | 2.21 | 0.030* |
| | Intermediate | 25 | 6.20 | 1.68 | | |
*Indicates significant at p < 0.05
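The t-values in Table 5 can be reproduced from the reported group sizes, means, and SDs using the pooled-variance (Student's) independent-samples formula, which suggests that equal variances were assumed; this is an inference from the numbers, not something the study states. The minimal sketch below recomputes t and its degrees of freedom (n₁ + n₂ − 2); `TABLE5` and `pooled_t` are illustrative names. Two-tailed p-values would additionally require the t distribution's survival function (for example, `scipy.stats.t.sf`), which is omitted here to keep the example dependency-free.

```python
import math

# Listening comprehension summaries from Table 5:
# condition -> ((n, mean, SD) pre-intermediate, (n, mean, SD) intermediate)
TABLE5 = {
    "AV-F": ((57, 6.16, 0.92), (35, 7.91, 1.19)),
    "AV-M": ((48, 5.40, 1.48), (32, 6.25, 1.08)),
    "A-O": ((57, 5.49, 1.15), (25, 6.20, 1.68)),
}


def pooled_t(group_a, group_b):
    """Student's independent-samples t (equal variances assumed) for b minus a, with df."""
    (n1, m1, s1), (n2, m2, s2) = group_a, group_b
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df


for condition, (pre, inter) in TABLE5.items():
    t, df = pooled_t(pre, inter)
    print(f"{condition}: t({df}) = {t:.2f}")
```

Run as is, this gives t ≈ 7.91, 2.79, and 2.22 for AV-F, AV-M, and A-O with 90, 78, and 80 degrees of freedom, matching the reported 7.90, 2.80, and 2.21 up to rounding of the published means and SDs.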
For word recognition, independent-samples t-tests (α = 0.05) likewise compared the mean scores of the pre-intermediate and intermediate proficiency groups. Again, learners in the higher proficiency group were significantly more accurate in all three conditions, and both groups scored highest in the AV-F condition (Table 6).
Table 6. Differences in word recognition test scores by proficiency level
| Group | Proficiency level | N | Mean | SD | t-value | Sig. |
|---|---|---|---|---|---|---|
| AV-F | Pre-intermediate | 57 | 30.05 | 2.78 | 7.41 | < 0.001* |
| | Intermediate | 35 | 33.97 | 1.82 | | |
| AV-M | Pre-intermediate | 48 | 27.75 | 2.75 | 2.86 | 0.005* |
| | Intermediate | 32 | 29.34 | 1.84 | | |
| A-O | Pre-intermediate | 57 | 28.93 | 3.22 | 2.19 | 0.031* |
| | Intermediate | 25 | 30.72 | 3.75 | | |
*Indicates significant at p < 0.05
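Table 6 reports whether the proficiency differences are significant but not how large they are. As a supplementary sketch, not an analysis reported in the study, Cohen's d with a pooled SD can be computed from the same summaries; because the pooled t-statistic equals d divided by √(1/n₁ + 1/n₂), the code also recovers the reported t-values as a check. `TABLE6` and `cohens_d` are illustrative names.

```python
import math

# Word recognition summaries from Table 6:
# condition -> ((n, mean, SD) pre-intermediate, (n, mean, SD) intermediate)
TABLE6 = {
    "AV-F": ((57, 30.05, 2.78), (35, 33.97, 1.82)),
    "AV-M": ((48, 27.75, 2.75), (32, 29.34, 1.84)),
    "A-O": ((57, 28.93, 3.22), (25, 30.72, 3.75)),
}


def cohens_d(group_a, group_b):
    """Cohen's d (pooled SD) and the matching Student's t for b minus a."""
    (n1, m1, s1), (n2, m2, s2) = group_a, group_b
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (m2 - m1) / pooled_sd
    # The pooled t-statistic is d rescaled by the two sample sizes.
    t = d / math.sqrt(1 / n1 + 1 / n2)
    return d, t


for condition, (pre, inter) in TABLE6.items():
    d, t = cohens_d(pre, inter)
    print(f"{condition}: d = {d:.2f}, t = {t:.2f}")
```

With the published values, d is roughly 1.59 for AV-F versus about 0.65 for AV-M and 0.53 for A-O, so by this measure the proficiency gap in word recognition is largest when the speaker's full face is visible.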
Discussion
This research aimed to assess the impact of wearing face masks on the listening comprehension and word recognition of Saudi EFL learners by comparing three listening conditions: audio-visual-face (AV-F), audio-visual-mask (AV-M), and audio-only (A-O). The mean scores in the AV-F condition, where EFL learners could see the speaker's entire face, were significantly higher than those in the conditions where the speaker wore a face mask (i.e., AV-M) or the face was not shown at all (i.e., A-O). Although some research shows that surgical masks have the least effect on acoustics compared to other mask types (Toscano & Toscano, 2021), learners' listening comprehension and word recognition accuracy were significantly lower when they could not see the speaker's full face. These differences suggest that Saudi EFL learners use visual cues from facial and mouth movements to comprehend oral speech. Previous research indicates that while surgical masks may minimally alter high-frequency acoustic signals, they do not significantly impact word or sentence intelligibility (Magee et al., 2020). The differences in test scores are therefore likely due to the visual cues that face masks conceal, the loss of which can hinder the perception of speech sounds (Bottalico et al., 2020; Corey et al., 2020) and may negatively affect listening comprehension and word recognition. In word discrimination, where certain sounds may pose challenges for Arabic speakers, seeing the speaker's unmasked face still helped learners discern these sounds, as AV-F accuracy scores were significantly higher than those in the other two conditions. These findings align with existing literature highlighting the importance of visual input from the speaker's face during oral communication.
The results also showed comparable listening comprehension and word recognition performance by Saudi EFL learners in the mask-wearing and audio-only conditions. While this aligns with studies emphasizing language learners' reliance on multimodal input (Dahl & Ludvigsen, 2014; Sueyoshi & Hardison, 2005), statistical non-significance does not imply perceptual equivalence between conditions. Prior studies have shown that masked speech often requires greater cognitive effort due to degraded auditory-visual cues, even when accuracy is unaffected (Rahne et al., 2021). Relatedly, Lee and Hart (2022) found that Japanese-speaking L2 learners performed slightly better in the audio-only condition than with a masked speaker. The authors attributed this audio-only advantage to effortful listening (Pichora-Fuller et al., 2016): listeners expend more effort and cognitive capacity decoding speech when an obstacle such as a face mask is present than when it is absent, as in audio-only listening.
In contrast, the comparable performance of Saudi learners in the AV-M and A-O conditions may indicate that they adapted to masks during the pandemic, which mitigated the effects of effortful listening over time. The similar accuracy in the two conditions may also reflect long-term exposure to face veiling in Saudi society (Long, 2005); although functionally distinct from pandemic-related mask-wearing, this practice could foster adaptive strategies for decoding speech with limited visual input. However, while the effect of effortful listening may have diminished as learners became more familiar with face coverings and came to perceive them as less of an obstacle, this study did not directly measure listening fatigue or cognitive effort, so this explanation remains tentative.
Some studies have reported minimal impacts of face-masked speech on L2 speech perception (Hansen Edwards & Zampini, 2025; Kitikanan & Leung, 2024). In contrast, this study found significantly lower comprehension and word discrimination accuracy in the masked (AV-M) and audio-only (A-O) conditions than in the AV-F condition. This divergence may stem from key methodological differences. For instance, Hansen Edwards and Zampini (2025) measured speech intelligibility and comprehensibility through listeners' ratings of utterances rather than the objective accuracy of comprehension and word discrimination. Their study also focused on biases against specific English varieties, shifting the emphasis away from the perceptual challenges posed by visual deprivation, a central focus of this study. Similarly, Kitikanan and Leung (2024) found no mask effects in “clear speech” contexts, where slower, exaggerated articulation may compensate for obscured visual cues. By contrast, the monologue in this study was delivered with typical pacing and articulation and lacked such compensatory hyperarticulation. This distinction matters because speech without clear-speech modifications places greater demands on auditory-visual integration to resolve ambiguities (Sueyoshi & Hardison, 2005).
The findings also showed that learners with higher language proficiency performed better in listening comprehension and word recognition in all three conditions, and that the proficiency gap was largest in the AV-F condition, suggesting that proficiency may moderate the relationship between listening mode and comprehension and discrimination outcomes. Although both proficiency groups were most accurate in the AV-F condition, the higher-proficiency group showed greater benefits from observing the speaker's facial cues and lip movements. These results align with those of Sueyoshi and Hardison (2005), suggesting that L2 learners with advanced proficiency are more aware of visible speech cues than lower-proficiency learners and can use them more effectively as a listening strategy, potentially owing to their experience interacting in the L2. Similarly, more experienced learners identify and recognize sounds in the target language better than their less experienced peers (Evans & Alshangiti, 2018). The importance of experience with facial cues and lip movements in speech perception highlights the potential value of incorporating visual cue training into language instruction for lower-level learners, as has been observed in speech training for perceiving individual sounds (Erdener, 2020; Mehta, 2020).
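The claim that the higher-proficiency group benefited more from the speaker's full face can be made concrete as a difference-in-differences of the condition means in Tables 5 and 6: the AV-F advantage over a baseline condition for intermediate learners minus the same advantage for pre-intermediate learners. The sketch below is descriptive only; it does not test the interaction, which would require the raw scores (for example, a two-way ANOVA with listening condition and proficiency as factors). `MEANS`, `face_advantage`, and `advantage_gap` are illustrative names.

```python
# Condition means by proficiency level, copied from Tables 5 and 6.
MEANS = {
    "comprehension": {
        "pre-intermediate": {"AV-F": 6.16, "AV-M": 5.40, "A-O": 5.49},
        "intermediate": {"AV-F": 7.91, "AV-M": 6.25, "A-O": 6.20},
    },
    "word recognition": {
        "pre-intermediate": {"AV-F": 30.05, "AV-M": 27.75, "A-O": 28.93},
        "intermediate": {"AV-F": 33.97, "AV-M": 29.34, "A-O": 30.72},
    },
}


def face_advantage(means, baseline):
    """Mean gain from seeing the full face (AV-F) over a baseline condition."""
    return means["AV-F"] - means[baseline]


def advantage_gap(measure, baseline):
    """Intermediate minus pre-intermediate face advantage (difference-in-differences)."""
    groups = MEANS[measure]
    return (face_advantage(groups["intermediate"], baseline)
            - face_advantage(groups["pre-intermediate"], baseline))


for measure in MEANS:
    for baseline in ("AV-M", "A-O"):
        gap = advantage_gap(measure, baseline)
        print(f"{measure}, AV-F over {baseline}: gap = {gap:+.2f}")
```

All four gaps are positive (0.90 and 1.04 points in comprehension, 2.33 and 2.13 in word recognition, against AV-M and A-O respectively), consistent with the pattern described above.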
Pedagogical implications
The results of this study highlight the importance of considering the pedagogical implications of face masks in language classrooms and online teaching. Although prioritizing the health and safety of students and instructors is essential, the findings offer language educators practical insights into the challenges that EFL and L2 learners encounter when speakers wear face masks during language learning activities. The results therefore suggest that educators should adapt their teaching to mitigate potential difficulties associated with reduced visibility of facial cues. For instance, where medical protective gear is required, transparent face masks may be a viable alternative that provides learners with additional visual cues, improving speech recognition, reducing the effort needed to concentrate, and increasing listener confidence (Thibodeau et al., 2021).
Moreover, the results suggest that incorporating audio-visual (AV) components into EFL teaching methods is beneficial for enhancing students' understanding and interpretation of spoken language. For example, video content in instructional material can give students a clear view of the speaker, making it a valuable supplement when the use of face masks is unavoidable (Al-Samiri, 2021). Videos that show interactions between two speakers from the target language culture may be particularly useful, as seeing multiple speakers can help improve comprehension and communication in the second language and culture (Hardison & Pennington, 2021). For online instruction, enabling video features that allow EFL learners to see the speaker's facial expressions and mouth movements is preferable, although transmission lags can desynchronize the audio and visual signals and create challenges of their own.
The results of this study may also have implications beyond language acquisition, for example for medical professionals who regularly wear surgical masks and communicate with patients who speak English as a second language. Overall, this study provides insights into the unintended effects of COVID-19 on language learning and suggests practical measures to support effective communication and learning in situations where medical protective gear is necessary.
Limitations
This study has several limitations. First, the stimuli were presented in video format. While this approach ensured consistent input across experimental conditions, the results may not generalize to face-to-face interactions, where learners can adjust their strategies or seek clarification in real time. Second, participants were instructed to focus on the screen while listening, but their level of engagement could not be controlled. Third, the experiment used only one speaker with a single English accent (General American) and a formal speech style (scripted monologue), which may introduce bias and limit the generalizability of the findings to diverse speaker profiles, accents, and conversational contexts. Fourth, the audio stimuli were delivered through loudspeakers rather than headphones, potentially introducing variability from ambient noise.
Another important limitation lies in the treatment of L2 English proficiency as a binary variable (pre-intermediate vs. intermediate). While this categorization aligns with institutional CEFR-based groupings, it does not capture proficiency as a continuum and may oversimplify its role in auditory-visual speech perception. Furthermore, unmeasured confounding variables, such as cognitive abilities (e.g., working memory, attention), listening fatigue, prior exposure to masked speech, or individual learning strategies, could have influenced group differences. Although participants were randomly assigned to listening conditions to reduce such biases, the anonymization of data and lack of access to participants precluded post-hoc verification of group equivalence on these variables. These constraints may limit the generalizability of findings regarding the interaction between language proficiency, visual cues, and other latent factors, and they should be taken into account when interpreting the results.
Future studies should use more nuanced proficiency data (e.g., standardized test scores or psycholinguistic measures), pre-tests and post-tests (including measures of listening fatigue), and acoustic measurements to better account for individual variability and isolate the effects of a lack of visual input. Incorporating diverse speakers, accents, and speech styles can also help reduce biases based on familiarity with particular varieties and styles. In addition, comparing video-recorded and live interactions would make it possible to assess how L2 learners adapt their listening strategies (e.g., real-time clarification requests, contextual adjustments) in authentic classroom settings and how these adaptations affect comprehension of masked speech. Finally, a controlled audio delivery method, such as noise-cancelling headphones, can minimize the potential impact of ambient noise.
Conclusion
To the author's knowledge, this is the first study to investigate the impact of face masks on the listening comprehension and word recognition of EFL students in Saudi Arabia in the post-pandemic world. The study contributes to the broader field of language education by emphasizing the significance of observing lip movements and facial expressions in language learning settings, particularly for listening. Specifically, it measured the adverse effects of face masks on listening comprehension and word recognition accuracy, which arise because masks obscure visual cues, such as facial and lip movements, that language learners use to process spoken language. The study also highlights the association between higher language proficiency and greater benefit from the full-face audio-visual condition, indicating that learners with more language experience may make greater use of visual cues for comprehension and word recognition. In conclusion, the study has practical implications for language educators, researchers, and policymakers, offering insights into the challenges posed by face masks in language learning contexts and suggesting ways to enhance speech perception for EFL learners. Further research is required to explore the effectiveness of incorporating visual cue training into language instruction to improve listening skills, especially for EFL learners with lower proficiency, who appear to benefit less from visual cues than their more proficient peers.
Acknowledgements
The author is grateful to all who participated in this study and to Dr. Debby Adams for her assistance in creating the stimuli for this study.
Author contributions
The study was designed, conducted, and written by R.A.
Funding
This work was funded by the University of Jeddah, Jeddah, Saudi Arabia [grant No. UJ-22-DR-33]. The author, therefore, acknowledges with thanks the University of Jeddah for its technical and financial support.
Availability of data and materials
The datasets used and analyzed during the current study are available from the author upon reasonable request.
Declarations
Competing interests
The author declares no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Algana, M; Hardison, DM. Variable effects of speakers’ visual cues and accent on L2 listening comprehension: A mixed-methods approach. Language Teaching Research; 2024; [DOI: https://dx.doi.org/10.1177/13621688241246106]
Alghamdi, MM. A spectrographic analysis of Arabic vowels: A cross-dialect study. Journal of King Saud University; 1998; 10,
Al-Khresheh, MH; Alruwaili, SF. Metacognition in listening comprehension: Analyzing strategies and gender differences among Saudi EFL University students. Cogent Social Sciences; 2024; 10,
Al-Samiri, RA. English language teaching in Saudi Arabia in response to the COVID-19 pandemic: Challenges and positive outcomes. Arab World English Journal (AWEJ) Special Issue on Covid 19; 2021; [DOI: https://dx.doi.org/10.24093/awej/covid.11]
Alshammari, H. Investigating the Low English Proficiency of Saudi EFL Learners. Arab World English Journal; 2022; 13,
Alshangiti, W; Evans, BG. Learning English vowels: The effects of different phonetic training modes on Arabic learners' production and perception. The Journal of the Acoustical Society of America; 2024; 156,
Barrós-Loscertales, A; Ventura-Campos, N; Visser, M; Alsius, A; Pallier, C; Ávila Rivera, C; Soto-Faraco, S. Neural correlates of audiovisual speech processing in a second language. Brain and Language; 2013; 126,
Blanco-Elorrieta, E; Ding, N; Pylkkänen, L; Poeppel, D. Understanding requires tracking: Noise and knowledge interact in bilingual comprehension. Journal of Cognitive Neuroscience; 2020; 32,
Bottalico, P; Murgia, S; Puglisi, GE; Astolfi, A; Kirk, KI. Effect of masks on speech intelligibility in auralized classrooms. The Journal of the Acoustical Society of America; 2020; 148,
Calandruccio, L; Porter, HL; Leibold, LJ; Buss, E. The clear-speech benefit for school-age children: Speech-in-noise and speech-in-speech recognition. Journal of Speech, Language, and Hearing Research; 2020; 63,
Choi, W; Chu, T; Zu, J. Can you hear me clearly? The differential effects of surgical mask on Cantonese consonant, vowel, and tone perception. Frontiers in Communication; 2025; 10, 1582217. [DOI: https://dx.doi.org/10.3389/fcomm.2025.1582217]
Corey, RM; Jones, U; Singer, AC. Acoustic effects of medical, cloth, and transparent face masks on speech signals. The Journal of the Acoustical Society of America; 2020; 148,
Dahl, TI; Ludvigsen, S. How I see what you're saying: The role of gestures in native and foreign language listening comprehension. The Modern Language Journal; 2014; 98,
Erdener, D. Second language instruction: Extrapolating from auditory-visual speech perception research. In: Huertas Abril, CA (Ed). Handbook of Research on Bilingual and Intercultural Education; 2020; IGI Global. [DOI: https://dx.doi.org/10.4018/978-1-7998-2588-3]
Evans, BG; Alshangiti, W. The perception and production of British English vowels and consonants by Arabic learners of English. Journal of Phonetics; 2018; 68, pp. 15-31. [DOI: https://dx.doi.org/10.1016/j.wocn.2018.01.002]
Flege, JE; Port, R. Cross-language phonetic interference: Arabic to English. Language and Speech; 1981; 24,
Giovanelli, E; Valzolgher, C; Gessa, E; Todeschini, M; Pavani, F. Unmasking the difficulty of listening to talkers with masks: Lessons from the COVID-19 pandemic. i-Perception; 2021; 12,
Hansen Edwards, JG; Zampini, ML. The impact of audio versus audiovisual stimuli with or without face masking on judgements about different varieties of Asian English. World Englishes; 2025; [DOI: https://dx.doi.org/10.1111/weng.12734]
Hardison, DM. Visual and auditory input in second-language speech processing. Language Teaching; 2010; 43,
Hardison, DM; Pennington, MC. Multimodal second-language communication: Research findings and pedagogical implications. RELC Journal; 2021; 52,
Hirata, Y; Kelly, SD. Effects of lips and hands on auditory learning of second-language speech sounds. Journal of Speech, Language, and Hearing Research; 2010; 53,
Kager, R. Stress in windows: Language typology and factorial typology. Lingua; 2012; 122,
Kitikanan, P; Leung, AHC. Wearing face masks in different speech styles during the COVID-19 pandemic: A study of Thai L2 English learners. International Journal of Applied Linguistics; 2024; 34,
Lee, BJ; Hart, ET. Facemask occlusion’s impact on L2 listening comprehension. Speech Communication; 2022; 139, pp. 45-50. [DOI: https://dx.doi.org/10.1016/j.specom.2022.03.005]
Lesnov, RO. Furthering the argument for visually inclusive L2 academic listening tests: The role of content-rich videos. Studies in Educational Evaluation; 2022; 72, 101087. [DOI: https://dx.doi.org/10.1016/j.stueduc.2021.101087]
Long, DE. Culture and customs of Saudi Arabia; 2005; Bloomsbury Publishing. [DOI: https://dx.doi.org/10.5040/9798400635724]
Magee, M; Lewis, C; Noffs, G; Reece, H; Chan, J; Zaga, CJ; Chan, JCS; Paynter, C; Birchall, O; Rojas Azocar, S; Ediriweera, A; Kenyon, K; Caverlé, MW; Schultz, BG; Vogel, AP. Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols. The Journal of the Acoustical Society of America; 2020; 148,
Mantzikos, CN; Lappa, CS. Difficulties and barriers in the education of deaf and hard of hearing individuals in the era of COVID-19: The case of Greece. European Journal of Special Education Research; 2020; 6,
McGurk, H; MacDonald, J. Hearing lips and seeing voices. Nature; 1976; 264,
Mehta, S. Effects of visual feedback on the production and perception of second language speech sounds: A comparison of articulatory and auditory instruction [Ph.D. dissertation, The University of Texas at Dallas]; 2020; [URL: https://www.proquest.com/docview/2488730326/abstract/A7155205B54B4248PQ/1] (retrieved May 10, 2021).
Navarra, J; Soto-Faraco, S. Hearing lips in a second language: Visual articulatory information enables the perception of second language sounds. Psychological Research Psychologische Forschung; 2007; 71,
Otair, I; Abd Aziz, NH. Exploring the causes of listening comprehension anxiety from EFL Saudi learners’ perspectives: A pilot study. Advances in Language and Literary Studies; 2017; 8,
Pichora-Fuller, MK; Kramer, SE; Eckert, MA; Edwards, B; Hornsby, BWY; Humes, LE; Lemke, U; Lunner, T; Matthen, M; Mackersie, CL; Naylor, G; Phillips, NA; Richter, M; Rudner, M; Sommers, MS; Tremblay, KL; Wingfield, A. Hearing impairment and cognitive energy: The framework for understanding effortful listening (FUEL). Ear and Hearing; 2016; 37, 5S. [DOI: https://dx.doi.org/10.1097/AUD.0000000000000312]
Rahne, T; Fröhlich, L; Plontke, S; Wagner, L. Influence of surgical and N95 face masks on speech perception and listening effort in noise. PLoS ONE; 2021; 16,
Reisberg, D. Easy to hear but hard to understand: A lipreading advantage with intact auditory stimuli. Hearing by Eye: The Psychology of Lip-Reading; 1987.
Saunders, GH; Jackson, IR; Visram, AS. Impacts of face coverings on communication: An indirect impact of COVID-19. International Journal of Audiology; 2021; 60,
Smiljanic, R; Keerstock, S; Meemann, K; Ransom, SM. Face masks and speaking style affect audio-visual word recognition and memory of native and non-native speech. The Journal of the Acoustical Society of America; 2021; 149,
Stam, G; Tellier, M. Gesture helps second and foreign language learning and teaching. In: Morgenstern, A; Goldin-Meadow, S (Eds). Gesture in Language: Development Across the Lifespan; 2022; Mouton de Gruyter/APA: pp. 336-363. [DOI: https://doi.org/10.1037/0000269-014]
Sueyoshi, A; Hardison, DM. The role of gestures and facial cues in second language listening comprehension. Language Learning; 2005; 55, 4.
Summerfield, Q. Use of visual information for phonetic perception. Phonetica; 1979; 36, pp. 314-331. [DOI: https://dx.doi.org/10.1159/000259969]
Thibodeau, LM; Thibodeau-Nielsen, RB; Tran, CMQ; Jacob, RTDS. Communicating during COVID-19: The effect of transparent masks for speech recognition in noise. Ear and Hearing; 2021; 42,
Toscano, JC; Toscano, CM. Effects of face masks on speech recognition in multi-talker babble noise. PLoS ONE; 2021; 16,
Watson, JC. The phonology and morphology of Arabic; 2002; Oxford University Press. [DOI: https://dx.doi.org/10.1093/oso/9780199257591.001.0001]
Yi, H; Pingsterhaus, A; Song, W. Effects of wearing face masks while using different speaking styles in noise on speech intelligibility during the COVID-19 pandemic. Frontiers in Psychology; 2021; [DOI: https://dx.doi.org/10.3389/fpsyg.2021.682677]
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.