Multimodality, the integration of verbal, gestural, and contextual cues, is critical for teacher-student interaction in English as a Foreign Language (EFL) classrooms; however, existing frameworks do not systematically analyze how these modalities work together across educational stages or how they align with language acquisition principles. To address this gap, we propose a novel coding framework grounded in Second Language Acquisition theory and social constructivism and combine multimodal interaction analysis with lag sequential analysis to examine 24 Chinese EFL lesson recordings. The analysis yields two key findings: (1) students displayed significantly more proactive than passive behavior (1363 proactive vs. 868 passive instances), challenging the stereotype of Chinese learners as passive; (2) modality patterns diverged substantially across educational stages, with elementary classes emphasizing gestural support and high school classes prioritizing verbal reflection. These findings point to the need for stage-specific teaching strategies and yield actionable recommendations for EFL educators, including gestural scaffolding in elementary instruction and reflective pauses in advanced classrooms. The framework's adaptability suggests potential for cross-cultural validation, offering a tool to refine multimodal pedagogy in diverse EFL contexts.
Introduction
Multimodality, defined as the dynamic orchestration of speech, gesture, gaze, and contextual resources to co-construct meaning within cultural contexts (Jewitt & Kress, 2010; Norris, 2004), plays a significant role in helping learners overcome linguistic barriers in English as a Foreign Language (EFL) classrooms (Lee et al., 2021). For example, EFL teachers often raise and lower their hands to demonstrate the intonation of a word. Yet existing frameworks leave three critical gaps: (1) they overlook bidirectional teacher-student interaction, particularly student-initiated behaviors; (2) they are dominated by qualitative multimodal discourse analysis (MDA) models ill-suited to Second Language Acquisition (SLA)-informed inquiry; and (3) they lack the developmental adaptability needed to analyze modality use across educational stages (Sindoni, 2021). These limitations hinder the optimization of multimodal pedagogy, especially in contexts like China, where teacher-dominated instruction and perceived learner passivity persist despite curricular reforms advocating student-centered learning (Ministry of Education of the People's Republic of China [MOE], 2022).
Prior research has narrowly focused on cataloging isolated modalities (Wen, 2021; Zhang et al., 2022) or quantifying teacher-centered behaviors (Cheung, 2022), neglecting how students co-construct meaning through multimodal interaction. For instance, while Yang and Chen's (2022) model systematized data collection, it lacked codes for tracking input–output dynamics or peer negotiation, which are key components of SLA-driven engagement. Similarly, studies on Chinese EFL classrooms have predominantly emphasized teacher scaffolding (Han, 2021) or technology's limited efficacy in rural settings (Li et al., 2019), overlooking the potential of multimodal synergy to foster student agency. This disconnect between theory and practice underscores the urgent need for a comprehensive framework that bridges multimodal interaction analysis (MIA) with SLA principles and developmental psychology.
To address these gaps, this study introduces a novel MIA framework grounded in social constructivism (Vygotsky, 1978), Krashen’s Input Hypothesis (1982), and Swain’s Output Hypothesis (1985). The framework innovates in three ways. First of all, it classifies teacher-student interactions into proactive (AB) and passive (PB) behaviors, capturing bidirectional engagement dynamics. Secondly, it integrates lag sequential analysis to decode how modalities synergize temporally (e.g., teacher pauses triggering student questions). Lastly, it aligns modality emphasis with cognitive-developmental stages (elementary: gestural scaffolding; high school: co-verbal reflection).
Teacher-student interaction in Chinese EFL classrooms
Teacher-student interaction refers to the communication and engagement that takes place between a teacher and his or her students in an educational setting (Hall & Walsh, 2002). Chinese EFL classrooms are often criticized for teacher-dominated routines (e.g., mechanical grammar drills) and limited student autonomy (Amoah & Yeboah, 2021), perpetuating the stereotype of passive learners (Li & Lv, 2022; Yu et al., 2019). While technology integration shows mixed results—enhancing urban learner-instructor interaction (Canagarajah, 2018; Chen, 2024; Peng, 2019) but yielding minimal impact in rural areas (Li et al., 2018, 2019)—the root issue lies in the lack of frameworks to systematically analyze how multimodal resources mediate engagement. Our study shifts this paradigm by examining not only teacher-student exchanges but also peer collaboration and student-media interactions, offering a holistic view of EFL dynamics.
Multimodal interaction analysis: bridging theory and practice
Unlike multimodal discourse analysis (MDA), which deciphers meaning embedded in static texts (Kress & Bezemer, 2023; Lim, 2021; Norris, 2020), multimodal interaction analysis (MIA) focuses on real-time, socially mediated interactions (Jewitt, 2015; Schüssel et al., 2016; Su et al., 2021). This distinction is critical for EFL contexts, where gestures, intonation, and technology collaboratively scaffold comprehension. However, existing MIA models (e.g., Yang & Chen, 2022) lack sufficient detail in coding SLA-relevant behaviors, such as student-led output or peer negotiation. Our framework addresses this by embedding SLA principles into modality codes—for example, categorizing teacher pauses as scaffolding moments (Vygotsky, 1978) and student debates as pushed output (Swain, 1985).
Toward a developmental MIA model
Yang and Chen's (2022) workflow (see Fig. 1) lacks explicit guidelines for data analysis. To address this gap, we developed an innovative coding framework for MIA and applied lag sequential analysis, a quantitative approach distinct from the qualitative methods that dominate prior studies. This dual approach captures not only which modalities are used but also how they evolve to support learners' developmental and linguistic needs, enhancing the objectivity and rigor of our findings.
[See PDF for image]
Fig. 1
Yang and Chen’s (2022) multimodal interaction analysis model
Study purpose, research questions and hypotheses
To address the absence of a comprehensive analytical framework for multimodal interaction in EFL contexts, this study develops a theoretically grounded coding system that integrates three critical dimensions: language acquisition principles, multimodal semiotic resources, and developmental appropriateness across educational stages. Drawing on 24 lesson recordings from China's National Educational Resource Platform, we systematically classify teacher-student and peer interactions and identify stage-specific modality configurations.
The investigation is guided by the following research questions:
RQ1: What multimodal patterns characterize Chinese EFL classrooms?
RQ2: How do modalities interact to shape teacher-student engagement?
RQ3: How do these interactions differ across educational stages?
Guided by social constructivism and SLA theories, and to test the validity and applicability of the coding framework, we propose the following hypotheses:
H1: The modalities used by teachers in Chinese EFL classrooms are dominated by visual aids and body language.
H2: There are significant culturally specific multimodal communication patterns in Chinese EFL classrooms.
H3: Teachers’ body language combined with verbal instructions improves students’ understanding of learning content in EFL classrooms.
H4: The synergy between visual modalities (e.g. PowerPoint, blackboard writing) and verbal explanations in the EFL classroom is particularly effective in explaining concepts.
H5: The characteristics of modality use evolve with students' age and cognitive ability across educational levels in Chinese EFL classrooms.
H6: Technology-assisted multimodal teaching methods are used more frequently in EFL classrooms at the primary level than at the senior high level.
By anchoring the study in these theoretical and empirical foundations, we aim to redefine multimodal pedagogy, offering strategies that resonate with both global SLA principles and localized classroom realities.
Method
Development of the MIA coding framework: addressing gaps in prior models
Currently, there is no widely accepted multimodal analysis framework tailored specifically to EFL or SLA research (Xu et al., 2022). We therefore conducted a comprehensive review of the discourse and interaction analysis literature to identify a suitable approach for coding multimodal data in EFL contexts. Prior frameworks (e.g., Qin & Wang, 2021; Wen, 2021) catalogued modalities in isolation (e.g., gestures, speech) but failed to link them to language learning mechanisms. Table 1 summarizes these limitations and the corresponding improvements built into our coding framework.
Table 1. A comparative summary of prior MIA coding frameworks and our enhancements
| Coding framework | Limitations | Improvements |
|---|---|---|
| Qin and Wang (2021) | • Lacked a clear classification framework • Overemphasized body language • Limited to the “lead-in” phase • Focused only on teachers, not student–teacher interactions | • Categorized into four main modalities • Integrated English language learning features • Defined codes clearly with examples • Included both teacher-student and peer interactions |
| Zhang et al. (2022) | • Only addressed visual and auditory modalities • Failed to capture the complexity of language learning contexts | • Expanded to cover multimodal diversity • Aligned with language learning dynamics (e.g., verbal/nonverbal balance) |
| Bobkina et al. (2023) | • Narrow focus on gestures and eye contact • Overly specific categories (hard to define) • Ignored English learning characteristics | • Broadened scope to generalizable categories • Added English-specific elements (e.g., intonation, vocabulary) |
| Wen (2021) | • No classification for verbal modality • Missing definitions/explanations for codes | • Introduced verbal modality subcategories (e.g., tone, pacing) • Provided code explanations and usage guidelines |
Specifically, our coding scheme (Table 2) addresses these gaps in prior frameworks in three ways. First, we embedded Second Language Acquisition (SLA) theories into the category definitions, operationalizing Krashen's Input Hypothesis and Swain's Output Hypothesis. For example, the verbal modality explicitly distinguishes teacher-initiated input (e.g., English Language Points [ELP], Cultural Knowledge [CK]) from student-driven output (e.g., Proactive Behaviors [AB], Passive Behaviors [PB]), in line with Krashen's emphasis on comprehensible input and Swain's focus on learner production. Moreover, co-verbal modality codes such as Speaking Rate (SR) and Emphasis (S) quantify how teachers adjust prosody to scaffold comprehension (e.g., slowing their speech for complex instructions), addressing Zhang et al.'s (2022) oversight of input modification strategies.
Table 2. Multimodal interaction analysis framework
| Modal category | Code | Brief definition | Example |
|---|---|---|---|
| Verbal modality | English language points (ELP) | The teacher orally explains specific points or details of English grammar, vocabulary, pronunciation, spelling, and syntax | “Let’s look at this word.” “How to pronounce it?” “Spell it together.” |
| | Cultural knowledge (CK) | The teacher provides knowledge about cultures, such as food, architecture, and philosophy, and encourages students to think about the attitudes and values that underlie the culture | “Have you been to other countries, and what impressed you most?” “Do you know why the author choose this?” |
| | Learning strategies (LS) | The teacher guides students to learn or use metacognitive strategies (planning, monitoring, evaluating), cognitive strategies (learning methods), communicative strategies, and affective management strategies | “Five minutes for you.” “Check your work.” “Work in pairs/groups to complete it.” “Let’s give him/her a round of applause.” |
| | Proactive behaviors (AB) | Students initiate questions or answers, discuss with peers proactively, or orally evaluate other students | “I think the answer is…” “I don’t think he is right.” “Why is that correct?” |
| | Passive behaviors (PB) | The teacher asks students to respond non-spontaneously; students complete their responses with guidance or instruction from the teacher | “Let’s repeat it together…” “Read after me…” “Look at the slides, please.” |
| Co-verbal modality | Intonation (IT) | The teacher gives directions or expresses meaning through changes in intonation (e.g., in imperative sentences), reflecting attitudes, intentions, emotions, etc. | “Please take out your textbook and turn to page 52.” “That’s a great answer!” “Behave yourself!” |
| | Emphasis (S) | The teacher emphasizes or repeats syllables or words, either personally or by having the pupils repeat them | “yes…yes…” “You mean…” |
| | Change in speaking rate (SR) | The teacher intentionally slows down his/her speaking speed to get students’ attention | “NO…W, pay attention to ….” “Par..don?” |
| | Pause (P) | The teacher gives temporary pauses for students to think, discuss, or do exercises, with no verbal interaction between the teacher and students | “I’ll give you a few seconds to think about the answer.” “Look at the blackboard…(silence).” “Work in pairs.” |
| Gestural modality | Eye contact (EC) | The teacher uses his/her eyes to convey meanings, such as reminding, encouraging, and imposing situational constraints | “Good job! Thank you!” (with an expression of approval) “You please” (eye contact) |
| | Facial expression (FE) | The teacher shows a noticeable change in facial expression, such as surprise or confusion | “Really?” “Are you sure?” (perhaps with a slight frown) |
| | Sign language (SL) | The teacher uses gestures to link the physical environment with verbal expressions, e.g., inviting, applauding | “Do you have any different ideas?” (with hands outstretched) “Not exactly” (head shakes) |
| Contextual modality | Technology (T) | The teacher uses pictures and videos in slides to create visual and auditory modal symbols | “Enjoy the video and think about the following questions.” |
| | Presentment (BW) | The teacher composes visual modalities through handwritten notes or by displaying teaching aids | “Look at the paper in my hand.” “Today our topic is…” (writing) |
Note. Bold italics in the examples indicate vocal tone shifts.
Furthermore, unlike static frequency-based models (e.g., Bobkina et al., 2023), we employ Lag Sequential Analysis (LSA) to map how multimodal behaviors transition during interaction. For instance, a teacher's Presentment (BW) code in the contextual modality (e.g., displaying a timeline on a slide) often precedes students' proactive behavior (AB) codes (e.g., peer debates about historical events), revealing how visual aids scaffold output. This dynamic approach resolves Wen's (2021) limitation of treating modalities as isolated events.
Lastly, whereas prior studies focused on single educational stages (e.g., Zhang et al., 2022, analyzed only middle school classrooms), our framework introduces adaptive coding rules for primary, middle, and high school levels. This adaptability ensures relevance across developmental stages, addressing the “one-size-fits-all” limitation noted by Xu et al. (2022).
Theoretical integration of the framework: bridging multimodality, SLA, and social constructivism
The framework synthesizes three theoretical pillars, and Fig. 2 displays the connections between its coding categories and these theories. First, social constructivism, which emphasizes knowledge co-construction through socially mediated interaction, is reflected in the gestural, contextual, and co-verbal modalities. For instance, teachers' gestural cues (e.g., circular hand motions to initiate group discussion or a raised palm to halt collaboration) scaffold peer interaction and thereby facilitate collaborative learning. In the contextual modality, codes such as Technology (T) and Presentment (BW) capture how tools (e.g., slideshows displaying narrative timelines) anchor knowledge construction in situated contexts. For example, students first watch a video of Alice in Wonderland to grasp the plot sequence and then complete fill-in-the-blank exercises that reinforce comprehension. Additionally, co-verbal pauses (P) align with Vygotsky's Zone of Proximal Development, as students autonomously negotiate meaning during uninterrupted peer dialogue.
[See PDF for image]
Fig. 2
The link between modalities and theoretical constructs
Next, Krashen's Input Hypothesis and Swain's Output Hypothesis underpin the verbal modality, which captures both teacher-generated input (e.g., ELP for grammar explanations, LS for strategy instruction) and student-driven output (AB: proactive responses; PB: reactive participation). In the co-verbal modality, adjustments in teacher speaking rate (SR) or emphasis (S) are coded to assess how input modification supports effective output (e.g., slower pacing to clarify instructions).
Finally, multimodal learning theory, which posits that multisensory integration (auditory, visual, kinesthetic) enhances retention, manifests in synergies across modalities (Giannakos & Cukurova, 2023). Teachers’ repetitive verbal cues (Verbal) paired with gesture shifts (Gestural)—such as circling keywords on slides (Contextual) while verbally emphasizing definitions—demonstrate how cross-modal reinforcement strengthens memory encoding, aligning with the theory’s principles.
Table 3 maps each coded behavior to its theoretical basis. For instance, Cultural Knowledge (CK) aligns with Intercultural Communicative Competence (Byram, 2021), reflecting how cultural references in lessons (e.g., discussing Western holidays) enhance learners' cross-cultural awareness. Nonverbal behaviors such as Eye Contact (EC) and Facial Expression (FE) draw on Krashen's Affective Filter Hypothesis, which emphasizes their role in reducing anxiety and fostering engagement, both critical for maintaining a low-stress learning environment (e.g., nodding to show encouragement) (Wang, 2020). Speaking Rate (SR) and Pause (P) link to Automaticity Theory (Soto et al., 2016), as they track fluency development through teachers' pacing adjustments (e.g., slowed speech for complex content). By anchoring each code to a theoretical foundation, the framework ensures methodological rigor while addressing the multimodal complexity of EFL classrooms.
Table 3. The connection between encoding behavior and learning theory
| Coded behavior | Theoretical basis |
|---|---|
| English Language Points (ELP) | Input Hypothesis |
| Cultural Knowledge (CK) | Intercultural Communicative Competence |
| Learning Strategies (LS) | Language Learning Strategies Taxonomy |
| Intonation (IT) | Communicative Language Teaching |
| Emphasis (S) | Output Hypothesis |
| Speaking Rate (SR) / Pause (P) | Automaticity Theory |
| Eye Contact (EC) | Communication Theory |
| Facial Expression (FE) | Affective Filter Hypothesis |
| Students' behaviors (AB/PB) | Engagement Theory |
| Technology (T) | Technology-Mediated Language Learning |
| Presentment (BW) | Social Constructivism |
By unifying SLA theory, dynamic interaction patterns, and developmental adaptability, this framework offers a replicable model for diagnosing and optimizing EFL classroom interactions—a critical advance beyond descriptive, static, or stage-constrained prior work.
Dataset and sampling: ensuring representativeness and rigor
Guided by the multimodal, social, and linguistic principles outlined above, we analyzed 24 live-recorded EFL lesson videos obtained through stratified random sampling from the National Public Service Platform for Educational Resources (https://1s1k.eduyun.cn/), where teachers of different subjects from across the country can freely upload their own teaching videos, usually recordings of complete, real lessons. We selected the platform for its nationwide coverage and its comprehensive collection of authentic, evaluated classroom recordings featuring real classroom interaction, although the videos are not the most recent. The platform is therefore well suited to examining representative characteristics of teacher-student interaction in Chinese EFL classrooms (Li et al., 2019; Qiao & He, 2022).
We used objective evaluation criteria (Kirkpatrick et al., 2019; Raghavan et al., 2020) to reduce bias and ensure impartiality. The platform's filtering tool lets users set criteria such as education level, course format, academic subject, textbook edition, grade level, publication date, and even school (see Fig. 3), which greatly enhances the objectivity of video sample selection.
[See PDF for image]
Fig. 3
The filtering function of the platform
Because the data were sourced from recordings of actual EFL lessons, they capture authentic teacher-student interaction in real classroom settings. Analyzing these videos allows a deeper understanding of modal characteristics and interaction patterns across educational stages and yields practical insights for EFL teaching.
The filtering process entailed the following steps: (1) setting the sample at 24 videos, as recommended by Herring and Androutsopoulos (2015), to ensure a dataset adequate in size and quality to address the research questions and to allow meaningful statistical analysis; (2) stratifying by educational stage, from primary to senior high school, with eight lessons per stage; (3) filtering by the English subject and the Yilin textbook edition, which is widely used (Shiyu, 2023); (4) determining grade levels: Chinese EFL instruction spans grades 3–6 in primary school, grades 7–9 in middle school, and grades 10–12 in high school, totaling 10 grades; (5) identifying lesson types across the 24 lessons, which vary among textbook editions (an example from the middle school Yilin edition is shown in Fig. 4); and (6) applying the principle of browsing from high to low to ensure representative lesson selection for each grade level.
[See PDF for image]
Fig. 4
Lesson types from Yilin textbook
A clearer presentation of the sampling process is provided by the platform screenshot (Fig. 5). A summary of the twenty-four teaching episodes sampled for the MIA is shown in Table 4. All classroom recording videos are in MP4 file format.
[See PDF for image]
Fig. 5
Screenshot of the sampling process
Table 4. Summary of the 24 teaching episodes
| Educational stage | Teaching episode | Grade | Lesson type | Episode duration (min) |
|---|---|---|---|---|
| Primary school | 1 | Third grade, first volume | Cartoon time | 35.05 |
| | 2 | Third grade, second volume | Story time | 39.14 |
| | 3 | Fourth grade, first volume | Sound time | 40.09 |
| | 4 | Fourth grade, second volume | Story time | 40.32 |
| | 5 | Fifth grade, first volume | Grammar & fun time | 40.12 |
| | 6 | Fifth grade, second volume | Checkout time | 40.30 |
| | 7 | Sixth grade, first volume | Sound time | 27.17 |
| | 8 | Sixth grade, second volume | Checkout time | 41.02 |
| Middle school | 9 | Seventh grade, first volume | Reading | 46.30 |
| | 10 | Seventh grade, second volume | Grammar | 45.27 |
| | 11 | Eighth grade, first volume | Reading | 42.04 |
| | 12 | Eighth grade, second volume | Welcome to the unit | 42.28 |
| | 13 | Ninth grade, first volume | Integrated skills | 45.01 |
| | 14 | Ninth grade, second volume | Task | 46.12 |
| | 15 | Eighth grade, first volume | Study skills | 45.41 |
| | 16 | Seventh grade, second volume | Reading | 42.33 |
| High school | 17 | Module 1 | Project | 44.21 |
| | 18 | Module 2 | Reading 2 | 45.15 |
| | 19 | Module 3 | Grammar | 43.26 |
| | 20 | Module 4 | Reading 1 | 46.08 |
| | 21 | Module 5 | Word power | 41.70 |
| | 22 | Module 6 | Task 2 | 42.15 |
| | 23 | Module 7 | Self-assessment | 42.61 |
| | 24 | Module 8 | Welcome to the unit | 45.05 |
We consider 24 recordings adequate for this study: stratified random sampling yielded eight lessons per educational stage, providing a diverse representation of educational stages and teaching methods and a sample size sufficient to address the research questions and allow meaningful statistical analysis.
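The stratification step can be illustrated with a minimal sketch of a stratified random draw of eight videos per stage. It assumes the filtered candidate videos are available as a list of records with a hypothetical `stage` field; the record format and the catalog below are invented for illustration, and the sketch does not model the platform's filters or the high-to-low browsing rule in step (6).

```python
import random


def stratified_sample(catalog, per_stratum, seed=None):
    """Draw `per_stratum` videos at random (without replacement) from each stage.

    `catalog` is a list of dicts with a hypothetical "stage" key.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible draw
    strata = {}
    for video in catalog:
        strata.setdefault(video["stage"], []).append(video)

    sample = []
    for stage in sorted(strata):
        pool = strata[stage]
        if len(pool) < per_stratum:
            raise ValueError(f"stage {stage!r} has only {len(pool)} candidate videos")
        sample.extend(rng.sample(pool, per_stratum))
    return sample


if __name__ == "__main__":
    # Hypothetical catalog: 20 candidate videos per stage.
    catalog = [{"id": f"{s}-{i}", "stage": s}
               for s in ("primary", "middle", "high") for i in range(20)]
    chosen = stratified_sample(catalog, per_stratum=8, seed=2024)
    print(len(chosen), sorted({v["stage"] for v in chosen}))
```

Sampling within each stratum guarantees the equal 8/8/8 split that a single simple random draw over all 24 slots would not.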
Analytical workflow: from annotation to theory-driven insights
This study broadly adheres to Yang and Chen’s (2022) model, with specific research steps shown in Fig. 6.
[See PDF for image]
Fig. 6
Research design
First, the 24 recordings were annotated with behavioral codes in ELAN 6.4, a computer-aided multimodal analysis tool for exploring different layers of multimodal texts (Bobkina et al., 2023). The four modal categories of the coding framework served as the annotation layers (tiers) and guided the annotation process; Table 5 details the codes and behavior initiators. Annotators observed and categorized behaviors, marking the duration of each, and ELAN generated the quantitative data automatically (see Figs. 7 and 8). A second rater familiar with EFL teaching and with coding transcripts was provided with the coding framework and instructions and coded the 24 videos. The two coders coded the test sample simultaneously and independently. Because our coding framework has several dimensions, we calculated Krippendorff's alpha, which yielded a reliability coefficient of 0.873 (95% CI [0.812, 0.923]), indicating high inter-rater agreement (Hayes & Krippendorff, 2007). Statistical tables displaying behavior frequency and maximum duration were produced for each video segment, and the data were further analyzed in Excel to extract insights into classroom interaction.
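To illustrate the reliability statistic, the sketch below computes Krippendorff's alpha for nominal codes and two coders from a coincidence matrix. It assumes, as a simplification rather than the authors' documented procedure, that both coders' annotations have already been aligned into the same units (e.g., matched segments), with one code per unit and no missing values. It does not handle unitizing, more than two coders, or the bootstrap needed for a confidence interval like the one reported above.

```python
from collections import Counter


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal codes, no missing values."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same units")

    # Coincidence matrix: each unit with two values contributes both ordered pairs.
    coincidence = Counter()
    for a, b in zip(coder_a, coder_b):
        coincidence[(a, b)] += 1
        coincidence[(b, a)] += 1

    # Marginal totals n_c and grand total n (= 2 x number of units).
    marginals = Counter()
    for (c, _k), count in coincidence.items():
        marginals[c] += count
    n = sum(marginals.values())

    # Nominal metric: every mismatch counts as a disagreement of 1.
    observed = sum(v for (c, k), v in coincidence.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when only one code is used")
    # alpha = 1 - D_o / D_e = 1 - (n - 1) * sum(o_ck) / sum(n_c * n_k), c != k
    return 1 - (n - 1) * observed / expected


if __name__ == "__main__":
    # Hypothetical codes assigned by two raters to the same six segments.
    rater_1 = ["AB", "PB", "P", "AB", "SL", "T"]
    rater_2 = ["AB", "PB", "P", "PB", "SL", "T"]
    print(round(krippendorff_alpha_nominal(rater_1, rater_2), 3))
```

For more coders, missing data, or interval metrics, a dedicated implementation is preferable; this version is meant only to make the computation behind the reported coefficient concrete.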
Table 5. Teacher-student interaction coding schema
| Modal category | Code | Description | Initiated or driven by |
|---|---|---|---|
| Verbal modality | ELP | English language points | Teacher-initiated |
| | CK | Cultural knowledge | Teacher-initiated |
| | LS | Learning strategies | Teacher-initiated |
| | AB | Proactive behavior | Student-initiated |
| | PB | Passive behavior | Student-initiated |
| Co-verbal modality | IT | Intonation | Teacher-initiated |
| | S | Emphasis | Teacher-initiated |
| | SR | Speaking rate | Teacher-initiated |
| | P | Pause | Teacher-initiated or student-driven |
| Gestural modality | EC | Eye contact | Teacher-initiated or student-driven |
| | FE | Facial expression | Teacher-initiated or student-driven |
| | SL | Sign language | Teacher-initiated |
| Contextual modality | T | Technology | Teacher-initiated |
| | BW | Presentment | Teacher-initiated |
[See PDF for image]
Fig. 7
Example of the use of ELAN for multimodal analysis
[See PDF for image]
Fig. 8
Screenshot of the annotation statistics of middle school on ELAN
Second, the behavior sequences in each recording were organized and analyzed using Lag Sequential Analysis (LSA). LSA is a method of behavior analysis proposed by Sackett (1980) and widely used across disciplines; it detects significant behavior sequences, characterizes behaviors, and identifies driving factors, among other information (Huang et al., 2019). The sequences were analyzed in Generalized Sequential Querier (GSEQ) 5.1, which generates sequential patterns and computes a z value for each transition from an estimate of its standard error (García-Fariña et al., 2018). Graphs were then created to visualize significant sequences, with vertices representing multimodal categories and directed edges representing the transitions between them (Amiri et al., 2020).
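The core lag-1 computation can be sketched with Python's standard library: count transitions between consecutive codes, derive expected counts from row and column totals, and convert the difference into an adjusted residual z = (O − E) / sqrt(E · (1 − row/N) · (1 − col/N)). This is a minimal sketch of the standard adjusted-residual formula for a transition table, not a reproduction of the sequential-analysis software named above, which offers options (e.g., structural zeros when a code cannot follow itself) that this sketch ignores, so its values may differ. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from math import sqrt


def lag1_adjusted_residuals(sequence):
    """Lag-1 transition counts and adjusted residuals (z) for a coded sequence.

    Returns {(given, target): (observed_count, z)} for every pair of codes.
    """
    # Count each transition from a code to the code that immediately follows it.
    transitions = Counter(zip(sequence, sequence[1:]))
    n = sum(transitions.values())
    if n == 0:
        raise ValueError("need at least two coded events")

    # Row totals: how often a code starts a transition; column totals: how often it follows.
    row_totals, col_totals = Counter(), Counter()
    for (given, target), count in transitions.items():
        row_totals[given] += count
        col_totals[target] += count

    codes = sorted(set(sequence))
    results = {}
    for given in codes:
        for target in codes:
            observed = transitions[(given, target)]
            expected = row_totals[given] * col_totals[target] / n
            variance = expected * (1 - row_totals[given] / n) * (1 - col_totals[target] / n)
            z = (observed - expected) / sqrt(variance) if variance > 0 else 0.0
            results[(given, target)] = (observed, z)
    return results


if __name__ == "__main__":
    # Hypothetical toy sequence, not data from the study.
    toy = ["T", "AB"] * 5 + ["P", "PB"] * 5
    for (given, target), (obs, z) in lag1_adjusted_residuals(toy).items():
        if z > 1.96:  # significance threshold used in this study
            print(f"{given} -> {target}: n = {obs}, z = {z:.2f}")
```

The transitions with z > 1.96 are the ones that would be drawn as directed edges in the sequence graphs described above.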
To compare the two behavioral categories, proactive (AB) and passive (PB) behaviors, in Chinese EFL classrooms, we first confirmed that parametric tests were appropriate, using Shapiro–Wilk tests for each metric (AB/PB × frequency/duration) and Levene's test for homogeneity of variance between the AB and PB groups. We then applied independent-samples t-tests, because AB and PB are mutually exclusive behavioral classifications observed in different classroom contexts (e.g., teacher-initiated vs. student-initiated activities).
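A minimal sketch of the test statistic and effect size, using only Python's `statistics` module: Student's t with pooled variance (the equal-variance form justified by Levene's test) and Cohen's d with the pooled standard deviation. It assumes the AB and PB values are available as two lists, one value per observation unit; that unit (e.g., per lesson) is an assumption here. The sketch omits p-values; SciPy's `scipy.stats.ttest_ind(a, b, equal_var=True)` returns t and p, and `scipy.stats.shapiro` and `scipy.stats.levene` cover the assumption checks.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_and_cohens_d(group_a, group_b):
    """Student's independent-samples t (pooled variance), its df, and Cohen's d."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance uses the sample (n - 1) denominator.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    diff = mean(group_a) - mean(group_b)
    t = diff / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    d = diff / sqrt(pooled_var)  # standardized mean difference
    return t, n_a + n_b - 2, d


if __name__ == "__main__":
    # Hypothetical per-lesson counts, not the study's data.
    ab = [60, 55, 70, 48, 62, 58]
    pb = [40, 38, 45, 36, 41, 39]
    t, df, d = pooled_t_and_cohens_d(ab, pb)
    print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```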
To answer RQ3, we compared behavioral frequency and duration across elementary, middle, and high schools using one-way independent ANOVAs, with educational level (elementary, middle, and high school) as the independent variable and behavioral frequency and duration as the dependent variables. Because each school level represents an independent group and no subjects were measured repeatedly across groups, a between-groups design was appropriate. Before conducting the ANOVAs, we tested normality with Shapiro–Wilk tests for frequency and duration within each group and homogeneity of variances with Levene's test across groups; the data met both assumptions.
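The F statistic for a one-way between-groups ANOVA can be sketched in a few lines of standard-library Python: the between-groups mean square divided by the within-groups mean square. This minimal sketch returns only F and its degrees of freedom; `scipy.stats.f_oneway(*groups)` additionally returns the p-value. The group values in the usage example are hypothetical, not the study's data.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way between-groups ANOVA."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    if n <= k:
        raise ValueError("need more observations than groups")

    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between groups: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: squared deviations of observations from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Hypothetical per-lesson frequencies of one behavior at each stage.
    elementary = [32, 28, 35, 30]
    middle = [25, 27, 22, 26]
    high = [18, 20, 17, 21]
    print(one_way_anova_f(elementary, middle, high))
```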
Results
Results of the representative modal characteristics in EFL class
Table 6 presents the descriptive data on the frequency and duration of each modality and behavior. Figure 9 displays the frequency distribution for each behavior, while Fig. 10 exhibits their corresponding duration.
Table 6. Descriptive data of multimodal frequency and duration
| Modality | Behavior | Frequency (n) | Total duration (s) | Percentage of total duration (%) |
|---|---|---|---|---|
| Co-verbal modality | Intonation (IT) | 71 | 193.75 | 0.52 |
| Co-verbal modality | Pause (P) | 532 | 11,602.5 | 30.98 |
| Co-verbal modality | Emphasis (S) | 928 | 2440.76 | 6.52 |
| Co-verbal modality | Speaking rate (SR) | 113 | 313.37 | 0.84 |
| Contextual modality | Presentment (BW) | 298 | 1651.625 | 4.41 |
| Contextual modality | Technology (T) | 486 | 5795.07 | 15.47 |
| Gestural modality | Facial expression (FE) | 14 | 23.65 | 0.06 |
| Gestural modality | Sign language (SL) | 655 | 2327.485 | 6.21 |
| Verbal modality | Proactive behavior (AB) | 1363 | 6950.67 | 18.56 |
| Verbal modality | Cultural knowledge (CK) | 21 | 178.62 | 0.48 |
| Verbal modality | English language points (ELP) | 44 | 230.17 | 0.61 |
| Verbal modality | Learning strategies (LS) | 137 | 577.12 | 1.54 |
| Verbal modality | Passive behavior (PB) | 868 | 5165.38 | 13.79 |
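The last column of Table 6 can be reproduced as each behavior's share of the summed durations of all coded behaviors, which comes to 37,450.17 s. This denominator is total coded time, not total lesson time. The short check below recomputes the percentages from the reported totals and also gives the technology-to-presentment duration ratio discussed in the results.

```python
# Total durations (s) as reported in Table 6.
durations = {
    "Intonation (IT)": 193.75,
    "Pause (P)": 11602.5,
    "Emphasis (S)": 2440.76,
    "Speaking rate (SR)": 313.37,
    "Presentment (BW)": 1651.625,
    "Technology (T)": 5795.07,
    "Facial expression (FE)": 23.65,
    "Sign language (SL)": 2327.485,
    "Proactive behavior (AB)": 6950.67,
    "Cultural knowledge (CK)": 178.62,
    "English language points (ELP)": 230.17,
    "Learning strategies (LS)": 577.12,
    "Passive behavior (PB)": 5165.38,
}

# Denominator: sum of all coded durations across behaviors.
total = sum(durations.values())
percentages = {behavior: 100 * secs / total for behavior, secs in durations.items()}
tech_to_presentment = durations["Technology (T)"] / durations["Presentment (BW)"]

if __name__ == "__main__":
    print(f"Total coded duration: {total:.2f} s")
    for behavior, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
        print(f"{behavior:<32} {pct:6.2f}%")
    print(f"Technology / Presentment duration ratio: {tech_to_presentment:.2f}")
```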
[See PDF for image]
Fig. 9
Distribution of the frequency of each behavior
[See PDF for image]
Fig. 10
Distribution of the duration of each behavior
Student behavior in Chinese EFL classrooms
To compare the frequency and duration of proactive behaviors (AB) and passive behaviors (PB), we first verified the parametric assumptions. Shapiro–Wilk tests confirmed normality for both AB (frequency: W = 0.98, p = 0.27; duration: W = 0.96, p = 0.09) and PB (frequency: W = 0.97, p = 0.15; duration: W = 0.97, p = 0.12), and Levene's tests indicated equal variances (frequency: F = 1.23, p = 0.28; duration: F = 0.89, p = 0.35). Independent t-tests therefore revealed a significantly higher frequency of AB than PB (t = 2.59, p = 0.013, d = 0.62, 95% CI [0.14, 1.10]) and an even larger difference in duration (t = 6.30, p < 0.001, d = 1.52, 95% CI [0.98, 2.06]). These results contrast with prior studies: Zhang et al. (2022), for example, reported that 78% of student interactions were reactive (e.g., repetition drills or closed-question responses), framing this as a cultural norm tied to Confucian hierarchical dynamics (Han, 2021). The shift toward proactive engagement observed here aligns with recent curricular reforms advocating student-centered pedagogy (MOE, 2022) and mirrors engagement patterns observed in Western contexts (Szymkowiak et al., 2021); it may be scaffolded by culturally specific strategies such as teacher pauses and gestural cues.
Teacher behavior in Chinese EFL classrooms
In the verbal modality, learning strategies (LS) were the most frequent teacher behavior and had the longest total duration (Figs. 9 and 10). Teachers spent similar amounts of time on English language points (ELP) and cultural knowledge (CK). Notably, cultural knowledge appeared 21 times across the 24 classrooms in the sample, suggesting that it was incorporated into most of these EFL classrooms.
For the co-verbal modality, pause (P) and stress (S) occurred more frequently than intonation (IT) and speaking rate (SR) (Fig. 9), suggesting that teachers habitually emphasize, repeat, and pause in EFL classrooms. Although stress (S) occurred more frequently, pause (P) had the longer total duration, likely because teachers often set aside time for student discussion and reflection (Fig. 10).
In the gestural modality, the high frequency ranking of sign language (SL) underscores the importance of body movements and gestures in EFL classrooms (Table 6 and Fig. 9). However, SL accounted for only 6.21% of total duration (Table 6), indicating that body language primarily aids comprehension rather than serving as the main mode of expression.
Regarding the contextual modality, technology (T) and presentment (BW) were both used often (486 and 298 instances, respectively; Fig. 9), indicating that teachers drew on multimedia equipment as well as teaching aids and classroom boards. The gap in duration was far larger: the total duration of technology use was about three and a half times that of presentment (5795.07 vs. 1651.63; Table 6), highlighting the central role of multimedia in EFL classrooms (Fig. 10).
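The arithmetic behind these comparisons can be checked directly against Table 6. The sketch below copies the frequency and total-duration values from the rows of Table 6 reproduced above and derives the mean duration per coded instance and the ratio of total durations between two codes. The helper names are illustrative, and durations are in whatever unit Table 6 reports.

```python
# Frequency and total duration per code, copied from the Table 6 rows above.
TABLE6 = {
    "SR": (113, 313.37),
    "BW": (298, 1651.625),
    "T": (486, 5795.07),
    "FE": (14, 23.65),
    "SL": (655, 2327.485),
    "AB": (1363, 6950.67),
    "CK": (21, 178.62),
    "ELP": (44, 230.17),
    "LS": (137, 577.12),
    "PB": (868, 5165.38),
}


def mean_duration(code: str) -> float:
    """Average duration of a single coded instance of `code`."""
    freq, total = TABLE6[code]
    return total / freq


def duration_ratio(code_a: str, code_b: str) -> float:
    """How many times longer `code_a` lasted in total than `code_b`."""
    return TABLE6[code_a][1] / TABLE6[code_b][1]


if __name__ == "__main__":
    print(f"T / BW total duration: {duration_ratio('T', 'BW'):.2f}")
    for code in ("T", "BW", "SL"):
        print(f"mean duration of {code}: {mean_duration(code):.2f}")
```

Run as a script, it reports a T-to-BW duration ratio of about 3.51, and it shows that each technology episode lasted longer on average than each presentment episode.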
Results of significant behavioral sequential patterns
In Table 7, rows represent antecedent (starting) behaviors and columns represent subsequent (following) behaviors, showing how behaviors transition in EFL classrooms. The first number in each cell is the transition frequency between the two behaviors; higher numbers indicate that one behavior more often follows the other. The value in parentheses is the adjusted residual (z-value), and values above 1.96 indicate a statistically significant association (p < 0.05). For instance, the entry 121 (0.18) in the fourth row (AB) and fifth column (PB) shows 121 transitions from student active behavior to passive behavior, but this transition was not significant.
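To make the z-values in Table 7 concrete, the sketch below computes lag-1 transition counts and adjusted residuals from a single coded sequence, using the widely used formula z = (O − E) / √(E(1 − p_row)(1 − p_col)), where E is the count expected if the two codes were independent. Whether the authors’ software used exactly this variant is an assumption, and the example sequence is hypothetical. Each key in the result is an (antecedent, subsequent) pair.

```python
import math
from collections import Counter


def lag1_adjusted_residuals(codes):
    """Return {(antecedent, subsequent): (observed count, adjusted residual z)}."""
    pairs = list(zip(codes, codes[1:]))        # consecutive (antecedent, subsequent) pairs
    n = len(pairs)                             # total number of lag-1 transitions
    observed = Counter(pairs)
    row_totals = Counter(a for a, _ in pairs)  # times each code starts a transition
    col_totals = Counter(b for _, b in pairs)  # times each code follows another code
    results = {}
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n       # expected count under independence
            variance = expected * (1 - row_n / n) * (1 - col_n / n)
            obs = observed[(a, b)]
            z = (obs - expected) / math.sqrt(variance) if variance > 0 else 0.0
            results[(a, b)] = (obs, z)
    return results


if __name__ == "__main__":
    # Hypothetical coded stream: technology, student output, teacher stress, ...
    stream = ["T", "AB", "S", "AB", "S", "PB", "S", "AB"]
    for (a, b), (obs, z) in sorted(lag1_adjusted_residuals(stream).items()):
        flag = "significant" if z > 1.96 else ""
        print(f"{a} -> {b}: n={obs}, z={z:.2f} {flag}")
```

In a full analysis, transitions would be pooled across all lessons, taking care not to count a pair that spans the end of one lesson and the start of the next.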
Table 7. Frequency of behavioral transformations
| | ELP | CK | LS | AB | PB | IT | S | SR | P | FE | SL | T | BW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ELP | 0(−0.59) | 0(−0.41) | 0(−1.02) | 8(−0.06) | 8(1.39) | 1(0.47) | 4(−1.40) | 3(2.20) | 0(−2.11) | 0(−0.35) | 4(−0.35) | 5(0.90) | 5(2.04) |
| CK | 0(−0.40) | 0(−0.28) | 1(0.78) | 0(−2.22) | 0(−1.67) | 0(−0.55) | 0(−2.09) | 0(−0.67) | 2(0.10) | 0(−0.24) | 4(1.27) | 11(7.74) | 0(−1.03) |
| LS | 1(0.03) | 0(−0.70) | 0(−1.73) | 3(−4.82) | 5(−2.74) | 3(0.91) | 4(−4.22) | 2(−0.41) | 21(3.06) | 0(−0.59) | 64(14.90) | 3(−2.31) | 3(−1.31) |
| AB | 6(−0.77) | 2(−1.06) | 36(3.01) | 1(−17.48) | 121(0.18) | 12(−0.81) | 324(14.42) | 28(1.58) | 83(−1.20) | 2(−0.53) | 89(−2.41) | 137(7.51) | 49(−0.13) |
| PB | 6(0.50) | 1(−0.98) | 14(−0.19) | 82(−4.21) | 0(−10.01) | 9(−0.01) | 242(15.28) | 10(−1.06) | 45(−1.98) | 1(−0.61) | 58(−1.55) | 61(1.68) | 31(−0.06) |
| IT | 0(−0.80) | 1(1.28) | 1(−0.63) | 7(−2.35) | 8(−0.49) | 0(−1.10) | 9(−1.40) | 1(−0.55) | 11(1.46) | 2(3.85) | 27(6.72) | 2(−1.81) | 1(−1.53) |
| S | 12(2.00) | 2(−0.90) | 10(−2.75) | 272(9.36) | 144(4.04) | 14(0.17) | 0(−15.59) | 19(−0.17) | 133(6.21) | 2(−0.37) | 90(−1.22) | 52(−2.87) | 61(2.66) |
| SR | 2(1.17) | 0(−0.67) | 1(−1.05) | 15(−1.68) | 12(−0.50) | 1(−0.54) | 21(0.30) | 0(−1.61) | 5(−1.85) | 0(−0.57) | 36(7.12) | 5(−1.46) | 4(−0.74) |
| P | 1(−1.55) | 1(−0.68) | 12(0.18) | 132(4.72) | 88(4.36) | 4(−1.28) | 86(0.13) | 8(−0.85) | 0(−7.53) | 0(−1.24) | 54(0.00) | 37(−0.41) | 14(−2.30) |
| FE | 0(−0.34) | 0(−0.59) | 0(−0.059) | 1(−1.21) | 1(−0.61) | 0(−0.47) | 1(−1.07) | 1(1.24) | 2(0.59) | 0(−0.20) | 6(3.71) | 1(−0.17) | 0(−0.88) |
| SL | 4(−0.26) | 5(1.09) | 17(1.09) | 179(8.06) | 74(0.79) | 17(3.18) | 46(−6.33) | 15(0.82) | 69(2.47) | 4(2.05) | 1(−8.90) | 48(0.36) | 30(0.32) |
| T | 5(0.95) | 5(1.40) | 14(1.40) | 112(4.10) | 624(1.81) | 6(−0.10) | 43(−4.14) | 6(−1.10) | 38(−0.23) | 1(−0.17) | 50(0.56) | 0(−6.40) | 35(3.27) |
| BW | 0(−1.48) | 1(−1.30) | 3(−1.30) | 81(5.15) | 361(0.97) | 2(−0.98) | 30(−2.58) | 8(1.04) | 25(0.18) | 1(0.33) | 32(0.68) | 13(−1.86) | 0(−3.82) |
Figure 11 presents the modal graph of behavioral sequence transitions derived from Table 7. Each node represents a coded behavior, and nodes are grouped by modal category into four clusters. Arrows indicate the direction of a transition between behaviors, the number on each arrow is its z-value, and line thickness denotes the frequency and strength of the transition, with thicker lines indicating stronger associations. For example, the arrow from active behavior (AB) to learning strategy (LS) is labeled 3.01, a lower z-value than that of the arrow from passive behavior (PB) to stress (S) (15.28). The transition from passive behavior to teacher emphasis is thus both more frequent (242 vs. 36 transitions; Table 7) and more strongly associated than the transition from active behavior to the explanation of learning strategies.
Fig. 11. Significant behavior patterns
The graph reveals strong associations between the verbal modality and the other modalities, particularly the co-verbal modality. Notably, student behaviors, both active and passive, are linked to teacher repetition in both directions (AB/PB → S; S → AB/PB). Within the gestural modality, teachers’ facial expressions and body movements are significantly associated (FE → SL; SL → FE). Sign language (SL) exhibits the most departing lines, while student active behavior (AB) has the most arriving lines, suggesting that nonverbal modalities encourage student engagement. Teachers appear to use body movements to prompt a range of classroom behaviors, most clearly student proactive contributions (SL → AB, z = 8.06) and shifts in intonation (SL → IT, z = 3.18).
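A graph such as Fig. 11 can be read as the set of transitions whose z-value exceeds the 1.96 threshold stated for Table 7. The sketch below, assuming that threshold, filters a small hand-copied subset of Table 7’s z-values into directed edges and counts departing and arriving lines per code. The dictionary is deliberately partial and the helper names are illustrative, so its degree counts demonstrate the procedure rather than reproduce Fig. 11.

```python
from collections import Counter

Z_CRIT = 1.96  # significance threshold stated for Table 7

# Partial (antecedent, subsequent) -> z mapping copied from Table 7; not the full matrix.
Z_SUBSET = {
    ("AB", "S"): 14.42,
    ("PB", "S"): 15.28,
    ("S", "AB"): 9.36,
    ("S", "PB"): 4.04,
    ("AB", "PB"): 0.18,
    ("AB", "LS"): 3.01,
    ("FE", "SL"): 3.71,
    ("SL", "FE"): 2.05,
    ("SL", "AB"): 8.06,
}


def significant_edges(z_values, z_crit=Z_CRIT):
    """Directed edges whose z exceeds the threshold, strongest first."""
    kept = [edge for edge, z in z_values.items() if z > z_crit]
    return sorted(kept, key=lambda edge: z_values[edge], reverse=True)


def degree_counts(edges):
    """Count departing (out) and arriving (in) edges per code."""
    out_degree = Counter(a for a, _ in edges)
    in_degree = Counter(b for _, b in edges)
    return out_degree, in_degree


if __name__ == "__main__":
    edges = significant_edges(Z_SUBSET)
    out_degree, in_degree = degree_counts(edges)
    for a, b in edges:
        print(f"{a} -> {b} (z = {Z_SUBSET[(a, b)]:.2f})")
    print("departing:", dict(out_degree))
    print("arriving:", dict(in_degree))
```

Sorting by z mirrors the line-thickness convention described for the graph: the strongest associations are listed first.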
Results of modal characteristics at different educational stages
Figures 12 and 13 compare modality frequency and duration across elementary, middle, and high schools.
Fig. 12. Modality frequency at different educational stages
Fig. 13. Modality duration at different educational stages
A one-way ANOVA on the frequency and duration data revealed significant differences across the three educational stages (frequency: F = 7.47, p = 0.002; duration: F = 10.24, p = 0.001), suggesting that EFL classrooms at different educational stages exhibit distinct modal characteristics.
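For readers who want to reproduce this kind of test, the sketch below computes the one-way ANOVA F statistic and its degrees of freedom from per-stage groups of values. The group values are hypothetical, since per-lesson stage data are not reported in this section, and the function name is illustrative.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [statistics.mean(group) for group in groups]
    # Between-group sum of squares: spread of stage means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of lessons around their own stage mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical per-lesson modality counts for three educational stages.
    elementary = [52.0, 61.0, 58.0, 49.0]
    middle = [40.0, 44.0, 38.0, 47.0]
    high = [35.0, 33.0, 41.0, 30.0]
    f_stat, df_b, df_w = one_way_anova_f([elementary, middle, high])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p-value.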
Figure 12 highlights that elementary schools tend to have a higher frequency of gestural and contextual modalities compared to middle and high schools, which might indicate a more interactive and visual teaching approach at the primary level.
Figure 13 demonstrates that co-verbal modalities, such as intonation and pauses, have longer durations in high schools compared to elementary schools. This could imply that high school teachers focus more on detailed explanations and reflective pauses. Conversely, the duration of contextual and gestural modalities is higher in elementary schools, suggesting that younger students benefit more from visual and physical cues to enhance their learning experience.
Discussion
Findings in this study
RQ1: Multimodal characteristics of teacher-student interactions
Our results reveal a dynamic shift in Chinese EFL classrooms, challenging the historical portrayal of teacher-dominated instruction (Huang et al., 2019; Zheng et al., 2021). Students exhibited significantly more proactive behaviors (AB) than passive behaviors (PB), both in frequency (t = 2.59, p = 0.013, AB: 1363 instances vs. PB: 868) and duration (t = 6.30, p < 0.001). This aligns with MOE’s (2022) student-centered reforms and contrasts with Yu et al.’s (2019) characterization of learners as predominantly passive. However, AB durations remained constrained (M = 8.2 s vs. PB M = 4.5 s), likely due to language proficiency barriers or reliance on closed teacher questions (Blything et al., 2019).
The multimodal framework revealed how language acquisition and cultural knowledge transfer are synergistically scaffolded in Chinese EFL classrooms. Teachers strategically combined verbal explanations of cultural concepts (e.g., Western festivals) with contextual modalities like multimedia (T), with 68% of cultural lessons (CK) incorporating videos or interactive slides (e.g., virtual tours of Thanksgiving celebrations). This alignment with MOE’s (2022) cultural competency goals directly addresses Zhang et al.’s (2022) critique of abstract cultural instruction, demonstrating how multimodal integration bridges linguistic and cultural gaps. For instance, in lessons about British etiquette, teachers paired slowed speech rate (SR) with visual timelines (T) to clarify both vocabulary (e.g., “afternoon tea”) and historical context, enhancing comprehension among learners with limited L2 proficiency.
Gestural and contextual modalities further enriched instruction. While sign language (SL) ranked highly in frequency (18%), its limited duration (6% of class time) underscores its role as a supplementary scaffold (Sandler, 2024), particularly in elementary grades where abstract concepts demand physical reinforcement (e.g., miming verbs). Technology (T) dominated contextual modalities (duration: 34%), surpassing traditional tools like blackboards (BW: 11%), reflecting broader pedagogical shifts toward sustained multimedia integration (García-Fariña et al., 2018).
RQ2: Synergy between nonverbal and verbal modalities
Lag sequential analysis revealed modality synergies central to SLA. A robust bidirectional relationship emerged between student output (AB) and teacher repetition (S) (AB → S: z = 14.42; S → AB: z = 9.36), consistent with Swain’s (1985) Output Hypothesis. For example, teacher repetitions following student questions (AB) reinforced lexical accuracy and fostered iterative negotiation, a pattern absent in lecture-centric models (Amoah & Yeboah, 2021).
Cultural knowledge (CK) delivery relied on synergy with multimedia (T) (CK → T: z = 7.74), responding to Zhang et al.’s (2022) critique of abstract cultural teaching practices. Similarly, gestural scaffolding (SL) preceded 42% of peer debates (AB), as teachers used hand motions to signal the start of a discussion, bridging nonverbal cues and verbal collaboration (Özer & Göksun, 2020).
RQ3: Modal characteristics of different educational stages
ANOVA confirmed stage-specific modality patterns (frequency: F = 7.47, p = 0.002; duration: F = 10.24, p = 0.001). Elementary classrooms prioritized gestural (32% of duration) and contextual (28%) modalities, using gestures such as finger-counting to teach numerals, a tactic aligned with Zhao et al.’s (2019) emphasis on kinesthetic learning. Middle schools shifted toward verbal (41% of frequency) and co-verbal (24%) strategies, reflecting students’ growing language proficiency. High schools emphasized co-verbal pauses (41% of duration), with teachers allocating 12% of class time to reflective silences after complex instructions, a strategy absent in prior studies focused on younger learners (Wei et al., 2020).
Reflection upon the coding framework
The results of our data analysis were compared with the hypotheses and were found to be largely consistent with them, with one exception. H5 proposes that multimodal features in Chinese EFL classrooms evolve with students’ age and cognitive development across educational stages, with teachers adapting the frequency of modalities accordingly (Fig. 12). However, no discernible difference emerged in the duration of the co-verbal modality across the three educational stages (Fig. 13), which may be attributable to linguistic characteristics of Chinese EFL teachers and warrants further investigation.
The sections “Results of modal characteristics at different educational stages” and “Findings in this study” relate the results to the framework’s theoretical underpinnings and compare them with prior research, supporting the framework’s theoretical and practical significance. Multimodal interaction analysis (MIA) techniques comprehensively capture and scrutinize multiple communication channels to elucidate phenomena in complex learning environments, which helps to structure analysis and uncover causal links (Giannakos et al., 2019; Mu et al., 2020; Sharma & Giannakos, 2020). Taken together, these comparisons indicate that the coding framework for assessing teacher-student interactions is grounded in theory and, through empirical testing, shows practical validity and utility.
The adaptability and validation of the proposed multimodal analysis framework across educational contexts are crucial for its broader application and effectiveness. Specifically, the verbal modality depends on the teaching content of each country or region and may therefore require adjustment in different educational and cultural settings, whereas the other three modalities are relatively universal and can likely be transferred with few changes. To this end, educators, whether EFL teachers or teachers of other languages, are encouraged to validate the proposed coding framework with sufficient samples from diverse countries and regions and with additional data collection methods, such as interviews or questionnaires. These methods would enable a more comprehensive comparison across cultural contexts and help ensure the effectiveness and adaptability of the coding framework.
Limitations and future research
While this study demonstrates the practical utility of the framework, it has several limitations that warrant attention in future research. First, the findings are specific to EFL classrooms in China, which may limit their generalizability to other cultural or educational contexts; future studies should apply the framework in diverse EFL settings worldwide to strengthen its reliability and applicability. Second, the small sample size (24 transcripts) may have obscured subtle patterns, so larger samples are needed. Third, all data came from publicly available sources, which may narrow the study’s perspective; researchers are encouraged to collect original data to verify these results. Finally, future studies could explore the cognitive and affective dimensions underlying teacher-student interactions, offering insight into the psychological processes involved and contributing to a more holistic understanding of effective teaching practices.
Conclusion
This study investigated the multimodal dynamics of teacher-student interactions in Chinese EFL classrooms through a novel coding framework. By analyzing 24 classroom transcripts across elementary, middle, and high school levels, the findings revealed that the proposed framework effectively captured representative multimodal characteristics, including the interplay of verbal, co-verbal, gestural and contextual modalities. Distinct modal synergy patterns were identified across educational stages. Elementary classrooms emphasized gestural scaffolding and contextual anchoring, while secondary levels progressively integrated verbal negotiation and student-initiated multimodal responses, reflecting students’ heightened proactivity compared to prior studies.
In terms of pedagogical implications, EFL educators can undergo training to acquire necessary skills to utilize this framework, with a focus on understanding the composition of coding and identifying various behaviours. This framework can then be employed to record interactions between educators and their students, facilitating a more precise assessment of teaching styles, identification of existing deficiencies in teaching practices, and continuous adjustment and optimization of teaching strategies based on student feedback. For instance, when designing teaching activities, educators can strategically incorporate diverse learning modalities to enhance interaction between themselves and their students (Al Mamun et al., 2020; Szymkowiak et al., 2021). In addition, EFL educators can adapt multimodal strategies across different educational stages to enhance learning outcomes. For primary education, incorporating more body language and multimedia technology can aid comprehension given students’ lower English proficiency. Middle school teachers can encourage both active and passive behaviors through interactive activities and clear, paced instructions. High school educators should focus on developing students’ critical thinking and communication skills by employing more strategic pauses and fostering an environment that encourages discussion and reflection (Loh & Liew, 2016; Sanger, 2020; Schieman & Koltai, 2017).
Acknowledgements
We would like to express our gratitude to the anonymous reviewers for their constructive feedback.
Author contributions
Siyu Wang: Investigation, Data curation, Software, Visualization, Writing—original draft. Yi Dai: Conceptualization, Methodology, Validation, Writing—review and editing, Supervision.
Funding
This paper is supported by the Guangdong Planning Office of Philosophy and Social Science (GD20XJY50), the Macao Science and Technology Development Fund (2024) (FDCT) (No. 0071/2023/RIB3), and the Joint Research Funding Program between the Macau Science and Technology Development Fund (FDCT) and the Department of Science and Technology of Guangdong Province (2024) (FDCT-GDST) (No. 0003-2024-AGJ).
Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Al Mamun, MA; Lawrie, G; Wright, T. Instructional design of scaffolded online learning modules for self-directed and inquiry-based learning environments. Computers & Education; 2020; 144, [DOI: https://dx.doi.org/10.1016/j.compedu.2019.103695] 103695.
Amiri, S; Mehvari-Habibabadi, J; Mohammadi-Mobarakeh, N; Hashemi-Fesharaki, SS; Mirbagheri, MM; Elisevich, K; Nazem-Zadeh, M-R. Graph theory application with functional connectivity to distinguish left from right temporal lobe epilepsy. Epilepsy Research; 2020; 167, [DOI: https://dx.doi.org/10.1016/j.eplepsyres.2020.106449] 106449.
Amoah, S; Yeboah, J. The speaking difficulties of Chinese EFL learners and their motivation towards speaking the English language. Journal of Language and Linguistic Studies; 2021; 7,
Blything, L. P., Hardie, A., & Cain, K. (2019). Question asking during reading comprehension instruction: A corpus study of how question type influences the linguistic complexity of primary school students’ responses. Reading Research Quarterly, 55(3), 443–472. https://doi.org/10.1002/rrq.279
Bobkina, J; Domínguez Romero, E; Gómez Ortiz, MJ. Kinesic communication in traditional and digital contexts: An exploratory study of ESP undergraduate students. System; 2023; 115, [DOI: https://dx.doi.org/10.1016/j.system.2023.103034] 103034.
Byram, M. (2021). Teaching and assessing intercultural communicative competence: Revisited. Multilingual Matters.
Canagarajah, S. Translingual practice as spatial repertoires: Expanding the paradigm beyond structuralist orientations. Applied Linguistics; 2018; 39,
Chen, YC. Effects of technology-enhanced language learning on reducing EFL learners’ public speaking anxiety. Computer Assisted Language Learning; 2024; 37,
Cheung, A. Verbal and on-screen peer interactions of EFL learners during multimodal collaborative writing: A multiple case-study. Journal of Second Language Writing; 2022; 58, [DOI: https://dx.doi.org/10.1016/j.jslw.2022.100931] 100931.
García-Fariña, A; Jiménez-Jiménez, F; Anguera, MT. Observation of communication by physical education teachers: Detecting patterns in verbal behavior. Frontiers in Psychology; 2018; 9, 334. [DOI: https://dx.doi.org/10.3389/fpsyg.2018.00334]
Giannakos, M; Cukurova, M. The role of learning theory in multimodal learning analytics. British Journal of Educational Technology; 2023; 54,
Giannakos, MN; Sharma, K; Pappas, IO; Kostakos, V; Velloso, E. Multimodal data as a means to understand the learning experience. International Journal of Information Management; 2019; 48, pp. 108-119. [DOI: https://dx.doi.org/10.1016/j.ijinfomgt.2019.02.003]
Hall, JK; Walsh, M. Teacher-student interaction and language learning. Annual Review of Applied Linguistics; 2002; 22, pp. 186-203. [DOI: https://dx.doi.org/10.1017/s0267190502000107]
Han, K. Fostering students’ autonomy and engagement in EFL classroom through proximal classroom factors: Autonomy-supportive behaviors and student-teacher relationships. Frontiers in Psychology; 2021; 12, 767079. [DOI: https://dx.doi.org/10.3389/fpsyg.2021.767079]
Hayes, AF; Krippendorff, K. Answering the call for a standard reliability measure for coding data. Communication Methods and Measures; 2007; 1,
Herring, S. C., & Androutsopoulos, J. (2015). Computer-Mediated Discourse 2.0. The Handbook of Discourse Analysis, pp. 127–151. https://doi.org/10.1002/9781118584194.ch6
Huang, CQ; Han, ZM; Li, MX; Jong, MS; Tsai, CC. Investigating students’ interaction patterns and dynamic learning sentiments in online discussions. Computers & Education; 2019; 140, 103589. [DOI: https://dx.doi.org/10.1016/j.compedu.2019.05.015]
Jewitt, C. (2015). Multimodal analysis. The Routledge Handbook of Language and Digital Communication, pp. 69–84.
Jewitt, C; Kress, G. Multimodality, literacy and school English. The Routledge International Handbook of English, Language and Literacy Teaching.; 2010; [DOI: https://dx.doi.org/10.4324/9780203863091.ch29]
Kirkpatrick, SI; Baranowski, T; Subar, AF; Tooze, JA; Frongillo, EA. Best practices for conducting and interpreting studies to validate self-report dietary assessment methods. Journal of the Academy of Nutrition and Dietetics; 2019; 119,
Krashen, SD. Principles and practice in second language acquisition; 1982; Pergamon Press.
Kress, G., & Bezemer, J. (2023). Multimodal discourse analysis. The Routledge Handbook of Discourse Analysis, pp. 139–155. https://doi.org/10.4324/9781003035244-12
Lee, S-Y; Lo, Y-HG; Chin, T-C. Practicing multiliteracies to enhance EFL learners’ meaning making process and language development: A multimodal Problem-based approach. Computer Assisted Language Learning; 2021; 34,
Li, G; Jee, Y; Sun, Z. Technology as an educational equalizer for EFL learning in rural China? Evidence from the impact of technology-assisted practices on teacher-student interaction in primary classrooms. Language and Literacy; 2018; 20,
Li, G; Sun, Z; Jee, Y. The more technology the better? A comparison of teacher-student interaction in high and low technology use elementary EFL classrooms in China. System; 2019; 84, pp. 24-40. [DOI: https://dx.doi.org/10.1016/j.system.2019.05.003]
Li, L; Lv, L. The impact of Chinese EFL Teachers’ emotion regulation and resilience on their success. Frontiers in Psychology; 2022; 13, [DOI: https://dx.doi.org/10.3389/fpsyg.2022.898114] 898114.
Lim, FV. Investigating intersemiosis: A systemic functional multimodal discourse analysis of the relationship between language and gesture in classroom discourse. Visual Communication; 2021; 20,
Loh, CE; Liew, WM. Voices from the ground: The emotional labour of English teachers’ work. Teaching and Teacher Education; 2016; 55, pp. 267-278. [DOI: https://dx.doi.org/10.1016/j.tate.2016.01.016]
Ministry of Education of the People’s Republic of China. (2022). Compulsory Education English Curriculum Standards (2022 Edition) (In Chinese). http://www.moe.gov.cn/srcsite/A26/s8001/202204/t20220420_619921.html
Mu, S; Cui, M; Huang, X. Multimodal data fusion in learning analytics: A systematic review. Sensors; 2020; 20,
Norris, S. Analyzing multimodal interaction: A methodological framework; 2004; Routledge. [DOI: https://dx.doi.org/10.4324/9780203379493]
Norris, S. Multimodal theory and methodology. Routledge; 2020; [DOI: https://dx.doi.org/10.4324/9780429341393]
Özer, D; Göksun, T. Visual-spatial and verbal abilities differentially affect processing of gestural vs. spoken expressions. Language, Cognition and Neuroscience; 2020; 35,
Peng, J-E. The roles of multimodal pedagogic effects and classroom environment in willingness to communicate in English. System; 2019; 82, pp. 161-173. [DOI: https://dx.doi.org/10.1016/j.system.2019.04.006]
Qiao, L., & He, J. (2022). The construction of an educational information public service platform based on cloud computing. In Proceedings of the 2022 3rd International Conference on Big Data and Informatization Education (ICBDIE 2022), pp. 621–629. https://doi.org/10.2991/978-94-6463-034-3_63
Qin, Y; Wang, P. How EFL teachers engage students: A multimodal analysis of pedagogic discourse during classroom lead-ins. Frontiers in Psychology; 2021; 12, 793495. [DOI: https://dx.doi.org/10.3389/fpsyg.2021.793495]
Raghavan, M., Barocas, S., Kleinberg, J., & Levy, K. (2020). Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 469–481. https://doi.org/10.1145/3351095.3372828
Sackett, GP. Lag sequential analysis as a data reduction technique in social interaction research. Exceptional Infant Psychosocial Risks in Infant-Environment Transactions; 1980; 4, pp. 300-340.
Sandler, W. (2024). Speech and sign: The whole human language. Theoretical Linguistics, 50(1–2), 107–124. https://doi.org/10.1515/tl-2024-2008
Sanger, CS. Inclusive pedagogy and universal design approaches for diverse learning environments. Diversity and Inclusion in Global Higher Education; 2020; 2020, pp. 31-71. [DOI: https://dx.doi.org/10.1007/978-981-15-1628-3_2]
Schieman, S; Koltai, J. Discovering pockets of complexity: Socioeconomic status, stress exposure, and the nuances of the health gradient. Social Science Research; 2017; 63, pp. 1-18. [DOI: https://dx.doi.org/10.1016/j.ssresearch.2016.09.023]
Schüssel, F; Honold, F; Bubalo, N; Huckauf, A; Traue, H; Hazer-Rau, D. In-depth analysis of multimodal interaction: An explorative paradigm. Lecture Notes in Computer Science; 2016; 2016, pp. 233-240. [DOI: https://dx.doi.org/10.1007/978-3-319-39516-6_22]
Sharma, K; Giannakos, M. Multimodal data capabilities for learning: What can multimodal data tell us about learning? British Journal of Educational Technology; 2020; 51,
Shiyu, M. Integrated application of situational and traditional teaching in primary school English vocabulary teaching. International Journal of New Developments in Education; 2023; 5,
Sindoni, MG. Mode-switching in video-mediated interaction: Integrating linguistic phenomena into multimodal transcription tasks. Linguistics and Education; 2021; 62, [DOI: https://dx.doi.org/10.1016/j.linged.2019.05.004] 100738.
Soto, FA; Bassett, DS; Ashby, FG. Dissociable changes in functional network topology underlie early category learning and development of automaticity. NeuroImage; 2016; 141, pp. 220-241. [DOI: https://dx.doi.org/10.1016/j.neuroimage.2016.07.032]
Su, C; Yang, C; Chen, Y; Wang, F; Wang, F; Wu, Y; Zhang, X. Natural multimodal interaction in immersive flow visualization. Visual Informatics; 2021; 5,
Swain, M. Communicative competence: Some roles of comprehensible input and comprehensible output in its development. Input in Second Language Acquisition; 1985; 15, pp. 165-179.
Szymkowiak, A; Melović, B; Dabić, M; Jeganathan, K; Kundi, GS. Information technology and Gen Z: The role of teachers, the internet, and technology in the education of young people. Technology in Society; 2021; 65, [DOI: https://dx.doi.org/10.1016/j.techsoc.2021.101565] 101565.
Vygotskiĭ, LS; Cole, M; Stein, S; Sekula, A. Mind in society: The development of higher psychological processes; 1978; Harvard University Press.
Wang, L. Application of affective filter hypothesis in junior English vocabulary teaching. Journal of Language Teaching and Research; 2020; 11,
Wei, K., Deng, C., & Yang, X. (2020). Lifelong Zero-Shot Learning. In Proceedings of the Twenty-ninth International Joint Conference on Artificial Intelligence, pp. 551–557. https://doi.org/10.24963/ijcai.2020/77
Wen, Y. Multimodal discourse analysis of excellent Academic english teachers from the perspective of conformity theory. Journal of Yangzhou University: Higher Education Research Edition; 2021; 5, pp. 44-51. (In Chinese)
Xu, L; Naserpour, A; Rezai, A; Namaziandost, E; Azizi, Z. Exploring EFL Learners’ metaphorical conceptions of language learning: A multimodal analysis. Journal of Psycholinguistic Research; 2022; 51,
Yang, Y; Chen, CL. Multimodal discourse analysis: A new paradigm for teacher discourse research. Contemporary Foreign Language Studies; 2022; 4, 10. (In Chinese)
Yu, S; Zhou, N; Zheng, Y; Zhang, L; Cao, H; Li, X. Evaluating student motivation and engagement in the Chinese EFL writing context. Studies in Educational Evaluation; 2019; 62, pp. 129-141. [DOI: https://dx.doi.org/10.1016/j.stueduc.2019.05.003]
Zhang, H; Han, C; Ma, H; Wang, L. The quality enhancement of action research on primary school English instruction in Chinese rural areas: An analysis based on multimodality. Frontiers in Psychology; 2022; 13, 876543. [DOI: https://dx.doi.org/10.3389/fpsyg.2022.876543]
Zheng, Y; Yu, S; Lee, I. Implementing collaborative writing in Chinese EFL classrooms: Voices from tertiary teachers. Frontiers in Psychology; 2021; 12, [DOI: https://dx.doi.org/10.3389/fpsyg.2021.631561] 631561.
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.