Abstract
Background
Student evaluations of teaching (SET) are widely used in medical education as a tool to improve teaching quality. However, biases in SET can undermine their effectiveness. While numerous studies have explored bias factors in SET within higher education, few have specifically investigated these factors among medical students in China. This study aims to systematically explore the multidimensional causes of bias in Chinese undergraduate medical students’ teaching evaluations.
Methods
A qualitative study was conducted using semi-structured interviews with medical students from a medical university in northern China. Participants were selected through purposive sampling to ensure diversity in gender, academic year, and major. Interviews were transcribed verbatim and analyzed using thematic analysis to identify themes and subthemes related to biases in teaching evaluations.
Results
The analysis revealed several key themes contributing to biases in SET among medical students: (1) Teacher-Student Interaction: High personal affinity of teachers led to positive bias, while strict classroom management and poor teacher-student relationships resulted in negative bias. (2) Aspects Related to Medical Students: Course attributes and personal interest influenced evaluations, with elective courses and low-interest subjects leading to arbitrary bias. Group influences, such as peer effects and conformity, also contributed to bias. (3) Evaluation System Factors: Doubts about the anonymity of the evaluation system and lack of timely feedback led to self-protective scoring behaviors and arbitrary bias. The presence of informal agreements between teachers and students introduced moral hazards that further skewed evaluations.
Conclusions
Biases in medical students’ teaching evaluations are multifaceted and can primarily be attributed to teacher-student interpersonal relationships, student perceptions, and systemic issues within the evaluation process. To enhance the objectivity and effectiveness of SET, it is essential to address these biases by reshaping students’ understanding of evaluations, improving teacher-student communication, and establishing a digital evaluation system that ensures anonymity and timely feedback.
Background
In the late 20th century, constructivist theory led to a shift from teacher-centered to student-centered learning, emphasizing active student participation and transforming global educational practices [1]. The “Student-Centered Learning” (SCL) approach emphasized students’ roles in evaluating teaching quality, contributing to the development of Student Evaluations of Teaching (SET) [2]. SET has been widely adopted in higher education institutions worldwide, becoming an integral part of modern educational systems and a key standard for assessing teaching quality. Reportedly first used in the 1920s, SET saw widespread adoption in the United States between the late 1960s and early 1970s [3]. Today, it is deeply ingrained in the culture of American higher education institutions [4]. Introduced to China around the 1980s, SET was initially developed to evaluate courses and programs but has evolved over time to measure teaching effectiveness, becoming an important tool for assessing teaching quality [5]. SET is often conducted through anonymous online or paper-based questionnaires, typically using rating scales such as the Likert scale, which allow respondents to indicate their level of agreement or disagreement with specific statements [6, 7]. In China, with the continuous advancement of information technology, most universities have adopted online evaluation methods to ensure the effectiveness, convenience, and anonymity of SET; this approach is also common among higher education institutions in many other regions. Chinese higher education institutions began experimenting with SET activities in the 1980s, and with the widespread implementation of teaching quality evaluations and the increased emphasis on quality control in academic management, student evaluations of teaching have gradually become systematized and standardized [8]. Research indicates that since 2000, SET has attracted the attention of Chinese scholars, and the number of publications on Chinese SET has increased rapidly [9]. According to a review study, countries differ in the emphasis of their SET research in both content and field: in content, the United States focuses intensively on bias, whereas Germany and China place greater importance on instructional improvement; in field, the United States and Germany concentrate on medical education, while China places particular emphasis on physical education [9]. Currently, the latest research on SET in China focuses on SET optimization and effectiveness validation [10] and on leveraging information technology to empower SET [11, 12].
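To make the response format concrete, the sketch below models a single Likert-scale SET item as it might appear in an online questionnaire. It is illustrative only; the item wording, anchor labels, and field names are our assumptions, not taken from any specific instrument cited above.

```python
from dataclasses import dataclass

# Conventional 5-point Likert anchors; real instruments vary in wording
# and in the number of scale points.
LIKERT_5 = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly agree",
}

@dataclass
class LikertItem:
    """One statement a student rates on a 5-point agreement scale."""
    prompt: str

    def record(self, score: int) -> str:
        """Validate a response and return its verbal anchor."""
        if score not in LIKERT_5:
            raise ValueError(f"score must be one of {sorted(LIKERT_5)}")
        return LIKERT_5[score]

# Hypothetical item:
item = LikertItem("The instructor explained difficult concepts clearly.")
print(item.record(4))  # -> "Agree"
```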
As SET continues to evolve, an increasing number of studies have highlighted its limitations [13]. Many scholars believe there is still no consensus on whether SET results accurately reflect teaching quality [14]. SET results are often influenced by various bias factors [7, 15], that is, variables unrelated to teaching quality that may have either a positive or negative impact on students’ responses [16], causing SET results to deviate from an objective assessment of teaching quality. Existing research on SET bias has largely focused on factors such as gender [17] and race [18]. Bias can distort SET results away from objective assessment, potentially skewing educational administrators’ perceptions of actual teaching effectiveness. This may lead to misguided management decisions, negatively affecting teachers’ professional development and the conduct of teaching activities, and thereby hindering the improvement of educational quality. Researching SET bias is therefore crucial for improving the evaluation system and enhancing teaching quality.
The phenomenon of evaluation bias is also evident in medical education. Unlike general higher education, a distinguishing feature of medical education is that educators often hold dual roles as both physicians and teachers [19, 20]. Specialized courses in medical education often employ a team-based teaching model, meaning that a single session may be taught collaboratively by multiple educators [21]. This multi-teacher model provides students with diverse perspectives; however, it may also lead to inconsistencies in teaching methods and styles. At the same time, medical students typically face intense pressure from rigorous training, as they are expected to acquire substantial knowledge and develop numerous skills within a short timeframe while also succeeding in demanding internal and national assessments [22]. The unique educational model of medical education therefore means that when evaluating teaching, medical students are influenced not only by their learning environment but also by subjective preferences regarding educators’ personal characteristics, teaching styles, and course structure. These factors can all contribute to biases that cause SET results to deviate. Identifying and mitigating biases in SET within medical education, so as to ensure objective evaluation results and ultimately improve the quality of medical education, is thus a critical topic for current research.
To the best of our knowledge, there is currently a lack of research systematically exploring the biasing factors in Student Evaluations of Teaching (SET) within medical education. Undergraduate medical education is a critical component of the broader medical education continuum. Some scholars have pointed out that further research is needed to improve the evaluation processes in undergraduate medical education [23]. Therefore, this study focuses specifically on investigating the biasing factors in undergraduate medical education assessments, with the aim of identifying and analyzing these biases to enhance the fairness and accuracy of evaluation results.
To achieve this objective, we employ a qualitative research approach, conducting semi-structured interviews with medical students who have experienced evaluation processes during their undergraduate education. We aim to identify the key bias factors contributing to deviations in SET from multiple perspectives. By analyzing medical students’ evaluations of courses, instructors, and the teaching environment, our research will explore biasing factors related to students’ individual perceptions, student-teacher interactions, and broader sociocultural contexts. Our study not only provides deep insights into the biasing factors affecting SET in undergraduate medical education but also serves as a reference for designing more rational and fair evaluation systems in future medical education.
Methods
Research design
This study adopted qualitative research methods, specifically conducting semi-structured interviews with key participants to gain a deep and comprehensive understanding of bias factors in SET [24, 25]. Data were repeatedly compared, analyzed, and conceptualized using thematic analysis to identify underlying factors and patterns that contribute to the observed phenomena.
Study setting and researcher background
This study was conducted at a public medical university in northern China. The university is a well-established institution in the health field, offering a diverse range of programs such as Clinical Medicine, Dentistry, Pharmacy, and Information Management. It is equipped with advanced teaching resources and a dynamic educational environment that fosters academic excellence. This vibrant setting provided an ideal backdrop for exploring bias factors in student evaluations of teaching. The study focused on undergraduate medical students from various academic years, whose diverse experiences enriched our understanding of the phenomenon.
To further enhance the credibility and transparency of our research, we deliberately assembled a multi-stakeholder research team comprising members from university administration, teaching hospital management, faculty, graduate students, and undergraduates. This multidisciplinary team enabled us to critically reflect on our positionality and consider a variety of perspectives during both data collection and analysis.
Development of interview guide
In this study, we first conducted an extensive literature review to summarize the current state of research on bias factors in SET. Based on this review and the needs of the present research, we drafted an interview outline covering topics such as “Perceptions of SET,” “Manifestations of bias behaviors,” and “Causes of behavioral deviation in SET.”
Pilot interviews
Prior to conducting the formal interviews, we carried out pilot interviews to validate the feasibility of the interview outline and procedure. We invited six participants to take part in these pilot interviews. During this process, we discovered that some students struggled to articulate their perspectives on bias, prompting us to include additional supportive questions to elicit deeper, more authentic responses (see Attachment 2 for details). We also refined our questioning approach based on participant feedback. For example, after noting a student’s initial reaction, we would then explore the reasoning behind that response to gain a more nuanced understanding of their thought process. This pilot phase not only helped the research team familiarize themselves with the interview process and analytical methods, but also offered preliminary insights into how respondents might react. To minimize defensiveness and discomfort, we deliberately avoided overly personal or highly sensitive topics, thereby facilitating the collection of more genuine data. In addition, we revised the language of the interview materials to address participant evaluations regarding clarity, completeness, and appropriateness, ensuring that the content would be more accessible to students. Targeted revisions and improvements were made before commencing the formal interviews. Lastly, we summarized the crucial communication techniques and best practices necessary for the formal interviews, laying a strong foundation for the subsequent stages of the research.
Formal interviews
Based on the pilot interviews and incorporating respondents’ suggestions, the outline was revised to finalize the formal interview guide. The interview guide served only to provide basic ideas; during actual interviews, researchers further probed based on respondents’ descriptions. For unclear points during interviews, researchers conducted follow-up interviews or telephone callbacks until clarification was achieved.
Sample selection and data collection
We employed a purposive sampling strategy to ensure sufficient diversity in participants’ academic years, majors, and genders. A total of 32 volunteers from various disciplines (e.g., Clinical Medicine, Dentistry, Radiology) were invited, of whom 26 agreed to participate. Interviews were scheduled according to participants’ availability, and preliminary data analysis was conducted concurrently. After completing interviews with 24 participants, we found that the last two interviews did not yield any new ideas or themes, indicating that the study had reached thematic saturation [26]. Following a team discussion, we decided not to include data from these two interviews in the final analysis. Two additional volunteers who had initially expressed willingness but had not yet been interviewed were informed that saturation had been achieved; with their understanding, no further interviews were conducted. As a result, interviews from the first 22 participants (7 males and 15 females) were included in the formal analysis. Detailed information about the participants is provided in Table 1.
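The saturation judgment described above can be made auditable with a simple tally of newly emerging codes per interview, processed in chronological order. The sketch below is a minimal illustration under assumed data; the participant IDs and code labels are hypothetical stand-ins, not our actual codebook.

```python
from typing import Dict, List

def new_codes_per_interview(coded: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many codes appear for the first time in each interview.

    `coded` maps interview IDs (in chronological order) to the codes
    assigned to that transcript.
    """
    seen = set()          # codes observed in all earlier interviews
    counts: Dict[str, int] = {}
    for interview_id, codes in coded.items():
        counts[interview_id] = len([c for c in codes if c not in seen])
        seen.update(codes)
    return counts

# Hypothetical example: the final interviews contribute no new codes,
# which is the pattern treated here as thematic saturation.
coded_interviews = {
    "P01": ["affinity_bias", "strict_management"],
    "P02": ["affinity_bias", "peer_influence"],
    "P03": ["anonymity_doubt"],
    "P23": ["peer_influence"],   # only previously seen codes
    "P24": [],                   # nothing new
}
print(new_codes_per_interview(coded_interviews))
# e.g. {'P01': 2, 'P02': 1, 'P03': 1, 'P23': 0, 'P24': 0}
```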
All interviews were conducted in Chinese so that participants could comfortably express themselves in their native language, and each session lasted approximately 25–30 minutes. After each interview, the audio recordings were transcribed verbatim, and key excerpts illustrating major themes were selected as representative quotations. A bilingual researcher with relevant academic and professional expertise produced an initial English translation of these excerpts, which was then independently reviewed by another bilingual researcher to ensure fidelity to the original meaning. In cases of ambiguity, the research team referred back to the original Chinese transcripts or consulted the interviewees for clarification. This process aimed to preserve both the authenticity and contextual nuances of participants’ perspectives in the final English text.
Data analysis
We adopted a thematic analysis approach following Braun and Clarke (2006) to identify patterns in the interview data [27]. First, two researchers independently and repeatedly read the verbatim transcripts to familiarize themselves with the content and to generate initial codes based on both the predetermined interview questions and new insights emerging from the data. We then conducted a detailed member checking process during which we shared portions of the coding results and preliminary themes with selected participants to solicit their feedback on the accuracy of these codes and themes. This ensured that our interpretations closely reflected the participants’ true experiences. Subsequently, team members from diverse backgrounds, including representatives from school administration, teaching hospitals, faculty, and students, engaged in extensive discussions to integrate the codes and systematically categorize them into potential themes and subthemes. Any discrepancies between the two independent coders were resolved through team discussions and reflective deliberations until consensus was reached, thereby ensuring consistency and clarity in the coding process. This iterative process of member checking and multi-identity team discussions enhanced the rigor of our data analysis and further validated the effectiveness and credibility of our interpretations. Ultimately, this methodological process resulted in the formation of three overarching themes (see Results).
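To illustrate the code-to-theme collation step described above, the sketch below groups initial codes under candidate themes. The mapping mirrors the three overarching themes reported in the Results, but the code labels themselves are hypothetical examples, not the study’s actual codebook.

```python
from collections import defaultdict
from typing import Dict, List

# Illustrative mapping from initial codes to candidate themes; the theme
# names follow the Results section, the code labels are assumptions.
CODE_TO_THEME = {
    "affinity_bias": "Teacher-student interaction",
    "strict_management": "Teacher-student interaction",
    "informal_agreement": "Teacher-student interaction",
    "elective_casualness": "Factors related to medical students",
    "peer_influence": "Factors related to medical students",
    "anonymity_doubt": "Evaluation system factors",
    "no_feedback": "Evaluation system factors",
}

def group_codes(codes: List[str]) -> Dict[str, List[str]]:
    """Collate coded segments under their candidate themes."""
    themes = defaultdict(list)
    for code in codes:
        themes[CODE_TO_THEME.get(code, "Uncategorized")].append(code)
    return dict(themes)

print(group_codes(["affinity_bias", "anonymity_doubt", "peer_influence"]))
```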
The study was conducted in accordance with ethical guidelines, and informed consent was obtained from all participants. The study was deemed not to constitute biomedical or epidemiological research, and the ethics review was approved by the local Institutional Review Board. All procedures complied with data protection regulations, and the data were anonymized prior to analysis.
Results
This study employed a qualitative approach to examine biases in SET among undergraduate medical students. Data were collected through in-depth interviews with 22 participants. Roughly two-thirds of the participants were female, all had prior experience with SET, and most were from upper-year cohorts, ensuring that we captured rich insights from students with extensive evaluation experience. The participants represented a wide range of academic disciplines, including Clinical Medicine, Pharmaceutics, Dentistry, Radiology, Psychiatry, and Medical Laboratory Science, as well as supporting fields such as Health Services and Management and Information Management and Information Systems. This diverse academic background provides a comprehensive basis for exploring the multifaceted nature of bias in SET within medical education.
Multidimensional causes of bias in medical students’ teaching evaluations
Teacher-student interaction
Teachers with high personal affinity receive higher ratings
Our findings indicate that in medical education, teachers who exhibit strong interpersonal rapport tend to receive higher evaluations, even when their objective teaching quality is not exceptional. Some students consistently give these teachers higher scores. For example, one participant (PH003) stated, “My friends and people around me tend to give high scores to teachers who are friendly and approachable. Such teachers are popular! Even if their teaching quality isn’t that great, they have good rapport with students.”
Different classroom management styles of teachers
Different classroom management styles lead to noticeably divergent evaluation outcomes. Teachers with a laissez-faire management style generally receive higher ratings. As one participant (PS012) explained, “Teachers who don’t mind if we do little things in class are seen as better, so we tend to give them high scores.”
In contrast, teachers who enforce strict classroom management, requiring rigorous adherence to rules, may receive lower ratings from some students. For example, another participant (HM004) noted, “Some teachers are very strict about attendance. If you’re late or use your phone in class, they deduct your attendance points. Some students might deliberately give such teachers low scores.”
Informal high-score agreements
Some informal arrangements exist between teachers and medical students regarding evaluation scores. For instance, some teachers may remind students before or during class to pay attention to evaluations, subtly suggesting that they should assign high marks. In certain cases, especially in elective courses, teachers even establish informal agreements with students to exchange high scores for mutual benefit. As one participant (DE021) explained, “I think one reason is that teachers often mention before class that they hope to receive high scores, and over time, everyone gets used to it… Some teachers even say, ‘At the end of the term, I’ll give you high grades, and you give me high scores.’ Some elective course teachers say this.”
Quality of teacher-student relationships leading to different biases
In medical students’ teaching evaluations, close teacher-student relationships typically result in higher ratings. Students tend to give positive evaluations to those teachers with whom they believe they have built a good rapport, especially when such teachers provide individual guidance and answer questions outside of class. Positive emotional interactions between teachers and students may lead students to overlook the actual quality of teaching, resulting in a positive bias. For example, one participant (HM009) stated, “If a teacher has helped me a lot—even if their lectures aren’t very engaging—but they provide significant help outside of class, I will still give them a very high score.” Conversely, when conflicts or discord occur during the teaching process, such strained or distant relationships often lead students to give lower ratings. As another participant (PH003) explained, “If I hold a teacher in low regard, I might give that course a low score.”
In-group bias among professionals
Our findings indicate that an in-group bias among professionals is evident in medical student evaluations. Specifically, medical students tend to give higher scores to teachers from their own major or college. One participant explained, “Teachers who teach major/specialized courses are usually from our college; we have closer relationships, and everyone takes it more seriously and understands them better” (RA017). Another observed, “There might be a tendency to give higher scores when evaluating teachers from our own major” (HM019).
The authority bias
Medical students also demonstrate authority bias in their evaluations. Some students, concerned that teachers with higher titles or positions might influence their grades, tend to assign them higher scores. As one participant noted, “Because we fear their position or authority, they might influence our grades, so we’re forced to give them high scores. For those with higher positions, like deans or professors, we definitely give high scores” (RA017).
Factors related to medical students
Influence of course attributes on bias in SET
Our study shows that for medical students, elective courses are regarded as less important than major or specialized courses. In interviews, some students explained that because elective courses are not directly related to their professional development, they tend to adopt a more casual attitude toward both learning and evaluation. Consequently, students do not evaluate these courses as objectively, which leads to an arbitrary bias. “Because specialized courses are more important for future employment and life, we’ll be more serious in class and more earnest in evaluations. For other courses that we feel are not useful for our future life, work, or studies, we won’t be as serious and won’t adopt a serious attitude during evaluations. Medical students have many specialized courses, so they might be more relaxed in elective courses and not listen carefully, and they won’t evaluate seriously either.” — PS012.
Influence of course interest on bias in SET
Our findings indicate that medical students’ evaluations are significantly influenced by their interest in the course. Students who are more engaged tend to assign higher ratings. One participant explained, “If I’m interested, I’ll have a good impression of the teacher teaching the course and listen carefully. There’s a psychological effect of ‘love me, love my dog,’ so I might give a higher score. If I’m very interested in a course, I will be more inclined to give it a higher score” (IM018). In contrast, when interest is low, students often assign lower or casual scores, as another participant noted, “If interest is low, everyone might not attend or might take leave, so they might just rate casually” (RA017).
Bias caused by group influence
Our findings indicate that group dynamics influence evaluation bias among medical students. When peers hold negative evaluations of a teacher or course, individual students tend to give lower ratings. For example, one participant stated, “There’s also the persuasion or instigation from classmates around them. Maybe their friends or classmates have a bad impression of the teacher or course, so out of so-called brotherhood or friendship, they give a low score” (PH003).
Defaulting to high scores
Our findings indicate that medical students tend to default to high scores in their teaching evaluations. On one hand, they worry that their evaluations might affect teachers’ career advancement and professional title assessments. Concerned about the potential negative impact on teachers’ future development, students tend to assign higher scores. For example, one participant stated, “Worried that it might impact the teacher’s professional title evaluation, so we give high scores” (DE021).
On the other hand, when students have only vague impressions of certain teachers or cannot recall specific teaching situations, they also tend to give high scores. As one participant noted, “If we don’t know the teacher, we’ll generally give a relatively high score because we don’t really remember. Generally, everyone defaults to giving teachers a high score (default path)” (DE021).
Evaluation system factors
Non-independent evaluation systems
Our results indicate that although the evaluation system claims to be anonymous, it is integrated into the main teaching system where students must log in with their personal information. Consequently, some students question its anonymity, believing that their student IDs and names can reveal their identities. As a result, they exhibit self-protective scoring by giving higher scores to avoid potential risks. One participant stated, “The evaluation requires us to fill in our student ID and name; it’s not private. You have to write your student ID and name, so we’re afraid that teachers will see the evaluation results and cause trouble.” — IM018.
Insufficient feedback
Results indicate that in many cases, students do not receive timely feedback regarding their evaluations, which leads to a lack of motivation and focus when completing them. One participant commented, “Because we don’t receive feedback, we feel it’s not that important. Whether we fill it out well or not doesn’t seem to affect our lives, so we’re not that serious.” — HM004.
Flawed timing and heavy evaluation workload
Our findings indicate that both the short evaluation period and the heavy workload contribute to arbitrary bias in teaching evaluations. Students report that limited time forces them to rush through the evaluation process, while the extensive evaluation tasks due to numerous courses and instructors exacerbate the issue. For example, one participant stated, “They don’t give us enough time; the time is tight, and the system is cumbersome. This makes everyone hurriedly complete the evaluations without being very serious or meticulous, so they might be somewhat casual.” — RA017. Another participant remarked, “The most important thing is the timing of the evaluation. I think it can be postponed—for example, set after the exams, and then we can evaluate.” — DE005.
Students’ lack of understanding of evaluation indicators
In some cases, the scoring standards or evaluation indicators used in the evaluation system are too general or do not align with the actual teaching situation, making it difficult for students to understand their specific meaning and resulting in casual scoring. One participant remarked, “Sometimes the questions are strange and don’t have much to do with the teacher’s teaching.” — IM018.
For more detailed results, please refer to Table 2.
Discussion
To the best of our knowledge, this is among the few exploratory studies to reveal the bias factors that lead to deviations in student evaluations of teaching in medical education.
Gender bias in higher education, and in SET in particular, has been documented by many studies. For instance, Sinclair and Kunda found that students often unconsciously infer a teacher’s ability from gender during evaluations: female teachers are frequently perceived as lacking “traditional leadership characteristics,” such as classroom control and decision-making ability, which lowers their scores [28]. Our findings indicate that medical students are inclined to give higher scores to teachers who display strong personal affinity while overlooking their actual teaching quality [29, 30]. In our study, however, gender did not produce negative bias in Chinese medical students’ evaluations; indeed, some female teachers received positively biased ratings precisely because of their personal affinity. This may be related to China’s collectivist culture. In Western educational systems, an academic authority hierarchy differentiated by gender stereotypes exists. Boring found that although male and female teachers performed equally well in teaching, students often rated female teachers lower due to gender bias [31]. Even when teaching content and classroom management abilities were similar, female teachers still received lower evaluations because of stereotypes such as males being perceived as more authoritative and possessing leadership qualities [31, 32]. In Western evaluation standards, teachers’ leadership and classroom control carry considerable weight, and because gender stereotypes associate these traits with men, female teachers are underestimated in these areas, producing gender-based negative bias reflected in lower ratings. In Chinese culture, by contrast, individual performance is closely linked with collective harmony, a link that is particularly evident in the education system: collectivism emphasizes harmonious interpersonal relationships and mutual support [33]. Accordingly, Chinese medical students do not adopt gendered impressions of academic authority, which might disrupt harmony, as evaluation criteria in SET; instead, they attend more to a teacher’s professional title and position when judging academic authority. As our results also revealed, some medical students exhibited positive rating bias toward teachers of high professional status. This phenomenon not only reflects substantial differences between Chinese and Western educational cultures in teacher evaluation standards but also indicates that future research on student evaluation systems needs to consider cultural background in depth [34]. When designing and improving student evaluation tools, greater consideration should therefore be given to the cultural context of the medical education system to ensure the objectivity, fairness, and cultural adaptability of the evaluation system.
In our study, we also found that classroom management styles and the establishment of teacher-student relationships are important factors leading to biases in medical students’ teaching evaluations. When students receive positive emotions from teachers (such as lenient management styles and close relationships [35]), they may overlook teaching quality and directly give high scores to the teachers. Conversely, strict classroom management styles and conflicting teacher-student relationships may trigger negative emotions in students, which are reflected in lower scores in teaching evaluations [36]. This emotion-based scoring bias indicates that teaching evaluations are not only feedback on teaching behaviors and quality but also reflections of students’ emotional and relational experiences. This poses challenges to the effectiveness and fairness of teaching feedback and may affect teachers’ genuine understanding and improvement of teaching effectiveness. Therefore, reducing the influence of emotional factors in teaching evaluations and enhancing the objectivity and fairness of evaluations are directions that need further exploration. At the same time, it is necessary to enhance mutual understanding between teachers and students and maintain positive communication and comprehension.
It is noteworthy that medical students exhibit behaviors resembling “self-protective evaluations” when completing SET. When facing high-ranking teachers, or when they doubt the anonymity of the evaluation system and perceive risks to their own development, students may default to high scores to mitigate those risks, producing biased evaluations. This behavior stems partly from students’ understanding of teaching evaluations. For instance, studies have found that students most prefer to complete SET after exam results are released; many worry that instructors’ perceptions of submitted evaluations might affect exam difficulty or alter grading curves [36, 37], which can lead students to show positive bias in evaluations to avoid such risks. Insufficient trust in the evaluation system is another important driver of such bias among medical students. When students distrust the anonymity and fairness of the system, they may believe their evaluation results will become known to teachers and adversely affect their academic performance and future development. As a result, they tend to choose safe scoring strategies, such as giving higher scores to avoid potential risks. Finally, this behavior also reflects students’ misperception of the relationship between teachers’ social status and power and the evaluation process. Owing to the particularities of medical education, medical students often need to pursue postgraduate studies, and teachers with higher professional titles usually hold higher academic positions and may significantly influence students’ grade assessments and recommendations for further study. Given their doubts about the evaluation system, and to avoid conflict with these teachers or leaving a negative impression, students are more inclined to give higher scores at the expense of genuine evaluation.
This phenomenon weakens the function of teaching evaluations as a tool for teaching improvement. It also suggests that university management departments need to ensure the anonymity of the evaluation process and the confidentiality of evaluation data, enhancing the transparency of the evaluation work [38]. For example, clearly stating the purpose and feedback mechanism of evaluation results can make students believe that their genuine feedback will not negatively affect themselves [39]. Enhancing students’ trust in the evaluation system can help correct biased behaviors. Simultaneously, students should be made aware of the importance of teaching evaluations, understanding that the true purpose of evaluations is to improve teaching quality.
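One concrete way a platform can back up its anonymity claim is to store only a keyed pseudonym of the student identifier, so evaluations can still be deduplicated without being traceable to individuals by readers of the data. The following is a minimal sketch under assumptions of our own; names such as STUDENT_ID_SALT and store_evaluation are hypothetical, not part of any system described in this study.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice this would live in a key
# store, never in source code.
STUDENT_ID_SALT = b"replace-with-a-long-random-secret"

def pseudonymize(student_id: str) -> str:
    """Return a stable pseudonym usable for 'one evaluation per student per
    course' checks, but not reversible to the student ID without the salt."""
    return hmac.new(STUDENT_ID_SALT, student_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def store_evaluation(student_id: str, course_id: str, score: int,
                     db: dict) -> None:
    """Record a score keyed only by pseudonym, so exported results carry
    no directly identifying information."""
    key = (pseudonymize(student_id), course_id)
    db[key] = score  # overwriting enforces one evaluation per student/course

db: dict = {}
store_evaluation("20210123", "ANAT101", 5, db)
```

Because the pseudonym is stable, the system can still detect duplicate submissions, while teachers viewing results see only scores, which addresses the fear of identification reported by our participants.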
Our study found that the evaluation system could also be adjusted in terms of evaluation timing, indicators, policies, and feedback to reduce biased evaluation behaviors among medical students. One study [40] likewise indicated that it may be inappropriate for students to evaluate all individual teachers at the end of a course, because students may not actually remember, and thus cannot meaningfully comment on, courses delivered by multiple teachers. Our research revealed the same point: such timing can lead medical students, faced with vague impressions and tight schedules, to resort to casual scoring or default high scores. The underlying reason can be explained by Cognitive Load Theory [41]: medical students face a substantial cognitive load when evaluating many courses and teachers, and may lack sufficient time, information, and energy because of assessments and other commitments. To simplify decision-making and reduce cognitive load, they adopt quick scoring strategies. This reminds educational management institutions to allocate more reasonable time for medical students to conduct evaluations.
Under the current medical education management system, our research has also found the existence of informal agreements between elective course teachers and students regarding teaching evaluations. As our study mentioned, elective courses often lead medical students to exhibit arbitrary scoring biases due to the nature of these courses, which may affect the personal development of elective course instructors. Therefore, this group of teachers may, to protect their own interests, engage in informal agreements with students to obtain high evaluations [42]. This behavior reflects the existence of moral hazard. Moral hazard refers to a situation where one party, knowing that their actions may not be fully monitored or that they won’t bear all the consequences, may engage in behavior that benefits themselves but is detrimental to others [43]. In this context, teachers, for the sake of their own promotions and career development, may lower teaching difficulty, relax grading standards, or even reach agreements with students to exchange high grades for high evaluations. This not only violates the original purpose of teaching evaluations but may also harm the quality of education and students’ learning outcomes. Existing studies have also found that some university professors, for their own benefit, adopt lenient grading and reduce course difficulty to receive positive evaluations from students, which in turn helps them secure promotions and tenure [44]. At the same time, informal agreements also involve teachers in unstable positions, who are more inclined to offer simpler courses and give higher grades to achieve high evaluations [18]. Such reciprocal exchange of interests further exacerbates the moral hazard. Students, in turn, tend to give high scores to teachers who award them higher grades, leading to distorted evaluation results that fail to reflect the actual teaching performance of the teachers.
Regarding the evaluation indicator system, our results also revealed that some medical students evaluate subjectively based on course attributes (elective versus major/specialized) and classroom interest, leading to evaluation biases. Some medical students also reported an insufficient understanding of certain evaluation indicators [36]. This suggests that, as medical education evaluation systems develop, the student evaluation indicator system should be adaptively adjusted to courses of different attributes [45]. Dolmans’ research also noted that administrators hold significantly different views of high-quality learning and teaching from students [46], which may make evaluation systems and indicators developed primarily by administrators difficult for students to understand. During development, therefore, student representatives could participate more in the design process, or students could be given greater weight in adjusting the indicators [47, 48]. Attending to the fit between evaluation indicators, students, and courses is crucial.
Furthermore, our research results indicate that medical students show a significantly reduced interest in evaluating teaching effectiveness when timely feedback is lacking. The importance of feedback in higher education has been confirmed by numerous studies, which show that college students need feedback to ensure that their evaluation opinions are seen and valued [49]. In medical education evaluations, emphasizing feedback is crucial. The absence of timely feedback leads to a decrease in students’ enthusiasm to participate in evaluations, while efficient feedback mechanisms can significantly enhance students’ commitment to the evaluation process [45]. This may be because timely feedback not only validates medical students’ contributions, making them feel that their evaluations are meaningful and impactful, but also enhances their motivation to engage actively in the evaluation process. To achieve this goal, medical education institutions can establish more efficient online evaluation systems and real-time data analysis tools through cutting-edge digital technologies. These tools can identify evaluation results through efficient data analysis and provide timely feedback to students, thereby increasing their willingness to participate in evaluations and promoting the provision of more effective course evaluation information [39]. Additionally, these tools can rapidly generate teaching improvement suggestions through data analysis, helping teachers save time and promptly adjust teaching methods and course content. Therefore, in the continuously developing digitalized medical education environment, establishing a digital evaluation system that prioritizes timely feedback is essential for promoting active student participation and ensuring the continuous improvement of medical education programs.
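As a sketch of what such timely feedback could look like, the snippet below aggregates Likert-scale ratings per course and emits a short summary that could be returned to students and teachers as soon as an evaluation window closes. This is an illustrative assumption about system design, not a description of any existing platform; all field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

def summarize(ratings: Iterable[Tuple[str, int]]) -> Dict[str, str]:
    """Aggregate 5-point Likert ratings per course into a one-line summary
    suitable for immediate feedback after the evaluation window closes."""
    by_course = defaultdict(list)
    for course_id, score in ratings:
        by_course[course_id].append(score)
    return {
        course: f"{len(scores)} response(s), mean {mean(scores):.2f}/5"
        for course, scores in by_course.items()
    }

# Illustrative input: (course_id, Likert score) pairs.
print(summarize([("ANAT101", 5), ("ANAT101", 4), ("PHARM201", 3)]))
```

Closing the loop this quickly is what the cited work suggests sustains students’ willingness to evaluate seriously; the same aggregates could feed teaching-improvement reports for instructors.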
Finally, our results showed that group-driven evaluation bias includes peer effects [50]: medical students’ evaluations tend to converge under the influence of their peers’ evaluations. When medical students learn that their peers hold a consistent view of a teacher, they may abandon the goal of objective evaluation in order to stay consistent with the majority, a pattern similar to the herd effect described in behavioral economics [51]. The underlying driver of this conformity is medical students’ need for social adaptation. In evaluation work, group trends should therefore be guarded against so that objective, authentic teaching evaluations can be obtained. The evaluation system should protect students’ independent judgments while correcting improper perceptions among medical students and fostering a correct understanding of evaluations.
Our research results have detailed the different manifestations of bias factors in student evaluations within undergraduate medical education. Currently, many low- and middle-income countries face issues such as insufficient medical human resources. To increase the number of medical talents, improving the quality of their local medical education is an important solution. Student evaluations can greatly help understand the problems existing in medical education. Understanding the factors of evaluation bias and timely adjustments in the design, operation, and maintenance of evaluation systems can fully leverage the true role of student evaluations in medical education.
China is also at the forefront globally in the application of information technology, and most medical education evaluations rely on online network systems. Therefore, we sincerely hope that the bias factors among medical students uncovered in our research and the many challenges we mentioned that are faced in the operation of evaluation systems can provide some reference value for other low- and middle-income countries in constructing and adjusting their evaluation systems.
Limitations and future directions
Due to limitations in research capacity and resources, the participants in this study were drawn exclusively from a single medical university in northern China. Although previous studies suggest that teaching evaluation systems among Chinese students are generally consistent, relying on a single-institution sample may restrict the generalizability of our findings to other contexts. Furthermore, while thematic analysis enables the exploration of complex phenomena, it is inherently susceptible to researcher bias. To mitigate this risk, we engaged in extensive discussions within a multidisciplinary team, performed member checking, and conducted ongoing reflection to continuously evaluate and adjust our assumptions and potential biases, thereby enhancing the rigor of our data interpretation. Nevertheless, since we did not employ formal peer debriefing methods, some interpretative bias may still be present. Moreover, the study’s focus on the Chinese context, which is characterized by distinct educational systems, cultural norms, and evaluation practices, may limit the applicability of our findings to medical education settings in other countries.
Building on our current practice of reflecting on results through multidisciplinary team discussions, future research should further enhance reflexivity by incorporating a systematic external audit mechanism and structured participant feedback. These additional measures are expected to more effectively minimize subjective influences, ensuring that our findings are both transparent and credible and bolstering our capacity to critically evaluate and mitigate researcher bias throughout the study process. Furthermore, relying solely on qualitative data may not fully capture the complexity of student evaluation biases; therefore, integrating quantitative methods (a process we are currently implementing) is essential for comprehensive validation of these issues. Future studies should also broaden their scope by employing mixed-methods approaches to achieve a more nuanced and robust understanding of student evaluation biases or by conducting cohort studies that longitudinally track the evolution of these biases over time, thereby revealing underlying patterns and trends. Ultimately, we anticipate that the findings of this study will not only provide valuable theoretical and practical insights for developing countries with comparable medical education systems but also stimulate increased scholarly interest in this field. Through rigorous cross-national comparative analyses, we aim to elucidate robust patterns that can correct biases in undergraduate student evaluations of teaching, thereby enhancing the quality of medical education, mitigating the shortage of medical professionals in low- and middle-income countries, and ultimately safeguarding public health.
Conclusion
This study represents the first systematic exploration of the causes of biases in teaching evaluations among Chinese undergraduate medical students. Biases in medical students’ teaching evaluations arise from teacher-student relationships, student perceptions, and systemic flaws. Emotional factors such as teacher affinity, lenient classroom management, and informal agreements lead to favorable ratings. Systemic issues such as doubts about anonymity, lack of timely feedback, and poorly understood evaluation indicators undermine objectivity. Group dynamics and concerns about teachers’ career advancement also contribute to inflated ratings, reflecting conformity or self-protective behaviors rather than objective assessment. Addressing these biases is crucial for enhancing the fairness and effectiveness of evaluations and improving medical education quality.
To reduce biased behaviors in medical student evaluations and improve the quality of medical education, it is essential to address multiple aspects. On the student side, it is necessary to reshape medical students’ perceptions of teaching evaluations, finding more effective ways to help students understand the true significance of evaluations. By providing timely feedback on evaluation outcomes, we can enhance medical students’ participation in the evaluation process, significantly reducing their self-protective scoring behaviors. On the teacher side, educational management departments should establish effective communication channels between teachers and students, allowing teachers to personally explain the importance of teaching evaluations to students. On the part of university education management departments, appropriate student evaluation systems should be constructed based on the characteristics of medical education. In the future process of digitalizing medical education, a digital student evaluation system with timely feedback and anonymity as key features can be established. In terms of evaluation systems, adaptive indicator systems for courses with different attributes should be developed. Evaluation timings should be scheduled reasonably according to students’ learning arrangements, and the arrangements for multi-stage and multiple evaluations in medical education should be explored.
Data availability
No datasets were generated or analysed during the current study.
Abbreviations
SET:
Student evaluations of teaching
References
1. RCN M. From content-centred to learning-centred approaches: shifting educational paradigm in higher education. J Educ Adm Hist. 2017;49(1):72–86.
2. Steyn CD, Sambo C. Eliciting student feedback for course development: the application of a qualitative course evaluation tool among business research students. Assess Eval High Educ. 2018;44(1):11–24.
3. Uttl B, White CA, Gonzalez DW. Meta-analysis of faculty’s teaching effectiveness: student evaluation of teaching ratings and student learning are not related. Stud Educ Eval. 2017;54:22–42.
4. Cox SR, RMK, Lowery CM. The student evaluation of teaching: let’s be honest – who is telling the truth? Mark Educ Rev. 2022;32(1):82–93.
5. Uijtdehaage S, O’Neal C. A curious case of the phantom professor: mindless teaching evaluations by medical students. Med Educ. 2015;49(9):928–32.
6. Seldin P, Miller JE, Seldin CA. The teaching portfolio: a practical guide to improved performance and promotion/tenure decisions. San Francisco: Jossey-Bass; 2010.
7. Leamon MH, Fields L. Measuring teaching effectiveness in a pre-clinical multi-instructor course: a case study in the development and application of a brief instructor rating scale. Teach Learn Med. 2005;17(2):119–29.
8. Dunrong B, Fan M. On student evaluation of teaching and improvement of the teaching quality assurance system at higher education institutions. Chin Educ Soc. 2009;42(2):100–15.
9. Pineda P, Steinhardt I. The debate on student evaluations of teaching: global convergence confronts higher education traditions. Teach High Educ. 2020(2).
10. Chen Y. Does students’ evaluation of teaching improve teaching quality? Improvement versus the reversal effect. Assess Eval High Educ. 2023.
11. Niu SL, Liang Y, Abuduklm AJLL, Jin YY, Yan J. The application of instant evaluation based on information technology in anatomy teaching from China. Int J Morphol. 2022;40(4).
12. Wang Q, Li H. Research and analysis of evaluation and expansion of physical education teaching system based on internet of things communication. J Sens. 2022.
13. Marsh HW, Roche LA. Making students’ evaluations of teaching effectiveness effective: the critical issues of validity, bias, and utility. Am Psychol. 1997;52(11):1187–97.
14. Spooren P, Brockx B, Mortelmans D. On the validity of student evaluation of teaching: the state of the art. Rev Educ Res. 2013;83(4):598–642.
15. Mengel F, Sauermann J, Zölitz U. Gender bias in teaching evaluations. J Eur Econ Assoc. 2019;17(2):535–66.
16. Alhija FN-A. Guest editor introduction to the special issue contemporary evaluation of teaching: challenges and promises. Stud Educ Eval. 2017;54:1–3.
17. MacNell L, Driscoll A, Hunt AN. What’s in a name: exposing gender bias in student ratings of teaching. Innov High Educ. 2015;40(4):291–303.
18. Stroebe W. Student evaluations of teaching encourages poor teaching and contributes to grade inflation: a theoretical and empirical analysis. Basic Appl Soc Psychol. 2020;42(4):276–94.
19. Van den Berg BAM, Bakker AB, Ten Cate TJ. Key factors in work engagement and job motivation of teaching faculty at a university medical centre. Perspect Med Educ. 2013;2:264–75.
20. van Bruggen L, ten Cate O, Chen HC. Developing a novel 4-C framework to enhance participation in faculty development. Teach Learn Med. 2020;32(4):371–9.
21. Leamon MH, Fields L. Measuring teaching effectiveness in a pre-clinical multi-instructor course: a case study in the development and application of a brief instructor rating scale. Teach Learn Med. 2005;17(2):119–29.
22. Melnick DE, Dillon GF, Swanson DB. Medical licensing examinations in the United States. J Dent Educ. 2002;66(5):595–9.
23. Schiekirka S, Raupach T. A systematic review of factors influencing student ratings in undergraduate medical education course evaluations. BMC Med Educ. 2015;15.
24. Hawk VH, et al. Student and faculty perceptions of nutrition education in medical school. Clin Nutr ESPEN. 2022;47:351–7.
25. Sohrabi Z, Vanaki Z, et al. Lived experiences of educational leaders in Iranian medical education system: a qualitative study. Glob J Health Sci. 2015;8(7):251.
26. Im D, Pyo J, Lee H, Jung H, Ock M. Qualitative research in healthcare: data analysis. J Prev Med Public Health. 2023;56(2):100–10.
27. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.
28. Sinclair L, Kunda Z. Motivation and the construction of stereotypes: the case of gender. Psychol Bull. 2000;126(5):577–603.
29. Naftulin DH, Ware JE Jr, Donnelly FA. The Doctor Fox lecture: a paradigm of educational seduction. J Med Educ. 1973;48(7):630–5.
30. Clayson D. The student evaluation of teaching and likability: what the evaluations actually measure. Assess Eval High Educ. 2021;47(2):313–26.
31. Boring A. Gender biases in student evaluations of teaching. J Public Econ. 2017;145:27–41.
32. Peterson DAM, Biederman LA, Andersen D, Ditonto TM, Roe K. Mitigating gender bias in student evaluations of teaching. PLoS ONE. 2019;14(5).
33. Zheng YXZ, Wei L, et al. The neural representation of relational- and collective-self: two forms of collectivism. Front Psychol. 2018;9:2624.
34. Centoni M, Maruotti A. The role of cultural background and prior experience in shaping student evaluations of teaching. Educ Meas. 2021;45(3):250–64.
35. Feistauer D, Richter T. How reliable are students’ evaluations of teaching quality? A variance components approach. Assess Eval High Educ. 2017;42(8):1263–79.
36. Almakadma AS, Fawzy NA, Baqal OJ, Kamada S. Perceptions and attitudes of medical students towards student evaluation of teaching: a cross-sectional study. Med Educ Online. 2023;28(1).
37. Li GHG, Wang X, et al. A multivariate generalizability theory approach to college students’ evaluation of teaching. Front Psychol. 2018;9:1065.
38. Afonso NM, Cardozo LJ, Mascarenhas OA, et al. Are anonymous evaluations a better assessment of faculty teaching performance? A comparative analysis of open and anonymous evaluation processes. Fam Med. 2005;37(1):43–7.
39. Schiekirka S, Reinhardt D, Heim S, et al. Student perceptions of evaluation in undergraduate medical education: a qualitative study from one medical school. BMC Med Educ. 2012;12:1–7.
40. Uijtdehaage S, O’Neal C. A curious case of the phantom professor: mindless teaching evaluations by medical students. Med Educ. 2015;49(9):928–32.
41. Israel A, Rosenboim M, Shavit T. Time preference under cognitive load – an experimental study. J Behav Exp Econ. 2021;90.
42. Donaldson JH, Gray M. Systematic review of grading practice: is there evidence of grade inflation? Nurse Educ Pract. 2012;12(2):101–14.
43. Zhang L, Li X. The impact of traditional culture on farmers’ moral hazard behavior in crop production: evidence from China. Sustainability. 2016;8(7):643.
44. Ewing AM. Estimating the impact of relative expected grade on student evaluations of teachers. Econ Educ Rev. 2012;31(1):141–54.
45. Zhang ZM, Wu Q, Zhang XP, Xiong JY, Zhang L, Le H. Barriers to obtaining reliable results from evaluations of teaching quality in undergraduate medical education. BMC Med Educ. 2020;20(1).
46. Dolmans DSRE, Van Berkel HJM, et al. Quality assurance of teaching and learning: enhancing the quality culture. In: Medical education: theory and practice. Edinburgh: Churchill Livingstone Elsevier; 2011. p. 257–64.
47. Zerihun Z, Beishuizen J, Van Os W. Student learning experience as indicator of teaching quality. Educ Assess Eval Account. 2012;24(2):99–111.
48. Zerihun Z, Beishuizen J, Van Os W. Student learning experience as indicator of teaching quality. Educ Assess Eval Account. 2012;24(2):99–111.
49. Kember D, Leung DYP, Kwan KP. Does the use of student feedback questionnaires improve the overall quality of teaching? Assess Eval High Educ. 2002;27(5):411–25.
50. Poldin O, Valeeva D, Yudkevich M. Which peers matter: how social ties affect peer-group effects. Res High Educ. 2016;57(4):448–68.
51. Al-Shaikhli D. The effect of the tracking technology on students’ perceptions of their continuing intention to use a learning management system. Educ Inf Technol. 2023;28(1):343–71.