Background
There have been few comparisons of global and analytic evaluations of fixed prosthodontic procedures. Given the growing number of dental students and the time-consuming nature of assessment, there is a need for a simple but reliable method of evaluation. This study therefore evaluated and compared the inter-rater reliability of global and analytic assessments of preclinical prosthodontic procedures and examined the impact of academic rank on evaluation outcomes.
Methods
Two professors and three assistant professors evaluated five different prosthodontic procedures performed by dental students using two evaluation methods: analytic evaluation using a rubric and global ("glare and grade") evaluation. Inter-examiner reliability was assessed using intraclass correlation coefficients.
Results
Intraclass correlation coefficients ranged from moderate to excellent for both analytic and global evaluations. There were no significant differences between analytic and global evaluations, and there were no significant differences in grading between professors and assistant professors for either approach.
Conclusions
With proper faculty calibration, global evaluation is equivalent to an analytic method of evaluation and is not affected by academic rank. Overall, faculty calibration at the beginning of the academic year appears to have a greater impact on reliability than the choice of evaluation method.
Background
Student evaluations are fundamental to the educational outcomes of teaching, and ensuring that faculty members are calibrated in their assessments is essential for delivering consistent, high-quality dental education [1]. Evaluating large numbers of students can be challenging and time consuming. Two methods have traditionally been used to evaluate dental procedures: the global method ("glare and grade") and the analytic method [2]. Both approaches assess the overall performance of the student's work across several aspects of the work submitted, but in the global method, dentists examine the procedure and award a single overall grade based on subjective assessment, whereas in the analytic method, dentists follow a detailed rubric to grade each aspect of the procedure and then sum these grades into a total. Although some have advocated for analytic methods and reported better reliability with them than with the global approach [3, 4], others have reported no difference between the two evaluation methods [5]. Using a rubric to evaluate students is a proven means of recording accurate scores [6, 7], and it is also a good way to provide structured feedback to students and to identify weaknesses and strengths in teaching to inform curriculum changes [8, 9]. The global method has the advantage of being quick, allowing dentists to spend more time teaching rather than evaluating students, but it limits the feedback available to students on specific deficits and hence hinders self-evaluation. Conversely, analytic evaluation offers both students and staff clear assessment criteria and thus helps students to self-assess their work; however, evaluating each aspect of a dental procedure can be time consuming. Although many preclinical prosthodontic procedures can now be assessed using digital tools such as scanners and dedicated software [10], these expensive tools are not available in every dental school, so there is still a need to evaluate students using traditional methods in the preclinical setting.
Inter- and intra-rater reliability in the preclinical evaluation of prosthodontic procedures has been investigated previously. Shih et al. [11] examined intra- and inter-rater reliability of full metal and metal-ceramic crown preparations using a digital scanner and software, reporting better inter-rater reliability for anterior than posterior teeth and better intra-rater than inter-rater reliability. Another study investigated intra-rater agreement between faculty and digital scanners in evaluating crown preparations performed by undergraduate students in the preclinical setting [12], reporting consistency between evaluations made by faculty and by digital scanners. Conversely, a similar study comparing inter- and intra-grader agreement between a digital scanner with software and traditional evaluation methods found better intra- and inter-grader agreement with the computational approach than with traditional methods [13].
Despite advances in digital dentistry and artificial intelligence, faculty must still regularly evaluate basic dental procedures. Global ("glare and grade") evaluation saves time in the clinical and laboratory settings, but few studies have examined whether the global and analytic approaches are equivalent. Therefore, the aim of this study was to evaluate and compare inter-rater reliability in assessing various preclinical prosthodontic procedures using the global and analytic approaches. The study also compared the grades awarded by faculty members of different academic ranks. The null hypotheses were (i) that there is no significant difference in inter-rater reliability between faculty members of different rank; (ii) that there is no significant difference between the global and analytic methods of evaluation; and (iii) that there is no significant difference between global and analytic evaluations performed by the same individual.
Methods
The Research Ethics Committee of the Faculty of Dentistry at King Abdulaziz University Dental Hospital approved the study (KAUDH; Ref: 4597691) and waived the need for consent.
Five prosthodontists (two professors and three assistant professors) evaluated the same five types of prosthodontic procedure twice, three weeks apart. The procedures were performed by undergraduate dental students and comprised: (i) three different full metal crown preparations; (ii) three different all-ceramic crown preparations; (iii) three different custom-made posts and cores; (iv) three different prefabricated posts and cores; and (v) three different indirect provisional restorations. No other prosthodontic procedures were included.
These procedures were performed in the preclinical setting by fourth-year undergraduate dental students attending King Abdulaziz University Dental Hospital. Two calibration sessions were conducted at the beginning of the school year to familiarize faculty members with the grading rubrics and thereby calibrate them in assessing and evaluating students throughout the year.
The first method of evaluation was a detailed analytic rubric of criteria based on textbooks [14, 15] (see Supplementary File 1 for an example rubric as presented to faculty during the calibration session; Google Forms was used for ease of administration and data gathering). Each aspect of the five prosthodontic procedures was scored out of 10 by each evaluator: eight aspects were scored for full metal crown preparations, eight for all-ceramic crown preparations, nine for custom posts and cores, six for prefabricated posts and cores, and five for indirect provisional restorations. The rubric was explained in detail to the examiners before grades were recorded and was the rubric used throughout the academic year to evaluate student performance. The second evaluation, performed three weeks later, was a "glare and grade" global evaluation of the procedure marked out of 10, where a score of 6 or below was regarded as "lacking competence", 7 or 8 as "competent", and 9 or 10 as "proficient".
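For clarity, the structure of the two scoring schemes can be sketched as follows. This is a minimal illustration only: the aspect counts follow the text above, but the names and example scores are placeholders rather than the wording of the actual rubric in Supplementary File 1.

```python
# Minimal sketch of the two scoring schemes described above. Aspect counts follow
# the text; names and scores are placeholders, not the actual rubric wording
# (Supplementary File 1). Each rubric aspect is scored out of 10.
RUBRIC_ASPECT_COUNTS = {
    "full metal crown preparation": 8,
    "all-ceramic crown preparation": 8,
    "custom post and core": 9,
    "prefabricated post and core": 6,
    "indirect provisional restoration": 5,
}

def analytic_total(aspect_scores: list[int]) -> int:
    """Sum the per-aspect scores (each out of 10) awarded with the analytic rubric."""
    return sum(aspect_scores)

def global_band(score: int) -> str:
    """Map a 10-point 'glare and grade' score to its competence band."""
    if score <= 6:
        return "lacking competence"
    if score <= 8:
        return "competent"
    return "proficient"

# Example: an all-ceramic crown preparation scored on its eight rubric aspects,
# and a global score of 7 for the same preparation.
print(analytic_total([8, 7, 9, 6, 8, 7, 8, 9]))  # 62 out of a possible 80
print(global_band(7))                            # "competent"
```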
Preparations were performed on acrylic teeth mounted on KaVo models in laboratory phantom heads containing all teeth in the arch, ensuring the presence of adjacent sound teeth for the assessment of proximal contacts and occlusal reduction.
Post-hoc calculation of the power of the paired sample t-test was performed using G*Power software. For α = 0.05, an effect size of 0.4, a sample size of 75, and a maximum df = 73, the power was 0.982.
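This power calculation can also be reproduced outside G*Power; a minimal sketch using the statsmodels Python package with the parameters reported above is shown below. The number of tails is an assumption, as it is not stated, so the result may differ slightly from the reported 0.982.

```python
# Minimal sketch of the post-hoc power calculation for a paired-samples t-test,
# using statsmodels instead of G*Power. Alpha, the effect size (dz = 0.4), and the
# number of pairs (75) are taken from the text; the two-sided alternative is an
# assumption, so the printed value may not match the reported 0.982 exactly.
from statsmodels.stats.power import TTestPower

power = TTestPower().power(effect_size=0.4, nobs=75, alpha=0.05,
                           alternative="two-sided")
print(f"Post-hoc power: {power:.3f}")
```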
IBM SPSS v28 (Armonk, NY, USA) was used to measure inter-examiner reliability for both evaluation methods (analytic vs. global) using intraclass correlation coefficients (ICCs). An ICC below 0.5 was considered poor, 0.5 to 0.75 moderate, 0.75 to 0.9 good, and above 0.9 excellent [16]. The two evaluation methods were compared using paired-samples t-tests, and independent-samples t-tests were used to compare evaluation outcomes between professors and assistant professors. Bonferroni correction was applied to account for multiple comparisons. A p-value < 0.05 was considered significant.
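To illustrate these analyses (the study itself used SPSS), a minimal Python sketch using the pingouin and scipy packages with hypothetical scores is given below. The specific ICC model (a two-way random-effects, average-measures model, ICC2k) is an assumption, as the model applied in SPSS is not specified above.

```python
# Illustrative sketch only; the analysis described above was run in SPSS v28.
# Hypothetical data: five raters scoring the same three specimens on one rubric aspect.
import pandas as pd
import pingouin as pg
from scipy.stats import ttest_rel

scores = pd.DataFrame({
    "specimen": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "rater":    ["A", "B", "C", "D", "E"] * 3,
    "score":    [8, 7, 8, 9, 8, 6, 6, 7, 6, 5, 9, 9, 10, 9, 8],
})

# Inter-rater reliability; a two-way random-effects, average-measures model (ICC2k)
# is assumed here, as the exact ICC model is not stated in the text.
icc = pg.intraclass_corr(data=scores, targets="specimen",
                         raters="rater", ratings="score")
icc2k = float(icc.loc[icc["Type"] == "ICC2k", "ICC"].iloc[0])

# Interpretation thresholds of Koo and Li [16]; handling of the exact boundary
# values (0.5, 0.75, 0.9) is arbitrary.
if icc2k < 0.5:
    label = "poor"
elif icc2k < 0.75:
    label = "moderate"
elif icc2k <= 0.9:
    label = "good"
else:
    label = "excellent"
print(f"ICC2k = {icc2k:.3f} ({label})")

# Paired comparison of the two evaluation methods for one evaluator (hypothetical
# scores on a common 10-point scale), with a Bonferroni-adjusted significance
# threshold for m simultaneous comparisons.
analytic_means = [7.9, 6.1, 9.0]
global_scores  = [8.0, 6.0, 9.0]
t_stat, p_value = ttest_rel(analytic_means, global_scores)
m = 5  # e.g., one comparison per evaluator
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, significant: {p_value < 0.05 / m}")
```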
Results
Study participants
Two professors in their fifties, each with over 20 years of experience, and three assistant professors in their thirties with differing levels of experience (10, 5, and 15 years) participated in the study. Fifteen students (male and female, aged 21 or 22 years) were evaluated.
Analytic evaluation
For full metal crowns, inter-rater reliability as measured by ICCs ranged from moderate to excellent (0.580–0.938) for most aspects of the procedure, except for margin definition, roughness, and continuity, where reliability was poor (0.119) (Supplementary Table 1). For all-ceramic crowns, inter-rater reliability ranged from moderate to excellent (0.583–0.907) for all aspects (Supplementary Table 2). For custom posts and cores, inter-rater reliability ranged from moderate to excellent (0.632–0.952) for all aspects (Supplementary Table 3). For prefabricated posts and cores, inter-rater reliability ranged from moderate to excellent (0.500–1.000) for most aspects (Supplementary Table 4). For indirect provisional restorations, inter-rater reliability was poor for occlusion (0.263) and for finish and polish (0.296), good for margins and proximal contacts (0.759 and 0.817, respectively), and excellent for axial contours (0.918; Supplementary Table 5).
Global evaluation method
Inter-rater reliabilities for global evaluations were moderate for prefabricated posts and cores, good for full metal crown preparations and indirect provisional restorations, and excellent for anterior ceramic restorations and custom posts and cores (Table 1), although the wide confidence intervals indicate some variability in each group.
[Table 1 omitted: see PDF]
Analytic vs. global evaluation method for each study participant
Paired-samples t-tests revealed no statistically significant differences between analytic and global evaluations for any study participant, except for professor 1's evaluations of indirect provisional restorations, where there was a marginally significant difference between the global and analytic scores (p = 0.04; Table 2); however, no differences remained significant after correction for multiple comparisons (all p = 1.0).
[Table 2 omitted: see PDF]
Global and analytic evaluations by professors and assistant professors
Independent-samples t-tests revealed no significant differences in global evaluations between professors and assistant professors, among the assistant professors, or between the two professors (Table 3). Similarly, there were no significant differences in analytic evaluations between professors and assistant professors or among the assistant professors (Table 4).
[Table 3 omitted: see PDF]
[Table 4 omitted: see PDF]
Discussion
This study failed to reject the null hypotheses that there are no significant differences in evaluations between faculty members of different ranks, between the global and analytic methods of evaluation, or between global and analytic evaluations made by the same individual. For the analytic method of evaluation, inter-rater reliability ranged from poor to excellent but was mainly moderate to excellent and only rarely poor. Poor reliability was noted for margin definition, roughness, and continuity of full metal crown preparations and for occlusion and finish and polish of indirect provisional restorations, which are critical aspects of these procedures.
The poor reliability for margin definition, roughness, and continuity in full metal crown preparation can be attributed to the fact that a dental explorer was not provided during the evaluation, so decisions were based on visual inspection rather than tactile sensation. Many studies have investigated the need for tactile sensation in detecting caries [17,18,19] and in assessing crown margin openness [20]. Although some stress the importance of using explorers, others suggest innovative alternatives such as dental explorers with different tip diameters to define a crown margin [21], and the best means of evaluating crown preparations has yet to be defined.
Occlusion and finishing and polishing of indirect provisional restorations also yielded poor reliability. This might be due to the lack of articulated opposing models for evaluating occlusion, which meant that evaluators had to articulate the models by hand, a process that may not always have been accurate. In addition, evaluators were not provided with articulating paper to check the occlusion, which may have contributed to the poor reliability. The reason for the lack of agreement on finishing and polishing is uncertain.
There were no significant differences between the two evaluation methods, especially after correction for multiple comparisons. This overall agreement suggests that either approach is accurate if calibration is carried out prior to evaluation [22,23,24,25]. However, Alammari et al. [26] recommended the use of analytic methods to facilitate reliability between evaluators.
There were no differences in evaluation outcomes according to academic rank, nor was rank predictive of outcome. This finding supports the inclusion of faculty members of diverse ranks in academic institutions' evaluation processes.
These findings contrast with those of Al Amri et al. [27], who reported inconsistencies between evaluations made using the analytic and global methods, with analytic methods awarding higher grades than global methods and junior faculty awarding higher grades than senior faculty. In that study, the junior faculty were newly graduated demonstrators with no teaching experience, whereas in the current study faculty members of both ranks were consultants in prosthodontics with postgraduate qualifications in prosthodontics.
This study was limited by evaluating only the completed procedure rather than examining each aspect during the laboratory session. Additionally, not providing examiners with explorers may have contributed to the areas of poor reliability. Future reliability studies should investigate each aspect of the procedure, and further research is needed to determine the best tools for evaluating tooth preparations for indirect restorations. If the global approach is adopted for convenience, a separate self-evaluation rubric could complement it to compensate for its limited feedback, allowing students to consider when and where they failed to achieve the ideal requirements. Finally, the wide confidence intervals for the reliability estimates of the global approach may reflect unmeasured variability, such as intra-examiner variability, which was not assessed here.
Conclusions
Global ("glare and grade") evaluation is as reliable as analytic evaluation, so either could be used for evaluations. Additionally, academic rank had no negative impact on the evaluation process.
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information files.
Abbreviations
ICC: Intraclass correlation coefficient
References
1. Sherwood IA, Douglas GV. A study of examiner variability in assessment of preclinical class II amalgam preparation. J Educ Ethics Dent. 2014;4(1):12.
2. Yune SJ, Lee SY, Im SJ, Kam BS, Baek SY. Holistic rubric vs. analytic rubric for measuring clinical performance levels in medical students. BMC Med Educ. 2018;18(1):124.
3. Goepferd SJ, Kerber PE. A comparison of two methods for evaluating primary class II cavity preparations. J Dent Educ. 1980;44(9):537–42.
4. Schmitt L, Moltner A, Ruttermann S, Gerhardt-Szep S. Study on the interrater reliability of an OSPE (objective structured practical examination) subject to the evaluation mode in the phantom course of operative dentistry. GMS J Med Educ. 2016;33(4):Doc61.
5. Philips Z, Ginnelly L, Sculpher M, Claxton K, Golder S, Riemsma R, Woolacoot N, Glanville J. Review of guidelines for good practice in decision-analytic modelling in health technology assessment. Health Technol Assess. 2004;8(36):iii–iv, ix–xi.
6. Al Moaleem MM, Adawi HA, Alahmari NM, Sayed ME, Albar NH, Okshah A, Meshni AA, Gadah TS, Alshehri AH, Al-Makramani BMA. Using an analytic rubric system for the evaluation of anterior ceramic crown preparation performed by preclinical dental students. PLoS ONE. 2025;20(4):e0318752.
7. Habib SR. Rubric system for evaluation of crown preparation performed by dental students. Eur J Dent Educ. 2018;22(3):e506–13.
8. Jonsson A, Svingby G. The use of scoring rubrics: reliability, validity and educational consequences. Educ Res Rev. 2007;2(2):130–44.
9. O'Donnell JA, Oakley M, Haney S, O'Neill PN, Taylor D. Rubrics 101: a primer for rubric development in dental education. J Dent Educ. 2011;75(9):1163–75.
10. Baik KM. Digital evaluation of occlusal reduction of metal crown preparations in students in the COVID-19 era: a cross-sectional study. Saudi Dent J. 2023;35(8):1023–8.
11. Shih W, Tran K, Yang V, El Masoud B, Sexton C, Zafar S. Investigation of inter- and intra-rater reliability using digital dental software for prosthodontics crown preparations. J Dent Educ. 2020;84(9):1037–45.
12. Sadid-Zadeh R, Nasehi A, Davis E, Katsavochristou A. Development of an assessment strategy in preclinical fixed prosthodontics course using virtual assessment software - part 2. Clin Exp Dent Res. 2018;4(3):94–9.
13. Miyazono S, Shinozaki Y, Sato H, Isshi K, Yamashita J. Use of digital technology to improve objective and reliable assessment in dental student simulation laboratories. J Dent Educ. 2019;83(10):1224–32.
14. Rosenstiel SF, Land MF. Contemporary fixed prosthodontics. Elsevier Health Sciences; 2015.
15. Shillingburg H, Hobo S, Whitsett L, Jacobi R, Brackett S. Preparations for full veneer crowns. In: Fundamentals of fixed prosthodontics. 1997. p. 139–54.
16. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63.
17. Ismail AI. Visual and visuo-tactile detection of dental caries. J Dent Res. 2004;83(Spec No C):C56–66.
18. Topping GVA, Pitts NB. Clinical visual caries detection. Monogr Oral Sci. 2009;21:15–41.
19. Zandoná AF, Zero DT. Diagnostic tools for early caries detection. J Am Dent Assoc. 2006;137(12):1675–84.
20. Baldissara P, Baldissara S, Scotti R. Reliability of tactile perception using sharp and dull explorers in marginal opening identification. Int J Prosthodont. 1998;11(6):591–4.
21. Hayashi M, Wilson NH, Ebisu S, Watts DC. Influence of explorer tip diameter in identifying restoration margin discrepancies. J Dent. 2005;33(8):669–74.
22. Gunnell KL, Fowler D, Colaizzi K. Inter-rater reliability calibration program: critical components for competency-based education. J Competency-Based Educ. 2016;1(1):36–41.
23. Lyness SA, Peterson K, Yates K. Low inter-rater reliability of a high stakes performance assessment of teacher candidates. Educ Sci. 2021;11(10):648.
24. Nederhand ML, Tabbers HK, Rikers RM. Learning to calibrate: providing standards to improve calibration accuracy for different performance levels. Appl Cogn Psychol. 2019;33(6):1068–79.
25. Tabassum A, Alhareky M, Madi M, Nazir MA. Attitudes and satisfaction of dental faculty toward calibration: a cross-sectional study. J Dent Educ. 2022;86(6):714–20.
26. Alammari M, Nawar ES. Inter-rater and intra-raters' variability in evaluating complete dentures insertion procedure in senior undergraduates' prosthodontics clinics. Electron Physician. 2018;10(9):7287.
27. Al Amri MD, Sherfudhin HR, Habib SR. Effects of evaluator's fatigue and level of expertise on the global and analytical evaluation of preclinical tooth preparation. J Prosthodont. 2018;27(7):636–43.