Abstract
The rise of AI in educational assessments has significantly enhanced efficiency and accuracy. However, it also introduces critical ethical challenges, including bias in grading, data privacy risks, and accountability gaps. These issues can undermine trust in AI-driven assessments and compromise educational fairness, making a structured ethical framework essential. To address these challenges, this study empirically validates an existing triadic ethical framework for AI-assisted educational assessments, originally proposed by Lim, Gottipati and Cheong (In: Keengwe (ed) Creative AI tools and ethical implications in teaching and learning, IGI Global, 2023), grounded in student perceptions. The framework encompasses three ethical domains—physical, cognitive, and informational—which intersect with five key assessment pipeline stages: system design, data stewardship, assessment construction, administration, and grading. By structuring AI-driven assessments within this ethical framework, the study systematically maps key concerns, including fairness, accountability, privacy, and academic integrity. To validate the proposed framework, Structural Equation Modeling (SEM) was employed to examine its relevance and alignment with learners' ethical concerns. Specifically, the study aims to (1) evaluate how well the triadic framework aligns with learners' perceptions of ethical issues using SEM analysis, and (2) examine relationships among the assessment pipeline stages, ethical considerations, pedagogical outcomes, and learner experiences. Findings reveal robust connections between AI-assisted assessment stages, ethical concerns, and learners' perspectives. By bridging theoretical validation with practical insights, this study emphasizes actionable strategies to support the development of AI-driven assessment systems that balance technological efficiency, pedagogical effectiveness, and ethical responsibility.
Introduction
The integration of artificial intelligence in education (AIED) has rapidly transformed educational assessments, promising enhanced accuracy and efficiency. However, this transformative potential has not been without its ethical challenges, raising critical concerns such as fairness, accountability, privacy, and trust (Memarian & Doleck, 2023; Nguyen et al., 2023). A notable gap lies in the lack of consensus on ethical principles governing the use of AI within assessments, drawing attention to the need for a robust theoretical framework (e.g., Li & Gu, 2023). Such a framework should aim to steer the development and validation of ethical constructs in AI-enabled educational assessments.
Understanding the ethical dimensions of AI-enabled educational assessments is essential for responsible governance and implementation. AI-enabled educational assessments refer to automated systems that leverage artificial intelligence to design, administer, and evaluate student learning outcomes (Ouyang et al., 2023). These systems promise efficiency and personalization but also raise ethical concerns regarding fairness, accountability, and privacy.
To provide a structured approach to these challenges, this study empirically validates an established triadic ethical framework (Lim et al., 2023a) in the context of AI-assisted assessments. The triadic ontological framework proposed by Lim et al. (2023a) organizes AI-enabled educational assessment components into three key domains: physical, cognitive, and informational. This framework builds on theoretical foundations from Ashok et al. (2022), Project and Peirce (1998), Popper (1979), and Ogden and Richards (1923), offering a comprehensive lens to examine AI’s role in assessments. While this framework serves as a theoretical foundation, this study builds upon it by subjecting the model to empirical validation through structural equation modeling (SEM). The validation process empirically assesses the framework’s robustness, ensuring that its structure accurately represents the ethical and functional dimensions of AI-assisted assessments. Without such validation, ethical considerations in AI-driven education remain conceptual rather than actionable. By systematically linking assessment pipeline stages with key ethical concerns, this study provides a data-driven foundation for designing systems that balance the application of AI with ethical integrity.
The framework comprises five key stages of the AI assessment pipeline: (i) system design and check, (ii) data stewardship and surveillance, (iii) assessment construction and rollout, (iv) assessment administration, and (v) grading and evaluation. It embeds ethical considerations, including inclusivity, fairness, accountability, accuracy, auditability, explainability, privacy, trust, human centricity, and strategies for maintaining academic integrity.
In addition, this study incorporates a learner-centric perspective on AI ethics, a dimension often underrepresented in AI governance research (e.g., Jang et al., 2022). By capturing learners’ perceptions through survey data and qualitative feedback, the research examines how AI-driven assessments influence student experiences, including satisfaction, perceived learning efficacy, sense of academic support, and instructor presence.
The technical contribution of this study lies in the rigorous validation of the triadic framework through SEM analysis (e.g., Wang et al., 2023). This methodological approach allows for a structured examination of the relationships between assessment stages, ethical concerns, and learner outcomes, providing empirical evidence to support a more responsible and effective integration of AI in education. By grounding ethical considerations in quantitative validation, this study offers actionable insights for policymakers, educators, and AI developers, ensuring that AI-driven assessments enhance learning without compromising ethical integrity.
The research objectives of this study focus on two key areas:
Validation of the triadic theoretical framework using SEM analysis
This objective seeks to validate and refine the existing triadic framework proposed by Lim et al. (2023a), examining its empirical fit and practical implications in the domain of educational assessment. The framework maps the relationship between AI-enabled educational assessments, assessment pipeline stages, and ethical considerations. Through SEM analysis, the study assesses the framework’s ability to accurately capture both the operational and ethical dimensions of AI in education. Additionally, this validation ensures that the framework aligns with learner perceptions, making it theoretically robust, practically applicable, and reflective of real-world user experiences.
Exploring relationships in the assessment pipeline
The second objective explores the interconnections between the different stages of the AI assessment pipeline, key ethical principles, pedagogical outcomes, and learner perceptions. By identifying patterns and dependencies, this study aims to inform the design of ethical, effective, and learner-centric AI-driven assessments.
The remainder of this paper is organized as follows: "Literature Review" section presents a literature review on theory-underpinned key AI applications and ethics elements in AI-enabled educational assessments, focusing on the triadic ontological framework by Lim et al. (2023a). "Methodology" section describes the expanded methodology for validating the triadic theoretical framework using SEM analysis. "Findings" section presents the results of the quantitative and qualitative analyses. "Discussion" section distills insights from the findings and proposes a revised theoretical framework based upon learners' perspectives. Finally, "Implications, Limitations and Conclusion" section concludes the paper with implications for the design and implementation of AI-enabled educational assessments, limitations, and directions for future research.
Literature review
AI-enabled educational assessments
The integration of AI into educational assessments marks a “bar-raising” shift in learning and evaluation (Dede et al., 2021). As AI-driven technologies become embedded in assessment processes, educational institutions would have to move beyond prior resistance (Hargreaves, 2023) and embrace the benefits of personalized learning, adaptive testing, and advanced performance analytics (Rudolph et al., 2023; Lim et al., 2023a, 2023b; Lim et al., 2022). These applications are grounded in a confluence of educational, learning, and technology theories.
Learner-facing AI applications encompass intelligent tutoring systems (ITS) and personalized learning systems (PLS), among others. These ITS and PLS leverage AI algorithms to dynamically adjust educational content and assessments in accordance with the specific needs and individual learning paces of students. As argued by Fariani et al. (2023) and Xie et al. (2019), these systems align with the principles of constructivist learning theory. Within this framework, learning is characterized as an active, contextualized process in which individuals construct knowledge rather than passively acquire it. The cognitive model of AI-powered personalized systems employs a 'theory of mind' approach, recognizing and accommodating the unique learning trajectories of each student. It also incorporates adaptive testing methods, including cognitive diagnostic assessment. Sun et al. (2023) have demonstrated how adaptive testing can align with Bloom's Taxonomy, providing a more nuanced assessment of learners' knowledge and cognitive skills. This personalized approach has the potential to significantly enhance learning effectiveness. Research has indicated improvements in learning outcomes of up to two standard deviations (Lee & Soylu, 2023).
Educator-facing AI applications encompass automated writing evaluation (AWE) systems (often referred to as automated essay scoring systems) and predictive AI for learning outcomes, among others. AWE has gained substantial traction, utilizing natural language processing to automate the assessment of written student responses and offer constructive feedback (Huang et al., 2023). This aligns with Mead's social interaction theory and Vygotsky's sociocultural theory, both emphasizing the role of scaffolded learning and feedback in skill assessment (Ding & Zou, 2024). AI applications extend further into predictive AI, where machine learning models are employed to forecast student performance and learning outcomes. As highlighted by Sghir et al. (2023), these predictive models analyze historical data to identify students at risk, enabling timely interventions; dropout rates in e-learning environments can reach as high as 80% (Anagnostopoulos et al., 2020). This approach is deeply rooted in the behaviorist evaluation of learning, which emphasizes early identification and intervention in formative assessments to reshape learning trajectories and optimize overall outcomes (Duin & Tham, 2020).
Triadic ontological framework: a systematic stage-based approach to addressing AI ethics in the assessment pipeline
AI applications can be approached through the lens of the assessment development and delivery pipeline. In this context, underpinned by Ashok et al. (2022), Project and Peirce (1998), Popper (1979), and Ogden and Richards (1923), Lim et al. (2023a) conducted a study utilizing network analysis and topic modeling to identify AI application areas throughout five stages of an assessment pipeline. This approach allows researchers and educators to critically examine the implications of AI at each assessment stage, including AI ethical issues. For instance, at the assessment construction stage, AI might raise concerns about algorithmic bias and fairness, which can be evaluated in light of ethical theories such as consequentialism and deontology. The surveillance aspect, when analyzed from an ethical standpoint, may invoke discussions on privacy and surveillance theories, such as the panopticon concept. The notion of panopticism, rooted in Foucault’s (1977) theory of disciplinary power, further illustrates student discomfort with invisible data collection and automated surveillance. This metaphor captures the asymmetry of power in AI-assisted assessments, where students may be observed by systems they cannot see or question, raising ethical concerns about autonomy, consent, and control. Understanding such perspectives facilitates a more holistic evaluation of AI's impact on educational assessments, and in particular, from an AI ethics perspective.
The application of AIED has precipitated a range of ethical issues that are critical to address for the responsible use of this technology (Nguyen et al., 2023). Ethics principles and frameworks do not emerge in isolation; rather, they exert a significant influence on research and development (R&D). Ethics principles and frameworks actively shape R&D projects and methodological considerations, thus playing a pivotal role in defining research expectations, values, and objectives. This is evident in how issues like algorithmic bias and algorithmic fairness have now become routine topics of discussion for companies and institutions when implementing AI systems. The number of such guidelines had grown to as many as 200 by 2023 (Corrêa et al., 2023).
The exploration of ethical principles and frameworks in the context of AIED has been an area of growing academic interest. For instance, Holmes et al. (2021) explored ethical principles by surveying 60 leading AIED researchers and developed a ‘strawman’ draft ethics framework for AIED that mapped the ethics of algorithms in education, ethics of data used in AI and ethics of learning analytics. On a more granular level, Hong et al. (2022) further proposed an AIED data ethics framework that considered data processes from collection to disposal. It is beneficial for AI ethics discourse to be “specific enough to be action guiding” (Whittlestone et al., 2019). In the context of educational assessments, Lim et al. (2023a) recognized the idiosyncratic relevance of considering fundamental ethical principles in assessments that would provide “concrete property instantiations of applied ethics”.
The Triadic AIED Assessment Framework proposed by Lim et al. (2023a) (Fig. 1) forms the cornerstone of this study. Built upon a systematic literature survey, the framework offers a useful approach to understanding and integrating AI in educational assessments, emphasizing its application across five key stages of the educational assessment pipeline and across the physical, cognitive, and informational domains.
[See PDF for image]
Fig. 1
Triadic AIED assessment framework proposed by Lim et al. (2023a)
The physical domain pertains to the tangible aspects of AI systems, including hardware and infrastructure essential for AI deployment in assessment settings. It encompasses the device and network physical components necessary for robust and secure systems capable of handling the demands of educational data processing, as well as the physical interface between the AI system and users. The cognitive domain focuses on the AI algorithms and data that drive the assessment processes. This content layer involves the development and application of intelligent algorithms capable of adapting to diverse learning styles and needs. This domain also encompasses the AI's ability to analyze and interpret data, providing insights into student learning and performance. The information domain is the service layer, which includes the AI-underpinned user interface and interaction aspects of educational assessments.
In the triadic framework, the five key stages of assessment pipeline are, namely: (i) AI system design and checks for assessment purposes, (ii) data stewardship and surveillance, (iii) AI-based assessment construction and rollout, (iv) administration of assessments using AI systems, and (v) AI-facilitated assessment grading and evaluation. AI system design and checks for assessment purposes is directed towards the development of a robust and secure AI system, spanning across physical infrastructure, cognitive functionalities, and data management protocols (e.g., Lin et al., 2023). Data stewardship and surveillance focuses on data governance aspects and, if necessary, the implementation of surveillance measures (e.g., Williamson et al., 2020). AI-based assessment construction and rollout revolves around leveraging AI capabilities to construct, deliver, and optimize assessments, fostering streamlined communication and formative feedback mechanisms (e.g., Dai & Ke, 2022). During the administration of assessments using AI systems stage, AI is instrumental in upholding the integrity of assessments. This is achieved through rigorous authentication and security measures, including proctoring and plagiarism detection (e.g., Nigam et al., 2021; Surahman & Wang, 2022). Finally, AI-facilitated assessment grading and evaluation play a central role in the interpretation of assessment responses, performance measurement, and the provision of insightful feedback (e.g., Ramesh & Sanampudi, 2022).
Each stage of the assessment pipeline places a different emphasis on key ethical elements. Each of these elements presents unique challenges and considerations in the context of AI-enabled educational assessments; the ethical issues include:
Fairness: This ethics principle highlights the imperative of upholding fair, equitable, and appropriate assessment practices within AI systems, acknowledging the complexity in defining fairness due to subjectivity, context, and cultural nuances. Fairness, in this context, encompasses the elimination of data and algorithmic bias to ensure diversity, equity, and non-prejudice, preventing disadvantages for minority groups and aligning with inclusivity. Ethical considerations encompass "allocation harm" for equitable resource distribution, "representational harm" to combat bias marginalizing learner groups, and concern about unintended learner profiling (Mayfield et al., 2019). Additionally, it emphasizes the importance of avoiding universal emotional assumptions in socio-emotional assessments due to cultural differences (Stark & Hoey, 2021) and highlights the need for standard ethical codes and robust monitoring mechanisms for the effective implementation of fair assessment practices in AI systems (Tlili et al., 2018).
Inclusivity: This ethics principle emphasizes the significance of inclusivity and accessibility within AI systems for education, particularly in personalized, large-scale settings. It highlights key insights, such as the need to exhibit empathy towards learners' diverse conditions, including health, disabilities, gender, race, educational backgrounds, and socio-economic status. Additionally, it emphasizes the importance of sensitivity and supportiveness of AI-generated communication and feedback (Costas-Jauregui et al., 2021), and addresses the potential for AI decisions to perpetuate conformity, peer pressure, or segregation (Gedrimiene et al., 2020). In essence, the principle seeks to foster empathetic and inclusive AI systems that promote a diverse and equitable learning environment.
Accountability: This ethics principle emphasizes responsible AI system design and operation across various contexts and roles. It highlights the need for those involved in AI systems, particularly in education, to be responsible stewards of data, as students often lack influence in data handling (Gedrimiene et al., 2020). Decision-makers must provide clear reasons and take responsibility for AI-driven outcomes (Hakami & Hernández-Leo, 2020). Compliance with regulations can be challenging due to inconsistencies, and the ownership of shared data remains uncertain, posing obstacles to accountability efforts (Costas-Jauregui et al., 2021). Additionally, this principle draws attention to the importance of avenues for addressing the adverse consequences of AI system use, both at individual and societal levels.
Accuracy: This ethics principle stresses the importance of accuracy in AI assessments to maintain their reliability and validity. Key drivers of accuracy include ensuring high-quality data inputs to prevent negative impacts on AI-driven decisions (Tlili et al., 2018), addressing imbalanced datasets to avoid discriminatory outcomes (Chounta et al., 2022), accurately interpreting learner responses, handling prediction errors (Khairy et al., 2022), and guarding against students’ gaming of AI systems to their academic advantage (Tlili et al., 2018).
Auditability: This ethical principle highlights the necessity of allowing independent third-party assessors the authority to examine and report on the utilization and configuration of data and AI algorithms in assessment processes. Auditability pertains to comprehending, validating, and reviewing AI systems to ensure adequate traceability, transparency, and utilization of data and AI algorithms, as well as ensuring the credibility and dependability of assessment tools. It should be acknowledged, however, that difficulties may arise when dealing with proprietary algorithms, as indicated in the studies by Tlili et al. (2019) and Casas-Roma and Conesa (2021).
Explainability: This ethics principle calls attention to the vital need for transparency and comprehensibility in AI systems. It emphasizes making data, AI algorithms, and AI-driven decisions easily understandable for relevant stakeholders and justifying their use in a non-technical manner (Casas-Roma & Conesa, 2021). Achieving transparency in AI system design, information accessibility, and user comprehension is crucial for fostering trust and fairness among human stakeholders. It also advocates for clear rationales in AI recommendations while acknowledging the balance needed to protect proprietary algorithms (Latham & Goltz, 2019). Furthermore, it highlights the tradeoff between interpretability and complexity in AI systems, emphasizing the importance of simpler models when feasible, and the necessity of a sound theoretical basis for applying AI in assessments, incorporating pedagogical principles and AI training for meaningful implementation (González-Calatayud et al., 2021).
Privacy: This ethics principle emphasizes the importance of safeguarding individuals' privacy and data protection in AI systems, from data collection to disposal (Chounta et al., 2022). It highlights the need to manage sensitive data securely and respect individuals' emotions and expressions in data usage to maintain trust. Obtaining explicit consent, especially from minors, and allowing flexibility for opting in or out of AI-related activities are crucial for fair data practices, despite potential challenges and data gaps. Constant surveillance resulting from AI use raises concerns about privacy infringements, and potential anxiety and behavioral changes resulting from such surveillance (Megahed et al., 2022).
Trust: This ethics principle emphasizes the significance of trust in AI systems and data utilization for assessments. Trust involves confidence in AI systems' decision-making abilities and feedback quality, with concerns arising from the absence of human-like attributes, potential biases, and doubts about improvement (Pontual Falcão et al., 2022). Furthermore, trust is also tied to the level of autonomy and control granted to educators and learners; excessive intrusion and reduced autonomy can erode trust in AI systems. Additionally, trust relies on a consensus regarding AI system purposes, particularly evident in socio-emotional assessments, where a lack of agreement can lead to ethical discrepancies affecting trust (Stark & Hoey, 2021).
Human Centricity: This ethics principle points to the need to prioritize human agency and dignity. It emphasizes care for positive states of human wellbeing, protection of users from AI manipulation of learner behaviors and emotions, and the intervenability and reversibility of AI processes. The latter allows for the correction, termination, erasure, and blocking of AI processes when learners’ autonomy and/or capacity to learn is diminished (Mougiakou et al., 2018).
Academic Integrity: This ethical principle focuses on uncovering and discouraging deceitful conduct by learners. It entails the utilization of AI-powered monitoring and plagiarism detection techniques to spot instances of cheating, whether they occur in physical exam settings or in online assessment platforms (Elshafey et al., 2021; Kiennert et al., 2019).
The triadic theoretical framework provides a comprehensive and structured approach to understanding and evaluating AI applications in educational assessments. By encompassing the physical, cognitive, and information domains, and applying these across the various stages of the assessment pipeline, the framework ensures a holistic evaluation of AI tools, grounded in ethical considerations and practical effectiveness. This study builds upon the work of Lim et al. (2023a) for framework validation. Through empirical validation, it seeks to provide a structured, validated theoretical approach to evaluating AI ethics in educational assessment settings.
Methodology
Conceptual framework and hypothesis development
The triadic framework provides a theoretically grounded structure for examining the interactions between ethical AI design, assessment pipeline dynamics, and pedagogical outcomes. Rooted in sociotechnical systems theory (Trist & Bamforth, 1951) and transactional distance theory (Moore, 2013), and aligned with ethical guidelines for AI in education (European Commission, 2022), the framework posits that AI-mediated environments may alter learner-instructor dynamics, and that ethical considerations may operate as mediating mechanisms between technical implementation (assessment stages) and human-centered outcomes (pedagogical impacts).
As illustrated in Fig. 2, the triadic model comprises three interdependent dimensions: (i) ethical principles (e.g., transparency, accountability, fairness), (ii) AI assessment pipeline stages (system design, data stewardship, assessment construction, administration, grading), and (iii) pedagogical outcomes (learner satisfaction, perceived learning efficacy, sense of academic support, instructor presence). This conceptualization extends prior work on AI-mediated learning environments by explicitly modeling stage-specific ethical influences rather than treating ethics as a monolithic construct (Boom & Molenaar, 1989).
[See PDF for image]
Fig. 2
Conceptual framework
The framework’s theoretical novelty lies in its rejection of uniform ethical effects, instead proposing that distinct ethical principles gain salience at different assessment stages – a proposition supported by recent research on context-dependent perceptions of algorithmic fairness (Dolata et al., 2022; Kleanthous et al., 2022). For instance, system design stages may prioritize explainability to foster initial trust, while grading stages demand heightened accuracy to maintain satisfaction. To operationalize this model, two interrelated hypotheses were formulated:
Hypothesis 1 (H1)
Ethical principle consideration (e.g., trust, fairness) exhibits significant stage-dependent variation across technical implementation phases (e.g., data stewardship, grading) in the AI assessment pipeline.
Underpinned by sociotechnical systems theory, ethical considerations in AI systems can leverage value-sensitive design to improve inclusivity and human-centeredness. A study by Sadek et al. (2024) found a lack of systematic ethics processes that consistently guide the application of approaches and methods across all stages of AI system development, suggesting that such designs “are limited to specific objectives or outcomes within specific phases.” Empirical studies by Monteiro and Salgado (2023) and Yildirim et al. (2023) further noted that ethical principles tend to exhibit significant phase-dependent variations in their operationalization.
Hypothesis 2 (H2)
Stage-specific (e.g., data stewardship, grading) ethical consideration positively predicts differentiated pedagogical outcomes (e.g., learner satisfaction).
Applying a socio-technical grounded theory perspective, Pant et al. (2024) developed a taxonomy of ethics in AI grounded in practitioners' experiences, highlighting ethics considerations, such as awareness, perception, and challenges, that vary significantly across distinct stages of AI system development. These phase-dependent ethical factors, such as bias and fairness considerations, have been empirically linked to variations in user outcomes, emphasizing their relevance to differentiated educational impacts. Similarly, Amugongo et al. (2025) highlighted the role of systematically embedding an ‘ethics by design’ approach in the AI software lifecycle to ensure alignment with intended outcomes. Further, the Community for Advancing Discovery Research in Education (CADRE) identified measurable pedagogical outcomes, such as personalized support, as pivotal components within phase-based ethical reflection frameworks. This framework systematically connects ethical AI design practices to enhanced teaching effectiveness (Barnes et al., 2024).
Literature further suggests that pedagogical outcomes in AI-assisted education are closely tied to communication, support and presence in learner-instructor interactions, as these factors significantly influence student motivation, satisfaction, and achievement (Seo et al., 2021). In this study, theoretically grounded pedagogical outcome variables include: (i) Learner satisfaction: Overall satisfaction with AI-assisted education (e.g., Kashive et al., 2020); (ii) Perceived learning efficacy: Perceived effectiveness of AI in enhancing learning outcomes (e.g., Shi et al., 2024); (iii) Sense of academic support: Degree to which students feel supported academically by AI tools (e.g., Wu & Yang, 2022); and (iv) Perceived instructor presence: How the use of AI impacts the perceived presence and engagement of instructors (e.g., Bolick & da Silva, 2024).
To address potential construct underrepresentation, the framework incorporates a mixed-methods validation approach. Qualitative thematic analysis of open-ended responses supplements quantitative SEM findings by capturing experiential dimensions of AI ethics.
By situating ethical AI design as a dynamic, stage-gated process rather than a static feature, this framework advances theoretical understanding of how sociotechnical systems evolve across educational workflows. Its empirical validation through SEM provides actionable insights for (i) stage-dependent allocation of ethical resources in AI development cycles, and (ii) targeted interventions to align technical capabilities with pedagogical priorities.
Participants and data collection
A total of 397 undergraduate students voluntarily participated in the study. Participants ranged in age from 18 to 24 years (M = 19.3, SD = 1.8), with a gender distribution of 195 males (49%) and 202 females (51%).
Participants had varying levels of prior knowledge of and exposure to AI assessment systems, as anecdotally gauged during the pre-survey briefing. No formal self-report instrument was administered to quantitatively assess participants’ AI literacy or prior experience with AI assessment systems prior to survey participation. To ensure consistency in construct comprehension, participants were first provided with definitions of key ethical principles as outlined by Lim et al. (2023a). Additionally, explanations of the pedagogical outcome constructs relevant to the SEM analysis were provided before survey administration.
Data collection followed a mixed-method approach, integrating quantitative structured responses with qualitative open-ended feedback. Structured responses were collected using a 10-point Likert scale, while qualitative responses allowed students to elaborate on their experiences and perspectives on AI in assessments. Participant responses were anonymized and coded as “RXX”, where “XX” (e.g., 11) denotes the respondent’s position in the sample pool (i.e., the 11th respondent), ensuring confidentiality and facilitating systematic analysis. Ethical considerations, including informed consent and data protection, were rigorously observed throughout the data collection process.
For the qualitative component, the study employed a thematic analysis approach (Braun & Clarke, 2006). Two independent researchers coded the open-ended responses using an inductive, open coding strategy. Emergent codes were then organized into higher-order themes that aligned with the triadic ethical domains. Discrepancies in coding were discussed and resolved through consensus. The study ensured that the full spectrum of perspectives, including contradictory student views, were represented in the results. Representative quotes were selected to reflect both majority trends and minority or divergent views, avoiding cherry-picking or anecdotal bias.
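The paper does not name the statistic used for its inter-coder reliability checks; as a minimal, hypothetical illustration, agreement between the two coders' theme assignments could be quantified with Cohen's kappa, sketched here in R with the psych package (the theme labels are invented for the sketch, not drawn from the study's codebook).

```r
# Minimal sketch: quantifying inter-coder agreement with Cohen's kappa.
# The theme labels below are hypothetical; the study's codebook is not shown.
library(psych)

coder1 <- c("privacy", "fairness", "trust", "privacy", "fairness", "trust")
coder2 <- c("privacy", "fairness", "trust", "trust",   "fairness", "trust")

# cohen.kappa accepts a square contingency table of the two coders' labels
cohen.kappa(table(coder1, coder2))
```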
Survey validation
A structured survey instrument was developed to measure ethical considerations and pedagogical outcomes in AI-enabled assessments (refer to Appendix I). The survey consists of 58 items, with each construct measured using a 10-point Likert scale ranging from 1 (strongly disagree) to 10 (strongly agree). The survey instrument comprises 50 items assessing ethical principles and 8 items measuring pedagogical outcomes.
To ensure the content validity of the 58-item survey, a panel of six experts was enlisted for evaluation. Among these experts, three possessed specialized knowledge in the domain of AI, while the remaining three specialized in educational assessments. Each expert was requested to assess every item on a scale of (1) not relevant, (2) somewhat relevant, (3) quite relevant, and (4) relevant. Items garnering ratings of 3 or 4 were considered relevant, whereas those with ratings of 1 or 2 were deemed not relevant. The widely accepted Content Validity Index (CVI) was employed as the metric for content validity. Two distinct types of CVIs were computed: item-level CVIs (I-CVIs) and scale-level CVIs (S-CVI). The I-CVI was determined as the proportion of panel experts who assigned a rating of 3 or 4 to an item, effectively categorizing them as relevant. The S-CVI, as employed in this study, denotes the mean proportion of items rated 3 or 4 by the panel of experts. Further reliability testing was conducted using Cronbach’s Alpha, ensuring internal consistency above the threshold of 0.70. Factor analysis suitability was confirmed via the Kaiser–Meyer–Olkin (KMO) measure, ensuring that the sample was adequate for factor analyses. The combined application of expert validation and statistical reliability assessments ensured that the survey instrument was both methodologically sound and practically relevant.
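As a concrete illustration of the index definitions above, the I-CVI and S-CVI can be computed in a few lines of R; the expert ratings below are simulated, since the panel's raw ratings are not reported.

```r
# Illustrative I-CVI / S-CVI computation (simulated expert ratings;
# the study's actual panel data are not shown).
set.seed(42)
n_experts <- 6; n_items <- 58
ratings <- matrix(sample(2:4, n_experts * n_items, replace = TRUE),
                  nrow = n_experts)

relevant <- ratings >= 3     # ratings of 3 or 4 are deemed "relevant"
i_cvi <- colMeans(relevant)  # I-CVI: proportion of experts rating each item relevant
s_cvi <- mean(i_cvi)         # S-CVI/Ave: mean I-CVI across all 58 items

# Internal consistency and factorability would then be checked on the
# student responses, e.g. psych::alpha(responses) and psych::KMO(responses)
```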
Statistical method
The study employed a structured multi-phase analytical approach to validate the triadic framework using SEM. The analytical workflow followed three sequential phases: exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and structural path modeling. The first phase employed EFA using principal axis factoring with promax rotation to verify whether the ten ethical principles aligned with the five hypothesized assessment pipeline stages. Factor retention was guided by eigenvalues greater than 1.0, factor loadings above 0.6, and cross-loadings below 0.3. The KMO measure and Bartlett’s test of sphericity confirmed the adequacy of the dataset for factor analysis. In the second phase, CFA was conducted to evaluate the latent structure derived from EFA. A maximum likelihood estimator was employed to estimate factor loadings and construct relationships. Convergent validity was examined using composite reliability (CR) and average variance extracted (AVE), while discriminant validity was verified via heterotrait-monotrait (HTMT) ratios. Model fit was assessed through Chi-square goodness-of-fit (χ2/df), comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). The final phase involved structural path modeling to examine the hypothesized relationships between ethical principles, assessment pipeline stages, and pedagogical outcomes (Lim et al., 2023a, 2023b). Path coefficients were estimated using maximum likelihood estimation, and statistical significance was assessed through bias-corrected bootstrap confidence intervals (5000 resamples). Given the complexity of the model, no modifications were made to remove nonsignificant paths. All analyses were conducted using the lavaan package in R. Model specifications, latent constructs, and fit indices were systematically assessed to ensure empirical robustness. The analytic workflow adhered to established latent variable modeling guidelines, ensuring methodological rigor.
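As a minimal sketch of this three-phase workflow, assuming a data frame `survey_data` holding all 58 items (with the ethics items in illustrative column positions) and hypothetical indicator names such as `sdc1` (the study's actual codebook and full model syntax are not reproduced here), the analysis could be specified as follows:

```r
library(psych)   # EFA, KMO, Bartlett's test
library(lavaan)  # CFA and structural modeling

# `survey_data` is assumed to hold all 58 items; the first 50 columns are
# taken to be the ethics items (column positions are illustrative).
ethics_items <- survey_data[, 1:50]

# Phase 1: factorability checks and EFA (principal axis factoring, promax)
KMO(ethics_items)                                   # sampling adequacy
cortest.bartlett(cor(ethics_items), n = nrow(ethics_items))  # sphericity
efa <- fa(ethics_items, nfactors = 5, fm = "pa", rotate = "promax")
print(efa$loadings, cutoff = 0.3)                   # screen loadings and cross-loadings

# Phase 2: CFA of the EFA-derived structure (maximum likelihood)
cfa_model <- '
  SDC =~ sdc1 + sdc2 + sdc3   # system design and check
  DSS =~ dss1 + dss2 + dss3   # data stewardship and surveillance
  ACR =~ acr1 + acr2 + acr3   # assessment construction and rollout
  AA  =~ aa1  + aa2  + aa3    # assessment administration
  GE  =~ ge1  + ge2  + ge3    # grading and evaluation
'
cfa_fit <- cfa(cfa_model, data = survey_data, estimator = "ML")
fitMeasures(cfa_fit, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))

# Phase 3: structural paths to pedagogical outcomes, with bias-corrected
# bootstrap confidence intervals (5000 resamples)
sem_model <- paste(cfa_model, '
  SAT =~ sat1 + sat2          # e.g., learner satisfaction indicators
  SAT ~ SDC + DSS + ACR + AA + GE
')
sem_fit <- sem(sem_model, data = survey_data, se = "bootstrap", bootstrap = 5000)
parameterEstimates(sem_fit, boot.ci.type = "bca.simple", standardized = TRUE)
```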
Findings
Validation of triadic theoretical framework through SEM analysis
To rigorously validate the triadic theoretical framework, a multi-indicator, multi-factor SEM was constructed, aligning with methodological standards for latent variable analysis. This approach operationalized the five assessment pipeline stages as latent constructs, each measured through observed indicators representing ten ethical principles. The validation process systematically integrated EFA, CFA, and structural path modeling to evaluate construct validity, reliability, and hypothesized relationships.
Content validity was established through expert evaluation by a panel of six specialists in AI and educational assessments. The survey instrument achieved good content validity indices, with S-CVI ranging from 0.93 to 0.98, exceeding the recommended threshold of 0.90 (Jang et al., 2022). Internal consistency was further confirmed via Cronbach’s α = 0.96, indicating a high degree of reliability. These metrics collectively affirm the instrument’s capacity to capture the theoretical dimensions of the framework without redundancy or ambiguity.
EFA using principal axis factoring with promax rotation (κ = 4) was conducted to verify the alignment of ten ethical principles with five latent constructs. The analysis adhered to stringent retention criteria: eigenvalues > 1.0, factor loadings ≥ 0.6, and cross-loadings ≤ 0.3. The KMO measure (KMO = 0.91) and Bartlett’s test of sphericity (χ2/df = 2.64, p < 0.001) confirmed sampling adequacy and significant correlations. All ethical principles loaded distinctly onto their respective constructs, with no cross-loading violations, empirically substantiating the framework’s discriminant validity.
A confirmatory factor analysis (CFA) was subsequently conducted to evaluate the measurement model derived from EFA. The model demonstrated an acceptable global fit, as indicated by fit indices: χ2/df = 2.64 (acceptable < 5), CFI = 0.911, TLI = 0.897 (acceptable ≥ 0.90), RMSEA = 0.034, and SRMR = 0.058 (acceptable < 0.08). These values are consistent with commonly accepted benchmarks for good model fit (Abrahim et al., 2019; Hu & Bentler, 1999; Nikkhah et al., 2018).
Convergent validity was established through CR and AVE, with all constructs demonstrating CR ≥ 0.85 and AVE > 0.50 (Table 1). Discriminant validity was assessed via the HTMT ratio; all inter-construct HTMT values were below the conservative threshold of 0.85 (Table 2), supporting construct distinctiveness.
Table 1. Composite reliability (CR) and average variance extracted (AVE). SDC system design and check, DSS data stewardship and surveillance, ACR assessment construction and rollout, AA assessment administration, GE grading and evaluation
| Latent construct | CR | AVE |
|---|---|---|
| SDC | 0.91 | 0.66 |
| DSS | 0.89 | 0.63 |
| ACR | 0.92 | 0.67 |
| AA | 0.88 | 0.61 |
| GE | 0.90 | 0.65 |
Table 2. HTMT ratios for discriminant validity
| Constructs | SDC | DSS | ACR | AA | GE |
|---|---|---|---|---|---|
| SDC | – | 0.71 | 0.65 | 0.68 | 0.66 |
| DSS | 0.71 | – | 0.69 | 0.70 | 0.73 |
| ACR | 0.65 | 0.69 | – | 0.67 | 0.69 |
| AA | 0.68 | 0.70 | 0.67 | – | 0.72 |
| GE | 0.66 | 0.73 | 0.69 | 0.72 | – |
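As a sketch of how these validity indices could be reproduced (continuing the hypothetical `cfa_fit`, `cfa_model`, and `survey_data` objects from the methodology sketch), the semTools package provides helpers for both sets of statistics:

```r
# Convergent and discriminant validity indices via semTools (a sketch;
# object names follow the earlier hypothetical lavaan workflow).
library(semTools)

reliability(cfa_fit)              # composite reliability (omega) and AVE per construct
htmt(cfa_model, data = survey_data)  # heterotrait-monotrait ratio matrix
```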
The model results, as illustrated in Fig. 3, offer strong empirical support for hypotheses H1.1a through H2.5d, validating the triadic framework’s ability to predict critical educational outcomes in the context of AI-enabled assessments. Statistically significant standardized path coefficients (β) between the five latent constructs and the four pedagogical outcome variables confirm stage-specific effects of ethical principles across the AI-assisted assessment pipeline. Consistent with Chin (1998), the structural model demonstrates moderate to substantial explanatory power, accounting for 68% of the variance in sense of academic support, 61% in perceived learning efficacy, 74% in learner satisfaction, and 41% in perceived instructor presence.
[See PDF for image]
Fig. 3
Structural equation model
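The standardized coefficients and variance-explained figures reported here correspond to standard lavaan outputs; a brief sketch, continuing the hypothetical `sem_fit` object from the methodology sketch:

```r
# Extracting the quantities reported above from the fitted model (sketch)
standardizedSolution(sem_fit)    # standardized path coefficients (beta) with tests
lavInspect(sem_fit, "rsquare")   # R^2 for each pedagogical outcome variable
```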
The findings also shed light on the differentiated contributions of each latent construct. System design and check exhibited strong effects on both sense of academic support (β = 0.64, p < 0.05) and perceived learning efficacy (β = 0.62, p < 0.05). This suggests that inclusive, accountable, and explainable AI system architectures contribute significantly to students’ perceptions of support and the effectiveness of AI in enhancing their learning experience. The high factor loadings for inclusivity, accountability, explainability, and trust within this stage reinforce the importance of embedding ethical design principles from the inception of AI systems.
Analysis highlights that data stewardship and surveillance exerted a significant influence on perceived learning efficacy, with a path coefficient of 0.81 (p < 0.05). This suggests that students' confidence in AI-assisted assessments is strongly shaped by how ethically their data is handled. The factor structure associated with this stage indicates that privacy, auditability, inclusivity, accuracy, and explainability are the dominant ethical dimensions. These findings suggest the necessity for transparent and accountable AI-driven data governance to foster trust and enhance learning efficacy.
Assessment construction and rollout demonstrated substantial associations with sense of academic support (β = 0.75, p < 0.05). This result suggests that AI-driven assessment creation strengthens learners’ perception of engagement. The dominant ethical dimensions for this stage—explainability, accuracy, inclusivity, and trust—highlight the importance of transparent and fair AI-generated assessments in maintaining learner engagement levels.
Assessment administration had a notable impact on sense of academic support (β = 0.74, p < 0.05), suggesting that ethically designed AI-administered assessments reinforce students’ confidence in the assessment process. Academic integrity, inclusivity, explainability, accuracy and trust emerged as the key ethical principles associated with this stage, reflecting students’ expectations for AI-assisted assessments to uphold fairness and safeguard against misconduct.
Grading and evaluation exhibited the strongest relationships with learner satisfaction (β = 0.79, p < 0.05) and sense of academic support (β = 0.72, p < 0.05). These results stress the critical role of ethical grading mechanisms in shaping student satisfaction with AI-driven assessments. The factor loadings indicate that privacy, explainability, accuracy, inclusivity, and accountability are the primary ethical dimensions linked to grading. The findings suggest that students are particularly responsive to the ethical implications of AI-driven grading, especially regarding fairness, clarity in grading rationales, and safeguards against algorithmic bias.
This validation establishes the triadic framework as a theoretically grounded and robust model for ethical AI integration in education. SEM results confirm the triadic framework’s structural integrity, revealing ethical AI impacts as contextually mediated by assessment pipeline stages. The heterogeneity in effect magnitudes suggests that ethical priorities shift across stages in the assessment lifecycle. Practically, these findings suggest stage-specific ethical interventions, such as robust data protocols during stewardship, algorithmic transparency in grading, and academic integrity safeguards during administration.
Examination of relationships between assessment pipeline stages, ethics elements, output variables, and learner perceptions
The SEM analysis provided statistical confirmation of the relationships between ethical principles, assessment pipeline stages, and key pedagogical outcomes, while the qualitative findings revealed learner perceptions that underpin these relationships. The qualitative analysis was conducted systematically using thematic analysis. Key themes were constructed through iterative coding and refinement, supported by inter-coder reliability checks. Findings were triangulated with the SEM results to deepen contextual interpretation and to highlight not only prevailing sentiments but also divergent views where relevant.
Fairness: This ethical principle features moderately across all assessment stages. Most students believed that fairness is important: assessment systems should not favor specific students or prejudice marginalized groups through intentional or unintended profiling or labeling of students. Students were unsure whether AI systems would assess students from different backgrounds fairly, with one asking, “if AI systems are built based on certain [learner profiles], would they discriminate against others?” (R22). Tied to this issue are practical concerns about the quality of input data—“data from online sources can have issues with bias and relevance” (R331). AI systems, including their data inputs, should exhibit contextualization, objectivity, and cultural specificity. Students generally preferred a human-in-the-loop approach, citing the need for educators to “watch over” the AI system (R158, R259). Some students shared that “explaining clearly how AI systems work would help” (R46). Others tied fairness to trust, sharing that fairness would lead to greater trust and “acceptance of AI systems” (R25). Fair AI systems would provide learner satisfaction and a perceived sense of academic support.
Inclusivity: This ethics principle features highly across all assessment stages. Students overwhelmingly endorsed assessments and associated resources that foster diversity and equity. AI systems should be supportive and have unqualified “accessibility to all” (R33) (similar to results in Tlili et al., 2019). AI feedback should exhibit empathy and be “judgment-free and welcoming” (R56) (similar to results in Costas-Jauregui et al., 2021). Many students were uncertain about the possibility of AI systems achieving complete inclusivity, with one citing concerns about how “AI doesn’t really feel emotions or understand how to empathize with people” (R180). However, most students placed their trust in educational institutions to strive for it. Inclusive AI systems would provide learner satisfaction and a perceived sense of academic support.
Accountability: This ethics principle features highly in the system design and check stage, and to lesser extents, in the assessment construction and rollout, grading and evaluation, and data stewardship and surveillance stages. Many students believed that educational institutions, and mainly “the people in charge” (R62), should take responsibility for AI issues (similar to results in Kong et al., 2023 and Gedrimiene et al., 2020). “The creator of AI [assessment] systems are humans, not AI itself” (R29). Most students trusted that compliance with relevant educational guidelines and regulations should suffice, and that educational institutions would demonstrate accountability. The presence of accountability in AI systems would provide a perceived sense of academic support.
Accuracy: This ethics principle features highly at the data stewardship and surveillance stage. Data quality is of overwhelming importance to the students (similar to results in Tlili et al., 2018), as it supports all other stages of the assessment pipeline. While students would prefer having the option to opt in or out of data collection, they understood that “opting out may affect how representative the data is” (R56) and impact accuracy in AI-based assessments (similar to results in Berendt et al., 2020). Despite this, most students expressed confidence in AI's potential to enhance the validity and reliability of assessment instruments, mitigate prediction biases, and detect violations of assessment integrity. When appropriate data sources are leveraged, data-driven insights can enhance the objectivity and perceived credibility of assessments. Students noted that AI systems maintain consistent accuracy levels and do not succumb to human fatigue or emotional states: AI systems “don’t get tired when grading our assessments” (R47), whereas “sometimes we can sense impatience or irritation in the voice of lecturers when we receive [assessment] feedback, so we stop asking” (R10). Provided students cannot exploit AI systems to their advantage, they generally associate AI systems with accuracy, anticipating improved learning efficacy.
Auditability: This ethical principle features moderately across all assessment stages. Students generally believed that AI systems’ processes and decisions should be auditable by independent third parties; in fact, “[AI systems] should be accessible for everyone to check” (R386). Students believed it is important to review grading processes facilitated by AI “to catch any mistakes when grading students” (R290). Some students argued that AI responses may not always be “applicable” (R141, R348), “relevant” (R164, R260, R291) or “appropriate” (R99, R315). Since “assessment outcomes can affect [a student’s] learning, motivation and achievements”, there should be “some kind of regular audit trails and logging” (R02) to assess the actions and decisions of AI systems. Most students trusted that educational institutions “will take care of such check and balances” (R44). Auditable AI systems would provide a perceived sense of academic support.
Explainability: This ethics principle features highly across all assessment stages. Students believed that the actions and decisions of AI systems should be clear and explainable with “rationale” (R35, R159, R161, R322, R377) and “in layman terms” (R13). They stressed the importance of reliability and validity in assessment tools, as discrepancies or ambiguities stemming from unexplainable or unreliable AI actions can “affect trust on AI systems” (R48). “We need to clearly explain how the AI program makes decisions to understand its thought process and spot any errors.” (R253). This should include understanding the “input, model and output of AI systems” and “how they change over time” (R03). A lack of explainability may prompt students to “challenge grading” (R20) and increase instances of reevaluation (similar to results in Seo et al., 2021), leaving such AI systems more open to misunderstanding. Conversely, AI systems that prioritize user understanding and accessibility of information, grounded in pedagogical principles, foster a positive sense of perceived learning efficacy.
Privacy: This ethics principle features highly at the grading and evaluation stage, and moderately at all other stages. Students prioritized their autonomy in controlling and divulging their personal data (similar to results in Latham & Goltz, 2019), especially when tied to grading, evaluation, and academic achievement (similar to results in Berendt et al., 2020). Students preferred the flexibility to opt in or out of data collection processes. Nevertheless, they acknowledged the potential trade-off between privacy and the optimal delivery of educational experiences (similar to results in Tlili et al., 2019), showing a pragmatic rather than absolute stance on this issue. While students generally endorsed research-related behavioral surveillance “provided anonymity is preserved”, they expressed discomfort with “continuous educational monitoring” (R60), including intrusive methods of biometric data collection such as eye-tracking and facial expression analytics, echoing surveillance concerns raised in Seo et al. (2021). Many students also shared concerns about “digital attacks” (R110), and “data breaching and hacking” (R149). Students value data security, and a prevailing trust among learners was that educational institutions serve as a “protected space” (R34) safeguarding privacy. An academic environment that supports privacy can lead to learner satisfaction and provide a perceived sense of academic support.
Trust: This ethics principle features moderately in the assessment construction and rollout, system design and check, and data stewardship and surveillance stages, and to a lesser extent, in the assessment administration stage. Students were comfortable interfacing with AI systems and believed that students’ learning autonomy could be respected (in contrast to findings by Henne & Gstrein, 2023). They emphasized the importance of social interactions and human input, aligning with findings from Pontual Falcão et al. (2022), and Park and Kim (2020). While uncertain about AI's potential impact on reducing instructors' involvement in assessments, students expressed optimism that instructors can “focus on more meaningful assessable content” (R55). Students were neutral on whether AI auto-grading systems can fairly grade assessments across diverse learning backgrounds. There were concerns about AI’s ability to appropriately interpret assessment responses, and to personalize and respect the nuanced learning decisions of users. Students perceived that AI systems might lack contextual information and innately “human social interaction skills and emotions” (R39), and mistakenly adjudge an assessment response to be incorrect or a learning decision to be inappropriate (similar to results in Khairy et al., 2022). “Although AI can grade consistently, it might not fully understand the emotional aspects of a student's work” (R378). Students also expressed apprehension about a possible lack of recourse to “clarify misunderstandings when AI misinterprets [their] responses” (R44). Most students in this study displayed moderate trust in AI systems. A trustworthy human-in-the-loop AI system can lead to learner satisfaction and high perceived instructor presence.
Human centricity: This ethics principle features moderately in the assessment construction and rollout stage, and to lesser extents, in the system design and check and data stewardship and surveillance stages. Learners generally believed that AI systems can facilitate academic help-seeking behavior while preserving learning autonomy. The anonymity provided by AI may reduce self-consciousness, thereby “encouraging more questions” (R23) in self-regulated learning (similar to results in Adams et al., 2023). However, using AI for grading could reduce interactions and feedback between educators and students, which “might not be suitable for everyone” (R109). There was a slight concern about the potential diminishment of learning agency and ownership due to overreliance on AI systems. AI systems might foster a “false sense of security” (R42) about learning efficacy. “AI may be accepted too liberally [by students] in learning, and students might forget how to do simple things they learn, like how to reference sources” (R264). That said, these concerns were offset by the benefits of enhancing efficiency (similar to results in Adams et al., 2023), promoting creativity (in contrast with the ‘stifling of creativity’ in Adams et al., 2023), and providing just-in-time support (similar to results in Seo et al., 2021). Most learners perceived that human agency and dignity would be respected (in contrast with reducing humans to ‘objects’ in Berendt et al., 2020), and that AI systems could deliver a positive sense of academic support.
Academic Integrity: This ethical principle features highly in the assessment administration stage, and to lesser extents, in the assessment construction and rollout, and system design and check stages. Most students supported the notion of academic integrity for fair assessments, and believed it is crucial to incorporate features that prevent students from cheating in AI-based assessments. Cheating is considered a serious breach of academic integrity, and most students would be “scared” (R41) of being penalized for plagiarism. Students shared that they “try to submit early to check against Turnitin similarity detection software” (R61) to avoid unintended plagiarism and revise problematic segments (similar to results in Stone, 2023). However, most students expressed confusion and a lack of understanding about AI plagiarism detection, noting that it is “vague how AI plagiarism works” (R33). “Unlike standard similarity detection where we get to see the sources of plagiarism, AI plagiarism does not do that” (R31). In general, students were unsure whether AI systems can fairly and confidently identify cheating without falsely accusing honest students or being misled by cheating students. AI systems that can fairly and confidently identify cheating cases could deliver a positive sense of academic support.
The heterogeneous effects of ethical principles across different assessment stages suggest that ethical priorities are context-dependent. For instance, privacy concerns peak at the grading and evaluation stage, whereas accountability concerns are more prominent in the system design and data stewardship stages. These insights emphasize that a one-size-fits-all approach to AI ethics in education is insufficient; instead, a stage-sensitive ethical framework is necessary to enhance learner trust, satisfaction, and academic engagement.
Discussion
By reframing AI ethics as a stage-sensitive construct, the framework proposes moving beyond one-size-fits-all ethical guidelines toward governance systems that mirror the contextual fluidity of pedagogical practice. The study’s findings and their stage dependency are further synthesized through three interconnected lenses:
Balancing procedural and outcome fairness
This study finds that students associate fairness with transparency, explainability, and the absence of bias, yet concerns persist regarding AI’s ability to ensure fair grading, personalized assessments, and equitable data governance. The broader AI ethics literature recognizes two competing interpretations of fairness (Lee et al., 2019; Wang et al., 2025): (i) procedural fairness, which focuses on ensuring that AI systems follow standardized, transparent decision-making processes; and (ii) outcome fairness, which evaluates whether AI-generated assessment results are equitable across diverse student groups.
Findings suggest that students intuitively engage with both fairness concepts. While they appreciate consistent grading procedures, they also expect AI to adapt assessments based on individual needs rather than impose a one-size-fits-all approach. This mirrors debates in algorithmic fairness literature, where trade-offs exist between objectives of interest (Lee et al., 2021). In the context of assessments, procedural fairness typically demands standardization, while outcome fairness may require flexibility and contextual adaptation.
Structural path analysis reveals that these considerations are not mutually exclusive but stage-dependent. The significant relationship between system design and academic support (β = 0.64, p < 0.05) is associated with procedural fairness, wherein students perceive ethically designed AI systems, characterized by inclusivity, accountability, explainability, and trust, as fundamental to ensuring fair assessment processes. Procedural fairness at this stage reflects the integrity of AI system architecture, ensuring transparency and accountability before assessments take place. The qualitative findings reinforce this, as students emphasized the importance of clear guidelines, human oversight, and transparent AI decision-making to uphold fairness in assessment design. In contrast, grading and evaluation exhibited the strongest association with learner satisfaction (β = 0.79, p < 0.05), reflecting a manifestation of outcome fairness. Here, students’ satisfaction hinges on the perception that AI-generated grades are equitable, unbiased, and aligned with expected performance outcomes. The high factor loadings for privacy, explainability, accuracy, inclusivity, and accountability suggest that students expect grading systems to be both transparent and free from algorithmic biases. This aligns with existing research emphasizing the necessity of clear grading rationales and safeguards against algorithmic discrimination in AI-driven evaluations (Dar et al., 2025). Students’ qualitative responses corroborate this, with concerns that opaque AI grading processes may erode trust and prompt challenges to assessment outcomes.
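To make such stage-level path estimation concrete, the sketch below shows how structural paths of this kind could be specified and fitted in Python with the semopy library. The model syntax mirrors the kinds of relationships reported above, but all column names (e.g., sys_design1, grading1) and the data file are hypothetical placeholders, not the study’s actual survey instrument.

```python
# Minimal SEM sketch with semopy; variable names are illustrative placeholders.
import pandas as pd
import semopy

model_desc = """
# Measurement model: latent stages and outcomes from observed survey items
SystemDesign    =~ sys_design1 + sys_design2 + sys_design3
Grading         =~ grading1 + grading2 + grading3
AcademicSupport =~ acad_support1 + acad_support2 + acad_support3
Satisfaction    =~ satisfaction1 + satisfaction2 + satisfaction3

# Structural model: stage-dependent paths analogous to those discussed
AcademicSupport ~ SystemDesign
Satisfaction    ~ Grading
"""

data = pd.read_csv("survey_responses.csv")  # hypothetical data file
model = semopy.Model(model_desc)
model.fit(data)
print(model.inspect(std_est=True))  # standardized estimates and p-values
```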
Auditability, which emerged as an important ethical dimension in this study, offers a potential solution to the fairness dilemma. If AI assessment tools are designed with robust audit trails, educators and students can trace grading decisions and challenge AI-generated outcomes where necessary. Research by Schaeffer et al. (2024) highlights that third-party auditing of AI systems can improve trust while ensuring fair and accountable decision-making. Educational institutions may establish independent AI oversight mechanisms that allow students and educators to request audits of AI-graded assessments. Furthermore, the development of explainable AI (XAI) models, where AI-generated decisions include rationale for grades and assessment feedback, would align with student expectations for greater transparency and trust.
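A minimal sketch of what such an audit trail might look like follows; the GradingAuditRecord structure and its field names are illustrative assumptions rather than a description of any deployed system.

```python
# Illustrative audit-trail record for AI-assisted grading decisions.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

@dataclass
class GradingAuditRecord:
    submission_id: str
    model_version: str       # which model produced the grade
    rubric_id: str           # rubric the grade was produced against
    raw_score: float
    rationale: str           # human-readable explanation ("why this grade?")
    reviewed_by_human: bool  # human-in-the-loop override flag
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def log_decision(record: GradingAuditRecord, path: str = "audit_log.jsonl") -> None:
    """Append one line-delimited JSON record per grading decision."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

Append-only, line-delimited records of this kind are what would allow educators, students, or third-party auditors to trace and contest individual grading outcomes after the fact.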
Inclusivity and educational inequalities
This study finds that while students recognize the potential to design AI systems that enhance learning efficacy (β = 0.62, p < 0.05), skepticism remains, particularly among those from non-dominant linguistic and socioeconomic backgrounds. Many students are concerned that AI systems may lack the contextual awareness needed to fairly assess diverse learners (Khairy et al., 2022). This finding aligns with broader critiques of AI-driven “automated inequality” (Eubanks, 2018) and challenges the techno-optimist view that AI can improve education without deeper systemic interventions (Danaher, 2022).
The structural model identifies two ways AI may worsen educational disparities. First, bias in training data disadvantages students who do not fit “hierarchical status quo” patterns (Cheuk, 2021). For example, AI grading systems often penalize dialectal variations, reinforcing standardized norms (Markl, 2022). Many students in this study expressed concerns that AI lacks the ability to recognize diverse learning styles. Second, AI-driven feedback loops can create self-reinforcing disadvantages. Proactive students, who interact more with an AI system, can significantly influence a system to suit their needs better over time at the expense of other students (Chinta et al., 2024). The strong association between data stewardship and perceived learning efficacy (β = 0.81, p < 0.05) suggests that these biases are already shaping student outcomes.
However, some recent studies suggest AI can reduce achievement gaps if designed with inclusivity in mind. AI-driven accommodations have been shown to help students at a middle school summer camp when Universal Design for Learning (UDL) principles are applied (Song et al., 2024). Similarly, NLP tools promoted inclusive learning for students with disabilities (Alvarado et al., 2023). These successes, however, depend on two often-overlooked prerequisites: (i) participatory design involving marginalized students in AI development phases to surface edge cases and implicit biases (Andrews et al., 2023); and (ii) dataset curation with continuous incorporation of underrepresented data archetypes, as static “diverse” datasets rapidly obsolesce (Orr & Crawford, 2024). Yet most students were not aware of AI bias audits in their schools, and qualitative responses indicate a lack of trust in AI’s ability to handle diverse learning needs. The high factor loading of inclusivity (≥ 0.85, p < 0.001) suggests that students expect inclusive treatment at every stage of assessment.
Toward stage-gated ethical governance
The study’s findings suggest the application of ethical maturity models—governance frameworks that calibrate safeguards according to the risks and stakes at each assessment stage. Privacy, for instance, holds greater significance in grading (0.86, p < 0.001) compared to assessment construction (0.78, p < 0.001), reflecting a pattern similar to cybersecurity’s layered defense models, where ethical safeguards may escalate in tandem with system criticality (Pallagi et al., 2023). Formative AI tools used for quiz design, which pose lower stakes, may prioritize explainability, while summative grading systems, with their high-stakes implications, require rigorous bias auditing and human oversight. Upholding ethical universalism (O’Neill, 1998), this stage-gated approach extends the EU AI Act’s risk-based classification (European Union, 2024) by integrating pedagogical specificity.
Implementing this staged maturity model in real-world educational settings presents challenges. Institutional inertia remains a primary barrier, as many universities integrate AI ethics reactively rather than proactively redesigning workflows to accommodate ethical AI systems (Ojha et al., 2025). Additionally, resource disparities pose a fundamental challenge, as maturity models presuppose institutional capacity to enforce staged compliance (Schiff, 2022). Temporal alignment will also need to be established between agile, iterative AI systems development lifecycle and academic AI governance structures (Sayles, 2024). A potential solution may be a phased implementation model, beginning with mandatory documentation of AI systems’ data sources and error rates, progressing to stage-specific risk assessments and mitigation strategies, rounded off by independent audits that assess stakeholder impacts. This flexible approach allows educational institutions to scale ethical safeguards according to their operational capacity while ensuring foundational accountability measures are in place.
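The sketch below illustrates, under stated assumptions, how a stage-gated safeguard registry might encode such escalating requirements. The stage names follow the paper’s assessment pipeline, but the specific risk tiers and required safeguards are hypothetical choices made for illustration.

```python
# Sketch of a stage-gated safeguard registry; tiers are illustrative assumptions.
STAGE_SAFEGUARDS = {
    "system_design":           {"risk": "medium",
                                "required": ["documentation", "fairness_review"]},
    "data_stewardship":        {"risk": "medium",
                                "required": ["documentation", "privacy_impact_assessment"]},
    "assessment_construction": {"risk": "low",
                                "required": ["documentation", "explainability_check"]},
    "administration":          {"risk": "medium",
                                "required": ["documentation", "proctoring_transparency"]},
    "grading_evaluation":      {"risk": "high",
                                "required": ["documentation", "bias_audit",
                                             "human_oversight", "independent_audit"]},
}

def gate(stage: str, completed: set) -> bool:
    """Allow a stage to proceed only if all its required safeguards are complete."""
    return set(STAGE_SAFEGUARDS[stage]["required"]) <= completed

# e.g., gate("grading_evaluation", {"documentation", "bias_audit"}) -> False
```

Encoding the requirements declaratively in this way matches the phased model described above: institutions could begin with documentation alone and add stage-specific entries as capacity grows.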
SEM analysis in this study surfaced opportunities to refine the original triadic framework, particularly in terms of how ethical principles are emphasized across different stages of the assessment pipeline. Unlike earlier iterations grounded primarily in literature synthesis, the current refinements are informed directly by student perceptions derived from empirical data. While the framework revision may appear modest, this bottom-up refinement foregrounds the perspectives of end-users—students—whose experiences with AI systems are too often overlooked in current governance models (e.g., Jang et al., 2022).
Figure 4 presents the revised framework and illustrates how a staged maturity model might be applied to guide the implementation of ethical AI practices in educational assessment.
[See PDF for image]
Fig. 4
Revised triadic AIED assessment framework based on learner perceptions
To support interpretive clarity, we introduce a heuristic classification of ethical principles based on their standardized factor loadings within each stage. Specifically, principles with loadings above 0.85 are designated as primary considerations, those between 0.80 and 0.85 as secondary, and those below 0.80 as supporting (but less salient) concerns. This categorization is not intended to suggest fixed ethical hierarchies or prescriptive prioritizations, but rather to provide a transparent, empirically guided lens on how learners perceive the relative importance of different ethical dimensions. The thresholds are grounded in psychometric reasoning, whereby higher factor loadings signal stronger empirical alignment between observed indicators and their associated latent constructs (Knekta et al., 2019). However, we emphasize that this schema is context-sensitive and open to reinterpretation, particularly across diverse educational, cultural, or technological settings.
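For transparency, the heuristic can be stated directly in code. The cut-points come from the schema above; the function itself is merely an illustrative encoding, not part of the validated framework.

```python
# Direct encoding of the heuristic tiers described in the text.
def classify_loading(loading: float) -> str:
    """Map a standardized factor loading to the paper's heuristic tiers."""
    if loading > 0.85:
        return "primary"
    elif loading >= 0.80:
        return "secondary"
    else:
        return "supporting"

# e.g., classify_loading(0.86) -> "primary"; classify_loading(0.78) -> "supporting"
```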
Implications, limitations and conclusion
Implications
From a pedagogical standpoint, this study emphasizes the importance of embedding ethical considerations—particularly inclusivity and explainability—into the design and implementation of AI-assisted assessments. These two principles emerged as statistically salient across all assessment stages (factor loadings ≥ 0.85), highlighting their central role in students’ perceptions of fairness and trust. To operationalize these findings, educators should adopt adaptive assessment formats that account for diverse linguistic and cognitive profiles, and implement student-facing feedback dashboards that translate AI-generated outputs into plain language. Explainability, in particular, should be reinforced through transparent AI workflows, allowing students to understand how their inputs are evaluated and how final grades are determined. To this end, auditability features, such as traceable decision logs and "why this grade?" explanation prompts, should be incorporated to enhance perceived fairness and foster learner confidence. These pedagogical design elements directly reflect students’ calls for both procedural and outcome fairness.
From a technological perspective, the study highlights the need for stage-specific design responses that embed ethical safeguards at each point in the AI assessment pipeline. During system design, this includes implementing fairness-aware algorithms, drawing from inclusive training datasets, and conducting bias audits during model development. During grading and evaluation, more stringent interventions are required, such as integrating bias detection modules, enforcing accuracy thresholds, and enabling human-in-the-loop overrides where needed.
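As one hedged illustration of such a human-in-the-loop override, the sketch below routes low-confidence AI grades to a human grader. The confidence threshold and the routing fields are assumptions chosen for illustration, not values prescribed by the study.

```python
# Illustrative human-in-the-loop gate for AI grading.
def route_grade(ai_score: float, ai_confidence: float,
                threshold: float = 0.90) -> dict:
    """Accept high-confidence AI grades; route the rest to a human grader."""
    if ai_confidence >= threshold:
        return {"score": ai_score, "source": "ai", "needs_human_review": False}
    return {"score": None, "source": "pending", "needs_human_review": True}

# e.g., route_grade(82.5, 0.72) -> {'score': None, 'source': 'pending',
#                                   'needs_human_review': True}
```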
In addition, the study emphasizes the importance of XAI to ensure output transparency. Developers should adopt established interpretability frameworks such as LIME or SHAP, and design student-readable model explanations within user interfaces. Recent research (e.g., Yu et al., 2023) demonstrates that generative AI systems enhanced with human feedback can foster inclusive, empathetic learner interactions, though such potential must be continually reassessed through iterative refinement and performance monitoring, especially in high-stakes environments.
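The following sketch illustrates how per-feature explanations could be generated with the SHAP library for a hypothetical numeric grading model; the feature names and the regressor are placeholders, not any system evaluated in this study.

```python
# Minimal SHAP sketch for a hypothetical grading model; all names are illustrative.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "rubric_coverage":  rng.uniform(0, 1, 200),
    "argument_depth":   rng.uniform(0, 1, 200),
    "citation_quality": rng.uniform(0, 1, 200),
})
y = 40 * X["rubric_coverage"] + 35 * X["argument_depth"] + 25 * X["citation_quality"]

model = GradientBoostingRegressor().fit(X, y)
explainer = shap.Explainer(model, X)   # selects a suitable explainer for the model
shap_values = explainer(X.iloc[:5])    # per-feature contributions to each grade
print(shap_values.values)              # raw material for student-readable feedback
```

Per-feature attributions of this kind are the raw material that a student-facing dashboard could translate into plain-language grade explanations.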
From a policy and governance standpoint, the study advocates for the creation of stage-gated regulatory frameworks that align ethical oversight with the level of assessment criticality. Drawing from student-reported concerns regarding fairness, bias, and opacity, the study proposes the institutional adoption of algorithmic impact assessments (AIAs), real-time audit mechanisms, and standardized reporting protocols for AI-driven grading tools. These governance strategies are particularly vital during high-stakes summative assessments, where the risk of harm is most acute.
In tandem, the study highlights the urgent need for professional development programs that prepare educators and administrators for the ethical challenges of AI integration. The study recommends a tiered development framework, progressing from introductory workshops on AI ethics to advanced certifications in ethical AI implementation and oversight. These programs should equip stakeholders with the tools to critically evaluate AI outputs, interpret ethical risks, and engage meaningfully in AI procurement and policy decisions.
Limitations
This study, while providing useful insights into the ethical considerations of AI in educational assessments, is not without limitations. First, the reliance on learner perceptions as the primary data source may introduce a degree of subjectivity. While these perceptions are invaluable for understanding user experience, they may not capture the full spectrum of ethical complexities in AI-driven assessment systems, which could be further illuminated by the perspectives of educators, educational administrators, AI system designers and developers, and the public, among others. Furthermore, learners' understanding of AI technologies and their underlying principles may vary, potentially influencing their responses and the subsequent analysis. There is also room to expand the study beyond its sample of 397 participants.
Another important limitation lies in the scope of the study. The research was conducted primarily within higher education settings, which may constrain the applicability of its findings to other educational contexts, such as K-12 institutions or vocational training environments. Furthermore, the cultural and geographical diversity of the participant sample was limited, potentially impacting the universality of the insights derived. Ethical considerations in AI-enabled educational assessments are deeply contextual and may vary significantly across cultural, institutional, and societal environments. Values such as fairness, privacy, and accountability are shaped by local norms, pedagogical traditions, and governance structures. While this study offers a foundational framework validated in a specific regional context, we explicitly caution against broad generalizations. Instead, we emphasize the importance of ethical pluralism—the recognition that multiple, context-sensitive ethical interpretations may coexist. Future research should aim to conduct cross-contextual validations to explore how ethical priorities shift across diverse educational systems and sociocultural landscapes.
Conclusion
This study examines the ethical dimensions across the AI-driven educational assessment pipeline through the lens of pedagogy, technology, and policy, by positioning AI as an ethics-driven sociotechnical system.
From a pedagogical standpoint, this research emphasizes the need for educators and AI developers to prioritize ethical considerations, advocating for, among others, inclusivity and explainability in AI-driven assessments to cater to diverse learner needs. Technologically, the study sheds light on the necessity of developing AI systems with an inherent focus on stage-gated ethical governance. On the policy front, it calls for comprehensive frameworks and professional development programs to guide and inform the ethical use of AI in education.
However, the study acknowledges its limitations, including potential subjectivity in learner perceptions and the limited scope in terms of participant diversity and educational contexts. These limitations highlight the need for a broader, more inclusive approach in future research.
Aside from work addressing the limitations identified above, suggestions for future research include the following:
Conducting longitudinal studies could offer deeper insights into how perceptions and impacts of AI in educational assessments evolve over time, particularly as technology and educational practices continue to advance.
Investigating learners' depth of understanding of AI technologies and principles would provide valuable context to their perceptions and experiences. This could involve assessing their knowledge base and how it influences their views on ethical issues.
Adopting interdisciplinary research methodologies that bring together expertise from education, technology, ethics, and policy studies could yield more holistic and nuanced insights into the ethical dimensions of AI in education.
There is also the possibility of creating a dynamic network model that visualizes and analyzes the interrelationships among ethical principles such as inclusivity, fairness, and accountability. This model would aim to guide the ongoing development, deployment, and evaluation of AI systems by representing these principles as nodes in a network graph, capturing their interactions through edges, and dynamically adjusting to real-time data and feedback to maintain ethical integrity throughout the AI lifecycle; a conceptual sketch follows this list.
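A conceptual sketch of such a network model, using the networkx library, appears below. The principle names are drawn from the study, but all edge weights and the update rule are illustrative assumptions about how a real system might incorporate live feedback.

```python
# Conceptual sketch of an ethics network: principles as nodes, interrelationships
# as weighted edges. Edge weights are illustrative placeholders.
import networkx as nx

G = nx.Graph()
principles = ["inclusivity", "fairness", "accountability",
              "privacy", "explainability", "trust"]
G.add_nodes_from(principles)

# Hypothetical interaction strengths between principles
G.add_weighted_edges_from([
    ("inclusivity", "fairness", 0.8),
    ("fairness", "accountability", 0.7),
    ("explainability", "trust", 0.9),
    ("privacy", "trust", 0.6),
])

def update_edge(g: nx.Graph, u: str, v: str, signal: float, lr: float = 0.1) -> None:
    """Nudge an edge weight toward new feedback (a simple moving-average update)."""
    w = g[u][v]["weight"]
    g[u][v]["weight"] = (1 - lr) * w + lr * signal

update_edge(G, "explainability", "trust", signal=0.95)  # incorporate new feedback
print(nx.degree_centrality(G))  # which principles are most interconnected
```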
While this study offers significant insights into the ethical dimensions of AI in educational assessments, it also opens the door to a range of research possibilities. These future research directions can help guide the responsible and effective integration of AI in educational settings. By centering ethics as a non-negotiable pillar of innovation, stakeholders can harness AI’s potential to enhance assessment practices while safeguarding the dignity, agency, and diversity of learners.
Acknowledgements
Not applicable.
Author contributions
All parties—TL, SG, MC—have made substantive contributions to the article. All authors read and approved the final manuscript.
Funding
No funding was received for conducting this study.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare that they have no competing interests.
Abbreviations
AIED: Artificial intelligence in education
AVE: Average variance extracted
AWE: Automated writing evaluation
CFA: Confirmatory factor analysis
CFI: Comparative fit index
CR: Composite reliability
CVI: Content validity index
EFA: Exploratory factor analysis
HTMT: Heterotrait-monotrait ratio
I-CVI: Item-level content validity index
ITS: Intelligent tutoring system
KMO: Kaiser–Meyer–Olkin
PLS: Personalized learning system
R&D: Research and development
RMSEA: Root mean square error of approximation
S-CVI: Scale-level content validity index
SEM: Structural equation modeling
SRMR: Standardized root mean square residual
TLI: Tucker-Lewis index
UDL: Universal design for learning
XAI: Explainable AI
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Abrahim, S; Mir, BA; Suhara, H; Mohamed, FA; Sato, M. Structural equation modeling and confirmatory factor analysis of social media use and education. International Journal of Educational Technology in Higher Education; 2019; 16,
Adams, D; Chuah, KM; Devadason, E; Azzis, MSA. From novice to navigator: Students’ academic help-seeking behaviour, readiness, and perceived usefulness of ChatGPT in learning. Education and Information Technologies; 2023; [DOI: https://dx.doi.org/10.1007/s10639-023-12427-8]
Alvarado, Y; Guerrero, R; Serón, F. Inclusive learning through immersive virtual reality and semantic embodied conversational agent: A case study in children with autism. Journal of Computer Science and Technology; 2023; 23,
Amugongo, LM; Kriebitz, A; Boch, A et al. Operationalising AI ethics through the agile software development lifecycle: A case study of AI-enabled mobile health applications. AI Ethics; 2025; 5,
Anagnostopoulos, T., Kytagias, C., Xanthopoulos, T., Georgakopoulos, I., Salmon, I., & Psaromiligkos, Y. (2020). Intelligent predictive analytics for identifying students at risk of failure in Moodle courses. In Intelligent Tutoring Systems: 16th International Conference, ITS 2020, Athens, Greece, June 8–12, 2020, Proceedings 16 (pp. 152–162). Springer International Publishing. https://doi.org/10.1007/978-3-030-49663-0_19
Andrews, J; Zhao, D; Thong, W; Modas, A; Papakyriakopoulos, O; Xiang, A. Ethical considerations for responsible data curation. Advances in Neural Information Processing Systems; 2023; 36, pp. 55320-55360.
Ashok, M; Madan, R; Joha, A; Sivarajah, U. Ethical framework for artificial intelligence and digital technologies. International Journal of Information Management; 2022; 62, 102433. [DOI: https://dx.doi.org/10.1016/j.ijinfomgt.2021.102433]
Barnes, T., Danish, J., Finkelstein, S., Molvig, O., Burriss, S., Humburg, M., Reichert, H., & Limke, A. (2024). Toward ethical and just AI in education research. Education Development Center, Inc., Community for Advancing Discovery Research in Education (CADRE). https://cadrek12.org/sites/default/files/2024-06/CADRE-Brief-AI-Ethics-2024.pdf. Accessed 15 Apr 2025.
Berendt, B; Littlejohn, A; Blakemore, M. AI in education: Learner choice and fundamental rights. Learning, Media and Technology; 2020; 45,
Bolick, AD; da Silva, RL. Exploring artificial intelligence tools and their potential impact to instructional design workflows and organizational systems. TechTrends; 2024; 68,
Boom, J; Molenaar, PCM. A developmental model of hierarchical stage structure in objective moral judgements. Developmental Review; 1989; 9,
Braun, V; Clarke, V. Using thematic analysis in psychology. Qualitative Research in Psychology; 2006; 3,
Casas-Roma, J; Conesa, J. Caballé, S; Demetriadis, SN; Weinberger, A. A literature review on artificial intelligence and ethics in online learning. Intelligent systems and learning data analytics in online education; 2021; Academic Press: pp. 111-131. [DOI: https://dx.doi.org/10.1016/B978-0-12-823410-5.00006-1]
Cheuk, T. Can AI be racist? Color-evasiveness in the application of machine learning to science assessments. Science Education; 2021; 105,
Chin, WW. The partial least squares approach to structural equation modeling. Modern Methods for Business Research; 1998; 295,
Chinta, S. V., Wang, Z., Yin, Z., Hoang, N., Gonzalez, M., Quy, T. L., & Zhang, W. (2024). FairAIED: Navigating fairness, bias, and ethics in educational AI applications. arXiv preprint arXiv:2407.18745
Chounta, IA; Bardone, E; Raudsep, A; Pedaste, M. Exploring teachers’ perceptions of artificial intelligence as a tool to support their practice in Estonian K-12 education. International Journal of Artificial Intelligence in Education; 2022; 32,
Corrêa, NK; Galvão, C; Santos, JW; Del Pino, C; Pinto, EP; Barbosa, C; de Oliveira, N. Worldwide AI ethics: A review of 200 guidelines and recommendations for AI governance. Patterns; 2023; [DOI: https://dx.doi.org/10.1016/j.patter.2023.100857]
Costas-Jauregui, V., Oyelere, S. S., Caussin-Torrez, B., Barros-Gavilanes, G., Agbo, F. J., Toivonen, T., Motz, R., & Tenesaca, J. B. (2021, October). Descriptive analytics dashboard for an inclusive learning environment. In 2021 IEEE Frontiers in Education Conference (FIE) (pp. 1–9). IEEE. https://doi.org/10.1109/FIE49875.2021.9637388
Dai, CP; Ke, F. Educational applications of artificial intelligence in simulation-based learning: A systematic mapping review. Computers and Education: Artificial Intelligence; 2022; [DOI: https://dx.doi.org/10.1016/j.caeai.2022.100087]
Danaher, J. Techno-optimism: An analysis, an evaluation and a modest defence. Philosophy & Technology; 2022; 35,
Dar, AA; Yadav, SS; Tripathi, RK; Albalawi, O; Jain, A; Gautam, PL. Moreira, FT; Teles, RO. Managing ethical challenges: ensuring equity and integrity in AI-powered assessment. Improving student assessment with emerging AI tools; 2025; IGI Global Scientific Publishing: pp. 301-332.
Dede, C. Etemadi, A., & Forshaw, T. (2021). Intelligence augmentation: Upskilling humans to complement AI [White Paper]. The Next Level Lab, Harvard Graduate School of Education. https://pz.harvard.edu/sites/default/files/Intelligence%20Augmentation-%20Upskilling%20Humans%20to%20Complement%20AI.pdf. Accessed 15 Apr 2025.
Ding, L; Zou, D. Automated writing evaluation systems: A systematic review of Grammarly, Pigai, and Criterion with a perspective on future directions in the age of generative artificial intelligence. Education and Information Technologies; 2024; [DOI: https://dx.doi.org/10.1007/s10639-023-12402-3]
Dolata, M; Feuerriegel, S; Schwabe, G. A sociotechnical view of algorithmic fairness. Information Systems Journal; 2022; 32,
Duin, AH; Tham, J. The current state of analytics: Implications for learning management system (LMS) use in writing pedagogy. Computers and Composition; 2020; 55, 102544. [DOI: https://dx.doi.org/10.1016/j.compcom.2020.102544]
Elshafey, A. E., Anany, M. R., Mohamed, A. S., Sakr, N., & Aly, S. G. (2021). Dr. Proctor: A multi-modal AI-based platform for remote proctoring in education. In International Conference on Artificial Intelligence in Education (pp. 145–150). Springer, Cham. https://doi.org/10.1007/978-3-030-78270-2_26
Eubanks, V. Automating inequality: How high-tech tools profile, police, and punish the poor; 2018; St. Martin's Press.
European Commission: Directorate-General for Education, Youth, Sport and Culture. (2022). Ethical guidelines on the use of artificial intelligence (AI) and data in teaching and learning for educators. Publications Office of the European Union. https://data.europa.eu/doi/10.2766/153756
European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending regulations (EC) no 300/2008, (EU) no 167/2013, (EU) no 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144 and directives 2014/90/EU, (EU) 2016/797 and (EU) 2020/182 (Artificial Intelligence Act). Official Journal of the European Union, L series, 1–144, 12.7.2024. https://eur-lex.europa.eu/eli/reg/2024/1689/oj/eng. Accessed 15 Apr 2025.
Fariani, RI; Junus, K; Santoso, HB. A systematic literature review on personalised learning in the higher education context. Technology, Knowledge and Learning; 2023; 28,
Foucault, M. Discipline and punish: The birth of the prison (A. Sheridan, Trans.); 1977; Pantheon Books:
Gedrimiene, E; Silvola, A; Pursiainen, J; Rusanen, J; Muukkonen, H. Learning analytics in education: Literature review and case examples from vocational education. Scandinavian Journal of Educational Research; 2020; 64,
González-Calatayud, V; Prendes-Espinosa, P; Roig-Vila, R. Artificial intelligence for student assessment: A systematic review. Applied Sciences; 2021; 11,
Hakami, E., & Hernández Leo, D. (2020). How are learning analytics considering the societal values of fairness, accountability, transparency and human well-being: A literature review. In Martínez-Monés A, Álvarez A, Caeiro-Rodríguez M, Dimitriadis Y (Eds.), Learning Analytics Summer Institute Spain 2020 (pp. 121–41). Aachen: CEUR. Valladolid, Spain. https://ceur-ws.org/Vol-2671/paper12.pdf. Accessed 15 Apr 2025.
Hargreaves, S. (2023). ‘Words Are Flowing Out Like Endless Rain Into a Paper Cup’: ChatGPT & Law School Assessments. The Chinese University of Hong Kong Faculty of Law Research Paper, (2023–03). https://doi.org/10.2139/ssrn.4359407
Henne, T; Gstrein, OJ. Zwitter, A; Gstrein, OJ. Governing the ‘datafied’ school: Bridging the divergence between universal education and student autonomy. Handbook on the politics and governance of big data and artificial intelligence; 2023; Elgar: pp. 395-427. [DOI: https://dx.doi.org/10.4337/9781800887374.00025]
Holmes, W; Porayska-Pomsta, K; Holstein, K; Sutherland, E; Baker, T; Shum, SB; Santos, OC; Rodrigo, MT; Cukurova, M; Bittencourt, II; Koedinger, KR. Ethics of AI in education: Towards a community-wide framework. International Journal of Artificial Intelligence in Education; 2021; [DOI: https://dx.doi.org/10.1007/s40593-021-00239-1]
Hong, Y., Nguyen, A., Dang, B., & Nguyen, B. P. T. (2022, July). Data Ethics Framework for Artificial Intelligence in Education (AIED). In 2022 International Conference on Advanced Learning Technologies (ICALT) (pp. 297–301). IEEE. https://doi.org/10.1109/ICALT55010.2022.00095
Hu, L; Bentler, PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal; 1999; 6,
Huang, X; Zou, D; Cheng, G; Chen, X; Xie, H. Trends, research issues and applications of artificial intelligence in language education. Educational Technology & Society; 2023; 26,
Jang, Y; Choi, S; Kim, H. Development and validation of an instrument to measure undergraduate students’ attitudes toward the ethics of artificial intelligence (AT-EAI) and analysis of its difference by gender and experience of AI education. Education and Information Technologies; 2022; 27,
Kashive, N; Powale, L; Kashive, K. Understanding user perception toward artificial intelligence (AI) enabled e-learning. The International Journal of Information and Learning Technology; 2020; 38,
Khairy, D; Alkhalaf, S; Areed, MF; Amasha, MA; Abougalala, RA. An algorithm for providing adaptive behavior to humanoid robot in oral assessment. International Journal of Advanced Computer Science and Applications; 2022; [DOI: https://dx.doi.org/10.14569/IJACSA.2022.01309119]
Kiennert, C; De Vos, N; Knockaert, M; Garcia-Alfaro, J. The influence of conception paradigms on data protection in e-learning platforms: A case study. IEEE Access; 2019; 7, pp. 64110-64119. [DOI: https://dx.doi.org/10.1109/ACCESS.2019.2915275]
Kleanthous, S; Kasinidou, M; Barlas, P; Otterbacher, J. Perception of fairness in algorithmic decisions: Future developers' perspective. Patterns; 2022; [DOI: https://dx.doi.org/10.1016/j.patter.2021.100380]
Knekta, E; Runyon, C; Eddy, S. One size doesn’t fit all: Using factor analysis to gather validity evidence when using surveys in your research. CBE Life Sciences Education; 2019; 18,
Kong, SC; Cheung, WMY; Zhang, G. Evaluating an artificial intelligence literacy programme for developing university students’ conceptual understanding, literacy, empowerment and ethical awareness. Educational Technology & Society; 2023; 26,
Latham, A., & Goltz, S. (2019, June). A survey of the general public’s views on the ethics of using AI in education. In International Conference on Artificial Intelligence in Education (pp. 194–206). Springer, Cham. https://doi.org/10.1007/978-3-030-23204-7_17
Lee, J., & Soylu, M. Y. (2023). ChatGPT and Assessment in Higher Education [White Paper]. Center for 21st Century Universities, Division of Lifetime Learning, Georgia Institute of Technology. https://c21u.gatech.edu/papers/chatgpt-and-assessment-higher-education. Accessed 15 Apr 2025.
Lee, M. K., Jain, A., Cha, H. J., Ojha, S., & Kusbit, D. (2019). Procedural justice in algorithmic fairness: Leveraging transparency and outcome control for fair algorithmic mediation. Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), 1–26. https://doi.org/10.1145/3359284
Lee, MSA; Floridi, L; Singh, J. Formalising trade-offs beyond algorithmic fairness: Lessons from ethical philosophy and welfare economics. AI and Ethics; 2021; 1,
Li, S; Gu, X. A risk framework for human-centered artificial intelligence in education. Educational Technology & Society; 2023; 26,
Lim, T., Gottipati, S., & Cheong, M. (2022). Authentic Assessments for Digital Education: Learning Technologies Shaping Assessment Practices. Proceedings of the 30th International Conference on Computers in Education (ICCE 2022). 1, p. 587–592. Kuala Lumpur, Malaysia. ISBN: 978-986-972-149-3. https://icce2022.apsce.net/uploads/P1_C7_91.pdf. Accessed 15 Apr 2025.
Lim, T; Gottipati, S; Cheong, M. Keengwe, J. Ethical considerations for artificial intelligence in educational assessments. Creative AI tools and ethical implications in teaching and learning; 2023; IGI Global: pp. 32-79. [DOI: https://dx.doi.org/10.4018/979-8-3693-0205-7.ch003]
Lim, T; Gottipati, S; Cheong, M; Ng, JW; Pang, C. Analytics-enabled authentic assessment design approach for digital education. Education and Information Technology; 2023; 28, pp. 9025-9048. [DOI: https://dx.doi.org/10.1007/s10639-022-11525-3]
Lin, CC; Huang, AY; Lu, OH. Artificial intelligence in intelligent tutoring systems toward sustainable education: A systematic review. Smart Learning Environments; 2023; 10,
Markl, N. (2022, June). Language variation and algorithmic bias: understanding algorithmic bias in British English automatic speech recognition. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 521–534). https://doi.org/10.1145/3531146.3533117
Mayfield, E., Madaio, M., Prabhumoye, S., Gerritsen, D., McLaughlin, B., Dixon-Román, E., & Black, A. W. (2019, August). Equity beyond bias in language technologies for education. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications (pp. 444–460). Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-4446
Megahed, N. A., Abdel-Kader, R. F., & Soliman, H. Y. (2022). Post-pandemic education strategy: Framework for artificial intelligence-empowered education in engineering (AIEd-Eng) for lifelong learning. In International Conference on Advanced Machine Learning Technologies and Applications (pp. 544–556). Springer, Cham. https://doi.org/10.1007/978-3-031-03918-8_45
Memarian, B; Doleck, T. Fairness, accountability, transparency, and ethics (FATE) in artificial intelligence (AI), and higher education: A systematic review. Computers and Education: Artificial Intelligence; 2023; [DOI: https://dx.doi.org/10.1016/j.caeai.2023.100152]
Monteiro, M; Salgado, L. Conversational agents: A survey on culturally informed design practices. Journal of Interactive Systems; 2023; 14,
Moore, MG. Moore, MG. The theory of transactional distance. Handbook of distance education; 2013; Routledge: pp. 66-85. [DOI: https://dx.doi.org/10.4324/9780203803738]
Mougiakou, E., Papadimitriou, S., & Virvou, M. (2018, July). Intelligent tutoring systems and transparency: The case of children and adolescents. In 2018 9th International Conference on Information, Intelligence, Systems and Applications (IISA) (pp. 1–8). IEEE. https://doi.org/10.1109/IISA.2018.8633652
Nguyen, A; Ngo, HN; Hong, Y; Dang, B; Nguyen, BPT. Ethical principles for artificial intelligence in education. Education and Information Technologies; 2023; 28,
Nigam, A; Pasricha, R; Singh, T; Churi, P. A systematic review on AI-based proctoring systems: Past, present and future. Education and Information Technologies; 2021; 26,
Nikkhah, M; Heravi-Karimooi, M; Montazeri, A; Rejeh, N; Sharif Nia, H. Psychometric properties of the Iranian version of the Older People’s Quality of Life questionnaire (OPQOL). Health and Quality of Life Outcomes; 2018; 16,
Ogden, CR; Richards, IA. The meaning of meaning: A study of the influence of language upon thought and of the science of symbolism; 1923; Routledge & Kegan Paul:
Ojha, M; Kumar Mishra, A; Kandpal, V; Singh, A. Bhattacharya, P; Hassan, A; Liu, H; Bhushan, B. The ethical dimensions of AI development in the future of higher education: balancing innovation with responsibility. Ethical dimensions of AI development; 2025; IGI Global: pp. 401-436.
O’Neill, O. Universalism in ethics. The Routledge encyclopedia of philosophy; 1998; Taylor & Francis: [DOI: https://dx.doi.org/10.4324/9780415249126-L108-1]
Orr, W; Crawford, K. The social construction of datasets: On the practices, processes, and challenges of dataset creation for machine learning. New Media & Society; 2024; 26,
Ouyang, F; Dinh, TA; Xu, W. A systematic review of AI-driven educational assessment in STEM education. Journal for STEM Education Research; 2023; 6,
Pallagi, A., Peto, R., & Hronyecz, E. (2023, September). Increasing the resilience of critical infrastructures with defense zone system. In 2023 IEEE 21st Jubilee International Symposium on Intelligent Systems and Informatics (SISY) (pp. 000549–000554). IEEE. https://doi.org/10.1109/SISY60376.2023.10417949
Pant, A; Hoda, R; Tantithamthavorn, C; Turhan, B. Ethics in AI through the practitioner’s view: A grounded theory literature review. Empirical Software Engineering; 2024; 29,
Park, C; Kim, DG. Perception of instructor presence and its effects on learning experience in online classes. Journal of Information Technology Education: Research; 2020; 19, pp. 475-488.
Pontual Falcão, T., Lins Rodrigues, R., Cechinel, C., Dermeval, D., Harada Teixeira de Oliveira, E., Gasparini, I., & Ferreira Mello, R. (2022, March). A Penny for your Thoughts: Students and Instructors’ Expectations about Learning Analytics in Brazil. In LAK22: 12th International Learning Analytics and Knowledge Conference (pp. 186–196). https://doi.org/10.1145/3506860.3506886
Popper, K. Three worlds; 1979; University of Michigan: [DOI: https://dx.doi.org/10.28945/4611]
Project, PE; Peirce, CS. The essential Peirce (Volume 2); 1998; Indiana University Press: [DOI: https://dx.doi.org/10.2979/4296.0]
Ramesh, D; Sanampudi, SK. An automated essay scoring systems: A systematic literature review. Artificial Intelligence Review; 2022; 55,
Rudolph, J; Tan, S; Tan, S. ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?. Journal of Applied Learning and Teaching; 2023; [DOI: https://dx.doi.org/10.37074/jalt.2023.6.1.9]
Sadek, M; Calvo, RA; Mougenot, C. Designing value-sensitive AI: A critical review and recommendations for socio-technical design processes. AI Ethics; 2024; 4,
Sayles, J. Aligning AI governance, AI development lifecycle, and systems development lifecycle processes. In Principles of AI governance and model risk management: Master the techniques for ethical and transparent AI systems; 2024; Apress: pp. 383-408.
Schaeffer, D; Coombs, L; Luckett, J; Marin, M; Olson, P. Risks of AI applications used in higher education. Electronic Journal of e-Learning; 2024; 22,
Schiff, D. Education for AI, not AI for education: The role of education and ethics in national AI policy strategies. International Journal of Artificial Intelligence in Education; 2022; 32,
Seo, K; Tang, J; Roll, I; Fels, S; Yoon, D. The impact of artificial intelligence on learner–instructor interaction in online learning. International Journal of Educational Technology in Higher Education; 2021; 18,
Sghir, N; Adadi, A; Lahmer, M. Recent advances in predictive learning analytics: A decade systematic review (2012–2022). Education and Information Technologies; 2023; 28,
Shi, SJ; Li, JW; Zhang, R. A study on the impact of generative artificial intelligence supported situational interactive teaching on students' 'flow' experience and learning effectiveness – a case study of legal education in China. Asia Pacific Journal of Education; 2024; [DOI: https://dx.doi.org/10.1080/02188791.2024.2305161]
Song, Y; Weisberg, LR; Zhang, S; Tian, X; Boyer, KE; Israel, M. A framework for inclusive AI learning design for diverse learners. Computers and Education: Artificial Intelligence; 2024; [DOI: https://dx.doi.org/10.1016/j.caeai.2024.100212]
Stark, L., & Hoey, J. (2021). The ethics of emotion in artificial intelligence systems. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 782–793). ACM. https://doi.org/10.1145/3442188.3445939
Stone, A. Student perceptions of academic integrity: A qualitative study of understanding, consequences, and impact. Journal of Academic Ethics; 2023; 21,
Sun, S; Wu, X; Xu, T. A theoretical framework for a mathematical cognitive model for adaptive learning systems. Behavioral Sciences; 2023; 13,
Surahman, E; Wang, TH. Academic dishonesty and trustworthy assessment in online learning: A systematic literature review. Journal of Computer Assisted Learning; 2022; 38,
Tlili, A; Essalmi, F; Jemni, M; Chen, NS. A complete validated learning analytics framework: Designing issues from data preparation perspective. International Journal of Information and Communication Technology Education (IJICTE); 2018; 14,
Tlili, A; Essalmi, F; Jemni, M; Chen, NS. A complete validated learning analytics framework: Designing issues from data use perspective. International Journal of Information and Communication Technology Education (IJICTE); 2019; 15,
Trist, EL; Bamforth, KW. Some social and psychological consequences of the longwall method of coal-getting: An examination of the psychological situation and defences of a work group in relation to the social structure and technological content of the work system. Human Relations; 1951; 4,
Wang, Z., Huang, C., Tang, K., & Yao, X. (2025). Procedural Fairness and Its Relationship with Distributive Fairness in Machine Learning. arXiv preprint arXiv:2501.06753.
Wang, S; Sun, Z; Chen, Y. Effects of higher education institutes’ artificial intelligence capability on students' self-efficacy, creativity and learning performance. Education and Information Technologies; 2023; 28,
Whittlestone, J; Nyrup, R; Alexandrova, A; Dihal, K; Cave, S. Ethical and societal implications of algorithms, data, and artificial intelligence: A roadmap for research; 2019; Nuffield Foundation:
Williamson, B; Bayne, S; Shay, S. The datafication of teaching in Higher Education: Critical issues and perspectives. Teaching in Higher Education; 2020; 25,
Wu, SY; Yang, KK. The effectiveness of teacher support for students’ learning of artificial intelligence popular science activities. Frontiers in Psychology; 2022; 13, 868623. [DOI: https://dx.doi.org/10.3389/fpsyg.2022.868623]
Xie, H; Chu, HC; Hwang, GJ; Wang, CC. Trends and development in technology-enhanced adaptive/personalized learning: A systematic review of journal publications from 2007 to 2017. Computers & Education; 2019; 140, 103599. [DOI: https://dx.doi.org/10.1016/j.compedu.2019.103599]
Yildirim, N., Pushkarna, M., Goyal, N., Wattenberg, M., & Viégas, F. (2023). Investigating how practitioners use human-AI guidelines: A case study on the People + AI guidebook. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '23), 356, 1–13. https://doi.org/10.1145/3544548.3580900
Yu, P; Xu, H; Hu, X; Deng, C. Leveraging generative AI and large language models: A comprehensive roadmap for healthcare integration. Healthcare; 2023; 11,
© The Author(s) 2025. This work is published under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).