Abstract
Intelligent Tutoring Systems (ITS) are computer systems that mimic human tutoring behavior while providing immediate feedback. With the rise of Generative Artificial Intelligence (GenAI), numerous ITS integrated with GenAI have been developed. Student engagement is critical for improving learning processes and outcomes. Therefore, it is important to examine the effectiveness of ITS integrated with GenAI in promoting student engagement in educational practice. This paper presents an explanatory mixed-method case study involving 880 undergraduate students who used GPTutor, an ITS powered by GenAI. First, a survey was conducted to investigate the relationship between students’ actual interaction with GPTutor and their self-reported student engagement across three dimensions: Behavioral Engagement, Cognitive Engagement, and Emotional Engagement. Next, focus groups were conducted with a subsample of survey participants to better understand how and under what circumstances GPTutor improved student engagement. The focus groups also explored potential design improvements for GPTutor and other ITS powered by GenAI. The survey results revealed a complex relationship between feature usage and student engagement. Specifically, engagement with the chatbot is significantly and positively associated with behavioral and emotional engagement, but not cognitive engagement. The exercise generator feature had no significant associations with any of the three dimensions of student engagement. The focus group results shed some light on these relationships, revealing that GPTutor was used only when it was perceived as useful, and that this perceived usefulness was shaped by students’ perception of the difficulty of the course and whether their support system could adequately address questions they may have. Its usefulness was found to increase as the course progressed, particularly as examinations approached, at which point the exercise generator was increasingly preferred over the chatbot. The participants also made this clear by expressing how GPTutor could be improved, notably by extending the capabilities of the chatbot to include multimodal media, like video recordings of lectures. In general, leveraging survey data, interview data, and back-end trace data from the GenAI system, this research makes an original contribution to AI-supported effective learning environments and design strategies to optimize the educational experiences of higher education students.
Introduction
The emergence and increasing popularity of Intelligent Tutoring Systems (ITS), generally powered by Artificial Intelligence (AI), has ushered in a new era of self-directed learning (Alkhatlan & Kalita, 2019). Generative AI (GenAI) tools, like ChatGPT, are capable of providing flexible feedback in natural language, images, and videos on various tasks (Moorhouse et al., 2023). The capacity of GenAI tools to generate extensive learning resources has had a significant impact on higher education recently (Wang et al., 2024), for example, by empowering science education and self-regulation (Cooper, 2023; Ng et al., 2024), assisting with academic writing training and assessment (Zhang & Xu, 2024), and helping instructors to generate questions automatically (Mulla & Gharpure, 2023). Thus, the incorporation of GenAI into ITS has yielded benefits in easing the pressures on teachers and helping students customize their learning experiences. However, concerns have also been raised about heavy reliance on GenAI, which may adversely impact students’ critical thinking and problem-solving skills (Kasneci et al., 2023), and about potential misinformation in GenAI output, which may mislead students in their knowledge acquisition (Wang et al., 2024).
Although various successful GenAI products have been applied in higher education ecosystems (Moorhouse et al., 2023; Wang et al., 2024), few studies have examined the effectiveness of GenAI-powered ITS in promoting student engagement. Since the pedagogical utility of advanced digital technologies like ITS relies on students’ engagement with these technologies (Bond et al., 2020; Nkomo et al., 2021), student engagement can be regarded as a critical metric to assess the effectiveness of an ITS in supporting the learning process and experience (Qian, 2025).
The present study introduces and evaluates GPTutor, a GenAI-powered custom ITS that provides course-specific teaching and learning features, for its influence on student engagement. GPTutor is selected as a representative ITS for two reasons. First, following classic ITS designs, GPTutor incorporates the four foundational modules of ITS (see Sect. 3.1 for more details). Second, as an ITS powered by emerging GenAI techniques, GPTutor supports intuitive tutoring experiences with natural language interactions. This is unlike previous ITS designs, in which students usually interact with the system through mouse manipulation, menu selection, or text input (Anderson et al., 1985, 1989; Taub et al., 2017). Also, unlike prior GenAI applications in education, which lacked reliable knowledge sources or were misaligned with teachers’ instructions (Chan & Hu, 2023; Reddy et al., 2024; Wang et al., 2024), GPTutor supports tutoring for broad curricula by integrating the Retrieval-Augmented Generation (RAG) technique to generate knowledge-grounded and accessible responses based on a knowledge base endorsed by the instructor.
To evaluate the influence of GPTutor on student engagement and uncover potential areas for further improving student engagement with GPTutor and other ITS powered by GenAI, a sequential explanatory mixed-method research design (Creswell & Plano Clark, 2007) is adopted. This research addresses the following two research questions:
RQ1. What influence does GPTutor have on higher education students’ levels of behavioral engagement, cognitive engagement, and emotional engagement?
RQ2. How does GPTutor improve student engagement?
Related work
Intelligent tutoring system powered by generative AI
Intelligent Tutoring Systems (ITS) are systems that generally deploy AI technologies to provide personalized and immediate instructional feedback (Elham et al., 2021). The primary idea of ITS can be traced back to the 1980s, when the human tutoring effect was recognized, which motivated scholars to mimic tutoring with AI technologies (Alkhatlan & Kalita, 2019). Although the architecture of ITS varies, the traditional architecture generally consists of a student model module to track the learning profile of students, an expert knowledge module to store domain knowledge, a tutoring module to determine which information to provide to students, and a user interface module to interact with students (Nwana, 1990). With exemplary ITS products from pioneers in the 1980s, like LISP Tutor to teach LISP programming (Anderson et al., 1989) or Geometry Tutor to present geometry proofs (Anderson et al., 1985), to various contemporary advancements, such as dialog-based ITS like AutoTutor that interacts with students in natural language (Graesser et al., 2004), game-based ITS like Crystal Island that embeds the study of microbiology into a detective game (Taub et al., 2017), and culturally-aware ITS like ICON that adapts delivered educational content based on students’ cultural background (Mohammed & Mohan, 2015), the effect of ITS in supporting learning has been well-recognized (James & Fletcher, 2016).
With the emergence of GenAI, there is excitement about the potential for ITS to provide extensive and personalized generated content to students (Chan & Hu, 2023). However, scholars have raised concerns about the factuality, faithfulness, and maliciousness of content generated by large language model (LLM) products, such as ChatGPT (i.e., the hallucination problem) (Giannakos et al., 2024). Like other ITS powered by GenAI, GPTutor utilizes an LLM to provide its core learning features, including knowledge-grounded question-answering (KGQA) (see Fig. 1), an exercise generator (see Fig. 2), and a flashcard generator (see Fig. 3). To mitigate concerns about the hallucination problem, GPTutor uses the RAG technique (Fan et al., 2024). For each course, GPTutor establishes a knowledge base from the learning materials uploaded by the instructors (expert knowledge module). Given a student’s query about the course, the most relevant content is retrieved from this knowledge base and fed into the output LLM together with the query, the learning profile, and a prompt related to the intended learning outcomes (ILO) (tutoring module), which improves the factuality and faithfulness of the generated content (Fan et al., 2024) and aligns the tutoring experience with the expectations of instructors. The generated content is accompanied by citations linking to the learning materials. In this study, GPTutor adopts the GPT-4o mini model (i.e., gpt-4o-mini-2024-07-18) by calling the Azure OpenAI API with the following tuned parameters: a temperature of 0.7 and a top-p of 0.95. Temperature values range from 0 to 2, and top-p from 0 to 1; higher values of either parameter increase the randomness of the output. To restrict generated responses to academic use only, the intent of each student question is classified as chitchat, unethical, or inquiry, and each intent is handled by a different system prompt (see Appendix A). For more technical details about GPTutor, please refer to Lui et al. (2024). The present paper uses GPTutor as a case study to evaluate the influence of ITS powered by GenAI in promoting student engagement.
Fig. 1 The knowledge-grounded question-answering feature user interface of GPTutor (image omitted).

Fig. 2 The exercise generator feature user interface of GPTutor (image omitted).

Fig. 3 The flashcard generator feature user interface of GPTutor (image omitted).
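To make the tutoring flow described above concrete, the sketch below illustrates how intent classification and retrieval-augmented prompting could be wired to the Azure OpenAI API with the reported generation parameters (temperature 0.7, top-p 0.95). It is a minimal illustration under stated assumptions, not GPTutor’s actual implementation: the helper search_knowledge_base, the system prompts, the deployment name, and the credentials are hypothetical placeholders.

```python
# Minimal sketch of the tutoring flow described above: classify the intent
# of a student query, retrieve course content for genuine inquiries, and
# answer with the generation parameters reported in the paper. Helper
# names, prompts, deployment name, and credentials are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<azure-openai-key>",          # placeholder credentials
    api_version="2024-06-01",
    azure_endpoint="https://example.openai.azure.com",
)

SYSTEM_PROMPTS = {  # one system prompt per classified intent (cf. Appendix A)
    "inquiry": ("Answer using ONLY the retrieved course materials below. "
                "Cite the source page for every claim."),
    "chitchat": "Politely redirect the student back to course topics.",
    "unethical": "Refuse and remind the student of academic integrity.",
}

def search_knowledge_base(query: str, top_k: int = 5) -> list[str]:
    """Stand-in for vector search over the instructor-endorsed knowledge
    base; a real system would embed the query and return the top-k chunks."""
    return []

def classify_intent(query: str) -> str:
    """Label a student query as inquiry, chitchat, or unethical."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",               # deployment name is illustrative
        temperature=0,                     # deterministic labeling
        messages=[
            {"role": "system", "content":
             "Label the user message as exactly one of: "
             "inquiry, chitchat, unethical. Reply with the label only."},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

def answer(query: str, learning_profile: str) -> str:
    """Generate a knowledge-grounded, intent-appropriate response."""
    intent = classify_intent(query)
    context = ""
    if intent == "inquiry":
        context = "\n\n".join(search_knowledge_base(query))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.7,                   # parameters reported in the paper
        top_p=0.95,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPTS[intent]},
            {"role": "system", "content":
             f"Student profile: {learning_profile}\n"
             f"Retrieved materials:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content
```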
Digital technologies and student engagement
According to the definition of Kuh (2001), student engagement is the time and energy that students invest in purposeful learning activities. Student engagement is closely related to goals in education, such as achieving high academic performance and school completion (Fredricks et al., 2016), which makes it a critical concern for educators and researchers to improve the learning experience of students. Although there is considerable variation in the definition of student engagement in literature, the three widely accepted dimensions of student engagement are behavioral engagement (BE), cognitive engagement (CE), and emotional engagement (EE) (Fredricks, 2014). BE refers to students’ participation, efforts, and persistence in learning activities, CE refers to students’ perception and psychological investment in learning activities, and EE refers to students’ emotional reactions to learning activities (Nkomo et al., 2021). Fredricks (2014) suggested that all three dimensions should be measured, since these dimensions are interrelated for individual students. Thus, the present paper investigated all three dimensions of student engagement.
With the advancement of computer technologies, digital technologies, such as ITS and other learning management systems (LMS), have been widely integrated into higher education practice (Bond et al., 2020). Since the primary objective of applying such tools is to facilitate learning and teaching, student engagement has been considered an important factor for assessing the effectiveness of various digital technologies, including LMS, social media, and lecture captures (Bond et al., 2020; Nkomo et al., 2021). Plenty of empirical studies have demonstrated the positive effects of digital technologies on student engagement, such as Moodle as a classic LMS (Avcı & Ergün, 2022), web-based distance learning (Chen et al., 2010), and ITS combined with gamification (Ramadhan et al., 2024). However, few studies have investigated the influence on student engagement of ITS that integrate GenAI, one of the most popular digital technologies among undergraduates and postgraduates for helping with various educational tasks (Chan & Hu, 2023), into the classic ITS framework. Compared to prior studies on GenAI-powered ITS, the present study distinguishes itself by deploying GPTutor in a real-world undergraduate course and conducting a mixed-method investigation. In contrast, an adaptive GenAI-powered ITS proposed by Almetnawy et al. (2025) was limited to preliminary tests with AI-simulated students. Similarly, Mukherjee et al. (2025) developed a GenAI-enhanced ITS for cybersecurity education, but only proposed a qualitative study framework without any empirical implementation. Frank et al. (2024)’s ITS integrated with GenAI for tutoring R programming only underwent system evaluations without involving students. Another distinguishing factor is the KGQA feature of GPTutor, which enables students to explore the learning materials and address personalized needs. In comparison, prior GenAI-based ITS designs (Almetnawy et al., 2025; Frank et al., 2024; Mukherjee et al., 2025) simply focused on assessment-based or exercise-based tutoring. Thus, the present study offers an original case study on how students’ use of GPTutor (as an ITS powered by GenAI) influenced the three dimensions of student engagement (RQ1).
According to a systematic review on the link between student engagement and educational technologies by Bond et al. (2020), the most used indicator of student engagement is students’ participation and interaction. While many studies have shown satisfactory levels of self-reported student engagement after students’ usage of digital technologies, some studies have reported the ineffective role of digital technologies, like gaze-reactive ITS (D’Mello et al., 2012) and Wiki integration with learning (Cole, 2009), given the non-significant relationship between system log data of students’ interaction with these technologies and self-reported student engagement. Thus, to rigorously examine the effectiveness of GPTutor in promoting student engagement, the count of students’ interactions with GPTutor recorded in the system is selected as the predictor of all three dimensions of student engagement. Bond et al. (2020) also discussed how, compared to the wealth of quantitative research on student engagement, there are only a limited number of qualitative and mixed-method studies on the topic. This motivates us to adopt a sequential explanatory mixed-method design to gain a deeper understanding of how GPTutor influenced student engagement (RQ2).
Pedagogical considerations in designing GPTutor
Digital technologies have the potential to improve student engagement (Nkomo et al., 2021), but effective pedagogical design is essential for these technologies to foster engagement rather than hinder it (Bond et al., 2020). In this paper, the constructivist approach is adopted as a foundational principle undergirding the design of GPTutor. The constructivist approach refers to a theory that “learners construct their own understanding and knowledge of the world through experiences and reflecting on those experiences” (Yan et al., 2024, p.1840). One recent empirical study (Pallant et al., 2025) revealed that meaningful learning was contingent on how and why the student used the GenAI tool in their education. Specifically, students who engaged with the tool to bridge knowledge gaps, critically collaborate in the construction of knowledge, and master the knowledge achieved higher learning outcomes than students who passively used the tool to fulfill performance goals. From a design and educational perspective, this reflects the importance of developing diverse and high-quality features that can leverage and nurture students’ abilities to question, analyze, and memorize course material for their educational journey, as per their learning needs.
The platform and features of GPTutor were designed with the constructivist approach in mind. First, with inquiry-based principles at its core, the KGQA feature affords the student the opportunity to explore and understand the material using prompts and queries. It provides high-quality and personalized instruction in the form of feedback (e.g., summaries, filtered information, and/or specific suggestions). Second, on the principle of learning-by-doing, the exercise generator yields three types of questions (short-form, true-false, and multiple-choice questions) based on the course materials. Students have the flexibility to decide on the range of content to be quizzed on and the type of questions to generate. Finally, the flashcard generator feature is grounded on the principle of learning by engaging in rote memorization. Rote memorization is a traditional memorization technique that is popularly employed for knowledge retention.
Each feature helps the student accomplish specific learning goals and objectives. Mapping the features to Bloom’s taxonomy, the flashcard generator provides a structured approach to rote memorization, the exercise generator develops students’ cognitive skills of remembering, understanding, applying, analyzing, and evaluating, and the KGQA assists the student with developing cognitive skills on all levels of the taxonomy (i.e., from lower-order to higher-order thinking), including creating (e.g., by reflecting on what they know and what they are uncertain about, students can create prompts or queries for the chatbot to discover what they do not know). In this manner, GPTutor empowers the student with greater flexibility and opportunities to personalize their learning experiences. With these features, we anticipate that GPTutor will encourage student engagement.
While it is important to consider how GenAI advances pedagogical approaches in learning, prior research has revealed that GenAI usage can vary based on issues of unequal access (Qian, 2025). Indeed, most of the existing applications of GenAI in higher education adopt a monthly quota or fixed credit allocation system to limit users’ frequency of use, which has led some to express the need to design GenAI tools with greater consideration for equity and inclusion in higher education. Notably, users may have different levels of AI literacy, which impacts when and how they use GenAI tools in their learning journey (Beckman et al., 2025; Giannakos et al., 2024). GPTutor was designed with these considerations in mind. Rather than operating with a monthly quota on usage, it carries nil access costs to the user, thereby affording users the freedom to use the tool whenever and however many times they wish.
Methods
This study adopts a sequential explanatory mixed-method research design (Creswell & Plano Clark, 2007) to investigate the influence of using GPTutor on student engagement in the higher education context of the Hong Kong Special Administrative Region. The target participants are year-one university students who were taking an introductory course on AI and data analytics (AIDA) at University X. GPTutor was introduced in the course as an ITS to support their learning. The quantitative and qualitative components of the research were conducted from October 17 to November 5, 2024, and from January 21 to February 17, 2025, when students had used GPTutor for approximately six weeks and three months (one semester), respectively. The design of the interview schedule (qualitative component) was informed by the results of the survey research (quantitative component). This study received prior approval from the Institutional Review Board (IRB) to conduct the research.
Quantitative research
Sampling procedures and participants
The survey was administered to all students who enrolled in the AIDA course. Prior to completing the survey, respondents were provided with an information sheet, which contained the project background details, IRB approval, and information on data management practices and their rights (e.g., the right to voluntary participation and withdrawal from the study), and a consent form. The key inclusion criterion for recruitment into the sample is that respondents should have used GPTutor. We validated this through their survey responses and against system records during the semester.
The survey received 880 responses, of which 741 respondents were eligible based on the key inclusion criterion. After removing respondents who did not complete the survey genuinely (notably, some scored 5, the highest, and 1, the lowest, on pairs of items with contrasting meanings), the final sample included 727 survey respondents.
Measurements
Student engagement in the ITS context is measured by three dimensions: behavioral engagement (BE), cognitive engagement (CE), and emotional engagement (EE). The original measurement uses a 5-point Likert scale (1 = Strongly disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly agree), with items designed to measure the three dimensions of student engagement in a distance education setting among undergraduates and postgraduates (Fredricks et al., 2005). Given its similarity to the present study, we adapted the measurement by replacing distance education with GPTutor in the items. The adaptation of the items followed advice from two knowledgeable university faculty members and feedback from students in a pilot study (see Appendix B).
To study students’ actual interaction with the GPTutor system, chat message count (the count of messages that students sent to interact with the GPTutor KGQA feature) and generated exercises count (the count of questions that students generated for personalized exercises) were extracted from the system records of GPTutor as administrative data for analysis. Both chat message count and generated exercises count are non-negative count data.
Gender (Female coded as 0, Male coded as 1), exam scores, familiarity with AI and Machine Learning, and familiarity with GenAI tools were also measured in the survey as control variables. Exam scores come from a summative test of students’ knowledge in the AIDA course, during which students could voluntarily use GPTutor to support their learning process. Familiarity with AI and Machine Learning and familiarity with GenAI tools were both measured on a 7-point Likert scale in a pre-survey (i.e., prior to students’ exposure to GPTutor and AIDA).
Data analysis
RStudio was used for data management and statistical analysis (R Core Team, 2022). Following the literature on the student engagement scale (Fredricks et al., 2005), the reverse-coded items were re-coded prior to analysis. For the self-reported data collected with the adapted scale, Exploratory Factor Analysis (EFA) (Field, 2005) was conducted to reveal the underlying factor structure, with the Kaiser–Meyer–Olkin (KMO) test (Dziuban & Shirkey, 1974) carried out beforehand to assess sampling adequacy. Cronbach’s alpha was estimated to test the internal consistency of the items measuring each dimension of interest (i.e., BE, CE, and EE). Given the results of the EFA and Cronbach’s alpha, and the related literature on measuring student engagement, the items for BE, CE, and EE were adjusted accordingly. The relationship between the administrative data (system records) and the self-reported data was analyzed with three multiple regression models (i.e., with BE, CE, and EE as the outcome variables of separate regression models). To control for confounding effects on the relationship between GPTutor use and student engagement, the control variables were also included in the regression models.
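For concreteness, the sketch below walks through the measurement steps described here (reverse-coding, the KMO test, EFA with an oblique rotation, and Cronbach’s alpha) in Python; the authors performed the analysis in R, so this is an illustrative translation in which the data file, column names, and the choice of reverse-coded items are hypothetical placeholders. The regression step is sketched separately after Table 4.

```python
# Illustrative sketch of the measurement-analysis steps described above,
# written in Python (the authors used R). File and column names are
# hypothetical placeholders.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo
import pingouin as pg

df = pd.read_csv("gptutor_survey.csv")  # hypothetical survey export
items = [c for c in df.columns if c[:2] in ("BE", "CE", "EE")]

# Re-code reverse-worded items on the 1-5 Likert scale (item choice here
# is illustrative only).
for col in ["BE3"]:
    df[col] = 6 - df[col]

# Kaiser-Meyer-Olkin test: overall MSA and per-item sampling adequacy.
kmo_per_item, kmo_overall = calculate_kmo(df[items])
print(f"Overall MSA = {kmo_overall:.2f}")

# EFA with three factors and an oblique (oblimin) rotation, since the
# engagement dimensions are assumed to be correlated.
efa = FactorAnalyzer(n_factors=3, rotation="oblimin")
efa.fit(df[items])
loadings = pd.DataFrame(efa.loadings_, index=items,
                        columns=["Factor1", "Factor2", "Factor3"])
print(loadings.round(2))  # inspect loadings against the 0.35 threshold

# Cronbach's alpha for one updated subscale (item list from Table 2).
be_items = ["BE1", "BE2", "BE5", "CE4", "CE6"]
alpha, ci = pg.cronbach_alpha(data=df[be_items])
print(f"BE subscale alpha = {alpha:.2f}")
```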
Qualitative research
Sampling procedures and participants
An open recruitment call was issued to recruit AIDA students to participate in a focus group. The recruitment call included an information sheet, which detailed the project background, what to expect, and how the focus group follows up on the survey component. To be eligible to participate in the follow-up study as an informant, a respondent should have completed a valid survey.
The recruitment call for the focus groups received 16 responses, among which six participants were eligible and attended the discussion. Two discussion sessions were conducted based on participants’ availability. A facilitator/moderator and an assistant facilitator (both authors of this study) led and guided the focus group discussions. Before starting the focus group, the lead facilitator and assistant facilitator first introduced themselves and their roles, and then explained the project background, information on data management practices, and participants’ rights (e.g., the right to voluntary participation and withdrawal from the study). The lead facilitator set the stage for an inclusive and respectful discussion by explaining the importance of listening to others, respecting differences, and being open-minded. Consent forms were provided for completion, and prior approval was sought for audio recording the focus groups. Each focus group lasted approximately 45–50 min. To acknowledge their efforts and time participating in the focus group, a HKD100 supermarket coupon was provided to each informant.
The first focus group had two informants, while the second group had four informants. The sample size is suitable according to conventions and guidelines on focus group research (i.e., 5–8 informants) (Krueger & Casey, 2014) and an empirical study on student engagement in educational software (Li & Kim, 2024). The educational backgrounds of the participants are similar (i.e., all are year-one undergraduates who majored in Computer Science). The six participants were provided pseudonyms (i.e., from P1 to P6), and their personal information was sufficiently anonymized to prevent direct or indirect identification.
Measurements
The two focus group discussions were conducted with a semi-structured interview schedule. The interview schedule included guiding questions on the students’ initial impressions of GPTutor, usage patterns of GPTutor, preferences among GPTutor features, satisfaction with GPTutor, and suggestions for feature enhancements and/or changes to GPTutor to improve students’ engagement. Sample questions included “In your opinion and experience, how has GPTutor impacted your learning process in the AIDA course?” and “How did you interact with or use GPTutor? What motivated or discouraged your use of the tool?”. The guiding questions served to solicit reflections and responses among the informants, and follow-up questions were asked whenever probing was deemed necessary (e.g., to seek clarity on an informant’s vague sharing or to solicit real-life examples of using GPTutor).
Data analysis
Thematic analysis was used to perform data analysis on the transcribed focus group data (Braun & Clarke, 2006). Two coders (both authors of this study) were involved in the thematic analysis. Following the six phases of thematic analysis, first we familiarized ourselves with the transcript data by reading and re-reading them (data familiarization). When necessary, such as when an informant was mumbling or speaking in a low voice, we listened to the audio recording directly. Second, the two coders manually coded the transcripts. This process was done separately. After coding the transcripts, the two coders met to discuss the codes, find common ground, and resolve disagreements on coding if necessary. This was an iterative and collaborative process that involved revisiting transcripts and audio files of the focus group discussions to determine the integrity and capture of relevant codes. There were no major disagreements that had to be resolved. Third, after manually coding the transcripts, we attempted to identify emergent themes and subthemes. Fourth, we compared notes on codes, subthemes, and themes, and reviewed the thematic map/network and output together. Fifth, we defined and named the themes. Finally, we selected exemplar quotations as illustrations, and presented them in a manner that is coherent, consistent, and readable. Overall, the two coders met over two weeks to perform the analysis and write the results shown in this paper.
Originally, the results revealed four emergent themes: (1) Encouraging engagement with GPTutor by affording ample opportunities to learn; (2) Dispositional and situational influences on engagement with GPTutor; (3) Other design features of GPTutor and their advantages for improving engagement; and (4) Design improvements for GPTutor. However, after receiving suggestions from reviewers to highlight the themes in a more analytical manner, we decided to collapse the first and third themes, since they both elucidated how GPTutor influenced student engagement. The second theme was retained, as it shed light on the factors that influenced variations in student engagement. The fourth theme was removed since it added little to the main inquiry. The refinement resulted in two themes: (1) Encouraging engagement with GPTutor with design features and nil access costs, and (2) Dispositional and situational influences on engagement with GPTutor.
Results
Quantitative research
Descriptive analysis
The demographic data of the sample is summarized in Table 1. Among the 727 respondents, 47.87% were female and 52.13% were male. The mean exam score was 77.18 (S.D. = 12.11), indicating generally strong performance in the course, where students used GPTutor to support their learning. Based on system records, respondents exchanged an average of 7.68 chat messages with the GPTutor system (S.D. = 10.24), reflecting a moderate level of interaction with the KGQA feature. In contrast, the generated exercises count among eligible participants was relatively low, with a mean of 7.10 (S.D. = 27.53) and a strongly positively skewed distribution (skewness = 8.36). However, when participants who never used the exercise generator feature are excluded, the mean generated exercises count rises substantially to 25.15 (n = 238). Prior to the AIDA course, students reported moderate familiarity with AI and Machine Learning, with a mean of 3.53 (S.D. = 1.41) on a 7-point Likert scale. Their familiarity with GenAI tools was somewhat higher, with a mean of 4.09 (S.D. = 1.38).
Table 1. Demographic statistics of eligible participants in the quantitative study
Nominal variables | n (%) |
|---|---|
Gender: Female | 348 (47.87) |
Gender: Male | 379 (52.13) |

Continuous variables | Mean | S.D. |
|---|---|---|
Exam Scores | 77.18 | 12.11 |
Chat Message Count | 7.68 | 10.24 |
Generated Exercises Count | 7.10 | 27.53 |
Familiarity with AI/ML | 3.53 | 1.41 |
Familiarity with GenAI Tools | 4.09 | 1.38 |
The mean values and S.D. (Standard Deviation) are calculated for all eligible participants (regardless of what features of GPTutor they engaged with)
Exploratory factor analysis
Before running the EFA on the student engagement items, the KMO test showed that the data were suitable for factor analysis, given that the overall Measure of Sampling Adequacy (MSA) was larger than 0.9 and the sampling adequacy for nearly every item (MSAi) was larger than 0.8 (except for one item at 0.67). The factor loading results from the EFA and Cronbach’s alpha are summarized in Table 2. Based on Field’s suggestion (Field, 2005), the threshold for significant factor loadings was set at 0.35. An oblimin rotation was used, since the student engagement items are assumed to be non-independent (Fredricks, 2014). Based on the EFA results, the updated BE subscale consists of five items (BE1, BE2, BE5, CE4, and CE6), the updated CE subscale consists of six items (CE1, CE2, CE3, CE5, CE7, and CE8), and the updated EE subscale consists of seven items (EE1, EE2, EE3, EE4, EE5, EE6, and BE4).
One item (BE3) was removed for the following reasons. First, according to the EFA results, BE3 had low factor loadings on all three factors. Second, removing BE3 from the BE item list improved the Cronbach’s alpha value of the BE subscale, increasing it from 0.57 to 0.63. Third, as one survey respondent noted in the comment section of our survey: “Regarding BE3, ‘When I am using GPTutor, I just “act” as if I am learning’, I dislike this question because I am a student. I should learn about the GPTutor. I do not get what is the purpose of the question”. The respondent’s sentiment was shared by a couple of other respondents as well, indicating that the item was inappropriate for our context.
To measure the internal consistency of the updated dimensions of student engagement, Cronbach’s alpha was estimated. The reliability test yielded alpha values of 0.70, 0.75, and 0.90 for the updated BE, CE, and EE, respectively. Since the internal consistency of each dimension of student engagement was no lower than 0.7, and thus considered satisfactory for further use, we proceeded to use the updated BE, CE, and EE for multivariate modeling. The descriptive statistics of the updated BE, CE, and EE are summarized in Table 3. Survey respondents reported moderate to high average levels of engagement with GPTutor across all three dimensions. BE had the highest mean score (M = 3.82, S.D. = 0.53), suggesting that respondents generally participated in the learning activities with GPTutor. The EE subscale also had a relatively high mean score (M = 3.58, S.D. = 0.64), which indicated that respondents generally had positive emotional responses to GPTutor. CE had a slightly lower mean score (M = 3.21, S.D. = 0.64), which reflected respondents’ generally neutral perception of and psychological investment in engaging with GPTutor. Overall, these findings suggest that GPTutor was perceived positively by most of the survey respondents.
Table 2. Results of Exploratory factor analysis on student engagement scale
Item | Factor 1 | Factor 2 | Factor 3 | h² |
|---|---|---|---|---|
BE1 | 0.15 | 0.11 | 0.52 | 0.41 |
BE2 | 0.14 | 0.23 | 0.44 | 0.25 |
BE3 | 0.04 | 0.17 | 0.26 | 0.09 |
BE4 | 0.37 | 0.11 | 0.30 | 0.38 |
BE5 | 0.09 | 0.08 | 0.49 | 0.31 |
EE1 | 0.79 | 0.02 | 0.09 | 0.67 |
EE2 | 0.86 | 0.08 | 0.14 | 0.74 |
EE3 | 0.87 | 0.01 | 0.03 | 0.74 |
EE4 | 0.81 | 0.02 | 0.07 | 0.72 |
EE5 | 0.84 | 0.00 | 0.05 | 0.73 |
EE6 | 0.52 | 0.22 | 0.29 | 0.39 |
CE1 | 0.35 | 0.38 | 0.29 | 0.37 |
CE2 | 0.05 | 0.52 | 0.20 | 0.31 |
CE3 | 0.05 | 0.60 | 0.16 | 0.39 |
CE4 | 0.25 | 0.36 | 0.36 | 0.53 |
CE5 | 0.07 | 0.72 | 0.07 | 0.55 |
CE6 | 0.06 | 0.36 | 0.48 | 0.46 |
CE7 | 0.11 | 0.41 | 0.32 | 0.39 |
CE8 | 0.25 | 0.45 | 0.16 | 0.35 |
Unique factor loadings no less than 0.35 are bolded. h² represents the communalities of items at extraction. Factor 1, Factor 2, and Factor 3 are EE, CE, and BE, respectively
Table 3. Descriptive statistics of student engagement with GPTutor
Engagement dimension | Mean | Median | S.D. |
|---|---|---|---|
Behavioral Engagement (BE) | 3.82 | 3.80 | 0.53 |
Cognitive Engagement (CE) | 3.21 | 3.17 | 0.64 |
Emotional Engagement (EE) | 3.58 | 3.57 | 0.64 |
Linear regression modeling
The three multiple regression models are denoted as the BE model, the CE model, and the EE model, corresponding to their respective dependent variables. Despite the positive skewness of the generated exercises count, the original data were included in the three multiple regression models without transformation, as transforming the variable (e.g., with a logarithm transformation) would distort the distribution further from the actual nature of student usage patterns. The results of the regression models are summarized in Table 4.
In the BE model, chat message count (β = 0.078, p < 0.05), exam scores (β = 0.129, p < 0.001), and familiarity with GenAI tools (β = 0.187, p < 0.001) were statistically significant predictors of BE. For the CE model, only familiarity with AI and Machine Learning was found to be a significant positive predictor (β = 0.146, p < 0.01). In the EE model, both chat message count (β = 0.097, p < 0.01) and familiarity with GenAI tools (β = 0.144, p < 0.01) were statistically significant. Based on the results, we can conclude that KGQA usage (chat message count) is significantly related to BE and EE. By interpretation, when students’ chat message count increases, their BE and EE correspondingly increase.
Table 4. Results for regression models of behavioral engagement, cognitive engagement, and emotional engagement
Independent variables | BE | CE | EE |
|---|---|---|---|
Chat | 0.078* (0.0019) [0.0003, 0.0078] | -0.029 (0.0023) [-0.0064, 0.0027] | 0.097** (0.0023) [0.0015, 0.0106] |
Exercises | 0.033 (0.0007) [-0.0007, 0.0020] | -0.030 (0.0009) [-0.0024, 0.0010] | 0.030 (0.0009) [-0.0010, 0.0024] |
Exam | 0.129*** (0.0016) [0.0025, 0.0089] | -0.042 (0.0020) [-0.0061, 0.0016] | -0.011 (0.0020) [-0.0044, 0.0033] |
Gender | -0.054 (0.0393) [-0.1354, 0.0191] | -0.022 (0.0475) [-0.1211, 0.0656] | -0.068 (0.0478) [-0.1806, 0.0070] |
Fami-Literacy | -0.009 (0.0201) [-0.0428, 0.0363] | 0.146** (0.0243) [0.0188, 0.1144] | -0.010 (0.0245) [-0.0524, 0.0437] |
Fami-GenAI | 0.187*** (0.0203) [0.0324, 0.1123] | 0.061 (0.0246) [-0.0199, 0.0766] | 0.144** (0.0247) [0.0180, 0.1150] |
Standardized coefficients (β) are reported, followed by standard errors in parentheses and 95% confidence intervals in square brackets. * p < 0.05, ** p < 0.01, *** p < 0.001. Chat is chat message count; Exercises is generated exercises count; Exam is exam scores; Fami-Literacy is familiarity with AI and Machine Learning; Fami-GenAI is familiarity with GenAI tools.
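To illustrate the reporting convention in Table 4, the sketch below shows one way to obtain standardized coefficients alongside the standard errors and 95% confidence intervals of the fitted model. As with the earlier sketch, this is a Python illustration of an analysis the authors ran in R, and the variable names are hypothetical placeholders.

```python
# Sketch of assembling the quantities reported in Table 4: standardized
# coefficients (beta) together with standard errors and 95% CIs from the
# model fitted in original units. Variable names are hypothetical.
import pandas as pd
import statsmodels.api as sm

PREDICTORS = ["chat_count", "exercise_count", "exam_score",
              "gender", "fam_aiml", "fam_genai"]

def engagement_model(df: pd.DataFrame, outcome: str) -> pd.DataFrame:
    """Fit one multiple regression (BE, CE, or EE) with all controls."""
    X = sm.add_constant(df[PREDICTORS])
    fit = sm.OLS(df[outcome], X).fit()
    ci = fit.conf_int()
    # Standardized beta: b * sd(x) / sd(y), one value per predictor.
    beta = fit.params[PREDICTORS] * df[PREDICTORS].std() / df[outcome].std()
    return pd.DataFrame({
        "beta_std": beta.round(3),
        "se": fit.bse[PREDICTORS].round(4),
        "ci_low": ci.loc[PREDICTORS, 0].round(4),
        "ci_high": ci.loc[PREDICTORS, 1].round(4),
        "p": fit.pvalues[PREDICTORS].round(3),
    })

# Example: engagement_model(survey_df, "BE") yields one column block of
# Table 4 (chat, exercises, exam, gender, and the two familiarity terms).
```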
Qualitative research
Encouraging engagement with GPTutor with design features and nil access costs
Focus group interviews with the student users (i.e., the informants) offered greater context and first-hand accounts on how and under what circumstances GPTutor improved student engagement. Based on the findings, students’ proactive engagement with GPTutor is facilitated by the recognition that it comprehensively addresses their educational needs and can be personalized to adapt to their learning styles. There are two main elements that were raised as being beneficial: the availability of high quality and diverse features for learning, and the nil access costs.
From the technological viewpoint, all six informants agreed that GPTutor afforded them opportunities to increasingly engage with the course materials. Specifically, it afforded them chances to review, receive feedback, catch up on missed or overlooked content, and prime themselves pre-class for better engaging with the lecture. For instance, informants P2 and P4 shared the following regarding their attitude toward GPTutor and their usage of it:

P2: “For me, I think it is a very useful platform. You can embed the PowerPoints into the platform and you can also ask questions about the PowerPoints. And one good thing is that you can locate the answers by the pages number. So, you can really have a double check ... [Also] it can provide us with example [questions]. So, we can better understand what it is ... helping make [for] a clear and efficient review.”

P4: “I think GPTutor is important for me ... I want to say, for the final exam, you know, we need to organize some important part[s] of the PPT. But it’s hard for us to just [review] say from the first PPT to the last. So, in that time, I used the [GP]tutor, and I asked it to generate what it thinks is the most important [content], and what it thinks is the least important. And then I organized all the information [to review].”

GPTutor’s capacity to afford students the opportunities to learn in different ways was helped by its KGQA (e.g., when P4 shared about using GPTutor to better organize material for reviewing the course), and the exercise and flashcard generators (e.g., when P2 described how it could be used to generate sample questions to test their understanding of the PowerPoint content). These design features were hailed as strengths that GPTutor had over other available GenAI platforms. During the focus groups, contrasts were made between GPTutor’s straightforward interface and navigation deck, on the one hand, and more complex platforms, such as POE, or less capable methods of targeted information retrieval, such as using Google to search for information, on the other:

P3: “... [one lecturer] showed us how to use POE or something like that. It’s also an AI. And it was extremely hard to understand this AI because there were lots of information, lots of some kind of, I don’t know, chats with different kinds of AI. And I think that if he could use also GPTutor, it would be much better.”

P2: “[For] understanding issues with the PowerPoint, I would just ask GPTutor. I think it would be better than asking Google because Google just lists all the solutions to you, and maybe some of them are very complicated, and a lot of [it] you don’t understand, [they may be] without any PowerPoint slides.”

Besides the high-quality, diverse features that motivated engagement, all six informants also appreciated the nil access costs as facilitating sustained engagement. The informants enthused about the limitless capacity of using GPTutor, contrasting it against their home institution’s GenAI platform, which operated with a monthly usage quota. Informants P3 and P4 made this point clear in their responses to the moderator’s question on the attractive features of GPTutor:

P3: “Oh, actually, it’s free, right? There are no points. No quota ... Yeah, it’s much better because well, I’m used to using [University X’s] GenAI, but I have no credits left, and I don’t know what to do.”

P4: “What about some features? ... Yeah, so what’s the difference between GPTutor and [University X’s] Gen AI? Ok. So, well, first of all, I can name no credits.”

The concerns raised by Informants P3 and P4 echo those raised by scholars over how GenAI may exacerbate existing inequalities in learning opportunities (i.e., unequal access to using the technology) if limits are imposed on its use (Yan et al., 2024). Because GPTutor is free, students expressed greater behavioral, cognitive, and emotional engagement with it beyond the initial stage of adopting the GenAI tool.

Based on the excerpts, we are able to conclude that GPTutor’s features lowered the barriers to initial, as well as repeated, engagement with the GenAI. First, by providing different ways to learn the course material (e.g., inquiry-based learning, rote memorization, and text summarization), students with diverse educational needs could feel accommodated and encouraged to engage. Second, students’ engagement was encouraged by the lack of an enforced monthly usage quota, which has been discussed as a significant barrier to accessibility and equity in learning opportunities. Importantly, too, its features were able to cultivate commitment to using it over moving to other platforms.
Dispositional and situational influences on engagement with GPTutor
Interestingly, despite all six informants agreeing on its advantages for enhancing their learning, and its superiority over other GenAI platforms, they wavered on their behavioral engagement with the tool. This is evidenced by the back-end data on the count of chat messages and generated exercises (see Table 5).
Table 5. Data on informants’ engagement with GPTutor
Informant | Chat message count | Generated exercises count |
|---|---|---|
P1 | 15 | 36 |
P2 | 10 | 3 |
P3 | 5 | 5 |
P4 | 5 | 0 |
P5 | 2 | 38 |
P6 | 6 | 0 |
Generated exercises count is inclusive of exercises and flashcards generated by the user
Although each feature was used in anticipation of its learning benefits, consistent and repeated engagement with the GenAI tool during the 13-week course was not observed. How can we make sense of this variable engagement with GPTutor, alongside acknowledgments of its design advantages and nil access costs? Based on the focus group interviews, informants’ wavering engagement with GPTutor was observed to be a product of two kinds of factors: dispositional and situational. Regarding the former, the perceived difficulty of the course determined the perceived usefulness of the tool. As Informant P3 opined, if the class was easy, then there was no (perceived) need to use the tool:

P3: “Actually, it’s a really good idea to use AI to help students to learn some kind of information, but to be honest, okay, to be honest, AIDA is not that subject where we would really use AI. Like, because all the information there was easy to understand ... all of us, like, we choose to study in computing in the AI. So at least, like, we knew how to use AI.”

Informant P3 followed up their explanation with an example of a more difficult course and how they would use GPTutor in that course instead:

P3: “... For example, I have one subject right now, Introduction to Computing Systems ... it’s not hard if you spend a lot of time on it, but you still have other subjects [in the semester] to handle. GPTutor will be an excellent way to help you not to spend around six hours in the library to try to understand what is going on.”

In a similar fashion, Informant P4 expressed how, in the beginning of the course, they did not use GPTutor because the content was easy enough that their friends could help them if they had a question (“If I have a question, I will ask my friend ... sometimes it [i.e., using GPTutor] was not as useful as to just ask the people around us”). The self-disclosures from informants P3 and P4 revealed that the perceived usefulness of the tool can be a factor that dynamically shapes informants’ level of engagement with the tool.

As the course progressed, all of the informants (including P3 and P4) increasingly engaged with GPTutor. Informant P6, for example, expressed how, though they found no initial need to use GPTutor’s KGQA feature (“The content in the course is very clear in face-to-face classes, so I don’t need the KGQA feature in GPTutor to further clarify”), the text summarization feature (a sub-feature under KGQA, where students could find the summary of learning materials) was found to be helpful later for preparing for their exams:

P6: “I won’t use GPTutor if there are no exams because after all the summary [i.e., generated by GPTutor] cannot cover everything. There are some details [that] it cannot present. I usually learn by myself first, then consolidate by doing my exercise. I think this is better.”

Informant P4 put this into context as they described their use of GPTutor over the 13-week course:

P4: “I used GPTutor for around eight times in the semester, but for each time, I will use it for one or two or even three hours. [For example,] I didn’t attend one class for some reason. So, then I use[d] the GPTutor to help me better understand the course [material]. And, also, I use[d] it is around the midterm and final [exam]. I will use it to help me better review the content for the examinations. [Finally,] also, one time I use[d] it was to help me to pre-look the course before the class. And I think it’s useful.”

What changed their perceived usefulness of GPTutor? One prominent situational factor that influenced behavioral engagement with the tool was study pressure. Specifically, where there was a strong need to study or revise, there correspondingly was greater usage of the tool. For example, Informant P1 described how GPTutor was situationally instrumental for cramming for examinations, sharing, “I suppose, because when we have not much time like preparing for the exam or something ... you know, like [generating], true or false, and also open-ended questions are helpful ... because there’s a lot of information on the PowerPoint and we can actually do it [i.e., use GPTutor] during the short period. That is what I did.” Similarly, for P5, examinations were the spark that animated increased use of GPTutor. As P5 explained, “My usage is three times, but two times I used it a lot. The first time [was when] I finished the midterm review, I wanted to test my degree of understanding using GPTutor ... The last [two] times were before the final exam.”
The importance of dispositional (perceived usefulness) and situational (increasing academic stress) factors in moderating engagement with GPTutor is notable. Based on the focus group interviews, the use of the exercise and flashcard generators invariably increased as exams approached, even among those who initially saw little perceived usefulness in the tool. This dynamic suggests proactive personalization of the tool to match students’ learning styles and adaptability in using features based on where they are in their learning journey. This conclusion is buttressed by the informants’ suggestions to incorporate multimodal media, such as video recordings of the lecture, into the GenAI model. Their reasoning was that GPTutor’s capacity to support learning was limited to the course materials and content uploaded from PowerPoint or PDF files. This limitation meant that knowledge shared over the lecture – which might deviate from the PowerPoint used – could not be accounted for in GPTutor. Since a prominent limitation of KGQA was this knowledge gap (i.e., between content on the PowerPoint and content conveyed in lectures), making the suggested feature improvement to KGQA might assist with improving student engagement with the GenAI tool as the course progresses over time.
Discussion
With the growing application of GenAI in our educational ecosystems, there is an increasing necessity to provide evidence-based insights into how GenAI technologies affect students’ learning behaviors (Khosravi et al., 2025). Collecting, measuring, and analyzing data about students and their learning contexts are important to understanding the processes and conditions for initial and/or repeated engagement with GenAI tools (Khosravi et al., 2025). In our study, we evaluated the effectiveness of GPTutor in increasing student engagement, and explored how and under what circumstances student users engaged with GPTutor for their learning.
Our mixed methods study revealed that GPTutor did increase student engagement, but not in a uniform manner. The quantitative results revealed that the chat message count, representing engagement with the KGQA feature of GPTutor, is significantly and positively associated with BE and EE. This shows that students generally recognize their interaction with GPTutor through the KGQA feature as a meaningful learning activity. However, generated exercises count, representing the exercise and flashcard generator features, is not significantly related to any of the three student engagement dimensions. This may be related to the positively skewed distribution of generated exercises count in the sample (Mean = 7.10 and Skewness = 8.36). Interestingly, the survey results do not fully coincide with the results from the focus groups. In the focus group discussions, the students generally reported a greater preference for the exercise generator over KGQA. We offer two interpretations for the contrasting results. First, the regression model was based on the full sample, rather than a subsample of participants who used the feature. As mentioned above, by excluding those who had never used the feature (that is, 489 participants), the mean score increased from 7.10 to 25.15. Therefore, it is possible that the exercise generator feature does have an influence on the student engagement dimensions, as the focus group findings suggest, but the general non-engagement with the feature in the overall sample drove the model estimates of the associations to statistical nonsignificance. An alternative interpretation is that the difference in results might reflect a self-selection bias in the focus group. Specifically, compared to their counterparts who did not reach out to join the focus group discussions, students who were willing to voluntarily participate may have been those who cared more about their grades in the AIDA course (i.e., high performing students). Also, since the quantitative data was collected before the exam of the AIDA course, students may have generated more exercises later, during their reviews for the exam. In East Asia, it is common for students to prepare for their examinations by using practice/mock exams and rote memorization, over an inquiry-based learning style (such as chatting with a chatbot). For instance, in one study on Hong Kong’s educational culture under the Outcome-Based Teaching and Learning pedagogical paradigm, which has become increasingly popular in secondary schools and higher education institutions since the early 2000s, interviewees voiced that the city strongly promoted a specific learning approach that involved “tedious rote learning, memorization and repetition, practice and familiarization of examination techniques, as opposed to genuine conceptual understanding and the application of the knowledge” (Pang et al., 2009, p.113).
The statistical nonsignificance of CE in the regression model is worth unpacking, since it generally supports what available empirical literature has found and coincides with findings from the focus group discussions. For example, Sun and Rueda (2012) also reported a nonsignificant relationship between CE and students’ usage of an online learning tool. CE refers to an individual’s perception of a tool or activity as being worthwhile to invest time and effort into. In the context of help-seeking behaviors of computing students, recent scholarship has demonstrated that ChatGPT is hardly the first resource that they consult (Hou et al., 2024, 2025). Rather, there are other more popular resources that are used for seeking assistance, such as doing online searches, talking to friends, seeking help from the instructor or TA, and consulting online forums (Hou et al., 2024). Our focus group findings echo their results, since informants in our study articulated how their engagement with GPTutor varied by its perceived usefulness in a range of situational circumstances (Chan & Hu, 2023). In some circumstances, it was helpful to consult GPTutor, such as when preparing for an exam in a short time period. Its features were appreciated by all six of the informants. However, in other circumstances, engagement with GPTutor was not necessary, since the student could seek help from their friends or simply be self-reliant (especially when the course or class material was not perceived to be difficult enough to warrant seeking help from other resources). These dynamics echo the findings from previous studies that bring to light the heterogeneity of engagement patterns with GenAI tools and how they can vary by situational circumstances and dispositional factors (e.g., see Hou et al. (2024) on how usage varies by social context, and Chan & Hu (2023) on how usage varies by academic need). As the perceived usefulness of the tool and its features can vary over time, our findings suggest the need to better measure cognitive engagement with GenAI tools as fluid (context-dependent) rather than static.
In our study, we can conclude that GPTutor was generally welcomed into the educational ecosystem. In both the survey and focus group data, generally speaking, it was evident that using GPTutor promoted students’ engagement. While there were no identified barriers to adopting the GenAI tool, there were initial hurdles that discouraged repeated engagement with the tool. Particularly, GPTutor was used only when it was perceived as being useful, and this perceived usefulness was shaped by the students’ perception of the difficulty of the course and whether their support system (e.g., friends/classmates) could adequately address queries they may have. Relating to our previous point, its usefulness was found to increase as the course progressed, particularly as examinations approached. In terms of the features, specifically, as examinations approached, it was increasingly clear that the exercise and flashcard generators were preferred over the KGQA feature. Informants made this clear too in expressing how GPTutor could be improved by increasing the capabilities of the KGQA feature to include multimodal media, notably video recordings of lectures. Considering the findings, ITS developers are recommended to design multimodal features that cater to diverse learning needs (i.e., accounting for between-person dispositional differences) and variable engagement (i.e., accounting for within-person differences) over time. Notably, the features may cater to students’ needs for inquiry-based learning, learning-by-doing, and rote memorization. Nil access costs and convenient interface designs for hassle-free navigation are non-trivial design improvements that developers can consider in promoting student engagement with GenAI learning tools in higher education. These practical design improvements can afford students greater autonomy in integrating ITS with their study practices.
While the design suggestion from the focus groups may indeed increase student engagement levels, since it effectively bridges the knowledge gap between content on the PowerPoint and content conveyed in lectures, we are also alerted to the warning expressed by Khosravi et al. (2025) concerning the risk of cultivating dependency as a result of making design improvements. Specifically, as argued by Khosravi et al. (2025), developers need to exercise prudence and judge the degree to which students actively engage with the GenAI tool versus develop a dependency on it for satisfying their educational needs. The danger in developing GenAI tools that cultivate engagement to a point of dependency is that it risks compromising essential skills, attitudes, and competencies that higher education students should develop. There is already plenty of evidence suggesting the negative impact of over-reliance on GenAI in educational contexts. Park and Ahn (2024) reported that professors and students have raised concerns about decreased academic integrity since students could easily receive answers from GenAI without engaging in learning, which is echoed by Chan and Hu (2023) from the perspective of the difficulty in identifying plagiarism with GenAI. Another study by Bianchi et al. (2023) on text-to-image GenAI tools indicated persistent biases in the generated images, even with explicit counterexamples and guardrails in place to prevent biases, which amplify demographic stereotypes, such as regarding race, gender, and class. Students who overly depend on such GenAI tools for their learning might be at risk of adopting these stereotypes in their everyday life. If lecture recordings are incorporated into GPTutor, effectively bridging the aforementioned knowledge gap, there is a potential risk of developing dependency on the platform (Kasneci et al., 2023). We do not have an answer on how to reconcile this issue (that is, how to offer high-quality features, opportunities for adaptive personalization, and accessibility without promoting over-reliance on the tool for learning), but we raise it to stimulate dialogue on the limits of designing AI-based educational tools and our responsibility to prioritize human-centered design (not productivity-centered design) (Khosravi et al., 2025). We encourage future studies to explore this inquiry further.
Conclusion
Overall, by leveraging survey data, interview data, and system record data from the GenAI tool (e.g., the number of interactions with the chatbot and the number of generated quiz questions), this paper makes an original contribution to the literature on effective AI-supported learning environments and on design strategies to optimize higher education students' educational experiences. Our mixed methods study showed that student engagement in AI-supported learning environments relies on more than just interactive features. Based on the survey results, GPTutor differentially influenced students' level of engagement. Specifically, engagement with the chatbot's KGQA feature was positively associated with BE and EE, but not CE, and no significant association was found between use of the exercise generator feature and any of the three dimensions of engagement. Based on these results alone, we might conclude that feature design greatly impacts the dimensions of engagement. However, our focus group results illuminated the survey findings, unpacking the nuanced relationship between GPTutor and student engagement. While there was clear positive regard for GPTutor, the focus groups drew attention to how usage of the tool's features varied with the dispositional and situational characteristics of the student. The interviews revealed that BE followed the perceived usefulness of the tool, which was shaped by students' perception of the difficulty of the course and whether their support system could adequately address their questions. Notably, the tool's usefulness was found to increase as the course progressed, and as examinations approached it became increasingly clear that the exercise generator was preferred over the chatbot. These results highlight the importance of dispositions and situational circumstances in the relationship between ITS powered by GenAI and student engagement. Taken as a whole, the mixed methods results echo the age-old dictum that 'it's not about the tool, but how it is used': the effectiveness of the tool and its features depends on design considerations as much as on the intent and decision-making of the user, which can change over time. We hope that our findings and suggestions for design improvement can serve students and educators and animate further developments in this field.
Limitations
This study has a number of limitations to acknowledge. First, we lacked a comparison group with which to rigorously evaluate the effectiveness of GPTutor in promoting student engagement. This was because we wanted to ensure that all students in the AIDA course had a fair learning and assessment experience; admittedly, arranging an experimental and a comparison group would have been ideal for arriving at stronger conclusions regarding GPTutor's effectiveness. Second, the sample was neither randomly selected nor representative of Computer Science students at University X (i.e., the sample was drawn from only one course, AIDA). Thus, readers are cautioned against generalizing from the present study to the larger population. Third, self-selection bias may have influenced the results obtained from the qualitative research sample. In addition, despite adopting techniques to limit social desirability bias, such as asking neutral, indirect questions and assuring participants that there were no wrong opinions (i.e., as part of our efforts to establish an inclusive and respectful focus group atmosphere), we acknowledge that this bias may still have influenced the data. Given the small sample size and the potential biases disclosed, readers are cautioned against over-interpreting the findings of the focus groups. Future researchers may consider recruiting a larger sample and employing recruitment strategies that yield informants with varying levels of academic performance or distinct AI literacy profiles.
Acknowledgements
We acknowledge and thank all of the students who participated in this study and shared their valuable time and insights with us.
Funding
The present study is funded by the Fund for Innovative Technology-in-Education (FITE).
Data availability
The data that support the findings of this study are available from University X, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of University X.
Declarations
Competing interests
The authors declare that they have no competing interests.
Abbreviations
AI: Artificial intelligence
AIDA: Artificial intelligence and data analytics
BE: Behavioral engagement
CE: Cognitive engagement
EE: Emotional engagement
EFA: Exploratory factor analysis
GenAI: Generative artificial intelligence
ILO: Intended learning outcomes
IRB: Institutional review board
ITS: Intelligent tutoring systems
KGQA: Knowledge-grounded question-answering
KMO: Kaiser–Meyer–Olkin
LLM: Large language model
LMS: Learning management system
MSA: Measure of sampling adequacy
MSAi: Measure of sampling adequacy for each item
RAG: Retrieval-augmented generation
SD: Standard deviation
1 AIDA refers to a course on artificial intelligence literacy and data analytics.
2 University X is a pseudonym for the university in which the informants are enrolled.
3 Use of the exercise generator feature is optional for students.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Alkhatlan, A., & Kalita, J. (2019). Intelligent tutoring systems: A comprehensive historical survey with recent developments. International Journal of Computer Applications, 181.
Almetnawy, H., Orabi, A., Alneyadi, A. R., Ahmed, T., & Lakas, A. (2025). An adaptive intelligent tutoring system powered by generative AI. In 2025 IEEE Global Engineering Education Conference (EDUCON) (pp. 1–10). https://doi.org/10.1109/EDUCON62633.2025.11016362
Anderson, J. R., Boyle, C. F., & Yost, G. (1985). The geometry tutor. In Proceedings of the 9th International Joint Conference on Artificial Intelligence – Volume 1 (pp. 1–7). Morgan Kaufmann Publishers Inc.
Anderson, J. R., Conrad, F. G., & Corbett, A. T. (1989). Skill acquisition and the LISP tutor. Cognitive Science, 13.
Beckman, K., Apps, T., Howard, S. K., Rogerson, C., Rogerson, A., & Tondeur, J. (2025). The GenAI divide among university students: A call for action. The Internet and Higher Education, 67, 101036. https://doi.org/10.1016/j.iheduc.2025.101036
Bianchi, F., Kalluri, P., Durmus, E., Ladhak, F., Cheng, M., Nozza, D., Hashimoto, T., Jurafsky, D., Zou, J., & Caliskan, A. (2023). Easily accessible text-to-image generation amplifies demographic stereotypes at large scale. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency (pp. 1493–1504). Association for Computing Machinery. https://doi.org/10.1145/3593013.3594095
Bond, M., Buntins, K., Bedenlier, S., Zawacki-Richter, O., & Kerres, M. (2020). Mapping research in student engagement and educational technology in higher education: A systematic evidence map. International Journal of Educational Technology in Higher Education, 17.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3.
Chan, C. K. Y., & Hu, W. (2023). Students' voices on generative AI: Perceptions, benefits, and challenges in higher education. International Journal of Educational Technology in Higher Education, 20.
Chen, P.-S. D., Lambert, A. D., & Guidry, K. R. (2010). Engaging online learners: The impact of web-based learning technology on college student engagement. Computers & Education, 54.
Cole, M. (2009). Using wiki technology to support student engagement: Lessons from the trenches. Computers & Education, 52.
Cooper, G. (2023). Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. Journal of Science Education and Technology, 32.
Creswell, J. W., & Plano Clark, V. L. (2007). Designing and conducting mixed methods research. SAGE Publications.
D'Mello, S., Olney, A., Williams, C., & Hays, P. (2012). Gaze tutor: A gaze-reactive intelligent tutoring system. International Journal of Human-Computer Studies, 70.
Dziuban, C. D., & Shirkey, E. C. (1974). When is a correlation matrix appropriate for factor analysis? Some decision rules. Psychological Bulletin, 81.
Mousavinasab, E., Zarifsanaiey, N., Niakan Kalhori, S. R., Rakhshan, M., Keikha, L., & Ghazi Saeedi, M. (2021). Intelligent tutoring systems: A systematic review of characteristics, applications, and evaluation methods. Interactive Learning Environments, 29.
Fan, W., Ding, Y., Ning, L., Wang, S., Li, H., Yin, D., Chua, T.-S., & Li, Q. (2024). A survey on RAG meeting LLMs: Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 6491–6501). Association for Computing Machinery. https://doi.org/10.1145/3637528.3671470
Field, A. (2005). Discovering statistics using SPSS (2nd ed.). Sage Publications.
Frank, L., Herth, F., Stuwe, P., Klaiber, M., Gerschner, F., & Theissler, A. (2024). Leveraging GenAI for an intelligent tutoring system for R: A quantitative evaluation of large language models. In 2024 IEEE Global Engineering Education Conference (EDUCON) (pp. 1–9). https://doi.org/10.1109/EDUCON60312.2024.10578933
Fredricks, J. A. (2014). Eight myths of student disengagement: Creating classrooms of deep learning. Corwin Press. https://doi.org/10.4135/9781483394534
Fredricks, J. A., Blumenfeld, P., Friedel, J., & Paris, A. (2005). School engagement. In K. A. Moore & L. H. Lippman (Eds.), What do children need to flourish? Conceptualizing and measuring indicators of positive development (pp. 305–321). Springer US. https://doi.org/10.1007/0-387-23823-9_19
Fredricks, J. A., Filsecker, M., & Lawson, M. A. (2016). Student engagement, context, and adjustment: Addressing definitional, measurement, and methodological issues. Learning and Instruction, 43, 1–4. https://doi.org/10.1016/j.learninstruc.2016.02.002
Giannakos, M., Azevedo, R., Brusilovsky, P., Cukurova, M., Dimitriadis, Y., Hernández-Leo, D., Järvelä, S., Mavrikis, M., & Rienties, B. (2024). The promise and challenges of generative AI in education. Behaviour & Information Technology, 44.
Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H. H., Ventura, M., Olney, A., & Louwerse, M. M. (2004). AutoTutor: A tutor with dialogue in natural language. Behavior Research Methods, Instruments, & Computers, 36.
Hou, I., Mettille, S., Man, O., Li, Z., Zastudil, C., & MacNeil, S. (2024). The effects of generative AI on computing students' help-seeking preferences. In Proceedings of the 26th Australasian Computing Education Conference (pp. 39–48). Association for Computing Machinery. https://doi.org/10.1145/3636243.3636248
Hou, I., Nguyen, H. V., Man, O., & MacNeil, S. (2025). The evolving usage of GenAI by computing students. In Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 2 (pp. 1481–1482). Association for Computing Machinery. https://doi.org/10.1145/3641555.3705266
Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: A meta-analytic review. Review of Educational Research, 86.
Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., ... Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. https://doi.org/10.1016/j.lindif.2023.102274
Khosravi, H., Shibani, A., Jovanovic, J., Pardos, Z., & Yan, L. (2025). Generative AI and learning analytics: Pushing boundaries, preserving principles. Journal of Learning Analytics, 12.
Krueger, R., & Casey, M. (2014). Focus groups: A practical guide for applied research. SAGE Publications.
Kuh, G. D. (2001). Assessing what really matters to student learning: Inside the National Survey of Student Engagement. Change: The Magazine of Higher Learning, 33.
Li, L., & Kim, M. (2024). It is like a friend to me: Critical usage of automated feedback systems by self-regulating English learners in higher education. Australasian Journal of Educational Technology, 40.
Lui, R. W. C., Bai, H., Zhang, A. W. Y., & Chu, E. T. H. (2024). GPTutor: A generative AI-powered intelligent tutoring system to support interactive learning with knowledge-grounded question answering. In 2024 International Conference on Advances in Electrical Engineering and Computer Applications (AEECA) (pp. 702–707). https://doi.org/10.1109/AEECA62331.2024.00124
Mohammed, P., & Mohan, P. (2015). Dynamic cultural contextualisation of educational content in intelligent learning environments using ICON. International Journal of Artificial Intelligence in Education, 25.
Moorhouse, B. L., Yeo, M. A., & Wan, Y. (2023). Generative AI tools and assessment: Guidelines of the world's top-ranking universities. Computers and Education Open, 5, 100151. https://doi.org/10.1016/j.caeo.2023.100151
Mukherjee, M., Le, J., & Chow, Y.-W. (2025). Generative AI-enhanced intelligent tutoring system for graduate cybersecurity programs. Future Internet, 17.
Mulla, N., & Gharpure, P. (2023). Automatic question generation: A review of methodologies, datasets, evaluation metrics, and applications. Progress in Artificial Intelligence, 12.
Ng, D. T. K., Tan, C. W., & Leung, J. K. L. (2024). Empowering student self-regulated learning and science education through ChatGPT: A pioneering pilot study. British Journal of Educational Technology, 55.
Nkomo, L. M., Daniel, B. K., & Butson, R. J. (2021). Synthesis of student engagement with digital technologies: A systematic review of the literature. International Journal of Educational Technology in Higher Education, 18.
Nwana, H. S. (1990). Intelligent tutoring systems: An overview. The Artificial Intelligence Review, 4.
Pallant, J. L., Blijlevens, J., Campbell, A., & Jopp, R. (2025). Mastering knowledge: The impact of generative AI on student learning outcomes. Studies in Higher Education. https://doi.org/10.1080/03075079.2025.2487570
Pang, M., Ho, T. M., & Man, R. (2009). Learning approaches and outcome-based teaching and learning: A case study in Hong Kong, China. Journal of Teaching in International Business, 20.
Park, H., & Ahn, D. (2024). The promise and peril of ChatGPT in higher education: Opportunities, challenges, and design implications. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery. https://doi.org/10.1145/3613904.3642785
Qian, Y. (2025). Pedagogical applications of generative AI in higher education: A systematic review of the field. TechTrends. https://doi.org/10.1007/s11528-025-01100-1
R Core Team (2022). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. https://www.R-project.org/
Ramadhan, A., Warnars, H. L. H. S., & Razak, F. H. A. (2024). Combining intelligent tutoring systems and gamification: A systematic literature review. Education and Information Technologies, 29.
Reddy, M. R., Walter, N. G., & Sevryugina, Y. V. (2024). Implementation and evaluation of a ChatGPT-assisted special topics writing assignment in biochemistry. Journal of Chemical Education, 101, 2740–2748. https://doi.org/10.1021/acs.jchemed.4c00226
Sun, J. C.-Y., & Rueda, R. (2012). Situational interest, computer self-efficacy and self-regulation: Their impact on student engagement in distance education. British Journal of Educational Technology, 43.
Taub, M., Mudrick, N. V., Azevedo, R., Millar, G. C., Rowe, J., & Lester, J. (2017). Using multi-channel data with multi-level modeling to assess in-game performance during gameplay with Crystal Island. Computers in Human Behavior, 76, 641–655. https://doi.org/10.1016/j.chb.2017.01.038
Wang, H., Dang, A., Wu, Z., & Mac, S. (2024). Generative AI in higher education: Seeing ChatGPT through universities' policies, resources, and guidelines. Computers and Education: Artificial Intelligence, 7, 100326. https://doi.org/10.1016/j.caeai.2024.100326
Yan, L., Greiff, S., Teuber, Z., & Gašević, D. (2024). Promises and challenges of generative artificial intelligence for human learning. Nature Human Behaviour, 8.
Zhang, Z., & Xu, L. (2024). Student engagement with automated feedback on academic writing: A study on Uyghur ethnic minority students in China. Journal of Multilingual and Multicultural Development, 45.
Avcı, Ü., & Ergün, E. (2022). Online students' LMS activities and their effect on engagement, information literacy and academic performance. Interactive Learning Environments, 30.
© The Author(s) 2025. This article is published under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).