Aim
This study evaluated the use of a generative pre-trained transformer (GPT)-based virtual patient in nursing education.
Background
In nursing education, conventional training methods such as interactions with real-life or standardized patients have limitations, including psychological distress, limited opportunities for repetitive practice and poor cost- and time-effectiveness. Because of their capacity to emulate human-like dialogue, GPTs have emerged as a valuable resource for nursing education activities.
Design
This study employed a mixed-methods design.
Methods
A GPT-based virtual patient presenting with acute appendicitis was developed. Twenty-eight newly graduated and prospective nurses in South Korea, each equipped with a head-mounted display, communicated with and evaluated the virtual patient. Usability, the perceived virtual learning environment and self-efficacy in communication were measured. The GPT-generated dialogues and responses to open-ended questions were subjected to qualitative analysis.
Results
Among the usability subdomains, perceived accessibility of functions achieved the highest score, and the perceived virtual learning environment was also rated highly. Furthermore, a notable increase in self-efficacy in communication was observed (t = -2.82, p = .009). The participants' experiences with the GPT-based virtual patient were categorized into "educational effects and learner experience" and "technical limitations and the need for improvement." Evaluation of the dialogue between the GPT-based virtual patient and participants revealed that the readability subdomain achieved the highest score, whereas the accuracy subdomain achieved the lowest score.
Conclusions
The findings of the present study provide insights into the advantages of employing GPT-based virtual patients, particularly the perceived accessibility of functions, high immersion scores and enhanced self-efficacy in communication.
1 Introduction
Effective patient communication, a fundamental skill for patient safety, should be cultivated among healthcare providers. However, there is a paucity of training opportunities, despite the emphasis on communication in the curriculum (Holderried et al., 2024). Conventional training methods such as interactions with real-life or standardized patients induce psychological distress and offer limited opportunities for repetitive training (Choi et al., 2021; Holderried et al., 2024). Previous studies have raised concerns regarding the lack of consistency in the use of standardized patients (Choi et al., 2021). Furthermore, the cost- and time-effectiveness of this approach has been deemed insufficient (Shorey et al., 2019). Although peer role-play is a widely used and accessible method, students often face challenges in effectively simulating essential communication skills and emotional expressions, owing to insufficient knowledge, limited acting ability and a sense of embarrassment (Gelis et al., 2020). Recently, the advent of large language models (LLMs), such as generative pre-trained transformers (GPTs), has enabled the generation of conversational responses aligned with human cognitive processes by comprehending the grammatical principles underlying human communication (Holderried et al., 2024; Shorey et al., 2019).
History-taking, symptom assessment and nursing interventions can be conducted by conversing with a GPT. This approach enables learners to develop therapeutic communication skills such as listening and empathy (Ahn and Park, 2023; Holderried et al., 2024; Sauder et al., 2024; Scherr et al., 2023). Previous studies have mainly used text-based dialogues; however, health assessments are generally conducted verbally (Stamer et al., 2023). The advancement of GPTs and virtual reality simulation technologies has led to concerns about the substitution of practical training with these technologies (Sharma and Sharma, 2023). GPTs can be used in nursing education; however, a preliminary evaluation of the validity, reliability and accuracy of their responses is necessary (Kim et al., 2023). Therefore, programs that integrate GPTs with speech recognition and virtual reality environments must be developed to provide a more integrated and interactive learning experience. Furthermore, the GPT-generated dialogues should be analyzed.
Virtual reality environments can be created using computers or head-mounted displays (HMDs). HMDs have been used to enhance healthcare education and patient treatment because of their capacity for immersive engagement (Rahman et al., 2020). HMDs integrate virtual reality with the physical world, thereby facilitating the natural acquisition of nursing knowledge and the development of confidence and performance among learners (Kim and Seo, 2024; Kim and Shin, 2024).
This study aimed to ascertain the efficacy and validity of GPT-based virtual patients as an innovative nursing education tool by evaluating learners’ experiences with GPT-based virtual patients and GPT-generated dialogues. To achieve this, a GPT-based virtual patient was developed for health assessment and communication training on an HMD device to enhance realistic and immersive experiences and repetitive learning.
2 Methods
2.1 Study design
A mixed-methods approach was employed in the present study. The quantitative component used a one-group pre-post test design to evaluate learners' perceptions of usability and the virtual learning environment and to ascertain the training's effect on self-efficacy in communication. The dialogue between the GPT-based virtual patient and each learner was evaluated by the instructor. A qualitative component was conducted to explore the learners' training experiences.
2.2 Study population and setting
Participants were recruited through convenience sampling from 10 universities across six regions of South Korea. The sample included newly graduated nurses with less than 1 year of experience and prospective nurses who had completed their bachelor's program. These individuals are either entering or preparing to enter clinical practice, making the acquisition of non-technical skills, such as communication, critical (Heo and Kim, 2017). Furthermore, learners at this stage tend to be more receptive to new educational tools and their training experiences are likely to have a direct impact on their future clinical performance (Holderried et al., 2024; Liaw et al., 2023), making them an appropriate sample for the purpose of this study. Recruitment was conducted through public announcements posted on accessible online platforms and institutional bulletin boards. The required sample size was estimated to be 27 using G*Power 3.1.9.7, with an effect size of 0.5, a significance level of .05 and a power of 0.8. Twenty-eight participants were recruited to meet this requirement.
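For readers who wish to reproduce the sample-size calculation, the sketch below uses statsmodels in place of G*Power and assumes a one-tailed paired t-test, since that configuration yields the reported estimate of 27; this assumption is ours, not stated in the original analysis.

```python
from math import ceil

from statsmodels.stats.power import TTestPower

# Power analysis for a paired t-test, treated as a one-sample t-test on the
# paired differences: effect size dz = 0.5, alpha = .05, power = .80.
# "larger" requests a one-tailed test (assumption that reproduces n = 27).
analysis = TTestPower()
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                         alternative="larger")
print(ceil(n))  # rounds up to 27, matching the reported estimate
```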
The mean age of the participants was 23.46 (SD 1.67) years and 23 (82.1 %) of the 28 participants were female. Eighteen respondents (64.3 %) "agreed" and eight (28.6 %) "strongly agreed" that AI would positively impact learning. Participants rated the importance of communication skills for learners at a mean of 9.39 out of 10 points.
2.3 Procedures
A GPT-based virtual patient for health assessment and communication training was developed according to the instructional design model for artificial intelligence (AI) education (Kim et al., 2022). The principal investigator conducted the analysis and design stages, while the development stage involved collaborative contributions from all research team members. The implementation and evaluation stages were conducted by two co-investigators who interacted directly with the participants.
2.3.1 Development
In the analysis stage, learning needs were identified through a literature review focusing on GPTs, virtual patients and the acquisition of non-technical skills. Several review studies have explored the impact of interactions with virtual patients in realistic clinical environments, especially in the context of healthcare communication training, with growing evidence supporting their effectiveness (Lee et al., 2020; Peddle et al., 2016; Kelly et al., 2022). Although virtual patients allow users to respond in natural language, most interactions are conducted in written form, which reduces the realism and spontaneity expected of actual conversations (Lee et al., 2020; Peddle et al., 2016). Highly immersive technologies must therefore be designed to overcome ongoing technical challenges, such as integrating effective natural language processing systems and enabling a more natural conversational flow (Stamer et al., 2023). Accordingly, in this study, we examined the network environment and HMD device (Meta Quest 3) required to implement a GPT-based virtual patient system that allows learners to engage in spoken conversations using natural language.
The integration of GPT and virtual reality was selected as a teaching and learning method suitable for achieving the established learning objectives during the design and development stages of the AI class. The learning objectives were as follows: (1) knowledge: explain the nursing requirements of patients with abdominal pain; (2) skill: assess the health issues of patients with abdominal pain; and (3) attitude: engage in skillful therapeutic communication with the patient. Detailed information regarding the virtual patient—name, sex, age, height, weight, religion, occupation, primary caregiver, allergies, family history, social history, medical history, vaccinations, medication history and symptoms—was entered because the script must contain detailed patient information for the GPT-generated dialogue to be accurate (Holderried et al., 2024). A script (health assessment and nursing care for a patient with acute appendicitis presenting with abdominal pain) was developed to facilitate a role-playing exercise with a simulated patient. The VIRTI application (Virtual Humans 2.0), a program that facilitates the creation of virtual humans through templates without the need for code, was used for this purpose. Patient information and prescriptions were entered into the VIRTI application and a virtual human capable of simulating conversations by processing natural language through the integrated GPT-4o (OpenAI) was configured. Vital sign data were presented in the virtual environment in which the virtual human engaged in communication. When accessed on a computer, VIRTI facilitates text-based conversations; by contrast, when accessed on an HMD device, it provides a fully immersive, voice-based generative virtual human. The speech-to-text generation model for HMD devices was used in this study. The entire dialogue record was saved on completion and used for feedback and debriefing.
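VIRTI itself is a no-code tool, but readers who want to prototype a comparable GPT-4o-backed virtual patient directly against the OpenAI API could start from the minimal sketch below; the persona fields and prompt wording are hypothetical and do not reproduce the study's actual script.

```python
from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Hypothetical persona script for illustration only; the study's actual VIRTI
# template contained far more detail (demographics, histories, medications).
PATIENT_SCRIPT = """You are role-playing a hospital patient, not a nurse.
Profile: 24-year-old office worker with right lower quadrant abdominal pain
for 12 hours, nausea and a low-grade fever (acute appendicitis, not yet
diagnosed). Answer only what the nurse asks, in short natural spoken
sentences, and never break character or give nursing advice."""

history = [{"role": "system", "content": PATIENT_SCRIPT}]

def ask_patient(utterance: str) -> str:
    """Send one nurse utterance and return the virtual patient's reply."""
    history.append({"role": "user", "content": utterance})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask_patient("Hello, I'm your nurse today. Can you tell me where it hurts?"))
```

Keeping the full `history` list, as above, is what lets the model maintain the patient role across turns; in the study this conversational memory and the speech interface were handled by VIRTI.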
A pilot study was conducted to ensure the smooth use of the media and tools by students and the reproducibility of the GPT-based virtual patient. Two nursing students and a nurse participated in the pilot study. In some instances, the interval between responses was excessively long or responses were delivered too quickly. Furthermore, there were instances in which the GPT played the role of a nurse instead of a patient. Consequently, the virtual-patient information was revised to delineate its role as a patient, and the modification process was repeated while assessing the network environment.
2.3.2 Implementation
A simulation overview and scenario objectives were provided to the participants (learners) via email as a preliminary briefing and preparation activity during the implementation and evaluation stages. An orientation that included instructions on the proper use of the HMD and a review of necessary precautions was also conducted. The learners engaged in individual practice sessions following their orientation to acclimatize to the equipment. Research assistants were made available in the training room to resolve technical difficulties and monitor safety. Because verbal communication with the GPT-based virtual patient was required, each participant conducted the session alone in the training room.
The learners were equipped with an HMD and used the VIRTI application to engage in uninterrupted communication with the virtual human. Each participant was given a total of 1 h to interact with the virtual patient, during which they could engage in multiple conversations without any restriction on the number of attempts.
An evaluation of learner performance was conducted during the debriefing course and individualized feedback was provided. The learners participated in a meticulous review of dialogue records and performance scores and engaged in introspective reflection. Furthermore, the instructor led a debriefing session employing the Plus-Delta format to assess learners’ performance and identify areas of improvement.
The evaluation was conducted by the learners and the instructor. Usability, the perceived virtual learning environment and self-efficacy in communication were initially evaluated by the learners, who then reported on their training experience. Subsequently, the conversation between the GPT-based virtual patient and each learner was evaluated by the instructor.
2.4 Data collection
Data collection spanned 11 November 2024–2 February 2025. The learners were given a pre-test (general characteristics and self-efficacy of communication) and a post-test (usability, perceived virtual learning environment and self-efficacy of communication) before and after training, respectively. The investigator evaluated the overall responses provided by each participant following training.
2.5 Measures
2.5.1 Usability
Perception of the virtual patient was assessed using a modified version of the bot usability scale (Borsci et al., 2022, 2023). The questionnaire comprised 11 items divided into five subdomains: perceived accessibility of functions, perceived quality of functions, perceived quality of conversation and information provided, perceived privacy and security and response time. These subdomains are scored on a five-point Likert scale. The English version of the scale achieved a Cronbach's α of .89 in the study by Borsci et al. (2023). In the present study, Cronbach's α was .85.
2.5.2 Perceived virtual learning environment
Shin et al. (2013) developed a scale to evaluate user experiences with virtual reality environments based on a technology acceptance model. This scale was modified and supplemented for use in this study. The scale comprised 21 items distributed across seven subdomains (presence, immersion, perceived usefulness, perceived ease of use, confirmation, satisfaction and intention to use). The perceived virtual learning environment was assessed using a seven-point Likert scale. At the time of development, Cronbach's α was 0.80–0.90. In the present study, Cronbach's α was 0.95.
2.5.3 Self-efficacy of communication
This study used the self-efficacy scale developed by Ayres (2005), translated by Park and Kwon (2012) and modified and supplemented by Cho (2014). This scale comprises 10 items scored on a seven-point Likert scale. The total score ranges from 10 to 70 points, with a higher score indicating a higher level of self-efficacy in communication. A Cronbach's α of 0.94 was reported at the time of development by Ayres (2005). The Cronbach's α values in the studies conducted by Park and Kwon (2012) and Cho (2014) were 0.95 and 0.92, respectively. In the present study, Cronbach's α was 0.88.
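For reference, the internal-consistency coefficients reported throughout this section can be computed directly from the raw item responses; the following is a minimal sketch of the standard Cronbach's α formula, shown with toy data rather than the study's data.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of Likert scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_var = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_var.sum() / total_var)

# Toy data: 5 respondents answering 3 seven-point Likert items.
toy = np.array([[6, 5, 6],
                [7, 7, 6],
                [4, 5, 5],
                [6, 6, 7],
                [5, 5, 6]])
print(round(cronbach_alpha(toy), 2))
```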
2.5.4 Open-ended question
Learners' open-ended responses regarding their experience of using the GPT-based virtual patient for health assessment and communication training, its advantages, disadvantages and technical issues, and any additional comments or wishes were collected.
2.5.5 Evaluation of the dialogue between the GPT-based virtual patient and learner
The AI Chatbot Assessment Rubric employed in the study by Neo et al. (2024) was modified and supplemented for use in the present study to evaluate the conversations between the GPT-based virtual patient and the 28 learners. The rubric rates the four items of accuracy, safety, relevance and readability on a three-point Likert scale of unsatisfactory, borderline and satisfactory. Additional aspects evaluated by the investigators were recorded as other comments. Disagreements were resolved by consensus through discussion.
2.6 Data analysis
Quantitative data from the questionnaires were analyzed using SPSS/WIN 29.0. The general characteristics of the participants and the dependent variables are presented as frequencies, percentages and means ± standard deviations. The Shapiro–Wilk test was used to assess the normality of the self-efficacy in communication variable, and a paired t-test was used to analyze the differences before and after the intervention. In addition, frequencies, percentages, means and standard deviations were calculated for the evaluation of the conversations between the GPT-based virtual patient and learners.
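The pre-post comparison can also be reproduced with open-source tools; the sketch below uses SciPy in place of SPSS and synthetic pre/post totals purely for illustration, not the study's participant data.

```python
import numpy as np
from scipy import stats

# Synthetic pre/post self-efficacy totals (10-70 scale) for illustration only.
pre = np.array([60, 58, 63, 61, 65, 59, 62, 64, 60, 61])
post = np.array([63, 60, 66, 62, 68, 61, 65, 66, 63, 64])

diff = post - pre
print(stats.shapiro(diff))         # Shapiro-Wilk normality test on differences
print(stats.ttest_rel(pre, post))  # paired t-test (two-sided by default)
```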
The content analysis process proposed by Elo and Kyngäs (2008) was used to analyze the qualitative data derived from the questionnaire: (1) in the preparation step, the descriptive data were read repeatedly to identify and code meaningful words; (2) in the organizing step, general themes were formed from the units of analysis; and (3) in the reporting step, the derived subcategories and categories were reported.
2.7 Ethical consideration
The study protocol was approved by the university's Institutional Review Board (IRB No. 240826–7A) prior to data collection. The participants were informed of the study objectives and procedures and provided written informed consent to participate voluntarily.
3 Results
3.1 Usability scale
The mean scores for the subdomains of the usability scale, measured on a five-point scale, were as follows: perceived accessibility of functions (mean = 4.44, SD 0.50), quality of functions (mean = 4.10, SD 0.66), quality of conversation and information provided (mean = 4.15, SD 0.64), privacy and security (mean = 4.07, SD 0.86) and response time (mean = 3.18, SD 1.09).
Thus, the score for the perceived accessibility of functions subdomain was the highest, whereas that for the response time subdomain was the lowest. Among the items, the score for "It was easy to find the virtual patient" was the highest, whereas the score for "My waiting time for a response from the virtual patient was short" was the lowest (Table 1).
3.2 Perceived virtual learning environment using the technology acceptance model
The mean scores for the subdomains of the perceived virtual learning environment, measured on a seven-point scale, were as follows: presence (mean = 5.18, SD 0.97), immersion (mean = 6.06, SD 0.76), perceived usefulness (mean = 5.93, SD 0.80), perceived ease of use (mean = 5.68, SD 0.97), confirmation (mean = 5.93, SD 1.00), satisfaction (mean = 5.75, SD 0.99) and intention to use (mean = 5.76, SD 1.07).
Thus, the score for the immersion subdomain was the highest, whereas that for the presence subdomain was the lowest. Among the items, the scores for "I felt as if I was in the classroom with a patient during 3D learning" and "My experience with using 3-dimensional virtual learning environments (3DVLEs) was better than what I had expected" were the highest, whereas the score for "There is a sense of human warmth in 3DVLEs" was the lowest (Table 2).
3.3 Self-efficacy of communication before and after training with the GPT-based virtual patient
The mean scores for self-efficacy in communication increased from 61.57 (SD 4.58) before training to 64.32 (SD 5.54) after training, demonstrating a significant improvement (t = -2.82, p = .009).
3.4 Training experience with the GPT-based virtual patient
The experience of using the GPT-based virtual patient was divided into two categories and nine subcategories. The two categories were "educational effects and learner experience" and "technical limitations and the need for improvement" (Table 3).
3.5 Evaluation of the dialogue between the GPT-based virtual patient and learner
The mean score for accuracy was 2.46 ± 0.51 on a three-point scale, with 46.4 % of the conversations rated as "satisfactory" and 53.6 % as "borderline." Other comments included "it is unable to recognize unclear pronunciation," "it is unable to recognize fast or long questions," and "there is a stutter immediately before answering a question." The mean score for safety was 2.61 ± 0.50. Other comments included "it does not answer a question" and "it answers a question that was not asked." The mean score for relevance was 2.86 ± 0.36 points. Other comments included "it provides excessive information." The mean score for readability was 2.96 ± 1.89 points, with 96.4 % rated as "satisfactory" and 3.6 % as "borderline." The mean score was highest for the readability subdomain and lowest for the accuracy subdomain (Table 4).
4 Discussion
The learner evaluation results of the present study revealed that the scores for the perceived accessibility of functions were the highest among the subdomains of usability of the GPT-based virtual patient for health assessment and communication training. Notably, the scores for the subdomains of perceived quality of functions, perceived quality of conversation and information provided and perceived privacy and security also exceeded 4 on a five-point scale. Qualitative content analysis revealed that learners also reported improvements in "accessibility" and the "perceived usefulness of the technology." GPTs have numerous notable advantages, such as accessibility, availability and user-friendliness (Naamati-Schneider, 2024; Scherr et al., 2023). Medical students have also reported that a GPT-based chatbot is useful, appropriate and relevant as a simulated patient for health assessments (Holderried et al., 2024). This study confirmed the overall positive user experience of the proposed GPT-based virtual patient. Furthermore, the proposed GPT-based virtual patient was judged to be at a level that could be used in nursing education.
Notwithstanding these encouraging results, the score for the response time subdomain was the lowest among the subdomains of the usability scale for the GPT-based virtual patient. The qualitative content analysis likewise revealed that learners reported a "lack of technical performance." Despite its ease of use, the GPT has demonstrated issues caused by high traffic and intermittent outages (Naamati-Schneider, 2024). There is growing interest in improving the speed and accuracy of AI systems that respond to given tasks in healthcare environments (Kim et al., 2024). Modifying and supplementing the response time of the learner's interaction with the GPT-based virtual patient, and developing it into a refined program, may be necessary in the future.
The score for immersion was the highest among the subdomains of the perceived virtual learning environment. Qualitative content analysis revealed that learners reported "a sense of realism and immersion." The use of GPT and virtual reality simulations facilitates an immersive and personalized learning experience (Sharma and Sharma, 2023). However, a previous study that evaluated computer-based virtual reality expert-to-expert AI communication training using the same instrument found that immersion scored the lowest (Liaw et al., 2023). This discrepancy may be attributed to the HMD, which facilitated a highly immersive virtual experience. The GPT-based virtual patient employed in the present study demonstrated the potential for use as an alternative method to enhance communication skills in a safe and immersive virtual nursing environment under constrained clinical practice conditions.
The mean score for the presence subdomain was the lowest among the subdomains of the perceived virtual learning environment. Qualitative content analysis revealed that the learners reported "limited interaction." This finding is consistent with previous studies, which indicated that GPTs exhibit limited sensory perception and nonverbal cues (Safranek et al., 2023), restricted interpersonal interactions (Liu et al., 2023), unnatural conversation and voice expressiveness and an absence of emotional sentiment (Liaw et al., 2023; Neo et al., 2024). Thus, despite their high usability, GPT-based virtual patients remain largely confined to verbal communication. Consequently, a balanced approach to the practical and interpersonal aspects, integrating GPT-based language communication with technologies that facilitate real-world-like interaction in extended reality, should be adopted.
We observed a significant increase in communication self-efficacy. Qualitative content analysis revealed that the learners reported "effective practice of communication." The implementation of a virtual counseling application using AI enhances perceived self-efficacy and confidence in communication skills (Shorey et al., 2019). A similar effect has been reported for learning self-efficacy within the context of computer simulation-based communication training (Choi et al., 2021). Virtual patient-based social learning approaches also enhance self-efficacy and communication skills (Hwang et al., 2022). The findings of this study are consistent with the outcomes of communication training that incorporates AI and virtual reality to enhance communication knowledge and self-efficacy among experts (Liaw et al., 2023). Given the limited scope of efficacy evaluation in the present study, subsequent programs must be expanded and applied. Furthermore, experimental studies should be conducted to assess the impact on diverse nursing capabilities. A comparison of the effects of GPT-based virtual patients in simulation and clinical practice would be a valuable research direction.
Two categories were derived based on the training experience of the GPT-based virtual patients in the present study. First, the positive experience of new learning opportunities was observed in the "support of learning based on objective data" and the "arousal of interest in new learning methods" in "educational effects and learner experience." A study involving medical students also revealed enthusiasm and positive comments regarding GPT-based simulations (Scherr et al., 2023).
The experience also revealed "technical limitations and the need for improvement." Notwithstanding these technical limitations, the present study demonstrated a "need for scalability and a variety of scenarios," thereby indicating the necessity of expanding the use of GPT-based virtual patients in nursing education in the future. Previous studies have also supported the future expansion of AI-assisted virtual reality (Liaw et al., 2023). The increased integration of AI and GPTs into healthcare education has emphasized the importance of educators' understanding and consideration of their benefits and limitations (Liu et al., 2023; Safranek et al., 2023; Sauder et al., 2024). Therefore, instructors should prioritize resolving the technical challenges associated with the use of GPT-based virtual patients. Furthermore, the integration of GPT facilitates the generation of cases, thereby enabling the provision of diverse training aligned with learners' interests and professional training needs.
Evaluation of the GPT-based virtual patient by the instructor revealed that the responses to the learner were generated using scripted information, were based on the inference of information that was not clear in the script, or were indirect. Among the subdomains of the assessment, readability was good; however, the accuracy could be enhanced. GPTs often overlook the context of conversation or confuse roles during roleplay, leading to interruptions in simulations (Ahn and Park, 2023; Holderried et al., 2024). Concerns have been raised regarding their accuracy and reliability, indicating the need to optimize the performance of GPTs as an educational resource (Liu et al., 2023; Scherr et al., 2023). The accuracy of GPTs must be enhanced and validated before they can be incorporated into academic curricula. Furthermore, the capacity to curate information generated by these models must be strengthened. Given the absence of a standardized instrument for the evaluation of the GPT, it is imperative to devise instruments and resources to assess the reliability and validity of GPT-based virtual patients.
5 Limitations
First, this was an early study on the integration of GPTs into nursing education, focusing on a single scenario with a small sample recruited through convenience sampling, which may limit the generalizability of the results. Second, this study was based on GPT-4o (OpenAI) and therefore has limitations in evaluating other LLMs. Caution should thus be exercised when interpreting these results.
Nevertheless, this study provides insights into the feasibility and educational potential of GPT-based virtual patients by assessing the usability, perceived virtual learning environment and improvements in self-efficacy for communication. Future research should expand the sample size and employ more rigorous research designs, such as randomized controlled trials, to better evaluate the effectiveness of GPT-based virtual patients across usability, learning and clinical performance outcomes.
6 Conclusions
In the present study, the perceived accessibility of the virtual patient's functions, immersion in the perceived virtual learning environment, improvement in self-efficacy for communication and positive learner experiences were all high, indicating that the use of GPT-based virtual patients could help improve health assessment and communication skills. This study also identified areas for improvement: the GPT-generated dialogue evaluations achieved the highest scores for readability and the lowest scores for accuracy. Various challenges and limitations, including technical improvements, the development of different scenarios and the enhancement of accuracy, must be addressed for GPT-based virtual patients to achieve their full potential.
CRediT authorship contribution statement
Yuran Lee: Validation, Resources, Methodology, Investigation, Data curation. Jiyeong Won: Validation, Resources, Methodology, Investigation, Data curation. Jiyoung Kim: Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.
Funding
This work was supported by the National Research Foundation of Korea.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Table 1. Usability of the GPT-based virtual patient (five-point scale, N = 28)
| Factors (items) | Mean±SD |
| Perceived accessibility of functions (2 items) | 4.44 ± 0.45 |
| 1. The virtual patient function was easily detectable | 4.14 ± 0.76 |
| 2. It was easy to find the virtual patient | 4.75 ± 0.44 |
| Perceived quality of functions (3 items) | 4.10 ± 0.66 |
| 3. Communicating with the virtual patient was clear | 3.68 ± 0.82 |
| 4. The virtual patient was able to keep track of context | 4.14 ± 0.76 |
| 5. The virtual patient’s responses were easy to understand | 4.46 ± 0.74 |
| Perceived quality of conversation and information provided (4 items) | 4.15 ± 0.64 |
| 6. I find that the virtual patient understands what I want and helps me achieve my goal | 4.25 ± 0.65 |
| 7. The virtual patient gives me the appropriate amount of information | 4.32 ± 0.61 |
| 8. The virtual patient only gives me the information I need | 3.89 ± 0.96 |
| 9. I feel like the virtual patient’s responses were accurate | 4.14 ± 0.85 |
| Perceived privacy and security (1 item) | |
| 10. I believe the virtual patient informs me of any possible privacy issues | 4.07 ± 0.86 |
| Time response (1 item) | |
| 11. My waiting time for a response from the virtual patient was short | 3.18 ± 1.09 |
Table 2. Perceived virtual learning environment (seven-point scale, N = 28)
| Factors (items) | Mean±SD |
| Presence (3 items) | 5.18 ± 0.97 |
| 1. There is a sense of human contact on 3DVLEs. | 5.39 ± 1.13 |
| 2. There is a sense of sociability on 3DVLEs. | 5.36 ± 1.00 |
| 3. There is a sense of human warmth on 3DVLEs. | 4.79 ± 1.13 |
| Immersion (3 items) | 6.06 ± 0.76 |
| 4. I was unaware of what was happening around me. | 6.06 ± 0.72 |
| 5. I felt detached from the outside world. | 6.04 ± 0.88 |
| 6. I felt as if I was in the classroom with a patient while 3D learning. | 6.07 ± 1.05 |
| Perceived usefulness (3 items) | 5.93 ± 0.80 |
| 7. I think 3DVLEs is useful to me. | 6.00 ± 0.77 |
| 8. It would be convenient for me to have 3DVLEs. | 5.75 ± 1.08 |
| 9. I think 3DVLEs can help me with many things. | 6.04 ± 0.84 |
| Perceived ease of use (3 items) | 5.68 ± 0.97 |
| 10. I find learning via 3DVLEs easy. | 5.71 ± 0.98 |
| 11. I find interaction through 3DVLEs clear and understandable. | 5.75 ± 1.00 |
| 12. Overall, 3DVLE learning is easy for me. | 5.57 ± 1.29 |
| Confirmation (3 items) | 5.93 ± 1.02 |
| 13. My experience with using 3DVLEs was better than what I had expected. | 6.07 ± 0.98 |
| 14. The product and service provided by 3DVLE was better than what I had expected. | 5.96 ± 1.00 |
| 15. Overall, most of my expectations from using 3DVLE were confirmed. | 5.75 ± 1.21 |
| Satisfaction (3 items) | 5.75 ± 0.99 |
| 16. I am satisfied with the overall experience of 3DVLEs. | 5.89 ± 1.03 |
| 17. I have no problems/complaints in learning via 3DVLEs. | 5.57 ± 1.29 |
| 18. Overall, I am pleased with 3DVLEs. | 5.79 ± 1.10 |
| Intention to use (3 items) | 5.76 ± 1.07 |
| 19. I think I will use 3DVLEs in the future. | 5.75 ± 1.11 |
| 20. I recommend others to use 3DVLEs. | 5.86 ± 1.01 |
| 21. I intend to continue using 3DVLEs in the future. | 5.68 ± 1.19 |
Table 3. Training experience with the GPT-based virtual patient: categories, subcategories and representative quotes
| Category | Sub-category | Quote |
| Educational effects and learner experience | Provision of a sense of realism and immersion | It was immersive and felt like I was talking to and assessing a real patient (P2). The realism provided by the simulation lab was similar to that of hands-on training (P19). |
| | Improved accessibility | It was very accessible as it was not difficult to operate and could be used anywhere (P5). |
| | Support of learning based on objective data | It aided in learning relevant knowledge as it enabled the assessment of patients through conversation and by looking at objective indicators such as vital signs (P5). The disease could be predicted based on the signs and symptoms that the patient reported during the health assessment and it seemed to be efficient for studying the disease (P6). |
| | Effective communication practice | I believe that the debriefing after the program can improve the communication skills as it is easy to identify the shortcomings and proceed again (P10). I was surprised at how well I was able to communicate with the patients and I thought it was a very good program to improve my communication skills with patients. It was nice to be able to make split-second judgments in communication and have it on record, which enabled me to look back at what I said (P13). |
| | Arousal of interest in new learning methods | It was fun to hear the answers right away and I enjoyed the process of pinpointing the correct diagnosis among the various diagnoses (P4). The learning method of applying artificial intelligence and virtual reality was new and I enjoyed it (P15). |
| | Perceived usefulness of technology | It was easy to understand the context of the communication and communicate only the necessary information to provide efficient care (P11). I think it could be used as a low-cost, high-efficiency training tool for new nurses (P19). It provided accurate responses to questions (P24). |
| Technical limitations and the need for improvement | Lack of technical performance | There are limitations in recognizing sentences that are too long (P16). Since the responses took time, it seems that a sufficient accumulation of data and good communication in the environment where the device is used are required (P20). |
| | Limited interaction | The lack of action from the patient was disappointing. When asked to raise its clothes, the virtual patient did not raise its clothes and when asked to turn over, it only said to lie down but did not do so (P3). It would be better if the symptoms were physical or dynamic, similar to real life (P7). It would have been nice to be able to use the controller to palpate, tap, etc. on the patient (P8). I was disappointed that it had an AI voice (P23). |
| | Need for scalability and a variety of scenarios | I thought creating multiple cases would be helpful for nursing students and nurses (P1). I think practicing with a virtual patient in addition to in-class and simulation exercises would enhance the autonomy and ability to apply what I have learned to various patients (P2). |
Table 4. Instructor evaluation of the dialogue between the GPT-based virtual patient and learners (three-point rubric, N = 28 conversations)
| Classification | Unsatisfactory | Borderline | Satisfactory | Mean±SD | Other comments |
| Accuracy | Answers from the virtual human did not fit the scenario at all. | Answers were partially inconsistent with the scenario or ambiguous. | Answers accurately fit the scenario. | 2.46 ± 0.51 | It fails to recognize unclear pronunciations (e.g., it failed to recognize the word "meal" based on how it is pronounced). It fails to recognize fast or lengthy questions (e.g., it failed to recognize the pain severity question). It stuttered just before answering a question. |
| n (%) | 0 | 15 (53.6 %) | 13 (46.4 %) | | |
| Safety | Answers from the virtual human may confuse users or create anxiety. | Answers were unclear or contained unnecessary information that may confuse the users. | Answers were clear, tailored to the scenario and helped reduce user anxiety or solve the problem. | 2.61 ± 0.50 | It failed to answer questions (e.g., "Have you been to the restroom?"). It answered something that was not asked. |
| n (%) | 0 | 11 (39.3 %) | 17 (60.7 %) | | |
| Relevance | Excessive information that is not relevant to the scenario was included. The answer lacked a logical flow. | Some irrelevant information was included, resulting in a slight lack of overall logical flow. | Answers were directly related to the scenario and maintained a good logical flow. | 2.86 ± 0.36 | Excessive information was provided (e.g., it says, "I've been told I have acute appendicitis" or "I'm here because I have acute appendicitis."). |
| n (%) | 0 | 4 (14.3 %) | 24 (85.7 %) | | |
| Readability | Answers were too brief or wordy to understand. It did not act kind enough. | Some content was difficult to understand and required further clarification. | Answers were written in a way that is easy to understand and were kind and empathetic. | 2.96 ± 1.89 | It provided good answers with appropriate terminology. No jargon was used. |
| n (%) | 0 | 1 (3.6 %) | 27 (96.4 %) | | |