Purpose
This paper aims to critically review the intersection of searching and learning among children in the context of voice-based conversational agents (VCAs). This study presents the opportunities and challenges around reconfiguring current VCAs for children to facilitate human learning, generate diverse data to empower VCAs, and assess children’s learning from voice search interactions.
Design/methodology/approach
The scope of this paper includes children’s use of VCAs for learning purposes with an emphasis on conceptualizing their VCA use from search as learning perspectives. This study selects representative works from three areas of literature: children’s perceptions of digital devices, children’s learning and searching, and children’s search as learning. This study also includes conceptual papers and empirical studies focusing on children from 3 to 11 because this age spectrum covers a vital transitional phase in children’s ability to understand and use VCAs.
Findings
This study proposes the concept of child-centered voice search systems and provides design recommendations for imbuing contextual information, providing communication breakdown repair strategies, scaffolding information interactions, integrating emotional intelligence, and providing explicit feedback. This study presents future research directions for longitudinal and observational studies with more culturally diverse child participants.
Originality/value
This paper makes important contributions to the field of information and learning sciences and children’s searching as learning by proposing a new perspective where current VCAs are reconfigured as conversational voice search systems to enhance children’s learning.
1. Introduction
Young children are naturally curious about their surroundings and like to ask questions. Child development research has found that children seek factual information when encountering knowledge gaps in their understanding of the world (Chouinard et al., 2007). Besides asking their parents at home and teachers at school, children also reach out to search engines for answers. However, previous work found that children encountered challenges with typing, spelling, selecting proper keywords, and interpreting search results when searching on the Web (Druin et al., 2009).
Rapid advances in automatic speech recognition (ASR) and natural language processing (NLP) now enable children to seek information from voice-based conversational agents (VCAs) at a younger age by removing literacy requirements such as reading and spelling. Children can use voice commands and perform voice searches by pressing a button or using a “wake word” (e.g., Hey Siri, OK Google) to activate these devices. When adults are unavailable or lack the relevant information to answer children’s questions, VCAs empower children to seek information independently, fostering their critical thinking and self-directed learning. By providing immediate responses to children’s questions, VCAs support the continuous learning flow, encouraging curiosity and the exploration of new concepts in a conversational manner that can supplement traditional adult guidance.
VCAs are systems that use speech recognition and synthesis to engage in dialogue with users. Their primary components use ASR and NLP to recognize human speech, understand natural language, monitor the dialogue status, retrieve dialogue history, and choose appropriate dialogue actions. A natural language generation component then produces the natural language output, which is converted into speech using text-to-speech synthesis. We exclude from our definition of VCAs social robots, which rely on nonverbal cues (e.g., facial expressions, physical appearances, and posture) for interaction.
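To make the flow between these components concrete, the following is a minimal illustrative sketch of a single dialogue turn. It is not drawn from any commercial VCA: every function name, the keyword rule standing in for language understanding, and the response templates are hypothetical, and the speech recognition and synthesis stages are replaced by text stubs.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueState:
    """Dialogue history: one (user_text, intent) pair per completed turn."""
    history: list = field(default_factory=list)


def recognize_speech(audio: str) -> str:
    # Stand-in for ASR (assumption): the "audio" is already a text transcript.
    return audio.strip().lower()


def understand(text: str) -> str:
    # Stand-in for language understanding: a toy keyword rule, not a trained model.
    if text.startswith(("what", "why", "how", "who", "where", "when")):
        return "ask_question"
    return "unknown"


def choose_action(state: DialogueState, intent: str) -> str:
    # Dialogue policy: answer questions, otherwise ask for a rephrase.
    # The state is passed in so a richer policy could consult the history.
    return "answer" if intent == "ask_question" else "request_rephrase"


def generate_text(action: str, text: str) -> str:
    # Stand-in for natural language generation: fixed templates.
    if action == "answer":
        return f"Here is what I found about: {text}"
    return "Sorry, I did not catch that. Could you ask in another way?"


def synthesize(text: str) -> str:
    # Stand-in for text-to-speech: tag the text instead of producing audio.
    return f"<speech>{text}</speech>"


def handle_turn(state: DialogueState, audio: str) -> str:
    """Run one turn through recognition, understanding, policy, generation, synthesis."""
    text = recognize_speech(audio)
    intent = understand(text)
    action = choose_action(state, intent)
    state.history.append((text, intent))  # keep the history for later turns
    return synthesize(generate_text(action, text))
```

Calling `handle_turn(DialogueState(), "Why is the sky blue?")` walks a question through all five stages, while an utterance the toy rule cannot classify triggers the rephrase request, a simple analogue of the communication breakdowns discussed later in this paper.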
A Common Sense Media report (Rideout and Robb, 2020) showed smart speaker ownership among children under 8 rose from 9% in 2017 to 41% in 2020. In addition, a Pew Research Center survey (Auxier et al., 2020) showed that 36% of parents reported their children under 12 used or interacted with a smart speaker. These reports indicate that today’s children are growing up with advanced technology and that artificial intelligence (AI)-based tools and systems are part of their daily lives.
This paper focuses on 3- to 11-year-old children. This age spectrum ranges from children who ask explanatory questions (aged 3 to 4 years; Chouinard et al., 2007), through children who have started to develop literacy (aged 5 to 6 years; Teale and Sulzby, 1986), to children who have become more fluent readers. Around age three and a half, approximately one-third to one-half of children asked questions that required explanations (Chouinard et al., 2007). Around age 4, children begin to consider others’ feelings and thoughts, marking the emergence of the theory of mind (Wellman et al., 2003). By age 5, children have sufficiently developed the theory of mind (Wellman et al., 2003). Literacy begins to take shape in most children around age 6, with fluency increasing as they grow older (Teale and Sulzby, 1986). By age 7, many children are already acquainted with the internet and can use search tools (Druin et al., 2010; Duarte Torres et al., 2014). They also start interacting with VCAs and posing various questions (Lovato and Piper, 2015). Approaching the age of 10, children begin to grasp the more complex functions and conceptual aspects of VCAs (Oranç and Ruggeri, 2021). Therefore, this age range also covers a vital transitional phase in children’s capability to comprehend and engage with VCAs.
While children are increasingly becoming important users of VCAs, existing literature on this topic is scattered across different research fields. Lovato and Piper (2019) discussed future research opportunities of VCAs for children by examining human–computer interaction (HCI) research on how children seek information and perceive digital devices. Garg et al. (2022) analyzed trends and methodologies in HCI research on children and VCAs over the past decade. Murray (2021) investigated how trust in VCAs might be influenced by child-specific and technological factors from the perspectives of education and communication. Tong et al. (2022) reviewed existing research on children’s perceptions of and interactions with VCAs, discussing their impact on children’s learning and development through the lens of developmental psychology and education. Despite these discussions, little work has investigated VCAs and children from the Search as Learning (SAL) perspective, signaling a need for more comprehensive studies on reconfiguring current VCAs into conversational voice search systems to enhance children’s learning.
We begin by reviewing studies on how children perceive digital devices, specifically their perceptions of VCAs, as these perceptions shape their expectations of technology as an information source. Then, we review studies on children’s learning and searching, incorporating children’s meaningful learning and sensemaking theories, question-asking behaviors, and information-seeking and searching behaviors. Next, we introduce the SAL framework and review relevant studies on children’s interactions with VCAs from this perspective.
We used multiple literature search methods to find conceptual papers and empirical studies. We used online databases, including the Web of Science, IEEE Xplore, PsycINFO, ACM Digital Library, PubMed, and Google Scholar. We also browsed the references of papers we cited to further identify relevant research and ensure that we covered related works across multiple academic disciplines, including education, communication, information science, computer science, HCI, medicine, and psychology.
This paper aims to present new perspectives and insights focusing on the intersection of conversational voice search and children’s learning. We contribute to the fields of information science, learning science, and children’s search as learning. The perspectives presented in this paper are significant because they introduce conversational voice search systems as a potentially robust learning technology. VCAs can support children’s learning by offering access to abundant online information and helping children learn through interactive engagements with information.
2. Children’s perceptions of digital devices
Children’s perceptions of traditional digital devices
To better understand whether children conceptualize VCAs as potential sources of information more than entertainment tools, we review how children perceive traditional digital devices. Prior research has demonstrated that children perceive computers as holding enormous amounts of information (Murray, 2021). A literature review (Rücker and Pinkwart, 2016) identified children’s perceptions of computers in the following categories: intelligent machines, mechanical devices, omniscient databases, programmable machines, and wire networks. Rücker and Pinkwart concluded that children perceived a single internet-connected device as synonymous with the internet itself, believing it was an omniscient database with limitless information storage and retrieval. However, some studies on young children’s perceptions of digital devices reveal a different picture. McKenney and Voogt (2010) investigated how 4- to 7-year-old children in the Netherlands perceive their computer use. These authors found that most young children used computers to watch videos or play games, and searching the internet was far less common. Dodge et al. (2011) conducted semi-structured interviews with kindergarten to second-grade children from two suburban schools in the USA with differing socioeconomic status (SES) to ask what they knew about the internet and to investigate their skills on a laptop computer. These researchers found that most students did not recognize the internet as an information source; instead, they primarily used it for games and rarely used it to obtain information independently.
Prior work has broadly investigated children’s trust in technological devices for inquiries (e.g., Web search engines, computers, and mobile devices). Danovitch and Alzahabi (2013) conducted experiments to examine 3- to 5-year-old children’s trust in the information delivered by technological devices. Their findings indicate a propensity among children to trust computers in a manner analogous to their trust in people. On the other hand, recent findings from a study in China suggest that children aged 5–6 are skeptical about the information retrieved from the Web (Wang et al., 2019). Furthermore, a study by Yarosh et al. (2018) indicates that children interact with novel technological devices critically. The authors observed that children assessed the voice-based interfaces’ intelligence by posing questions to which they already knew the answer.
Children’s perception of voice-based conversational agents
Unlike traditional digital devices, VCAs offer children a mode of interaction that parallels authentic human communication to some degree. The emergence of VCAs could potentially obscure the distinction between machines and intelligent beings for younger children, which may influence how they approach such devices for information-seeking.
Children often perceive VCAs as intelligent, reliable, and capable of human-like verbal interactions, treating them similarly to animated characters (Aeschlimann et al., 2020; Girouard-Hallam et al., 2021; Hoffman et al., 2021; Xu and Warschauer, 2020). This perspective leads them to inquire about VCAs’ identities and personalities and even engage in humorous conversations as though they were speaking with another person (e.g., “Is it ok if I eat you?”, Druga et al., 2017, p. 598). During these interactions, children showed nonverbal behaviors such as frowning, nodding, smiling, and shrugging – behaviors that VCAs cannot recognize (Xu and Warschauer, 2020). Garg and Sengupta (2020a, 2020b) observed that young children aged 5–7 tend to form emotional bonds with VCAs, viewing them as “a helpful assistant” and “a valuable family member” (p. 11). While older children also exhibit these behaviors, Garg and Sengupta (2020a, 2020b) noted that they more frequently challenge the devices’ intelligence to highlight their nonhuman nature. Festerling and Siraj (2020) analyzed children’s interpretations of VCAs’ interactive capabilities, revealing perceptions varying between genuinely humanoid and nonhumanoid. When VCAs could not respond to inquiries or provided inaccurate answers, children were inclined toward the belief that an actual human was operating the device. Conversely, instantaneous, standardized responses lacking common sense led children to perceive VCAs as mere machines. This dichotomy has led several researchers to hypothesize that children tend to anthropomorphize VCAs.
Some studies have also explored children’s trust in VCAs as sources of information. Girouard-Hallam and Danovitch (2022) pointed out that the way children perceive their interactions with VCAs could affect their grasp of cybersafety: “Believing that a voice assistant could keep a secret, and that it is, at least in part, a moral entity, may contribute to young children oversharing information with internet-based devices” (p. 12). According to Murray (2021), children tend to place their trust in VCAs, attributing to them qualities of knowledge and social interaction. Conversely, Wojcik et al. (2022) found that children aged 5–6 displayed no particular preference for trusting a VCA over a human, leading the authors to conclude that such children might not inherently trust smart devices like Amazon’s Echo.
Strathmann et al. (2020) identified a significant decrease in children’s tendency to anthropomorphize initially, which contradicts Garg and Sengupta’s (2020a, 2020b) statement that personification by young children persists over time. Strathmann et al. (2020) also found that age and duration of exposure to VCAs affect children’s tendency to anthropomorphize VCAs. Researchers have also observed instances of maltreatment toward these devices, with reports of children uttering threats and profanity at them (Lovato and Piper, 2015). In such cases, children perceive the device as a servant and tend to adopt a commanding and scolding tone that would be unsuitable for interpersonal communication (Bylieva et al., 2021). For example, Garg and Sengupta (2020a, 2020b) found that when the devices failed to understand or answer their questions, older children would say, “You are a really stupid machine” or “You are just a dumb device” (p. 11).
Young children could have unprecedented opportunities to access rich online information through VCAs. According to Rücker and Pinkwart (2016), children cannot comprehend the limitations of their knowledge of the internet and their devices. Considering the instances in which children pose questions to VCAs that extend beyond internet knowledge (e.g., asking where their mom is, Lovato and Piper, 2015), these scholars speculated that children would conceptualize VCAs as omniscient entities. When children tested the intelligence of VCAs by posing questions, the frequency of seeking factual information increased with age. In contrast, the proportion of questions related to information beyond the scope of online data decreased (Lovato et al., 2019).
3. Children’s learning and searching
Children’s learning
We have identified learning theories that hold particular relevance to our paper: assimilation theory, generative learning theory, and sensemaking theory. According to the assimilation theory, posited by Ausubel et al. (1978), learning transpires when individuals relate new information to existing cognitive structures, thereby assimilating it into their knowledge base. This process of assimilation is particularly relevant in the context of children’s use of VCAs. When children interact with these devices, they often incorporate the received information into their preexisting schemas, thereby expanding and reorganizing their cognitive frameworks. This theory underpins the importance of designing voice search interactions conducive to assimilative learning and enhancing children’s ability to comprehend and retain information effectively.
Another relevant theory is the generative learning theory, as described by Grabowski (2008). This theory highlights the learner’s active role in creating meaning from information by organizing and integrating it with what they already know. This perspective is particularly pertinent when considering children’s interactions with VCAs. By their very nature, such interactions encourage children to engage actively with the information presented, ask questions, and draw connections to their existing knowledge base, embodying the essence of generative learning.
The close relation between meaningful learning and sensemaking lies in their mutual emphasis on the learner’s active engagement with new information. Sensemaking provides the initial framework for this engagement, prompting learners to question, explore, and reflect upon new information to make it understandable and relevant. Zhang and Soergel (2020) provide a framework that outlines the various stages and cognitive mechanisms that individuals use to identify gaps in their knowledge, seek information to fill these gaps, evaluate the relevance and reliability of the information, and then assimilate this new information in a way that makes sense within their personal or professional contexts. This model’s application to children’s interactions with VCAs offers a comprehensive view, including how children seek clarification, interpret responses from VCAs, and use information to construct knowledge.
Children’s question-asking in learning
Prior work in developmental psychology on children’s question-asking behavior shows that questions are essential to their cognitive development (Chouinard et al., 2007; Mills et al., 2011). Children constantly strive to make sense of their experiences by trying to impose meaning on them based on what they already know (Piaget, 1964). According to Vygotsky (1980), learning occurs when children internalize the information imparted during social interactions. In addition, more knowledgeable others help the child learn by scaffolding their understanding of concepts and processes within their zone of proximal development. Rogoff (1990) proposed that children, as cognitive apprentices, interact with experts in their communities by observing and learning from them. Although more knowledgeable others and experts are generally teachers and parents, they can also be children’s social peers and other sources of information, including the Internet.
Children learn better when they feel free to ask questions in a familiar environment with constant access to information than when information is provided at the discretion of adults. Through their seminal work, Hickling and Wellman (2001) examined children’s questions in the CHILDES database, revealing that most of the questions sought factual information, and children tended to ask more explanatory questions as they grew up. The frequency of questions posed by children surged significantly throughout early childhood, as documented by Callanan and Oakes (1992). In contrast, a notable decrease in the number of children’s questions was observed by Tizard et al. (1983) upon children’s commencement of formal schooling: children who constantly engaged with their mothers at home and asked challenging questions seemed less active in school, with far fewer questions and interactions with the teachers. The authors speculated that the classroom environment, characterized by its unique decorative style, assortment of books, and available toys, might have tempered children’s inquisitiveness as it differed from their familiar home environment. Tizard and her colleagues suggest that children’s perceptions of available information sources affect their information-seeking behavior.
Considering the significance of question-asking in children’s cognitive development and increasing access to VCAs at home, this technology can potentially extend children’s learning outside of the classroom by facilitating their question-asking capabilities when adults are not available to answer their questions.
Children’s information-seeking and searching behavior
Research into children’s interactions with digital interfaces for information retrieval commenced in the 1980s when information sources were limited to CD-ROM encyclopedias and digital libraries (e.g., Marchionini, 1989). Even in this nascent stage of digital information retrieval, elementary-aged children displayed a proclivity for utilizing natural language in search fields (Marchionini, 1989). Children have used the internet to seek information for a long time. Bilal (2000, 2001, 2002) conducted a series of studies investigating children’s cognitive, affective, and physical behaviors when they used the Yahooligans! search engine to find information on three different types of search tasks. Bilal (2002) reported that 13% of children used natural language (i.e., using everyday language in complete sentences or questions to form search queries) rather than search keywords in their self-generated queries, concluding that children should receive more training to improve their Web search literacy. Foss and Druin (2014) spent four years conducting a study on children’s search behaviors on the internet, and identified nine search behavior patterns for children 7–15 years old. The authors categorized children as searchers according to their emotional reactions (developing, nonmotivated, distracted searchers), preferences (rule-bound, domain-specific, visual searchers), and proficiency (social, power searchers). They found that as children became more experienced with technology and information seeking, they developed more sophisticated search strategies. Their search role framework offers insights into designing children’s search tools to support children’s learning based on their existing search skills. Barriage (2022) explored 5- to 7-year-old children’s information behavior within the context of their interests. 
The author found that young children seek information from print and digital objects, other people, and experiences, highlighting the importance of parents in helping children develop effective search strategies, providing financial support, and teaching them to read and understand information.
Children often need help evaluating the credibility and reliability of information sources, particularly on the internet, where misinformation is widespread. More broadly, they need help obtaining and evaluating information (Large et al., 2008) because they have limited comprehension of the availability of online resources and the search processes involved. Extensive academic research has reported the barriers children encounter in formulating search queries and identifying relevant search results (e.g., Azpiazu et al., 2018; Druin et al., 2010; Duarte Torres et al., 2014; Gossen et al., 2014). For example, Druin et al. (2010) examined how children aged 7, 9, and 11 used Google to search the Web, finding that these young users had difficulties typing, spelling, and reformulating queries. In addition, these researchers noted that children usually looked at the keyboard when typing in search keywords, and many children did not notice the search engine’s automatic word completion. In this study, parents expressed their wish to use voice input to help with children’s typing and spelling issues. Duarte Torres et al. (2014) investigated the search behavior and search queries of children aged 6–12 on Yahoo!, and pointed out that younger children spent less time searching, typed in shorter queries, and exhibited a higher propensity to use natural language than older children. Younger users tended to undo autocorrections by the search engine and select the most prominent link displayed in the search results. Kammerer and Bohnacker (2012) compared the search outcomes of natural language queries and keyword queries issued on Google by 21 children aged 8–10. Contrary to previous findings on children’s natural language use, their results highlighted the benefits of natural language queries, especially for children with limited Web search experience.
The emergence and proliferation of VCAs in households have mitigated literacy-related obstacles (e.g., typing and spelling) in children’s information searches, enabling younger children to navigate the Web more autonomously. Moreover, the barrier associated with children’s natural language queries may be less of an issue when children use voice-based conversational search systems. VCAs hold the potential to facilitate natural language queries, thereby enabling children to concentrate on formulating and reformulating queries.
4. Children’s search as learning
Search as learning
Over the past decade, there has been a growing trend among researchers to reconceptualize search systems as rich online learning environments in which people learn new knowledge (Hansen and Rieh, 2016). Using the framework of SAL, researchers aim to shift away from the traditional views of search systems as mere tools for retrieving online content. Instead, Hansen and Rieh recognize the significant role of search systems in supporting human learning. Theoretical and empirical studies exploring search as learning have focused on the intertwined nature of search activities and learning experiences, highlighting synergistic relationships between them. Furthermore, these studies emphasize the importance of learning as a primary outcome of searching (Hansen and Rieh, 2016).
From the SAL perspective, Vakkari (2016) points out that simply providing users with high-quality search results is insufficient for search systems to affect people’s knowledge structures. Therefore, he emphasizes the importance of formatting and manipulating search results and sources in a way that supports people in making sense of the search results and incorporating them into their knowledge structures. Rieh et al. (2016) conceptualize the process in which people engage in various search activities aimed at learning. These activities encompass “critically analyzing information, bringing pieces of information together to create something new, evaluating and using information” (p. 22). They proposed a “comprehensive search” framework that depicts search sessions as iterative, reflective, and integrative, facilitating critical and creative learning rather than receptive learning. According to their framework, search activities associated with critical and creative learning modes include comparing, extracting, differentiating, prioritizing, sense-making, assessing credibility, and evaluating usefulness.
Only a few studies have investigated children’s search behavior from the perspectives of SAL. Reynolds (2016) conducted a study examining the collaborative information-seeking behavior of middle school students in the context of game design. The study analyzed observational video footage from six case study teams and all teams’ final games. Her research found that the complexity of tasks is related to the collaborative information-seeking modality for solving problems. Her research findings provided valuable instructional design implications, highlighting the need for a social constructivist education approach to explore collaborative information-seeking and knowledge-building among young people. Kodama et al. (2017) conducted a study in participants’ school libraries during after-school sessions to gain a deeper understanding of how middle school students perceive the functionality of a search engine. Their study with 26 participants indicated that most students perceive Web search engines, such as Google, as people or connections. According to the authors, “positioning a person/people behind the scenes of the computer or device screen mirrors the way people are accustomed to finding information offline - asking teachers, parents, librarians, and others questions directly and getting answers in return” (p. 426). Azpiazu et al. (2017) developed and evaluated the tool YouUnderstand.Me, which was designed to enhance children’s search experience by establishing an intermediate layer between the child and the search engine, thereby fostering a more seamless and effective interaction between them. It was tailored for 5- to 15-year-old children to promote learning through searching. Furthermore, this tool aimed to foster a strong teacher-student relationship, enabling teachers to monitor students’ reading progress and customize learning plans accordingly. 
A significant finding of their study was the importance of the readability level of retrieved resources when evaluating the success of a search from learning perspectives. Landoni et al. (2021, 2022) explored whether emoji-enriched Search Engine Result Pages (SERP) can help children identify pertinent resources when performing online search tasks (e.g., on Ancient Rome) within a classroom context. Their findings provided valuable design implications for using visual cues to help children navigate, pointing toward an enhanced search interface that can signal the credibility, reading difficulty, and types of resources contained in a result page.
Overall, the SAL perspectives emphasize the significance of designing search systems that go beyond delivering search results to support users in actively engaging with the information, fostering critical thinking, and promoting creative learning processes.
Children’s interactions with voice-based conversational agents
Prior research reveals that children ask VCAs questions as an exploratory way to understand the conversational agent embodied within the interface. Children display basic features of interpersonal conversation when interacting with VCAs, such as shared attention and coordinated reciprocity (Cheng et al., 2018; Xu et al., 2021a, 2021b). Garg and Sengupta (2020a, 2020b) found that children’s interactions with voice search systems exhibit a developmental variance in understanding and expectations: younger children, aged 3–6 years, attempted to anthropomorphize the voice search system, posing questions that ascribe a human-like identity to the system (e.g., “Hey Alexa, how old are you?”) (p. 16); in contrast, older children, aged 7–10 years, probed the functionality of the system by instructing it to execute tasks typically performed by humans, such as opening a door, thereby testing the extent of the system’s capabilities. Garg and Sengupta (2020a, 2020b) reported that children partook in dialogue with VCAs using informal conversation (e.g., “Are you listening to me?”) (p. 10), expressed emotional sentiments (e.g., “I miss you Google”) (p. 10), and adhered to pragmatic communication norms (e.g., “How are you doing today Google?”) (p. 10). More interestingly, children often showed nonverbal behaviors such as shaking heads, smiling, nodding, waving hands, and shrugging (Xu and Warschauer, 2020), which VCAs cannot recognize. This engagement demonstrates their interaction patterns and emotional expressions toward VCAs, showcasing diverse methods of communication beyond task-oriented conversation.
Previous studies show that requesting information is the most common activity (Festerling and Siraj, 2020; Lovato and Piper, 2015; Oranç and Ruggeri, 2021). Children use VCAs as an information source for their questions to seek knowledge (Garg and Sengupta, 2020a, 2020b; Lovato and Piper, 2015). Children’s inquiries to VCAs encompass a diverse array of topics. These include personal questions, requests for factual information pertinent to science, technology, culture, and language – including word spelling and translation – and practical information such as recipes and directions (Lovato et al., 2019). Notably, questions related to science, technology, and culture emerge as the most frequently posed, as corroborated by multiple studies (Beneteau et al., 2020; Hoffman et al., 2021; Lovato et al., 2019). In their 2021 study, Oranç and Ruggeri observed distinct trends in queries directed at VCAs among different age groups. Younger children, who were less familiar with VCAs, predominantly posed questions about themselves and their surroundings. Conversely, older children exhibited adaptive question-asking behavior, altering the subject and nature of their inquiries in response to irrelevant or noninformative answers from VCAs. Lovato et al. (2019) reported empirical evidence showcasing children’s use of VCAs to pose questions driven by curiosity. Based on this observation, the researchers proposed the potential employment of VCAs to augment children’s self-directed learning via voice search. Alaimi et al. (2020) examined how asking questions can be supported by VCAs and designed a Web application with an embedded conversational agent for 10- to 12-year-old children called Curiosity Notebook. The authors demonstrated that such agents could cultivate question-asking skills in children, a skill that propels intellectual inquisitiveness and contributes to the construction of knowledge.
Besides asking questions, children also use VCAs to perform functional tasks and issue commands (e.g., playing games, listening to music, sending messages, and making phone calls). Nevertheless, children use these extended functions markedly less often than they seek knowledge or information (Lovato and Piper, 2015; Lovato et al., 2019; Oranç and Ruggeri, 2021). This trend may reflect children’s predominant perception of VCAs as sources of information rather than entertainment tools.
VCAs cannot respond to children’s queries and commands in a manner that considers their background knowledge, feelings, and cognitive development stage, leading to communication breakdowns. Moreover, VCAs often struggle to accurately interpret children’s speech due to their higher pitches and unpredictable speech patterns (Bhardwaj et al., 2022). Children are also markedly underrepresented in the training data used for these systems. Misrecognition can lead to frustration or, more concerningly, exposure to inappropriate content if the VCA misunderstands the query. When breakdowns occur, children employ various communication repair strategies to help VCAs better understand their queries and commands. Cheng and her colleagues (2018) defined a taxonomy of communication repair strategies after studying preschoolers’ interactions with a voice-driven game character: repetition, speaking loudly, variation, and technical investigation. Du et al. (2021) also found that children would increase their voice level, pause, and rephrase their questions when communication breakdowns occurred. Children also sometimes struggle to formulate their queries. Yuan et al. (2019) identified the categories of reformulations children use when querying VCAs: off-course, restating or repeating, substituting words, reordering, stating context, expanding pronouns in question, and adding context phrases.
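To make the reformulation categories above more concrete, the following is a minimal sketch of how consecutive transcribed queries might be labeled automatically. It is an illustration under stated assumptions, not the coding procedure used by Yuan et al. (2019): the function name `classify_reformulation`, the label strings, and the token-overlap rules are all hypothetical, and the heuristic covers only a subset of the categories (it cannot detect off-course queries or pronoun expansion and ignores punctuation).

```python
def classify_reformulation(previous: str, current: str) -> str:
    """Label how `current` differs from `previous` using crude token heuristics.

    Hypothetical labels loosely inspired by the reformulation categories in
    the text; real coding schemes rely on human judgment.
    """
    prev = previous.lower().split()
    curr = current.lower().split()
    if prev == curr:
        return "repeat"            # same words in the same order
    if sorted(prev) == sorted(curr):
        return "reorder"           # same words, different order
    if set(prev) < set(curr):
        return "add_context"       # every earlier word kept, new words added
    if len(prev) == len(curr):
        return "substitute_words"  # same length, some words swapped
    return "other"                 # anything the heuristics cannot place


print(classify_reformulation("what do felines eat", "what do cats eat"))
```

A rule-based labeler like this would at most serve as a first pass before manual coding, since children's reformulations often change wording and intent at the same time.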
5. Opportunities and challenges in conversational voice search systems for children
By reviewing conceptual and empirical studies from three areas of literature, we contend that VCAs have the potential to facilitate children’s learning, as they can provide access to rich online information and engage children in learning through interactions with information. However, there are challenges in reconfiguring current VCAs as conversational voice search systems for children’s learning. This section discusses opportunities and challenges for supporting and facilitating human learning, generating diverse data to empower VCAs, and assessing children’s learning from voice search interactions. We then present the design implications for building child-centered conversational voice search systems.
Supporting and facilitating human learning
Children start a conversation by asking questions that allow them to acquire new knowledge and revise their existing knowledge. Prior research has found that children conceptualize VCAs as potential sources of information for their questions (Garg and Sengupta, 2020a, 2020b; Lovato and Piper, 2015) rather than entertainment tools. Children also frequently indicated that VCAs were intelligent and trustworthy. In addition, VCAs support natural language queries and may remove the spelling and typing barriers, enabling children to concentrate on formulating and reformulating queries.
Empirical findings have shown that VCAs can engage children in playing and learning in educational contexts. Pantoja et al. (2019) indicated that children enjoy incorporating voice user interfaces in high-quality social play in participatory design sessions. VCAs engage children by redirecting their attention and stimulating their creativity. VCAs can also support young children’s joint reading; researchers reported that VCA-guided conversation improves children’s story comprehension compared to adult language partners (Xu et al., 2021). As for English as a Foreign Language learners, Taiwanese researchers Tai and Chen (2020) found that Google Assistant interaction provided adolescents with a less threatening language learning environment, substantially enhancing their willingness and confidence to communicate while reducing their speaking anxiety.
Based on the concept of comprehensive search proposed by Rieh et al. (2016), search systems need to support critical and creative learning through various search behaviors. In this case, conversational voice search systems that support children’s search behaviors, such as differentiating, selecting, specifying, and modifying their initial questions, would enhance children’s search as learning.
However, current VCAs need to be optimized to support children’s learning. VCAs typically provide direct answers, potentially fostering an expectation among children of immediate responses to their inquiries. This may dampen children’s curiosity, keeping them from exploring a question more deeply in ways that stretch them intellectually. Moreover, it is difficult for children to understand the complicated and nested information within a VCA response (Lovato et al., 2019). Interacting with VCAs (e.g., waking them up) is even more challenging for children with special needs due to their cognitive, sensory, and language impairments (Catania et al., 2021; Spitale et al., 2020).
Generating data to empower VCAs for children
Bailey et al. (2021) emphasized the significance of ethical data sets for conversational agents to facilitate learning through social interactions with children. They suggested using real-world social interaction data from children to train AI models in conversational agents. Having a database that captures what children say to one another is more valuable than relying on adults’ assumptions about children’s thoughts, beliefs, and behaviors. Moreover, continuous recordings of children’s conversations with little or no intervention and interruption from adults can form the basis of the entire data set, emphasizing peer-to-peer guided participation. Druga et al. (2017) observed that children preferred VCAs that imitate their communication styles. VCAs built around children’s real-world interactions may prompt children to trust the information provided by the systems.
Although generating a data set based on children’s real-world social interactions is inviting, it requires intensive time and effort and entails critical ethical concerns. Interpreting the social meaning within children’s conversations requires an analysis of the content, associated emotions, and nonverbal behaviors (Oertel et al., 2013). It is also imperative to determine the appropriate ages for collecting dialogue data to ensure that age-relevant content will be available for training the AI models in VCAs. Young children who are still in the stage of developing their speech patterns exhibit variability in word pronunciation. Studies indicate that VCAs occasionally fail to recognize children’s speech (Beneteau et al., 2019; Cheng et al., 2018). Nonetheless, we could integrate the evolving speech patterns of younger children into the VCA system to train algorithms capable of detecting queries from children who have not yet fully developed their speech patterns. This approach could also benefit VCAs designed to respond to multilingual children with accents.
Including children with diverse backgrounds and identities is essential to building culturally sensitive data sets based on children’s authentic social interactions and dialogue. Particular attention must also be devoted to privacy concerns when deriving data sets from children’s real-life conversations. Both parents and children must understand the type and amount of information they agree to provide, its retention period, and its intended purpose. Although recording children’s everyday conversations enriches AI training data, it raises ethical issues, especially when children disclose sensitive information, such as sexual harassment or abuse. This situation demands careful consideration of whether to analyze such data, anonymize it, or flag it for expert review.
In addition, safeguarding children’s privacy and safety is paramount. Laws such as the US Children’s Online Privacy Protection Act (COPPA) impose strict requirements for handling children’s data. However, enforcing these policies in the context of VCAs, which must process and sometimes store voice data to function, presents a complex challenge. McReynolds et al. (2017) interviewed parent-child pairs to understand privacy expectations and concerns regarding parental controls with connected interactive toys. The authors found that children are often unaware that others might hear their conversations with a toy, underscoring the importance of increasing parents’ awareness of privacy and security issues. Manufacturers and developers must navigate these legal landscapes carefully while designing systems that protect children’s data and ensure online safety. There is also an ongoing debate about the balance between providing personalized experiences and the risk of overreach in data collection, highlighting the need for transparent practices and robust parental controls.
Assessing children’s learning from voice search interactions
Analysis of the voice search log files in VCAs can help characterize, explain, and evaluate the nature of children’s learning. Using voice search log files, which contain information about query lengths, counts, durations, linguistic characteristics, vocabulary, and other user behaviors, we can obtain an enhanced understanding of children’s learning goals and progress, which is likely to be essential for implementing voice-based conversational search in the future. VCAs provide a rich source of search interaction data, including raw sound files, which can help assess children’s learning. Raw sound files capture children’s real-world social interactions when they talk to VCAs. Children’s affect and emotions can be inferred from their intonation and other paralinguistic cues (e.g., pitch, proximity to the device) to estimate their motivation and engagement to learn.
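As a rough illustration of the log-based descriptors mentioned above (query counts, lengths, durations, and vocabulary), the sketch below summarizes one search session. The `Query` record and its fields are a hypothetical schema introduced here for illustration only; they do not correspond to the log format of any actual VCA platform, and the example queries are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Query:
    """One logged voice query (hypothetical schema, not a real VCA log format)."""
    session_id: str
    text: str          # transcribed query
    duration_s: float  # length of the utterance in seconds


def summarize_session(queries: list[Query]) -> dict:
    """Compute simple per-session descriptors; expects at least one query."""
    tokens = [q.text.lower().split() for q in queries]
    vocabulary = {word for words in tokens for word in words}
    return {
        "query_count": len(queries),
        "mean_query_length": mean(len(words) for words in tokens),  # in words
        "mean_duration_s": mean(q.duration_s for q in queries),
        "vocabulary_size": len(vocabulary),  # distinct words in the session
    }


session = [
    Query("s1", "why is the sky blue", 2.5),
    Query("s1", "why is the sky red at sunset", 3.5),
]
summary = summarize_session(session)
print(summary)
```

Descriptors such as these capture only surface behavior. Tracked across sessions, growth in vocabulary size or query length might serve as one indirect signal of learning progress, but they would need to be interpreted alongside the child- and VCA-related factors that shape learning outcomes.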
On the other hand, a rich source of voice search interaction data also means a much larger data set to analyze, and the process can be time-intensive. In addition, many other factors could affect children’s learning, which complicates the assessment of learning outcomes. Child-related factors include the child’s age, gender, and neurotype (e.g., neurotypical or neurodivergent). For example, Spitale et al. (2020) showed that the embodiment of VCAs influenced the linguistic performance of children with language impairments while exhibiting no discernible effect on neurotypical children. VCA-related factors include the level of embodiment of the VCAs (e.g., embodied and disembodied), the gender of the voice, the level of personification and customization, the target user group of the device (specifically for children with special needs or for general users), and the physical appearance of the VCAs (e.g., material and shape of the device).
Designing child-centered conversational voice search systems
We propose the following recommendations for designing child-centered conversational voice search systems to facilitate learning.
An intelligent conversational voice search system needs to be imbued with as much background and contextual information about children as possible to facilitate successful communication. People have a theory of mind (Baron-Cohen, 1995) that helps them decipher others’ mental states (i.e., needs, knowledge, and ability to understand) in communication. Unlike adults, young children are still developing their theory of mind skills. Before age 5, children lack a complete understanding of the information the search system can or cannot provide and how much context to provide when interacting with the system (Lovato and Piper, 2019). Currently, VCAs are not sufficiently developed to account for children’s needs, knowledge, and ability to understand the system at different cognitive development stages. As a starting point, designers can offer initial instructions on using the device and briefly explain, in simple words, what the device can and cannot answer.
Enhancing speech recognition capabilities specific to children’s vocal characteristics is essential. Children’s speech frequently differs significantly from adults’, exhibiting variations in pitch, pronunciation, and speed. Therefore, VCAs should be specifically designed or adapted to recognize and understand these nuances more effectively. This process involves training voice recognition models on diverse data sets that include a wide range of children’s voices from different ages and linguistic backgrounds. Moreover, the current focus on English-centric VCAs significantly limits their potential impact, especially in regions where they could serve as pivotal educational tools. Acknowledging this, our paper advocates for developing and supporting non-English conversational voice search systems. This advocacy is rooted in the understanding that broadening the linguistic range of VCAs can democratize access to technology, enabling a broader spectrum of global learners to benefit from interactive voice-driven educational experiences.
Strategies for repairing communication breakdowns should also be integrated as troubleshooting suggestions in the systems, serving as additional support when children need help. While children may initially find the voices and expressions of VCAs appealing, their interest could diminish if the device fails to understand their questions (Druga et al., 2017). Furthermore, research suggests that children assess informants based on their history of accuracy in a domain-specific manner (Danovitch and Alzahabi, 2013). Strategies for fixing communication breakdown include restating or repeating questions, substituting words, reordering phrases, providing context, expanding pronouns in questions, speaking more loudly, and conducting technical investigations. Moreover, we suggest the development of a voice search interface that engages children in conversation by referencing their previous questions, offering alternative topics, seeking clarification, and responding to their inputs. Search results may sometimes overwhelm children, making it difficult for them to review and filter the information.
We also recommend introducing a slow search function that allows children to reflect and learn when faced with challenging questions (Smith and Rieh, 2019). This function, along with guided interactions, can serve as a scaffold, providing information gradually and encouraging children to explore independently. Xu et al. (2021) assessed voice-based apps that effectively supported children’s learning on Amazon Alexa and Google Assistant platforms. They emphasized the need for designers to develop and refine dialogue trees that incorporate feedback specific to the range of children’s responses and use wit to promote in-depth, multiturn dialogues on a single topic. A recent study on Google voice search for children’s learning highlighted its role in reinforcing and repeating concepts over time to aid children’s learning (Yadav and Chakraborty, 2022). To further support this, we propose several strategies for incorporating reinforcement and scaffolding into children’s search sessions: automatic completion and suggestion of children’s queries with alternative topics, introduction of new vocabulary for children’s initial queries, repetition of core concepts related to children’s queries, presentation of carefully selected pertinent subtopics on the search engine results page (SERP) (Câmara et al., 2021), and immediate prompts for feedback on the child’s satisfaction with the agent’s response (e.g., Do you like the answer? Did you get what you wanted?).
Integrating emotional intelligence into child-centered VCAs is crucial to developing empathetic and supportive conversational interfaces. Shi et al. (2018) found that VCAs with iconic facial expressions or animated texts engage users more than voice waveforms. Jeong et al. (2019) showed that using filler words to answer open-ended or silly questions and making the agent pretend to “think” would entertain people. System developers can design devices that detect children’s emotions by scanning their facial expressions or remotely monitoring their heart and breathing rates (Garg and Sengupta, 2020a, 2020b). It is beneficial for VCAs to display emotions through animated facial expressions, voice tones, and filler words to make conversations feel more natural and emotionally resonant. This approach helps VCAs respond to a child’s emotional state, offering customized encouragement or humor to boost learning motivation.
Privacy and security are vital in designing child-friendly VCAs. With rising data privacy concerns, especially for children, VCAs must incorporate stringent data protection protocols. These include minimal data collection, secure storage mechanisms, and transparent data use that complies with legal standards such as COPPA. Furthermore, offering robust parental controls enables parents to oversee and manage their children’s interactions with these devices, ensuring a safe and controlled learning environment. Designers can enhance this by allowing parents to set learning objectives, track progress, and limit device usage time (Garg and Sengupta, 2020a, 2020b).
6. Conclusion
This paper contributes to the information and learning sciences fields by proposing a new perspective in which we reconfigure current VCAs as conversational voice search systems to facilitate children’s learning. We argue that VCAs have great potential to facilitate children’s learning by providing access to rich online information and engaging children in learning through interactions with information. We also present the opportunities and challenges around effective learning interactions, generating diverse data to empower VCAs, and assessing children’s learning from voice search interactions.
Our literature review uncovers a substantial body of research on VCAs across various disciplines, including education, communication, information science, computer science, human-computer interaction (HCI), medicine, and psychology. We observed that researchers refer to VCAs with various terms in their studies (e.g., digital assistant, voice assistant, embodied conversational agent, intelligent/virtual personal assistant). This inconsistency impedes interdisciplinary collaboration and makes it difficult to build upon new knowledge and empirical findings. Advancing research on this topic requires a more explicit definition of the terms used to refer to VCAs and effective collaboration across multiple research communities.
We found that researchers used various methods to study VCAs for children, including experiments, questionnaires, child-parent dyad interviews, observations, focus groups, audio diaries, in-home audio recordings, voice recordings, device log analyses, and video content analyses. These studies have been conducted in labs, schools, homes, pediatric hospitals, and therapeutic centers. All the studies were conducted with neurotypical children except for two, which involved children with neurodevelopmental disorders (Catania et al., 2021) and children with language impairments (Spitale et al., 2020).
We have also identified some limitations of the research methods employed in these studies. First, several studies were based on participants’ self-reported data (e.g., Catania et al., 2021; Lovato and Piper, 2015; Van Brummelen et al., 2021), which are potentially subject to recall and reporting biases. Researchers could consider incorporating multiple data collection methods to bolster the validity of self-reported data. In addition, several studies had relatively small and homogeneous samples (e.g., Beneteau et al., 2019, 2020). Moreover, only a few studies justified the specific child age range (Girouard-Hallam and Danovitch, 2022; Xu and Warschauer, 2020), while others did not address age differences. As Agosto (2019) pointed out, strengthening research on youth information behaviors requires standardizing age groups among child participants and diversifying data collection methods. Future research must state the target age range and the rationale for the sample.
We offer the following recommendations for future research. First, further longitudinal studies are necessary to examine the effect of VCAs on children’s learning, with follow-up assessments of long-term retention. Clark (1983) noted that participants often exhibit heightened attention toward new technology in learning contexts, potentially leading to temporary achievement gains. When the novelty effect wanes as participants acclimate to the technology, differences in learning outcomes may no longer be statistically significant. Most existing studies focus on children’s short-term interactions with VCAs, typically within a single lab session or through one-time interviews. This approach may overlook long-lasting and nuanced effects on children’s learning and cognitive development over time.
Additional observation studies are crucial. Previous research has shown that some child participants lacked prior experience with VCAs (e.g., Lovato et al., 2019). However, some studies have not clearly reported this background information (e.g., Aeschlimann et al., 2020; Cheng et al., 2018). Children’s interactions with VCAs likely represent exploratory behavior toward new digital devices rather than consistent patterns tied to their current cognitive development stage. If the observations were performed in settings where children were unfamiliar with the devices, children’s understanding and perception of VCAs might be underestimated. In addition, we recognize the absence of log file access for noncorporate researchers as a critical obstacle in voice UI research and advocate for increased data sharing to enhance the data validity of observation studies in naturalistic settings.
Finally, research should encompass more culturally diverse participants. Our review indicates a predominant focus on Western perspectives and norms, neglecting the experiences of children from other cultural backgrounds. An exploratory study (Yi et al., 2020) specifically investigated African-American and Latinx children and found themes similar to those of previous studies in usability, privacy concerns, and digital literacy. Future studies should enhance and expand the understanding of how VCAs can be culturally responsive to children’s questions.
This literature review establishes a foundational understanding for researchers exploring the convergence of conversational voice search systems and children’s information behavior. By presenting key concepts and empirical findings, we believe this work will offer valuable insights for researchers studying the intersection of searching and learning, and design guidance for practitioners seeking to develop more effective conversational voice search systems that support children’s search as learning and their development.
© Emerald Publishing Limited.
