Purpose
There is a need for precollege learning designs that empower youth to be epistemic agents in contexts that intersect burgeoning areas of computing, big data and social media. The purpose of this study is to explore how “sandbox” or open-inquiry data science with social media supports learning.
Design/methodology/approach
This paper offers vignettes from an illustrative youth study case that highlights the pedagogical prospects and obstacles tied to designing for open-ended inquiry with computational data science to access or “scrape” Twitter/X. The youth case showcases how social media can be taken up productively and in ways that facilitate epistemological agency, an approach where individuals actively shape understanding and knowledge-creation processes, highlighting the potentially transformative impact this approach might have in empowering learners to engage productively.
Findings
The authors identify three key affordances for learning that emerged from the illustrative case: (1) flexible opportunities for content-specific domain mastery, (2) situated inquiry that embodies next-generation science practices and (3) embedded computational skill development. The authors discuss these findings in relation to contemporary education needs to broaden participation in data science and computing.
Originality/value
To address challenges in current data science education associated with supporting sustained and productive engagement in computing-based data science, the authors leverage a “sandbox” approach – an original pedagogical framework to support open inquiry with precollege groups. The authors demonstrate how “big data” drawn from social media with high school-aged youth supports learning designs and outcomes by emphasizing learner interests and authentic practice.
Introduction
In an era shaped by the convergence of burgeoning areas of data science (e.g. big data), computing (e.g. artificial intelligence [AI]) and social media (e.g. TikTok), the demand for innovative learning designs that foster productive and meaningful engagement with and around these tools is more pressing than ever. This is increasingly essential in precollege education because the ways that learners perceive and understand these fields will shape how they, as future innovators, choose to participate in fields that produce or leverage these tools and how, as critical citizens, they understand the influence of these technologies on society. A nascent example is ChatGPT (Chat Generative Pre-trained Transformer), which launched in pilot form in November 2022 (Dwivedi et al., 2023a). This platform, often characterized as a generative AI tool, promises to change the technical and social landscape by enhancing how we leverage and rely on technology – by supporting users in creating code, writing prose and making art (Dwivedi et al., 2023b; Liu et al., 2023). While current attention is placed on the “generative” characteristics these AI-based tools afford, their role as data collection tools is often overlooked: they gather massive troves of information that can inform how users think about, make and enact complex and critical decisions (Tukur et al., 2023). This oversight is complicated by the fact that media outlets, and particularly high-engagement social media platforms such as Facebook, TikTok and Twitter/X, often shape public attitudes around the value and reliability of such media. This contemporary example underscores the need for learning designs and accompanying pedagogical strategies that can help learners engage productively with these areas that represent the next generation of teaching and learning.
The challenge with achieving this goal is that existing learning paradigms that connect these burgeoning areas remain largely underdeveloped. Despite efforts to spur transformative shifts in the use of technology in education (Collins and Halverson, 2018), existing insights tend to be tangential in both context and practice, providing only close approximations of how to support agency in learning and, eventually, public literacy in these areas. Examples include data science activities and literature that do not include “big data” sets (Al-Sai et al., 2020; Chang et al., 2021), or activities where social media platforms are leveraged to encourage classroom discourse (Krutka and Carpenter, 2016) rather than as a tool to promote, for instance, learners’ computational literacies around how social media work and affect society. Similarly, initiatives like Bootstrap: Data Science (Krishnamurthi et al., 2020) aim to integrate data science into secondary education by enhancing students’ understanding of math and computing. While valuable, this approach has been criticized for offering limited connections between educational content and its relevance to learners (Lee et al., 2021). Even less work has been done in areas involving AI, despite international calls to reconcile the field, workforce development and impact (Chen et al., 2020). Concomitant with these discussions are questions about how these technologies may exacerbate matters of equity and inclusion, including concerns about whether there is sufficient demographic and intellectual diversity and representation to responsibly teach future generations of citizens about these technologies and how they manifest in the world. These concerns are reasonable given persistent issues of inequity in related areas such as computing (Hubbard Cheuoua, 2021; Williamson, 1965).
We address these issues in the design and deployment of a “big data” research implementation that leverages the popular social media app, Twitter/X. Our implementation is called Coding Like a Data Miner, a research project funded by the National Science Foundation (2024) with the goal of empowering students in underrepresented groups (Burke‐Garcia et al., 2020; Millar et al., 2023). By innovating on computer science (CS) curricula, fostering cultural relevance and addressing educational disparities through hands-on engagement with real-world data and the development of CS skills, the project empowers learners to be self-directed agents in the learning process (Navarrete, 2022). The work also addresses a prevailing issue in CS education among youth ages 13–17: the fragmentation of approaches, wherein big data is often overlooked and data science learning experiences are confined to what has been construed as detached simulations (Rubin, 2022) or replicative simulations of inquiries defined by others (e.g. researchers, curriculum developers or educators). To overcome this challenge, we proposed the implementation of a “sandbox” or open-inquiry concept in data science education (Walker et al., 2023a, 2023b, 2024). The sandbox approach allows learners to explore and construct adaptive data science inquiry projects of personal, cultural and/or sociopolitical relevance using social media datasets that they uniquely conceptualize, access/download, process, analyze and visualize (Walker et al., 2023b). This is accomplished in a learning progression that is designed to incrementally give learners opportunities to make epistemic and heuristic decisions aligned with inquiry topics they define for themselves (Walker et al., 2023a, 2024).
This approach aligns with the principles of culturally relevant pedagogy, an educational framework that integrates the values of diverse cultural perspectives into the curriculum and encourages students to develop cultural competencies by maintaining connections to their own culture while understanding, valuing and respecting the cultures of others (Ladson-Billings, 1995; Morales-Chicas et al., 2019). It also aims to bridge the gap between theory and practice in data science education by engaging learners in computationally rich and personally meaningful lines of inquiry.
This work offers vignettes from an illustrative student case study that highlights the pedagogical prospects and obstacles tied to the integration of open-ended inquiry using sandbox data science in the Coding Like a Data Miner curriculum. The implementation guided adolescents aged 13–17 in crafting personalized computational data science projects using Twitter/X data during a two-week online workshop in July 2023. The case study presented herein illustrates how social media can be taken up productively and as a trove of socially and culturally relevant topics. Building upon the insights gained from the Coding Like a Data Miner curriculum, our exploration leads us to the research question:
In the next section, we review relevant literature on social media, contemporary education and computing.
Literature review
Social media and education
Social media platforms have permeated many aspects of modern life, transforming not only how we communicate but also how we form perspectives and make decisions (Java et al., 2007; Letierce et al., 2010). This is readily observed in platforms such as Facebook, YouTube or TikTok, which enable us to maintain regular social interactions across global scales and whose algorithms take account of our behaviors and interests to inform the content we encounter – often making suggestions that influence how we make decisions (Burke‐Garcia et al., 2020; Kang and Lou, 2022) or learn (Walker et al., 2019). Recognizing the growing importance these platforms have in society and the engagement they have drawn from youth, there have been steady developments to assess the potential educational affordances these platforms might offer in myriad academic subject areas including science, technology, engineering and mathematics (STEM) teaching and learning (Grosseck and Holotescu, 2008; Hong et al., 2011). There is evidence from the literature that the Twitter/X platform has shown promising results in precollege educational settings (Carpenter and Krutka, 2014). High school students as a group represent a substantial share (42%) of all Twitter/X users in the USA (Tankovska, 2021). Coupled with the accessibility of Twitter/X data using the website’s application programming interface (API), this marks Twitter/X as a potentially rich opportunity for learning and social media use in classrooms.
Research on the application and outcomes of social media use in classrooms has historically been mixed, despite affirmations of benefits to students’ motivation and engagement (Van Den Beemt et al., 2020), with researchers often balancing the potential for collaborative, personalized and identity-centric learning experiences against the perception that social media is distracting and off-topic (Van Den Beemt et al., 2020; Greenhow and Lewin, 2016). To address this, Greenhow and colleagues (2019) suggested a shift in research emphasis toward deeper examinations of learning designs and pedagogical practices – the how and why – of social media use. Examples of research with this emphasis have emerged from work informed by constructionist and situationist perspectives, respectively, highlighting the potential for social media to position users as both consumers and producers of educational content (Walker et al., 2019) and as socioculturally situated spaces for distributed knowledge building and collective negotiation in legitimate and authentic ways. More recently, social media sites such as Facebook and Twitter/X have also been examined for their potential to support key science discourse practices for literacy and socioscientific reasoning (Java et al., 2007). These efforts represent valuable strides toward the broader application of tools in the context of STEM education (Rinaldo et al., 2011).
Despite these innovations, fewer efforts have been made to understand and leverage the affordances of social media for learning in two key areas. First, further research is needed on the use of tools like these to support learning in CS contexts; the lack of examples is surprising given that these tools are largely representations of computational algorithms suited for human interaction (Kumar et al., 2014). To use these tools for CS learning, educators and students need access to algorithms and datasets that are often intellectual property, and therefore proprietary or unwieldy for students’ direct use. Second, while examples exist of learner engagement with the client-facing side of social media platforms for learning goals, such as a platform for asynchronous learning (Danjou, 2020) or social engagement (Bonilla Quijada et al., 2022), fewer position students as agents in the exploration of the server-facing side of the platforms, where data sets may be examined as a source of rich, diverse, authentic and personally and culturally relevant data for embedded exploration and skill development. Broadly, further research is needed to explore the potential of these areas as authentic real-world contexts for learning computer and data science.
To address these challenges and capitalize on the affordances of social media for education, our curriculum leverages Twitter/X, a social media platform that makes user data available through a publicly accessible API and that has already been taken up for use in precollege educational contexts (Walker et al., 2023a, 2023c). As a result, it is possible to collect large amounts of platform data (i.e. by Web scraping) for computational and analytic purposes. Twitter/X also offers a robust archive of information across a variety of topics, so data sources along diverse lines of interest are readily available to support personalized learning experiences. Learners can draw data from Twitter/X based on discussion topics that are of personal interest or social relevance.
Learning priorities in STEM + C teaching and learning
Over the past decade, there have been significant efforts to modernize kindergarten through 12th grade (K-12) STEM education to account for advances in the fields as well as to reflect what the research has unveiled about how students learn best. This was motivated, in part, by a push to revitalize education pipelines seeking to support workforce development and public awareness in STEM fields. The result was a national call to action to produce a framework, known as the Next Generation Science Standards (NGSS), which identifies key disciplinary knowledge, field practices and concepts that cut across STEM academic domains (NGSS, 2013). The integration of data science in educational settings, particularly in alignment with NGSS, represents a significant shift in science education toward more integrated (cross-disciplinary) engagement. The NGSS aim to enhance K-12 science teaching, making it more relevant and engaging while preparing students for higher education and careers in science. These standards emphasize not just content knowledge but also the development of skills and practices essential in the scientific field. They encourage educators to design learning experiences that can help develop students’ interests and prepare them for future scientific endeavors.
In many ways, this framework also marked a shift away from education standards that emphasize discrete knowledge and toward perspectives that recognize that learning is more meaningful when situated in authentic acts and when knowledge construction can occur in contexts that are integrated with and across what learners might encounter in everyday life – that is, learning that is an amalgamation of intersecting complex ideas and methods for operationalizing or enacting them (National Research Council, 2013). This progress has resulted in several opportunities, such as the chance to modernize public education processes including how we prepare teachers, allocate resources and design learning experiences. The introduction of interdisciplinary practices has raised justifiable concerns about initiating change in education systems. These systems are often persistently underfunded and vary by state and context. Importantly, there is a lack of clarity on how to productively implement these practices in traditional class settings, which typically exist as disciplinary silos (Pruitt, 2014). Ultimately, these tensions raise questions about what and how to teach STEM topics for the 21st century in institutions that were designed centuries before.
These concerns are further complicated by recent inroads in computing education research, practice and policy – where educators and policymakers alike are increasingly and rightfully interested in mandating its inclusion in K-12 education. These efforts are reflected in a wide range of precollege curricular interventions that seek to teach computational thinking – often characterized as the knowledge, practice and literacy needed to productively engage with CS (Kafai et al., 2020). Prominent examples include the Advanced Placement CS programs administered by the College Board and the Exploring Computer Science (ECS) curriculum, which both seek to introduce learners to advanced areas of programming through building websites, games, robots and wearables (Arpaci-Dusseau et al., 2013; Goode et al., 2012). Concomitant with these developments has been a push to develop interventions that harness the so-called “data revolution” (National Science Foundation, 2024) – hallmarked by complex and large data sets often generated using CS tools. This has resulted in strides to incorporate perspectives in data science into CS curricula (Grillenberger and Romeike, 2014; Gould et al., 2015; Krishnamurthi and Fisler, 2020) and to include an emphasis on data literacy – the ability to collect, process and assess data sources (Lee and Wilkerson, 2018). These efforts have contributed to our understanding of CS education priorities – including insights about the educational practices, resources and pedagogies that best support learning outcomes. This progress, however, is persistently undermined by parity issues related to access, participation and learning among underrepresented groups (e.g. women, people of color, the socioeconomically disadvantaged and people with disabilities) who benefit disproportionately less in these areas overall (Margolis et al., 2017; Santo et al., 2019).
These issues are especially prominent in growing and increasingly important CS education interventions that focus on data science, an area where learning is often decontextualized (e.g. second-hand versus first-hand sources) and typically replicative (Hug and McNeill, 2008). The landscape of CS education is rapidly changing, requiring increased efforts to train K-12 teachers to meet growing demands. Yadav and colleagues (2016) shed light on the challenges faced by CS teachers in the USA, emphasizing issues like isolation, limited professional development resources and a lack of adequate CS background. Understanding these challenges is crucial for offering effective support to educators entering the dynamic field of CS. Ultimately, further work is needed that is situated at the intersection of these calls for innovation: research and pedagogy that leverages opportunities afforded by social media as a data source for embedded and personalized inquiry that can overcome computational learning issues and better support culturally relevant pedagogy, expand user practice within and across disciplinary silos and offer structured examples for educators to embed these affordances in their practice.
In response to traditional teacher-centric approaches in programming education, Handur and colleagues (2016) advocate for a transformative method that integrates classroom and laboratory settings with hands-on programming. This student-centric approach fosters active learning, with positive outcomes observed in increased student achievement. The study highlights the benefits and challenges associated with this innovative approach, contributing to the ongoing evolution of programming education. Additionally, the literature brings to light the increasing importance of big data and AI technology in education, as evidenced by Kim et al.’s (2023) analysis of the 2022 revised curriculum in the Republic of Korea. The study emphasizes the use of programming tools, various AI algorithms and programming practices with data sets, while also recognizing the need for continuous research to enhance learning materials and address real-life relevance in datasets. Mejias and colleagues (2018) propose culturally relevant CS pedagogy as a solution to engage and retain underrepresented students who bring intellectual and experiential diversity quintessential to field innovation. Their success demonstrates the potential of culturally relevant approaches in addressing disparities in CS education. In summary, the literature brings to light the multifaceted challenges and opportunities in CS education, encompassing teacher experiences, innovative pedagogical practices and culturally relevant approaches.
These developments in STEM and computing education have forced educators and leaders to think carefully about what constitutes meaningful learning in these areas and where these subjects might be deployed: in an integrated fashion consistent with real practice, or in separate courses altogether, which is consistent with existing public-school structures. These questions are tightly connected to the same persistent resources and teacher preparation challenges facing STEM, and the gaps and affordances surfaced in research on social media in education. We examine outcomes that occur when engaging these challenges head-on, using an integrated stance to leverage social media and data science as an interdisciplinary context within which to enact learning that not only cuts across STEM subject areas and beyond, but that does so in fashions that are interdisciplinary and situated in relation to learners and their everyday life.
Methods
Context and participants
The Coding Like a Data Miner project, funded by the National Science Foundation (2024), focused on developing and piloting a sandbox data science-based approach to learning computing on the Twitter/X social media platform. This curriculum is designed to enable learners to access real-world data of their own from social media, facilitating data analysis that resonates with their personal interests, cultural backgrounds and/or sociopolitical concerns. Situating data science in topics that have cultural relevance is highlighted as a way to equitably teach epistemic practice (Walker et al., 2023a, 2023b, 2024). The research was conducted through a two-week online workshop with youth that ran four hours each day from Monday to Friday, using Zoom and Google Colaboratory. The workshop used a modified version of the 17-module Coding Like a Data Miner pilot curriculum (Walker et al., 2023a, 2023c). Participants engaged with four elements of the data science process (data collection, data preprocessing, data analysis and data visualization) across four phases that supported increasing learner agency in their data science inquiries. The first week of the workshop included introductory materials on data science processes (phase 1), followed by a guided inquiry phase (phase 2), in which facilitators demonstrated each step of the data science process (e.g. a walkthrough of how to use Python code to scrape tweets by hashtag). The following week offered learners opportunities for scaffolded inquiry (phase 3), during which facilitators and curricular materials supported students’ use of Python code to collect, preprocess, analyze and visualize their own social media datasets. Finally, phase 4 involved free inquiry opportunities for learners to explore and tinker with code and social media datasets of their choosing toward the completion of summative project presentations.
On the final workshop day, each participant presented a 10–20-min overview of their data mining project.
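As a minimal sketch of the four elements of the data science process described above (collection, preprocessing, analysis and visualization), the following Python example runs each step on a small in-memory sample. The sample tweets, the function names and the word-count analysis are all hypothetical stand-ins; the workshop's actual notebooks and Twitter/X API calls are not reproduced in this paper, so no API access is attempted here.

```python
import re
from collections import Counter

# Hypothetical sample standing in for tweets a learner might collect.
# These are illustrative strings, not data from the study.
SAMPLE_TWEETS = [
    "Animal testing should end now! #AnimalTesting https://example.org/a",
    "New study on lab safety #science",
    "Why is #animaltesting still legal? @someone",
    "Testing cosmetics on rabbits is cruel #AnimalTesting",
]


def collect(tweets, hashtag):
    """Data collection: keep tweets containing the hashtag (case-insensitive)."""
    tag = hashtag.lower()
    return [t for t in tweets if tag in t.lower()]


def preprocess(tweet):
    """Data preprocessing: drop URLs, mentions and '#', then tokenize in lowercase."""
    text = re.sub(r"https?://\S+", "", tweet)
    text = re.sub(r"@\w+", "", text)
    text = text.replace("#", "")
    return re.findall(r"[a-z']+", text.lower())


def analyze(token_lists):
    """Data analysis: count how often each word appears across all tweets."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def visualize(counts, top_n=3):
    """Data visualization: a plain-text bar chart of the most frequent words."""
    lines = []
    for word, n in counts.most_common(top_n):
        lines.append(f"{word:<15}{'#' * n} ({n})")
    return "\n".join(lines)


collected = collect(SAMPLE_TWEETS, "#animaltesting")
cleaned = [preprocess(t) for t in collected]
word_counts = analyze(cleaned)
print(visualize(word_counts))
```

Keeping each element in its own function mirrors the phased progression of the curriculum: a learner could swap in a different hashtag, cleaning rule or analysis without rewriting the other steps.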
Eight youth between the ages of 13 and 17 engaged in the workshop, with self-identified ethnic backgrounds that included three of Asian descent, three African American/Black, one Latinx and one Caucasian/White. Four students identified as male and four as female. Participants were geographically spread across the South-Central and Northeast regions of the USA. The data analyzed encompassed seven participants, as one was unable to participate in the final presentation due to technical access difficulties. This study provides an illustrative case example of one participant, Mira (pseudonym), a 14-year-old Asian female participant who displayed characteristics that were central to our research objectives. This case was selected as illustrative of the ways that social media can be taken up productively and in ways that facilitate epistemological agency.
Data sources and analysis
Data sources included video recordings of participant final presentations and slides (Google Slides or PowerPoint) and pre/post interviews conducted by the research team, focusing on data mining inquiry phases (e.g. research question formation, data collection, data processing, data analysis and data communication). Interviews, presentations and slides were transcribed using Otter.ai and edited for clarity. The duration of the video recordings varied from 9:13 to 19:18 minutes. Pseudonyms were assigned to each participant in both workshop artifacts and transcriptions, and the transcriptions were systematically organized in a table for qualitative coding. In analyzing the transcriptions and presentation slides, we primarily used deductive approaches (Ravitch and Carl, 2019) whereby we operationalized preexisting constructs drawn from mathematics (National Governors Association Center for Best Practices and Council of Chief State School Officers, 2010), the NGSS crosscutting practices (NRC, 2011) and computational thinking (Wing, 2017). Such constructs included (1) constructing viable arguments and critiquing the reasoning of others (mathematics), (2) understanding scale, proportion and quantity (NGSS) and (3) formulating a problem that a human/computer can effectively solve (computational thinking). Transcripts were excerpted based on the aforementioned computational data mining inquiry phases. Researchers began with three vignettes of a single participant drawn from the dataset that exemplify how social media can be used for domain mastery (in math), computation and integrated science (i.e. as indicated in NGSS standards). Author one identified specific features within the transcript text that warranted the vignettes, particularly aspects of participant explanations perceived as complex in the context of computational data mining.
Author one independently applied these vignettes and subsequently engaged in discussions with authors two, three and four to ensure alignment and address any disparities in observations. Instances where interpretive consensus proved difficult prompted reconciliatory deliberation among all authors until unanimous consensus was achieved. The outcomes of these analyses are detailed in the subsequent section. These data were the basis of the case study account presented in this paper.
Findings
In the following section, we introduce Mira (pseudonym), a 14-year-old youth who significantly engaged with the Coding Like a Data Miner online workshop. Mira’s case is notable due to her initial unfamiliarity with data science, which transformed over the course of her involvement in the workshop. Her survey responses and observational notes highlight her active participation and evolving understanding of both quantitative and qualitative data analysis, especially in the context of the AP Human Geography class that Mira indicated was a connecting point to her inquiry pursuit in the workshop. Mira’s inquiries during the workshop sessions, such as questioning the qualitative nature of tweets and user data (Mira: “Are tweets and users qualitative?”), exhibit her developing analytical skills and curiosity. Her participation was further enriched through her final project on animal testing (a personal interest), which provided evidence of her coding skills in Python, a widely used programming language known for its simplicity and versatility, as well as insights about her knowledge of the subject matter. Through Mira’s progression from a beginner to a more informed participant, her case exemplifies the potential affordances of leveraging social media as a context for more productive learning using math, computing and inquiry.
Disciplinary knowledge: a case of math learning
In this segment, we explore Mira’s involvement in a data science project on Twitter/X data analysis and the various ways she evidenced mastery of math topics. Mira’s project analyzed the sentiment of tweets on the topic of animal testing, with the goal of ascertaining whether public opinion leaned toward support or opposition on this potentially contentious issue. At several points in her approach to the project, Mira demonstrated a sophisticated understanding of mathematical concepts through their embedded application in her chosen inquiry. In presenting her findings, Mira strategically used a histogram and a pie chart to represent distinct features of her sentiment analysis (see Figure 1). Mira chose a histogram (left) to represent the frequency of tweets in her dataset that were for and against animal testing, noting in her presentation how the x-axis separates the data into “Against” and “For” categories so that differences may be illustrated: “So on the x axis, you can see against and for and the count, and as you can tell, a lot of people are against animal testing”. This required that Mira import the Seaborn visualization library and call ax = sns.countplot(). Ultimately, this choice of visualization illustrates her understanding of data frequency and of how the histogram helped her showcase the skew (skew was a generalized topic presented in the workshop). Mira’s pie chart (right), on the other hand, was used on categorical data to showcase the proportions of tweets that reference specific animal types. Mira’s discussion of her pie chart highlighted her knowledge of how percentages are calculated from the raw counts for each category generated by the Python code (e.g. 29 tweets about dogs equaling 32% of tweets in the sample). Ultimately, the pie chart served as a tool for Mira to succinctly compare the prevalence of individual species relative to each other as part of the “whole” conversation in the dataset around animal testing; she rendered it with the matplotlib library call plt.show().
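To make the count-to-percentage reasoning concrete, the following is a minimal sketch of how category counts, their percentage shares and the two chart types might be produced in Python. Only sns.countplot() and plt.show() are named in Mira's project; the helper names, the side-by-side layout and all label data below are hypothetical and chosen for illustration, not taken from her dataset. Note that seaborn's countplot draws one bar per category.

```python
from collections import Counter


def category_percentages(labels):
    """Return {category: (raw count, percentage of all labels)}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


def plot_sentiment_and_animals(sentiments, animals):
    """Draw a count plot (left) and a pie chart (right).

    Requires seaborn and matplotlib; they are imported here so the
    counting code above can run without them.
    """
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
    # One bar per sentiment category, with bar height equal to tweet count.
    sns.countplot(x=sentiments, ax=ax_left)
    ax_left.set_xlabel("Sentiment")
    # Pie slices sized by each animal's share of all mentions.
    animal_counts = Counter(animals)
    ax_right.pie(
        list(animal_counts.values()),
        labels=list(animal_counts.keys()),
        autopct="%1.0f%%",
    )
    plt.show()


# Hypothetical labels (not Mira's data): 10 sentiment labels, 25 animal mentions.
sentiments = ["Against"] * 7 + ["For"] * 3
animals = ["dogs"] * 8 + ["rabbits"] * 6 + ["mice"] * 6 + ["cats"] * 5

shares = category_percentages(animals)
for animal, (count, percent) in shares.items():
    print(f"{animal}: {count} tweets, {percent:.0f}% of sample")
```

Calling plot_sentiment_and_animals(sentiments, animals) would display both charts. In this made-up sample, 8 of 25 mentions are about dogs, which works out to 32%; the same division of a category count by the sample total underlies the percentages shown on a pie chart.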
The embedded mathematics knowledge demonstrated by Mira around math concepts in her inquiry project ultimately manifested with greater specificity between her pre and post interview responses around similar math concepts. Consider the increased detail with which she was able to identify and describe solutions to scaling issues when presented with the same example of an unbalanced histogram in her pre and post assessment interview:
Interviewer question: So now your data is analyzed as much as you need. And you want to make a visualization that represents your findings. So, you make a bar graph to represent your findings, but it ends up looking like this picture that I'm about to show (unbalanced box and whisker plot) [Supplementary_material_appendix_1]
Mira pre interview: Oh, you change like the format. Yeah, you don’t use a bar graph [histogram]. You use I’m not sure what type of graph but so the information like shows better and visualizes better right (Pre interview, 07/30/2023).
Mira post interview: I would change the scaling of the bar graph [histogram]. So instead of starting at six, I would probably start at zero, because then you can actually see the values of November, December, January […] So maybe, since none of the graphs are going above 6.3, I could make the top of the graph. Maybe like 6.2 or 6.15 (Post interview, 08/23/2023).
While Mira was able to identify in her pre interview response above that the information presented in the graph was not optimally visualized, the creation of a histogram in her inquiry project may have helped to expand her understanding of the functions of the x- and y-axes, which are commonly used to display independent and dependent variables in math and science. Mira elaborated in her post interview response above on how to appropriately scale different data sources based on their minimum and maximum values so that a histogram can present the distribution of data most clearly, in line with her discussion of how to represent counts along the axis of the histogram in her project. This exploration into Mira’s discussions showcases how data science coupled with social media can be a potent context for mastering and contextualizing mathematics. Mira applied her mathematical knowledge to analyze real-world phenomena led by her own inquiry and then leveraged her emergent understanding toward the evaluation of a new data source in the interviews. Her skills were not confined to theoretical knowledge; she demonstrated a practical application of mathematical concepts in data analysis and made informed decisions on the appropriate mathematical tools to use. Her case exemplifies the successful integration of domain knowledge in mathematics with applied data science, highlighting the efficacy of using real-world contexts drawn from social media for educational purposes.
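The idea of scaling an axis from a dataset’s minimum and maximum values can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than code from the workshop: the function name padded_axis_limits, the 10% padding and the sample values are all hypothetical.

```python
def padded_axis_limits(values, pad_fraction=0.1):
    """Return (lower, upper) axis limits that bracket the data with a small margin."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical: fall back to a margin based on magnitude.
        span = abs(highest) or 1.0
    pad = span * pad_fraction
    return lowest - pad, highest + pad


# Hypothetical monthly values that sit in a narrow band, as in the interview prompt.
monthly_values = [6.05, 6.12, 6.3]
y_min, y_max = padded_axis_limits(monthly_values)
print(f"Suggested y-axis range: {y_min:.3f} to {y_max:.3f}")
```

Limits computed this way could then be passed to a plotting call such as matplotlib’s ax.set_ylim(y_min, y_max). Whether a bar chart should instead keep a zero baseline is a separate design decision that this sketch does not settle.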
Situated inquiry: data science and next generation science practices
Mira’s case also exemplifies how situated data science inquiry can support key next generation science practices. This section examines how Mira’s project presentation and interview reflections align with NGSS practices such as “asking questions and defining problems”, “planning and carrying out investigations”, “engaging in argument from evidence” and “obtaining, evaluating, and communicating information”. Mira’s presentation (shown in Figure 2) showcases her research questions and chosen topic and rationale. Mira identified animal testing as an important sociopolitical issue that her audience (and broader society) needs to become more informed on [shown in Figure 2(b)]. To address the potential problem of an uninformed audience, she divided her central research question into four subquestions that were directly related to different aspects of an animal testing topic that others might need to know, such as differences in public sentiment, and references to specific animals used in testing [shown in Figure 2(a)]. Her skill in crafting research questions and proposing an investigative topic aligns with the NGSS practice of “asking questions and defining problems” as evidenced by the clarity and organization of her questions, their practicality given the nature of Twitter data and their utility for addressing an identified problem.
Mira’s research questions and project topic also foreground her “planning and carrying out specific investigations” situated in Twitter/X discourse around animal testing. Mira had to take specific and targeted steps in data processing, analysis and visualization to find the answer to each question, such as downloading her dataset and manually coding each tweet for users’ sentiments (for or against animal testing), reasons for their sentiment (e.g. “not furthering scientific research”) or references to specific animals (e.g. rabbits, crabs, etc.). These processes were unique and tailored to her data context and situated inquiry. At the conclusion of her presentation, she also demonstrated a clear understanding of how her data science processes can be leveraged as a way to “engage in argument from evidence”, stating:
My project may help organizations that fight for animal rights as I have collected real statistics, and the majority part of the data portray how animal testing is hurting more than doing good. (Post interview, 08/23/2023)
This not only aligns with her research questions and initial project goals but also highlights how the results of her analysis and visualizations can serve as an argument against animal testing practices from the perspective of public sentiment.
Finally, Mira highlighted similar processes in a hypothetical project on deforestation during her post interview, where she described the process of “obtaining, evaluating, and communicating information” in detail:
So first, you would have to gather all the data. So you would want a couple of different search words. So you can get as much data as you can. For deforestation, I could probably do hashtag climate change, hashtag deforestation….So the next step will be preprocessing. And I would take that large data set and clean it. So I would get rid of missing or noisy data […] the next step would be an analysis. So, I would maybe look at, or try to find the tweet that has the most retweets or like count. So, if the tweet that had the most likes, and if it was talking about how they were against deforestation, then I would know that a lot of people agree with that person because they obviously have the most amount of likes. So then, after analyzing my data, I'll move on to the visualizations. So I can maybe make a bar graph and write down or measure like how many people said that they were against deforestation and how many people are for deforestation […] and then draw a final conclusion for all of that (Post interview, 08/23/2023).
Findings from Mira’s project and reflections reveal the ways in which she was able to leverage several key data science and next-generation science practices in tandem toward inquiry that was deeply contextualized and situated in inquiry topics that she found socio-politically relevant. Her research questions and topics ultimately laid the groundwork for her engagement with data science, aligning with NGSS’s emphasis on connecting classroom learning with real-world issues (Miller et al., 2018). Mira’s inclination toward researching global changes through data science reflects the NGSS’s goal of nurturing students’ scientific inquiries grounded in personal interests and relevant societal issues.
Computational literacy: computational thinking in service of social and cultural relevance
Findings from Mira’s case also illustrate the dynamic discipline of computing. This section explores Mira’s engagement with computational literacy by describing how she applied multifaceted aspects of computational thinking, including decomposition, algorithmic reasoning, pattern recognition and abstraction, to delve into her chosen context (i.e. line of inquiry aligned with a personal interest). Figure 3 of Mira’s final presentation signals her computational thought processes. Mira navigated through counts related to opinions on animal testing [shown in Figure 3(a)], showcasing how she recognized patterns and demonstrating her ability to identify trends to decipher complex datasets. This occurred when Mira categorized her animal testing data into three groups, using them to discern and organize recurring themes or trends within the data. Aspects of computational thinking are also evident in her systematic breakdown of the data into three distinct categories, each representing a nuanced aspect of public sentiment. By decomposing (i.e. disentangling) the complex landscape of opinions, Mira unveiled a structured framework that not only simplified the complexities of the data but also underscored her strategic approach to understanding and analyzing diverse perspectives. This methodical decomposition highlights Mira’s ability to break down complex data into more manageable components. Counting tweets that reference specific animals and express different sentiments also required her strategic application of the pandas value_counts() method.
On the right [see Figure 3(b)], Mira abstracted her approach by consolidating the complex and newly disentangled landscape of opinions on animal testing, as evidenced in her slide annotations. Abstraction, in this context, involves distilling the essential features of the data while disregarding unnecessary or erroneous information. Mira abstracted by concatenating multiple datasets, ensuring a comprehensive representation of public sentiment. This process included the removal of duplicates, streamlining the dataset for analysis. This part of Mira’s final project showcases her ability to transform raw data into a more manageable and comprehensible form. We also observed algorithmic thinking, as Mira seemed to strategically sequence the concatenation of four datasets with the removal of duplicates, a pivotal step that streamlines the dataset for precise analysis and that reflects foresight in her code. This nuanced design showcased Mira’s ability to translate or disaggregate her conceptualization into strategic steps. Her consideration of both data cleanliness, as evidenced in the removal of duplicates using the pandas drop_duplicates() method, and comprehensive representation, as evidenced in the concatenation of multiple datasets using pd.concat() [see Figure 3(a) and (b)], underscores her intentionality in algorithmic design.
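The sequence of steps described above (concatenate, remove duplicates, then count) can be sketched in a few lines of pandas. This is an illustrative reconstruction and not Mira’s code: the two small data frames, the column names tweet_id and sentiment, and the choice to deduplicate on tweet_id are assumptions made for the example, and only two datasets are used for brevity where Mira combined four.

```python
import pandas as pd

# Hypothetical results from two separate keyword searches (not Mira's data).
search_a = pd.DataFrame(
    {"tweet_id": [1, 2, 3], "sentiment": ["Against", "Against", "For"]}
)
search_b = pd.DataFrame(
    {"tweet_id": [3, 4], "sentiment": ["For", "Against"]}
)

# Step 1: stack the search results into one dataset with a fresh index.
combined = pd.concat([search_a, search_b], ignore_index=True)

# Step 2: drop tweets returned by more than one search (assumed key: tweet_id).
deduplicated = combined.drop_duplicates(subset="tweet_id")

# Step 3: count how many remaining tweets fall into each sentiment category.
sentiment_counts = deduplicated["sentiment"].value_counts()
print(sentiment_counts)
```

Deduplicating before counting matters here: the tweet with tweet_id 3 appears in both searches and would otherwise be counted twice in the “For” category.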
Mira’s mastery was also evident in post interviews as her responses were indicative of a sophisticated understanding of the decomposition, algorithmic thinking, pattern recognition and abstraction facets of computational thinking. Mira’s post-interview reflections served as a valuable continuation and confirmation of what we observed in her presentation, where Mira discussed the practical steps she would take to gather and preprocess data for analysis. Similar to her animal testing project, her abstraction skills can be seen as she described strategically concatenating multiple datasets to ensure a comprehensive representation of public sentiment:
Interviewer Question: [After Mira suggests project on deforestation]. So, then, let's go with this project of deforestation, right. So, talk me through the steps that you would go through to complete this project.
Mira: So first, you would have to gather all the data. So, you would want a couple of different search words. So, you can get as much data as you can. For deforestation, I could probably do hashtag climate change, hashtag deforestation, hashtag, maybe global warming. And then I would mention that in the Twitter inquiry to get the data from Twitter, and maybe I could get about 100 tweets from each search word. And then I would concatenate those three data sets so that I have one big one that I can later preprocess and the next step. (Post-interview, 08/23/2023).
Throughout Mira’s project, the threads of decomposition, algorithmic thinking, pattern recognition and abstraction design interacted in complex ways as she navigated the landscape of Twitter/X data analysis. Findings reveal a narrative that underscores the integral role these computational thinking components play in unraveling the complexities of sentiment analysis on social media while illustrating how the practical application of computational literacy transformed her approach to real-world challenges.
Discussion
Given the current emphasis in existing education research that situates pedagogical engagement on the user side of social media platforms, this work illustrates the untapped potential of empowering learners to access, explore and leverage social media on the server side as a data source. Mira’s engagement with social media as a sandbox data source fostered the integration of disciplinary knowledge in mathematics, which suggests that the Coding Like a Data Miner curriculum, and other future curricula that adopt a similar inquiry-focused approach, hold opportunities to engage along targeted disciplinary areas. Mira’s case is also an example of authentically situated learning practices through intersecting complex ideas and methods for enacting them lauded by the National Research Council (2013). While this case illustrated the embedded application of math concepts and techniques, similar projects in prior work leveraged social media data in disciplinary areas such as social studies and science to examine the complexities of public and political opinion on Twitter/X around COVID-19 vaccinations (Walker et al., 2023c). This approach holds promise for connecting key ideas and practices across multiple disciplinary areas, such as using descriptive statistics to examine data sources in science, thus breaking free of the disciplinary silos identified by Pruitt (2014).
Vignettes from Mira’s illustrative case highlight the potential utility of inquiry-based data science that leverages social media as a data source that can promote the embedded application and contextualized refinement of STEM disciplinary knowledge. Open inquiry has advantages because it fosters active engagement with real-world data, promoting deeper understanding and application of STEM concepts. However, a potential downside to using an open inquiry approach is that teachers need to develop skills as facilitators and encourage deep engagement with disciplinary knowledge, ensuring that students are guided effectively through the inquiry process. Mira’s sentiment analysis of tweets on animal testing also demonstrated how curricula that leverage social media as a “sandbox” data source align with several key NGSS practices, including “asking questions and defining problems”, “planning and carrying out investigations”, “engaging in argument from evidence” and “obtaining, evaluating, and communicating information”, through projects that have personal connections and real-world impact. This shift represents opportunities to leverage learning as a tool in service of learners, their interests and communities. This approach is consistent with culturally responsive pedagogies, as it recognizes and values learners’ diverse backgrounds and experiences, promoting inclusivity and equity in education. This is important because it situates learning as an epistemic process rather than one where knowledge is treated as discrete information that should be memorized. It also reflects that our sandbox approach offers opportunities for learners to construct ideas along topics that are relevant to their personal, cultural or sociopolitical concerns. This stance thus creates space for broadening participation in areas (i.e. computing, data science and math disciplines) that have a history of excluding learners from underrepresented backgrounds.
Here we provide evidence that open-inquiry approaches that connect with learners create the possibility for disrupting histories of exclusion, by centering learners in ways that are fundamentally empowering. This is reflected in our observations of Mira’s application of computational thinking throughout her project, including decomposition, algorithmic reasoning, pattern recognition and abstraction, which exemplifies their crucial role in unraveling complexities within social media-based data analysis as a pedagogical practice.
Findings from Mira’s case study aligned with next generation science practices (National Research Council, 2013), emphasizing inquiry-based learning that connects to personal interests and real-world issues. The study found that Mira’s engagement with data science, particularly her focus on global changes, exemplifies the NGSS’s objective of integrating relevant and contemporary topics into science education. The work also represents a merging of CS and data science thinking and practices toward a unified goal that aligns with existing unifying efforts (e.g., Grillenberger and Romeike, 2014; Gould et al., 2015; Krishnamurthi and Fisler, 2020). This approach not only supported Mira in deeper exploration of topics she found relevant but also provided her with tools and strategies to construct her own meaningful scientific inquiries. A notable limitation, however, was the challenge of providing tailored support to foster the deeper kinds of data literacy advocated by Lee and Wilkerson (2018) in such individualized inquiry projects. While Mira’s project effectively demonstrated the NGSS’s goal of connecting education with real-world scenarios, the case study suggests a need for more structured guidance to help students navigate complex data science concepts. This could include more focused instructional strategies or resources specifically designed to bridge the gap between students’ personal interests and the scientific concepts they are exploring. Despite these challenges, Mira’s case study underlines the effectiveness of NGSS in promoting a more engaged and relevant science learning experience, preparing students like her for future scientific pursuits and problem-solving in real-world contexts.
Finally, Mira’s case also included embedded applications of computational and data science literacy, revealing the transformative and integrative potential of social media-based open inquiry data science in enriching STEM education. Though emergent in both sophistication and prevalence as Mira transitioned from problem identification and data collection to data preprocessing, analysis and visualization, her application of computational thinking processes served as foundational elements throughout her inquiry to identify and achieve her inquiry-specific goals. In Coding Like a Data Miner, educators and designers had practical and design freedom to emphasize computational elements, which may not be the case for other formal and informal learning settings. This highlights the essential and cross-cutting utility of computation in STEM but also raises questions for future work about the role and position of the discipline.
A potential limitation of our work in this area lies in the small sample and class size for this initial implementation of the curriculum. While multiple facilitators were on hand to support Mira and others’ disciplinary explorations in Coding Like a Data Miner, a key area of future research will be how to help educators practically and feasibly support such complex and individualized computational processes with larger student groups. This work may serve as a practical example for educators seeking to adapt their practice to accommodate the data revolution, especially as a pedagogical design that invites educators and learners to engage flexibly and collaboratively in knowledge and skill development that is tailored to individual projects. It may also inform efforts both to prepare educators to engage this role and to address the issues facing educators as outlined by Yadav and colleagues (2016).
This work serves as an early step toward the development and examination of social media sandbox data science curricula in practice. In support of educators and possible expansions of the application of social media for learning using this approach, future work will need to examine teachers’ experience of the approach toward the design of targeted professional development. Also, research is needed on the implementation of Coding Like a Data Miner across diverse contexts to tease out the role of cross-disciplinary engagement. Finally, research is needed to explore the design of assessments that account for the diversity of learner understanding and knowledge representation.
Conclusion
This research introduces empirical evidence that the “Sandbox” concept is a potentially valuable approach for integrating computing, data science and disciplinary topics in ways that are socially and culturally responsive. This approach also responds to the imperative for educational practices to evolve in an era shaped by computational advancements. The Sandbox approach provides a dynamic platform for learners to interact with big data curated from social media, fostering the development of both computational and data science literacies. Recognizing the indelible impacts of CS on contemporary society, particularly for traditionally underrepresented groups, this proactive stance addresses disparities, particularly in high school settings where learners shape their future education and career pursuits. The goal is that curricula and pedagogy such as Coding Like a Data Miner may not only offer flexibility for independent exploration of concepts but also encourage creativity, allowing learners to authentically connect with their culture and personal interests, thus creating a dynamic and culturally relevant learning experience. As we aim to prepare the next generation of computational thinkers, this work shows that social media as a learning context offers opportunities to support the development of a more informed citizenry amid evolving societal challenges that are inextricably linked to how we teach.
This work was supported in part by a grant from the National Science Foundation (#2137708). Any opinions, findings, conclusions and/or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the National Science Foundation or the University of Texas at El Paso. The authors extend their gratitude to Alan Barrera for his efforts in this research.
Figure 1. Examples of how Mira represented the math she mastered in her final presentation during social media-based open inquiry data science
Figure 2. Examples of how Mira’s project suggested engagement with NGSS practices in her final presentation during social media-based open inquiry data science
Figure 3. Examples of how Mira demonstrated computational thinking in her final presentation during social media-based open inquiry data science
© Emerald Publishing Limited.
