Abstract: The transformational potential of data science and adaptive instructional systems (AISs) to radically impact the learning ecosystem is based on the following: (1) evidence suggesting that individualized instruction is generally more effective than traditional classroom instruction, where fine-grained monitoring and tailored support for each individual learner are not possible; (2) the capability of modern computing technologies to collect, store, access, and process fine-grained, vast, and rich sets of learning data while accounting for data ownership, security, and privacy; (3) promising new advances in data science, including powerful machine learning and statistical methods for extracting useful knowledge from big educational data sets; and (4) access to affordable, powerful, and scalable distributed computing resources for processing big data, e.g., a cloud-fog-edge computing continuum. Despite these promising developments, data science and AISs have yet to exert a transformative impact on the learning ecosystem. The Learner Data Institute (LDI), a project sponsored by the US National Science Foundation, serves as the much-needed catalyst for these developments to converge and transform the learning ecosystem. LDI intends a rigorous test of the hypothesis that emerging learning ecologies that incorporate AISs are capable of providing effective, engaging, equitable, and affordable individualized assistance for both learners and instructors, and that the parameters of these systems, e.g., effectiveness, can be improved over time given sufficient attention to evidence, captured as data, and expertise, provided by teams of interdisciplinary researchers like ours.
Keywords: learning ecosystem, data science, adaptive instructional systems, science convergence
I. OVERVIEW OF THE LEARNER DATA INSTITUTE
Equal educational opportunity for every child and adult, in the form of access to quality education, is probably the best way to address long-term inequality and inequity, enable social mobility, and broaden diversity and inclusion across all levels and aspects of our society. The Learner Data Institute (LDI: www.learnerdatainstitute.org) focuses on this fundamental societal need.
Based on existing federally-funded infrastructure (see below) and the foundation which we laid for the Learner Data Institute (LDI) as part of our Harnessing the Data Revolution (HDR) Institutes Frameworks NSF award (I-HDR-DIRSE-FW; DIRSE - Data Intensive Research in Science and Engineering), our LDI team has embarked on a mission to harness the data revolution to further our understanding of how people learn, how to improve adaptive instructional systems (AISs), and how to make emerging learning ecologies that include online and blended learning with AISs more effective, efficient, engaging, equitable, relevant, and affordable. Our interdisciplinary team of people from academia, industry, and government will accomplish this mission and explore new modes of data-driven discovery to transform the learning ecosystem by building on the framework for science convergence which our team developed, implemented, tested, and refined (Rus et al., 2020).
The institute brings together researchers, developers, and practitioners from: the Institute for Intelligent Systems, University of Memphis (lead); Carnegie Learning (co-lead); Northwestern University; Carnegie Mellon University; Columbia University; the U.S. Army Research Laboratory in the Human Research and Engineering Directorate (ARL-HRED); the Center for Educational Informatics at North Carolina State University; the Institute for Cognitive Science, University of Colorado, Boulder; the Pittsburgh Science and Learning Center, University of Pittsburgh; the Wisconsin Center for Education Research; Penn State University; Educational Testing Service (ETS); and several other commercial developers of education technologies: Age of Learning, ENGAGE - Susan, Gooru, SoarTech, Workbay, and Yixue. Together, we intend a rigorous test of the hypothesis that emerging learning ecologies that incorporate AISs are capable of providing affordable, effective, efficient, equitable, relevant, and engaging individualized assistance for both learners and instructors, and that the parameters of these systems, e.g., effectiveness, can be improved over time given sufficient attention to evidence, captured as data, and expertise, provided by teams of interdisciplinary researchers like ours.
1.1. Convergent Community of Researchers, Developers, and Practitioners
To accomplish LDI's mission, we are building a convergent community of interdisciplinary researchers, developers, and practitioners to scale up the work we started. Our strategy for transforming the learning ecosystem is to focus on a number of carefully selected research priorities targeting key aspects of the learning ecosystem which we believe are at a tipping point, i.e., timely investment in data-intensive approaches focusing on those critical aspects has the maximum potential for a transformative effect. Since the learning ecosystem is a complex web of interrelated elements, improvements in these key aspects will percolate throughout the whole learning ecosystem. While we are aware that within the scope and resources of an HDR institute we cannot address all needs of the learning ecosystem, we believe that by targeting, with the latest data science advances, a critical mass of key aspects or challenges that are at a tipping point, we can start a chain reaction that will transform the whole learning ecosystem, lifting it to a qualitatively higher state that is more effective, equitable, engaging, efficient, and affordable.
Following this strategy, we have identified during the conceptualization phase (LDI Frameworks - Phase 1) a number of research priorities, some of which we have started to address in Phase 1 and for which we will soon release data-science-based prototype solutions for related, well-defined tasks (see examples in the Science Convergence section).
The plan is to scale up and expand, as part of Phase 2, the work we started in Phase 1. The identified research priorities were the result of an intense science convergence process involving a number of activities (brainstorming sessions or "ideas labs" followed by iterative ranking and selection discussions; see details in the Science Convergence section) that engaged all our team members across many disciplines (e.g., educators, computer scientists, statisticians, cognitive scientists, engineers, education technology developers), developers (Carnegie Learning, Age of Learning, Gooru), school districts, as well as researchers from other HDR projects funded by NSF (e.g., Northwestern's TRIPODS Cohort II project: IDEAL - The Institute for Data, Econometrics, Algorithms, and Learning; CMU's DIBBs LearnSphere project: Building a Scalable Infrastructure for Data-Driven Discovery and Innovation in Education; and University of Memphis NSF project: Advancing the Science of Learning Data Science with Adaptive Learning for Future Workforce Development). Indeed, we identified those research priorities for which we believe, based on our collective interdisciplinary wisdom, that timely investment in data-intensive approaches will have the maximum potential for a transformative effect.
The identified research priorities, or investment opportunities, constitute our 5-year plan for LDI. It should be noted that we also generated a 10-year plan so that the impacts of LDI will propagate and evolve beyond the lifetime of the award and beyond our own team, thus acting as an agent of change for how research questions are conceived and addressed through interdisciplinary collaboration. For the HDR community, LDI will serve as the education and training hub, working with the other HDR institutes to develop training platforms for their respective target communities. In addition, we will share the basic data science advances that we develop for education-related challenges but that are generally applicable to other domains, such as incentive-based mechanisms for sharing data, neuro-symbolic approaches, or the cloud continuum for at-scale distributed data processing. At the same time, LDI will adopt and adapt ideas, methods, and tools developed by other HDR institutes that could benefit the learning ecosystem.
1.2. Identified Research Priorities
Specifically, LDI will pursue, through data, data science, and science convergence, a number of research priorities in learning science and engineering, which can be grouped into five broad categories: (1) furthering our understanding of learning and instructional processes and environments; (2) data science infrastructure for the education and HDR ecosystems; (3) improving AISs and scaling them up both horizontally and vertically; (4) the human-technology frontier in future learning ecologies that involve AISs and transforming communities of practice, e.g., triggering a culture shift in teacher training programs; and (5) how data science can address equity, ethics, diversity, and inclusion aspects of education. The research priorities we will tackle span a wide range. At one end is the "impoverished datasets" challenge in education, which we address through a combination of mechanisms meant to incentivize stakeholders to engage in secure and privacy-preserving learner data access and sharing, together with a cloud-continuum distributed computing infrastructure layer enabling distributed, at-scale learner data analysis. Next are novel solutions to major tasks such as learner modelling and the automated discovery of instructional and learning strategies, based on big edu-data and advanced data science methods such as neuro-symbolic approaches. At the other end is starting a culture change in educators' community of practice through the development of new curricula for data literacy, teacher and educator training workshops, and teacher fellowships.
A culture change in educators' community is essential for teachers, school administrators, and policy makers to understand and embrace data-driven approaches and AISs as powerful tools that allow them to better understand the learners, to discover effective and engaging learning and instructional strategies, to free them from certain tasks that machines can do thus allowing them to focus on more advanced and complex aspects of learning, which together will transform education.
We intend that our research will contribute to at least three of the ten new Big Ideas for Future Investment announced by NSF: Harnessing the Data Revolution for 21st Century Science and Engineering (HDR), Growing Science Convergence (GSC), and The Future of Work at the Human-Technology Frontier (FW-HTF). Specifically, the LDI will help us learn: (1) how to transform a far-flung group of interdisciplinary researchers, developers, and practitioners into a community of practice that can fully exploit the data revolution through data and science convergence and, in the process, enable new modes of data-driven discovery to ask and seek answers to fundamental questions at the frontiers of science and engineering and key societal needs, i.e., education in our case; (2) how AISs and data science can be used as a research vehicle to further our understanding of how learners learn; (3) how to optimize the human-technology partnership with data and data science to improve learners' and teachers' ability to employ the technology in a way that facilitates learning, while at the same time improving the affordability, effectiveness, and scalability of these systems; and (4) more generally, how to extend the frontiers of data science to include: new methods of data collection and design; more interpretable machine learning methods (e.g., by combining deep learning with Markov Logic); scalable new algorithms for joint inference in Markov Logic Networks; and methods for identifying causal mechanisms from unstructured, semi-structured, and structured data. Ethics, equity, diversity, and inclusion are another major focus of our work. Specifically, our working assumption is that data science and AISs can help detect and mitigate issues related to ethics, equity, inclusion, and diversity in education.
However, this would be the case only if proper "checks and balances" are added throughout the data lifecycle (collection, organization, categorization, analysis, visualization, and implication) and during instruction and intervention (teacher-driven, AISs-driven, or blended) to address issues such as algorithm bias, attribution bias, and construct bias in order to help close achievement gaps.
1.3. Enabling Learner and Learning Data Sharing: Incentives, Security, and Privacy
We aim to develop technology that encourages data owners to share their educational data. The three key thrusts of this effort are the incentives, security, and privacy of educational data. Our consideration of incentives identifies opt-in data sharing mechanisms where there is a significant benefit to opting in. Our consideration of security enables collaborative computation across many parties without any party needing to share its data with anyone else. Our consideration of privacy allows data analysis where properties of individuals cannot be subsequently identified. For reasons of space, we briefly describe only the incentive-based data sharing activity.
There is a public benefit to having educational data available for analysis by learning scientists. This benefit is what economists term a "public good": if it is available, then all individuals can consume it, regardless of whether they contributed to it. For example, if data studies enable improvements to education policy, then all reap the benefits of these improvements. Of course, the marginal improvement of the analyses from an additional individual's data is negligible. Thus, data contribution suffers from the infamous "free rider problem".
Our team aims to study the challenge of data science in education as a public good problem, and to understand the benefits of transforming the outcomes of data analyses into a private good, namely one that is consumed individually by contributors. For example, if data analyses enable individualized study plans, then the individual would have to provide their own data to obtain such a study plan. There is a rich theory in economics and computer science on how to design mechanisms for private goods.
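The contrast between the public-good and private-good framings can be made concrete with a toy payoff comparison. The sketch below is only an illustration under simplifying assumptions that are not part of the LDI work: the shared benefit grows linearly with the number of contributors, each contributor bears a fixed cost (e.g., effort or perceived privacy risk), and all function names and numbers are hypothetical.

```python
# Toy model of the free-rider problem. All names and numbers are
# illustrative assumptions, not a mechanism proposed by LDI.


def public_good_payoff(contributes: bool, others_contributing: int,
                       marginal_gain: float, cost: float) -> float:
    """Payoff when the analysis benefit is shared by everyone."""
    contributors = others_contributing + (1 if contributes else 0)
    # The benefit is enjoyed by all individuals, contributors or not.
    shared_benefit = contributors * marginal_gain
    return shared_benefit - (cost if contributes else 0.0)


def private_good_payoff(contributes: bool, personal_benefit: float,
                        cost: float) -> float:
    """Payoff when the benefit (e.g., a study plan) goes only to contributors."""
    return personal_benefit - cost if contributes else 0.0


def prefers_to_contribute_public(others_contributing: int,
                                 marginal_gain: float, cost: float) -> bool:
    """True if contributing beats free riding under the public-good framing."""
    return (public_good_payoff(True, others_contributing, marginal_gain, cost)
            > public_good_payoff(False, others_contributing, marginal_gain, cost))


def prefers_to_contribute_private(personal_benefit: float, cost: float) -> bool:
    """True if contributing beats abstaining under the private-good framing."""
    return (private_good_payoff(True, personal_benefit, cost)
            > private_good_payoff(False, personal_benefit, cost))


if __name__ == "__main__":
    # Negligible marginal gain per contributor, non-trivial personal cost.
    print(prefers_to_contribute_public(1000, marginal_gain=0.01, cost=1.0))
    # An individualized study plan worth more than the cost of sharing.
    print(prefers_to_contribute_private(personal_benefit=5.0, cost=1.0))
```

In this toy setting an individual contributes to the public good only if the marginal gain exceeds the personal cost, which is rarely the case when one person's data barely changes the analysis; tying the benefit to contribution changes the comparison to personal benefit versus cost.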
The difference between public goods and private goods in educational data ecosystems is analogous to the difference between reputation systems and recommendation systems in online marketplaces (Chen, Hartline, Liu, Waggoner, and Weld, 2016). We will identify optimal or near optimal mechanisms for incentivizing participation in the learning ecosystem.
II. THE FRAMEWORK FOR SCIENCE CONVERGENCE, ITS IMPLEMENTATION, EVALUATION, AND OUTCOMES
In order to facilitate and sustain science convergence, we designed a convergence framework based on a combination of a team structure, processes, and activities that will enable team members to develop a shared vision and language, which over time should lead to effective and meaningful cross-discipline collaborations and ultimately science convergence. Such mutual sense making, science convergence, and R&D efforts are likely to incubate solutions to complex problems to enable effective, efficient, engaging, equitable, relevant, and affordable learning experiences for everyone (Rus et al., 2020). We discuss next the main components of our science convergence framework: the team structure, convergence processes, and convergence activities.
The LDI framework for data-intensive research in science and engineering takes the form of a community and accompanying processes that support the proposal and development of interrelated scale-up projects. Each project is led by one or two individual researchers (primarily key personnel, but also others) and benefits from input from teams of experts from various domains, i.e., the Expert Panels. That is, individual researchers are the main drivers of targeted, related scale-up projects. These grass-roots projects fit the researchers' personal interests, thus maximizing motivation, and must align with the LDI mission and with the research priorities identified by our entire LDI team through science convergence processes in Phase 1 and during the preparation of this proposal.
2.1. Team Structure
Specifically, we designed a team structure and processes that enabled the harnessing and diffusion of expertise from various areas in an efficient and effective way (see details below) while fostering individual initiative and interests, e.g., all LDI team members have been encouraged during the LDI Frameworks award to propose prototyping tasks related to their research interests and which also fit the LDI mission (it should be noted that the PI and co-PIs proposed prototyping tasks at proposal time).
Team Structure. The LDI team structure consists of a leadership team responsible for overseeing and coordinating all LDI activities, Expert Panels meant to offer support to all activities from a specific domain perspective, and task-oriented groups meant to address a well-defined challenge in education and develop data science prototype solutions. We detail next the role of each of those team structures.
The LDI Core Team (or LDI Leadership Team) makes sure all activities align with the mission of the institute and provides the support needed to keep all activities cohesive. Furthermore, members of the LDI Core Team participate in activities with other projects in the HDR ecosystem (e.g., we participated in the NSF-run PI meeting in April 2020 and identified potential collaborators such as Northwestern University's TRIPODS team). The PI, co-PIs, the project coordinator, the PIs on each subaward, and the co-leaders of the Expert Panels are part of the LDI Core Team.
LDI's Expert Panels are groups that are homogeneous in terms of expertise and are meant to maximize expertise coverage in one area, e.g., Data Science or Learning Science, as individual researchers are specialized in different subareas of such a relatively broad area. The role of the Expert Panels is twofold: (1) to provide solid (breadth and depth) input from an area of expertise to all LDI efforts, such as specific prototyping projects resulting in prototype solutions in Phase 1, and (2) to help shape the 5-year plan for Phase 2 by identifying opportunities for investment, i.e., promising developments in one area such as Data Science or Learning Science that could benefit the LDI mission. The composition of Expert Panels was derived from the needs of the project and the expertise and preferences of our team members. We also tried to balance the number and size of the Expert Panels. The following panels were formed initially: Data Science, K-12 Education (this panel includes teachers and other school representatives), Learning Sciences, Learning Systems Engineering, Ethics & Equity (which emphasizes related issues such as inclusion and diversity), and Human-Technology Frontier. Institute members may belong to more than one Expert Panel but must be actively engaged in at least one, contributing to all activities of that panel. Expert Panels have co-leaders who ensure that panels successfully accomplish their assigned tasks.
2.2. Collaboration Processes and Activities For Broad, Interdisciplinary Groups
A number of processes were put in place to grow science convergence among our large team of interdisciplinary experts. To this end, within- and cross-domain interaction and collaboration processes were designed and implemented. Within-domain interactions mainly occur via Expert Panels, described above.
Cross-domain interactions are more challenging, as experts from various walks of life rely on different concepts, approaches, and communication tools to express their thinking, which can lead to communication breakdowns. To facilitate patient and open-minded interactions, we opted for a number of activities (e.g., all-team meetings, brainstorming sessions or "ideas labs" to generate ideas for research priorities and investment opportunities for the 5-year institute plan, the identification of broad-impact transformative apps, and reviewing of and feedback on prototyping tasks) and structured processes to facilitate fruitful interactions during those activities. In terms of structured processes to manage cross-domain interactions, we relied on a combination of the Nominal Group Technique (NGT), SWOT analysis, and pre-mortem analysis, similar to the within-domain interactions. Expert Panels and other working groups had the freedom to run their internal operations as they felt best, using the suggested processes or others. This freedom allowed our team to experiment with various modes of discovery. Expert Panels were then asked to share with others the collaboration processes they implemented and how they worked. The intent was to learn what worked best so that everyone can take advantage of the best processes.
To facilitate collaboration, we have been using, and will continue to use: collaborative work tools such as Google Docs, Slack, and wikis; ideas labs and other brainstorming activities; Expert Panels; individually led prototyping tasks; workshops at major conferences to present solutions (of various maturity levels) and get feedback from the wider community; and ad hoc, open elicitations of feedback (we allocated money in the budget for 10 ad hoc reviewers, to be determined through open calls).
2.3. Exemplars of Science Convergence
We already described the team structure and collaboration processes earlier. We will just illustrate how convergence was facilitated through structured collaboration processes that involved all Expert Panels reviewing all proposed prototyping tasks.
As an example of a structured process for meaningful and intentional integration of knowledge, techniques, and expertise from multiple disciplines, i.e., convergence, each proposer of a prototyping task was instructed to write a task description in the form of a Google Doc, which was then reviewed by the Expert Panels who provided input on how to improve the proposed solution or even suggest an entirely new approach, method, or technique. Again, the goal is to develop better solutions that incorporate input from various domains, resulting in more powerful solutions that account for aspects that were missed when a single-domain view was applied. Once the Expert Panels provided their feedback, the prototyping task leader(s) were asked to specify how they plan to integrate the feedback, which may have included several rounds of clarifying discussions with Expert Panels. Once feedback was integrated, prototyping task leader(s) were asked to compare and report the results obtained with the convergent solution to the original solution. The results and reports were once again analyzed by all Expert Panels and further feedback was provided. Each prototyping task went through 2-3 such feedback-implementation-refinement cycles under supervision of experts from all our 6 Expert Panels. Final results as of this writing are pending.
The task of learner modelling is another exemplar of how science convergence is needed and is being implemented by LDI to address key challenges in learning science and engineering.
Learner Modelling Through Data Science and Science Convergence. Given the centrality of learner modelling, i.e., knowing the learner, to quality, effective instruction, a science convergence approach, i.e., an approach that accounts for input from various areas of expertise, is much needed, as only such an approach can provide us with a complete, or as nearly complete as possible, "image" of the learner. Given the vast, distributed nature of learning activities and environments, we imagine a future learning ecosystem where various stakeholders (learners and their parents/legal guardians, teachers and schools, AIS developers) will opt for various degrees of privacy and data ownership. These range from being totally open, which implies sharing their data (after anonymization) or providing access to their data in secure and privacy-preserving ways, to being totally unwilling to share or offer secure and privacy-preserving access to their data. For example, some AIS developers, or even teachers or schools, may consider their data a key asset for improving their processes and instruction; this may lead to public education data policy recommendations, such as requiring public schools to share data in privacy-preserving ways. To accommodate this wide range of stakeholders while at the same time pushing for learner models that are as comprehensive and accurate as possible based on available (pooled) data, we imagine a shared learner model which everyone can access and download at any time and which can then be updated locally based on, for instance, an individual learner's privately held, non-pooled data.
Other scenarios can be imagined. For example, the shared learner model could first be updated by a particular school district based on its non-pooled school data and then shared with all of the district's students, who can take this school-updated learner model and further update it with their own privately held, non-pooled data, e.g., when working with an AIS on their home computer.
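The idea of a shared model that is refined locally with non-pooled data can be sketched as follows. This is a minimal sketch under a strong simplifying assumption that is ours, not LDI's: the learner model is reduced to a single Beta-distributed estimate of the probability of answering items on one skill correctly, so that each local update simply adds privately observed counts. The class name, its fields, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillEstimate:
    """Beta(alpha, beta) belief about the chance of a correct response.

    Hypothetical stand-in for a much richer learner model.
    """
    alpha: float
    beta: float

    def updated(self, correct: int, incorrect: int) -> "SkillEstimate":
        # Conjugate Beta-Bernoulli update: add the observed counts.
        # The raw responses never leave the party performing the update.
        return SkillEstimate(self.alpha + correct, self.beta + incorrect)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    # Shared model estimated from pooled data, downloadable by anyone.
    shared = SkillEstimate(alpha=6.0, beta=4.0)
    # A school district refines it with its own non-pooled data.
    district = shared.updated(correct=10, incorrect=10)
    # A student refines the district model at home with private data.
    personal = district.updated(correct=3, incorrect=0)
    print(shared.mean, district.mean, personal.mean)
```

Because the estimate is immutable, each stakeholder keeps its own refined copy while the shared model stays unchanged; a real learner model would involve many skills and modalities and a far richer update rule.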
To achieve this goal of developing (nearly) complete learner models that provide teachers and AISs with a real-time, accurate characterization of learners, novel approaches to learner modelling based on data and data science are needed. Such approaches must account in real time for large, distributed streams of learner data (e.g., through a cloud continuum effort) from various stakeholders (schools, developers, individual learners/users) and of different modalities (e.g., cognitive, affective, motivational, epistemic, behavioral), while addressing privacy and security concerns (see the secure multi-party and differential privacy efforts) and allowing explicit input from domain experts (see the neuro-symbolic approaches based on deep neural networks and statistical relational learning methods such as Markov Logic and Probabilistic Soft Logic).
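To illustrate what computing over distributed learner data without revealing it can look like, the sketch below uses additive secret sharing, a standard building block of secure multi-party computation. It is not a description of LDI's actual protocol: the modulus, the function names, and the scenario (several schools jointly computing a total count) are assumptions, and the simulation runs all parties in one process for clarity.

```python
import secrets

# Public modulus; assumed to be larger than any sum we want to compute.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split a non-negative value into n random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate parties jointly computing a sum without revealing their inputs."""
    n = len(private_values)
    # share_matrix[i][j] is the share that party i sends to party j.
    share_matrix = [make_shares(value, n) for value in private_values]
    # Each party j adds up the shares it received; an individual share
    # is uniformly random and reveals nothing about the sender's value.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Hypothetical per-school counts of learners who mastered a skill.
    print(secure_sum([12, 30, 7]))
```

The result is correct only when the true total is smaller than the modulus, and the simulation ignores network communication and colluding parties, both of which a deployed protocol would have to handle.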
2.4. Framework Evaluation Plan
We will evaluate the proposed framework of the future LDI in order to demonstrate its effectiveness. We will report quantitative and qualitative metrics for the community building and engagement effort, the identification of research priorities, and the development of interdisciplinary prototype solutions. For quantitative metrics, to account for different perspectives, we will report how many experts, and from how many different disciplines, contribute to specific tasks, e.g., data convergence requirements specification. For each expert, we can monitor their individual contributions in terms of content (as number of words), comments, and revisions to others' contributions (with a wiki, such editing activity can be monitored). More qualitatively, each member's contributions will be assessed in terms of their depth, e.g., whether the member initiated and led the development of a novel solution that could improve the detection of learners' emotions in a classroom context.
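The quantitative metrics described above could be derived from an exported edit log along the following lines. This is a sketch only: the record layout, the three contribution kinds, and the function names are assumptions made for illustration, not the format of any particular wiki or of LDI's tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """One logged edit (hypothetical record layout)."""
    expert: str
    discipline: str
    task: str
    kind: str  # assumed to be "content", "comment", or "revision"
    text: str


def coverage_by_task(log: list[Contribution]) -> dict[str, tuple[int, int]]:
    """Map each task to (number of experts, number of disciplines)."""
    experts: dict[str, set[str]] = defaultdict(set)
    disciplines: dict[str, set[str]] = defaultdict(set)
    for entry in log:
        experts[entry.task].add(entry.expert)
        disciplines[entry.task].add(entry.discipline)
    return {task: (len(experts[task]), len(disciplines[task])) for task in experts}


def activity_by_expert(log: list[Contribution]) -> dict[str, dict[str, int]]:
    """Map each expert to words of content, comments, and revisions."""
    activity: dict[str, dict[str, int]] = {}
    for entry in log:
        stats = activity.setdefault(
            entry.expert, {"words": 0, "comments": 0, "revisions": 0}
        )
        if entry.kind == "content":
            # Content is measured as a whitespace-separated word count.
            stats["words"] += len(entry.text.split())
        elif entry.kind == "comment":
            stats["comments"] += 1
        elif entry.kind == "revision":
            stats["revisions"] += 1
    return activity
```

Counts of this kind capture only the volume of participation; the depth of each contribution still requires the qualitative assessment described above.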
Furthermore, we will report the number of publications, presentations, tutorials, meetings, email exchanges and other forms of direct communication (within SIGs, among all team members, with the external advisory board members, the broader research community), and improvements of our prototype solutions over existing solutions.
While not possible during the 2-year performance period, other success measures can be monitored longer term such as how many citations the products of this project will generate, how many research groups integrate the proposed solutions, and how impactful our work is on learners and teachers.
III. CONCLUSIONS
The Learner Data Institute (LDI) aims to transform the learning ecosystem via a framework for science convergence based on innovative data science approaches coupled with trans-disciplinary, collaborative, and co-designed research and development.
Our strong team of interdisciplinary experts, developers, and practitioners will work together during the 5-year LDI project to move current practices beyond small-scale studies and bring the learning sciences into the era of big data and interdisciplinary science convergence. The proposed processes, methods, and studies pave the way for taking these outcomes to other domains. The impact of LDI will be felt far and wide: for example, many of the research priorities we identified will result in career-long efforts by some of our younger faculty members, and our work will propagate and evolve beyond the lifetime of the award and beyond our own team, thus acting as an agent of change for how research questions are conceived and addressed through interdisciplinary collaboration.
Acknowledgements
The Learner Data Institute is sponsored by the National Science Foundation (NSF; award #1934745). The opinions, findings, and results are solely the authors' and do not reflect those of NSF.
Reference Text and Citations
[1] Rus, V., Fancsali, S.E., Bowman, D., Pavlik Jr., P., Ritter, S., Venugopal, D., Morrison, D., and The LDI Team (2020). The Learner Data Institute: Mission, Framework, & Activities. In V. Rus & S.E. Fancsali (Eds.) Proceedings of The First Workshop of the Learner Data Institute - Big Data, Research Challenges, & Science Convergence in Educational Data Science, The 13th International Conference on Educational Data Mining (EDM 2020), July 10-13, Ifrane, Morocco (held online).
Copyright "Carol I" National Defence University 2022