Purpose
This article explores how strategies for mapping the user requirements expressed by humanities researchers onto technical solutions lead to better customization of user-driven digital humanities tools and to the creation of innovative functionalities, which can directly affect the way research is done in a digital context.
Design/methodology/approach
It describes the user-driven development of a tool that helps researchers in the quantitative and qualitative analysis of large textbook collections.
Findings
This article presents an exemplary user journey map, which shows the different steps of the digital transformation process and how humanities researchers are involved in (1) producing innovative research solutions and comprehensive, personalized reports, and (2) customizing access to the content data used for the analysis of digital documents. The article is based on a case study of a German textbook collection and content analysis functionalities.
Originality/value
The focus of this article is the iterative research process, in which the humanist (from a human-centred point of view) starts from an initial research question and, using quantitative and qualitative data, develops both the research question and the answers to it, with the aim of finding patterns in the content and structure of educational media. Thus, from the viewpoint of digital transformation, the humanist is part of the interaction between digitization and digitalization processes, using the digital data, metadata, reports and findings created and supported by the digital tools for research analysis.
1. Introduction
The availability of digitised or born-digital text sources provides new opportunities for both humanities scholars and computer scientists: humanities scholars can access an enormous volume of digitised texts, and computer scientists are developing innovative solutions to help with their analysis.
This shift from reading a single printed book to the option of browsing many digital texts is at the heart of the digital humanities (DH) domain, which helps to develop solutions for handling vast amounts of data. New research questions arise from work in the DH: computational methods may support raising innovative research questions in the humanities.
Emerging in the late 1980s, the DH discipline initially focused on designing standards for cultural heritage data, such as the Text Encoding Initiative (TEI) [1], and on aggregating, digitising and delivering data. Later, visualization techniques were integrated for better analysis of the data provided (Saito et al., 2010). While the close reading of texts was developed in the middle of the twentieth century as a method in literary criticism (Gold, 2012), distant reading is a relatively new approach, introduced by Franco Moretti at the beginning of the twenty-first century (Moretti, 2005). More recently, the research trend has been a combination of close and distant reading approaches (Beals, 2014; Bradley and Meister, 2012; Coles and Lein, 2013; Shneiderman, 1996). The rather specific field of textbook studies followed a similar development, complementing qualitative analysis with quantitative methods (Lachmann and Mitchell, 2014). Meanwhile, Svensson differentiates between a minimalist reading, the “application of computer technology to traditional scholarly work”, and a maximalist reading, “changing the substance of humanistic matter” (Svensson, 2016).
A recent paper (Joo et al., 2022) explores the DH research agenda, providing the knowledge structure and research trends in the domain of DH over the recent decade. Data mining in the DH usually involves extracting information from a body of texts and/or their metadata to answer research questions, which may be quantitative or qualitative, and to detect patterns across large text collections and corpora. While text analysis is part of qualitative research, the algorithms used by the tools apply quantitative methods as well as search/match procedures to identify elements and features in a text.
This article explores how quantitative and qualitative approaches to text analysis in the field of research on educational media can be supported by humanities-driven digital processes. It describes solutions and challenges that have been identified throughout the evaluation procedures and requirements analysis in an interdisciplinary process. In a previous paper, we formalized the digital transformation processes in the humanities (Gay et al., 2009), which laid the ground for the development of the educational media research toolbox (short: Edumeres Toolbox) (Gold, 2012). This tool aims to meet the challenges posed by heterogeneous digital corpora from the field of education. It supports interdisciplinary textbook analyses using sources from different services (such as the collection of digitized school textbooks Georg Eckert Institute (GEI)-Digital [2] or the collection of school curricula Curricula Workstation [3]).
This article describes how the Edumeres Toolbox has been designed, in contrast to two other digital tools, to respond to the fundamental needs of today's DH researchers, and how it provides functionalities to manage a digital library through a web application. Section 2 describes text analysis as a method in the humanities, with a focus on the specific requirements of educational media research. Section 3 presents how different tasks and challenges of textbook analysis can be supported by computational methodologies and how three different tools have approached this aim: Max Weber Qualitative Data Analysis (MAXQDA), DiaCollo and the Edumeres Toolbox. Based on this, the article explains some solutions included in the Edumeres Toolbox to implement different content analysis functionalities in Section 4.
2. Related research
Despite the quickly evolving nature of what is now commonly referred to as DH, most humanists agree upon an early pioneer in humanities computing, Roberto Busa. Busa wrote the foreword to A Companion to Digital Humanities, which formally introduced the term “digital humanities” (Brandeis Library, 2012). Since then, there have been varied ways of defining DH (De Luca et al., 2019). From John Unsworth to Julia Flanders, or from Ernesto Priego to Ed Finn, the bottom line is that DH defines the overlap between humanities research and digital tools. But the humanities are the study of cultural life, and our cultural life will soon be inextricably bound up with digital media. Certain hallmarks are part of the nature of the DH field: the application of technology to research questions; collaboration between disciplines, services, programs and departments; the iterative nature of projects, requiring an attitude of experimentation and problem-solving; critically questioning the role and impact of the technology; creating space to ask new questions; and bringing new ways of exploring old data.
The common thread is always the humanist bringing computational power to bear on their questions and applying analysis to their findings.
As DH has emerged as a distinct academic field, researchers have tried to explore its research topics and knowledge structure. As an early effort, the emergence of DH between 2005 and 2008 was investigated based on the analysis of DH research publications. Gao et al. (2017) conducted author co-citation analysis using VOSviewer, a software tool designed to automatically analyse bibliographic records. Su (2020) examines the structure, patterns and themes of cross-national collaborations in DH research through the application of social network analysis and visualization tools.
Joo et al. (2022) analyse the recent decade and provide the knowledge structure and research trends in the domain of DH. The study by Su and Zhang (2022) aims to update and extend previous efforts gauging the status of the quickly evolving field of DH during 2005–2020, providing a longitudinal examination of its research output, intellectual structures and contributors.
Research on natural language processing (NLP) shows great promise for shedding light on key questions about the social and political aspects of education. Bringing together computational sociolinguistics and computational folkloristics, Nguyen applies NLP methods to identify variations in texts such as Twitter feeds in order to provide insights into social and cultural phenomena (Moretti, 2005). Such a computational approach to analysing and modelling variation in texts provides important insights into textual data, which might be further developed in the field of educational media analysis, for instance to explore different versions of textbook series.
A survey of 15 history textbooks from Texas in the United States published by Lucy et al. (2020) applies NLP, first, to measure complex concepts, shedding new light on the scope and scale of trends in educational discourse; second, to recognize linguistic connections between words in the texts and thus relational forms of meaning and associations between concepts; and third, to systematically capture the use of specific words to promote particular perspectives and frames (Joo et al., 2022). Different approaches relevant for educational media analysis have been considered here, including a tool for online and in-person class discussions (Loewen, 2008) and a variety of NLP tools, such as Coh-Metrix (Gold, 2012), the tool for the automatic analysis of text cohesion (Crossley et al., 2016), and ReaderBench (Dascalu et al., 2014). These tools have been used to characterize text cohesion, difficulty and complexity in learning analytics and educational data mining (Crossley et al., 2018). As Lucy et al. have shown, these and other tools can be successfully used by educators to select educational materials or to analyse dialogues in digital learning environments (Lucy et al., 2020).
However, there has been less work on applying NLP to investigate educational media, apart from early efforts to apply machine counting of words to scanned textbooks, such as Lachmann and Mitchell (2014) or Lucy et al. (2020).
All these approaches consider only a part of the possible research questions, while a general framework is needed that enables source-independent analysis and data upload combined with NLP methods.
3. Digital humanities and the challenges of interdisciplinary research
This section addresses key questions related to the possibilities and requirements for text analysis in the specific field of educational media.
Digitisation has made an increasing amount of relevant material easily accessible to researchers in the humanities. While quantitative methods describe a specific collection, corpus or sample with numerical data (Joo et al., 2022), qualitative research is concerned with meaning and the processes and conditions leading to the creation of meaning. Even though, by relying on figures and facts, quantitative analysis appears to be as objective as possible, while qualitative research is attributed a rather subjective quality of interpretation, no empirical data – including quantitative figures and numbers – can be understood and presented without interpretation (Wang and Inaba, 2009).
School textbooks form a pertinent source for the global comparison of content assigned to educating and inspiring the next generation about issues deemed relevant. As a specific form of educational media, however, they present specific requirements to digital software that aims to support their analysis. They are characterized by a complex structure, including texts and images, historical and other sources, and pedagogical assignments. Furthermore, textbooks for different subjects, states and school forms may exhibit different structural features, languages and fonts. As globally available and regularly revised sources, textbooks lend themselves to comparisons between subjects, states, publication dates and school grades. In some cases, the differences between textbook versions can be rather small, such as single sentences or images being replaced; in other cases, whole textbook pages or chapters are revised. Textbooks for different subjects, decades or states follow entirely different conceptualizations, which makes it challenging to recognize any similarities. Hence, researchers working on textbooks have to adapt their research questions and methods to the respective context and sources. There is therefore no possibility of implementing a “one size fits all” digital solution.
Different characteristics relevant for textbook analysis and educational media research can be summarized as follows:
The recognition of complex book structures, such as columns and captions, and of different fonts, including historical fonts (optical character recognition (OCR))
The comparison of different editions (revised at intervals or designed for different regions, e.g. federal states)
Identification of repetitions in different editions and versions and control for small deviations
Inter-lingual comparison of textbooks from different states
Automatic and manual identification of didactic assignments
In the following, the importance of supporting research with codes and categories in quantitative and qualitative analysis is emphasized. For this purpose, two different tools (MAXQDA and DiaCollo) have been considered in order to identify characteristics that are relevant for textbook analysis but not yet included in common software programs, which led to the implementation and development of the Edumeres Toolbox (De Luca et al., 2019, 2022), based on the digital transformation processes discussed with different interdisciplinary researchers (De Luca and Spielhaus, 2019).
MAXQDA is a software package for computer-assisted qualitative, quantitative and mixed-methods data analysis. It has been developed by a Germany-based company since 1989. Its original scope was to organise and analyse interview data (Gizzi and Rädiker, 2021); since 2000 it has been extended to include all kinds of texts and even images. The creation of large corpora and samples of educational media in MAXQDA is challenging, as the program requires extensive storage capacity and random access memory (RAM). The program provides vocabulary and frequency analyses, which can be correlated with extensive attribute functions (called variables by the program), and it is able to process a large number of files, PDFs and others (Loewen, 2008; Rädiker and Kuckartz, 2019). The lack of interfaces due to its proprietary implementation, however, means that for large corpora metadata need to be introduced and controlled in a time-consuming process. Meanwhile, the possibility to arrange a corpus (or sample) and subsamples as “sets” allows for comparisons of educational media according to subject, year or date of publication, school type, school year or state. However, arranging such comparisons, which are a recurring task in the analysis of educational media, involves a time-consuming and error-prone procedure.
DiaCollo is open-source software for diachronic collocation analysis. It has been developed and continuously improved by Bryan Jurish from 2015 onwards as part of the software development of the joint project Common Language Resources and Technology Infrastructure (CLARIN)-D at the Centre for Language Studies of the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW), in collaboration with the specialist working group on historical studies (Jurish and Whitt, 2018). DiaCollo is part of the corpus management software D* of the Centre for Language at the BBAW, which has been optimised for working with carefully curated digital text collections, i.e. those that are as error-free and true to the original as possible.
The project DiaCollo for GEI-Digital – a collection of historical German school textbooks from 1648 to 1920 – was supported by the seed funds “GEI-Innovation 2020” and had an experimental character. The aim of the project was to integrate the GEI's data into tools from the BBAW in order to enable computer-aided analysis and visualisation of the full texts and to evaluate the current data quality (Nieländer et al., 2022b) [4].
However, DiaCollo is restricted to the corpora already included in the software program (DTA or GEI-Digital). It is not possible to draw sub-samples, and the program does not so far enable the comparison of different corpora or resources. NLP, OCR or machine learning processes have not yet been included. The analysis of external corpora is only possible by creating an instance of the D* and DiaCollo software, which has to be set up and hosted at the institution (e.g. the GEI).
Based on these findings, we conducted contextual interviews with educational media researchers and began the process of implementing a user journey map, as described in the following.
4. Creating a user journey map for the digital transformation processes of research
The availability of digitised or born-digital textual sources provides opportunities to automatically analyse textbooks and to create new forms of support for researchers through information technology.
Quantitative analysis gathers statistical and structured data, which help to draw general conclusions about the research question. Qualitative analysis, in contrast, collects information that describes a topic rather than measures it; it is more concerned with impressions, opinions and points of view. A qualitative analysis is therefore less structured than a quantitative analysis and aims to gather information about people's motivations, thoughts and attitudes in order to get to the bottom of the topic in question. While this may provide deeper insight into the research questions, it also makes the analysis of the results more difficult.
Quantitative data can provide a general overview of a phenomenon, while qualitative data adds detail and can sometimes provide an informed interpretation of the results of the analysis. Quantitative analysis therefore means the examination of a set of data in order to obtain a statistical breakdown of phenomena and categories of information. Applied to research carried out with similar criteria, it tends to reveal the significance of a phenomenon in relation to its percentage of incidence.
The Edumeres Toolbox is the result of interdisciplinary work on digital tools in the field of textual analysis. It can only continue to be successful if humanities scholars and other researchers who want to use digital tools for their analyses, as well as tool developers, continue to contribute their reflections on the processes involved in textual analysis in order to provide detailed descriptions of each step in the research. This type of collaboration has defined the development of the Edumeres Toolbox from the beginning.
The Edumeres Toolbox enables researchers to analyse heterogeneous digital corpora. It supports researchers, for example, in performing interdisciplinary research and analysing data from different perspectives, and it allows them to use relevant data gathered by other services (like GEI-Digital or the Curricula Workstation).
In order to develop an appropriate research design, humanities scholars must define research questions and formalise them in a requirements analysis. It is then possible for computer scientists to develop a module of the Edumeres Toolbox for qualitative analysis. The focus is on the research loop process, in which the humanist expresses their needs, the data scientist team translates them, and the computer scientists create the required functionalities. The humanist utilizes the digital data, metadata, reports and findings created and supported by the digital tools for their research analysis, which is also included in the research loop process.
Qualitative data does not, however, necessarily need to be analysed in a (purely) qualitative way. Our system offers different ways to analyse qualitative data using quantitative methods, ranging from simple word frequency analysis and text mining to sophisticated topic modelling approaches. The Edumeres Toolbox allows humanities scholars to combine qualitative and quantitative approaches and enables them to interpret and group terms and text segments.
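To make this combination concrete, the following minimal sketch counts researcher-defined term groups across a small corpus: grouping word forms into codes is the qualitative step, counting them is the quantitative one. The documents, group names and terms are illustrative assumptions, not the Toolbox's internal data model.

```python
# Minimal sketch: quantitative counting over qualitatively defined codes.
from collections import Counter
import re

documents = [
    "Die Demokratie lebt von Beteiligung. Wahlen sind ein Kernstück der Demokratie.",
    "Partizipation in der Schule fördert demokratisches Verhalten.",
]

# Researcher-defined codes: each group collects word forms treated as
# equivalent for the qualitative interpretation (illustrative).
term_groups = {
    "democracy": {"demokratie", "demokratisches"},
    "participation": {"beteiligung", "partizipation"},
}

counts = Counter()
for doc in documents:
    tokens = re.findall(r"\w+", doc.lower())
    for group, terms in term_groups.items():
        counts[group] += sum(1 for t in tokens if t in terms)

print(counts)  # Counter({'democracy': 3, 'participation': 2})
```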
As mentioned before, corpus analysis in the research toolbox follows the three steps of the text analysis process: corpus creation, corpus selection and corpus analysis. This means the researcher creates a project, imports documents from external resources and then selects and analyses subsamples. When building a corpus, the Edumeres Toolbox provides an analysis environment that is designed to facilitate reading and interpretive practices for DH researchers as well as for the general public.
Figure 1 shows the typical Edumeres Toolbox user journey map. It describes a step-by-step process in which the user interacts with the Edumeres Toolbox to analyse their own corpus.
The researcher starts by creating a project in the Edumeres Toolbox. The set of textbooks and their metadata are imported, and an OCR process is started to extract the content and make it available in a digital, machine-readable form. This also makes the word of interest and its derivatives searchable, so that they can be used in a frequency or co-occurrence report as well as for topic modelling. The researcher is also able to obtain data visualizations, such as the distribution of documents over time, a word tree, etc. The researcher is then in the phase of combining quantitative and qualitative methods to draw conclusions and to answer their question.
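In a much simplified form, the import and OCR step of this journey might look like the following sketch. The Toolbox's actual OCR engine and index structure are not documented here, so pytesseract and a plain inverted index serve as hypothetical stand-ins; the file name and document identifier are invented for illustration.

```python
# Sketch of the import step: OCR a scanned page, then index word forms.
from collections import defaultdict
import re

from PIL import Image
import pytesseract  # stand-in for the Toolbox's OCR engine (assumption)

def ocr_page(path: str) -> str:
    """Extract machine-readable text from a scanned textbook page."""
    return pytesseract.image_to_string(Image.open(path), lang="deu")

def index_text(text: str, doc_id: str, index: dict) -> None:
    """Map each lower-cased word form to the documents it occurs in."""
    for token in re.findall(r"\w+", text.lower()):
        index[token].add(doc_id)

index: dict = defaultdict(set)
text = ocr_page("textbook_1921_p042.png")  # hypothetical file
index_text(text, doc_id="textbook_1921", index=index)

# All word forms beginning with "demokrat" (the word of interest and its
# derivatives) are now searchable for frequency or co-occurrence reports:
hits = {w: index[w] for w in index if w.startswith("demokrat")}
```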
To this end, we provide, for example, the opportunity to compare results across the large collection of textbooks by filtering them by metadata: the researcher can examine the use of a word in different academic years, or compare its appearances by grade, etc.
5. Text analysis
Researchers in the field of humanities carrying out research on one or more literary works might be interested in analysing related texts or text passages. The digital age has opened possibilities for scholars to enhance manual workflows. The shift from reading a single “analogue book” to the ability to browse many digital texts is one of the foundations and principal pillars of the DH, which aims to develop solutions for handling vast amounts of cultural heritage data. In contrast to manual methods, the DH encourage new research questions on cultural heritage datasets. The discipline employs existing algorithms and tools from the computer science domain, but certain research questions in the humanities require new methods, which are developed in collaboration with computer scientists.

In order to develop a suitable research project, humanities scholars need an overview of the resources they will use: existing curricula, textbooks and other educational media that might be required to answer a specific research question. They also need access to the sources they want to include in their analysis (preferably the full texts). A corpus can subsequently be created, and different subsamples can be selected and analysed. This leads to the text analysis process being divided into three phases: corpus creation, corpus selection and corpus analysis. For each of these steps, the search question and its settings play an important role, and digital tools can support all three steps.

When creating corpora, researchers must review, find and select sources for a specific analysis or study. In general, the different sources that are needed to answer a research question may be located in different libraries or digital libraries. An important approach in textbook studies is to investigate how often, or within what context, specific terms or topics are presented. Digital research tools can perform this process faster and more efficiently than manual techniques. The next step in textual analysis is often to code identified passages and sentences and to produce an in-depth description of recurring semantic and syntactic patterns. This may include frequencies, occurrences, passive and active constructions or prepositions used with the term(s) analysed.
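A keyword-in-context (KWIC) listing is a common way to support the “within what context” question and the subsequent coding of passages. The sketch below is a generic illustration, not the Toolbox's implementation; the window size and sample sentence are arbitrary.

```python
# Generic keyword-in-context (KWIC) sketch.
import re

def kwic(text: str, keyword: str, window: int = 5):
    """Yield (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text)
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

sample = ("Die Demokratie ist eine Staatsform. "
          "In der Demokratie geht die Staatsgewalt vom Volk aus.")
for left, kw, right in kwic(sample, "Demokratie", window=3):
    print(f"{left:>30} | {kw} | {right}")
```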
The extensible markup language (XML) format and TEI are leading technologies in the humanities, used to map the structure of a digital text. The XML format can precisely render complex documents to be reliably processed by automated software. It is used in the humanities for adding annotations to texts that can subsequently be processed programmatically, and for defining the syntactical rules of a language. Since XML only describes the grammar, and not the vocabulary and semantics, TEI is used as an additional standard to facilitate the interchange of documents. It describes the semantic features of a document and the elements, tags and attributes that can be used to convey particular information about a textual document. For instance, TEI tags can denote that a sequence of characters is a particular date or time, a geographical location, a proper name or a particular part of speech.
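As a small illustration of how such TEI markup can be processed programmatically, the following sketch reads an invented TEI fragment with Python's standard library and extracts the date and place-name annotations; real TEI documents are, of course, far richer.

```python
# Reading TEI semantic markup with the standard library (illustrative fragment).
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # the TEI namespace

fragment = """\
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <p>Am <date when="1949-05-23">23. Mai 1949</date> trat das Grundgesetz
    in <placeName>Bonn</placeName> in Kraft.</p>
  </body>
</text>"""

root = ET.fromstring(fragment)
for date in root.iter(TEI + "date"):
    print("date:", date.get("when"), "->", date.text)
for place in root.iter(TEI + "placeName"):
    print("place:", place.text)
```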
An important method in textbook analysis is the comparison of textbooks across countries/states, years of publication, subjects, publishing houses, etc. Thus, visualisations of results found per line, passage, page or textbook should help researchers to find similarities and differences, as well as dominant patterns and exceptions relevant to the representation of a specific aspect. In many cases, humanities scholars depend on the ability to try different methods of analysis which may enable a surprising observation or reveal a meaningful pattern. The analysis of similar patterns includes the discovery, alignment and visualisation of similar text segments among the texts of a given collection. Consequently, different options for systematisation and comparison are needed.
6. The analysis of German textbooks with the Edumeres Toolbox
In this section, we explain how the Edumeres Toolbox was used to construct a frequency analysis report that was required to answer the research questions posed by a project on German textbooks. The section then describes the development of the Edumeres Toolbox and the frequency analysis functionality, which was customised, in this case, for a project analysing a collection of German textbooks.
The Leibniz Institute for Educational Media | Georg Eckert Institute (GEI) conducts and facilitates fundamental research into textbooks and educational media from the perspective of history and cultural studies. Textbooks are appealing research objects, which lend themselves to manifold deductive and inductive textual analyses and comparisons with regard to teaching subjects, historical periods, states, regions or languages.
Those who use the Edumeres Toolbox are researchers who require quantitative analysis tools capable of analysing and processing large amounts of data.
In a research project that aims to produce a comprehensive review of democracy education in Germany, researchers worked on German civics textbooks to explore the extent to which discussions, depictions and visualisations of the subject appear across a range of chapters. In order to identify passages about democracy education, a quantitative analysis was performed using the Edumeres Toolbox.
The content analysis can be done through different functionalities. We implemented various functions and features, such as digital reports (i.e. topic modelling, frequency analysis, co-occurrence) and interactive data visualizations (i.e. distribution of documents over time, geographic distribution of documents over time and the word tree view), which enable researchers to perform distant reading and to track patterns across large collections of data.
In this section, we give some examples in order to let the reader understand how humanities researchers are supported in the content analysis.
6.1 Frequency analysis report
The frequency analysis report is an Edumeres Toolbox functionality that creates a report detailing the frequency of words throughout documents within a corpus. The researcher has several options to customise the analysis by combining qualitative and quantitative approaches, and by interpreting and grouping terms considered relevant for the analysis. Researchers may prepare a list of relevant words, and research questions such as those below, to provide a guide to quantitative aspects:
How often are the selected words mentioned in the corpus?
How are these words distributed across the textbook metadata, such as states, subjects, school years, school types, years of publication, etc.?
In which general thematic and semantic context do they appear?
Metadata expertise is key to structuring curricula and textbooks as well as other digital objects. To enable an overview of the content of a specific corpus of educational media, such as textbooks, or of the (co-)occurrence of specific terms, the Edumeres Toolbox provides topic modelling reports. These reports are computed on a selected corpus or sub-sample and can reveal abstract structural and semantic patterns or specific terms that occur in a collection of documents (see Figure 2). Parameters can be changed via metadata filters.
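The following compact sketch illustrates the idea behind such a topic modelling report, using scikit-learn's latent Dirichlet allocation as a stand-in; which implementation the Toolbox uses internally is not specified here, and the four-document corpus is purely illustrative.

```python
# Topic modelling sketch over a selected sub-sample (illustrative corpus).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "Demokratie Wahlen Parlament Regierung",
    "Wahlen Parteien Parlament Abgeordnete",
    "Klima Umwelt Energie Nachhaltigkeit",
    "Umwelt Klima Ressourcen Energie",
]

vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(corpus)  # document-term count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Report the most heavily weighted terms per topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:4]]
    print(f"topic {k}: {', '.join(top)}")
```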
The case study aimed to examine how often terms such as democracy (German: “demokratie”) appeared in each textbook and to compare the results according to metadata such as subject, school year and year of publication.
The search can be carried out with different “criteria”, such as prefix, suffix, inclusion, lemma and stem. The word count of terms that correspond to one of the chosen criteria is documented in the report.
For example, the report of words connected with “demokratie” is shown in Figure 3. For reasons of visibility, a threshold of 10 was set, i.e. words that occur fewer than 10 times in the corpus were removed from the report.
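The logic of such a report can be sketched as follows: count the word forms matching a chosen criterion and drop everything below the visibility threshold. The helper function and sample texts are illustrative (with a threshold of 2 rather than the 10 used in the case study, so that the tiny example still returns something).

```python
# Sketch of the frequency report logic: criterion matching plus threshold.
from collections import Counter
import re

def frequency_report(texts, criterion, threshold=10):
    """Count word forms matching `criterion`; drop rare entries."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            if criterion(token):
                counts[token] += 1
    # Remove words occurring fewer than `threshold` times.
    return {w: n for w, n in counts.items() if n >= threshold}

texts = [
    "Die Demokratie braucht Demokratiebildung.",
    "Demokratieverständnis wächst durch Beteiligung an der Demokratie.",
]
# Prefix criterion, as in the "demokratie" case study above.
report = frequency_report(texts, lambda t: t.startswith("demokratie"), threshold=2)
print(report)  # {'demokratie': 2}; the two one-off compounds fall below 2
```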
6.2 Co-occurrence analysis report
In order to find segments of a text in which two or more words appear together in a sentence, or to extract semantic links between words, the tool can perform a co-occurrence analysis. This shows how often two terms appear alongside each other, or in a certain order, within a text corpus. Stop-word lists and black lists can be used in connection with co-occurrence analysis (see Figure 4).
The co-occurrence report can be customized by filters (see Figure 5), such as black and white lists or minimum and maximum thresholds.
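A minimal sentence-level co-occurrence count with a stop-word list might be sketched as follows; the sentence-splitting heuristic, stop words and sample text are illustrative assumptions rather than the Toolbox's actual procedure.

```python
# Sentence-level co-occurrence counting with a stop-word list (sketch).
from collections import Counter
from itertools import combinations
import re

STOP_WORDS = {"die", "der", "das", "und", "in", "ist", "eine"}  # illustrative

def cooccurrences(text):
    """Count unordered word pairs that share a sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]", text):
        tokens = sorted({t for t in re.findall(r"\w+", sentence.lower())
                         if t not in STOP_WORDS})
        pairs.update(combinations(tokens, 2))
    return pairs

text = ("Die Demokratie ist eine Staatsform. "
        "Wahlen und Parlament sichern die Demokratie. "
        "Das Parlament beschließt Gesetze.")
for (a, b), n in cooccurrences(text).most_common(3):
    print(a, "+", b, ":", n)
```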
6.3 Data visualization report
Another function provided by the Edumeres Toolbox is the word tree (see Figure 6), which allows researchers to explore how keywords are used in different sentences throughout the corpus and visualises the frequency of the terms used throughout the texts. This kind of distant reading provides researchers with a means to track word (co-)occurrences back to specific sources and to visualize the relevant sources.
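The data structure behind a word tree can be illustrated with a short sketch: for every occurrence of the keyword, the sequence of following words is recorded, and shared prefixes are merged into branches. Again, this is a generic illustration with an invented sample text, not the Toolbox's implementation.

```python
# Sketch of a word tree: merge the word sequences following a keyword.
import re

def word_tree(text, keyword, depth=3):
    """Build a nested dict of the words following each keyword occurrence."""
    tree = {}
    tokens = re.findall(r"\w+", text.lower())
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            node = tree
            for follower in tokens[i + 1:i + 1 + depth]:
                node = node.setdefault(follower, {})
    return tree

text = ("Die Demokratie lebt von Beteiligung. "
        "Die Demokratie lebt vom Engagement aller. "
        "Demokratie braucht Bildung.")
print(word_tree(text, "Demokratie"))
# {'lebt': {'von': {'beteiligung': {}}, 'vom': {'engagement': {}}},
#  'braucht': {'bildung': {}}}
```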
7. Conclusion
This article discussed and emphasized the importance of supporting research with codes and categories in quantitative and qualitative analysis using NLP-based digital tools. For this purpose, two different tools (MAXQDA and DiaCollo) were compared in order to identify characteristics that are relevant for textbook analysis but not yet included in common software programs. This formed the basis for the implementation and development of the Edumeres Toolbox (De Luca et al., 2019, 2022), based on the digital transformation processes discussed with different interdisciplinary researchers (De Luca and Spielhaus, 2019).
Based on the findings, contextual interviews with educational media researchers have been conducted and the process of implementing a user journey map has been completed. At the moment, the Edumeres Toolbox allows researchers to import information and to analyse and compare data in ways that produce new information. A continuous dialogue is being held between computer scientists and humanists to understand how to create functionalities in the toolbox that allow corpus text analysis. Users express their needs, and the Edumeres Toolbox team translates them and creates the required functionalities.
Its diverse functions and features, such as reports (i.e. topic modelling, frequency analysis, co-occurrence) and interactive data visualizations (i.e. distribution of documents over time, geographic distribution of documents over time and the word tree view), enable researchers to perform distant readings and to track patterns across large collections of data. Human intervention is, however, necessary at all stages, as the data obtained would be relatively meaningless without close reading. Researchers need to combine distant and close readings, also referred to as quantitative and qualitative methods, with annotations and visualizations to contextualize the data and thus detect connections between, and patterns within, texts and other research data. OCR, especially in the particularly complex setting of educational media, requires improvement, as do automatic layout recognition, the possibility to include research data as annotations in the digital process, and the indexing of annotations in the advanced search functionalities, in order to develop the big data architecture of a tool for enhanced text analysis. A specific asset for the analysis of educational media would be settings which allow recurring comparisons of similar data and corpora and the visualization of results in tables and charts.
Remaining issues to be further developed are (1) the recognition of complex book structures such as columns, which is a question of OCR; (2) functionalities enabling comparisons between different editions, school subjects, school years, publishing houses and years of publication; (3) tools that identify repetitions of identical phrases and passages, or minor changes in texts such as those included in different editions and versions, and that control for small deviations (supporting research on versioning digitally); and (4) a workflow to identify and annotate didactic assignments.
The DemoS research project, financed by the Federal Ministry of Education and Research (BMBF), aimed to produce a comprehensive review of democracy education in Germany, both in practice and as a subject and objective of political education at secondary-level general education and vocational schools, as well as in examples of specialist teacher training. DemoS was part of the collaborative project Demokratiebildung in Deutschland (democracy education in Germany), conducted with the German Youth Institute (DJI), which explored early childhood education in a complementary sub-project called Bildung und Demokratie mit den Jüngsten – BilDe (education and democracy with very young children). This project enabled the GEI and the DJI to jointly contribute their research to debates on a highly relevant social topic: political decision-makers and educationalists emphatically formulate the goal of a fundamental education in democracy as a requirement for the inclusive education of children and young people in general and for school education in particular.
Since submission of this article, the following authors have updated their affiliations: Ernesto William De Luca is at Otto-von-Guericke-University of Magdeburg, Magdeburg, Germany, the Dipartimento di Scienze Ingegneristiche, Guglielmo Marconi University, Rome, Italy, and the Leibniz Institute for Educational Media | Georg Eckert Institute, Braunschweig, Germany; Francesca Fallucchi, Bouchra Ghattas and Riem Spielhaus are at the Leibniz Institute for Educational Media | Georg Eckert Institute, Braunschweig, Germany.
Notes
1. Text Encoding Initiative: TEI, 2015. https://www.tei-c.org/ (accessed 9 January 2015).
2. https://gei-digital.gei.de/viewer/index/
3. https://curricula-workstation.edumeres.net/en/about-curricula-workstation/
4. OCR has been especially challenging in the context of textbooks from the late 19th and early 20th century (Nieländer et al., 2022b).
Figure 1: User journey map (Edumeres Toolbox)
Figure 2: Topic modelling report on a selected corpus
Figure 3: Frequency analysis report of the words that have “demokratie” as prefix
Figure 4: Co-occurrence report view
Figure 5: Menu to customize co-occurrence report post computing
Figure 6: Word tree visualisation of the word “Demokratie”
