Background
Students take many tests and exams during their school career, but they usually receive feedback about their test performance based only on an analysis of the item responses. With the increase in digital assessment, other data have become available for analysis as well, such as log data of student actions in online assessment environments. This paper explores how we can use log data to extend performance-related feedback with information related to the applied solution strategy.
Methods
First, we performed an exploratory model-based cluster analysis to identify the solution strategies of 802 students (modal age 14) on a pre-algebra item from the French national assessment CEDRE. Second, we related the students’ solution strategies to their mathematical ability based on the entire assessment.
Results
Five distinct groups of students with different in-assessment behavior were identified, of which one group had a significantly lower estimated mathematics ability than the other groups.
Conclusion
These findings can provide a basis for more in-depth feedback and further instruction on the level of an individual student and can inform teaching practices at the class level.
Introduction
Advancements in computer-based assessment in the past decades have focused on providing a direct measurement of complex concepts or skills (Pellegrino & Quellmalz, 2010). Now that test designers move to take advantage of the wide array of unique possibilities on digital platforms and devices, new technology-enhanced item types are emerging and assessment environments are becoming more than a translation of existing paper-based assessments. Technology-enhanced assessment offers new opportunities for just-in-time and personalized feedback contingent on the student’s progress, as well as more adaptive assessments (Timmis et al., 2016).
With the digitization of the educational assessment domain and the rapid developments in technology in general also comes a higher availability of student data, as well as the possibility of storing more meaningful data concerning the test-taking process and situation than just student responses and the accompanying scores. This type of data is referred to as process data. Examples of process data include gestures, eye movements, and physiological responses. A specific category of process data is log data, which consists of the logged interactions that students have with a digital testing environment, such as mouse movements, mouse clicks, keystrokes and their timing (e.g. Maddox, 2023; Lindner and Greiff, 2023; Kröhne and Goldhammer, 2018). Log data has long been stored and analyzed in the field of web analytics and more recently within digital learning systems as well (e.g. Cetintas et al., 2009a, b; Wang, 2021). In recent decades, studies that make use of log data have begun to emerge in the field of educational assessment. One common application is the use of response times for improving estimates of ability (e.g Van der Linden et al., 2010; Bolsinova and Tijmstra, 2017; Fox and Marianti, 2016). Studying student behavior (either using log data or other methods such as observational studies) can teach us about test design and student learning processes (Shute et al., 2009).
International large-scale assessments (ILSAs) also increasingly make use of interactive and technology-enhanced digital items (e.g. PISA (OECD, 2024), TIMSS (Von Davier, Fishbein, and Kennedy, 2024), ICILS (Heldt et al., 2020), and PIAAC (He et al., 2019)). The process data of these items potentially contain valuable information about interindividual and intraindividual differences in response processes. However, how to effectively incorporate process data into ILSA analyses remains unclear, despite the growing number of studies on this topic in recent years (e.g. Goldhammer et al., 2017; Tang et al., 2020; Ulitzsch et al., 2021).
Log data research in educational assessment
Research into log data in educational assessment has so far largely focused on game-based assessment (Cui et al., 2019; F. Chen et al., 2020; Kerr & Chung, 2012), response motivation (Pokropek et al., 2023; Nagy et al., 2022; Ulitzsch et al., 2022; Nagy & Ulitzsch, 2022) or rapid guessing (Deribo et al., 2023; Nagy et al., 2022), and the assessment of (complex) problem-solving ability (e.g. He and von Davier, 2015; Greiff et al., 2016; Stadler et al., 2019; F. Chen et al., 2020; Ulitzsch et al., 2022). Studies on the assessment of (complex) problem-solving ability suggest that log data can significantly enhance our understanding of students’ cognitive and behavioral processes.
Approaches to log data analysis in the field of (complex) problem-solving can roughly be divided into data-driven and theory-driven methods (or a combination of the two). Log data are well suited to data-driven analysis methods (F. Chen et al., 2020), and many researchers have consequently approached the analysis of log data in a bottom-up fashion (He & von Davier, 2015, 2016; He et al., 2019a, b; Chang et al., 2017; Y. Chen et al., 2019; Ulitzsch et al., 2022; Xiao, He, Veldkamp, and Liu, 2021). An advantage of data-driven log data analysis methods is that they can typically be generalized across items or tasks (He et al., 2021).
However, for the results of such analyses to be useful in practice, it is important to bridge the gap between the data and didactic theory (Gobert et al., 2013). For this reason, others have opted for a more theory-driven and top-down approach, in which hypotheses about the relations between certain behaviors and outcomes of interest are formulated in advance (Eichmann et al., 2019; Albert & Steinberg, 2011; Greiff et al., 2016; Han et al., 2019; Westera et al., 2014; F. Chen et al., 2020; He et al., 2023; Jiang et al., 2023; Goldhammer et al., 2021). These theoretically relevant behaviors are often operationalized by (aggregated) events in the log data into single unit measures, and subsequently analyzed using standard statistical methods. It is possible to aggregate some sequential information into such single unit measure, but this approach essentially does not take the order of student interactions into account, which poses an important disadvantage. However, there are recent examples of studies using theory-driven approaches that also take the order of student interactions into account. For example, Jiang et al. (2023) and He et al. (2023) compared sequences of log events to expert reference sequences using sequence mining, and Zhang and Andersson (2023) made use of network analysis to represent the order in which students perform operations.
Other recent studies have combined a more top-down, theory-driven approach with data-driven techniques (Eichmann et al., 2020; He et al., 2021; Stadler et al., 2019). For example, Eichmann et al. (2020) categorized student actions into several didactically meaningful categories before performing a sequence-based analysis. An advantage of this approach is that the analysis allows students to display behaviors from several different categories (instead of being limited to one category). However, for this approach to be feasible, it is necessary that each student action can be assigned to one (and only one) category, which is not always the case.
Log data research in mathematics assessment
As demonstrated by the studies described above, analyses of log data have led to new insights in the assessment of (complex) problem-solving. In different areas of study, log data could lead to similar advances. In the area of mathematics education, the recent decade has seen a steady increase of studies taking log data into account to advance our understanding of student cognitive and behavioral processes. Some of this work has taken place in the context of online learning or tutoring environments (Kerr & Chung, 2012; Martin et al., 2015; Gobert et al., 2015; Derr et al., 2018; Hrastinski et al., 2021; Olsher et al., 2023), whereas others have analyzed log data coming from collaborative, formative or summative assessment (Salles et al., 2020; Jiang et al., 2021; Mohan et al., 2020; Faber et al., 2017; Reis Costa et al., 2021; Jiang et al., 2023; Araneda et al., 2022). Several of these studies have related variables derived from log data to student achievement or success on tasks (Salles et al., 2020; Mohan et al., 2020; Faber et al., 2017; Derr et al., 2018; Araneda et al., 2022). Other interesting applications included using time-on-task to improve the precision of ability estimates (Reis Costa et al., 2021), relating questions that were posed in an online tutoring environment to perceived satisfaction and learning (Hrastinski et al., 2021), using variables derived from log data to identify students at risk of dropping out (Derr et al., 2018) and using sequence mining on keystroke sequences to relate onscreen calculator use to student proficiency (Jiang et al., 2023).
A few studies aimed to identify student solution strategies or error patterns: Kerr and Chung (2012) did so in the context of educational video games and simulations, and both Jiang et al. (2021) and Salles et al. (2020) did so in the context of a large-scale national mathematics assessment. Learning about the different solution strategies that students take in solving items in mathematics assessments, and which strategies are effective in what situations, can provide a basis for feedback towards students and teachers to inform the learning process (Zhang & Andersson, 2023), since it constitutes feedback on possible improvements (Shute et al., 2009).
Theoretical framework and research questions
In this study, we examine different solution strategies that students can use on an interactive mathematics item, both from a theoretical and an empirical viewpoint. We theoretically distinguish the possibility for an algebraic solution strategy as well as for a numerical trial and error solution strategy on the item used in this study. The theoretical framework for student solution strategies used in this study was introduced by Sfard (1991), and defines a distinction between an operational and a structural approach to mathematical concepts.
In the operational approach, the student views a mathematical concept as a process. A numerical trial and error solution strategy is considered an operational approach. In the structural approach, the concept is viewed as an object with its own characteristics, which can be compared to other objects of the same type. An example of the distinction in these views is the notion of algebraic expression. An algebraic expression can be seen as a series of operations that lead from input to output, or as an object with characteristics such as being quadratic, being equivalent to another one, or being symmetric in its variables. An algebraic solution strategy is considered a structural approach. It is suggested that students start viewing mathematical concepts from a more structural perspective towards the higher secondary grade levels.
Within this study we aim to identify which of these two approaches (if any) students have taken on a digital mathematics assessment item. To this end, we have derived meaningful variables from student log data based on this theoretical framework to answer the following two research questions:
Do students use an algebraic (structural), a numerical trial and error (operational), or a different type of solution strategy while solving a digital mathematics assessment item?
What is the relationship between student mathematical ability and their solution strategy?
Methods
In this study, we used an item from a French national grade-nine mathematics assessment as a test case to develop methods for analyzing mathematical log data in a way that leads to didactically meaningful inferences about a student’s solution strategy.
The product equation item
The item that was analyzed in this paper is called ‘Product Equation’, and is displayed in Fig. 1. The student has to determine which number to choose so that the result of the calculation on the left becomes zero. The final responses are entered into the input fields on the right. In the input field on the left, the student can try different values, and the result of the calculation is computed for them. When the student enters a value into the field, the left side of the calculation multiplies the value by 3 and then adds 2. The right side subtracts 3 from the input value. These intermediate results are then multiplied to produce the final result of the calculation. If the result is 0, the starting value the student has entered is a correct response to the item. The students also have a pencil tool, eraser, measuring tool, graphing tool, and calculator at their disposal, although none of these tools are necessary to solve the item. The correct responses to the item are “3” and “-2/3”.
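To make the algebra explicit, the following display (reconstructed from the operations described above, not reproduced from the item itself) shows the expression computed by the calculation program and why 3 and -2/3 are its two zeros.

```latex
% Worked solution for the Product Equation item, reconstructed from the
% operations described in the text: left side 3x + 2, right side x - 3.
\[
  (3x + 2)(x - 3) = 0
  \quad\Longleftrightarrow\quad
  3x + 2 = 0 \ \text{ or } \ x - 3 = 0
  \quad\Longleftrightarrow\quad
  x = -\tfrac{2}{3} \ \text{ or } \ x = 3.
\]
```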
The scoring for the item was dichotomous and without the possibility of partial credit, meaning that the students could either score zero points or one point on the item in the assessment. The student was awarded the point if they correctly identified one of the two possible correct answers, either in the answer fields or in the input field of the calculation program on the left. Numbers within 0.01 distance of either of the two correct answers were also deemed correct.
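As a minimal sketch (not the operational DEPP scoring code), this scoring rule could be expressed as follows; the function name and the representation of a student's submitted values are hypothetical.

```r
# Illustrative implementation of the dichotomous scoring rule (not the DEPP
# operational code). A student scores 1 point if any submitted value lies
# within 0.01 of either correct answer, 3 or -2/3.
score_product_equation <- function(submitted_values, tol = 0.01) {
  correct <- c(3, -2/3)
  hit <- any(vapply(submitted_values,
                    function(v) any(abs(v - correct) <= tol),
                    logical(1)))
  as.integer(hit)
}

score_product_equation(c(1, 2.995))    # 1: 2.995 is within 0.01 of 3
score_product_equation(c(-0.5, 0.25))  # 0: no submitted value is close enough
```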
[See PDF for image]
Fig. 1
The ‘Product Equation’ test item. This test item has been translated from French to English for the benefit of the reader
The two main strategies that can be followed for solving this item are a trial and error (operational) approach and an algebraic (structural) approach. In a trial and error approach, the student tries out different input values in the calculation program to find the correct one(s) by trial and error. The student may stop after the first correct response has been found, or may continue to search for another correct response. Students who use an algebraic approach realize that the calculation program translates to the expression (3x + 2)(x - 3). To find its solutions, they should be aware that for a product to be zero, one of the factors should be equal to zero (or both, but that does not apply here). These students may input an “x” as starting value into the program, which then outputs the expression (3x + 2)(x - 3), and solve the equation (3x + 2)(x - 3) = 0 either mentally or using pen and paper. As such, these students may spend more time away from the assessment environment.
Data collection
The item that was analyzed in this paper is part of the assessment cycle called CEDRE (Cycle des Évaluations Disciplinaires Réalisées sur Échantillon). CEDRE is a French national sample-based low-stakes assessment created and organized by the Department for Evaluation, Prospective and Performance (DEPP). DEPP is the entity within the French ministry of education (Le ministère de l’Éducation nationale, de la Jeunesse et des Sports) that is responsible for national educational measurement. CEDRE was created to measure the level of (among other abilities) mathematics among different age groups in the French educational system, as well as to provide a testing ground for innovations in national educational assessment.
In recent years, DEPP has developed technology-enhanced interactive item types for computer-based assessment to measure specific skills within the mathematical domain. Specifically, they developed items which engage higher-order mathematical thinking (as opposed to more rudimentary arithmetic skills) by having the assessment environment do calculations for the student (Salles et al., 2020). It is then up to the student to use the environment and its results to draw conclusions and respond to the item.
In total, the CEDRE administration in 2019 encompassed 348 items, administered to 7992 students in 309 schools. The assessment had 30 items in a linked design with 13 test booklets, followed by a second, multistage test and a context questionnaire. The assessment took place in the digital TAO test environment, with which the participating students were familiar beforehand. It was administered to students in ninth grade (troisième in the French school system) in both public and private sector schools. The modal age of the participating students was 14 years. Taking part in the assessment was obligatory for students in the participating schools. The test item analyzed in this paper was administered to 1004 students. The students were provided with pen and scrap paper.
The students’ abilities were estimated with a two-step procedure using the scores on all items in the assessment with a two-parameter logistic (2PLM) model. In the first step, item parameters were estimated using Marginal Maximum Likelihood (MML) with an Expectation-Maximization (EM) algorithm. In the second step, item parameters were kept fixed and student abilities were estimated using Warm’s Weighted Likelihood Estimation (WLE). This two-step procedure was used to allow for the comparison of results across different test cycles. More detailed information can be found in the assessment’s technical report (Philbert et al., 2022).
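The operational estimation was performed with DEPP's own software (see Philbert et al., 2022); the sketch below only illustrates the same two-step 2PLM/WLE logic using the open-source mirt package, with a hypothetical response matrix.

```r
library(mirt)

# resp_matrix: hypothetical persons-by-items matrix of dichotomous scores (0/1).
# Step 1: calibrate 2PL item parameters by marginal maximum likelihood (EM).
calibration <- mirt(resp_matrix, model = 1, itemtype = "2PL", method = "EM")

# Step 2: with the item parameters fixed at their estimates, compute person
# abilities using Warm's weighted likelihood estimator (WLE).
theta_wle <- fscores(calibration, method = "WLE")
```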
The log data resulting from the aforementioned assessment consisted of an identification variable for the test taker, the time and date of the administration, a final state for all the components in the item environments (e.g. a value in the case of an input field), and all actions the students have taken in the item environment accompanied by a timestamp.
The following steps were undertaken to clean the data. Students who did not interact with the assessment environment at all were removed from the data, as well as students for whom no score, estimated ability, or response time was registered. A further five students were removed from the data due to a technical malfunction in the registration of their response times. After cleaning, the data consisted of 802 students.
The interactions derived from students’ input into one of the textual input fields in the item interface (the input field of the calculation program, see section The product equation item) were also cleaned. Specifically, two types of interactions were removed: construction events, which are interactions registered while the student was still typing a longer sequence of characters, and deletion events, which are interactions registered while the student was deleting input from a field. For example, if a student typed a two-character negative value into one of the fields and subsequently deleted this input, four interactions would be registered (respectively): the minus sign alone, the full value, the minus sign alone again, and an empty string. The first of these interactions is a construction event and the last two are deletion events, and all three would thus have been removed while cleaning the data.
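As an illustration, the sketch below shows one way such construction and deletion events could be filtered out. It is not the authors' cleaning code; the event table and its columns (student_id, timestamp, and value, i.e. the registered field content after each interaction) are hypothetical.

```r
library(dplyr)

# Remove construction events (the registered value is a prefix of the next
# registered value, i.e. the student was still typing) and deletion events
# (the registered value is a prefix of the previous registered value, i.e.
# the student was deleting characters, down to the empty string).
clean_field_events <- function(events) {
  events %>%
    arrange(student_id, timestamp) %>%
    group_by(student_id) %>%
    mutate(
      next_value      = lead(value),
      prev_value      = lag(value),
      is_construction = !is.na(next_value) &
        startsWith(next_value, value) & nchar(next_value) > nchar(value),
      is_deletion     = !is.na(prev_value) &
        startsWith(prev_value, value) & nchar(value) < nchar(prev_value)
    ) %>%
    ungroup() %>%
    filter(!is_construction, !is_deletion) %>%
    select(student_id, timestamp, value)
}
```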
Variable construction and conjectured relationships to solution strategy
Variables were constructed from the log data based on the expected student behavior for the algebraic solution strategy, or structural approach (Sfard, 1991), and for the numerical trial and error solution strategy, or operational approach (Sfard, 1991). These variables were used to see whether we could identify which approach a student took. Table 1 describes how each variable was constructed and states our conjectures about the solution strategies underlying the student behavior reflected in each variable; an illustrative derivation of several of these variables from raw log events follows the table. The variables are listed in order of the strength of the conjectures.
Table 1. Constructed variables and conjectured relationships to solution strategy
Name | Type | Description | Expectation |
|---|---|---|---|
Entered the value “x” | binary | Whether the student entered the value “x” into the computational input field. When “x” is entered, the computational program outputs (3x + 2)(x - 3) as the result | It was expected that students using an algebraic approach more often input the value “x” than students using a numerical trial and error approach. |
Answered “-2/3” | binary | Whether the student submitted “-2/3” as answer to the item | It was expected that students who took an algebraic approach were able to find the more difficult answer more often than students using a numerical trial and error approach. |
Number of interactions | count | The total number of times the student interacted with the item | It was expected that students who took a numerical trial and error approach interacted more with the assessment environment than students using an algebraic approach. |
Longest time without interaction | continuous | The longest interval (in seconds) in which the student has not interacted with the item | It was expected that students who took an algebraic approach spent a larger amount of time away from the assessment environment than students using a numerical trial and error approach. |
Time before interacting | continuous | The interval (in seconds) between when the student was faced with the item and when the student first interacted with the item | It was expected that students who took a numerical trial and error approach interacted with the item more quickly than students who took an algebraic approach. |
Entered the value “-2/3” | binary | Whether the student entered the value “-2/3” into the computational input field | It was expected that students using an algebraic approach input the value “-2/3” more often than students taking a numerical trial and error approach, as “-2/3” is unlikely to appear in a trial-and-improve process, and rather suggests the check of an algebraically found value. |
Number of unique values entered into the computational input field | count | The number of unique values the student entered into the computational input field | It was expected that students who took a numerical trial and error approach entered more unique values into the calculation program than students using an algebraic approach. |
Number of values entered into the computational input field | count | The number of values the student entered into the computational input field | It was expected that students who took a numerical trial and error approach entered more values into the calculation program than students using an algebraic approach. |
Answered “3” | binary | Whether the student submitted “3” as answer to the item | There was no clear expectation of a relationship between this variable and the approach a student took, as the correct answer “3” can be found quickly using either approach. |
Response time | continuous | The interval (in seconds) between when the student was faced with the item and when the student moved on to the next item | There was no clear expectation of a relationship between this variable and the approach that the student took. Both the numerical trial and error approach and the algebraic approach may take a shorter or longer amount of time. |
Entered the value “3” | binary | Whether the student entered the value “3” into the computational input field | There was no clear expectation of a relationship between this variable and the approach a student took, as this can reflect the result of a trial-and-improve process, or a conscious way to check the result of the algebraic strategy. |
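The following sketch illustrates how several of the variables in Table 1 could be derived from cleaned, timestamped log events. The column names (student_id, timestamp in seconds since item onset, field, value) and the response-time table are hypothetical, and the longest-gap computation is simplified in that it ignores the time before the first and after the last interaction.

```r
library(dplyr)

derive_item_variables <- function(events, response_times) {
  events %>%
    group_by(student_id) %>%
    summarise(
      entered_x               = as.integer(any(field == "calc_input" & value == "x")),
      entered_3               = as.integer(any(field == "calc_input" & value == "3")),
      n_interactions          = n(),
      time_before_interacting = min(timestamp),
      longest_gap             = if (n() > 1) max(diff(sort(timestamp))) else NA_real_,
      n_values_entered        = sum(field == "calc_input"),
      n_unique_values_entered = n_distinct(value[field == "calc_input"]),
      .groups = "drop"
    ) %>%
    # response_times: hypothetical table with student_id and response_time columns
    left_join(response_times, by = "student_id")
}
```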
Analysis
To find meaningfully distinct groups of students based on in-assessment actions, we performed an exploratory model-based cluster analysis using the constructed variables as input variables. The analysis was performed using version 1.5-0 of the depmixS4 package (Visser & Speekenbrink, 2010) in version 4.4.1 of the R programming language (R Core Team, 2022). The models used in the analysis were finite mixture models. Binary variables were modeled using a binomial distribution with a logit link function. Count variables and continuous variables were both modeled using a Gaussian distribution with a log link function.
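A minimal sketch of such a finite mixture model using the mix() function of depmixS4 is shown below. It is a simplified stand-in for the code in Appendix A: the variable names are hypothetical, only three response variables are included, and for brevity the count and continuous variables are log-transformed and modeled as Gaussian instead of being specified with an explicit log link.

```r
library(depmixS4)

# student_vars: hypothetical data frame with one row per student.
d <- transform(student_vars,
               log_n_interactions = log(n_interactions + 1),
               log_response_time  = log(response_time))

set.seed(1)
mod <- mix(
  response = list(entered_x ~ 1,            # binary: binomial with logit link
                  log_n_interactions ~ 1,   # count (log-transformed): Gaussian
                  log_response_time ~ 1),   # continuous (log-transformed): Gaussian
  data    = d,
  nstates = 5,                              # number of latent classes
  family  = list(binomial(), gaussian(), gaussian())
)

fitted_mod <- fit(mod)
BIC(fitted_mod)                # model fit
head(posterior(fitted_mod))    # most likely class and posterior probabilities
```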
In the Variable construction and conjectured relationships to solution strategy section, we described how theoretical expectations on student behavior informed the construction of a set of variables. In order to determine which of these variables to include in the finite mixture models, we inspected the correlations between these constructed variables. We used a Pearson correlation for combinations of count or continuous variables, a point-biserial correlation for combinations of binary variables with count or continuous variables, and a tetrachoric correlation for combinations of binary variables. Variables that correlated very highly with other variables were excluded from the model.
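These three correlation types can be computed as sketched below; the variable names are hypothetical, and the use of the psych package for the tetrachoric correlations is an assumption, as the paper does not state which implementation was used.

```r
library(psych)

# Pearson correlation between two count/continuous variables.
cor(d$response_time, d$n_interactions)

# Point-biserial correlation: a Pearson correlation in which one variable
# is binary (coded 0/1), so cor() can be used directly.
cor(d$entered_x, d$response_time)

# Tetrachoric correlation between two binary variables, from their 2 x 2
# frequency table (psych package; an assumed implementation choice).
tetrachoric(table(d$entered_x, d$entered_3))$rho
```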
As a first step in determining the number of latent classes of students to fit in the final model, we compared the fit of models with one to ten classes, simultaneously assessing the robustness of the models to variations in the data by using a bootstrap procedure (e.g. Efron & Tibshirani, 1994). In each of 100 bootstraps, we sampled the total number of students from the original dataset (802) with replacement, creating 100 new datasets that differed from the original dataset. For each of these 100 new datasets, we fit ten finite mixture models: one for each number of classes from one to ten. For each model, the BIC (Bayesian Information Criterion, Schwarz 1978) model fit measure was computed. We inspected the standard deviations of the BIC values over the 100 bootstraps to assess the robustness of the model fit for variations in the student data. The means of the BIC values over the bootstraps were used to determine which number of estimated classes led to acceptable models in terms of their fit.
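A simplified version of this bootstrap procedure is sketched below, continuing from the mixture-model sketch above (hypothetical data frame d, reduced set of response variables, rudimentary handling of non-converging fits).

```r
library(depmixS4)

set.seed(2024)
n_boot  <- 100
max_k   <- 10
bic_mat <- matrix(NA_real_, nrow = n_boot, ncol = max_k)

for (b in seq_len(n_boot)) {
  d_boot <- d[sample(nrow(d), replace = TRUE), ]   # resample students with replacement
  for (k in seq_len(max_k)) {
    m <- mix(response = list(entered_x ~ 1, log_n_interactions ~ 1, log_response_time ~ 1),
             data = d_boot, nstates = k,
             family = list(binomial(), gaussian(), gaussian()))
    bic_mat[b, k] <- tryCatch(BIC(fit(m)), error = function(e) NA_real_)
  }
}

colMeans(bic_mat, na.rm = TRUE)      # mean BIC per number of classes (elbow plot input)
apply(bic_mat, 2, sd, na.rm = TRUE)  # SD of BIC: robustness to variations in the data
```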
Next, each of these models was fitted on the original dataset 100 times using different sets of starting values, to prevent the estimation of the parameters from landing on a local maximum. The expectation-maximization algorithm (Dempster et al., 1977) that was used to estimate the parameters was allowed 100 iterations each time to converge to a solution. A minimal example of the code used for the analysis is included in Appendix A. To choose a final model from the at most ten remaining models (one for each number of classes that led to a model with acceptable model fit), descriptive statistics and visualizations of the behavior of students in the different classes on each of the variables in the Variable construction and conjectured relationships to solution strategy section, as well as the students’ scores, were inspected to describe the type of student that was typical in each of the classes. As a rule, we opted for a model with fewer classes (a simpler model) unless adding a class would lead to a solution with more meaningfully distinguishable and describable groups of students (Spurk et al., 2020). The descriptive statistics and visualizations for the selected model are included in the Results section. To ensure that the chosen model fitted the data well, we visually compared the model parameters for the classes with the observed values for students in those classes. For the variables that were included in the selected model, the estimated model parameters are included in the visualizations to provide the reader with a practical sense of the fit of the model. To answer the second research question, two-sided t-tests with a Bonferroni adjustment were used to determine the statistical significance of the differences in student ability between the classes of students.
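For the ability comparisons in the second research question, a minimal sketch using base R is given below; the columns theta (WLE ability estimate) and class (assigned latent class) are hypothetical, and pairwise.t.test performs two-sided tests by default.

```r
# Pairwise two-sided t-tests on estimated ability between the latent classes,
# with Bonferroni-adjusted p-values (pool.sd = FALSE gives separate-variance
# tests for each pair of classes).
pairwise.t.test(d$theta, d$class,
                p.adjust.method = "bonferroni",
                pool.sd = FALSE)
```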
Results
Exploring student solution strategies
Model selection
To identify the different solution strategies students used while solving the digital mathematics item, we performed an exploratory cluster analysis. We first inspected the correlations between the variables, which can be seen in Fig. 2, to determine which variables should be included in the model. Due to their very high correlations with other variables, the following four variables were excluded from the model: whether the student answered “-2/3”, the number of unique values entered into the computational input field, the number of values entered into the computational input field, and whether the student answered “3”. The remainder of the variables described in the Variable construction and conjectured relationships to solution strategy section were included in the finite mixture models.
[See PDF for image]
Fig. 2
Correlations between variables
For each number of classes from one to ten, models were fit on 100 bootstrap samples of the data. The means and standard deviations of the BIC fit measures over these bootstraps are shown as an elbow plot in Fig. 3. The standard deviations of the BIC fit measures over the bootstraps were very small for each number of classes, which indicates that the models are robust to variations in the data. Furthermore, it can be seen that a model with two classes constituted a large improvement in BIC compared to a model with one class. Each model with more than two classes improved the BIC further, though more marginally.
[See PDF for image]
Fig. 3
Means and standard deviations of BIC fit measures over 100 bootstraps for one to ten latent classes. Note that the y-axis does not start at zero for reasons of legibility
To select a final model, we interpreted the behavior of students in the estimated classes through descriptive statistics and visualizations, starting with a model that estimated two classes, and finally moving up to a model with six classes. The fit measures for the different models that were fitted and interpreted are displayed in Table 2.
Table 2. All fitted finite mixture models with model fit statistics
Number of classes | Log-likelihood | Degrees of freedom | BIC | AIC |
|---|---|---|---|---|
2 | | 23 | 31628 | 31520 |
3 | | 35 | 30977 | 30813 |
4 | | 47 | 30581 | 30361 |
5 | | 59 | 30216 | 29939 |
6 | | 71 | 29988 | 29655 |
The model with two estimated classes seemed to distinguish students based on the level of engagement with the item in general. Students in one of the classes spent more time on the item and had a higher performance than students in the other class. Fitting the model with three latent classes constituted an improvement in interpretability over the model with two classes. This model was able to more clearly identify students who engaged with the item very little and scored very badly. The other two classes were more difficult to interpret and seemed to contain students that used a mix of solution strategies, with one class spending more time on the item and the other interacting a bit more with the environment.
Estimating a model with four latent classes led to an improvement in interpretability of the classes over a model with three classes. Not only was the model again able to identify a group of students that engaged very little and were usually unsuccessful on the item, the other three classes also showed more internal consistency and interpretable behavior. Two of the classes showed behavior consistent with the expected behavior in the hypothesized algebraic and trial and error solution strategies. The third contained students that spent a reasonable amount of time on the item, but interacted little with the environment and often did not succeed. Reasons for this type of behavior could be that the student did not know how to approach the item, was distracted, or was unsuccessful in solving the item outside of the environment (perhaps algebraically). A model with five estimated classes was more interpretable still: it yielded similar classes of students that were found by the model with four classes, but with increased consistency. It also identified a class of students who spent enough time and effort on the item to enter “3” as the first correct answer but who did not put in much effort to find the second answer. The model with six estimated latent classes did not result in a more interpretable solution than the model with five classes, as it yielded two classes of students with rather similar behavior. Therefore, the final selected model was the model with five estimated latent classes.
Selected model
The selected model estimated five latent classes and included seven variables: whether the student entered the value “x” into the computational input field, whether the student entered the value “-2/3” into the computational input field, the number of interactions, the longest time without interaction, the time before interacting, the student’s response time on the item, and whether the student entered the value “3” into the computational input field. The classes contained 178, 101, 228, 177, and 118 students, respectively. Descriptive statistics on all variables are listed in Table 3 for all students in the data as well as the students in the estimated classes. The high similarity between the observed and estimated values in Figs. 5, 6, and 7 shows that the practical fit of the selected model is adequate.
Table 3. Descriptive statistics on all variables for all students in the data as well as in the estimated classes
Variable name | Statistic | All students | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|---|---|
Entered the value “x” | Frequency “Yes” (%) | 74 (9.2) | 2 (1.1) | 18 (17.8) | 7 (3.1) | 46 (26.0) | 1 (0.8) |
Answered “-2/3” | Frequency “Yes” (%) | 32 (4.0) | 4 (2.2) | 14 (13.9) | 3 (1.3) | 10 (5.6) | 1 (0.8) |
Number of interactions | Median (IQR) | 12.0 (16.8) | 5.0 (7.0) | 21.0 (27.0) | 13.0 (8.0) | 26.0 (12.0) | 3.0 (2.0) |
Longest time without interaction | Median (IQR) | 32.4 (31.1) | 53.5 (29.8) | 120.7 (69.5) | 22.3 (8.0) | 33.7 (14.0) | 19.2 (16.8) |
Time before interacting | Median (IQR) | 24.7 (21.6) | 42.6 (29.0) | 64.4 (91.4) | 18.6 (9.8) | 27.2 (14.8) | 15.5 (14.3) |
Entered the value “-2/3” | Frequency “Yes” (%) | 19 (2.4) | 1 (0.6) | 5 (5.0) | 3 (1.3) | 10 (5.6) | 0 (0.0) |
Number of unique values entered | Median (IQR) | 8.0 (13.0) | 3.0 (6.0) | 14.0 (19.0) | 9.0 (6.0) | 18.0 (9.0) | 1.0 (1.0) |
Number of values entered | Median (IQR) | 9.0 (16.0) | 3.0 (7.0) | 18.0 (25.0) | 11.0 (9.0) | 24.0 (11.0) | 1.0 (1.0) |
Answered “3” | Frequency “Yes” (%) | 603 (75.2) | 112 (62.9) | 86 (85.1) | 208 (91.2) | 168 (94.9) | 29 (24.6) |
Response time | Median (IQR) | 118.5 (100.6) | 115.0 (51.2) | 286.1 (136.7) | 95.2 (37.2) | 186.4 (76.7) | 39.6 (28.6) |
Entered the value “3” | Frequency “Yes” (%) | 498 (62.1) | 77 (43.3) | 66 (65.3) | 182 (79.8) | 159 (89.8) | 14 (11.9) |
Score | Frequency succeeded (%) | 604 (75.3) | 112 (62.9) | 87 (86.1) | 208 (91.2) | 168 (94.9) | 29 (24.6) |
Theta | Mean (SD) | 0.08 (1.02) | 0.09 (1.02) | 0.39 (1.13) | 0.19 (0.96) | 0.25 (0.94) | (0.80) |
[See PDF for image]
Fig. 4
Characteristics of students in different classes with regard to score-related variables
As can be seen in the top-left corner of Fig. 4, the vast majority of students in classes two, three, and four successfully completed the item, with students in class four succeeding most often. A smaller percentage, though still a majority, of students in class one successfully completed the item, versus only about a quarter of students in class five. The chart in the top-right corner of the same figure shows a very similar pattern, because students who successfully completed the item almost always answered “3”, which was the easiest to find of the two possible correct answers. Finding the much more difficult correct answer “-2/3” was rare: only 32 (4.0%) of the students did so. All students who answered “-2/3”, with the exception of one student in class two, also answered “3”. Class two housed the largest percentage of students who answered “-2/3”, followed by class four, one, three, and five, respectively.
[See PDF for image]
Fig. 5
Characteristics of students in different classes with regard to value-related variables
The charts in Fig. 5 show how often specific values were typed into the input field of the calculation program (see Fig. 1) by students in the five classes. A relatively large proportion of students in classes four and two entered the value “x” into the computational input field. Very few students in classes three, one, and five did so. Most of the students in classes two, three, and four tried the value “3”, versus a little over half of the students in class one, and much fewer still in class five. Very few students (19, or 2.4%) tried the value “-2/3”, and most of these were assigned to class four or class two.
[See PDF for image]
Fig. 6
Distribution of action-related variables for students in different classes
The graphs in Fig. 6 show how often students in the five classes interacted with the item in the digital environment, how many unique values they tried in the input field of the calculation program, and how many interactions they had with this field in general. Students in class five interacted very little with the item, showing both the lowest median and interquartile range of all classes. Following in order of median were students in class one, three, two, and finally four. Students in class four had the highest median, and students in class two showed the most variation in number of interactions. Students with an exceptionally high amount of interactions were most often in class two, and sometimes in class four. The distributions of students for the other two variables showed a very similar pattern, which follows logically from the very high correlations between these three variables.
[See PDF for image]
Fig. 7
Distribution of time-related variables for students in different classes
The distributions of time-related variables for students in the five classes can be seen in Fig. 7. Class two had a larger interquartile range than the other classes on all three time-related variables (response time, time before interaction with the item, and longest interval with no interaction), indicating that these students displayed more variation in time-related behavior than students in the other classes. Class two had the highest median on all time-related variables, class five the lowest, and class three the second lowest. Students in class four often had a higher response time than students in class one, but generally waited less long before interacting with the item, and their longest intervals in which they did not interact with the system were generally shorter than those of students in class one.
What follows is a description of student behavior per class. Class one contained students who interacted very little with the environment, but spent a reasonable amount of time on it. A large part of this time was often spent without any interaction with the environment. A small majority of students in this class successfully completed the item by finding the correct answer “3”, but they rarely found the correct answer “-2/3”. Students in class two spent a lot of time on the item and spent the most time away from the digital environment. They interacted with the environment quite a bit, yet not the most of all classes. Students in class two were most often able to find the more difficult correct answer “-2/3”. Most students in this class (although fewer than in classes three and four) successfully completed the item. Students in class three spent little time away from the digital environment, started interacting with the environment very quickly, and very often successfully completed the item. However, very few of the students in this class found both correct answers to the item. They were unexceptional in the number of (unique) interactions they performed (not very many nor very few), as well as in the time they spent in the digital environment. Class four contained the students who interacted the most with the item. Almost all of them completed the item successfully, and some of them continued to find the second answer. They spent a long time on the item, although notably, they did not spend much of that time away from the environment, nor did they wait long before interacting with the environment. Furthermore, this class contained the largest percentages of students to try the specific values of interest (“x”, “3”, and “-2/3”) in the input field of the calculation program. Class five contained students who interacted little with the item, spent little time on the item, and of whom most did not succeed on the item.
Relationship between mathematical ability and solution strategy
[See PDF for image]
Fig. 8
Distribution of estimated ability for students in different classes
The identified classes (found in the Exploring student solution strategies section) differed in their mean estimated ability, which means that more proficient students engaged with the item in a different way than less proficient students. The average estimated ability was lowest for students in class five. The differences between the mean estimated ability of students in class five and those of the students in the other four classes were statistically significant after Bonferroni adjustment, with large effect sizes for each comparison. The mean ability of students in class one was also significantly lower than that of students in class two, with a small effect size. None of the other differences in mean estimated ability were statistically significant. All means and standard deviations of the estimated abilities can be found in Table 3, and their distributions are shown in Fig. 8.
Discussion
With regard to the first research question, we found five distinct classes of students based on their in-assessment behavior. Class one consisted of students who spent quite some time away from the environment, which for a small majority of the students led to success on the item, but they rarely found both correct answers. These students may have tried (unsuccessfully) to solve the item using pen and paper without having seen the expression resulting from inserting “x” into the computational input field. Other reasons for their time spent away from the environment may have been that they did not know how to approach solving the item, or that they were simply distracted. We will refer to this class as absent. The behavior of the students in class two could be consistent with an algebraic strategy on (and a structural approach to) the item, in which the student takes time away from the digital environment to work on the problem using pen and paper. We will refer to this class as algebraic. The defining characteristic of students in class three is that they did not spend time and effort on finding the second correct answer to the question. They possibly did not want to try to find another correct answer, or perhaps did not realize that it was possible to find another correct answer. Their in-assessment behavior showed elements of trial and error as well as simple reasoning. We will refer to this class as pragmatic. Students in class four showed behavior consistent with the numerical trial and error strategy described earlier, which may indicate that they took an operational approach to solving the item. We will refer to this class as trial and error. Contrary to expectations, students in this class entered the value “x” more often into the computational input field than students in class two (algebraic approach). A possible explanation for this is the presence of a button for “x” in the digital interface, which invites students using a trial and error approach to see what happens when it is used. Furthermore, not all students who wanted to use an algebraic approach may have realized it was possible for the program to output an algebraic expression. Students in class five engaged very little with the assessment environment. Possible reasons for this behavior could be a lack of motivation or that the item was too difficult for the student. We will refer to this class as disengaged.
With regard to the second research question, students in the disengaged class (five) had a significantly lower mean estimated ability than students in the other four classes. This corresponds with their behavior, in the sense that students in this class scored fewer points and interacted less with the item. It is possible that these students were generally not very motivated to spend time and effort on the assessment, and that their lack of motivation biased the estimates of their abilities. It is also possible that these students were indeed less mathematically able than the other students, in which case the item that was analyzed may have been too difficult for them. The estimated abilities of students in the algebraic class (two) and the trial and error class (four) did not differ significantly from each other. This is a surprising finding, because we would expect students who are capable of using an algebraic approach to be more advanced in their understanding and hence to have a higher estimated ability. A possible explanation could be that (some of) the students who did not use an algebraic approach were capable of doing so but opted for a different approach, or that some students who used an algebraic approach had not yet fully mastered it. The small but statistically significant difference between the estimated abilities of students in the absent class (one) and the algebraic class (two) may support the possibility stated earlier that students in class one attempted to solve the item algebraically, but were not able to. Lastly, it is interesting to note that students in the pragmatic class (three) achieved good scores on the item with relatively minimal investments in terms of interactions and time. The estimated abilities of these students did not differ statistically from those of the students in the algebraic class (two) or those in the trial and error class (four), which suggests that these students may be displaying an ability to manage their time and effort effectively during assessment.
The theory-driven analysis approach in this study has advantages as well as disadvantages. An obvious advantage is that any results can more easily be placed and interpreted in context, and have greater didactic relevance. A potential pitfall is that it is possible to miss relevant patterns in the data that were not explicitly searched for. Another disadvantage is that theory-driven analyses are not easily generalized to other test items, since a (partially) new set of variables needs to be developed for the constructs that are relevant to the new item. An analysis such as the one presented in this study is informative due to the richness of the log data from such a technology-enhanced item, but it is also very time-consuming. Using sets of items with an identical structure but different values can improve the speed and value of the analyses. A limitation of the current study is that no sequence-based variables or sequence-based methods were used in the analysis. Sequence-based analysis of student log data can be performed either in a theory-driven or data-driven manner, and both have offered promising results in the field of complex problem-solving.
A valuable direction of further research would be to validate the findings of the current study (the importance of which was highlighted by Goldhammer et al. in 2021) by finding out whether the results of the current study generalize, for example to different items, tests, contexts, and different ages of students. Various strategies could be used to contribute evidence to the validation, such as using multiple sets of student data on the same test item, using different clustering methods, or asking teachers to classify students into groups based on their perceived solution strategy. Other sources of data, such as think-aloud protocols, may also be considered. Combining different sources of data may lead to a more complete understanding of student processes. Eye tracking data has been of particular interest in this area (Zhu & Feng, 2015; Maddox et al., 2018).
Conclusion
Students take many tests and exams during their school career, but the feedback they receive about their test performance is usually based only on an analysis of the correctness of the item responses. With the increase in digital assessment, not only the item responses but also other data have become available for analysis. The time spent on solving an item is a well-known example of log data that can be used to enrich feedback, but is—when used at all—often only taken into account to improve ability estimates (e.g. Van der Linden et al., 2010). The purpose of this paper was to assess whether we can use log data in order to extend performance-related feedback with information related to the applied solution strategy. Specifically, we aimed first to make didactically meaningful inferences about students’ solution strategies based on their actions in a digital mathematics assessment and second to investigate whether there is a relationship between students’ mathematical ability and their solution strategy.
We have found that it is possible to distinguish several classes of student behavior that seem to correspond with the use of different solution strategies. Furthermore, we were able to identify seemingly disengaged and absent classes of students, which is a useful finding, since such students in particular may be in need of further instruction or other didactic intervention. Identifying these students, as well as groups of students who seem to adhere to a specific solution strategy may, on an individual student level, provide a basis for feedback and further instruction by their teacher. The ability to detect such strategies in real time would open the door to providing instantaneous automated feedback on this aspect during formative assessment in online learning systems. At the level of a class or a teacher, the classification of students into these groups may be used to inform teaching. However, feedback about the application of an operational (viewing a mathematical concept as a process with input and output) versus structural (viewing a mathematical concept as an object with its own properties) solution strategy (Sfard, 1991) is only useful if the information is understood by both students and teachers and the importance of this perspective is acknowledged.
Log data can also reveal points of improvement in the design of the item or the test. Through studying the log data for the item Product Equation, it became apparent that very few students found both correct answers. For some students, it may have been the case that they did not realize the question had several possible correct answers. For others, the reason may have been that the item asks them “which number” (singular) as opposed to “which number(s)” (potential plural). Such design choices can influence the behavior of students and thus the log data they leave behind, and log data can in turn help identify improvements to the item design.
In order to provide students and teachers with meaningful information on the applied solution strategy based on log data, it is necessary to follow a holistic approach, taking into consideration various aspects related to item construction, the assessment platform, data storage, and analysis. Item developers and content experts need to think about which response behavior they wish to elicit from students, and how this behavior should be recorded in the assessment environment. This requires from item developers both a theoretical understanding of didactics and practical knowledge of the possibilities of digital items and assessment platforms. IT architects have to consider which log data should be stored and how they should be stored. It can often be very cumbersome and time-consuming to extract data from assessment platforms into formats that are feasible for analysis. Research on which log data to store in what format is important, and strides are currently being made in this area (e.g. Kröhne and Goldhammer, 2018). Most important, however, is a combined and interdisciplinary effort to fully benefit from the log data and what it can reveal about student learning processes.
Acknowledgements
The authors would like to thank prof. Dr. Bryan Maddox and prof. Dr. Ulf Kröhne for their intellectual contributions to this study. The authors would like to thank the SCRIPT of the Ministry of National Education, Children and Youth of Luxembourg and Vretta Inc. for supporting the technology-enhanced items’ development.
Author Contributions
FS and PD contributed to the design of the theoretical framework. EdS wrote the original draft of the manuscript. EdS, RF, SK, and PD reviewed and edited the manuscript. EdS, RF, SK, RdS, and BV contributed to the methodology and statistical analyses. All authors have read and approved the final manuscript.
Funding
This research was publicly funded.
Data Availability
Availability of data and materials is contingent on specific agreements between the ministry of education in France and interested research institutions.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
All authors have approved the manuscript and agree with submission to Large-Scale Assessments in Education.
Conflict of interest
The authors declare that they have no conflict of interest.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
Albert, D; Steinberg, L. Age differences in strategic planning as indexed by the Tower of London. Child Development; 2011; 82,
Araneda, S; Lee, D; Lewis, J; Sireci, SG; Moon, JA; Lehman, B; Keehner, M. Exploring relationships among test takers’ behaviors and performance using response process data. Education Sciences; 2022; 12,
Bolsinova, M; Tijmstra, J. Improving precision of ability estimation: Getting more from response times. British Journal of Mathematical and Statistical Psychology; 2017; 71,
Cetintas, S; Si, L; Xin, YP; Hord, C. Automatic detection of off-task behaviors in intelligent tutoring systems with machine learning techniques. IEEE Transactions on Learning Technologies; 2009; 3,
Cetintas, S., Si, L., Xin, Y.P., & Hord, C. (2009b). Predicting correctness of problem solving from low-level log data in intelligent tutoring systems. In Proceedings of the 2nd International Conference on Educational Data Mining.
Chang, C-J; Chang, M-H; Chiu, B-C; Liu, C-C; Chiang, S-HF; Wen, C-T; Chen, W. An analysis of student collaborative problem solving activities mediated by collaborative simulations. Computers & Education; 2017; 114, pp. 222-235. [DOI: https://dx.doi.org/10.1016/j.compedu.2017.07.008]
Chen, F; Cui, Y; Chu, M-W. Utilizing game analytics to inform and validate digital game-based assessment with evidence-centered game design: A case study. International Journal of Artificial Intelligence in Education; 2020; 30,
Chen, Y; Li, X; Liu, J; Ying, Z. Statistical analysis of complex problem-solving process data: An event history analysis approach. Frontiers in Psychology; 2019; 10, 486. [DOI: https://dx.doi.org/10.3389/fpsyg.2019.00486]
Cui, Y; Chu, M-W; Chen, F. Analyzing student process data in game-based assessments with Bayesian knowledge tracing and dynamic Bayesian networks. Journal of Educational Data Mining; 2019; 11,
Dempster, AP; Laird, NM; Rubin, DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological); 1977; 39,
Deribo, T; Goldhammer, F; Kröhne, U. Changes in the speed-ability relation through different treatments of rapid guessing. Educational and Psychological Measurement; 2023; 83,
Derr, K; Hübl, R; Ahmed, MZ. Prior knowledge in mathematics and study success in engineering: informational value of learner data collected from a web-based pre-course. European Journal of Engineering Education; 2018; 43,
Efron, B; Tibshirani, RJ. An introduction to the bootstrap; 1994; Chapman and Hall/CRC:
Eichmann, B; Goldhammer, F; Greiff, S; Pucite, L; Naumann, J. The role of planning in complex problem solving. Computers & Education; 2019; 128, pp. 1-12. [DOI: https://dx.doi.org/10.1016/j.compedu.2018.08.004]
Eichmann, B; Greiff, S; Naumann, J; Brandhuber, L; Goldhammer, F. Exploring behavioural patterns during complex problem-solving. Journal of Computer Assisted Learning; 2020; 36,
Faber, JM; Luyten, H; Visscher, AJ. The effects of a digital formative assessment tool on mathematics achievement and student motivation: Results of a randomized experiment. Computers & Education; 2017; 106, pp. 83-96. [DOI: https://dx.doi.org/10.1016/j.compedu.2016.12.001]
Fox, J-P; Marianti, S. Joint modeling of ability and differential speed using responses and response times. Multivariate Behavioral Research; 2016; 51,
Gobert, JD; Kim, YJ; Sao Pedro, MA; Kennedy, M; Betts, CG. Using educational data mining to assess students’ skills at designing and conducting experiments within a complex systems microworld. Thinking Skills and Creativity; 2015; 18, pp. 81-90. [DOI: https://dx.doi.org/10.1016/j.tsc.2015.04.008]
Gobert, JD; Sao Pedro, M; Raziuddin, J; Baker, RS. From log files to assessment metrics: Measuring students’ science inquiry skills using educational data mining. Journal of the Learning Sciences; 2013; 22,
Goldhammer, F; Hahnel, C; Kröhne, U; Zehner, F. From byproduct to design factor: On validating the interpretation of process indicators based on log data. Large-Scale Assessments in Education; 2021; 9,
Goldhammer, F; Naumann, J; Rölke, H; Stelter, A; Tóth, K. Relating product data to process data from computer-based competency assessment. In Leutner, D; Fleischer, J; Grünkorn, J; Klieme, E (Eds.), Competence assessment in education: Research, models and instruments; 2017; Springer International Publishing: pp. 407-425.
Greiff, S; Niepel, C; Scherer, R; Martin, R. Understanding students’ performance in a computer-based assessment of complex problem solving: An analysis of behavioral data from computer-generated log files. Computers in Human Behavior; 2016; 61, pp. 36-46. [DOI: https://dx.doi.org/10.1016/j.chb.2016.02.095]
Han, Z; He, Q; Von Davier, M. Predictive feature generation and selection using process data from PISA interactive problem-solving items: An application of random forests. Frontiers in Psychology; 2019; 10, 2461. [DOI: https://dx.doi.org/10.3389/fpsyg.2019.02461]
He, Q; Borgonovi, F; Paccagnella, M. Using process data to understand adults’ problem-solving behaviour in the Programme for the International Assessment of Adult Competencies (PIAAC): Identifying generalised patterns across multiple tasks with sequence mining. OECD Education Working Papers; 2019; 205, pp. 1-50. [DOI: https://dx.doi.org/10.1787/650918f2-en]
He, Q; Borgonovi, F; Paccagnella, M. Leveraging process data to assess adults’ problem-solving skills: Using sequence mining to identify behavioral patterns across digital tasks. Computers & Education; 2021; 166, [DOI: https://dx.doi.org/10.1016/j.compedu.2021.104170] 104170.
He, Q., Liao, D., & Jiao, H. (2019). Clustering behavioral patterns using process data in PIAAC problem-solving items. In Theoretical and practical advances in computer-based educational measurement (pp. 189–212). Springer.
He, Q., & von Davier, M. (2015). Identifying feature sequences from process data in problem-solving items with n-grams. In Quantitative psychology research (pp. 173–190). Springer.
He, Q., & von Davier, M. (2016). Analyzing process data from problem-solving items with n-grams: Insights from a computer-based large-scale assessment. In Handbook of research on technology tools for real-world skill development (pp. 750–777). IGI Global.
He, Q; Shi, Q; Tighe, EL. Predicting problem-solving proficiency with multiclass hierarchical classification on process data: A machine learning approach. Psychological Test and Assessment Modeling; 2023; 65,
Heldt, M; Massek, C; Drossel, K; Eickelmann, B. The relationship between differences in students’ computer and information literacy and response times: An analysis of IEA-ICILS data. Large-scale Assessments in Education; 2020; 8,
Hrastinski, S; Stenbom, S; Benjaminsson, S; Jansson, M. Identifying and exploring the effects of different types of tutor questions in individual online synchronous tutoring in mathematics. Interactive Learning Environments; 2021; 29,
Jiang, Y; Cayton-Hodges, GA; Oláh, LN; Minchuk, I. Using sequence mining to study students’ calculator use, problem solving, and mathematics achievement in the National Assessment of Educational Progress (NAEP). Computers & Education; 2023; 193, [DOI: https://dx.doi.org/10.1016/j.compedu.2022.104680] 104680.
Jiang, Y; Gong, T; Saldivia, LE; Cayton-Hodges, G; Agard, C. Using process data to understand problem-solving strategies and processes for drag-and-drop items in a large-scale mathematics assessment. Large-scale Assessments in Education; 2021; 9,
Kerr, D; Chung, GK. Identifying key features of student performance in educational video games and simulations through cluster analysis. Journal of Educational Data Mining; 2012; 4,
Kröhne, U; Goldhammer, F. How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. Behaviormetrika; 2018; 45,
Lindner, MA; Greiff, S. Process data in computer-based assessment; 2023; Hogrefe Publishing:
Maddox, B. The uses of process data in large-scale educational assessments. OECD Education Working Papers; 2023; 286, pp. 1-23. [DOI: https://dx.doi.org/10.1787/5d9009ff-en]
Maddox, B; Bayliss, AP; Fleming, P; Engelhardt, PE; Edwards, SG; Borgonovi, F. Observing response processes with eye tracking in international large-scale assessments: Evidence from the OECD PIAAC assessment. European Journal of Psychology of Education; 2018; 33,
Martin, T; Petrick Smith, C; Forsgren, N; Aghababyan, A; Janisiewicz, P; Baker, S. Learning fractions by splitting: Using learning analytics to illuminate the development of mathematical understanding. Journal of the Learning Sciences; 2015; 24,
Mohan, K; Bergner, Y; Halpin, P. Predicting group performance using process data in a collaborative assessment. Technology, Knowledge and Learning; 2020; 25,
Nagy, G; Ulitzsch, E. A multilevel mixture IRT framework for modeling response times as predictors or indicators of response engagement in IRT models. Educational and Psychological Measurement; 2022; 82,
Nagy, G; Ulitzsch, E; Lindner, MA. The role of rapid guessing and test-taking persistence in modelling test-taking engagement. Journal of Computer Assisted Learning; 2022; 39,
OECD. PISA 2022 technical report; 2024; OECD Publishing:
Olsher, S; Chazan, D; Drijvers, P; Sangwin, C; Yerushalmy, M. Digital assessment and the “machine”. In Pepin, B; Gueudet, G; Choppin, J (Eds.), Handbook of digital resources in mathematics education; 2023; Springer International Publishing: pp. 1-27.
Pellegrino, JW; Quellmalz, ES. Perspectives on the integration of technology and assessment. Journal of Research on Technology in Education; 2010; 43,
Philbert, L., Bernigole, V., Ninnin, L.-M., Santos, R.D., le Cam, M., Salles, F., & Rocher, T. (2022). CEDRE rapport technique, mathématiques 2019 (Tech. Rep.). Direction de l’évaluation, de la prospective et de la performance. https://www.education.gouv.fr/media/133181/download
Pokropek, A; Żółtak, T; Muszyński, M. Mouse chase: Detecting careless and unmotivated responders using cursor movements in web-based surveys. European Journal of Psychological Assessment; 2023; 39,
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Reis Costa, D; Bolsinova, M; Tijmstra, J; Andersson, B. Improving the precision of ability estimates using time-on-task variables: Insights from the PISA 2012 computer-based assessment of mathematics. Frontiers in Psychology; 2021; 12, [DOI: https://dx.doi.org/10.3389/fpsyg.2021.579128] 579128.
Salles, F; Dos Santos, R; Keskpaik, S. When didactics meet data science: Process data analysis in large-scale mathematics assessment in France. Large-scale Assessments in Education; 2020; 8, pp. 1-20. [DOI: https://dx.doi.org/10.1186/s40536-020-00085-y]
Schwarz, G. Estimating the dimension of a model. The Annals of Statistics; 1978; 6,
Sfard, A. On the dual nature of mathematical conceptions: Reflections on processes and objects as different sides of the same coin. Educational Studies in Mathematics; 1991; 22,
Shute, V.J., Ventura, M., Bauer, M., et al. (2009). Melding the power of serious games and embedded assessment to monitor and foster learning: Flow and grow. In Serious games (pp. 317–343). Routledge.
Spurk, D; Hirschi, A; Wang, M; Valero, D; Kauffeld, S. Latent profile analysis: A review and “how to” guide of its application within vocational behavior research. Journal of Vocational Behavior; 2020; 120, [DOI: https://dx.doi.org/10.1016/j.jvb.2020.103445] 103445. https://www.sciencedirect.com/science/article/pii/S0001879120300701
Stadler, M; Fischer, F; Greiff, S. Taking a closer look: An exploratory analysis of successful and unsuccessful strategy use in complex problems. Frontiers in Psychology; 2019; 10, 777. [DOI: https://dx.doi.org/10.3389/fpsyg.2019.00777]
Tang, X; Wang, Z; He, Q; Liu, J; Ying, Z. Latent feature extraction for process data via multidimensional scaling. Psychometrika; 2020; 85,
Timmis, S; Broadfoot, P; Sutherland, R; Oldfield, A. Rethinking assessment in a digital age: Opportunities, challenges and risks. British Educational Research Journal; 2016; 42,
Ulitzsch, E; He, Q; Pohl, S. Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks. Journal of Educational and Behavioral Statistics; 2022; 47,
Ulitzsch, E; He, Q; Ulitzsch, V; Molter, H; Nichterlein, A; Niedermeier, R; Pohl, S. Combining clickstream analyses and graph-modeled data clustering for identifying common response processes. Psychometrika; 2021; 86,
Ulitzsch, E; Yildirim-Erbasli, SN; Gorgun, G; Bulut, O. An explanatory mixture IRT model for careless and insufficient effort responding in self-report measures. British Journal of Mathematical and Statistical Psychology; 2022; 75,
Van der Linden, WJ; Klein Entink, RH; Fox, J-P. IRT parameter estimation with response times as collateral information. Applied Psychological Measurement; 2010; 34,
Visser, I., & Speekenbrink, M. (2010). depmixS4: An R package for hidden Markov models. Journal of Statistical Software, 36(7), 1–21. https://www.jstatsoft.org/v36/i07/
Von Davier, M., Fishbein, B., & Kennedy, A. (Eds.). (2024). TIMSS 2023 technical report (methods and procedures). https://timss2023.org/methods.
Wang, FH. Interpreting log data through the lens of learning design: Second-order predictors and their relations with learning outcomes in flipped classrooms. Computers & Education; 2021; 168, [DOI: https://dx.doi.org/10.1016/j.compedu.2021.104209] 104209.
Westera, W; Nadolski, R; Hummel, H. Serious gaming analytics: What students log files tell us about gaming and learning. International Journal of Serious Games; 2014; 1,
Xiao, Y; He, Q; Veldkamp, B; Liu, H. Exploring latent states of problem-solving competence using hidden Markov model on process data. Journal of Computer Assisted Learning; 2021; 37,
Zhang, M; Andersson, B. Identifying problem-solving solution patterns using network analysis of operation sequences and response times. Educational Assessment; 2023; 28,
Zhu, M., & Feng, G. (2015). An exploratory study using social network analysis to model eye movements in mathematics problem solving. Proceedings of the fifth international conference on learning analytics and knowledge (pp. 383–387).
© The Author(s) 2025. This work is published under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).