Purpose
Educational data mining (EDM) discovers significant patterns from educational data and thus can help understand the relations between learners and their educational settings. However, most previous data mining techniques focus on prediction of learning performance of learners without integrating learning patterns identification techniques.
Design/methodology/approach
This study proposes a new framework for identifying learning patterns and predicting learning performance. Two modules, the learning patterns identification module and the deep learning prediction models (DNN), are integrated into this framework to identify differences in learning performance and to predict learning performance from the profiles of students.
Findings
Experimental results from survey data indicate that the proposed learning patterns identification module could facilitate identifying valuable difference (change) patterns from students’ profiles. The proposed learning performance prediction module, which adopts DNN, also performs better than traditional machine learning techniques in prediction performance metrics.
Originality/value
To the best of our knowledge, the framework is the only educational system in the literature for identifying learning patterns and predicting learning performance.
1. Introduction
Educational data mining (EDM) is an emerging research area focused on discovering patterns from educational data to help understand the relations between learners and educational settings. However, most educational data mining techniques focus on predicting learning performance based on learners’ profiles, rather than identifying their characteristics to evaluate their learning performance. In particular, the characteristics of learners with low learning performance are necessary to initiate early intervention for learners who need teaching assistance.
This study considers association-based classification patterns, which identify the associations between cause and effect to establish students’ learning performance profiles. Furthermore, we propose a measure (OddsRatio) to determine valuable patterns from each cluster of instances. For example, (Pattern X1) → (Bad) means that students belonging to the learning outcome (Bad) group have the characteristics (Pattern X1). We also identify another pattern: (Pattern X2) → (Good). Identifying the difference (ΔX) between patterns (X1 and X2) that leads to different results is the main purpose of this study. Assume that we have two patterns, Pattern X1 = {(Paid = no) → (Bad); support = 0.92; count = 317; OddsRatio = 1.08} and Pattern X2 = {(Higher = yes, Paid = no) → (Good); support = 0.94; count = 289; OddsRatio = 1.13}; then the difference (ΔX) between the two patterns (X2 and X1) is {(Higher = yes)}.
The above example indicates that the difference pattern (ΔX), {(Higher = yes)}, is a change pattern for instances (students) with pattern X1 = {(Paid = no)} in cluster learning performance (Bad) who move to cluster learning performance (Good). The pattern {(Higher = yes)} thus reveals the associated cause of the effect, namely the move from cluster learning performance (Bad) to cluster learning performance (Good). However, to our knowledge, no studies in the educational data mining field have addressed the important issue of identifying change patterns, that is, differences (ΔX), in association-based classification patterns.
It is important for educational institutions to have approximate prior knowledge of students in order to predict their future academic performance. To address these problems, we propose a framework for identifying learning patterns and predicting learning performance. First, the learning patterns identification module is used to discover difference patterns that could identify differences in learning performance among different clusters of students. Second, deep learning prediction models (DNN) are employed to construct a model that predicts students’ learning performance.
The rest of this paper is organized as follows. Section 2 reviews related work. Methodology is given in Section 3. The experimental results are illustrated in Section 4. Conclusions and future work are discussed in Section 5.
2. Related work
2.1 Frequent itemsets mining
Frequent pattern mining reveals intrinsic and important properties of datasets and is the foundation of association rule mining. Mining frequent itemsets is a crucial step in association rule mining (Agrawal et al., 1993). Most frequent itemset mining algorithms are improvements on, or derivatives of, Apriori (Agrawal and Srikant, 1994) and FP-growth (Han et al., 2000). More efficient methods for mining frequent itemsets have also been proposed, such as H-mine (Pei et al., 2001) and Index-BitTableFI (Song et al., 2008). However, most of these algorithms focus on improving the efficiency of the frequent itemset mining process, rather than on mining specific itemsets, such as specific later-marketed items. We provide an overview of the literature on frequent itemsets mining in Table 1.
2.2 Educational data mining
Data mining or knowledge discovery in databases (KDD) has been applied to some central e-learning issues, such as the assessment of students’ learning performance and the evaluation of learning materials and web-based courses. KDD can also be used to learn the model for the learning process (Hämäläinen et al., 2004) and student modeling (Tang and McCalla, 2002), to evaluate and improve e-learning systems (Zaïane and Luo, 2001) and to discover useful learning information from learning portfolios (Hwang et al., 2004).
The data mining techniques applied in these contexts enable course adaptation and learning recommendations based on the students’ learning behavior. These techniques also enable feedback to teachers and students of e-learning courses and help identify typical learning behavior (Castro et al., 2007; Baker and Yacef, 2009).
The increasing interest in data mining and educational systems has made educational data mining a new and growing research community. Romero and Ventura (2007) surveyed the application of data mining to traditional educational systems, particular web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems.
The goals of EDM are varied, ranging from constructing and improving student models and designing support in digital settings to scientific discovery about learners and learning (Baker, 2010). Areas of application include predictive (decision support), generative (creating new or improved designs for learning) and explanatory (scientific analysis) uses (Pahl, 2004). They also include studies on individual learning using educational software, computer-supported collaborative learning, computer adaptive testing and factors relating to student failure or non-retention in courses (Baker and Yacef, 2009).
EDM is a multidisciplinary field related to several well-established areas of research, including e-learning, adaptive hypermedia, intelligent tutoring systems and data mining (Nachmias, 2011). Nachmias (2011) considered that most of the web mining techniques applied to educational systems use one of three types of analysis: (1) clustering, classification and outlier detection; (2) association rule mining and sequential pattern mining; and (3) text mining. Angeli et al. (2017) illustrated how data mining (association rules mining) can be used to advance educational software evaluation practices in the field of educational technology. Rodrigues et al. (2018) reviewed EDM research on the teaching and learning process from the pedagogical perspective. Table 2 provides an overview of EDM research.
The overview of EDM (Table 2) shows where itemset mining has been applied in EDM. Buldu and Üçgün (2010) discovered rules that reveal the relations between the courses that students failed. Santos and Boticario (2015) identified indicators that could predict course success. The differences between our study and prior studies, together with their applications, are shown in Table 3.
2.3 Deep learning in education data
Deep learning (DL) enables computers to perform complex calculations by relying on simpler calculations to optimize computer efficiency. Kabashima et al. (2018) used DNN-based scoring techniques to examine the tasks of (1) predicting a language learner’s oral proficiency and (2) predicting comprehensibility of his/her pronunciation based on native listeners’ responsive shadowing. Experiments show that their proposed automatic rating module could be introduced to language education to function as another human rater.
When applied directly to the huge datasets obtained from previous student performance, traditional machine learning does not work well because it does not consider the nature of data behavior. In DL, features are extracted automatically from the given data (Hassan et al., 2020). DL can adapt to any improvement in the hidden layers during training, and the training proceeds through a backpropagation algorithm. The DNN model performs better with complicated data and nonlinear functions (Lin et al., 2020).
Predicting students’ performance is very important for higher education and is a natural application of deep learning to educational data. Li and Liu (2021) used deep neural networks for prediction by extracting informative data as features with corresponding weights. Multiple updated hidden layers are used to design neural networks automatically. The results they report indicate that their proposed system is efficient and produces accurate predictions. Therefore, this study uses deep learning methods (such as DNN) as classifiers to predict the learning performance labels of students. The models that are most suitable for predicting the learning performance labels of students can then be evaluated.
3. Methodology
This study proposes a framework for identifying learning patterns and predicting learning performance, as shown in Figure 1. There are two modules in the proposed framework: the learning patterns identification module and the learning performance prediction module. The two modules are described in the following sections.
3.1 Problem definitions of learning patterns
This section defines the problem of how to identify changes (difference patterns) among different clusters of instances for cause-and-effect relationship.
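The support of an itemset Xc, formed by an itemset X together with a class label c, is defined as
\[ \mathrm{sup}(Xc) = \frac{|X \cup c|}{|D|}, \]
where |D| denotes the number of transactions in D and |X ∪ c| denotes the number of transactions containing X ∪ c in D.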
Given user-specified support threshold σsup and confidence threshold σconf, an itemset Xc is frequent if sup(Xc) is no smaller than σsup. In addition, an association rule X⇒c is identified if Confidence(X ⇒ c) is no smaller than σconf.
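As a rough illustration of how support and confidence can be computed, the following Python sketch evaluates both on a small set of made-up transactions. The records, item names and helper functions (`count`, `support`, `confidence`) are hypothetical, and confidence is assumed to follow the standard definition count(X ∪ c) / count(X), which the text does not spell out.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions: each student record is a set of
# "attribute=value" items plus one class-label item.
transactions: List[FrozenSet[str]] = [
    frozenset({"Paid=no", "Higher=yes", "G1=Good"}),
    frozenset({"Paid=no", "Higher=yes", "G1=Good"}),
    frozenset({"Paid=no", "Higher=no", "G1=Bad"}),
    frozenset({"Paid=yes", "Higher=yes", "G1=Good"}),
    frozenset({"Paid=no", "Higher=no", "G1=Bad"}),
]


def count(itemset: Iterable[str], data: List[FrozenSet[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in data if wanted <= t)


def support(itemset: Iterable[str], data: List[FrozenSet[str]]) -> float:
    """Fraction of transactions in `data` that contain `itemset`."""
    return count(itemset, data) / len(data)


def confidence(antecedent: Iterable[str], label: str,
               data: List[FrozenSet[str]]) -> float:
    """Assumed standard definition: count(X and c) / count(X)."""
    x = frozenset(antecedent)
    n_x = count(x, data)
    return count(x | {label}, data) / n_x if n_x else 0.0


x = frozenset({"Paid=no", "Higher=yes"})
sup_xc = support(x | {"G1=Good"}, transactions)
conf_rule = confidence(x, "G1=Good", transactions)

SIGMA_SUP, SIGMA_CONF = 0.3, 0.8  # hypothetical thresholds
is_rule = sup_xc >= SIGMA_SUP and conf_rule >= SIGMA_CONF
```

With these toy records, the itemset and its label occur together in two of five transactions, so the rule passes both hypothetical thresholds.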
In this study, we apply an index called Odds Ratio (OR), based on the concept of “relative risk” (Li et al., 2005), to indicate whether a pattern is more frequent in one cluster than in another cluster.
In addition, assume a pattern {(Higher = yes, Paid = no); support = 0.94; count = 289; learning performance = Good; dataset cluster label = Good} and another pattern {(Higher = yes, Paid = no); support = 0.74; count = 255; learning performance = Good; dataset cluster label = Bad}. Its Odds Ratio is then OR((Higher = yes, Paid = no)Good, (Higher = yes, Paid = no)Bad) = 289/255 = 1.13.
Furthermore, the pattern {(Higher = yes, Paid = no); support = 0.94; count = 289; learning performance = Good; dataset cluster label = Good} is a high Odds-Ratio pattern in cluster (learning performance = Good), because OR((Higher = yes, Paid = no)Good, (Higher = yes, Paid = no)Bad) = 289/255 = 1.13 > σOR (1.05).
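The arithmetic in this example can be checked with a few lines of Python. The sketch assumes, based only on the worked example, that the index is the ratio of a pattern's count in the target cluster to its count in the other cluster; the function name `odds_ratio` is hypothetical.

```python
SIGMA_OR = 1.05  # minimum Odds-Ratio threshold used in the example


def odds_ratio(count_in_target: int, count_in_other: int) -> float:
    """Ratio of pattern counts between two clusters (assumed definition)."""
    if count_in_other == 0:
        return float("inf")  # pattern never occurs in the other cluster
    return count_in_target / count_in_other


# Counts of (Higher = yes, Paid = no) quoted in the text.
or_good_bad = odds_ratio(289, 255)
is_high = or_good_bad > SIGMA_OR
print(f"OR = {or_good_bad:.2f}, high Odds-Ratio pattern: {is_high}")
```

Note that this count ratio differs from the classical odds ratio of a 2×2 contingency table; the sketch follows the usage in the example rather than the textbook definition.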
3.2 Framework for identifying learning patterns
To further explain the proposed procedures to identify characteristics (changes) for discovering learning patterns, the proposed framework of identifying learning patterns is illustrated in Figure 2. The procedures can be divided into four phases: (1) instance classification; (2) frequent itemset discovery; (3) pattern identification; and (4) pattern evaluation.
This study discovers difference (change) patterns that could identify differences in learning performance among different clusters of students. Given two clusters (Bad and Good) of student learning performance, we first identify base patterns with high Odds-Ratio values in cluster (Bad) and comparison patterns with high Odds-Ratio values in cluster (Good). Then high Odds-Ratio difference patterns can be identified.
The operating steps of the proposed framework of identifying learning patterns are as follows.
Step 1: Instances classification
The instance classification phase classifies instances (records) according to their labels. Taking learning performance of students for example, we can classify instances (records) into two class labels (Good and Bad), according to the students’ quiz scores. If the scores are above average, their learning performance label is classified as “Good”, otherwise “Bad”. Finally, the student instances are divided into two clusters (G: Good and B: Bad), as shown in Table 4.
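A minimal sketch of this classification step is given below. The student identifiers and quiz scores are made up for illustration; only the rule itself (above average is “Good”, otherwise “Bad”) comes from the text.

```python
from statistics import mean

# Hypothetical quiz scores keyed by student id.
scores = {"s1": 8, "s2": 15, "s3": 11, "s4": 12, "s5": 9}

avg = mean(scores.values())

# Strictly above the average is "Good"; everything else is "Bad".
labels = {sid: ("Good" if score > avg else "Bad")
          for sid, score in scores.items()}

# Split the instances into the two clusters (G: Good and B: Bad).
clusters = {"Good": [], "Bad": []}
for sid, label in labels.items():
    clusters[label].append(sid)
```

A student whose score equals the average (s3 here) falls into the “Bad” cluster under this rule.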
Step 2: Frequent itemsets discovering.
In the mining phase, we discover frequent itemsets, which are used as base and comparison patterns for each cluster of instances. Given a minimum support threshold (σsup = 0.5), two sets of frequent itemsets are discovered from the student instances belonging to the two clusters (G: Good and B: Bad), as shown in Table 5.
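The per-cluster mining step can be sketched as follows. This is a brute-force enumeration for illustration only, not Apriori or FP-growth, and the records, the function name `frequent_itemsets` and the `max_size` limit are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(records: List[FrozenSet[str]], min_sup: float,
                      max_size: int = 2) -> Dict[FrozenSet[str], float]:
    """Map each itemset of up to `max_size` items to its support,
    keeping only itemsets whose support is at least `min_sup`."""
    items = sorted({item for record in records for item in record})
    n = len(records)
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = sum(1 for record in records if candidate <= record) / n
            if sup >= min_sup:
                result[candidate] = sup
    return result


# Hypothetical records, already split by learning performance label.
good = [frozenset({"Paid=no", "Higher=yes"}),
        frozenset({"Paid=no", "Higher=yes"}),
        frozenset({"Paid=no", "Higher=yes"}),
        frozenset({"Paid=yes", "Higher=yes"})]
bad = [frozenset({"Paid=no", "Higher=no"}),
       frozenset({"Paid=no", "Higher=yes"}),
       frozenset({"Paid=no", "Higher=no"}),
       frozenset({"Paid=yes", "Higher=no"})]

freq_good = frequent_itemsets(good, min_sup=0.5)
freq_bad = frequent_itemsets(bad, min_sup=0.5)
```

Mining each cluster separately means the same itemset can be frequent in one cluster and infrequent in the other, which is what the later Odds-Ratio comparison relies on.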
Step 3: Patterns identifying.
Step 3.1: Determine base patterns.
A base pattern is a frequent itemset. With a minimum support threshold of 0.5, pattern (Paid = no) with support (0.92) is a base pattern in cluster (Bad); pattern (Paid = no) with support (0.96) is a base pattern in cluster (Good). In addition, pattern (Higher = yes) with support (0.81) is a base pattern in cluster (Bad); pattern (Higher = yes) with support (0.99) is a base pattern in cluster (Good).
Step 3.2: Determine high Odds-Ratio base patterns.
The minimum Odds-Ratio threshold (σOR) is set to 1.05. Given a base pattern (Paid = no) with counts (317) in cluster (Bad) and a base pattern (Paid = no) with counts (293) in cluster (Good), we know that Odds-RatioBadGood(Paid = no) = (317/293) = 1.08 > 1.05 (σOR). Therefore, base pattern (Paid = no) with Odds-Ratio value (1.08) is a high Odds-Ratio base pattern in cluster (Bad). That means instances (students) with pattern (Paid = no) are more frequent in cluster (Bad) than in cluster (Good).
Step 3.3: Determine comparison patterns.
A comparison pattern is a frequent itemset. We set minimum support threshold to 0.5. Given a base pattern (Paid = no) in cluster (Bad), we can choose a pattern (Higher = yes, Paid = no) with support (0.94) in cluster (Good) to be a comparison pattern.
Step 3.4: Identify high Odds-Ratio comparison patterns.
We set the minimum Odds-Ratio threshold (σOR) to 1.05. Given a comparison pattern (Higher = yes, Paid = no) with counts (289) in cluster (Good) and a comparison pattern (Higher = yes, Paid = no) with counts (255) in cluster (Bad), we know that Odds-RatioGoodBad(Higher = yes, Paid = no) = (289/255) = 1.13 > 1.05 (σOR). Therefore, comparison pattern (Higher = yes, Paid = no) with Odds-Ratio value (1.13) in cluster (Good) is a high Odds-Ratio comparison pattern. That indicates instances (students) with pattern (Higher = yes, Paid = no) are more frequent in cluster (Good) than in cluster (Bad).
Step 3.5: Patterns evaluating
Given a high Odds-Ratio base pattern (Paid = no) in cluster (Bad) and a high Odds-Ratio comparison pattern (Higher = yes, Paid = no) in cluster (Good), we can identify the difference pattern (Higher = yes) between the comparison pattern (Higher = yes, Paid = no) and the base pattern (Paid = no). Moreover, the difference pattern (Higher = yes) has counts (278) in cluster (Bad) and counts (302) in cluster (Good), so Odds-RatioGoodBad(Higher = yes) = (302/278) = 1.09 > 1.0. This reveals that instances (students) with difference pattern (Higher = yes) are more frequent in cluster (Good) than in cluster (Bad). Because the minimum Odds-Ratio threshold (σOR) is set to 1.05, difference pattern (Higher = yes) with Odds-Ratio value (1.09) is a high Odds-Ratio difference pattern.
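Steps 3.2 to 3.5 can be traced with the counts quoted in the text. The sketch below is illustrative: the dictionary layout and the helper `ratio` are assumptions, and the Odds-Ratio is again taken to be the ratio of counts between the two clusters.

```python
SIGMA_OR = 1.05  # minimum Odds-Ratio threshold

# Pattern counts per cluster, as quoted in Steps 3.2 to 3.5.
counts = {
    "Bad": {
        frozenset({"Paid=no"}): 317,
        frozenset({"Higher=yes", "Paid=no"}): 255,
        frozenset({"Higher=yes"}): 278,
    },
    "Good": {
        frozenset({"Paid=no"}): 293,
        frozenset({"Higher=yes", "Paid=no"}): 289,
        frozenset({"Higher=yes"}): 302,
    },
}


def ratio(pattern: frozenset, target: str, other: str) -> float:
    """Count of `pattern` in the target cluster over its count in the other."""
    return counts[target][pattern] / counts[other][pattern]


base = frozenset({"Paid=no"})                     # base pattern, cluster Bad
comparison = frozenset({"Higher=yes", "Paid=no"})  # comparison pattern, cluster Good
difference = comparison - base                    # items only in the comparison pattern

or_base = ratio(base, "Bad", "Good")
or_comparison = ratio(comparison, "Good", "Bad")
or_difference = ratio(difference, "Good", "Bad")

all_high = all(v > SIGMA_OR for v in (or_base, or_comparison, or_difference))
```

The set difference yields {Higher=yes}, and all three ratios exceed the 1.05 threshold, matching the values 1.08, 1.13 and 1.09 reported in the steps.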
3.3 Development of prediction models for student learning performance
The proposed framework for predicting student learning performance is illustrated in Figure 3. Several classification techniques are used to construct models that predict students’ learning performance: Support-Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT), Random Forests (RF) and Deep Neural Networks (DNN). These classification techniques are briefly described below.
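- Support-Vector Machine (SVM): SVM is an algorithm that uses a nonlinear mapping to transform the original training data into a higher dimension, where it searches for the hyperplane that separates the examples of one class label from those of another. SVM uses a subset of training examples (known as the support vectors) to represent the decision boundary (Joachims, 1998). SVM finds the best separating plane/hyperplane to separate all of the examples of class label (+1) from all of the examples of class label (−1). The learning task in Linear SVM can be formalized as the following constrained optimization problem:
\[ \min_{\mathbf{w}} \frac{\lVert \mathbf{w} \rVert^{2}}{2} \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, 2, \ldots, N, \]
where w and b are the parameters of the decision boundary and (x_i, y_i) is the i-th of N training examples, with class label y_i ∈ {+1, −1}.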
- Multi-Layer Perceptron (MLP): An artificial neural network (ANN) is an abstract computational model of a human brain. The architecture of an artificial neural network is defined by the characteristics of a node and the characteristics of the node’s connectivity in the network (Haykin and Lippmann, 1994). The perceptron is the simplest model in the ANN model family. MLPs can learn powerful non-linear transformations: in fact, with enough hidden units they can represent arbitrarily complex but smooth functions. In a perceptron, each input node is connected via a weighted link to the output node. The output of a perceptron model can be expressed as follows (Tan et al., 2006):
\[ \hat{y} = \operatorname{sign}(w_d x_d + w_{d-1} x_{d-1} + \cdots + w_1 x_1 - t), \]
where w_1, …, w_d are the weights of the input links, x_1, …, x_d are the input attribute values and t is the bias factor.
The sign function acts as an activation function for the output neuron, outputting a value (+1) if its argument is positive and (−1) if its argument is negative. An artificial neural network has a more complex structure than that of a perceptron model. The goal of the MLP learning algorithm is to determine a set of weights (w) that minimizes the total sum of squared errors:
\[ E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} (y_i - \hat{y}_i)^{2}, \]
where (y_i − ŷ_i) is the prediction error for the i-th training example.
The weight update formula based on the gradient descent method can be written as follows:
\[ w_j \leftarrow w_j - \lambda \frac{\partial E(\mathbf{w})}{\partial w_j}, \]
where λ is the learning rate.
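- Decision Tree (DT): Quinlan (1986) proposed C4.5 (a successor of ID3), which became a benchmark to which newer supervised learning algorithms are often compared. The attribute with the maximum gain ratio is selected to be the splitting attribute. The gain ratio is defined as
\[ \mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, \]
where Gain(A) is the information gain obtained by splitting on attribute A and SplitInfo(A) is the entropy of the partition that A induces.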
- Random Forests (RF): Random forests is an ensemble method designed for decision tree classifiers. It combines the predictions made by multiple decision trees, where each tree is generated based on the values of an independent set of random vectors. When the number of trees is sufficiently large, it has been theoretically proven that the upper bound for the generalization error of random forests converges to the following expression (Tan et al., 2006):
\[ \text{Generalization error} \le \frac{\bar{\rho}(1 - s^{2})}{s^{2}}, \]
where \bar{\rho} is the average correlation among the trees and s is a quantity that measures the “strength” of the tree classifiers. The strength of a set of classifiers refers to the average performance of the classifiers, where performance is measured probabilistically in terms of the classifier’s margin:
\[ M(X, Y) = P(\hat{Y}_{\theta} = Y) - \max_{Z \ne Y} P(\hat{Y}_{\theta} = Z), \]
where \hat{Y}_{\theta} is the predicted class of X according to a classifier built from some random vector θ. The higher the margin is, the more likely it is that the classifier correctly predicts a given example X.
- Deep Neural Networks (DNN)
A deep neural network (DNN) can be considered as a conventional multi-layer perceptron (MLP) with many hidden layers (thus deep). The DNN parameters are optimized with back propagation using stochastic gradient descent. DNN, a (L+1)-layer MLP, is used to model the posterior probability of a hidden Markov model (HMM) tied state s given an observation vector o. The first L layers, l = 0 … L−1, are hidden layers that model the posterior probability of hidden nodes h^l given input vectors v^l from the previous layer, while the top layer L is used to compute the posterior probability for all tied states using softmax (Pan et al., 2012):
\[ P^{l}_{h|v}(h^{l} \mid v^{l}) = \prod_{j} \frac{e^{z^{l}_{j}(v^{l}) \cdot h^{l}_{j}}}{e^{z^{l}_{j}(v^{l}) \cdot 1} + e^{z^{l}_{j}(v^{l}) \cdot 0}}, \quad 0 \le l < L, \]
\[ P^{L}_{s|v}(s \mid v^{L}) = \frac{e^{z^{L}_{s}(v^{L})}}{\sum_{s'} e^{z^{L}_{s'}(v^{L})}} = \operatorname{softmax}_{s}(z^{L}(v^{L})), \]
\[ z^{l}(v^{l}) = (W^{l})^{T} v^{l} + \alpha^{l}, \]
where W^l and α^l denote the weight matrix and bias vector for layer l, while h^l_j and z^l_j(v^l) denote the j-th component of the hidden node h^l and its activation, respectively.
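To make the gain ratio criterion mentioned for the decision tree more concrete, the following Python sketch computes it for a categorical attribute. It assumes the common definition (information gain divided by split information); the attribute values, labels and function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[str], labels: Sequence[str]) -> float:
    """Information gain of splitting on `values`, divided by split information."""
    n = len(labels)
    groups: Dict[str, List[str]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Entropy of the partition sizes themselves.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    # An attribute with a single value carries no split information.
    return gain / split_info if split_info > 0 else 0.0


labels = ["Good", "Good", "Bad", "Bad"]
higher = ["yes", "yes", "no", "no"]   # separates the classes perfectly
paid = ["no", "no", "no", "no"]       # constant, so it cannot split

gr_higher = gain_ratio(higher, labels)
gr_paid = gain_ratio(paid, labels)
```

In this toy case the attribute that separates the classes perfectly gets the maximum ratio, while the constant attribute gets zero and would never be chosen as the splitting attribute.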
4. Experimental results
4.1 Data set description
A Portuguese language course dataset was used to evaluate the performance of the proposed approach. The dataset was obtained from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). The Portuguese language course dataset contains 649 instances (records) and has 33 attributes. It was provided by Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez. There are 3 grades in this dataset: G1 (first period grade), G2 (second period grade) and G3 (final period grade). This study uses G1 (first period grade) to represent the learning performance of students.
In data preprocessing phase, we calculate the average of G1 (first period grade) of all students and then use the average score to separate students into two clusters. In the first cluster, the G1 (first period grades) of students are lower than the average, so these students are classified as bad learning performance. In the second cluster, the G1 (first period grades) of students are higher than the average score, so these students are classified as good learning performance. The average of the G1 (first period grade) of students is 11.40. There are 343 students whose G1 (first period grade) is higher than average score (11.40) and 306 students whose G1 (first period grade) is lower than average (11.40).
4.2 Identification of learning patterns
4.2.1 Discovery of frequent itemsets
The minimum support was varied from 0.5 to 0.9. Tables 6 and 7 show the numbers of frequent itemsets for different minimum supports σsup. The minimum support was then set to 0.75. Table 8 shows the frequent itemsets for G1 (first period grade) in the learning performance cluster (G1 = Bad).
4.2.2 Identification of high odds-ratio base patterns
Given frequent itemsets generated in Table 8, we calculate Odds-Ratio values of base patterns in cluster (G1 = Bad) as shown in Table 9. The minimum Odds-Ratio threshold (σOR) is set to 1.05. Patterns (Paid = no, Schoolsup = no) with Odds-Ratio (1.045) and (Higher = yes) with Odds-Ratio (0.921), are not high Odds-Ratio base patterns because their Odds-Ratio values are smaller than Odds-Ratio threshold (1.05). Finally, six high Odds-Ratio base patterns can be identified in cluster (G1 = Bad) as shown in Table 10.
4.2.3 Identification of high odds-ratio comparison patterns
After the high Odds-Ratio base patterns are determined, the corresponding comparison patterns can be identified. Given the minimum support σsup (0.75) and the minimum Odds-Ratio threshold σOR (1.05), high Odds-Ratio comparison patterns are identified, as shown in Table 11.
4.2.4 Identification of high odds-ratio difference patterns
After the high Odds-Ratio base patterns and high Odds-Ratio comparison patterns are determined, we identify the difference patterns by comparing the high Odds-Ratio base patterns with the high Odds-Ratio comparison patterns. Table 12 shows the difference patterns between the high Odds-Ratio base patterns and the high Odds-Ratio comparison patterns shown in Table 11. Given the minimum Odds-Ratio threshold σOR (1.05), we identify high Odds-Ratio difference patterns as shown in Table 13.
4.2.5 Statistical hypothesis testing
This section discusses the statistical tests used for investigating the consistency between the results of finding which factors (base patterns or comparison patterns) perform better for predicting good learning outcome (G1L = G). That is, we explore which patterns (base patterns or comparison patterns) could effectively predict good learning outcome (G1L = G). Therefore, this study hypothesizes the following:
Generally, logistic regression (logit) is well suited for describing and testing hypotheses about relationships between a categorical outcome variable and one or more categorical or continuous predictor variables (Peng et al., 2002). A logistic regression (logit) equation is used to predict the class attribute (“G1L = G”) given the condition of base patterns (such as “x1 = (Paid = no)”). The logistic regression (logit) equation of base patterns is defined as Eq. (1). In addition, another logistic regression (logit) equation is defined to predict the class attribute (“G1L = G”) given the condition of comparison patterns (such as “x1 = (Paid = no), x2 = (Higher = yes), and x3 = (Internet = yes)”). Furthermore, the interaction between terms in comparison patterns should be considered when predicting. The logistic regression (logit) equation of comparison patterns without interaction terms is defined in Eq. (2). Finally, the logistic regression (logit) equation of comparison patterns with interaction terms is defined in Eq. (3). In Eqs. (1)–(3), β1, β2, β3, β4 and β5 are the estimators for their respective variable(s).
Since equations (1 and 2) are used to model the result (learning outcome) of the two patterns (base and comparison patterns), we test the difference of deviances of equations (1 and 2) for Hypothesis 1. Furthermore, equation (2) models comparison patterns without interaction terms and equation (3) models comparison patterns with interaction terms. Therefore, we test the difference of deviances of equations (2 and 3) for Hypothesis 2.
For example, given a base pattern (“x1 = (Paid = no)”) and a comparison pattern (“x1 = (Paid = no), x2 = (Higher = yes) and x3 = (Internet = yes)”), we test the difference of deviances of equations (1 and 2) for Hypothesis 1. Hypothesis 1 is rejected with a p-value = 0.0000. Furthermore, we investigate whether there are interaction terms in comparison pattern (“x1 = (Paid = no), x2 = (Higher = yes) and x3 = (Internet = yes)”) when predicting the result (learning outcome). For this, the differences of deviances of equations (2 and 3) for Hypothesis 2 are tested. Hypothesis 2 with a p-value = 0.9254768 is not rejected. That is, there are no interaction terms existing in comparison pattern (“x1 = (Paid = no), x2 = (Higher = yes) and x3 = (Internet = yes)”) when predicting good learning outcome (G1L = G).
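A difference-of-deviances test of this kind is commonly carried out as a likelihood-ratio test, in which the deviance difference between two nested models is compared with a chi-square distribution. The sketch below assumes that setup; the deviance values and degrees of freedom are made up for illustration and are not the paper's results. It uses the closed-form chi-square survival function that holds for even degrees of freedom, so that only the standard library is needed.

```python
from math import exp, factorial
from typing import Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(Chi2_df > x) for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def deviance_test(dev_reduced: float, dev_full: float,
                  df_diff: int) -> Tuple[float, float]:
    """Return (deviance difference, p-value) for two nested logit models."""
    delta = dev_reduced - dev_full
    return delta, chi2_sf_even_df(delta, df_diff)


# Hypothetical deviances: the fuller model adds two predictors.
delta, p_value = deviance_test(dev_reduced=880.0, dev_full=850.0, df_diff=2)
reject = p_value < 0.05
```

A small p-value suggests that the extra terms of the fuller model improve the fit; a large one suggests the simpler model is adequate. For odd degrees of freedom a statistics library would be needed instead of this closed form.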
The rejection of Hypothesis 1 shows that the factors in the comparison pattern (“x1 = (Paid = no), x2 = (Higher = yes) and x3 = (Internet = yes)”) are more appropriate for predicting the learning outcome (“G1L = G”) than the base pattern (“x1 = (Paid = no)”) alone. That is, the factors in the comparison pattern are appropriate factors that result in a good learning outcome (“G1L = G”). The failure to reject Hypothesis 2 shows that there are no interaction terms in the comparison pattern (“x1 = (Paid = no), x2 = (Higher = yes) and x3 = (Internet = yes)”) when predicting a good learning outcome (G1L = G). Finally, the results of statistical hypothesis testing of patterns (base patterns and comparison patterns) are shown in Table 14.
The results in Table 14 show that the results of statistical hypothesis testing are consistent with the patterns identified by this study. That is, the difference pattern identified captures the difference between the base pattern and the comparison pattern. Therefore, the proposed approach should be able to help identify valuable characteristics (difference patterns) for cause-and-effect relationships from students’ profiles.
4.3 DNN for learning outcome prediction
There are 649 instances in the Portuguese language course dataset. This file has been edited and several indicator variables are added to make it suitable for algorithms that cannot handle categorical variables, so several attributes that are ordered categorically have been coded as integers. The classification techniques used for constructing prediction models are: Deep Neural Networks (DNN), Support-Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT) and Random Forests (RF).
The DNN model's configuration is as follows: The input layer employs “ReLU” as the activation function with 20 cells. The model includes two hidden layers, each using “ReLU” for activation – hidden layer #1 with 256 cells and hidden layer #2 with 32 cells. The output layer utilizes a “sigmoid” activation function with a single cell to produce the model's output. Additional model parameters include an “adam” optimizer, a batch size of 10, 100 epochs and a validation split of 0.1.
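The layer structure described here can be sketched as a plain-Python forward pass. This is only an illustration of the architecture (20, 256, 32 and 1 units with ReLU and sigmoid activations): the number of input features (`N_FEATURES`), the random weight initialization and all function names are assumptions, and training with the “adam” optimizer, batch size, epochs and validation split is not reproduced.

```python
import math
import random
from typing import Callable, List, Tuple

# Layer widths quoted in the text: 20 (first layer), 256, 32 and 1 (output).
LAYER_UNITS = [20, 256, 32, 1]
# Assumption: number of encoded input features (not stated in the text).
N_FEATURES = 30

Layer = Tuple[List[List[float]], List[float]]


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights (one row per unit) and zero biases; illustrative only."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer,
          activation: Callable[[float], float]) -> List[float]:
    """Fully connected layer: activation(w . x + b) for every unit."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


rng = random.Random(0)
layers: List[Layer] = []
n_in = N_FEATURES
for units in LAYER_UNITS:
    layers.append(init_layer(n_in, units, rng))
    n_in = units


def predict_proba(x: List[float]) -> float:
    """ReLU on every layer except the last, which uses a sigmoid."""
    h = x
    for i, layer in enumerate(layers):
        h = dense(h, layer, sigmoid if i == len(layers) - 1 else relu)
    return h[0]


x = [rng.random() for _ in range(N_FEATURES)]
p = predict_proba(x)  # untrained output, interpretable as P(label = Good)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

With the assumed 30 input features the network has 14,253 trainable parameters; in practice a deep learning library would be used to build and train such a model.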
Firstly, five approaches (DNN, SVM, MLP, DT, and RF) are compared for prediction performance metrics (Precision, Recall, F1-score, and Accuracy). The average experimental results of the five methods for 10-fold cross-validation are shown in Table 15. This shows that the DNN approach with Precision (0.90), TPR/Recall (0.90), F1 (0.90), and Accuracy (0.90) has higher prediction performance than the other four methods. The DT approach with Precision (0.57), TPR/Recall (0.54), F1 (0.55) and Accuracy (0.59) performs worst.
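For reference, the four metrics can be computed from true and predicted labels as in the sketch below. The labels are made up for illustration, and the function name `binary_metrics` and the choice of “Good” as the positive class are assumptions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "Good") -> Dict[str, float]:
    """Precision, recall, F1-score and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels for eight students.
y_true = ["Good", "Good", "Good", "Bad", "Bad", "Bad", "Good", "Bad"]
y_pred = ["Good", "Good", "Bad", "Bad", "Bad", "Good", "Good", "Bad"]
metrics = binary_metrics(y_true, y_pred)
```

In a 10-fold cross-validation setting, such metrics would be computed on each held-out fold and then averaged.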
Secondly, the Area Under the Curve (AUC) values of the Receiver Operating Characteristic (ROC) curve and the Precision-Recall Curve (PRC) are key metrics for evaluating the predictive performance of binary classifiers across threshold values. We compare the five methods (DNN, SVM, MLP, DT and RF) on these metrics (AUC values for ROC and PRC). The average experimental results from 10-fold cross-validation are shown in Figures 4–8.
The figures show that the DNN approach has better predictive performance than the other methods, with an AUC in ROC of 0.9027 and an AUC in PRC of 0.9169. In contrast, the DT approach has considerably lower AUC values, 0.5888 in ROC and 0.6637 in PRC, indicating inferior predictive capability.
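As a reminder of what the AUC in ROC measures, the sketch below computes it through its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The labels and scores are hypothetical, and the AUC in PRC is not covered here.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC in ROC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a correct pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = Good, 0 = Bad) and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y_true, scores)
```

Here eight of the nine positive-negative pairs are ranked correctly. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which helps put the reported DNN and DT values into context.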
4.4 Management implications for education
These learning patterns (base, comparison and difference patterns) from education data can be used to identify student behavior patterns and provide assistance to improve learning performance. The experimental results in Table 14 show that the difference pattern (Higher = yes, Internet = yes) is the major difference between the two learning performance values (G1: Bad and G1: Good). Therefore, teachers should provide assistance to students who want to pursue higher education and have Internet access at home, so that they can achieve high learning performance (G1: Good).
After identifying learning performance patterns, we suggest adopting prediction models to understand the learning performance of students in advance. The experimental results in Section 4.3 show that DNN performs better than the other methods (SVM, MLP, DT and RF) in prediction performance metrics (Precision, Recall, F1-score and Accuracy). Therefore, deep learning methods, such as DNN, should be used as classifiers to predict the learning performance labels of students.
5. Conclusion
This study makes several contributions, including a new framework for identifying learning patterns and predicting learning performance. First, the learning patterns identification module is used for discovering difference patterns which could identify the difference of learning performance among different clusters of students. Second, deep learning prediction models (DNN) are used to construct models for predicting students’ learning performance. Finally, we integrate the two modules into a framework to forecast learning performance of students.
Experimental results from survey data indicated that the proposed learning patterns identification module can facilitate identifying patterns of interest and valuable difference (change) patterns from students’ profiles. In addition, statistical hypothesis testing was used to verify the experimental results generated by the proposed approach. Furthermore, the proposed learning performance prediction module, which adopts DNN as a classifier, performs better than the other traditional machine learning techniques on the prediction performance metrics.
The proposed model exhibits some weaknesses and limitations. Firstly, its evaluation relies solely on a single dataset, potentially restricting the model’s generalizability. Secondly, we have not assessed the performance impact of integrating feature filter or wrapper methods with machine learning algorithms, which could further enhance the model’s efficacy.
There are several issues that remain to be addressed in the future. First, we focus on discovering difference patterns from different clusters classified by learning performance labels. In some applications, education administrators may be interested in other classification labels. For this, it would be helpful to design other algorithms for specific aims. Second, it could also be useful to integrate more deep learning prediction techniques to provide better learning assistance. Chang et al. (2023) integrated feature selection methods to improve prediction performance. Third, incorporating graphs or charts to visually depict predicted outcomes can greatly enhance the manuscript's clarity and facilitate a deeper understanding of student learning performance. Such visual aids could make the results more accessible and easier to interpret for educators and administrators. For instance, Figure 9 provides a visual representation of our experimental results using the Decision Tree (DT) method. While we acknowledge that the DT method may not offer the best performance among the techniques we tested, its visual output serves as a beneficial tool for illustrating our findings in a more accessible manner. Fourth, numerous efficient algorithms for identifying frequent itemsets have been proposed, ensuring models can scale effectively to accommodate large datasets. Notable examples include H-mine (Pei et al., 2001) and Index-BitTableFI (Song et al., 2008). Integrating these advanced algorithms into our framework represents a critical avenue for future research, enhancing our model’s scalability and performance in processing extensive datasets. Finally, the proposed framework presents an opportunity for refinement and enhancement, aiming for greater efficiency in future iterations. Through targeted improvements and strategic adjustments, the framework can evolve to meet emerging needs and challenges in the educational application domain [1].
The authors would like to thank Dr Ruo-ping Han for Statistical Analysis. The research was supported by the National Science and Technology Council of the Republic of China under the grants NSTC 112-2410-H-018-044 and NSTC 112-2410-H-194-032-MY2.
Notes
1. We would like to extend our sincere gratitude to the anonymous reviewers for their insightful comments and suggestions. Their constructive feedback has played a pivotal role in refining and improving the quality of this manuscript.
Data availability: The datasets analyzed during the current study are available in the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/.
Conflict of interest statement: There is no conflict of interest in this study.
Figure 1
Framework of this study
[Figure omitted. See PDF]
Figure 2
Framework for identifying learning patterns
[Figure omitted. See PDF]
Figure 3
Framework for predicting students’ learning performance
[Figure omitted. See PDF]
Figure 4
AUC of ROC and PRC in DNN
[Figure omitted. See PDF]
Figure 5
AUC of ROC and PRC in SVM
[Figure omitted. See PDF]
Figure 6
AUC of ROC and PRC in MLP
[Figure omitted. See PDF]
Figure 7
AUC of ROC and PRC in RF
[Figure omitted. See PDF]
Figure 8
AUC of ROC and PRC in decision tree
[Figure omitted. See PDF]
Figure 9
Visualization of the result by the DT approach
[Figure omitted. See PDF]
Table 1
Overview of literature in frequent itemsets mining
| Studies | Type of itemsets | Contribution |
|---|---|---|
| Agrawal and Srikant (1994) | Frequent itemset | Proposed Apriori algorithm for mining frequent itemset and association rules |
| Chen and Li (2008) | Frequent itemset | Proposed lattice-based frequent itemsets mining algorithm for improving the performance in support counting |
| Duong et al. (2014) | Constraint-based frequent itemset | Proposed an efficient method for mining frequent itemsets with double constraints |
| Han et al. (2000) | Frequent itemset | Proposed an efficient method for mining frequent itemsets without candidate generation |
| Lin et al. (2011) | Frequent itemset | Proposed the IFP-growth (improved FP-growth) algorithm to improve the performance of FP-growth in mining frequent itemsets |
| Liu et al. (2012) | Maximal frequent itemsets | Proposed maximal frequent itemsets mining algorithm to improve storage efficiency of data structure and time efficiency |
| Sarath and Ravi (2013) | Frequent itemset | Proposed a binary particle swarm optimization (BPSO) based association rule mining algorithm for generating the best M rules |
Source(s): By authors
Table 2
Overview of EDM research
| Works | Techniques | Aim |
|---|---|---|
| Buldu and Üçgün (2010) | Association rule | Discover the rules that identify relations between the courses that the students failed |
| Şen et al. (2012) | Classification | Predicting secondary education placement test results and identifying the most important predictors |
| Sen and Ucar (2012) | Classification | Comparing the achievements of students studying in distance education with those in regular education |
| Romero et al. (2013) | Classification, Clustering | Improving prediction of students’ final performance |
| Natek and Zwilling (2014) | Classification | Predicting the success rate of students enrolled in their courses |
| Okoye et al. (2014) | Association rule | Discovering user interaction patterns within learning processes |
| Gupta et al. (2015) | Classification | Identifying knowledge indicators in higher education organization |
| Kaur et al. (2015) | Classification | Identifying the slow learners among students and presenting that knowledge |
| Santos and Boticario (2015) | Association rules, Decision tree | Identifying indicators for course success |
| Xing et al. (2015) | Genetic Programming | Predicting participation-based student final performance |
| Angeli et al. (2017) | Association rules mining | Educational technologists’ use of association rules mining for guiding and monitoring school-based technology integration efforts |
| Romero and Ventura (2020) | Updated survey | Reviewing the main publications, key milestones, the knowledge discovery cycle, main educational environments and specific tools in this research area |
Source(s): By authors
Table 3
Comparison of this study with previous works
| Works | Type of itemsets | Contribution |
|---|---|---|
| This study | Frequent itemsets | Identify learning difference from classified student’s profiles |
| Buldu and Üçgün (2010) | Frequent itemsets | Discover rules that identify the relation between courses that the students failed |
| Santos and Boticario (2015) | Frequent itemsets | Identifying indicators that could refer to course success |
Source(s): By authors
Table 4
Some student instances for learning
| ID | Sex | Medu | Fedu | Mjob | Fjob | Guardian | schoolsup | famsup | Higher | Score |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | F | 1 | 1 | at_home | other | mother | yes | no | yes | G |
| 2 | F | 4 | 2 | health | services | mother | no | yes | yes | G |
| 3 | M | 4 | 3 | services | other | mother | no | yes | yes | G |
| 4 | M | 2 | 2 | other | other | mother | no | no | yes | G |
| 5 | M | 3 | 2 | services | other | mother | no | yes | yes | G |
| 6 | F | 2 | 1 | services | other | mother | no | yes | no | B |
| 7 | F | 2 | 1 | at_home | other | mother | no | yes | no | B |
| 8 | M | 2 | 1 | services | services | mother | no | no | no | B |
| 9 | M | 2 | 1 | services | other | mother | no | no | no | B |
| 10 | M | 2 | 2 | services | services | mother | no | yes | no | B |
Source(s): By authors
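To illustrate how frequent itemsets are obtained per cluster, the sketch below splits the ten instances of Table 4 by Score and computes itemset support by brute-force enumeration. Only three of the attributes (schoolsup, famsup, Higher) are used to keep the example short, and the function names are illustrative assumptions. This is not the mining algorithm used in the study, and exhaustive enumeration is only practical for small examples like this one.

```python
from itertools import combinations

ATTRS = ("schoolsup", "famsup", "Higher")
# (schoolsup, famsup, Higher, Score) for IDs 1-10 of Table 4.
ROWS = [
    ("yes", "no", "yes", "G"), ("no", "yes", "yes", "G"), ("no", "yes", "yes", "G"),
    ("no", "no", "yes", "G"), ("no", "yes", "yes", "G"), ("no", "yes", "no", "B"),
    ("no", "yes", "no", "B"), ("no", "no", "no", "B"), ("no", "no", "no", "B"),
    ("no", "yes", "no", "B"),
]


def cluster(score):
    """Instances (as attribute -> value dicts) whose Score equals the given label."""
    return [dict(zip(ATTRS, row[:3])) for row in ROWS if row[3] == score]


def support(instances, itemset):
    """Fraction of instances that contain every (attribute, value) pair in itemset."""
    hits = sum(all(inst[a] == v for a, v in itemset) for inst in instances)
    return hits / len(instances)


def frequent_itemsets(instances, min_sup, max_len=3):
    """All itemsets of up to max_len items whose support reaches min_sup."""
    items = sorted({(a, v) for inst in instances for a, v in inst.items()})
    found = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            sup = support(instances, combo)
            if sup >= min_sup:
                found[combo] = sup
    return found


good, bad = cluster("G"), cluster("B")
print(support(good, [("Higher", "yes")]), support(bad, [("Higher", "yes")]))
print(len(frequent_itemsets(bad, 0.8)), len(frequent_itemsets(bad, 0.6)))
```

Lowering the minimum support admits more itemsets, which is the same trend reported for the full dataset in Tables 6 and 7.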
Table 5
Frequent itemsets discovered from clusters
| Itemset (learning performance = bad) | Sup | Itemset (learning performance = good) | Sup |
|---|---|---|---|
| Paid = no | 0.92 | Higher = yes | 0.99 |
| Pstatus = T | 0.88 | Paid = no | 0.96 |
| Schoolsup = no | 0.88 | Higher = yes, Paid = no | 0.94 |
| Paid = no, Schoolsup = no | 0.82 | Schoolsup = no | 0.91 |
| Paid = no, Pstatus = T | 0.81 | Higher = yes, Schoolsup = no | 0.90 |
| Higher = yes | 0.81 | Paid = no, Schoolsup = no | 0.88 |
| Nursery = yes | 0.79 | Pstatus = T | 0.87 |
| Pstatus = T, Schoolsup = no | 0.78 | Higher = yes, Paid = no, Schoolsup = no | 0.86 |
| Higher = yes, Paid = no | 0.74 | Higher = yes, Pstatus = T | 0.86 |
| Nursery = yes, Paid = no | 0.73 | Paid = no, Pstatus = T | 0.83 |
Note(s): Bad and good
Source(s): By authors
Table 6
Frequent itemsets vs minimum support
| Support | 0.90 | 0.85 | 0.80 | 0.75 | 0.70 | 0.65 | 0.60 | 0.55 | 0.50 |
|---|---|---|---|---|---|---|---|---|---|
| L1 | 1 | 3 | 4 | 5 | 7 | 8 | 9 | 11 | 12 |
| L2 | 0 | 0 | 2 | 3 | 6 | 13 | 19 | 25 | 33 |
| L3 | 0 | 0 | 0 | 0 | 1 | 2 | 8 | 18 | 32 |
| Total | 1 | 3 | 6 | 8 | 14 | 23 | 36 | 54 | 77 |
Note(s): G1 = Bad
Source(s): By authors
Table 7
Frequent itemsets vs minimum support
| Support | 0.90 | 0.85 | 0.80 | 0.75 | 0.70 | 0.65 | 0.60 | 0.55 | 0.50 |
|---|---|---|---|---|---|---|---|---|---|
| L1 | 3 | 4 | 6 | 6 | 7 | 8 | 10 | 11 | 13 |
| L2 | 1 | 4 | 7 | 11 | 14 | 19 | 25 | 34 | 45 |
| L3 | 0 | 1 | 2 | 6 | 11 | 18 | 28 | 40 | 63 |
| Total | 4 | 9 | 15 | 23 | 32 | 45 | 63 | 85 | 121 |
Note(s): G1 = Good
Source(s): By authors
Table 8
Frequent itemsets
| No | Itemset | Support | Count | Odds-ratio |
|---|---|---|---|---|
| 1 | Paid = no | 0.924 | 317 | 1.082 |
| 2 | Pstatus = T | 0.883 | 303 | 1.139 |
| 3 | Schoolsup = no | 0.880 | 302 | 1.082 |
| 4 | Paid = no, Schoolsup = no | 0.816 | 280 | 1.045 |
| 5 | Paid = no, Pstatus = T | 0.813 | 279 | 1.094 |
| 6 | Higher = yes | 0.810 | 278 | 0.921 |
| 7 | Nursery = yes | 0.790 | 271 | 1.084 |
| 8 | Pstatus = T, Schoolsup = no | 0.781 | 268 | 1.107 |
Note(s): G1 = Bad
Source(s): By authors
Table 9
Base patterns in cluster
| No | Itemset | Support | Count | Odds-ratio |
|---|---|---|---|---|
| 1 | Paid = no | 0.924 | 317 | 1.082 |
| 2 | Pstatus = T | 0.883 | 303 | 1.139 |
| 3 | Schoolsup = no | 0.880 | 302 | 1.082 |
| 4 | Paid = no, Schoolsup = no | 0.816 | 280 | 1.045 |
| 5 | Paid = no, Pstatus = T | 0.813 | 279 | 1.094 |
| 6 | Higher = yes | 0.810 | 278 | 0.921 |
| 7 | Nursery = yes | 0.790 | 271 | 1.084 |
| 8 | Pstatus = T, Schoolsup = no | 0.781 | 268 | 1.107 |
Note(s): G1 = Bad
Source(s): By authors
Table 10
High odds-ratio base patterns in cluster
| Itemset | Count (G1 = Bad) | Sup (G1 = Bad) | Odds-ratio (G1 = Bad) | Count (G1 = Good) | Sup (G1 = Good) | Odds-ratio (G1 = Good) |
|---|---|---|---|---|---|---|
| Paid = no | 317 | 0.924 | 1.082 | 293 | 0.924 | 0.958 |
| Pstatus = T | 303 | 0.883 | 1.139 | 266 | 0.878 | 0.869 |
| Schoolsup = no | 302 | 0.880 | 1.082 | 279 | 0.924 | 0.912 |
| Paid = no, Pstatus = T | 279 | 0.813 | 1.094 | 255 | 0.914 | 0.833 |
| Nursery = yes | 271 | 0.790 | 1.084 | 250 | 0.923 | 0.817 |
| Pstatus = T, Schoolsup = no | 268 | 0.781 | 1.107 | 242 | 0.903 | 0.791 |
Note(s): G1 = Bad
Source(s): By authors
Table 11
High odds-ratio comparison patterns
| Base itemset | Base count | Base sup | Base odds-ratio | Comparison itemset | Comparison count | Comparison sup | Comparison odds-ratio |
|---|---|---|---|---|---|---|---|
| Paid = no | 317 | 0.924 | 1.082 | Higher = yes, Internet = yes, Paid = no | 237 | 0.775 | 1.288 |
| Paid = no | 317 | 0.924 | 1.082 | Higher = yes, Paid = no, Schoolsup = no | 264 | 0.863 | 1.200 |
| Paid = no | 317 | 0.924 | 1.082 | Higher = yes, Nursery = yes, Paid = no | 236 | 0.771 | 1.163 |
| Paid = no | 317 | 0.924 | 1.082 | Higher = yes, Paid = no | 289 | 0.944 | 1.133 |
| Paid = no | 317 | 0.924 | 1.082 | Higher = yes, Paid = no, Pstatus = T | 252 | 0.824 | 1.120 |
| Paid = no | 317 | 0.924 | 1.082 | Internet = yes, Paid = no | 240 | 0.784 | 1.062 |
| Pstatus = T | 303 | 0.883 | 1.139 | Higher = yes, Paid = no, Pstatus = T | 252 | 0.824 | 1.120 |
| Pstatus = T | 303 | 0.883 | 1.139 | Higher = yes, Pstatus = T, Schoolsup = no | 239 | 0.781 | 1.117 |
| Pstatus = T | 303 | 0.883 | 1.139 | Higher = yes, Pstatus = T | 263 | 0.859 | 1.065 |
| Schoolsup = no | 302 | 0.880 | 1.082 | Higher = yes, Paid = no, Schoolsup = no | 264 | 0.863 | 1.200 |
| Schoolsup = no | 302 | 0.880 | 1.082 | Higher = yes, Schoolsup = no | 275 | 0.899 | 1.151 |
| Schoolsup = no | 302 | 0.880 | 1.082 | Higher = yes, Pstatus = T, Schoolsup = no | 239 | 0.781 | 1.117 |
| Schoolsup = no | 302 | 0.880 | 1.082 | Internet = yes, Schoolsup = no | 231 | 0.755 | 1.065 |
| Nursery = yes | 271 | 0.790 | 1.084 | Higher = yes, Nursery = yes, Paid = no | 236 | 0.771 | 1.163 |
| Nursery = yes | 271 | 0.790 | 1.084 | Higher = yes, Nursery = yes | 248 | 0.810 | 1.122 |
Source(s): By authors
Table 12
Difference patterns
| No | Base pattern | Comparison pattern | Difference pattern |
|---|---|---|---|
| 1 | Paid = no | Higher = yes, Internet = yes, Paid = no | Higher = yes, Internet = yes |
| 2 | Paid = no | Higher = yes, Paid = no, Schoolsup = no | Higher = yes, Schoolsup = no |
| 3 | Paid = no | Higher = yes, Nursery = yes, Paid = no | Higher = yes, Nursery = yes |
| 4 | Paid = no | Higher = yes, Paid = no | Higher = yes |
| 5 | Paid = no | Higher = yes, Paid = no, Pstatus = T | Higher = yes, Pstatus = T |
| 6 | Paid = no | Internet = yes, Paid = no | Internet = yes |
| 7 | Pstatus = T | Higher = yes, Paid = no, Pstatus = T | Higher = yes, Paid = no |
| 8 | Pstatus = T | Higher = yes, Pstatus = T, Schoolsup = no | Higher = yes, Schoolsup = no |
| 9 | Pstatus = T | Higher = yes, Pstatus = T | Higher = yes |
| 10 | Schoolsup = no | Higher = yes, Paid = no, Schoolsup = no | Higher = yes, Paid = no |
| 11 | Schoolsup = no | Higher = yes, Schoolsup = no | Higher = yes |
| 12 | Schoolsup = no | Higher = yes, Pstatus = T, Schoolsup = no | Higher = yes, Pstatus = T |
| 13 | Schoolsup = no | Internet = yes, Schoolsup = no | Internet = yes |
| 14 | Nursery = yes | Higher = yes, Nursery = yes, Paid = no | Higher = yes, Paid = no |
| 15 | Nursery = yes | Higher = yes, Nursery = yes | Higher = yes |
Source(s): By authors
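Each difference pattern in Table 12 is the set of items that appear in the comparison pattern but not in the base pattern. The following sketch reproduces that step as a set difference. The helper names are hypothetical, and the check that the base pattern is contained in the comparison pattern is an assumption drawn from the rows of the table, where this always holds.

```python
def parse(pattern):
    """Turn 'Higher = yes, Paid = no' into a frozenset of (attribute, value) pairs."""
    return frozenset(
        tuple(part.strip() for part in item.split("="))
        for item in pattern.split(",")
    )


def difference(base, comparison):
    """Items of the comparison pattern that are absent from the base pattern."""
    base_items, comparison_items = parse(base), parse(comparison)
    if not base_items <= comparison_items:
        raise ValueError("comparison pattern must contain the base pattern")
    return ", ".join(f"{a} = {v}" for a, v in sorted(comparison_items - base_items))


# Rows 1 and 7 of Table 12.
row1 = difference("Paid = no", "Higher = yes, Internet = yes, Paid = no")
row7 = difference("Pstatus = T", "Higher = yes, Paid = no, Pstatus = T")
print(row1)
print(row7)
```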
Table 13
High odds-ratio difference patterns
| No | Base pattern | Comparison pattern | Difference pattern | Odds-ratio |
|---|---|---|---|---|
| 1 | Paid = no | Higher = yes, Internet = yes, Paid = no | Higher = yes, Internet = yes | 1.233 |
| 2 | Paid = no | Higher = yes, Paid = no, Schoolsup = no | Higher = yes, Schoolsup = no | 1.151 |
| 3 | Paid = no | Higher = yes, Nursery = yes, Paid = no | Higher = yes, Nursery = yes | 1.122 |
| 4 | Paid = no | Higher = yes, Paid = no | Higher = yes | 1.086 |
| 5 | Paid = no | Higher = yes, Paid = no, Pstatus = T | Higher = yes, Pstatus = T | 1.065 |
| 6 | Pstatus = T | Higher = yes, Paid = no, Pstatus = T | Higher = yes, Paid = no | 1.133 |
| 7 | Pstatus = T | Higher = yes, Pstatus = T, Schoolsup = no | Higher = yes, Schoolsup = no | 1.151 |
| 8 | Pstatus = T | Higher = yes, Pstatus = T | Higher = yes | 1.086 |
| 9 | Schoolsup = no | Higher = yes, Paid = no, Schoolsup = no | Higher = yes, Paid = no | 1.133 |
| 10 | Schoolsup = no | Higher = yes, Schoolsup = no | Higher = yes | 1.086 |
| 11 | Schoolsup = no | Higher = yes, Pstatus = T, Schoolsup = no | Higher = yes, Pstatus = T | 1.065 |
| 12 | Nursery = yes | Higher = yes, Nursery = yes, Paid = no | Higher = yes, Paid = no | 1.133 |
| 13 | Nursery = yes | Higher = yes, Nursery = yes | Higher = yes | 1.086 |
Source(s): By authors
Table 14
Statistical hypothesis testing
| No | Base pattern | Comparison pattern | Difference pattern | H1 p-value | H2 p-value |
|---|---|---|---|---|---|
| 1 | Paid = no | Higher = yes, Internet = yes, Paid = no | Higher = yes, Internet = yes | 0.0000 | 0.9174 |
| 2 | Paid = no | Higher = yes, Paid = no, Schoolsup = no | Higher = yes, Schoolsup = no | 0.0000 | 0.9255 |
| 3 | Paid = no | Higher = yes, Nursery = yes, Paid = no | Higher = yes, Nursery = yes | 0.0000 | 0.4583 |
| 4 | Paid = no | Higher = yes, Paid = no | Higher = yes | 0.0000 | 0.6952 |
| 5 | Paid = no | Higher = yes, Paid = no, Pstatus = T | Higher = yes, Pstatus = T | 0.0000 | 0.6921 |
| 6 | Pstatus = T | Higher = yes, Paid = no, Pstatus = T | Higher = yes, Paid = no | 0.0000 | 0.6921 |
| 7 | Pstatus = T | Higher = yes, Pstatus = T, Schoolsup = no | Higher = yes, Schoolsup = no | 0.0000 | 0.8219 |
| 8 | Pstatus = T | Higher = yes, Pstatus = T | Higher = yes | 0.0000 | 0.6613 |
| 9 | Schoolsup = no | Higher = yes, Paid = no, Schoolsup = no | Higher = yes, Paid = no | 0.0000 | 0.9255 |
| 10 | Schoolsup = no | Higher = yes, Schoolsup = no | Higher = yes | 0.0000 | 0.7225 |
| 11 | Schoolsup = no | Higher = yes, Pstatus = T, Schoolsup = no | Higher = yes, Pstatus = T | 0.0000 | 0.8219 |
| 12 | Nursery = yes | Higher = yes, Nursery = yes, Paid = no | Higher = yes, Paid = no | 0.0000 | 0.4583 |
| 13 | Nursery = yes | Higher = yes, Nursery = yes | Higher = yes | 0.0000 | 0.2081 |
Note(s): G1: Bad → Good
Source(s): By authors
Table 15
Average experiment results of machine learning approaches
| Method | Precision | TPR/recall | F1 | Accuracy | ROC | PRC |
|---|---|---|---|---|---|---|
| DNN | 0.90 | 0.90 | 0.90 | 0.90 | 0.9027 | 0.9169 |
| SVM | 0.62 | 0.64 | 0.63 | 0.65 | 0.6482 | 0.7171 |
| MLP | 0.56 | 0.72 | 0.63 | 0.60 | 0.6042 | 0.7035 |
| RF | 0.61 | 0.61 | 0.61 | 0.63 | 0.6306 | 0.7018 |
| DT | 0.57 | 0.54 | 0.55 | 0.59 | 0.5888 | 0.6637 |
Source(s): By authors
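The F1 column of Table 15 can be cross-checked against the precision and recall columns, since F1 is the harmonic mean of the two. The short sketch below recomputes it from the reported values. Because the table reports averages, agreement is not guaranteed in general (the mean of per-run F1 scores need not equal the F1 of the mean precision and recall), but here the recomputed values match the reported ones to two decimals.

```python
# (precision, recall, reported F1) copied from Table 15.
TABLE_15 = {
    "DNN": (0.90, 0.90, 0.90),
    "SVM": (0.62, 0.64, 0.63),
    "MLP": (0.56, 0.72, 0.63),
    "RF": (0.61, 0.61, 0.61),
    "DT": (0.57, 0.54, 0.55),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {m: round(f1(p, r), 2) for m, (p, r, _) in TABLE_15.items()}
for method, value in recomputed.items():
    print(method, value, TABLE_15[method][2])
```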
© Emerald Publishing Limited.
