Novel framework for learning performance prediction using pattern identification and deep learning

Abstract

Purpose

Educational data mining (EDM) discovers significant patterns in educational data and can thus help explain the relations between learners and their educational settings. However, most previous data mining techniques focus on predicting learners’ performance without integrating techniques for identifying learning patterns.

Design/methodology/approach

This study proposes a new framework for identifying learning patterns and predicting learning performance. Two modules, a learning patterns identification module and a deep learning prediction module based on deep neural networks (DNN), are integrated into this framework to identify differences in learning performance and to predict learning performance from students’ profiles.

Findings

Experimental results on survey data indicate that the proposed learning patterns identification module can help identify valuable difference (change) patterns in students’ profiles. The proposed learning performance prediction module, which adopts DNN, also outperforms traditional machine learning techniques on prediction performance metrics.

Originality/value

To the best of our knowledge, the framework is the only educational system in the literature for both identifying learning patterns and predicting learning performance.

1. Introduction

Educational data mining (EDM) is an emerging research area focused on discovering patterns in educational data to help understand the relations between learners and educational settings. However, most educational data mining techniques focus on predicting learning performance from learners’ profiles rather than on identifying learners’ characteristics to evaluate their learning performance. In particular, identifying the characteristics of learners with low learning performance is necessary to initiate early intervention for learners who need teaching assistance.

This study considers association-based classification patterns, which are used to identify associations between causes and effects in order to establish profiles of students’ learning performance. Furthermore, we propose a measure (OddsRatio) to determine valuable patterns in each cluster of instances. For example, (Pattern X₁) → (Bad) means that students belonging to the learning outcome group (Bad) have the characteristics (Pattern X₁). We may also identify another pattern, (Pattern X₂) → (Good). The main purpose of this study is to identify the difference (ΔX) between patterns X₁ and X₂ that leads to different outcomes, since such knowledge reveals the associated causes of the effect. Assume that we have two patterns, Pattern X₁ = {(Paid = no) → (Bad); support = 0.92; count = 317; OddsRatio = 1.08} and Pattern X₂ = {(Higher = yes, Paid = no) → (Good); support = 0.94; count = 289; OddsRatio = 1.13}; the difference (ΔX) between the two patterns (X₂ and X₁) is then {(Higher = yes)}.

The above example indicates that the difference pattern (ΔX), {(Higher = yes)}, is a change pattern for instances (students) with pattern X₁ = {(Paid = no)} in the learning performance cluster (Bad) who move to the learning performance cluster (Good). The knowledge in this example, pattern {(Higher = yes)}, reveals the associated causes of the effect, namely moving from learning performance cluster (Bad) to learning performance cluster (Good). However, to our knowledge, no studies in the educational data mining field have addressed the important issue of identifying change patterns, i.e. differences (ΔX), in association-based classification patterns.

It is important for educational institutions to have approximate prior knowledge of students in order to predict their future academic performance. To address these problems, we propose a framework for identifying learning patterns and predicting learning performance. First, the learning patterns identification module discovers difference patterns that identify differences in learning performance among clusters of students. Second, deep learning prediction models (DNN) are employed to construct models that predict students’ learning performance.

The rest of this paper is organized as follows. Section 2 reviews related work. Methodology is given in Section 3. The experimental results are illustrated in Section 4. Conclusions and future work are discussed in Section 5.

2. Related work

2.1 Frequent itemsets mining

Frequent pattern mining reveals intrinsic and important properties of datasets and is the foundation of association rule mining. Mining frequent itemsets is a crucial step in association rule mining (Agrawal et al., 1993). Most frequent itemset mining algorithms are improvements on, or derivatives of, Apriori (Agrawal and Srikant, 1994) and FP-growth (Han et al., 2000). More efficient methods for mining frequent itemsets have also been proposed, such as H-mine (Pei et al., 2001) and Index-BitTableFI (Song et al., 2008). However, most of these algorithms focus on improving the efficiency of the frequent itemset mining process rather than on mining specific itemsets, such as specific later-marketed items. Table 1 provides an overview of the literature on frequent itemset mining.

2.2 Educational data mining

Data mining, or knowledge discovery in databases (KDD), has been applied to several central e-learning issues, such as the assessment of students’ learning performance and the evaluation of learning materials and web-based courses. KDD can also be used to learn models of the learning process (Hämäläinen et al., 2004) and for student modeling (Tang and McCalla, 2002), to evaluate and improve e-learning systems (Zaïane and Luo, 2001) and to discover useful learning information from learning portfolios (Hwang et al., 2004).

The data mining techniques applied in these contexts enable course adaptation and learning recommendations based on the students’ learning behavior. These techniques also enable feedback to teachers and students of e-learning courses and help identify typical learning behavior (Castro et al., 2007; Baker and Yacef, 2009).

The increasing interest in data mining and educational systems has made educational data mining a new and growing research community. Romero and Ventura (2007) surveyed the application of data mining to traditional educational systems, particularly web-based courses, well-known learning content management systems, and adaptive and intelligent web-based educational systems.

The goals of EDM are varied, ranging from constructing and improving student models and designing support in digital settings to scientific discovery about learners and learning (Baker, 2010). Areas of application include predictive (decision support), generative (creating new or improved designs for learning) and explanatory (scientific analysis) work (Pahl, 2004). They also include studies on individual learning using educational software, computer-supported collaborative learning, computer adaptive testing and factors relating to student failure or non-retention in courses (Baker and Yacef, 2009).

EDM is a multidisciplinary field related to several well-established areas of research, including e-learning, adaptive hypermedia, intelligent tutoring systems and data mining (Nachmias, 2011). Nachmias (2011) considered that most of the web mining techniques applied to educational systems use one of three types of analysis: (1) clustering, classification and outlier detection; (2) association rule mining and sequential pattern mining; and (3) text mining. Angeli et al. (2017) illustrated how data mining (association rules mining) can be used to advance educational software evaluation practices in the field of educational technology. Rodrigues et al. (2018) reviewed EDM research on the teaching and learning process from the pedagogical perspective. Table 2 provides an overview of EDM research.

The overview of EDM research in Table 2 shows where itemset mining has been applied in EDM. Buldu and Üçgün (2010) discovered rules revealing the relations between the courses that students failed. Santos and Boticario (2015) identified indicators that could predict course success. Table 3 shows the differences between our study and prior studies, along with their applications.

2.3 Deep learning in education data

Deep learning (DL) enables computers to perform complex computations by composing many simpler computations organized in successive layers. Kabashima et al. (2018) used DNN-based scoring techniques for two tasks: (1) predicting a language learner’s oral proficiency and (2) predicting the comprehensibility of his/her pronunciation based on native listeners’ responsive shadowing. Their experiments show that the proposed automatic rating module could be introduced into language education to function as an additional human rater.

When applied directly to the huge datasets obtained from previous student performance, traditional machine learning does not work well because it does not capture the nature of the data’s behavior. In DL, features are extracted automatically from the given data (Hassan et al., 2020). During training, DL updates the weights of its hidden layers through the backpropagation algorithm. The DNN model performs better on complicated data and nonlinear functions (Lin et al., 2020).

Predicting students’ performance is very important for higher education and is an important application of deep learning to educational data. Li and Liu (2021) used deep neural networks for prediction by extracting informative data as features with corresponding weights. Their networks use multiple hidden layers whose weights are updated automatically during training, and their results show that the proposed system obtained the most accurate predictions. Therefore, this study uses deep learning methods (such as DNN) as classifiers to predict students’ learning performance labels and then evaluates which models are most suitable for this task.

3. Methodology

This study proposes a framework for identifying learning patterns and predicting learning performance, as shown in Figure 1. The proposed framework has two modules: the learning patterns identification module and the learning performance prediction module. The two modules are described in the following sections.

3.1 Problem definitions of learning patterns

This section defines the problem of identifying changes (difference patterns) among different clusters of instances that may reflect cause-and-effect relationships.

Definition 1. Association rule. Let X be a set of items. A transaction T is said to contain X if and only if X ⊆ T. An association rule is an implication of the form X ⇒ c, where X ⊂ I, c ∈ C, I is a set of items and C is a set of class labels. The rule X ⇒ c holds in dataset D with support sup, where sup is the percentage of transactions in D that contain X ∪ c. The rule X ⇒ c has confidence conf in D, where conf is the percentage of transactions in D containing X that also contain c. Support(X ∪ c) and Confidence(X ⇒ c) are formally expressed as follows.

$\mathrm{Support}(X \cup c) = \frac{|X \cup c|}{|D|}$, where |D| denotes the number of transactions in D and |X ∪ c| denotes the number of transactions in D containing X ∪ c, and $\mathrm{Confidence}(X \Rightarrow c) = \frac{\mathrm{Support}(X \cup c)}{\mathrm{Support}(X)}$.

Given a user-specified support threshold σ_sup and confidence threshold σ_conf, an itemset X ∪ c is frequent if Support(X ∪ c) is no smaller than σ_sup, and an association rule X ⇒ c is identified if Confidence(X ⇒ c) is no smaller than σ_conf.
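To make Definition 1 concrete, the following Python sketch computes Support and Confidence over a small set of transactions and applies the two thresholds. The transactions are hypothetical, and encoding each attribute value and the class label as string items (e.g. "Paid=no", "G1L=Good") is an illustrative assumption rather than part of the original framework.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, data: List[Itemset]) -> float:
    """Fraction of transactions in D that contain every item of `itemset`."""
    return sum(1 for t in data if itemset <= t) / len(data)


def confidence(x: Itemset, c: str, data: List[Itemset]) -> float:
    """Confidence(X => c) = Support(X ∪ {c}) / Support(X)."""
    sup_x = support(x, data)
    return support(x | {c}, data) / sup_x if sup_x > 0 else 0.0


# Hypothetical transactions; the class label is stored as an ordinary item.
D = [
    frozenset({"Paid=no", "Higher=yes", "G1L=Good"}),
    frozenset({"Paid=no", "Higher=yes", "G1L=Good"}),
    frozenset({"Paid=no", "Higher=no", "G1L=Bad"}),
    frozenset({"Paid=yes", "Higher=yes", "G1L=Good"}),
]
X = frozenset({"Paid=no", "Higher=yes"})
sigma_sup, sigma_conf = 0.5, 1.0

sup = support(X | {"G1L=Good"}, D)    # 2 of 4 transactions -> 0.5
conf = confidence(X, "G1L=Good", D)   # 0.5 / 0.5 -> 1.0
is_frequent = sup >= sigma_sup
is_rule = conf >= sigma_conf
print(is_frequent, is_rule)  # True True
```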

Example 1. Assume two frequent itemsets, {(Paid = no); support = 0.92; learning performance = Bad; dataset cluster label = Bad} and {(Higher = yes, Paid = no); support = 0.94; learning performance = Good; dataset cluster label = Good}. Given a minimum support threshold σ_sup = 0.5 and a minimum confidence threshold σ_conf = 100%, we obtain two association rules: (Paid = no) ⇒ (learning performance = Bad) and (Higher = yes, Paid = no) ⇒ (learning performance = Good).

Definition 2. Base/comparison pattern. Assume that two frequent itemsets X and Y have been discovered, where X ⊂ Y. Itemset X can be taken as the base pattern, and itemset Y as the comparison pattern.

Example 2. In Example 1, we can take the frequent itemset {(Paid = no); support = 0.92; learning performance = Bad; dataset cluster label = Bad} as a base pattern and the frequent itemset {(Higher = yes, Paid = no); support = 0.94; learning performance = Good; dataset cluster label = Good} as a comparison pattern, because (Paid = no) ⊂ (Higher = yes, Paid = no), i.e. {(Paid = no) ∩ (Higher = yes, Paid = no)} = (Paid = no).

In this study, we apply an index called Odds Ratio (OR), based on the concept of “relative risk” (Li et al., 2005), to indicate whether a pattern is more frequent in one cluster than in another cluster.

Definition 3. Odds Ratio (OR). Assume a pattern X_i^r in cluster R and another pattern X_j^s in cluster S, where X_i^r and X_j^s are the same itemset (X_i^r = X_j^s). The OR of pattern X_i^r against pattern X_j^s is defined as follows:

$OR(X_{i}^{r}, X_{j}^{s}) = \frac{\mathrm{count}(X_{i}^{r})}{\mathrm{count}(X_{j}^{s})}$

Example 3. Assume a pattern {(Paid = no); support = 0.92; count = 317; learning performance = Bad; dataset cluster label = Bad} and another pattern {(Paid = no); support = 0.96; count = 293; learning performance = Good; dataset cluster label = Good}; their Odds Ratio is then OR((Paid = no)^Bad, (Paid = no)^Good) = 317/293 = 1.08.

In addition, assume a pattern {(Higher = yes, Paid = no); support = 0.94; count = 289; learning performance = Good; dataset cluster label = Good} and another pattern {(Higher = yes, Paid = no); support = 0.74; count = 255; learning performance = Bad; dataset cluster label = Bad}; their Odds Ratio is then OR((Higher = yes, Paid = no)^Good, (Higher = yes, Paid = no)^Bad) = 289/255 = 1.13.

Definition 4. High Odds-Ratio pattern. Given two patterns X_i^r and X_j^s and a user-specified score threshold σ_OR, pattern X_i^r is a high Odds-Ratio pattern if $OR(X_{i}^{r}, X_{j}^{s})$ is no smaller than σ_OR.

Example 4. Let the score threshold σ_OR = 1.05. From Example 3, {(Paid = no); support = 0.92; count = 317; learning performance = Bad; dataset cluster label = Bad} is a high Odds-Ratio pattern in cluster (learning performance = Bad), because OR((Paid = no)^Bad, (Paid = no)^Good) = 317/293 = 1.08 > σ_OR (1.05).

Furthermore, pattern {(Higher = yes, Paid = no); support = 0.94; count = 289; learning performance = Good; dataset cluster label = Good} is a high Odds-Ratio pattern in cluster (learning performance = Good), because OR((Higher = yes, Paid = no)^Good, (Higher = yes, Paid = no)^Bad) = 289/255 = 1.13 > σ_OR (1.05).
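Definitions 3 and 4 can be sketched in a few lines of Python. The function names are hypothetical, and the sketch simply reproduces the numbers of Examples 3 and 4. As defined above, OR is a ratio of raw counts, so the sketch deliberately does not normalize by cluster size.

```python
def odds_ratio(count_r: int, count_s: int) -> float:
    """Definition 3: OR(X^r, X^s) = count(X^r) / count(X^s) for the same itemset X."""
    return count_r / count_s


def is_high_odds_ratio(count_r: int, count_s: int, sigma_or: float = 1.05) -> bool:
    """Definition 4: X^r is a high Odds-Ratio pattern if OR >= sigma_OR."""
    return odds_ratio(count_r, count_s) >= sigma_or


or_paid = odds_ratio(317, 293)          # (Paid = no): cluster Bad vs cluster Good
or_higher_paid = odds_ratio(289, 255)   # (Higher = yes, Paid = no): Good vs Bad
print(round(or_paid, 2), round(or_higher_paid, 2))                 # 1.08 1.13
print(is_high_odds_ratio(317, 293), is_high_odds_ratio(289, 255))  # True True
```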

Definition 5. Difference pattern. Assume a base pattern X_i^r in cluster R and a comparison pattern X_k^s in cluster S, where X_i^r and X_k^s are both high Odds-Ratio patterns. The difference X_k^s − X_i^r is defined as the difference pattern, which represents the difference between base pattern X_i^r and comparison pattern X_k^s.

Example 5. From Example 4, we have a high Odds-Ratio base pattern (Paid = no) in cluster (learning performance = Bad) and a high Odds-Ratio comparison pattern (Higher = yes, Paid = no) in cluster (learning performance = Good). We can therefore identify the difference pattern (Higher = yes) between pattern (Paid = no) in cluster (learning performance = Bad) and pattern (Higher = yes, Paid = no) in cluster (learning performance = Good).
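Definitions 2 and 5 amount to a proper-subset test followed by a set difference. The minimal Python sketch below assumes that patterns are represented as frozensets of hypothetical "attribute=value" strings and that both patterns have already passed the high Odds-Ratio check of Definition 4.

```python
from typing import FrozenSet, Optional

Pattern = FrozenSet[str]


def difference_pattern(base: Pattern, comparison: Pattern) -> Optional[Pattern]:
    """Return comparison - base if base is a proper subset of comparison
    (Definition 2); otherwise the two patterns do not form a base/comparison pair."""
    if base < comparison:
        return comparison - base
    return None


base = frozenset({"Paid=no"})                       # high-OR base pattern, cluster Bad
comparison = frozenset({"Higher=yes", "Paid=no"})   # high-OR comparison pattern, cluster Good
print(difference_pattern(base, comparison))         # frozenset({'Higher=yes'})
```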

3.2 Framework for identifying learning patterns

To further explain the proposed procedure for identifying the characteristics (changes) that make up learning patterns, Figure 2 illustrates the proposed framework for identifying learning patterns. The procedure can be divided into four phases: (1) instance classification; (2) frequent itemset discovery; (3) pattern identification; and (4) pattern evaluation.

This study discovers difference (change) patterns that identify differences in learning performance among different clusters of students. Given two clusters (Bad and Good) of student learning performance, we first identify base patterns with high Odds-Ratio values in cluster (Bad) and comparison patterns with high Odds-Ratio values in cluster (Good). Then, high Odds-Ratio difference patterns can be identified.

The operating steps of the proposed framework of identifying learning patterns are as follows.

Step 1: Instances classification

The instance classification phase classifies instances (records) according to their labels. Taking students’ learning performance as an example, we classify instances (records) into two class labels (Good and Bad) according to the students’ quiz scores: if a student’s score is above average, the learning performance label is “Good”; otherwise, it is “Bad”. The student instances are thus divided into two clusters (G: Good and B: Bad), as shown in Table 4.
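A minimal sketch of the instance classification step, assuming the quiz scores are held in a plain dictionary keyed by hypothetical student IDs. A student scoring exactly the average falls into “Bad”, following the “otherwise” rule above.

```python
from statistics import mean
from typing import Dict, List


def classify_by_average(scores: Dict[str, float]) -> Dict[str, str]:
    """Label a student 'Good' if the score is above the average, else 'Bad'."""
    avg = mean(scores.values())
    return {sid: ("Good" if s > avg else "Bad") for sid, s in scores.items()}


scores = {"s1": 14, "s2": 9, "s3": 12, "s4": 10}   # hypothetical quiz scores, average 11.25
labels = classify_by_average(scores)

# Split the instances into the two clusters G (Good) and B (Bad).
clusters: Dict[str, List[str]] = {"G": [], "B": []}
for sid, lab in labels.items():
    clusters["G" if lab == "Good" else "B"].append(sid)

print(labels)    # {'s1': 'Good', 's2': 'Bad', 's3': 'Good', 's4': 'Bad'}
print(clusters)  # {'G': ['s1', 's3'], 'B': ['s2', 's4']}
```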

Step 2: Frequent itemsets discovering.

In the mining phase, we discover the frequent itemsets that serve as base and comparison patterns in each cluster of instances. Given a minimum support threshold σ_sup = 0.5, two sets of frequent itemsets are discovered from the student instances belonging to the two clusters (G: Good and B: Bad), as shown in Table 5.
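The text does not state which mining algorithm is used in this phase. As one possibility, the sketch below runs a simple level-wise, Apriori-style search (Agrawal and Srikant, 1994) over the records of a single cluster; the toy records and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def frequent_itemsets(data: List[Itemset], min_sup: float) -> Dict[Itemset, float]:
    """Level-wise (Apriori-style) search for all itemsets with support >= min_sup."""
    n = len(data)
    candidates = [frozenset([item]) for item in sorted({i for t in data for i in t})]
    result: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # Keep the size-k candidates whose support reaches the threshold.
        level: Dict[Itemset, float] = {}
        for cand in candidates:
            sup = sum(1 for t in data if cand <= t) / n
            if sup >= min_sup:
                level[cand] = sup
        result.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets and prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result


# Hypothetical records of the students in one cluster (e.g. Bad).
cluster_bad = [
    frozenset({"Paid=no", "Higher=yes", "Internet=yes"}),
    frozenset({"Paid=no", "Higher=no", "Internet=yes"}),
    frozenset({"Paid=no", "Higher=yes", "Internet=no"}),
    frozenset({"Paid=yes", "Higher=yes", "Internet=yes"}),
]
freq_bad = frequent_itemsets(cluster_bad, min_sup=0.5)
for itemset, sup in freq_bad.items():
    print(sorted(itemset), sup)
```

Running the same function on the Good cluster gives the second set of frequent itemsets; each set is mined independently, so support is always relative to the size of its own cluster.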

Step 3: Patterns identifying.

Step 3.1: Determine base patterns.

A base pattern is a frequent itemset. With a minimum support threshold of 0.5, pattern (Paid = no) with support (0.92) is a base pattern in cluster (Bad); pattern (Paid = no) with support (0.96) is a base pattern in cluster (Good). In addition, pattern (Higher = yes) with support (0.81) is a base pattern in cluster (Bad); pattern (Higher = yes) with support (0.99) is a base pattern in cluster (Good).

Step 3.2: Determine high Odds-Ratio base patterns.

The minimum Odds-Ratio threshold (σ_OR) is set to 1.05. Given a base pattern (Paid = no) with count (317) in cluster (Bad) and a base pattern (Paid = no) with count (293) in cluster (Good), we have Odds-Ratio^Bad_Good(Paid = no) = 317/293 = 1.08 > 1.05 (σ_OR). Therefore, base pattern (Paid = no), with Odds-Ratio value 1.08, is a high Odds-Ratio base pattern in cluster (Bad). That is, instances (students) with pattern (Paid = no) are more frequent in cluster (Bad) than in cluster (Good).

Step 3.3: Determine comparison patterns.

A comparison pattern is also a frequent itemset. We set the minimum support threshold to 0.5. Given the base pattern (Paid = no) in cluster (Bad), we can choose the pattern (Higher = yes, Paid = no) with support 0.94 in cluster (Good), which contains the base pattern, as a comparison pattern.

Step 3.4: Identify high Odds-Ratio comparison patterns.

We set the minimum Odds-Ratio threshold (σ_OR) to 1.05. Given a comparison pattern (Higher = yes, Paid = no) with count (289) in cluster (Good) and the same pattern with count (255) in cluster (Bad), we have Odds-Ratio^Good_Bad(Higher = yes, Paid = no) = 289/255 = 1.13 > 1.05 (σ_OR). Therefore, comparison pattern (Higher = yes, Paid = no), with Odds-Ratio value 1.13 in cluster (Good), is a high Odds-Ratio comparison pattern. That is, instances (students) with pattern (Higher = yes, Paid = no) are more frequent in cluster (Good) than in cluster (Bad).

Step 3.5: Patterns evaluating

Given a high Odds-Ratio base pattern (Paid = no) in cluster (Bad) and a high Odds-Ratio comparison pattern (Higher = yes, Paid = no) in cluster (Good), we can identify the difference pattern (Higher = yes) between comparison pattern (Higher = yes, Paid = no) and base pattern (Paid = no). Moreover, the difference pattern (Higher = yes) has count (278) in cluster (Bad) and count (302) in cluster (Good), so Odds-Ratio^Good_Bad(Higher = yes) = 302/278 = 1.09 > 1.05 (σ_OR). Hence, instances (students) with difference pattern (Higher = yes) are more frequent in cluster (Good) than in cluster (Bad), and difference pattern (Higher = yes), with Odds-Ratio value 1.09, is a high Odds-Ratio difference pattern.
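Steps 3.1 to 3.5 can be combined into one routine. The following is a sketch under stated assumptions, not the authors’ implementation: it assumes the per-cluster counts of the frequent itemsets are already available as dictionaries, pairs every high Odds-Ratio base pattern with every high Odds-Ratio comparison pattern that strictly contains it, and uses the counts quoted in Steps 3.2 to 3.5. The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pattern = FrozenSet[str]
Counts = Dict[Pattern, int]


def cluster_or(p: Pattern, num: Counts, den: Counts) -> float:
    """Odds-Ratio of pattern p: its count in one cluster over its count in the other."""
    return num[p] / den[p]


def find_difference_patterns(bad: Counts, good: Counts, sigma_or: float = 1.05
                             ) -> List[Tuple[Pattern, Pattern, Pattern, float]]:
    shared = set(bad) & set(good)  # only patterns counted in both clusters
    # Steps 3.1-3.2: high Odds-Ratio base patterns in cluster Bad.
    bases = [p for p in shared if cluster_or(p, bad, good) >= sigma_or]
    # Steps 3.3-3.4: high Odds-Ratio comparison patterns in cluster Good.
    comparisons = [p for p in shared if cluster_or(p, good, bad) >= sigma_or]
    found: List[Tuple[Pattern, Pattern, Pattern, float]] = []
    for base in bases:
        for comp in comparisons:
            if not base < comp:  # a comparison pattern must strictly contain the base
                continue
            diff = comp - base
            # Step 3.5: keep the difference only if it is itself high-OR in Good.
            if diff in shared and cluster_or(diff, good, bad) >= sigma_or:
                found.append((base, comp, diff, round(cluster_or(diff, good, bad), 2)))
    return found


# Counts quoted in Steps 3.2 to 3.5.
bad = {frozenset({"Paid=no"}): 317,
       frozenset({"Higher=yes", "Paid=no"}): 255,
       frozenset({"Higher=yes"}): 278}
good = {frozenset({"Paid=no"}): 293,
        frozenset({"Higher=yes", "Paid=no"}): 289,
        frozenset({"Higher=yes"}): 302}

for base, comp, diff, r in find_difference_patterns(bad, good):
    print(sorted(base), "->", sorted(comp), "difference:", sorted(diff), "OR:", r)
# ['Paid=no'] -> ['Higher=yes', 'Paid=no'] difference: ['Higher=yes'] OR: 1.09
```

With these counts, (Higher = yes) in cluster Bad gets 278/302 ≈ 0.92 and is not a base pattern, which is why only one difference pattern is reported.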

3.3 Development of prediction models for student learning performance

Figure 3 illustrates the proposed framework for predicting student learning performance. Several classification techniques are used to construct models that predict students’ learning performance: Support-Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT), Random Forests (RF) and Deep Neural Networks (DNN). These techniques are briefly described below.

Support-Vector Machine (SVM): SVM uses a nonlinear mapping to transform the original training data into a higher dimension, in which it searches for a decision boundary that separates the examples of one class label from those of another. SVM uses a subset of training examples (known as the support vectors) to represent the decision boundary (Joachims, 1998). SVM finds the best separating plane/hyperplane to separate all of the examples of class label (+1) from all of the examples of class label (−1). The learning task of a linear SVM can be formalized as the following constrained optimization problem:

$\min_{w} \frac{\|w\|^{2}}{2}$ subject to $y_{i}(w \cdot x_{i} + b) \geq 1, \; i = 1, 2, \dots, N$

where x_i is the ith example, and w and b are the parameters of the model. By convention, y_i ∈ {−1, 1} denotes the class label of x_i.
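The optimization problem above is normally handed to a quadratic-programming solver. The sketch below only evaluates it: for a candidate (w, b) it checks the constraints y_i(w · x_i + b) ≥ 1 and computes the objective ‖w‖²/2. The four two-dimensional training points and the candidate parameters are hypothetical.

```python
from typing import List, Sequence

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def objective(w: Vec) -> float:
    """SVM objective ||w||^2 / 2."""
    return dot(w, w) / 2


def feasible(w: Vec, b: float, X: List[Vec], y: List[int]) -> bool:
    """True if every example satisfies y_i (w . x_i + b) >= 1."""
    return all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))


def predict(w: Vec, b: float, x: Vec) -> int:
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical, linearly separable training data with labels in {-1, +1}.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]

for w, b in [((1.0, 1.0), 0.0), ((0.5, 0.5), 0.0), ((0.25, 0.25), 0.0)]:
    print(w, b, feasible(w, b, X, y), objective(w))
```

The first two candidates satisfy all constraints, and the one with the smaller ‖w‖ (the wider margin) is preferred; for these symmetric points, w = (0.5, 0.5), b = 0 is the optimum, since the closest opposite-class points (1, 1) and (−1, −1) lie exactly on the margin. The third candidate has an even smaller ‖w‖ but violates the constraints.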

Multi-Layer Perceptron (MLP): An artificial neural network (ANN) is an abstract computational model of the human brain. The architecture of an ANN is defined by the characteristics of its nodes and of their connectivity in the network (Haykin and Lippmann, 1994). The perceptron is the simplest model in the ANN family. MLPs can learn powerful nonlinear transformations: with enough hidden units, they can represent arbitrarily complex but smooth functions. In a perceptron, each input node is connected via a weighted link to the output node. The output of a perceptron model can be expressed as follows (Tan et al., 2006):

$\hat{y} = \mathrm{sign}(w_{d} x_{d} + w_{d-1} x_{d-1} + \dots + w_{1} x_{1} + w_{0} x_{0}) = \mathrm{sign}(w \cdot x)$

where w₀, w₁, …, w_d are the weights of the input links, x₀, x₁, …, x_d are the input attribute values, w is the weight vector and x is the input vector.

The sign function acts as the activation function of the output neuron: it outputs +1 if its argument is positive and −1 if its argument is negative. An artificial neural network has a more complex structure than a perceptron. The goal of the MLP learning algorithm is to determine a set of weights w that minimizes the total sum of squared errors, $E(w) = \frac{1}{2} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}$, where $(y_{i} - \hat{y}_{i})$ is the prediction error.

Using the gradient descent method, the weight update formula can be written as $w_{j} \leftarrow w_{j} - l \frac{\partial E(w)}{\partial w_{j}}$, where l is the learning rate.
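The following Python sketch applies this update rule to a single unit. Because sign() is not differentiable, the gradient is taken on the linear output w · x (the classic delta rule) and sign() is applied only when predicting; a constant input x₀ = 1 is assumed so that w₀ acts as a bias. The AND-style training data are hypothetical.

```python
from typing import List

Vec = List[float]


def sign(v: float) -> int:
    """+1 for a positive argument, -1 otherwise."""
    return 1 if v > 0 else -1


def train_linear_unit(X: List[Vec], y: List[int], lr: float = 0.05,
                      epochs: int = 200) -> Vec:
    """Batch gradient descent on E(w) = 1/2 * sum_i (y_i - w . x_i)^2."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sum(wj * xj for wj, xj in zip(w, xi))  # y_i minus the linear output
            for j, xj in enumerate(xi):
                grad[j] -= err * xj                            # dE/dw_j = -sum_i err_i * x_ij
        w = [wj - lr * gj for wj, gj in zip(w, grad)]          # w_j <- w_j - l * dE/dw_j
    return w


# Hypothetical AND-style data; the first component is the constant input x0 = 1.
X = [[1.0, -1.0, -1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, 1.0, 1.0]]
y = [-1, -1, -1, 1]
w = train_linear_unit(X, y)
preds = [sign(sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]
print([round(wj, 3) for wj in w], preds)  # [-0.5, 0.5, 0.5] [-1, -1, -1, 1]
```

An MLP applies the same gradient-descent idea to every layer, with the gradients of the hidden layers obtained by backpropagation.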

Decision Tree (DT): C4.5, the successor of ID3 (Quinlan, 1986), became a benchmark to which newer supervised learning algorithms are often compared. In C4.5, the attribute with the maximum gain ratio is selected as the splitting attribute. The gain ratio is defined as

$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_{A}(D)}$

where Gain(A), the information gain, is defined as the difference between the original information requirement (i.e. based only on the proportions of classes) and the new requirement (i.e. obtained after partitioning on attribute A) for a dataset D of class-labeled tuples. That is, Gain(A) = Info(D) − Info_A(D), where Info(D) is also known as the entropy of D.

$\mathrm{Info}_{A}(D)$ is the expected information required to classify a tuple from D based on the partitioning by A. Furthermore, the information gain is normalized using a split information value, $\mathrm{SplitInfo}_{A}(D)$, defined analogously to Info(D) as

$\mathrm{SplitInfo}_{A}(D) = -\sum_{j=1}^{v} \frac{|D_{j}|}{|D|} \times \log_{2}\left(\frac{|D_{j}|}{|D|}\right)$

where v is the number of partitions D₁, …, D_v produced by A and $\frac{|D_{j}|}{|D|}$ is the weight of the jth partition.
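The gain-ratio computation can be sketched directly from these definitions. The sketch assumes the standard forms Info(D) = −Σᵢ pᵢ log₂ pᵢ over the class proportions pᵢ and Info_A(D) = Σⱼ (|D_j|/|D|) · Info(D_j); the attribute values and labels are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def info(labels: Sequence[str]) -> float:
    """Info(D): entropy of the class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(attr: Sequence[str], labels: Sequence[str]) -> float:
    """GainRatio(A) = Gain(A) / SplitInfo_A(D) for one categorical attribute A."""
    n = len(labels)
    parts: Dict[str, List[str]] = {}
    for a, lab in zip(attr, labels):
        parts.setdefault(a, []).append(lab)                               # partitions D_1..D_v
    info_a = sum(len(p) / n * info(p) for p in parts.values())            # Info_A(D)
    gain = info(labels) - info_a                                          # Gain(A)
    split = -sum(len(p) / n * log2(len(p) / n) for p in parts.values())   # SplitInfo_A(D)
    return gain / split if split > 0 else 0.0


higher = ["yes", "yes", "yes", "no"]      # hypothetical attribute values
g1l = ["Good", "Good", "Bad", "Bad"]      # hypothetical class labels
print(round(gain_ratio(higher, g1l), 4))  # 0.3837
```

Dividing by SplitInfo penalizes attributes that split the data into many small partitions, which plain information gain tends to favor.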

Random Forests (RF): Random forests is an ensemble method designed for decision tree classifiers. It combines the predictions made by multiple decision trees, where each tree is generated based on the values of an independent set of random vectors. It has been proven theoretically that, when the number of trees is sufficiently large, the upper bound on the generalization error of random forests converges to the following expression (Tan et al., 2006):

$\text{Generalization error} \leq \frac{\hat{\rho}(1 - s^{2})}{s^{2}}$,

where $\hat{\rho}$ is the average correlation among the trees and s is a quantity that measures the “strength” of the tree classifiers. The strength of a set of classifiers is defined as their average performance, where performance is measured probabilistically in terms of the classifier’s margin: $M(X, Y) = P(\hat{Y}_{\theta} = Y) - \max_{Z \neq Y} P(\hat{Y}_{\theta} = Z)$, where $\hat{Y}_{\theta}$ is the predicted class of X according to a classifier built from some random vector θ. The higher the margin, the more likely it is that the classifier correctly predicts a given example X.
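A small Python sketch of the two quantities above, under the common assumption that the probabilities in the margin are estimated by the fraction of trees voting for each class; s would then be estimated as the average margin over examples. The votes and the values of ρ̂ and s are hypothetical.

```python
from collections import Counter
from typing import Sequence


def margin(votes: Sequence[str], true_label: str) -> float:
    """M(X, Y) estimated from tree votes: P(correct class) - max P(any wrong class)."""
    n = len(votes)
    counts = Counter(votes)
    p_true = counts[true_label] / n
    p_wrong = max((c / n for lab, c in counts.items() if lab != true_label), default=0.0)
    return p_true - p_wrong


def error_bound(rho_hat: float, s: float) -> float:
    """Upper bound rho_hat * (1 - s^2) / s^2 on the generalization error."""
    return rho_hat * (1 - s ** 2) / s ** 2


votes = ["Good"] * 7 + ["Bad"] * 3        # hypothetical votes of 10 trees for one student
print(round(margin(votes, "Good"), 2))    # 0.4
print(round(error_bound(0.2, 0.6), 4))    # 0.3556
```

The bound shrinks as the trees become less correlated (smaller ρ̂) or stronger (larger s).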

Deep Neural Networks (DNN): A deep neural network (DNN) can be considered a conventional multi-layer perceptron (MLP) with many hidden layers (hence “deep”). The DNN parameters are optimized by backpropagation using stochastic gradient descent. A DNN, i.e. an (L+1)-layer MLP, is used to model the posterior probability $P_{s|o}(s \mid o)$ of a hidden Markov model (HMM) tied state s given an observation vector o. The first L layers, l = 0, …, L−1, are hidden layers that model the posterior probability of hidden nodes h^l given the input vector v^l from the previous layer, while the top layer L computes the posterior probability of all tied states using softmax (Pan et al., 2012):

$P^{l}_{h_{j}|v}(h_{j}^{l} \mid v^{l}) = \frac{1}{1 + e^{-z_{j}^{l}(v^{l})}}, \quad 0 \leq l < L$

$P^{L}_{s|v}(s \mid v^{L}) = \mathrm{softmax}_{s}(z^{L}(v^{L}))$

$z^{l}(v^{l}) = (W^{l})^{T} v^{l} + \alpha^{l}$

where W^l and α^l denote the weight matrix and bias vector of layer l, while $h_{j}^{l}$ and $z_{j}^{l}(v^{l})$ denote the j-th component of the hidden node vector h^l and of its activation $z^{l}(v^{l})$, respectively.
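To show these layer equations in action, the following dependency-free Python sketch runs a forward pass: sigmoid hidden layers followed by a softmax top layer, each computing z^l(v^l) = (W^l)^T v^l + α^l. It is only an illustration under stated assumptions: the layer sizes and random weights are hypothetical, training by backpropagation is not shown, and the softmax outputs are read as probabilities of the learning performance labels (e.g. Good and Bad) rather than HMM tied states.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # W[i][j]: weight from unit i of layer l to unit j of layer l+1


def affine(W: Matrix, v: Vector, a: Vector) -> Vector:
    """z = (W)^T v + a, with W stored as an (inputs x outputs) matrix."""
    return [sum(W[i][j] * v[i] for i in range(len(v))) + a[j] for j in range(len(a))]


def sigmoid(z: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-zj)) for zj in z]


def softmax(z: Vector) -> Vector:
    m = max(z)                              # shift by the max for numerical stability
    e = [math.exp(zj - m) for zj in z]
    total = sum(e)
    return [ej / total for ej in e]


def dnn_forward(x: Vector, weights: List[Matrix], biases: List[Vector]) -> Vector:
    """Sigmoid hidden layers l = 0..L-1, then a softmax top layer L."""
    v = x
    for W, a in zip(weights[:-1], biases[:-1]):
        v = sigmoid(affine(W, v, a))
    return softmax(affine(weights[-1], v, biases[-1]))


random.seed(0)
sizes = [5, 8, 8, 2]   # hypothetical: 5 inputs, two hidden layers, 2 output labels
weights = [[[random.gauss(0.0, 0.5) for _ in range(n_out)] for _ in range(n_in)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [[0.0] * n_out for n_out in sizes[1:]]
x = [random.gauss(0.0, 1.0) for _ in range(sizes[0])]
probs = dnn_forward(x, weights, biases)
print(probs)   # two probabilities that sum to 1
```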

4. Experimental results

4.1 Data set description

Data from a Portuguese language course were used to evaluate the performance of the proposed approach. The data were obtained from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/), which provides two datasets; the Portuguese language course dataset used here contains 649 instances (records) with 33 attributes. It was provided by Paulo Cortez, University of Minho, Guimarães, Portugal, http://www3.dsi.uminho.pt/pcortez. The dataset contains three grades: G1 (first period grade), G2 (second period grade) and G3 (final period grade). This study uses G1 (first period grade) to represent the learning performance of students.

In the data preprocessing phase, we calculate the average G1 (first period grade) of all students and use it to separate students into two clusters. Students whose G1 is lower than the average form the first cluster and are classified as having bad learning performance; students whose G1 is higher than the average form the second cluster and are classified as having good learning performance. The average G1 is 11.40: 343 students have a G1 higher than 11.40, and 306 students have a G1 lower than 11.40.

4.2 Identification of learning patterns

4.2.1 Discovery of frequent itemsets

The minimum support was varied from 0.5 to 0.9, and Tables 6 and 7 show the numbers of frequent itemsets for the different minimum supports σ_sup. For the subsequent analysis, the minimum support was set to 0.75. Table 8 shows the frequent itemsets of G1 (first period grade) in the learning performance cluster (G1 = Bad).

4.2.2 Identification of high odds-ratio base patterns

Given the frequent itemsets in Table 8, we calculate the Odds-Ratio values of base patterns in cluster (G1 = Bad), as shown in Table 9. The minimum Odds-Ratio threshold (σ_OR) is set to 1.05. Patterns (Paid = no, Schoolsup = no), with Odds-Ratio 1.045, and (Higher = yes), with Odds-Ratio 0.921, are not high Odds-Ratio base patterns because their Odds-Ratio values are smaller than the threshold (1.05). Finally, six high Odds-Ratio base patterns are identified in cluster (G1 = Bad), as shown in Table 10.

4.2.3 Identification of high odds-ratio comparison patterns

After the high Odds-Ratio base patterns are determined, the corresponding comparison patterns can be identified. Given the minimum support σ_sup (0.75) and the minimum Odds-Ratio threshold σ_OR (1.05), high Odds-Ratio comparison patterns are identified, as shown in Table 11.

4.2.4 Identification of high odds-ratio difference patterns

After the high Odds-Ratio base patterns and high Odds-Ratio comparison patterns are determined, we identify the difference patterns by comparing each high Odds-Ratio base pattern with its high Odds-Ratio comparison patterns. Table 12 shows the difference patterns between the high Odds-Ratio base patterns and the high Odds-Ratio comparison patterns shown in Table 11. Given the minimum Odds-Ratio threshold σ_OR (1.05), we identify the high Odds-Ratio difference patterns shown in Table 13.

4.2.5 Statistical hypothesis testing

This section describes the statistical tests used to investigate which factors (base patterns or comparison patterns) better predict a good learning outcome (G1L = G). Therefore, this study hypothesizes the following:

H1. Base patterns perform better in prediction of good learning outcome (G1L = G) than comparison patterns.

H2. There are no interaction terms in comparison patterns when predicting good learning outcome (G1L = G).

Logistic regression (logit) is generally well suited for describing and testing hypotheses about relationships between a categorical outcome variable and one or more categorical or continuous predictor variables (Peng et al., 2002). A logistic regression equation is used to predict the class attribute (“G1L = G”) given the condition of base patterns (such as “x₁ = (Paid = no)”); this equation for base patterns is defined as Eq. (1). Another logistic regression equation is defined to predict the class attribute (“G1L = G”) given the condition of comparison patterns (such as “x₁ = (Paid = no), x₂ = (Higher = yes) and x₃ = (Internet = yes)”). Because interactions between the terms of comparison patterns should also be considered, the logistic regression equation for comparison patterns without interaction terms is defined in Eq. (2), and the equation for comparison patterns with interaction terms is defined in Eq. (3).

(1) $\mathrm{logit}_{B} = \mathrm{logit}(P(G1L = \text{"G"})) = \alpha + \beta_{1} x_{1}$

(2) $\mathrm{logit}_{C} = \mathrm{logit}(P(G1L = \text{"G"})) = \alpha + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3}$

(3) $\mathrm{logit}_{BC} = \mathrm{logit}(P(G1L = \text{"G"})) = \alpha + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} + \beta_{4} x_{1} x_{2} + \beta_{5} x_{2} x_{3} + \beta_{6} x_{1} x_{3}$

where α is the intercept and β₁, …, β₆ are the estimated coefficients of their respective variables.

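Equations (1) and (2) model the learning outcome for the two kinds of pattern (base and comparison patterns), and Eq. (1) is nested in Eq. (2). We therefore test Hypothesis 1 with the difference in deviance between Eqs. (1) and (2), i.e. a likelihood-ratio test. Likewise, Eq. (2) models comparison patterns without interaction terms and Eq. (3) models them with interaction terms, so Hypothesis 2 is tested with the difference in deviance between Eqs. (2) and (3). Under the null hypothesis, each deviance difference approximately follows a χ² distribution whose degrees of freedom equal the number of added parameters (two for Hypothesis 1 and three for Hypothesis 2).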
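A minimal sketch of the deviance (likelihood-ratio) test described above, assuming the deviances of the fitted nested models are already available. The χ² upper-tail probability uses the standard closed forms for integer degrees of freedom; the function names are hypothetical, and in practice a statistics package would usually supply both the deviances and the p-value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df,
    using Q(1) = erfc(sqrt(x/2)), Q(2) = exp(-x/2) and the recurrence
    Q(k) = Q(k-2) + (x/2)^((k-2)/2) * exp(-x/2) / Gamma(k/2)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        k += 2
        q += (x / 2) ** ((k - 2) / 2) * math.exp(-x / 2) / math.gamma(k / 2)
    return q


def deviance_test(dev_reduced, dev_full, extra_params):
    """Likelihood-ratio test of nested logit models: the drop in deviance is
    compared with a chi-square distribution with df = number of added terms."""
    stat = dev_reduced - dev_full
    return stat, chi2_sf(stat, extra_params)
```

For Hypothesis 1 the call would be `deviance_test(deviance_eq1, deviance_eq2, 2)`, and for Hypothesis 2 `deviance_test(deviance_eq2, deviance_eq3, 3)`, where the deviance arguments are placeholders for the values of the fitted models.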

For example, consider the base pattern ("x₁ = (Paid = no)") and the comparison pattern ("x₁ = (Paid = no), x₂ = (Higher = yes) and x₃ = (Internet = yes)"). Testing the difference in deviance between Eqs. (1) and (2) rejects Hypothesis 1 (p < 0.0001). We then test whether the comparison pattern requires interaction terms when predicting the learning outcome by comparing the deviances of Eqs. (2) and (3). Hypothesis 2 is not rejected (p = 0.9255); that is, there is no evidence of significant interaction among the terms of the comparison pattern ("x₁ = (Paid = no), x₂ = (Higher = yes) and x₃ = (Internet = yes)") when predicting good learning outcome (G1L = G).

Rejecting Hypothesis 1 indicates that the comparison pattern ("x₁ = (Paid = no), x₂ = (Higher = yes) and x₃ = (Internet = yes)") predicts good learning outcome ("G1L = G") better than the base pattern ("x₁ = (Paid = no)") alone. Failing to reject Hypothesis 2 indicates that no significant interaction terms are needed for this comparison pattern when predicting good learning outcome (G1L = G). The results of statistical hypothesis testing for all base and comparison patterns are shown in Table 14.

Table 14 shows that the results of statistical hypothesis testing are consistent with the patterns identified by this study; that is, the identified difference patterns capture the difference between the base patterns and the comparison patterns. Therefore, the proposed approach can help identify valuable characteristics (difference patterns) in students' profiles that may point to cause-and-effect relationships.
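Reading off Table 12, each difference pattern appears to consist of the items of the comparison pattern that are absent from its base pattern. The sketch below expresses that reading as a set difference; it is an interpretation of the tables, not the authors' code, and the function name is hypothetical.

```python
def difference_pattern(base, comparison):
    """Items of the comparison pattern that do not occur in the base pattern,
    in the comparison pattern's order."""
    base_items = set(base)
    if not base_items <= set(comparison):
        raise ValueError("the base pattern must be contained in the comparison pattern")
    return [item for item in comparison if item not in base_items]


# Row 1 of Table 12
print(difference_pattern(["Paid = no"],
                         ["Higher = yes", "Internet = yes", "Paid = no"]))
# ['Higher = yes', 'Internet = yes']
```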

4.3 DNN for learning outcome prediction

The Portuguese language course dataset contains 649 instances. The data file was preprocessed to suit algorithms that cannot handle categorical variables: several indicator variables were added, and several ordinal categorical attributes were coded as integers. The prediction models are constructed with five classification techniques: Deep Neural Networks (DNN), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (DT) and Random Forests (RF).
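The preprocessing described above can be sketched as follows. The grouping of attributes is hypothetical and uses only columns that appear in Table 4: ordinal attributes (Medu, Fedu) are kept as integers, yes/no attributes become 0/1, and a nominal attribute (Mjob) is expanded into one indicator column per category. The list of Mjob categories is an assumption based on the values seen in Table 4 plus "teacher".

```python
# Hypothetical attribute grouping, using columns that appear in Table 4.
ORDINAL = ["Medu", "Fedu"]   # ordered levels, coded as integers
BINARY = ["Higher"]          # "yes" -> 1, "no" -> 0
NOMINAL = {"Mjob": ["at_home", "health", "other", "services", "teacher"]}


def encode(record):
    """Turn one raw student record into a purely numeric feature dict."""
    row = {}
    for attr in ORDINAL:
        row[attr] = int(record[attr])
    for attr in BINARY:
        row[attr] = 1 if record[attr] == "yes" else 0
    for attr, levels in NOMINAL.items():
        if record[attr] not in levels:
            raise ValueError(f"unexpected {attr} value: {record[attr]!r}")
        for level in levels:
            # One indicator column per category, e.g. "Mjob_health"
            row[f"{attr}_{level}"] = 1 if record[attr] == level else 0
    return row


# Student 2 of Table 4 (only the columns used above)
print(encode({"Medu": "4", "Fedu": "2", "Mjob": "health", "Higher": "yes"}))
```

Keeping ordinal attributes as single integer columns preserves their order, whereas nominal attributes get indicator columns so that no artificial order is imposed on them.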

The DNN model is configured as follows. The input layer has 20 cells with the "ReLU" activation function. It is followed by two hidden layers, both with "ReLU" activation: hidden layer #1 has 256 cells and hidden layer #2 has 32 cells. The output layer has a single cell with a "sigmoid" activation function, which produces the model's output. The remaining parameters are the "adam" optimizer, a batch size of 10, 100 epochs and a validation split of 0.1.
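The layer stack above can be sketched in plain Python to show its shape and size. This sketch assumes the 20-cell "input layer" is a first fully connected layer applied to the encoded features; the number of input features is not stated in the text, so it is left as a parameter. Training is not shown: in Keras the same stack would typically be a `Sequential` model of `Dense` layers, compiled with `optimizer="adam"` and (an assumption) a binary cross-entropy loss, then trained with `model.fit(..., batch_size=10, epochs=100, validation_split=0.1)`.

```python
import math
import random

# (units, activation) per fully connected layer, as described in the text
LAYERS = [(20, "relu"), (256, "relu"), (32, "relu"), (1, "sigmoid")]


def param_count(n_features):
    """Number of weights plus biases in the fully connected stack."""
    total, fan_in = 0, n_features
    for units, _ in LAYERS:
        total += fan_in * units + units
        fan_in = units
    return total


def init_weights(n_features, seed=0):
    """Random weights (scaled by 1/sqrt(fan_in)) and zero biases per layer."""
    rng = random.Random(seed)
    weights, fan_in = [], n_features
    for units, _ in LAYERS:
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(fan_in)) for _ in range(fan_in)]
             for _ in range(units)]
        weights.append((w, [0.0] * units))
        fan_in = units
    return weights


def forward(x, weights):
    """One forward pass; the sigmoid output is read as P(good performance)."""
    h = x
    for (w, b), (_, act) in zip(weights, LAYERS):
        z = [sum(wi * hi for wi, hi in zip(row, h)) + bi for row, bi in zip(w, b)]
        if act == "relu":
            h = [max(0.0, v) for v in z]
        else:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return h[0]


print(param_count(30))  # 14253
```

For n input features the stack has 20n + 13,653 trainable parameters, so most of the capacity sits in the 20 → 256 → 32 hidden layers rather than in the input layer.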

First, the five approaches (DNN, SVM, MLP, DT and RF) are compared on four prediction performance metrics (Precision, Recall, F1-score and Accuracy). Table 15 reports the results averaged over 10-fold cross-validation. The DNN approach, with Precision (0.90), TPR/Recall (0.90), F₁ (0.90) and Accuracy (0.90), outperforms the other four methods on every metric. The DT approach performs worst overall, with the lowest TPR/Recall (0.54), F₁ (0.55) and Accuracy (0.59); its Precision (0.57) is only slightly above that of MLP (0.56).
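The evaluation procedure above can be sketched as follows: per-fold Precision, Recall, F₁ and Accuracy, averaged over k shuffled folds. This is a minimal sketch, assuming "Good" is coded as the positive class 1 and that metrics are averaged per fold; the paper does not state whether it used per-class, macro or weighted averages, and `fit_predict` is a hypothetical placeholder for any of the five classifiers.

```python
import random


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy, treating label 1 ("Good") as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and average each metric."""
    per_fold = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        y_pred = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                             [X[i] for i in test_idx])
        per_fold.append(binary_metrics([y[i] for i in test_idx], y_pred))
    return {m: sum(r[m] for r in per_fold) / len(per_fold) for m in per_fold[0]}
```

With 649 instances and k = 10, the folds hold 64 or 65 students each. Because F₁ is averaged per fold, the reported F₁ need not equal the harmonic mean of the averaged Precision and Recall exactly, although for Table 15 the two agree after rounding.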

Second, the area under the curve (AUC) of the Receiver Operating Characteristic (ROC) curve and of the Precision-Recall Curve (PRC) are key metrics for evaluating a binary classifier across different threshold values. We compare the five methods (DNN, SVM, MLP, DT and RF) on both AUC values, averaged over 10-fold cross-validation; the curves are shown in Figures 4–8.

The figures show that the DNN approach has the best predictive performance, with an AUC of 0.9027 for ROC and 0.9169 for PRC. In contrast, the DT approach has markedly lower AUC values (0.5888 for ROC and 0.6637 for PRC), indicating the weakest predictive capability of the five methods.
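Both AUC values can be computed from a classifier's scores without drawing the curves. The sketch below uses the rank (Mann-Whitney) form of the ROC AUC and average precision as one common estimator of the area under the precision-recall curve; the paper does not say how its PRC area was computed, and trapezoidal interpolation, another common choice, can give somewhat different values. Function names are hypothetical, and label 1 is assumed to be the positive ("Good") class.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count 1/2); this equals the area under the ROC curve."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision at the rank of each positive (tied scores keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


labels, probs = [1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]
print(roc_auc(labels, probs))  # 0.75
```

A random classifier scores about 0.5 on ROC AUC, while its PRC area is about the proportion of positives, which is why the two AUC columns in Table 15 are on different baselines.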

4.4 Management implications for education

These learning patterns (base, comparison and difference patterns) mined from education data can be used to identify student behavior patterns and provide assistance that improves learning performance. The experimental results in Tables 13 and 14 show that the difference pattern (Higher = yes, Internet = yes), which has the highest odds ratio (1.233), is the major difference between the two learning performance values (G1: Bad and G1: Good). Therefore, teachers should provide assistance to students who want to pursue higher education and have Internet access at home, since these students could still achieve high learning performance (G1: Good).

After identifying learning performance patterns, we suggest applying prediction models to anticipate students' learning performance in advance. The experimental results in Section 4.3 show that DNN performs better than the other methods (SVM, MLP, DT and RF) on the prediction performance metrics (Precision, Recall, F1-score and Accuracy). Therefore, deep learning methods such as DNN are recommended as classifiers for predicting students' learning performance labels.

5. Conclusion

This study makes several contributions, including a new framework for identifying learning patterns and predicting learning performance. First, the learning patterns identification module discovers difference patterns that identify differences in learning performance among clusters of students. Second, deep learning prediction models (DNN) are used to construct models for predicting students' learning performance. Finally, we integrate the two modules into a framework to forecast students' learning performance.

Experimental results from survey data indicate that the proposed learning patterns identification module can identify patterns of interest and valuable difference (change) patterns from students' profiles, and statistical hypothesis testing is used to verify the results generated by the proposed approach. In addition, the proposed learning performance prediction module, which uses a DNN as the classifier, outperforms traditional machine learning techniques on the prediction performance metrics.

The proposed model exhibits some weaknesses and limitations. Firstly, its evaluation relies solely on a single dataset, potentially restricting the model’s generalizability. Secondly, we have not assessed the performance impact of integrating feature filter or wrapper methods with machine learning algorithms, which could further enhance the model’s efficacy.

Several issues remain to be addressed in future work. First, this study focuses on discovering difference patterns between clusters defined by learning performance labels. In some applications, education administrators may be interested in other classification labels, which would call for algorithms designed for those specific aims. Second, integrating further deep learning prediction techniques could provide better learning assistance; for example, Chang et al. (2023) integrated feature selection methods to improve prediction performance. Third, graphs or charts that depict predicted outcomes would make the results clearer and easier for educators and administrators to interpret. For instance, Figure 9 visualizes our experimental results using the Decision Tree (DT) method; although DT is not the best-performing technique we tested, its visual output illustrates the findings in an accessible way. Fourth, many efficient algorithms for mining frequent itemsets have been proposed that allow models to scale to large datasets, notably H-mine (Pei et al., 2001) and Index-BitTableFI (Song et al., 2008). Integrating these algorithms into our framework is an important avenue for future research to improve its scalability and performance on large datasets. Finally, the framework itself can be refined to improve its efficiency and to meet emerging needs and challenges in educational applications [1].

The authors would like to thank Dr Ruo-ping Han for Statistical Analysis. The research was supported by the National Science and Technology Council of the Republic of China under the grants NSTC 112-2410-H-018-044 and NSTC 112-2410-H-194-032-MY2.

Notes

1. We would like to extend our sincere gratitude to the anonymous reviewers for their insightful comments and suggestions. Their constructive feedback has played a pivotal role in refining and improving the quality of this manuscript.

Data availability: The datasets analyzed during the current study are available in the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/.

Conflict of interest statement: There is no conflict of interest in this study.

Figure 1

Framework of this study

[Figure omitted. See PDF]

Figure 2

Framework for identifying learning patterns

[Figure omitted. See PDF]

Figure 3

Framework for predicting students’ learning performance

[Figure omitted. See PDF]

Figure 4

AUC of ROC and PRC in DNN

[Figure omitted. See PDF]

Figure 5

AUC of ROC and PRC in SVM

[Figure omitted. See PDF]

Figure 6

AUC of ROC and PRC in MLP

[Figure omitted. See PDF]

Figure 7

AUC of ROC and PRC in RF

[Figure omitted. See PDF]

Figure 8

AUC of ROC and PRC in decision tree

[Figure omitted. See PDF]

Figure 9

Visualization of the result by the DT approach

[Figure omitted. See PDF]

Table 1

Overview of literature in frequent itemsets mining

Studies	Type of itemsets	Contribution
Agrawal and Srikant (1994)	Frequent itemset	Proposed Apriori algorithm for mining frequent itemset and association rules
Chen and Li (2008)	Frequent itemset	Proposed lattice-based frequent itemsets mining algorithm for improving the performance in support counting
Duong et al. (2014)	Constraint-based frequent itemset	Proposed an efficient method for mining frequent itemsets with double constraints
Han et al. (2000)	Frequent itemset	Proposed an efficient method for mining frequent itemsets without candidate generation
Lin et al. (2011)	Frequent itemset	Proposed the IFP-growth (improved FP-growth) algorithm to improve the performance of FP-growth in mining frequent itemsets
Liu et al. (2012)	Maximal frequent itemsets	Proposed maximal frequent itemsets mining algorithm to improve storage efficiency of data structure and time efficiency
Sarath and Ravi (2013)	Frequent itemset	Proposed a binary particle swarm optimization (BPSO) based association rule mining algorithm for generating the best M rules

Source(s): By authors

Table 2

Overview of EDM research

Works	Techniques	Aim
Buldu and Üçgün (2010)	Association rule	Discover the rules that identify relations between the courses that the students failed
Şen et al. (2012)	Classification	Predicting secondary education placement test results and identifying the most important predictors
Sen and Ucar (2012)	Classification	Comparing the achievements of students studying in distance education with those in regular education
Romero et al. (2013)	Classification, Clustering	Improving prediction of students’ final performance
Natek and Zwilling (2014)	Classification	Predicting the success rate of students enrolled in their courses
Okoye et al. (2014)	Association rule	Discovering user interaction patterns within learning processes
Gupta et al. (2015)	Classification	Identifying knowledge indicators in higher education organization
Kaur et al. (2015)	Classification	Identifying the slow learners among students and presenting that knowledge
Santos and Boticario (2015)	Association rules, Decision tree	Identifying indicators for course success
Xing et al. (2015)	Genetic Programming	Predicting participation-based student final performance
Angeli et al. (2017)	Association rules mining	Educational technologists' use of association rules mining for guiding and monitoring school-based technology integration efforts
Romero and Ventura (2020)	Updated survey	Reviewing the main publications, key milestones, the knowledge discovery cycle, main educational environments and specific tools in this research area

Source(s): By authors

Table 3

Comparison of this study with previous works

Works	Type of itemsets	Contribution
This study	Frequent itemsets	Identify learning difference from classified student’s profiles
Buldu and Üçgün (2010)	Frequent itemsets	Discover rules that identify the relation between courses that the students failed
Santos and Boticario (2015)	Frequent itemsets	Identifying indicators that could refer to course success

Source(s): By authors

Table 4

Some student instances for learning

ID	Sex	Medu	Fedu	Mjob	Fjob	Guardian	schoolsup	famsup	Higher	Score
1	F	1	1	at_home	other	mother	yes	no	yes	G
2	F	4	2	health	services	mother	no	yes	yes	G
3	M	4	3	services	other	mother	no	yes	yes	G
4	M	2	2	other	other	mother	no	no	yes	G
5	M	3	2	services	other	mother	no	yes	yes	G
6	F	2	1	services	other	mother	no	yes	no	B
7	F	2	1	at_home	other	mother	no	yes	no	B
8	M	2	1	services	services	mother	no	no	no	B
9	M	2	1	services	other	mother	no	no	no	B
10	M	2	2	services	services	mother	no	yes	no	B

Source(s): By authors

Table 5

Frequent itemsets discovered from clusters

Learning performance = bad		Learning performance = good
Itemset	sup	Itemset	sup
Paid = no	0.92	Higher = yes	0.99
Pstatus = T	0.88	Paid = no	0.96
Schoolsup = no	0.88	Higher = yes, Paid = no	0.94
Paid = no, Schoolsup = no	0.82	Schoolsup = no	0.91
Paid = no, Pstatus = T	0.81	Higher = yes, Schoolsup = no	0.90
Higher = yes	0.81	Paid = no, Schoolsup = no	0.88
Nursery = yes	0.79	Pstatus = T	0.87
Pstatus = T, Schoolsup = no	0.78	Higher = yes, Paid = no, Schoolsup = no	0.86
Higher = yes, Paid = no	0.74	Higher = yes, Pstatus = T	0.86
Nursery = yes, Paid = no	0.73	Paid = no, Pstatus = T	1.83

Note(s): Bad and good

Source(s): By authors

Table 6

Frequent itemsets vs minimum support

Support	0.90	0.85	0.80	0.75	0.70	0.65	0.60	0.55	0.50
L1	1	3	4	5	7	8	9	11	12
L2	0	0	2	3	6	13	19	25	33
L3	0	0	0	0	1	2	8	18	32
Total	1	3	6	8	14	23	36	54	77

Note(s): G1 = Bad

Source(s): By authors

Table 7

Frequent itemsets vs minimum support

Support	0.90	0.85	0.80	0.75	0.70	0.65	0.60	0.55	0.50
L1	3	4	6	6	7	8	10	11	13
L2	1	4	7	11	14	19	25	34	45
L3	0	1	2	6	11	18	28	40	63
Total	4	9	15	23	32	45	63	85	121

Note(s): G1 = Good

Source(s): By authors

Table 8

Frequent itemsets

No	Itemset	Support	Count	Odds-ratio
1	Paid = no	0.924	317	1.082
2	Pstatus = T	0.883	303	1.139
3	Schoolsup = no	0.880	302	1.082
4	Paid = no, Schoolsup = no	0.816	280	1.045
5	Paid = no, Pstatus = T	0.813	279	1.094
6	Higher = yes	0.810	278	0.921
7	Nursery = yes	0.790	271	1.084
8	Pstatus = T, Schoolsup = no	0.781	268	1.107

Note(s): G1 = Bad

Source(s): By authors

Table 9

Base patterns in cluster

No	Itemset	Support	Count	Odds-ratio
1	Paid = no	0.924	317	1.082
2	Pstatus = T	0.883	303	1.139
3	Schoolsup = no	0.880	302	1.082
4	Paid = no, Schoolsup = no	0.816	280	1.045
5	Paid = no, Pstatus = T	0.813	279	1.094
6	Higher = yes	0.810	278	0.921
7	Nursery = yes	0.790	271	1.084
8	Pstatus = T, Schoolsup = no	0.781	268	1.107

Note(s): G1 = Bad

Source(s): By authors

Table 10

High odds-ratio base patterns in cluster

Itemset	Cluster (G1 = Bad)			Cluster (G1 = Good)
Itemset	Count	Sup	Odds-ratio	Count	Sup	Odds-ratio
Paid = no	317	0.924	1.082	293	0.924	0.958
Pstatus = T	303	0.883	1.139	266	0.878	0.869
Schoolsup = no	302	0.880	1.082	279	0.924	0.912
Paid = no, Pstatus = T	279	0.813	1.094	255	0.914	0.833
Nursery = yes	271	0.790	1.084	250	0.923	0.817
Pstatus = T, Schoolsup = no	268	0.781	1.107	242	0.903	0.791

Note(s): G1 = Bad

Source(s): By authors

Table 11

High odds-ratio comparison patterns

Base pattern				Comparison pattern
Itemset	Count	Sup	Odds-ratio	Itemset	Count	Sup	Odds-ratio
Paid = no	317	0.924	1.082	Higher = yes, Internet = yes, Paid = no	237	0.775	1.288
Paid = no	317	0.924	1.082	Higher = yes, Paid = no, Schoolsup = no	264	0.863	1.200
Paid = no	317	0.924	1.082	Higher = yes, Nursery = yes, Paid = no	236	0.771	1.163
Paid = no	317	0.924	1.082	Higher = yes, Paid = no	289	0.944	1.133
Paid = no	317	0.924	1.082	Higher = yes, Paid = no, Pstatus = T	252	0.824	1.120
Paid = no	317	0.924	1.082	Internet = yes, Paid = no	240	0.784	1.062
Pstatus = T	303	0.883	1.139	Higher = yes, Paid = no, Pstatus = T	252	0.824	1.120
Pstatus = T	303	0.883	1.139	Higher = yes, Pstatus = T, Schoolsup = no	239	0.781	1.117
Pstatus = T	303	0.883	1.139	Higher = yes, Pstatus = T	263	0.859	1.065
Schoolsup = no	302	0.880	1.082	Higher = yes, Paid = no, Schoolsup = no	264	0.863	1.200
Schoolsup = no	302	0.880	1.082	Higher = yes, Schoolsup = no	275	0.899	1.151
Schoolsup = no	302	0.880	1.082	Higher = yes, Pstatus = T, Schoolsup = no	239	0.781	1.117
Schoolsup = no	302	0.880	1.082	Internet = yes, Schoolsup = no	231	0.755	1.065
Nursery = yes	271	0.790	1.084	Higher = yes, Nursery = yes, Paid = no	236	0.771	1.163
Nursery = yes	271	0.790	1.084	Higher = yes, Nursery = yes	248	0.810	1.122

Source(s): By authors

Table 12

Difference patterns

No	Base pattern	Comparison pattern	Difference pattern
1	Paid = no	Higher = yes, Internet = yes, Paid = no	Higher = yes, Internet = yes
2	Paid = no	Higher = yes, Paid = no, Schoolsup = no	Higher = yes, Schoolsup = no
3	Paid = no	Higher = yes, Nursery = yes, Paid = no	Higher = yes, Nursery = yes
4	Paid = no	Higher = yes, Paid = no	Higher = yes
5	Paid = no	Higher = yes, Paid = no, Pstatus = T	Higher = yes, Pstatus = T
6	Paid = no	Internet = yes, Paid = no	Internet = yes
7	Pstatus = T	Higher = yes, Paid = no, Pstatus = T	Higher = yes, Paid = no
8	Pstatus = T	Higher = yes, Pstatus = T, Schoolsup = no	Higher = yes, Schoolsup = no
9	Pstatus = T	Higher = yes, Pstatus = T	Higher = yes
10	Schoolsup = no	Higher = yes, Paid = no, Schoolsup = no	Higher = yes, Paid = no
11	Schoolsup = no	Higher = yes, Schoolsup = no	Higher = yes
12	Schoolsup = no	Higher = yes, Pstatus = T, Schoolsup = no	Higher = yes, Pstatus = T
13	Schoolsup = no	Internet = yes, Schoolsup = no	Internet = yes
14	Nursery = yes	Higher = yes, Nursery = yes, Paid = no	Higher = yes, Paid = no
15	Nursery = yes	Higher = yes, Nursery = yes	Higher = yes

Source(s): By authors

Table 13

High odds-ratio difference patterns

No	Base pattern	Comparison pattern	Difference pattern	Odds-ratio
1	Paid = no	Higher = yes, Internet = yes, Paid = no	Higher = yes, Internet = yes	1.233
2	Paid = no	Higher = yes, Paid = no, Schoolsup = no	Higher = yes, Schoolsup = no	1.151
3	Paid = no	Higher = yes, Nursery = yes, Paid = no	Higher = yes, Nursery = yes	1.122
4	Paid = no	Higher = yes, Paid = no	Higher = yes	1.086
5	Paid = no	Higher = yes, Paid = no, Pstatus = T	Higher = yes, Pstatus = T	1.065
6	Pstatus = T	Higher = yes, Paid = no, Pstatus = T	Higher = yes, Paid = no	1.133
7	Pstatus = T	Higher = yes, Pstatus = T, Schoolsup = no	Higher = yes, Schoolsup = no	1.151
8	Pstatus = T	Higher = yes, Pstatus = T	Higher = yes	1.086
9	Schoolsup = no	Higher = yes, Paid = no, Schoolsup = no	Higher = yes, Paid = no	1.133
10	Schoolsup = no	Higher = yes, Schoolsup = no	Higher = yes	1.086
11	Schoolsup = no	Higher = yes, Pstatus = T, Schoolsup = no	Higher = yes, Pstatus = T	1.065
12	Nursery = yes	Higher = yes, Nursery = yes, Paid = no	Higher = yes, Paid = no	1.133
13	Nursery = yes	Higher = yes, Nursery = yes	Higher = yes	1.086

Source(s): By authors

Table 14

Statistical hypothesis testing

No	Base pattern	Comparison pattern	Difference pattern	H2 p-value
1	Paid = no	Higher = yes, Internet = yes, Paid = no	Higher = yes, Internet = yes	0.9174
2	Paid = no	Higher = yes, Paid = no, Schoolsup = no	Higher = yes, Schoolsup = no	0.9255
3	Paid = no	Higher = yes, Nursery = yes, Paid = no	Higher = yes, Nursery = yes	0.4583
4	Paid = no	Higher = yes, Paid = no	Higher = yes	0.6952
5	Paid = no	Higher = yes, Paid = no, Pstatus = T	Higher = yes, Pstatus = T	0.6921
6	Pstatus = T	Higher = yes, Paid = no, Pstatus = T	Higher = yes, Paid = no	0.6921
7	Pstatus = T	Higher = yes, Pstatus = T, Schoolsup = no	Higher = yes, Schoolsup = no	0.8219
8	Pstatus = T	Higher = yes, Pstatus = T	Higher = yes	0.6613
9	Schoolsup = no	Higher = yes, Paid = no, Schoolsup = no	Higher = yes, Paid = no	0.9255
10	Schoolsup = no	Higher = yes, Schoolsup = no	Higher = yes	0.7225
11	Schoolsup = no	Higher = yes, Pstatus = T, Schoolsup = no	Higher = yes, Pstatus = T	0.8219
12	Nursery = yes	Higher = yes, Nursery = yes, Paid = no	Higher = yes, Paid = no	0.4583
13	Nursery = yes	Higher = yes, Nursery = yes	Higher = yes	0.2081

Note(s): G1: Bad → Good

Source(s): By authors

Table 15

Average experiment results of machine learning approaches

Method	Precision	TPR/recall	F₁	Accuracy	AUC (ROC)	AUC (PRC)
DNN	0.90	0.90	0.90	0.90	0.9027	0.9169
SVM	0.62	0.64	0.63	0.65	0.6482	0.7171
MLP	0.56	0.72	0.63	0.60	0.6042	0.7035
RF	0.61	0.61	0.61	0.63	0.6306	0.7018
DT	0.57	0.54	0.55	0.59	0.5888	0.6637

Source(s): By authors

Word count: 7315
