Abstract: Automating performance improvement in 4G cellular networks remains a challenging research area because of existing limitations in how artificial intelligence and machine learning are applied. This study addresses these challenges by developing a data analysis model that uses Hidden Markov Models (HMMs) to predict Key Performance Indicators (KPIs) and automate performance assessments. Data were analyzed from 1600 new sites of a mobile operator in Indonesia, collected from July 2023 to January 2024. The methodology follows the Knowledge Discovery in Database (KDD) process for data mining and applies HMMs to forecast KPIs such as eRAB Drop Rate and Setup Success Rate. The model achieved a Mean Absolute Error (MAE) of 0.005 and a Root Mean Square Error (RMSE) of 0.069 for eRAB Drop Rate, with an F1 Score reaching up to 99.76%. Model performance improves as the number of observation states increases, particularly for Inter Frequency Handover Success Rate (HOSR) and RRC Connection Setup Success Rate. Despite this strong performance, there is room for further enhancement, especially for KPIs with high variability such as Intra Frequency HOSR. This research demonstrates that HMMs can predict KPIs with high accuracy and are a viable alternative to traditional time-series models. The results align with recent studies and suggest that combining HMMs with techniques such as LSTM or Random Forests could improve predictive accuracy. These methods are also applicable to other technologies, especially 5G networks, offering valuable insights for more effective network management and performance optimization.
Keywords: Network Optimization; Hidden Markov Model; Machine Learning; 4G Network; Data Mining.
(ProQuest: ... denotes formulae omitted.)
1. Introduction
The rapid evolution of mobile networks from 1G, which supported voice communication at speeds of 14.4 Kbps, to the advanced 5G networks offering ultra-fast data speeds, has significantly increased the complexity of cellular networks. To address these complexities, Self-Organizing Networks (SON) were introduced to automate network management functions, initially in 3G and standardized by 3GPP [1]. While SONs have enhanced Key Performance Indicators (KPIs) for network performance, they still face challenges in automating performance improvements in 4G networks due to the limited integration of artificial intelligence (AI), machine learning (ML), and deep learning (DL) techniques. Therefore, there is a growing need for new models that can leverage AI and ML to optimize network performance in real-time environments.
Recent studies have demonstrated the potential of various machine learning approaches to analyze and predict KPI behavior in cellular networks. Haider (2021) [2] utilized machine learning for anomaly detection in KPI time-series data, employing outlier detection methods such as Interquartile Range (IQR) and Evidence Lower Bound (ELBO) to identify abnormal patterns in the network data. Similarly, Santos (2018) [3] applied unsupervised machine learning techniques, specifically k-means clustering, to group cells based on their performance levels, facilitating the identification of poorly performing cells that require optimization. Omer et al. (2022) [4] combined K-Mean clustering with a Markov Chain model to predict network accessibility and retainability, achieving a prediction accuracy of 94.61%. These studies highlight the potential of machine learning models for enhancing network management, yet they primarily focus on different models or approaches and do not fully address the temporal dependencies inherent in network KPIs.
Markov models have been extensively used in predicting network KPIs. For example, Hendrawan (2019) [5] and Amirrudin et al. (2013) [6] used Discrete Time Markov Chains (DTMC) to predict KPI accessibility and mobility, respectively. While these models provide valuable insights, they are limited in their ability to capture the hidden patterns in temporal sequences. More advanced models, such as the Weighted Markov Model proposed by Yan et al. (2021) [7], improve prediction accuracy by optimizing weighting coefficients based on mobile user classifications. Despite these advances, there remains a gap in utilizing Hidden Markov Models (HMM) to fully capture the underlying temporal dependencies and predict KPI behaviors in a dynamic 4G network environment. This research fills these gaps by employing HMM to predict a wider range of KPIs, such as RRC connection setup success rate, handover, and call drop rates, offering a more comprehensive approach to network optimization.
The quality of cellular networks is evaluated through various performance metrics, such as call setup, signal strength, data speed, and latency, which 3GPP categorizes into accessibility, retainability, integrity, availability, and mobility [8], as illustrated in Figure 1. While monitoring and optimizing these KPIs is essential, current analyses often rely on empirical methods that provide limited predictive insights and lack effective solutions for improvement.
To address these challenges, this study proposes a data-driven approach using Hidden Markov Models (HMM) to predict and optimize 4G network KPIs. By leveraging the temporal dependencies within the data, HMM provides more accurate predictions and actionable insights, enabling proactive management and optimization of network performance. This research builds upon and extends previous work by applying HMM to a broader range of KPIs, demonstrating its effectiveness in real-time decision-making and contributing to improved network quality and efficiency.
2. Conceptual Background
The primary challenge for Mobile Network Operators (MNOs) is delivering high-performance multimedia services, and 4G Cellular Technology was developed to meet these demands. Key Performance Indicators (KPIs) play a critical role in monitoring and optimizing network performance. These KPIs are categorized into accessibility, retainability, mobility, integrity, and availability [9]. These KPIs are crucial during network planning, deployment, and initial optimization, ensuring that actual performance aligns with planned targets. Service KPIs also evaluate the quality of user-perceived services, contributing to network optimization, commercial introductions, and issue resolution [10].
A. Accessibility KPI
The Accessibility KPI measures the network's ability to establish connections successfully [11]. It includes multiple components such as the RRC Setup Success Rate, S1 Signaling Success Rate, and E-RAB Setup Success Rate [5]. These metrics are critical in identifying performance bottlenecks and targeting specific areas for optimization:
1. RRC Setup Success Rate measures the ratio of successful Radio Resource Control (RRC) connection establishments to the total number of attempts. Failures in RRC setup can result from issues like interference or coverage gaps, which prevent the network from receiving a Random-Access Response or completing the RRC Connection Setup procedure. The formula for calculating this rate is:
... (1)
2. S1 Signaling Success Rate assesses the success ratio of signaling connections between the eNodeB and the Mobility Management Entity (MME), which is expected to approach 100% due to the relatively simple handshake between these network elements. It is defined as:
... (2)
3. E-RAB Setup Success Rate indicates the rate at which Evolved Radio Access Bearers (E-RABs) are successfully assigned during call setup. Failures typically occur due to radio conditions (such as interference or poor coverage) or transport failures. The formula is given by:
... (3)
These formulas help network operators pinpoint specific areas for performance optimization, which is critical for maintaining high accessibility rates.
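Each accessibility metric described above is a ratio of successful procedures to attempts, expressed as a percentage. The following is a minimal sketch of that calculation. The function name and the counter values are hypothetical, since the vendor-specific PM counter names are not given in this paper.

```python
def success_rate(successes: int, attempts: int) -> float:
    """Return successes / attempts as a percentage.

    Returns NaN when there were no attempts, because the ratio is undefined.
    """
    if attempts < 0 or successes < 0 or successes > attempts:
        raise ValueError("counters must satisfy 0 <= successes <= attempts")
    if attempts == 0:
        return float("nan")
    return 100.0 * successes / attempts


# Hypothetical daily counters for one cell (illustrative values only).
rrc_sr = success_rate(successes=9_950, attempts=10_000)
erab_sr = success_rate(successes=4_980, attempts=5_000)
print(f"RRC Setup SR: {rrc_sr:.2f}%  E-RAB Setup SR: {erab_sr:.2f}%")
```

Returning NaN for a cell with no attempts is one possible design choice; it keeps idle cells from being counted as either perfect or failing.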
B. Retainability KPI
The Retainability KPI measures the network's ability to maintain active connections without interruption. It is typically evaluated using the eRAB drop rate, which is defined as:
... (4)
where the numerator reflects the number of eRAB releases due to failures and the denominator reflects the total number of eRAB setup attempts. Analyzing these counters helps identify which specific conditions contribute to dropped connections, allowing targeted optimizations to reduce drop rates.
C. Mobility KPI
The Mobility KPI evaluates the network's efficiency in managing handovers. Handovers are essential for maintaining ongoing sessions as users move across different cell areas. Two key metrics are used:
1. Inter-Frequency Handover Success Rate (HOSR) measures successful handovers between cells operating on different frequencies:
... (5)
2. Intra-Frequency Handover Success Rate (HOSR) assesses handovers within the same frequency:
... (6)
These metrics are crucial for understanding and optimizing mobility management within the network.
D. Integrity KPI
Integrity KPI is concerned with the throughput, or data rate, delivered to the end user. The downlink and uplink throughputs are calculated as:
... (7)
... (8)
where PDCP SDU VOL represents the total volume of data successfully transmitted during the active transmission time intervals (TTIs).
E. Availability KPI
Availability KPI measures the proportion of time that a cell is available for service, considering various outages due to hardware failures, maintenance, or other issues. It is computed as:
... (9)
Monitoring availability is crucial for ensuring a reliable network service, as it helps to quickly detect and resolve issues affecting network uptime.
F. Hidden Markov Model
Hidden Markov Models (HMMs) are well-suited for predicting network KPIs due to their ability to capture temporal dependencies and hidden patterns in sequential data. Unlike machine learning models like Random Forests or Support Vector Machines, which assume independent observations, HMMs account for dependencies between observations by modeling state transitions over time. This is achieved through a state-transition matrix that represents the probabilities of moving from one state to another and an emission matrix that models the likelihood of specific KPI values given a state. This dual modeling capability enables HMMs to effectively capture both temporal and probabilistic relationships in network data, providing more nuanced insights than traditional methods [12]. Describing an HMM begins with calculating the transition probability, which is the likelihood of moving from one hidden state i at time t to another hidden state j at time t+1. This probability is mathematically represented as:
... (10)
where a_ij is the transition probability, S_t is the hidden state at time t, and S_{t+1} is the hidden state at time t+1.
The next step involves calculating the emission probability, which is the likelihood of observing a specific symbol k in hidden state j.
... (11)
where b_j(k) is the emission probability of observing symbol k in hidden state j, and O_t is the observed output at time t.
The initial probability vector (π) is then computed, representing the probability that the system is in a particular state i at the initial time t = 1. The Viterbi algorithm is employed to determine the most likely sequence of hidden states given the observed data by maximizing the probability of the observed sequence given the model parameters:
... (12)
... (13)
where π_i is the initial probability of the system being in state i at time t = 1, and δ_t(i) represents the highest probability of any path ending in state i at time t, given the observed data O_1, O_2, ..., O_t.
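For reference, the standard textbook forms of these quantities are given below. Here π_i denotes the initial probability of state i and δ_t(i) the highest path probability ending in state i at time t. This is a reconstruction under the assumption that the paper follows the usual HMM notation; it is not necessarily the exact form of the omitted equations.

```latex
\begin{align*}
a_{ij} &= P(S_{t+1} = j \mid S_t = i), & \sum_{j} a_{ij} &= 1 \\
b_j(k) &= P(O_t = k \mid S_t = j), & \sum_{k} b_j(k) &= 1 \\
\pi_i &= P(S_1 = i) \\
\delta_1(i) &= \pi_i \, b_i(O_1) \\
\delta_t(j) &= \max_{i} \bigl[ \delta_{t-1}(i) \, a_{ij} \bigr] \, b_j(O_t), \qquad t = 2, \dots, T
\end{align*}
```

The most likely state sequence is then recovered by choosing the state that maximizes δ_T(i) and tracing back the maximizing predecessor at each step.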
3. Research Methodology
This research develops a data analysis framework and applies a Machine Learning strategy to predict and optimize Key Performance Indicators (KPIs) in 4G networks, following the Knowledge Discovery in Database (KDD) approach with a development research paradigm and trend study method [13], [14]. The Hidden Markov Model (HMM) is employed for its ability to model stochastic processes with hidden states inferred from observations [12]. In cellular networks, HMM identifies hidden patterns in complex KPI data and predicts future KPI behavior, aiding in network planning and optimization.
The dataset, sourced from a leading vendor in Indonesia's cellular telecommunications sector, includes daily time-series data over six months (July 2023 to January 2024) with KPIs such as Accessibility, Retainability, Integrity, Availability, and Mobility. The research aims to generate predictive visualizations to recommend parameter adjustments for KPI enhancement, utilizing the Hidden Markov Model framework shown in Figure 2.
A. Data Collection
In this initial phase, real-time data is collected from the 4G mobile network's evolved NodeB (eNB). The data was obtained from a major cellular network vendor in Indonesia, covering a six-month period from July 1, 2023 to January 31, 2024. The dataset includes statistical KPI information, configuration parameters, and performance counters from the Performance Management System (PM), presented in a time-series format with daily granularity as shown in Table 1.
B. Preprocessing
During this stage, the dataset undergoes processing and cleansing, involving the removal of empty or NULL values and unnecessary data. To address potential issues such as missing data, noise, or outliers, given the data's varied origin and extended period, several data preprocessing techniques are applied:
1. Handling Missing Data. Missing values are treated using mean or median imputation for continuous variables and mode imputation for categorical variables. In cases where data loss exceeds a certain threshold, interpolation methods are applied to estimate missing values, ensuring continuity in time-series data.
2. Standardization. The dataset is standardized so that all features have a mean of zero and a standard deviation of one. This process is particularly important for machine learning models like HMM, which are sensitive to the scale of input data.
3. Normalization. The data is rescaled to a fixed range, typically between 0 and 1, so that all features contribute equally to the model. This step is especially useful for models like HMMs that rely on probability distributions.
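The three preprocessing steps above can be sketched with the Python standard library as follows. The sample values are hypothetical, and the text does not state which features receive standardization and which receive min-max normalization, so the sketch simply shows each operation on the same series.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Z-score scaling: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical daily RRC setup success rates (%) with one missing day.
raw = [99.2, 98.7, None, 99.5, 97.9]
filled = impute_median(raw)
z_scores = standardize(filled)
scaled = min_max(filled)
print(filled)
print([round(s, 3) for s in scaled])
```

Median imputation is used here because it is less sensitive to outliers than the mean; interpolation, which the text reserves for larger gaps, is not shown.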
C. Transformation
After preprocessing, the data is transformed to make it suitable for modeling. The Data Transformation phase involves two key steps, Exploratory Data Analysis (EDA) [15] and encoding to define the hidden and observation states for the Hidden Markov Model (HMM).
1. Exploratory Data Analysis (EDA) is conducted to understand the dataset's structure, identify patterns, and detect anomalies. This involves visualizing KPI distributions, examining correlations, and assessing time-series trends to select relevant features that influence network performance.
2. Based on EDA insights, hidden states (e.g., "Maintain", "Degraded", "Improved") are defined to represent unobservable network conditions, while observation states are encoded to represent KPI values over time. Techniques like one-hot encoding for categorical data and normalization for continuous data are used to prepare the dataset for modeling with HMM.
D. Data mining
This phase focuses on leveraging the Hidden Markov Model (HMM) to define hidden states, calculate emission probabilities, and identify the most likely sequence of hidden states. This process generates synthetic data, which integrates real data with predictions for further analysis [16]. The hidden states represent underlying network conditions that are not directly observable. These states are determined based on domain knowledge and patterns identified during exploratory data analysis, reflecting different levels of network performance. Emission probabilities are then calculated to determine the likelihood of observing specific KPI values given a particular hidden state. The Viterbi algorithm is applied to identify the most likely sequence of hidden states over time, revealing the temporal dynamics of network performance.
This sequence is combined with the original dataset to create synthetic data that includes both observed KPI values and inferred hidden states. The resulting synthetic dataset enriches the analysis by capturing underlying patterns and dependencies, enabling more accurate predictions and deeper insights into network optimization.
E. Performance Evaluation
To evaluate the performance of the HMM-based prediction model, several metrics are utilized, depending on whether the task is treated as regression or classification:
1. Mean Absolute Error (MAE) measures the average absolute difference between predicted and actual values and is useful for assessing the accuracy of continuous KPI predictions.
... (14)
2. Root Mean Square Error (RMSE) provides a measure of the magnitude of prediction errors, giving higher weight to larger errors and thus penalizing them more severely.
... (15)
3. F1-Score is used for classification tasks, particularly where the focus is on both precision (correctness of positive predictions) and recall (coverage of actual positives). This metric helps to evaluate the performance of KPI state classification, such as distinguishing between different levels of accessibility or retainability.
... (16)
By utilizing these metrics, the effectiveness of the HMM model in predicting network performance can be rigorously assessed, ensuring that the model provides reliable insights for network optimization.
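For reference, the standard definitions of the three metrics are shown below, where y_i is the actual value, ŷ_i the predicted value, n the number of samples, and TP, FP, FN the counts of true positives, false positives, and false negatives. These are the usual textbook forms, given on the assumption that the omitted equations follow them.

```latex
\begin{align*}
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\end{align*}
```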
4. Results and Discussion
The main issues in 4G cellular networks are non-optimal coverage and interference [17], [18]. Addressing these problems is expected to allow for accurate predictions of KPI fluctuations, whether they signal deterioration or improvement. As previously noted, this study focuses on five critical performance metrics: Accessibility, Retainability, Mobility, Integrity, and Availability. Each KPI generates data points at regular measurement intervals, resulting in a detailed time-series database. These values can be computed using a simple method, such as daily aggregation of data from multiple sites [19]. Table 2 presents the KPIs for the 4G network that will be used for prediction, with specified thresholds to assess potential improvements or degradations.
Using Python [20], an analysis of the dataset, which comprised 61 columns and 198,128 rows, revealed 2,594 null or NaN values. Following this, parameter-related columns were removed, resulting in a refined dataset with 44 attributes and 208 rows, with no null values. Categorical columns were converted to numerical values. The refined dataset was then subjected to data mining using the Hidden Markov Model algorithm as part of the Knowledge Discovery in Databases process.
A. Determining Initial Hidden States and Observation States
The initial phase of developing a hidden Markov model involves specifying the number of hidden states (N) and observation states (M). In this study, the hidden states denote the predicted status of the Key Performance Indicator (KPI). If the KPI exceeds the threshold, it is categorized as 'Improved'. If it falls between the threshold and the maximum tolerance value, it is classified as 'Maintained'. If it is below these values, it is labeled as 'Degraded', represented as S = {S1, S2, ..., SN} = {Degraded, Maintain, Improved} (N = 3).
Concurrently, the observation states signify the factors influencing changes in KPI status, related to signal coverage and interference issues. The observation states include 'Coverage Problem' for issues with signal coverage, 'Interference Problem' for interference issues, and 'OK' for sites without these problems, denoted as V = {v1, v2, ..., vM} = {Coverage_Problem, Interference_Problem, OK} (M = 3).
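The threshold rule for the hidden states can be sketched as a small labeling function. This is a sketch under stated assumptions: the function name and the numeric limits are hypothetical (the actual thresholds are those of Table 2), and the rule is written for a higher-is-better KPI such as a success rate, so the comparisons would be reversed for a drop rate.

```python
HIDDEN_STATES = ("Degraded", "Maintain", "Improved")


def kpi_status(value: float, threshold: float, tolerance: float) -> str:
    """Label one sample of a higher-is-better KPI.

    value > threshold               -> "Improved"
    tolerance <= value <= threshold -> "Maintain"
    value < tolerance               -> "Degraded"
    """
    if tolerance > threshold:
        raise ValueError("tolerance must not exceed threshold")
    if value > threshold:
        return "Improved"
    if value >= tolerance:
        return "Maintain"
    return "Degraded"


# Hypothetical limits for RRC Connection Setup Success Rate (%).
THRESHOLD, TOLERANCE = 99.0, 98.0
samples = [99.4, 98.5, 97.2]
labels = [kpi_status(v, THRESHOLD, TOLERANCE) for v in samples]
# Numeric encoding: Degraded = 0, Maintain = 1, Improved = 2.
codes = [HIDDEN_STATES.index(s) for s in labels]
print(labels, codes)
```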
B. Determining Hidden State Transition Matrix
The next phase involves estimating the transition probabilities for the hidden state matrix, represented as A = {a_ij} = P{Q_{t+1} = j | Q_t = i}, where i ≥ 1 and j ≥ 1. This estimation is based on observing transitions between hidden states in relation to the 4G KPI status over a six-month period. The analysis specifically utilizes data from site ID 01BMH0019, focusing on the KPI RRC Connection Setup Success Rate. The transition data for the hidden states is detailed in Table 3.
Table 3 presents the transition counts between hidden states: 11 transitions from "Degraded" to "Degraded", 2 transitions from "Maintain" to "Degraded", and 12 transitions from "Improved" to "Degraded". Additionally, it records 2 transitions from "Degraded" to "Maintain" and 11 transitions from "Degraded" to "Improved", among other transitions. The transition matrix for the hidden states can be derived using the equation A = {a_ij} = P{Q_{t+1} = j | Q_t = i}, as outlined below:
a11 = P{Q_{t+1} = Degraded | Q_t = Degraded} = 11/24 = 0.4583
a12 = P{Q_{t+1} = Maintain | Q_t = Degraded} = 2/24 = 0.0833
a13 = P{Q_{t+1} = Improved | Q_t = Degraded} = 11/24 = 0.4583
a21 = P{Q_{t+1} = Degraded | Q_t = Maintain} = 2/13 = 0.1538
a22 = P{Q_{t+1} = Maintain | Q_t = Maintain} = 7/13 = 0.5385
a23 = P{Q_{t+1} = Improved | Q_t = Maintain} = 4/13 = 0.3077
a31 = P{Q_{t+1} = Degraded | Q_t = Improved} = 12/170 = 0.0706
a32 = P{Q_{t+1} = Maintain | Q_t = Improved} = 3/170 = 0.0176
a33 = P{Q_{t+1} = Improved | Q_t = Improved} = 155/170 = 0.9118
From a mathematical perspective, the transition matrix for the hidden states is defined as follows:
... (17)
Using the network package [21], the Markov diagram can be visualized as shown in Figure 3.
The Markov diagram shows a 91.2% probability that a site's condition will remain in the improved state, 45.8% in the degraded state, and 53.8% in the maintained state. The remaining probabilities reflect transitions between states: from degraded to improved, maintained to degraded, or improved to maintained, and vice versa, with probabilities ranging from 1.8% to 45.8%.
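The transition probabilities are obtained by dividing each transition count by the total number of transitions leaving the same state. A minimal sketch of that row normalization, using the counts quoted in the text for site 01BMH0019 (the variable and function names are illustrative):

```python
STATES = ("Degraded", "Maintain", "Improved")

# Transition counts quoted in the text (rows: from-state, columns: to-state).
COUNTS = [
    [11, 2, 11],   # from Degraded
    [2, 7, 4],     # from Maintain
    [12, 3, 155],  # from Improved
]


def row_normalize(counts):
    """Turn each row of counts into a probability distribution."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


A = row_normalize(COUNTS)
for name, row in zip(STATES, A):
    print(f"{name:>8}: " + "  ".join(f"{p:.4f}" for p in row))
```

The row totals are 24, 13, and 170, which sum to 207, one fewer than the 208 daily samples, as expected for transitions between consecutive days.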
C. Determining Observable Emission Probabilities
The next phase involves computing the emission probabilities for observable states, denoted by the matrix B, which indicates the likelihood of an observation state v_k given a hidden state q_i. This is mathematically represented as B = {b_i(v_k)} = P(O_t = v_k | X_t = q_i), where i ≥ 1 and k ≥ 1. Essentially, B is derived from monitoring the changes in 4G network conditions that affect KPI status over a six-month period. The detailed emission probabilities are provided in Table 4.
Table 4 demonstrates that the KPI status deteriorated 7 times due to interference, 5 times due to coverage issues, and experienced 13 instances of degradation under otherwise good conditions. Conversely, improvements in KPI are directly proportional to the site's condition. The calculations for the observation probability matrix are provided below:
b1(v1) = P(O_t = Interference | X_t = Degraded) = 7/25 = 0.2800
b1(v2) = P(O_t = Coverage | X_t = Degraded) = 5/25 = 0.2000
b1(v3) = P(O_t = OK | X_t = Degraded) = 13/25 = 0.5200
b2(v1) = P(O_t = Interference | X_t = Maintain) = 3/13 = 0.2308
b2(v2) = P(O_t = Coverage | X_t = Maintain) = 2/13 = 0.1538
b2(v3) = P(O_t = OK | X_t = Maintain) = 8/13 = 0.6154
b3(v1) = P(O_t = Interference | X_t = Improved) = 14/170 = 0.0824
b3(v2) = P(O_t = Coverage | X_t = Improved) = 35/170 = 0.2059
b3(v3) = P(O_t = OK | X_t = Improved) = 121/170 = 0.7118
Transformed into matrix format as follows:
... (18)
Based on the computed emission probabilities, the initial probability distribution can be established as follows:
Π1 = P(X_0 = Degraded) = 25/208 = 0.1202
Π2 = P(X_0 = Maintain) = 13/208 = 0.0625
Π3 = P(X_0 = Improved) = 170/208 = 0.8173
As a result, the initial probability distribution is given by Π = [0.1202 0.0625 0.8173]. The Markov diagram visualizing the emission probabilities is shown in Figure 4.
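The emission matrix and the initial distribution can both be derived from the same count table. The sketch below uses the counts quoted in the text and treats the KPI status as the conditioning variable, as the denominators 25, 13, and 170 imply. The variable names are illustrative.

```python
STATES = ("Degraded", "Maintain", "Improved")
OBSERVATIONS = ("Interference", "Coverage", "OK")

# Co-occurrence counts quoted in the text
# (rows: KPI status, columns: site condition in OBSERVATIONS order).
EMISSION_COUNTS = {
    "Degraded": [7, 5, 13],
    "Maintain": [3, 2, 8],
    "Improved": [14, 35, 121],
}

state_totals = {s: sum(row) for s, row in EMISSION_COUNTS.items()}
n_samples = sum(state_totals.values())

# Emission matrix B: each row is normalized by the total for that status.
B = {s: [c / state_totals[s] for c in row] for s, row in EMISSION_COUNTS.items()}

# Initial distribution: relative frequency of each status over all samples.
PI = {s: state_totals[s] / n_samples for s in STATES}

for s in STATES:
    print(s, [round(p, 4) for p in B[s]], round(PI[s], 4))
```

Estimating the initial distribution from overall state frequencies is the approach the quoted fractions suggest; it is a reasonable choice when only one long sequence per site is available.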
D. Identify Sequence of Hidden State using Viterbi
The Viterbi algorithm is used to identify the most probable sequence of states from an observed sequence by iteratively maximizing the likelihood of reaching each state i at time t, given the observed sequence. This algorithm dynamically tracks the state with the highest probability and then performs a backward pass to determine the most likely path. Sampling is carried out through random filtering based on site ID. The process for determining the optimal sequence involves initialization, recursion, selecting the best state at the final time T, and evaluating the resulting state sequence. For example, the observed sequence is represented as: V = {maintain, degraded, maintain, maintain, improved, improved, maintain, improved, degraded,..., degraded}. After encoding numerically, with "improved" assigned a value of 2, "maintain" a value of 1, and "degraded" a value of 0, the sequence is mapped as: V = {1, 0, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 2, 2, 2,...,0}. The Viterbi algorithm determines the optimal sequence of hidden states, resulting in: V = {improved, improved, improved, improved, improved, improved, improved, improved, degraded,..., degraded}.
The Viterbi algorithm forecasts hidden states for site observations. Recommendations for addressing degraded KPIs are based on site predictions of degradation, while "maintain" status, considered acceptable, does not require further action. Assuming that "maintain" and "improved" statuses are optimal, Figure 5 illustrates the Viterbi predictions, based on synthetic data [16], and serves as a foundation for predictions regarding network performance improvements.
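The initialization, recursion, and backward pass described above can be sketched in plain Python. The parameters are the fractions reported for site 01BMH0019; the short observation sequence is hypothetical, and the log-domain formulation is an implementation choice of this sketch rather than something stated in the paper.

```python
import math

STATES = ("Degraded", "Maintain", "Improved")
OBS = ("Interference", "Coverage", "OK")

# Model parameters taken from the fractions reported in the text.
A = [[11/24, 2/24, 11/24], [2/13, 7/13, 4/13], [12/170, 3/170, 155/170]]
B = [[7/25, 5/25, 13/25], [3/13, 2/13, 8/13], [14/170, 35/170, 121/170]]
PI = [25/208, 13/208, 170/208]


def viterbi(obs, pi, a, b):
    """Return the most likely hidden-state index path for obs.

    Works in the log domain to avoid underflow on long sequences.
    Assumes all probabilities are strictly positive, as they are here.
    """
    if not obs:
        return []
    n = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(O_1).
    delta = [math.log(pi[i]) + math.log(b[i][obs[0]]) for i in range(n)]
    backptr = []
    # Recursion: keep the best predecessor of every state at every step.
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + math.log(a[i][j]))
            ptr.append(best)
            delta.append(prev[best] + math.log(a[best][j]) + math.log(b[j][o]))
        backptr.append(ptr)
    # Backward pass: start from the best final state and follow the pointers.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical five-day observation sequence for illustration.
observed = [OBS.index(o) for o in ("OK", "OK", "Interference", "Coverage", "OK")]
path = viterbi(observed, PI, A, B)
print([STATES[s] for s in path])
```

For production use, an established HMM library would normally be preferred; the hand-written version is shown only to make the recursion explicit.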
E. Evaluation Model
The next step is to evaluate the prediction results. The synthetic data produced must be validated by comparing the actual values with the predicted values. This validation process includes calculating metrics such as mean absolute error (MAE) [22], root mean squared error (RMSE) [23], and F1-score [24].
MAE measures the average absolute difference between the predicted values and the actual values. In this study, MAE provides an indication of how much the predicted values (such as "Improved," "Maintained," or "Degraded" KPI statuses) deviate from the actual observed values. MAE is used to understand how accurate the HMM model is in predicting changes in KPI status. A lower MAE value indicates better model performance, as the predicted values are closer to the actual values. In evaluating HMM predictions for KPI status, RMSE is used to identify and quantify the impact of significant prediction errors. Large errors, which can have a greater impact on network performance, are given more emphasis with this metric. This helps ensure that the HMM model not only has good overall accuracy but also avoids making substantial errors in KPI predictions.
The last metric used to evaluate the model is the F1-Score. In this study, the F1-Score is used to assess the effectiveness of the HMM in correctly predicting the KPI status changes within a 4G network. The model's predictions are compared against the actual observed values, and the F1-Score is calculated to determine the balance between the Precision and Recall of these predictions. A higher F1-Score indicates that the HMM strikes a good balance between correctly predicting when the KPI status improves, degrades, or remains maintained and minimizing both false positives and false negatives. To compute the model evaluation metrics MAE and RMSE, a sample calculation for site ID 01BMH0019 is taken using equations 14 and 15, as presented in Figure 6.
The F1-Score metric calculation requires the values of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Figure 7 provides a detailed example of the confusion matrix calculation for the KPI RRC Connection Success Rate, used to assess the F1-Score.
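The evaluation described above can be sketched on numerically encoded statuses. The two sequences below are hypothetical and serve only to show the mechanics; the one-vs-rest treatment of the three classes is an assumption of this sketch, since the paper does not state how the multi-class F1-Score is aggregated.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def f1_per_class(actual, predicted, label):
    """One-vs-rest F1 score for a single class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical encoded statuses (0 = Degraded, 1 = Maintain, 2 = Improved).
actual = [2, 2, 1, 0, 2, 2, 1, 2]
predicted = [2, 2, 2, 0, 2, 2, 1, 2]

print(f"MAE  = {mae(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.3f}")
for code, name in enumerate(("Degraded", "Maintain", "Improved")):
    print(f"F1({name}) = {f1_per_class(actual, predicted, code):.4f}")
```

Note that MAE and RMSE on encoded labels treat the statuses as ordinal, so confusing "Degraded" with "Improved" costs more than confusing adjacent statuses.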
Table 5 presents the performance evaluation metrics from synthetic data predictions using the Hidden Markov Model (HMM). This systematic evaluation ensures the model's reliability and effectiveness, establishing a robust foundation for subsequent analyses.
This study demonstrates that the Hidden Markov Model (HMM) can be effectively applied to predict Key Performance Indicators (KPIs) in 4G networks, offering insightful results across various metrics. For instance, the model shows a low Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) when predicting eRAB Drop Rate, achieving an F1 Score of up to 99.76% with different observation states. Moreover, for Inter Frequency Handover Success Rate (HOSR), the model's accuracy improves as the number of observation states increases, as indicated by a decrease in both MAE and RMSE. A similar trend is observed for the RRC Connection Setup Success Rate, where increasing the observation states results in higher prediction accuracy. These findings highlight that HMM is particularly suitable for predicting KPIs with moderate to high variability in 4G networks. However, for KPIs such as Intra Frequency HOSR, while the model still performs robustly, the lower precision indicates room for further enhancement, particularly under conditions of high variability.
HMMs were chosen for this study because they model time-series data with hidden states effectively, an advantage over traditional time-series models.
Several studies have highlighted the challenges and opportunities in network performance prediction using various Markov models. Omer et al. (2022) used Markov Chain models and achieved a 94.61% accuracy in KPI prediction. The results of this study align more closely with the findings of Yan et al. (2023), who explored the application of HMMs for time-series prediction in other domains and emphasized their balance between accuracy and computational efficiency, particularly for tasks requiring real-time insights. To further enhance prediction accuracy, hybrid models that combine HMMs with other machine learning techniques such as LSTM or Random Forests could be explored. For example, a combination of LSTM to capture long-term temporal dependencies and HMM to model state transitions might provide a more comprehensive approach. This would align with the work of Omer et al. (2022), who demonstrated the effectiveness of hybrid models in enhancing predictive performance across different network environments. Such an approach could mitigate the limitations observed in using HMMs alone, especially under conditions of high variability or in predicting KPIs where multiple performance factors are interdependent.
Finally, the practical implications of this study are significant for network operators. HMM-based predictions enable proactive network performance management: operators can anticipate potential issues and implement corrective actions before service quality is affected. For example, by predicting a potential deterioration in the KPI eRAB Drop Rate, operators could preemptively adjust network configurations or allocate additional resources to maintain optimal service levels.
5. Conclusions
In conclusion, this study affirms that Hidden Markov Models (HMMs) are effective in predicting KPIs in 4G networks, especially under conditions of moderate variability. The model's low MAE and RMSE values, combined with high precision and F1 Scores, demonstrate its capability in minimizing prediction errors. However, the study also recognizes that hybrid modeling approaches may offer potential improvements, particularly for KPIs with high variability. Future research should focus on developing such hybrid models and extending these methodologies to accommodate the complexities of 5G networks, where increased data volume and variability demand more sophisticated predictive approaches.
By refining these methods, network operators can enhance their prediction frameworks, leading to more efficient network management and improved service quality in both 4G and future network generations.
Eka Kosasih, a distinguished telecommunications professional with an impressive 18-year career in both domestic and international projects, stands as the focal point of this master's thesis. His educational background includes a Diploma in Telecommunications from Politeknik Negeri Bandung (formerly Politeknik ITB) in 2001, followed by a Bachelor's degree from Institut Sains dan Teknologi Nasional Jakarta in 2007. Eka pursued further academic endeavors by enrolling in the Master's program in Information System Management at BINUS University in 2023. The study aims to contribute valuable insights to the ever-evolving landscape of telecommunications technology and management.
Tanty Oktavia is currently Head of the Master of Information Systems Study Program of the Binus Graduate Program, Bina Nusantara University (http://mmsi.binus.ac.id) and Vice Chair & Treasurer of IEEE Computer Society Indonesia (http://ieeecomputer.id/). Her research interests include Database, Software Engineering, E-Learning, Business Intelligence, Knowledge Management Systems, Business Start-up, Information Engineering, transformative higher education teaching and learning processes, and social media (https://www.scopus.com/authid/detail.uri?authorId=56049242500). Dr. Tanty Oktavia is a member of the Association for Information Systems (AIS) and the International Association of Engineers (IAENG). She is the recipient of local and government grants (RISTEKDIKTI). Dr. Tanty Oktavia is a highly sought-after consultant on business intelligence, specifically in helping companies deal with proficiency and profile monitoring. She has been involved in professional projects related to Technology Alignment in several multinational companies as well as in government projects. She is also an advisor for Business Start-up incubation and a reviewer for national and international conferences and journals. She has extensive experience in managing both local and international flagship conferences.
6. References
[1]. 3GPP, "3GPP TS 32.500." 3GPP, Valbonne, France, p. 10, 2022. [Online]. Available: http://www.3gpp.org
[2]. M. Thesis and M. E. Haider, "Machine Learning and KPI Analysis applied to Time-Series Data in Physical Systems: Comparison and Combination," no. May, 2021.
[3]. R. M. M. Santos, "Machine Learning Techniques using Key Performance Indicators for the Configuration Optimization of 4G Networks," Pdfs.Semanticscholar.Org, vol. 800, pp. 1-10, 1800, [Online]. Available: https://pdfs.semanticscholar.org/eda9/e7b8de05d3f5b11fff25bfce8f31b2175325.pdf
[4]. A. S. Omer, T. A. Yemer, and D. H. Woldegebreal, "Hybrid K-Mean Clustering and Markov Chain for Mobile Network Accessibility and Retainability Prediction †," Eng. Proc., vol. 18, no. 1, pp. 1-11, 2022, doi: 10.3390/engproc2022018009.
[5]. Hendrawan, "Accessibility degradation prediction on LTE/SAE network using discrete time markov chain (DTMC) model," J. ICT Res. Appl., vol. 13, no. 1, pp. 1-18, 2019, doi: 10.5614/itbj.ict.res.appl.2019.13.1.1.
[6]. N. A. Amirrudin, "Mobility Prediction via Markov Model in LTE Femtocell," Int. J. Comput. Appl. (0975 - 8887), vol. 65, no. 18, pp. 40-44, 2013.
[7]. M. Yan, S. Li, C. A. Chan, Y. Shen, and Y. Yu, "Mobility prediction using a weighted markov model based on mobile user classification," Sensors, vol. 21, no. 5, pp. 1-20, 2021, doi: 10.3390/s21051740.
[8]. R. Acc, "TS 136 413 -V12.6.0 -LTE; Evolved Universal Terrestrial Radio Access Network (E-UTRAN); S1 Application Protocol (S1AP) (3GPP TS 36.413 version 12.6.0 Release 12)," vol. 0, 2015.
[9]. F. Krasniqi, L. Gavrilovska, and A. Maraj, The analysis of key performance indicators (KPI) in 4G/LTE networks, vol. 283. Springer International Publishing, 2019. doi: 10.1007/978-3-030-23976-3_25.
[10]. M. R. Elnashar, A., El-saidny, M.A. & Sherif, Design, Deployment and Performance of 4G-LTE Networks a Practical Approach.
[11]. C. Johnson, L. Serna, T. Novosad, and N. Hathiramani, "LTE FDD Optimization Guidelines," pp. 1-255, 2022.
[12]. P. Dymarski, Hidden Markov Model, Edited by Przemyslaw Dymarski, no. November. 2011. [Online]. Available: http://www.intechopen.com/books/show/title/hidden-markov-models-theory-and-applications
[13]. C. Zhang and J. Han, Data Mining and Knowledge Discovery. 2021. doi: 10.1007/978-981-15-8983-6_42.
[14]. S. Chernov, Detecting cellular network anomalies using the knowledge discovery process. 2015. [Online]. Available: https://jyx.jyu.fi/bitstream/handle/123456789/47934/978-951-39-6392-7_vaitos11122015.pdf;sequence=1 and https://jyx.jyu.fi/dspace/handle/123456789/47934
[15]. A. T. Jebb, S. Parrigon, and S. E. Woo, "Exploratory data analysis as a foundation of inductive research," Hum. Resour. Manag. Rev., vol. 27, no. 2, pp. 265-276, 2017, doi: 10.1016/j.hrmr.2016.08.003.
[16]. Y. Lu, M. Shen, H. Wang, X. Wang, C. van Rechem, and W. Wei, "Machine Learning for Synthetic Data Generation: A Review," vol. 14, no. 8, pp. 1-18, 2023, [Online]. Available: http://arxiv.org/abs/2302.04062
[17]. B. Partov, D. J. Leith, and R. Razavi, "Utility fair optimization of antenna tilt angles in LTE networks," IEEE/ACM Trans. Netw., vol. 23, no. 1, pp. 175-185, 2015, doi: 10.1109/TNET.2013.2294965.
[18]. Reverb Networks, "Antenna Based Self Optimizing Networks for Coverage and Capacity Optimization," White Pap., 2012.
[19]. E. J. Khatib, R. Barco, and I. Serrano, "Degradation Detection Algorithm for LTE Root Cause Analysis," Wirel. Pers. Commun., vol. 97, Dec. 2017, doi: 10.1007/s11277-017-4738-6.
[20]. W. McKinney, "Pandas: Powerful Python Data Analysis Toolkit," Pandas -Powerful Python Data Anal. Toolkit, pp. 1-3743, 2022, [Online]. Available: https://pandas.pydata.org/pandas-docs/version/1.4.4/.
[21]. A. Hagberg, D. Schult, and P. Swart, "NetworkX Reference (Python)," Python Packag., p. 464, 2011.
[22]. T. O. Hodson, "Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not," Geosci. Model Dev., vol. 15, no. 14, pp. 5481-5487, 2022, doi: 10.5194/gmd-15-5481-2022.
[23]. Hajiar Yuliana, "Hyperparameter Optimization of Random Forest for 5G Coverage Prediction," Bul. Pos dan Telekomun., vol. 22, no. 1, pp. 75-90, 2024, doi: 10.17933/bpostel.v22i1.390.
[24]. G. Donald Allen and D. Goldsby, "Confusion Theory and Assessment," IJISET-International J. Innov. Sci. Eng. Technol., vol. 1, no. 10, pp. 436-443, 2014, [Online]. Available: www.ijiset.com
© 2024. This work is published under https://creativecommons.org/licenses/by-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.