ABSTRACT
In software development, the maintenance process has attracted the attention of researchers because of its importance in fixing the defects discovered during software testing. This process relies on bug reports (BRs), which include detailed information such as the description, status, reporter, assignee, priority, and severity of the bug. The main problem is how to analyze these BRs to discover all defects in the system, which is tedious and time-consuming if done manually because the number of BRs increases dramatically. An automated solution is therefore preferable. Most current research automates this process from different aspects, such as detecting the severity or priority of a bug, but does not consider the nature of the bug, which is a multi-class classification problem. This paper addresses this problem by proposing a new prediction model that analyzes BRs and predicts the nature of the bug. The proposed model constructs an ensemble machine learning algorithm using natural language processing (NLP) and machine learning techniques. We evaluate the proposed model on a publicly available dataset from two online software bug repositories (Mozilla and Eclipse), which includes six classes: Program Anomaly, GUI, Network or Security, Configuration, Performance, and Test-Code. The simulation results show that the proposed model achieves better accuracy than most existing models, namely 90.42% without text augmentation and 96.72% with text augmentation.
Keywords: BRs, Machine Learning, NLP, Multi-Class Classification, Text Augmentation.
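As a rough illustration of the NLP stage mentioned in the abstract, the sketch below turns bug-report text into weighted term vectors that a classifier could consume. It is not the authors' pipeline: the TF-IDF weighting, the tiny stop-word list, the sample reports, and the function names `tokenize` and `tfidf` are all assumptions made for this example. Only the six class names come from the abstract.

```python
import math
import re
from collections import Counter

# The six bug-nature classes named in the abstract.
CLASSES = ["Program Anomaly", "GUI", "Network or Security",
           "Configuration", "Performance", "Test-Code"]

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "on", "in", "when", "to", "of", "and"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Made-up bug-report summaries, not taken from the Mozilla or Eclipse data.
reports = [
    "Button overlaps the toolbar when the window is resized",
    "Page load is slow on large projects",
    "Crash when saving a file on resized window",
]
vectors = tfidf(reports)
```

Terms that occur in fewer reports receive a higher weight, so in the first sample report `toolbar` outweighs `resized`, which also appears in the third. Vectors of this kind would then be passed to whichever ensemble classifier is chosen to predict one of the six classes.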
1. INTRODUCTION: Bug reports are crucial for identifying, tracking, and resolving issues in software development, directly impacting software quality and user satisfaction. The effective prediction and classification of bug reports help streamline the debugging process by prioritizing and categorizing them accurately. Traditionally, Support Vector Machines (SVM) have been widely employed for bug report prediction due to their ability to handle high-dimensional datasets and binary classification tasks effectively. However, SVM often faces challenges such as limited scalability, sensitivity to noise, and limited performance on highly imbalanced datasets. To address these limitations, we propose a Nature-Based Prediction Model of Bug Reports utilizing XGBoost (Extreme Gradient Boosting), an advanced ensemble machine learning technique. XGBoost combines multiple decision trees to optimize performance and offers better handling of large-scale datasets, imbalanced classes, and missing data. With its ability...





