
Abstract

Context. For more than six decades, software cost and effort estimation has remained an active research topic because of its impact on industry. Although many estimation models exist, regression-based approaches predominate in the literature. However, two problems have been observed in both industry and academia: datasets rarely contain a large, or even sufficient, number of data points, and source databases belonging to different practitioners are combined arbitrarily to create larger datasets.

Objective. Propose the application of the Kruskal–Wallis test to validate the integration of distinct source databases (independent groups), thereby avoiding the mixing of unrelated data, increasing the number of data points, and improving the estimation models.
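
As a rough illustration of the idea in the Objective, the sketch below computes the Kruskal–Wallis H statistic for several independent groups and uses it to decide whether pooling them looks defensible. It is not the authors' procedure: the function names, the sample values, and the 0.05 significance level are assumptions made for this example, and the closed-form p-value used here is valid only for three groups (two degrees of freedom).

```python
import math


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis(groups):
    """Return (H, degrees of freedom) with the usual tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical effort values (for example, hours) from three source databases.
source_a = [1, 2, 3]
source_b = [4, 5, 6]
source_c = [7, 8, 9]

h, df = kruskal_wallis([source_a, source_b, source_c])
# For df == 2 the chi-square survival function is exactly exp(-H / 2).
p_value = math.exp(-h / 2)
can_merge = p_value >= 0.05  # assumed significance level
print(f"H = {h:.2f}, df = {df}, p = {p_value:.3f}, merge = {can_merge}")
```

With these made-up values the groups do not overlap at all, so H is about 7.2 and the test rejects the hypothesis that they share a distribution; under the assumed decision rule they would not be merged. Groups whose values interleave give a small H and would be candidates for pooling.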

Method. We conducted a case study using real data from an international company, specifically data from their Mexico office. This office provides software development services for a technological tower identified as “Microservices and APIs.” The data were collected in 2020.

Results. The quality criteria of the final estimation model improved. The MMRE fell by 25.4 percentage points (from 78.6% to 53.2%), the standard deviation fell by 97.2 percentage points (from 149.7% to 52.5%), and the Pred(25%) indicator increased by 3.2 percentage points. Additionally, the number of data points increased significantly, and the linear regression constraints were satisfied. The application of the Kruskal–Wallis test to validate the integration of distinct source databases (independent groups) proved useful in improving the estimation models.
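
The quality criteria named in the Results can be made concrete with a short sketch. It uses the conventional definitions: the magnitude of relative error (MRE) of a project is |actual - estimated| / actual, MMRE is the mean MRE, and Pred(25%) is the share of projects whose MRE is at most 0.25. The effort figures are invented for illustration, and treating the reported standard deviation as the sample standard deviation of the MRE values is an assumption, since the abstract does not say.

```python
from statistics import mean, stdev


def mre_values(actual, estimated):
    """Magnitude of relative error for each project."""
    return [abs(a - e) / a for a, e in zip(actual, estimated)]


def mmre(actual, estimated):
    """Mean magnitude of relative error."""
    return mean(mre_values(actual, estimated))


def pred(actual, estimated, level=0.25):
    """Fraction of projects whose MRE does not exceed `level`."""
    errors = mre_values(actual, estimated)
    return sum(1 for m in errors if m <= level) / len(errors)


# Hypothetical actual and estimated effort for four projects.
actual_effort = [100, 200, 400, 50]
estimated_effort = [110, 150, 400, 100]

errors = mre_values(actual_effort, estimated_effort)
print(f"MMRE      = {mmre(actual_effort, estimated_effort):.4f}")
print(f"SD of MRE = {stdev(errors):.4f}")  # assumed meaning of 'standard deviation'
print(f"Pred(25%) = {pred(actual_effort, estimated_effort):.2f}")
```

Lower MMRE and standard deviation and a higher Pred(25%) indicate a better model, which is the direction of change the Results report.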

Details

Title
Merging Distinct Sources Databases to Improve Software Estimation Models
Publication title
Volume
50
Issue
8
Pages
786-795
Publication year
2024
Publication date
Dec 2024
Publisher
Springer Nature B.V.
Place of publication
New York
Country of publication
Netherlands
ISSN
03617688
e-ISSN
16083261
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-01-12
Milestone dates
2024-05-08 (Received); 2024-08-17 (Revision received); 2024-09-12 (Accepted); 2025-01-05 (Registration)
First posting date
12 Jan 2025
ProQuest document ID
3154524634
Document URL
https://www.proquest.com/scholarly-journals/merging-distinct-sources-databases-improve/docview/3154524634/se-2?accountid=208611
Copyright
Copyright Springer Nature B.V. Dec 2024
Last updated
2025-01-13
Database
ProQuest One Academic