Data and Ensemble Machine Learning Fusion Based Intelligent Software Defect Prediction System

Sagheer Abbas, Shabib Aftab, Muhammad Adnan Khan, Taher M. Ghazal, Hussam Al Hamadi, Chan Yeob Yeun

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than all the modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early on can allow for early corrections and ensure the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules to achieve high accuracy. In the first step, three supervised machine learning techniques, including Decision Tree, Support Vector Machines, and Naïve Bayes, are used to detect faulty modules. The second step involves integrating the predictive accuracy of these classification techniques through three ensemble machine-learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system exhibited superior performance to other advanced techniques for predicting software defects, achieving a remarkable accuracy rate of 92.08%.
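The second and third steps of the nested approach can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not give the membership functions or the rule base, so the ramp thresholds (0.3 and 0.7), the "at least two of three ensembles agree" rule, and all function names below are assumptions chosen only to show the general shape of voting followed by fuzzy fusion.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: return the most common label among base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def defective_membership(p):
    """Assumed fuzzy membership of 'defective' for a score p in [0, 1].

    Piecewise-linear ramp: 0 at or below 0.3, 1 at or above 0.7.
    The thresholds are illustrative, not taken from the paper.
    """
    if p <= 0.3:
        return 0.0
    if p >= 0.7:
        return 1.0
    return (p - 0.3) / 0.4


def fuzzy_fuse(bagging_p, voting_p, stacking_p):
    """Fuse three ensemble scores with an assumed rule base.

    Assumed rule: a module is defective if at least two ensembles say so.
    With min as fuzzy AND and max as fuzzy OR, that rule is the maximum
    over the pairwise minima of the three membership degrees.
    """
    b, v, s = (defective_membership(p) for p in (bagging_p, voting_p, stacking_p))
    degree = max(min(b, v), min(b, s), min(v, s))
    return degree, degree >= 0.5


if __name__ == "__main__":
    # Labels from three hypothetical base classifiers (1 = defective).
    print(majority_vote([1, 0, 1]))
    # Scores from three hypothetical ensembles for one module.
    print(fuzzy_fuse(0.9, 0.8, 0.1))
```

Here `majority_vote` stands in for the Voting ensemble over the three base classifiers, and `fuzzy_fuse` stands in for the final fuzzy step over the Bagging, Voting, and Stacking outputs. A real system would train the classifiers on the fused dataset and tune the membership functions rather than fixing them by hand.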

Original language: British English
Pages (from-to): 6083-6100
Number of pages: 18
Journal: Computers, Materials and Continua
Volume: 75
Issue number: 3
State: Published - 2023

Keywords

  • Ensemble machine learning fusion
  • fuzzy logic
  • software defect prediction
