`PR104`

Reproduction of a Software Quality Prediction Study

Comparison of Machine Learning Techniques for Software Quality Prediction
a paper by Goyal, S. (2020) published in Int. J. Knowl. Syst. Sci., 11(2) - IGI Global

This repository contains a reproduction, carried out as a university project, of a short paper on software quality prediction. The original paper gives an overview of different software quality prediction models and compares their performance in terms of accuracy, recall and ROC AUC. The goals of this reproduction are to validate the findings of the original paper, to provide a deeper understanding of the various software quality prediction models, and to critically evaluate the approach and choices made by the authors.

In addition to reproducing the paper, this repository critiques the approach taken by the authors. One major criticism is that the authors did not properly handle the class imbalance problem, which can strongly affect the performance of the models. Furthermore, the authors relied on misleading and inadequate performance metrics, which undermines the conclusions drawn from the results.
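To illustrate why accuracy alone can mislead on imbalanced data, the following minimal sketch scores a trivial baseline that labels every module as clean, using the CM1 class counts from the Data section (49 buggy, 449 clean). The helper name `confusion_metrics` and the baseline itself are assumptions made for illustration; neither is taken from the original paper or from this repository's notebooks.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall and balanced accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Recall: share of buggy modules that were actually flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of clean modules correctly left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "balanced_accuracy": balanced_accuracy,
    }


# CM1 class counts from the Data section: 49 buggy (1), 449 clean (0).
y_true = [1] * 49 + [0] * 449
# Hypothetical baseline: predict "clean" for every module.
y_pred = [0] * len(y_true)

metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

On these counts the baseline reaches an accuracy of about 0.90 while its recall is 0 and its balanced accuracy is 0.5, so a high accuracy by itself says little about how well buggy modules are detected.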

If you are interested in software quality prediction or simply want to learn about the different models and techniques used in this field, this repository is for you! The code is well documented and easy to follow, making it an excellent resource for anyone looking to get started with software quality prediction. In addition, the critical evaluation of the original paper provides valuable insight into the limitations and potential improvements in this field of research.

So feel free to take a look, experiment with the code, and let me know if you have any questions or suggestions!

Data

The work uses data collected from NASA projects and made available in the PROMISE repository. The study relies on six fault prediction benchmark datasets: CM1, KC1, KC2, PC1, JM1, and ALL_DATA (a combination of the other five). The features were extracted from the source code of multiple projects using McCabe and Halstead feature extractors.

| Name | Instances | Buggy | Clean | Imbalance Ratio | Features | Source |
| --- | --- | --- | --- | --- | --- | --- |
| CM1 | 498 | 49 | 449 | 0.109 | 22 | CM1 is a NASA spacecraft instrument written in C |
| JM1 | 10885 | 2106 | 8779 | 0.240 | 22 | JM1 is written in C and is a real-time predictive ground system. It uses simulations to generate predictions |
| KC1 | 2109 | 326 | 1783 | 0.183 | 22 | KC1 is a C++ system implementing storage management for receiving and processing ground data |
| KC2 | 522 | 107 | 415 | 0.258 | 22 | C++ functions used in a scientific data project which is separate from another part known as KC1. These share some third-party software libraries with no other software overlap |
| PC1 | 1109 | 77 | 1032 | 0.075 | 22 | Data from C functions. Flight software for earth orbiting satellite |
| ALL_DATA | 15123 | 2665 | 12458 | 0.214 | 22 | Combined Dataset |
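The Imbalance Ratio column appears to be the number of buggy instances divided by the number of clean instances. This reading is inferred from the figures and is not stated explicitly, so treat it as an assumption. A small sketch that recomputes the column from the counts in the table:

```python
# (buggy, clean) counts copied from the table above.
counts = {
    "CM1": (49, 449),
    "JM1": (2106, 8779),
    "KC1": (326, 1783),
    "KC2": (107, 415),
    "PC1": (77, 1032),
    "ALL_DATA": (2665, 12458),
}


def imbalance_ratio(buggy, clean):
    """Assumed definition: minority (buggy) count over majority (clean) count."""
    return round(buggy / clean, 3)


ratios = {name: imbalance_ratio(b, c) for name, (b, c) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
```

Under this definition the recomputed values match the table to three decimals, and the ALL_DATA counts equal the sums of the five individual datasets.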

Usage

To run the analysis, you must have Python 3.x and the required libraries installed. The required libraries are listed and imported in the setup.ipynb notebook.

The MLDS_PR104 repository contains the following folders:

scripts, which contains the Jupyter notebooks for the analysis and for setting up utility functions;
conf (if present), which contains configuration files used by the scripts or Jupyter notebooks;
data, which contains the input benchmark datasets in both '.csv' and '.arff' format;
results, which contains the outputs of the reproduction, usually in '.csv' format, for easy comparison with the original study;
figures, which contains the plot files;
references, which contains any referenced resources.
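As a quick check before running the notebooks, the class distribution of one of the '.csv' datasets can be inspected with the standard library alone. This is a minimal sketch: the function name `class_distribution`, the file path `data/CM1.csv`, and the label column name `defects` are assumptions and may not match the actual files in the data folder. The in-memory sample below is synthetic and only demonstrates the call.

```python
import csv
import io
from collections import Counter


def class_distribution(csv_file, label_column):
    """Count how many rows carry each label in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Synthetic three-row sample (not real data) with an assumed label column.
sample = io.StringIO(
    "loc,v(g),defects\n"
    "10,2,false\n"
    "85,14,true\n"
    "23,3,false\n"
)
distribution = class_distribution(sample, "defects")
print(distribution)

# Hypothetical usage on a real file; adjust path and column name as needed:
# with open("data/CM1.csv", newline="") as f:
#     print(class_distribution(f, "defects"))
```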

Acknowledgments

References, Inspiration, Code Snippets, etc.

zurlog/MLDS_PR104
