TonmoyTalukder/An-Empirical-Study-of-the-Efficacy-among-multiple-MachineLearning-Algorithms-for-Diabetes-Prediction

An-Empirical-Study-of-the-Efficacy-among-multiple-MachineLearning-Algorithms-for-Diabetes-Prediction

Pima Indians Diabetes Database

CS Undergrad, AUST, Dhaka, Bangladesh

Context: This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

Dataset

Abstract

Diabetes Mellitus (DM) is a chronic illness that has become a global epidemic. Its prevalence has been rising every year, and by 2025 DM is expected to affect 380 million people worldwide. Diabetes is caused by insufficient insulin production by the pancreas or by incorrect insulin uptake by the body’s cells, and it can be controlled if it is predicted early. Machine Learning (ML) methods can improve prognosis by constructing models from datasets collected from patients. The dataset we used is the Pima Indians Diabetes Database (PIDD), which originates from the National Institute of Diabetes and Digestive and Kidney Diseases. We used Logistic Regression (LR), Decision Tree Classifier (DTC), Support Vector Machine (SVM), Random Forest (RF), Gaussian Naive Bayes (NB), K-Neighbors (KNN), and XGBoost (XGB), along with several Ensemble Learning (EL) models, to predict diabetes and to find out which algorithm provides the best prediction result. In this study we concentrated on the F1 score rather than accuracy. The F1 score is the harmonic mean of a model’s precision and recall. Using grid search and cross-validation, we found that DTC performed best on the F1 metric, with a score of 72.0%. In addition, based on the Harmonic Mean, LR delivered the best performance with a score of 70.33%. Among the three EL models, again judged by F1 score, AB (AdaBoost) gave a performance score of 63.23%. The target of this study is to identify the most suitable ML algorithm for predicting diabetes by determining the efficacy of different ML models on this task.
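The abstract relies on the F1 score, the harmonic mean of precision and recall. The sketch below shows how that value could be computed from confusion-matrix counts. It is an illustration only and is not taken from the study's code: the function names `f1_from_precision_recall` and `f1_from_counts` and the example counts are assumptions made for this sketch.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives, and false negatives."""
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return f1_from_precision_recall(precision, recall)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    score = f1_from_counts(tp=60, fp=20, fn=30)
    print(f"F1 = {score:.4f}")
```

Unlike accuracy, F1 ignores true negatives, so on a dataset where non-diabetic cases are the majority, a model cannot score well simply by predicting the negative class most of the time.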

Index Terms — Diabetes, Machine Learning, Logistic Regression, Decision Tree Classifier, XGBoost, Ensemble Learning
