This project uses the open-source Wisconsin breast cancer dataset from the UCI Machine Learning Repository to detect breast cancer. The database was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison.
In this project I build a logistic regression model to identify associations between the following 9 independent variables and the class of the tumor (benign or malignant). Logistic regression can identify important predictors of breast cancer through odds ratios, and the confidence intervals around them provide additional information for decision-making. I then use other classification models to predict the dependent variable: K-Nearest Neighbors, Support Vector Machine, Kernel Support Vector Machine, Naive Bayes, Decision Tree, and Random Forest.
Independent variables: clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses.
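As a rough illustration of how the nine variables listed above might be read from a raw data file, here is a minimal sketch. It assumes a comma-separated layout of sample id, the nine features, and a class code (2 for benign, 4 for malignant), with "?" marking a missing value. The function name `parse_rows`, the snake_case column names, and the example rows are hypothetical and chosen only for this sketch; the project's actual loading code may differ.

```python
import csv
import io

# Snake_case names for the nine independent variables (naming is an assumption).
FEATURES = [
    "clump_thickness",
    "uniformity_of_cell_size",
    "uniformity_of_cell_shape",
    "marginal_adhesion",
    "single_epithelial_cell_size",
    "bare_nuclei",
    "bland_chromatin",
    "normal_nucleoli",
    "mitoses",
]

# Assumed class coding: 2 = benign, 4 = malignant.
CLASS_LABELS = {"2": "benign", "4": "malignant"}


def parse_rows(text):
    """Parse rows of: sample id, nine features, class code.

    Rows containing a missing value ("?") are skipped rather than imputed.
    Returns a list of (features_dict, label) tuples.
    """
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(FEATURES) + 2:
            continue  # ignore blank or malformed lines
        values = [v.strip() for v in row[1:-1]]
        code = row[-1].strip()
        if "?" in values:
            continue  # drop incomplete samples
        features = {name: int(v) for name, v in zip(FEATURES, values)}
        samples.append((features, CLASS_LABELS[code]))
    return samples


# Made-up rows in the assumed layout, not real patient records.
example = (
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,8,10,10,8,7,10,9,7,1,4\n"
    "3,4,1,1,1,2,?,3,1,1,2\n"
)
samples = parse_rows(example)
print(len(samples), [label for _, label in samples])
```

Skipping incomplete rows is the simplest choice for a sketch; imputing the missing values instead would keep more samples at the cost of an extra modeling assumption.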
Model performance depends on the ability of the radiologists to accurately identify findings on mammograms.
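To make the odds ratios and confidence intervals mentioned earlier more concrete, the sketch below converts a single logistic regression coefficient and its standard error into an odds ratio with a Wald confidence interval. The function name `odds_ratio_with_ci` and the input values are hypothetical; they are not results from this project, and the fitted model's own coefficients would be used in practice.

```python
import math


def odds_ratio_with_ci(coefficient, standard_error, z=1.96):
    """Convert a logistic regression coefficient into an odds ratio
    with a Wald confidence interval (z=1.96 gives roughly 95%)."""
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z * standard_error)
    upper = math.exp(coefficient + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
estimate, lower, upper = odds_ratio_with_ci(coefficient=0.5, standard_error=0.1)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio above 1 means that a one-unit increase in the variable is associated with higher odds of the malignant class, holding the other variables fixed. An interval that excludes 1 suggests the association is unlikely to be due to chance alone.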