Academic Projects for CSE-6363 Machine Learning

kNN - Classifier

The Iris dataset is used in .csv format.
Downloaded from Iris Dataset
It includes three iris species with 50 samples each, along with four measurements for each flower.
One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.
The columns in this dataset are:

Id
SepalLengthCm
SepalWidthCm
PetalLengthCm
PetalWidthCm
Species

The code is contained in knn-classifier.py
It contains 6 functions -

loadDataset
Loads the dataset from a .csv file. The first row is assumed to contain the column headers, so it is skipped.
getDistance
Returns the Euclidean distance between two vectors
getNeighbours
Returns the k nearest neighbours of a test instance
predictClass
Returns the most common class among the neighbours
getAccuracy
Tests the predictions against the entire test dataset. Accuracy is printed as a percentage
myknnclassify
Required function mentioned in the problem statement

Value of k: 12
Training and test data are created by randomly splitting data in 66:34 ratio.
Classifier accuracy is generally >95%
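
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of knn-classifier.py: the snake_case names, the `(features, label)` row layout, the shuffle-based `split_dataset` helper, and the toy data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def get_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def get_neighbours(training_set, test_features, k):
    """Return the k training rows closest to test_features.

    Each training row is assumed to be a (features, label) pair.
    """
    ranked = sorted(training_set, key=lambda row: get_distance(row[0], test_features))
    return ranked[:k]


def predict_class(neighbours):
    """Majority vote over the neighbours' labels."""
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def get_accuracy(test_set, predictions):
    """Percentage of test rows whose prediction matches the true label."""
    correct = sum(1 for (_, label), p in zip(test_set, predictions) if label == p)
    return 100.0 * correct / len(test_set)


def split_dataset(rows, train_fraction=0.66, seed=None):
    """Shuffle the rows and cut them into train/test parts (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy data standing in for the Iris rows (made up for illustration).
train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"), ((5.0, 5.0), "b"), ((5.1, 4.8), "b")]
test = [((1.1, 1.0), "a"), ((4.9, 5.2), "b")]
predictions = [predict_class(get_neighbours(train, features, k=3)) for features, _ in test]
print(get_accuracy(test, predictions))
```

With the real data, k would be 12 and the rows would come from the 66:34 split described above.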

kNN - Regressor

The Fertility dataset is used in .csv format.
Downloaded from Fertility Dataset
100 volunteers provided a semen sample analyzed according to the WHO 2010 criteria.
Sperm concentration is related to socio-demographic data, environmental factors, health status, and life habits.
The attributes are listed below, with the encoded value range in parentheses:
Season in which the analysis was performed. 1) winter, 2) spring, 3) summer, 4) fall. (-1, -0.33, 0.33, 1)
Age at the time of analysis. 18-36 (0, 1)
Childhood diseases (i.e., chicken pox, measles, mumps, polio) 1) yes, 2) no. (0, 1)
Accident or serious trauma 1) yes, 2) no. (0, 1)
Surgical intervention 1) yes, 2) no. (0, 1)
High fevers in the last year 1) less than three months ago, 2) more than three months ago, 3) no. (-1, 0, 1)
Frequency of alcohol consumption 1) several times a day, 2) every day, 3) several times a week, 4) once a week, 5) hardly ever or never (0, 1)
Smoking habit 1) never, 2) occasional 3) daily. (-1, 0, 1)
Number of hours spent sitting per day, 1-16 (0, 1)
Output: Diagnosis normal (N), altered (O)
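
As a rough illustration of the encoded ranges above: the season codes are taken directly from the list, while the min-max formula for age is an assumption about how the 18-36 range maps onto (0, 1). `SEASON_CODES` and `min_max_scale` are hypothetical names.

```python
# Season codes as given in the attribute list.
SEASON_CODES = {"winter": -1.0, "spring": -0.33, "summer": 0.33, "fall": 1.0}


def min_max_scale(value, low, high):
    """Map value from [low, high] onto [0, 1] (assumed scaling rule)."""
    return (value - low) / (high - low)


# A 27-year-old sits halfway through the 18-36 range.
print(min_max_scale(27, 18, 36))  # 0.5
print(SEASON_CODES["spring"])     # -0.33
```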

The code is contained in knn-regressor.py
It contains 6 functions -

loadDataset
Loads the dataset from a .csv file. The first row is assumed to contain the column headers, so it is skipped.
getDistance
Returns the Euclidean distance between two vectors
getNeighbours
Returns the k nearest neighbours of a test instance
calculateValue
Returns the predicted value computed from the neighbours' outputs
getAccuracy
Tests the predictions against the entire test dataset. Accuracy is printed as a percentage
myknnregress
Required function mentioned in the problem statement

Value of k: 12
Training and test data are created by randomly splitting data in 66:34 ratio.
Regressor accuracy is generally >85%
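
A minimal sketch of the regression step, assuming the prediction is the plain average of the neighbours' target values. The README does not say which aggregation knn-regressor.py uses, so the averaging rule, the snake_case names, the `(features, target)` row layout, and the toy data are all assumptions.

```python
import heapq
import math


def get_neighbours(training_set, query, k):
    """Return the k rows of (features, target) closest to query by Euclidean distance."""
    return heapq.nsmallest(k, training_set, key=lambda row: math.dist(row[0], query))


def calculate_value(neighbours):
    """Average the neighbours' targets (assumed aggregation rule)."""
    return sum(target for _, target in neighbours) / len(neighbours)


# Toy rows with a 0/1 target, made up for illustration.
train = [((0.0, 0.0), 0.0), ((0.1, 0.0), 0.0), ((1.0, 1.0), 1.0), ((0.9, 1.0), 1.0)]
value = calculate_value(get_neighbours(train, (0.05, 0.0), k=3))
print(value)
```

Here two of the three nearest rows have target 0.0 and one has target 1.0, so the prediction is one third.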

Naive Bayes Classifier

The Mushroom dataset is used in .csv format.
Downloaded from Mushroom Dataset
The Mushroom dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended.
The columns in this dataset are -

cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s
cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r, pink=p,purple=u,red=e,white=w,yellow=y
bruises?: bruises=t,no=f
odor: almond=a,anise=l,creosote=c,fishy=y,foul=f, musty=m,none=n,pungent=p,spicy=s
gill-attachment: attached=a,descending=d,free=f,notched=n
gill-spacing: close=c,crowded=w,distant=d
gill-size: broad=b,narrow=n
gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e, white=w,yellow=y
stalk-shape: enlarging=e,tapering=t
stalk-root: bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,rooted=r,missing=?
stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y
stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y
veil-type: partial=p,universal=u
veil-color: brown=n,orange=o,white=w,yellow=y
ring-number: none=n,one=o,two=t
ring-type: cobwebby=c,evanescent=e,flaring=f,large=l, none=n,pendant=p,sheathing=s,zone=z
spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r, orange=o,purple=u,white=w,yellow=y
population: abundant=a,clustered=c,numerous=n, scattered=s,several=v,solitary=y
habitat: grasses=g,leaves=l,meadows=m,paths=p, urban=u,waste=w,woods=d

The code is contained in naive-bayes.py
It contains 6 functions -

loadDataset - Loads the dataset from a .csv file.
split data by classes - Splits the training data according to class labels
calculate probabilities - Calculates the conditional probability of each feature value given a particular class label.
calculate z - Calculates the scaling factor Z used to normalize the class probabilities
predict class - Returns the most likely class by taking the argmax over the class labels
main - driver function

Instances with missing attributes are skipped.
Training - first 4000 instances
Test - remaining 1644 instances
Classifier accuracy is 84.97%
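
The pipeline above can be sketched for categorical features as follows. This is an illustration under stated assumptions, not the code in naive-bayes.py: the function names, the `(label, features)` row layout, the dict-based model, the add-one smoothing, the `e`/`p` class codes, and the toy rows are all hypothetical. Only the `?` marker for missing values comes from the column list.

```python
from collections import Counter, defaultdict


def drop_missing(rows, missing="?"):
    """Skip instances that contain a missing attribute."""
    return [(label, features) for label, features in rows if missing not in features]


def split_by_class(rows):
    """Group feature tuples by class label."""
    groups = defaultdict(list)
    for label, features in rows:
        groups[label].append(features)
    return groups


def fit(rows):
    """Count class frequencies and per-class feature-value frequencies."""
    groups = split_by_class(rows)
    n_features = len(rows[0][1])
    return {
        "priors": {c: len(items) / len(rows) for c, items in groups.items()},
        "counts": {c: [Counter(f[i] for f in items) for i in range(n_features)]
                   for c, items in groups.items()},
        "class_sizes": {c: len(items) for c, items in groups.items()},
        # Number of distinct values seen per feature, used for smoothing.
        "n_values": [len({f[i] for _, f in rows}) for i in range(n_features)],
    }


def posterior(model, features):
    """Return P(class | features) for every class, normalized by Z."""
    unnormalized = {}
    for c, prior in model["priors"].items():
        score = prior
        for i, value in enumerate(features):
            # Add-one smoothing (an assumption) keeps unseen values from zeroing the score.
            numerator = model["counts"][c][i][value] + 1
            denominator = model["class_sizes"][c] + model["n_values"][i]
            score *= numerator / denominator
        unnormalized[c] = score
    z = sum(unnormalized.values())  # scaling factor Z
    return {c: s / z for c, s in unnormalized.items()}


def predict_class(model, features):
    """Argmax of the posterior over class labels."""
    probs = posterior(model, features)
    return max(probs, key=probs.get)


# Toy rows: (class, (odor, gill-size)); made up for illustration.
rows = drop_missing([
    ("e", ("n", "b")), ("e", ("n", "b")), ("e", ("a", "n")),
    ("p", ("f", "n")), ("p", ("f", "n")), ("p", ("p", "b")),
    ("p", ("?", "b")),  # dropped: missing attribute
])
model = fit(rows)
print(predict_class(model, ("n", "b")), predict_class(model, ("f", "n")))
```

With the real data, the fixed split described above would correspond to `rows[:4000]` for training and `rows[4000:]` for testing.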
