msha096/bisecting_K_means

Feature-selection-based bisecting K-means

A Python implementation of bisecting K-means with feature selection. The feature dimension is reduced gradually as the clusters get smaller.

Feature Selection:

Feature selection is done by applying PCA to the features and gradually reducing their dimensionality. The number of dimensions kept is positively correlated with the cluster size, so smaller clusters are represented with fewer dimensions.
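As a rough illustration of the idea, the sketch below ties the number of PCA components to the cluster size. The linear schedule in `target_dim`, the `min_dim` floor, and both function names are assumptions made for this example; the repository may use a different rule.

```python
import numpy as np
from sklearn.decomposition import PCA


def target_dim(cluster_size, total_size, max_dim, min_dim=2):
    """Hypothetical schedule: scale the dimension linearly with cluster size."""
    frac = cluster_size / total_size
    dim = int(round(max_dim * frac))
    # Clamp to the range [min_dim, max_dim].
    return max(min_dim, min(max_dim, dim))


def reduce_features(X_cluster, total_size, min_dim=2):
    """Project one cluster's features onto fewer principal components."""
    n_samples, n_features = X_cluster.shape
    dim = target_dim(n_samples, total_size, n_features, min_dim)
    # PCA cannot return more components than min(n_samples, n_features).
    dim = min(dim, n_samples, n_features)
    return PCA(n_components=dim).fit_transform(X_cluster)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
print(reduce_features(X, total_size=200).shape)       # full data keeps all 10 dimensions
print(reduce_features(X[:50], total_size=200).shape)  # a quarter of the data keeps fewer
```

Any monotone schedule would satisfy the "positively correlated" description; the linear one is simply the easiest to read.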

Pipeline:

The baseline K-Means is from scikit-learn. Bisecting K-means is a top-down clustering model: it starts with all points in one cluster. At each step, K-Means with k = 2 is applied to the cluster with the largest squared distance.
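The loop described above could look roughly like the following sketch. It assumes that "largest squared distance" means the within-cluster sum of squared distances to the cluster mean (SSE); the helper names `sse` and `bisecting_kmeans` are hypothetical, and the PCA step is left out to keep the example short.

```python
import numpy as np
from sklearn.cluster import KMeans


def sse(X):
    """Sum of squared distances from each point to the cluster mean."""
    return float(((X - X.mean(axis=0)) ** 2).sum())


def bisecting_kmeans(X, n_clusters, random_state=0):
    """Split top-down until n_clusters clusters exist; returns integer labels."""
    labels = np.zeros(len(X), dtype=int)  # start with everything in cluster 0
    for new_label in range(1, n_clusters):
        # Pick the cluster with the largest squared distance (SSE, by assumption).
        worst = max(range(new_label), key=lambda c: sse(X[labels == c]))
        idx = np.flatnonzero(labels == worst)
        if len(idx) < 2:
            break  # a single point cannot be split
        km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        halves = km.fit_predict(X[idx])
        # One half keeps the old label, the other half gets the new one.
        labels[idx[halves == 1]] = new_label
    return labels


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
labels = bisecting_kmeans(X, n_clusters=3)
print(np.bincount(labels))
```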

Evaluation:

A silhouette score analysis is printed each time K-Means divides a cluster into two sub-clusters.
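A minimal sketch of such a per-split report, using scikit-learn's `silhouette_score`. Scoring only the two new sub-clusters (rather than the whole partition) and the function name `split_and_score` are assumptions for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def split_and_score(X_cluster, random_state=0):
    """Bisect one cluster with K-Means (k = 2) and print its silhouette score."""
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    halves = km.fit_predict(X_cluster)
    # The silhouette score lies in [-1, 1]; higher means better-separated halves.
    score = silhouette_score(X_cluster, halves)
    sizes = np.bincount(halves).tolist()
    print(f"split {len(X_cluster)} points into {sizes}, silhouette = {score:.3f}")
    return halves, score


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(40, 3)), 8.0 + rng.normal(size=(40, 3))])
halves, score = split_and_score(X)
```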

Usage:

Change the file path in main to point to your feature file, and the script will read your features from it.
