Mazin18 / text_clustering Public

forked from sergeio/text_clustering


k-means text clustering using cosine similarity.


.gitignore
README.md
k_means.py
similarity.py
test_vectorizer.py
vectorizer.py


text-clustering

An implementation of text clustering that uses k-means to form the clusters and cosine similarity to measure how close two texts are.
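As a rough illustration of the similarity measure, here is a minimal sketch of cosine similarity over bag-of-words count vectors. The names `word_counts` and `cosine_similarity` are hypothetical and are not taken from `similarity.py` or `vectorizer.py`; the whitespace tokenization is an assumption made to keep the example short.

```python
import math
from collections import Counter


def word_counts(text):
    """Hypothetical helper: lowercase the text and count whitespace-separated words."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Treat an empty text as similar to nothing (a choice made for this sketch).
        return 0.0
    return dot / (norm_a * norm_b)
```

Two texts with the same word proportions score 1.0, and texts with no words in common score 0.0, regardless of their lengths.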

Ideal use case:

In [1]: from vectorizer import cluster_paragraphs

# define text variables

In [2]: cluster_paragraphs([
   ...:     text_about_thing_a,
   ...:     text_about_thing_b,
   ...:     text_about_thing_a2,
   ...:     text_about_thing_a3,
   ...:     text_about_thing_b2,
   ...: ], num_clusters=2)
Out[2]:
[
    [text_about_thing_a, text_about_thing_a2, text_about_thing_a3],
    [text_about_thing_b, text_about_thing_b2],
]
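
For readers who want to see how such a grouping could come about, the following is a minimal, self-contained sketch of k-means over word-count vectors with cosine similarity as the closeness measure. It is not the code in `k_means.py` or `vectorizer.py`: the function `cluster_texts_sketch` and its helpers are hypothetical, tokenization is a plain whitespace split, and the initial centroids are simply the first `num_clusters` texts so that the result is deterministic (an implementation would more commonly choose them at random).

```python
import math
from collections import Counter


def _vec(text):
    # Assumption: lowercase, whitespace-split bag of words.
    return Counter(text.lower().split())


def _cos(a, b):
    # Cosine similarity between two sparse vectors stored as mappings.
    dot = sum(c * b[w] for w, c in a.items() if w in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def _mean(vectors):
    # Component-wise average of the member vectors gives the new centroid.
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}


def cluster_texts_sketch(texts, num_clusters, max_iterations=20):
    """Hypothetical k-means sketch: group texts by cosine similarity."""
    if not 1 <= num_clusters <= len(texts):
        raise ValueError("num_clusters must be between 1 and len(texts)")
    vectors = [_vec(t) for t in texts]
    # Deterministic seeding for this sketch: the first k texts.
    centroids = [dict(v) for v in vectors[:num_clusters]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each text goes to its most similar centroid.
        new_assignment = [
            max(range(num_clusters), key=lambda k: _cos(v, centroids[k]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # Converged: no text changed cluster.
        assignment = new_assignment
        # Update step: recompute each centroid from its current members.
        for k in range(num_clusters):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = _mean(members)
    return [
        [t for t, a in zip(texts, assignment) if a == k]
        for k in range(num_clusters)
    ]
```

With two texts about cats and dogs as seeds, the remaining texts attach to whichever seed shares their vocabulary. Real paragraphs would likely need stop-word removal or term weighting, which this sketch leaves out.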