k-means text clustering using cosine similarity.

Mazin18/text_clustering

text-clustering

An implementation of text clustering: k-means is the clustering algorithm, and cosine similarity is the measure used to compare texts.
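For readers unfamiliar with the measure, here is a minimal sketch of cosine similarity between two texts. It assumes each text is represented as a bag of word counts; the function name `cosine_similarity` and the `Counter` representation are illustrative choices, not taken from this repository's code.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both texts contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: empty text matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical example texts, split on whitespace for simplicity.
doc_a = Counter("the cat sat on the mat".split())
doc_b = Counter("the cat lay on the rug".split())
doc_c = Counter("stock prices fell sharply".split())

sim_ab = cosine_similarity(doc_a, doc_b)  # texts sharing several words
sim_ac = cosine_similarity(doc_a, doc_c)  # texts sharing no words
```

Texts that share many words score close to 1, and texts with no words in common score 0. Text length does not matter, only the direction of the count vector.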

Ideal use case:

In [1]: from vectorizer import cluster_paragraphs

# text_about_thing_a, text_about_thing_b, etc. are strings defined beforehand

In [2]: cluster_paragraphs([
   ...:     text_about_thing_a,
   ...:     text_about_thing_b,
   ...:     text_about_thing_a2,
   ...:     text_about_thing_a3,
   ...:     text_about_thing_b2,
   ...: ], num_clusters=2)
Out[2]: [
   ...:     [text_about_thing_a, text_about_thing_a2, text_about_thing_a3],
   ...:     [text_about_thing_b, text_about_thing_b2],
   ...: ]

You pass the function a list of strings and the number of clusters you want (num_clusters), and it groups the strings into that many clusters by analyzing the content of each one.
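To make the idea concrete, here is a rough sketch of how such a function could work. It is not the code in this repository: the name `cluster_paragraphs_sketch`, the plain word-count vectors, the regex tokenizer, and the farthest-first seeding are all assumptions made for illustration. The approach is k-means on unit-length vectors, where each text is assigned to the centroid with the highest cosine similarity.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Turn a text into a unit-length bag-of-words vector (term -> weight)."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def _dot(u, v):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def _centroid(vectors):
    """Sum the member vectors and rescale the result to unit length."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def cluster_paragraphs_sketch(paragraphs, num_clusters=2, max_iter=20):
    """Group texts into num_clusters lists (hypothetical stand-in, see lead-in)."""
    if not 1 <= num_clusters <= len(paragraphs):
        raise ValueError("num_clusters must be between 1 and len(paragraphs)")
    vectors = [_vectorize(p) for p in paragraphs]

    # Deterministic farthest-first seeding: start from the first text, then
    # keep adding the text that is least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < num_clusters:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(_dot(vectors[i], vectors[s]) for s in seeds),
        ))
    centroids = [vectors[s] for s in seeds]

    labels = None
    for _ in range(max(1, max_iter)):
        # Assignment step: each text joins its most similar centroid.
        new_labels = [
            max(range(num_clusters), key=lambda k: _dot(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: recompute the centroid of every non-empty cluster.
        for k in range(num_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = _centroid(members)

    return [
        [p for p, lab in zip(paragraphs, labels) if lab == k]
        for k in range(num_clusters)
    ]


# Hypothetical inputs: three texts about cats, two about markets.
texts = [
    "cats purr and cats nap",
    "stocks and bonds fell",
    "my cats nap all day",
    "kittens and cats purr",
    "bonds rallied while stocks fell",
]
clusters = cluster_paragraphs_sketch(texts, num_clusters=2)
```

With these inputs the cat texts end up in one list and the market texts in the other. A real implementation would likely weight terms (for example with TF-IDF) and drop stop words such as "and", which this sketch leaves in.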

More documentation to come! <- this could be a lie.
