GitHub - bradaallen/nlp_helper: Practice for submitting a package to PyPI - found at https://pypi.org/project/brad_nlp_helpers/

In text clustering, the output of a k-means analysis is not always intuitive. This
function helps characterize each cluster by identifying the top bigram
associated with it.

This function creates a DataFrame listing each cluster and its top bigram. It does this by
taking the subset of rows that belong to each cluster and creating bigrams for each individual row.
Those bigrams are joined together into a single list, which is then run through a counter to
identify the top bigram.
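
The counting step described above can be sketched for a single cluster as follows. This is a minimal illustration using only the standard library, not the package's actual code: the helper names `row_bigrams` and `top_bigram`, the whitespace tokenization, and the sample rows are all assumptions made for the example.

```python
from collections import Counter


def row_bigrams(text):
    """Return the bigrams (adjacent word pairs) found in one row of text."""
    # Assumption: lowercase and split on whitespace; the real package may tokenize differently.
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def top_bigram(rows):
    """Pool the bigrams of every row in one cluster and return (bigram, count) for the most common."""
    pooled = []
    for text in rows:
        pooled.extend(row_bigrams(text))
    if not pooled:
        # No row had at least two tokens, so there is no bigram to report.
        return None
    return Counter(pooled).most_common(1)[0]


# Hypothetical rows that all belong to the same cluster.
cluster_rows = [
    "late delivery of my order",
    "refund for late delivery",
    "late delivery again",
]
print(top_bigram(cluster_rows))
```

Pooling the bigrams before counting means a pair that recurs across many rows outranks one that is repeated inside a single row only once.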

Parameters
--------------
dataset : DataFrame
    the DataFrame used as input. At a minimum, it should have two fields: the cluster output and the text fed to the clustering
cluster_loop : string
    the name of the field that records the cluster assigned to each row
row_loop : string
    the name of the field that contains the text used to identify the bigram (can be raw or preprocessed)
stopwords : bool (optional)
    if True, use a stoplist to strip stopwords from the text

Returns
---------
top_gram : DataFrame
    a DataFrame with 2 columns: the cluster title and a tuple containing the top bigram and its count
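
To make the parameters and return value concrete, here is a hedged sketch of a function with the documented inputs and output shape. It assumes pandas is installed. The function name `top_bigram_per_cluster`, the `STOPLIST` contents, the `top_gram` column name, and the sample data are hypothetical stand-ins; the package's real implementation and stoplist may differ.

```python
from collections import Counter

import pandas as pd

# Hypothetical stoplist for illustration only; the package's actual stoplist is not shown here.
STOPLIST = {"the", "a", "of", "my", "for", "is"}


def top_bigram_per_cluster(dataset, cluster_loop, row_loop, stopwords=False):
    """Return a DataFrame with one row per cluster and its (top bigram, count) tuple."""
    records = []
    for cluster, subset in dataset.groupby(cluster_loop):
        pooled = []
        for text in subset[row_loop]:
            # Assumption: lowercase and split on whitespace.
            tokens = str(text).lower().split()
            if stopwords:
                tokens = [t for t in tokens if t not in STOPLIST]
            pooled.extend(zip(tokens, tokens[1:]))
        top = Counter(pooled).most_common(1)
        # A cluster with no bigrams at all gets None instead of a tuple.
        records.append((cluster, top[0] if top else None))
    return pd.DataFrame(records, columns=[cluster_loop, "top_gram"])


# Hypothetical input: one column of cluster labels, one column of text.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "text": [
        "the battery life is short",
        "short battery life",
        "screen of the phone cracked",
        "cracked screen",
    ],
})
result = top_bigram_per_cluster(df, "cluster", "text", stopwords=True)
print(result)
```

When several bigrams tie for the highest count, `Counter.most_common` returns the one encountered first, so the reported bigram for such a cluster depends on row order.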

Folders and files
brad_nlp_helpers
build/lib/brad_nlp_helpers
MANIFEST.in
README.md
setup.py
