Provides simple extraction and analysis of the most commonly used words in PDF or TXT files, using the Python Natural Language Toolkit (NLTK).
If you have more complex text extraction needs, you may want to take a look at the Doc Processing Toolkit.
First, download the repo:

```
git clone https://github.com/18F/text-analysis.git
```
We recommend using pipenv to install dependencies and run everything safely inside a virtualenv. You'll set that up by running

```
pipenv install
```

from within the repo.
Your virtualenv should be using Python 3.x. If it's not, try `brew install python` to get it sorted out. Remember: after you have Python 3.x installed, you'll need to re-run `pipenv install`. If you don't have pipenv, you should be able to install it by running `brew install pipenv`.
Check the Pipenv documentation for details.
First, drop the files you want to analyze into the `files` directory.
Then activate your virtual environment:

```
pipenv shell
```
If this is your first time running this, or if you haven't used it in a long time, make sure the NLTK modules are up to date by running

```
python update_nltk.py
```
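Under the hood, updating NLTK modules comes down to calling NLTK's downloader. The exact packages `update_nltk.py` fetches aren't spelled out here, so the following is just a minimal sketch assuming the tokenizer models and stopword lists are what's needed:

```python
# Minimal sketch of an NLTK data update. The package names below are
# assumptions, not necessarily what update_nltk.py actually downloads.
import nltk

for package in ("punkt", "stopwords"):
    nltk.download(package)  # fetches (or refreshes) the named data package
```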
Then run

```
python keyword_analysis.py
```
The dependencies should all be installed for you when you run `pipenv install`, but if you're curious about what's happening under the hood: PyPDF2 is used to read PDF files, and NLTK handles the textual analysis.
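To make that division of labor concrete, here is a minimal, hypothetical sketch of how the two libraries fit together. It is not the repo's `keyword_analysis.py`; the function name and file path are made up for illustration:

```python
from PyPDF2 import PdfReader
import nltk
from nltk.corpus import stopwords

def most_common_words(pdf_path, n=10):
    """Hypothetical helper: count the most frequent words in one PDF."""
    # PyPDF2 pulls the raw text out of each page.
    reader = PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)

    # NLTK tokenizes the text; keep alphabetic tokens and drop stopwords.
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in nltk.word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]

    # FreqDist ranks tokens by how often they occur.
    return nltk.FreqDist(words).most_common(n)

print(most_common_words("files/example.pdf"))  # hypothetical input file
```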