These notebooks were developed to teach the basics of text mining and text manipulation in Python.
The first notebook explains how Python handles text, how text is structured for both the machine and human readers, and gives an overview of the NLTK framework for manipulating text. The first and second notebooks cover common manipulation needs, including regular expressions (searching for text), cleaning text, and preparing text for use in machine learning processes.
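As a rough illustration of the kind of cleaning step described above, the sketch below uses only Python's built-in `re` module. The function names `clean_text` and `tokenize` and the sample sentence are hypothetical and are not taken from the notebooks, which may clean text differently.

```python
import re

def clean_text(raw):
    """Lowercase the text, drop URLs and punctuation, and collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # replace anything that is not a letter, digit, or whitespace
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

def tokenize(raw):
    """Split cleaned text into whitespace-separated tokens."""
    return clean_text(raw).split()

# Hypothetical sample input, for illustration only.
sample = "Visit https://example.com NOW!!  Text-mining is fun, isn't it?"
tokens = tokenize(sample)
print(tokens)
```

Replacing punctuation with spaces rather than deleting it keeps hyphenated words such as "Text-mining" as two tokens, but it also splits contractions ("isn't" becomes "isn" and "t"). A dedicated tokenizer would handle such cases more carefully.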
The third notebook describes basic natural language processing methods for text and demonstrates how text classification is accomplished.
The final notebook explores more advanced methods for detecting the topics in documents and grouping documents by similarity (topic modelling). Notebooks 3 and 4 use real-world data to give a better understanding of text manipulation and classification in Python using regular expressions and the NLTK library.