This repo contains the source code for the tutorial series at http://www.thoughtly.co/blog/prototype. To help self-taught students immerse themselves in ML/NLP, we are introducing a tutorial series on ML with an emphasis on NLP. The plan is to take readers from basic concepts through more advanced subjects. We intend to provide simple, verbose, well-documented code that lets students fully grasp concepts and techniques that are often glossed over in classes but that make up a significant part of the foundation needed to get into ML for NLP.
The first post focuses on text handling with an emphasis on tokenization and term frequency. This post covers the general use of the code found in words.py: http://www.thoughtly.co/blog/working-with-text/.
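As a rough illustration of the two ideas named above, here is a minimal sketch of tokenization and term frequency. It is not the code in words.py; the function names `tokenize` and `term_frequency` and the sample sentence are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    # Runs of letters, digits, and apostrophes count as one token.
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequency(tokens):
    """Map each token to its share of the total token count (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


# Sample sentence invented for illustration.
tokens = tokenize("The cat sat on the mat. The mat was flat.")
tf = term_frequency(tokens)
```

Here term frequency is the raw count divided by the number of tokens; other definitions (raw counts, log-scaled counts) are also common, and the post may use a different one.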
The second post is an introduction to probability. It's the first of two tutorials on the subject: http://www.thoughtly.co/blog/probability/.
Post three discusses more advanced probability topics, primarily Bayes' Theorem. It is available here: http://www.thoughtly.co/blog/bayes-theorem/
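For readers who want a quick feel for Bayes' Theorem before reading the post, the sketch below computes P(H | E) = P(E | H) P(H) / P(E). The function name `posterior`, its parameter names, and the spam numbers are all assumptions invented for this example, not taken from the tutorial.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # P(E) by the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Made-up numbers: 1% of messages are spam; a keyword appears in 90% of
# spam messages and in 5% of non-spam messages.
p_spam_given_keyword = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
```

With these made-up numbers the result is roughly 0.154, so the keyword alone still leaves a message more likely to be legitimate than spam because the prior is so low.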
Post four combines topics from the previous posts to introduce our first machine learning algorithm, the Naive Bayes Classifier: http://www.thoughtly.co/blog/naive-bayes-classifier
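To show how term counts and Bayes' Theorem fit together, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the implementation from the post or this repo; the class name `NaiveBayes`, its methods, and the tiny training set are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists (hypothetical sketch)."""

    def fit(self, documents, labels):
        # Count documents per class and token occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Work in log space: log P(class) + sum of log P(word | class).
            score = math.log(doc_count / total_docs)
            class_total = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                # Add-one smoothing keeps unseen word/class pairs above zero.
                score += math.log((count + 1) / (class_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny invented training set.
docs = [["free", "prize", "now"], ["win", "free", "money"],
        ["meeting", "at", "noon"], ["lunch", "at", "noon"]]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(["free", "money"])
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on longer documents; the "naive" part is the assumption that words are independent given the class.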