- We use Bayes' rule to find the posterior probability of tag Si for word Wi: P(Si|Wi) = P(Wi|Si) * P(Si) / P(Wi)
- Since the denominator P(Wi) is the same for a given word, we ignore it: P(Si|Wi) ∝ P(Wi|Si) * P(Si)
- For each word Wi in the test sentence, the part-of-speech tag Si with the maximum probability P(Si|Wi) is assigned to Wi
- Handling new words in the simple model: if a word in the test sentence is absent from the emission probabilities calculated from the training data, we assign it an emission probability of (minimum of all emission probabilities) * 0.1
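A minimal sketch of this per-word classifier, assuming `emission[tag][word]` and `prior[tag]` are probability tables estimated from the training data (all names here are hypothetical):

```python
import math

def simple_tag(sentence, emission, prior, tags):
    """Tag each word independently with argmax_S P(W|S) * P(S)."""
    # Floor for unseen words: minimum over all trained emission probabilities
    min_emission = min(p for t in tags for p in emission[t].values())
    result = []
    for word in sentence:
        best_tag, best_score = None, -math.inf
        for t in tags:
            # Unseen words fall back to min(emission) * 0.1
            e = emission[t].get(word, min_emission * 0.1)
            score = e * prior[t]
            if score > best_score:
                best_tag, best_score = t, score
        result.append(best_tag)
    return result
```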
Hidden Markov Model (HMM) using Viterbi
- The Viterbi algorithm is based on dynamic programming
- The Viterbi trellis has one column per word of the test sentence, with one state per part-of-speech tag in each column
- For the first word, we compute emission probability * prior probability for each of the 12 parts of speech and store the values in a dictionary v keyed by (state, part of speech)
- From the second word onwards, for each of the 12 parts of speech we take max(previous Viterbi value * transition probability) over all previous tags. The current tag and its best previous tag are stored in a list of lists called tag. We then multiply this maximum by the corresponding emission probability to get the value of the current Viterbi state
- This process continues until the last word of the sentence
- We find the maximum value in the last Viterbi column stored in the dictionary v; its part-of-speech tag is the tag for the last word of the sentence. We then backtrack from this tag to recover the tags for all earlier words
- Emission probabilities of new words are handled as in the simple model, by assigning min(emission probability) * 0.1
- To handle underflow, we take the log of each individual probability (emission, transition) and add the logs instead of multiplying the probabilities as the formula states
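A sketch of this log-space Viterbi decoder, with the same unseen-word floor; `emission`, `transition`, and `prior` are hypothetical names for the trained probability tables:

```python
import math

def viterbi(sentence, emission, transition, prior, tags):
    """Viterbi decoding in log space with backtracking."""
    min_emission = min(p for t in tags for p in emission[t].values())

    def log_e(tag, word):
        # Unseen words fall back to min(emission) * 0.1
        return math.log(emission[tag].get(word, min_emission * 0.1))

    # First column: log(emission) + log(prior) for every tag
    v = [{t: log_e(t, sentence[0]) + math.log(prior[t]) for t in tags}]
    back = []
    for word in sentence[1:]:
        col, ptr = {}, {}
        for t in tags:
            # Best previous tag under log(previous value) + log(transition)
            prev = max(tags, key=lambda p: v[-1][p] + math.log(transition[p][t]))
            col[t] = v[-1][prev] + math.log(transition[prev][t]) + log_e(t, word)
            ptr[t] = prev
        v.append(col)
        back.append(ptr)

    # Backtrack from the best tag of the last word
    best = max(tags, key=lambda t: v[-1][t])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```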
Markov Chain Monte Carlo (MCMC) using Gibbs sampling
- First, we generate an initial sample in which every word is tagged as a noun. From this sample, we generate a new sample using the create_sample() function
- To generate a sample, the probability of each part-of-speech tag for a word is calculated given all the words and all the other tags, that is P(S_i | (S - {S_i}), W_1, W_2, ..., W_n)
- For the first word, we calculate the product of the emission and prior probabilities
- For the second word, we calculate the product of the emission probability and the transition probability from the first tag to the second
- From the third word onwards, we calculate the product of the emission probability, the transition probability from the previous tag to the current tag, and the transition probability from the tag two positions back to the current tag
- We then normalize: each tag's probability is divided by the sum of the probabilities of all tags for that word. Using np.random.choice we draw a tag from this normalized distribution and assign it to the word in the sample; this is repeated for every word in every sample (see the sketch after this list)
- To handle the emission probability of new words, a small probability (min(emission) * 0.1) is assigned. The first 1000 samples are discarded as a burn-in period to improve accuracy
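A sketch of one Gibbs sweep along the lines of create_sample(), using exactly the factors listed above; `transition` (previous tag to current) and `transition2` (tag two positions back to current) are hypothetical names for the two trained transition tables:

```python
import numpy as np

def create_sample(sentence, tags_now, emission, transition, transition2,
                  prior, tags):
    """One Gibbs sweep: resample each word's tag given all the others."""
    min_emission = min(p for t in tags for p in emission[t].values())

    def e(tag, word):
        # Unseen words get the small floor min(emission) * 0.1
        return emission[tag].get(word, min_emission * 0.1)

    for i, word in enumerate(sentence):
        scores = []
        for t in tags:
            if i == 0:
                s = e(t, word) * prior[t]
            elif i == 1:
                s = e(t, word) * transition[tags_now[0]][t]
            else:
                # Both predecessors influence the current tag
                s = (e(t, word)
                     * transition[tags_now[i - 1]][t]
                     * transition2[tags_now[i - 2]][t])
            scores.append(s)
        probs = np.array(scores) / np.sum(scores)
        # Draw the new tag for word i from its normalized conditional
        tags_now[i] = np.random.choice(tags, p=probs)
    return tags_now
```

Repeated sweeps over the sentence, starting from the all-noun initialization, produce the chain; the first 1000 sweeps are discarded as burn-in.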
- To calculate the marginal distribution from samples, we first generate 3000 samples using MCMC (after discarding the first 1000 burn-in samples)
- From these samples, we estimate the probability of each part-of-speech tag for each word in the test sentence by counting how often that tag appears at that word's position
- Each word is assigned the part-of-speech tag with the maximum probability, and these tags are combined to give the part-of-speech tags for the entire sentence
- Emission probabilities of new words are again handled by assigning min(emission probability) * 0.1
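A sketch of this max-marginal decoding step, assuming `samples` holds the 3000 tag sequences kept after burn-in:

```python
from collections import Counter

def max_marginal_tags(sentence, samples):
    """Assign each word its most frequent tag across the kept samples."""
    result = []
    for i in range(len(sentence)):
        # Tag frequency at position i approximates the marginal P(S_i)
        counts = Counter(sample[i] for sample in samples)
        result.append(counts.most_common(1)[0][0])
    return result
```

Because the frequency of a tag at position i across samples approximates its marginal probability, picking the most frequent tag per word is the max-marginal assignment.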
Results
- Simple model: Sentences correct = 47.45%, Words correct = 94.70%
- HMM (Viterbi): Sentences correct = 53.25%, Words correct = 95.08%
- MCMC: Sentences correct = 55.65%