yee-kevin/natural-language-processing

About

SUTD 50.040 Natural Language Processing Course Homework and Projects taught by Professor Lu Wei. For more information, refer to https://istd.sutd.edu.sg/undergraduate/courses/50040-natural-language-processing.

Material

Homework

1. HW1 [Code] - Word Embeddings (Co-occurrence matrices/Word2Vec)

Word embeddings are dense vectors that represent words and can capture semantic and syntactic similarity, relations with other words, and so on. This homework learns word embeddings with two methods: count-based (co-occurrence matrices) and prediction-based (Word2Vec, with the CBOW and Skip-gram models). The dataset used is "text8", which consists of a single line of text.
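
To illustrate the count-based approach, here is a minimal sketch of building co-occurrence counts with a symmetric context window. The function name `cooccurrence_counts`, the `window` parameter, and the toy sentence are assumptions for illustration and are not taken from the homework code.

```python
from collections import defaultdict

def cooccurrence_counts(tokens, window=2):
    """Count how often each ordered word pair appears within `window` positions."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                counts[(word, tokens[j])] += 1
    return counts

# Toy input (hypothetical); the homework itself uses the text8 corpus.
tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)
```

Each row of the resulting matrix (all counts for one word) can serve as a sparse word vector; dense embeddings are commonly obtained by applying dimensionality reduction such as SVD to this matrix.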

2. HW2 [Code] - CKY parsing algorithm for probabilistic context-free grammars (PCFG)

Constituency parsing extracts from a sentence a constituency-based parse tree that represents its syntactic structure according to a phrase structure grammar. This homework implements a constituency parser based on probabilistic context-free grammars (PCFGs) and evaluates its performance. The dataset used is a version of the “Penn Treebank” released in the NLTK corpora.
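
As a rough sketch of the idea behind CKY with a PCFG, the following computes the log-probability of the best parse for a grammar in Chomsky normal form. The function name, the dictionary-based grammar representation, and the toy grammar are assumptions made for illustration; the homework's actual implementation and grammar (induced from the Penn Treebank) may differ.

```python
import math
from collections import defaultdict

def cky_best_parse_logprob(words, lexical, binary, start="S"):
    """Return the log-probability of the best parse, or None if no parse exists.

    lexical: dict mapping (A, word) -> P(A -> word)
    binary:  dict mapping (A, B, C) -> P(A -> B C)
    """
    n = len(words)
    # best[(i, j)][A] = best log-probability of A spanning words[i:j]
    best = defaultdict(dict)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                best[(i, i + 1)][A] = math.log(p)
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (A, B, C), p in binary.items():
                    if B in best[(i, k)] and C in best[(k, j)]:
                        score = math.log(p) + best[(i, k)][B] + best[(k, j)][C]
                        if score > best[(i, j)].get(A, -math.inf):
                            best[(i, j)][A] = score
    return best[(0, n)].get(start)

# Toy grammar (hypothetical), already in Chomsky normal form.
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
logprob = cky_best_parse_logprob("she eats fish".split(), lexical, binary)
```

A full parser would also store back-pointers in each chart cell so that the best tree, not only its score, can be recovered.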

3. HW3 [Written] - Language Model, Dependency Parsing, Context Free Grammar, Transition-based Parsing

4. HW4 [Code] - Part 1 (IBM Model 1), Part 2 (Seq2Seq Attention Model)

Part 1: IBM Model 1 trained with the hard and soft expectation-maximization (EM) algorithms.
Part 2: Seq2Seq Attention Model using a bidirectional LSTM encoder and a unidirectional LSTM decoder.
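
For Part 1, a minimal sketch of soft EM for IBM Model 1 might look like the following. It estimates translation probabilities t(f | e) from sentence pairs and, for brevity, omits the NULL word. The function name, the data layout, and the toy corpus are assumptions for illustration and are not taken from the homework code.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) with soft EM.

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict-like mapping (f, e) -> t(f | e).
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialization over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned to
        # E-step: distribute each foreign word fractionally over English words.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy parallel corpus (hypothetical).
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = ibm_model1(pairs, iterations=10)
```

Hard EM differs in the E-step: instead of fractional counts, each foreign word contributes a count of 1 to its single most probable English word under the current t.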

5. HW5 [Written] - Phrase-based Machine Translation, Synchronous CFG, Word Alignment Model, Attention

Recommended Textbooks

  1. Chris Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999
  2. Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed. draft), 2018
  3. Yoav Goldberg, Neural Network Methods for Natural Language Processing, 2017

Supplementary Textbooks for Linear Algebra, Probability

  1. D. Poole, Linear Algebra: A Modern Introduction. 3rd edition, 2010.
  2. J. L. Devore, Probability and Statistics for Engineering and the Sciences. 8th edition, 2011
