About

Homework and projects for the SUTD 50.040 Natural Language Processing course, taught by Professor Lu Wei. For more information, refer to https://istd.sutd.edu.sg/undergraduate/courses/50040-natural-language-processing.

Material

Homework

1. HW1 [Code] - Word Embeddings (Co-occurrence matrices/Word2Vec)

Word embeddings are dense vectors that represent words and can capture semantic and syntactic similarity, relations with other words, and so on. This homework learns word embeddings with two methods: count-based (co-occurrence matrices) and prediction-based (Word2Vec, with the CBOW and Skip-gram models). The dataset is "text8", which consists of a single line of text.
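
To make the count-based method concrete, here is a minimal sketch of building a co-occurrence matrix with a symmetric context window. It is not the homework's code: the function names (`cooccurrence_counts`, `to_matrix`), the window size, and the toy sentence are all assumptions chosen for illustration.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count how often each (word, context) pair occurs within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                counts[(word, tokens[j])] += 1
    return counts


def to_matrix(counts, vocab):
    """Lay the pair counts out as a |V| x |V| list-of-lists matrix."""
    index = {w: k for k, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (word, context), n in counts.items():
        matrix[index[word]][index[context]] = n
    return matrix


# Toy corpus (hypothetical); text8 would be tokenized the same way with str.split().
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
counts = cooccurrence_counts(tokens, window=1)
matrix = to_matrix(counts, vocab)
```

Each row of `matrix` can serve as a sparse count vector for one word. Dense embeddings are usually obtained from such a matrix by a dimensionality reduction step, for example truncated SVD, which this sketch leaves out.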

2. HW2 [Code] - CKY parsing algorithm for probabilistic context-free grammars (PCFG)

Constituency parsing extracts from a sentence a constituency-based parse tree that represents its syntactic structure according to a phrase structure grammar. This homework implements a constituency parser based on probabilistic context-free grammars (PCFGs) and evaluates its performance. The dataset is a version of the “Penn Treebank” released in the NLTK corpora.
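
As an illustration of the CKY algorithm named in the heading, the sketch below runs Viterbi CKY over a tiny hand-written PCFG in Chomsky normal form. The grammar, its probabilities, and the names `BINARY`, `LEXICAL`, `cky`, and `build_tree` are hypothetical; the homework itself works with a grammar estimated from the Penn Treebank sample.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form.
# Binary rules are indexed by their right-hand side: (B, C) -> [(A, P(A -> B C))].
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
# Lexical rules: word -> [(A, P(A -> word))].
LEXICAL = {
    "she": [("NP", 0.4)],
    "saw": [("V", 1.0)],
    "the": [("Det", 1.0)],
    "dog": [("N", 1.0)],
}


def cky(words):
    """Fill the chart: best[(i, j)][A] = (log prob, backpointer) for span words[i:j]."""
    n = len(words)
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for label, p in LEXICAL.get(word, []):
            best[(i, i + 1)][label] = (math.log(p), word)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for left, (lp, _) in best[(i, k)].items():
                    for right, (rp, _) in best[(k, j)].items():
                        for parent, p in BINARY.get((left, right), []):
                            score = math.log(p) + lp + rp
                            cell = best[(i, j)]
                            if parent not in cell or score > cell[parent][0]:
                                cell[parent] = (score, (k, left, right))
    return best


def build_tree(best, i, j, label):
    """Follow backpointers to recover the highest-scoring tree as nested tuples."""
    _, back = best[(i, j)][label]
    if isinstance(back, str):  # lexical rule: the backpointer is the word itself
        return (label, back)
    k, left, right = back
    return (label, build_tree(best, i, k, left), build_tree(best, k, j, right))


words = "she saw the dog".split()
chart = cky(words)
tree = build_tree(chart, 0, len(words), "S")
```

Log probabilities are used so that long sentences do not underflow. A real parser would also need to binarize the treebank grammar and handle unknown words, both of which are omitted here.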

3. HW3 [Written] - Language Model, Dependency Parsing, Context Free Grammar, Transition-based Parsing

4. HW4 [Code] - Part 1 (IBM Model 1), Part 2 (Seq2Seq Attention Model)

Part 1: IBM Model 1 trained with the hard and soft expectation-maximization (EM) algorithms.
Part 2: Seq2Seq attention model with a bidirectional LSTM encoder and a unidirectional LSTM decoder.
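
For Part 1, the following is a minimal sketch of soft EM for IBM Model 1, estimating translation probabilities t(f | e) from sentence pairs. It is an illustration under simplifying assumptions, not the homework's implementation: the function name `train_ibm1` and the toy corpus are hypothetical, and the NULL source word is omitted to keep the example short.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Soft EM for IBM Model 1.

    pairs: list of (f_tokens, e_tokens) sentence pairs.
    Returns a dict-like t with t[(f, e)] approximating P(f | e).
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialization over the f vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned at all
        # E-step: spread each f token's unit of mass over the e tokens
        # in proportion to the current t(f | e).
        # (Hard EM would instead give the full count to the argmax e.)
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Hypothetical three-sentence toy corpus.
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm1(pairs, iterations=20)
```

On this corpus the probability mass for "the" shifts toward "das", because "das" is the only word that co-occurs with "the" in both of its sentences.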

5. HW5 [Written] - Phrase-based Machine Translation, Synchronous CFG, Word Alignment Model, Attention

Supplementary Textbooks for Linear Algebra, Probability

D. Poole, Linear Algebra: A Modern Introduction. 3rd edition, 2010.
J. L. Devore, Probability and Statistics for Engineering and the Sciences. 8th edition, 2011.


yee-kevin/natural-language-processing
