This is a Python repository covering some of the most common tasks in Natural Language Processing (NLP).
The repository is divided into three main parts, each named after the family name of the professor who taught it, for reference in case of doubt:
- the first part covers the basics of natural language processing, with a special focus on morphology, syntax, formal semantics, natural language generation, and automatic translation;
- the second part focuses mostly on lexical semantics: it introduces several approaches to knowledge representation, along with the notion of semantics based on conceptual anchoring, and provides a detailed survey of the existing state-of-the-art semantic resources;
- the third part is oriented towards statistical approaches to NLP, starting from the concept of distributional semantics and the existing methodologies. It then focuses on the notion of semantic similarity and on the theoretical bases for constructing meaning through syntactic-semantic composition, with an emphasis on the automatic construction of ontologies.
Hidden Markov Model-based PoS tagger, using the Viterbi algorithm
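For readers new to the idea, here is a minimal sketch of Viterbi decoding for an HMM tagger. It is not the code in this repository: the function name `viterbi`, the `unk` smoothing constant, and the toy probabilities below are all assumptions made for illustration. Probabilities are handled in log space to avoid underflow on longer sentences.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, unk=1e-12):
    """Return the most likely tag sequence for `observations`.

    `unk` is a hypothetical floor used in place of zero probabilities,
    so that unseen words or transitions do not produce log(0).
    """
    if not observations:
        return []

    def lg(p):
        return math.log(p if p > 0 else unk)

    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(observations[0], 0))
             for s in states}]
    # back[t][s]: previous state on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lg(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lg(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy model with made-up probabilities (illustration only)
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p))
# ['DET', 'NOUN', 'VERB']
```

In a real tagger the start, transition, and emission tables would be estimated from a tagged corpus, and unknown words would usually need a better smoothing strategy than a fixed floor.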
Currently, we have the following:
- Exercise 1.1 Conceptual Similarity
- Exercise 1.2 Word Sense Disambiguation
- Exercise 2 FrameNet Disambiguation
- Exercise 3 Shallow Summarization
- Exercise 4.1 Semantic Similarity
- Exercise 4.2 Sense Identification
Currently, we cover the following topics:
- Exercise 1.1 Aggregate Concepts
- Exercise 1.2 Results of Aggregation
- Exercise 1.3 Word Sense Induction
- Exercise 1.4 Hanks' Theory
- Exercise 1.5 Content to Form
- Exercise 2.1 Text Segmentation
- Exercise 2.2 Topic Modeling
- Exercise 3.2 RNN for Sentiment Analysis & Character Generation
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.