Main developer: deeperlush (C) 2021
This repository contains a Structured Learning Framework for Natural Language Processing problems. It comprises the implementation of the framework originally designed during my PhD research.
Structured learning consists of learning a mapping from inputs to structured outputs from a sample of correct input-output pairs. Many important problems fit this setting. For instance, dependency parsing involves recognizing the tree underlying a sentence.
Structured perceptron is a training algorithm for structured problems that generalizes the binary perceptron. It learns the parameters of a linear discriminant function that, given an input, discriminates the correct output structure from the alternative ones by means of a task-specific optimization problem (an inference algorithm).
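Concretely, writing f(x, y) for the joint feature vector of an input x with a candidate output y, and w for the weight vector (this notation is ours, for illustration only), one vanilla perceptron step is:

    y_pred = argmax over y' of  w . f(x, y')        (the inference problem)
    w      = w + lr * (f(x, y) - f(x, y_pred))      (the parameter update)

so the weights move toward the features of the correct structure and away from those of the mistaken prediction.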
There are implementations of different variations of the structured perceptron algorithm, namely:
- vanilla,
- large margin, and
- dual (kernelized).
There are also some instantiations of the framework for different tasks:
- sequence labeling (POS tagging, text chunking, etc.),
- dependency parsing,
- quotation extraction, and
- coreference resolution.
The common entry point for all instantiations is the command
java br.pucrio.inf.learn.structlearning.discriminative.driver.Driver
When executed, this command shows the list of available sub-commands. For instance, the sub-command TrainDP trains and evaluates dependency parsing models. You can execute each sub-command without arguments to see its list of options.
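For instance, assuming the compiled classes are on the classpath, the two commands below print the available sub-commands and then the options of TrainDP:

    java br.pucrio.inf.learn.structlearning.discriminative.driver.Driver
    java br.pucrio.inf.learn.structlearning.discriminative.driver.Driver TrainDP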
In the following, we highlight the most important aspects of this framework, including the hot spots.
In the diagram below, we present the class Perceptron, which takes care of the training loop, along with the most important interfaces, which correspond to the hot spots of this framework.
The main responsibilities of each class and interface are:
- Perceptron: this class implements the vanilla Structured Perceptron training algorithm. It also serves as the base class for other variants of the Structured Perceptron (large margin and dual, for instance). This implementation should work for any instantiation of the framework.
- Model: the class that implements this interface must store the model parameters and implement some methods. The most important one is

    double update(ExampleInput in, ExampleOutput out, ExampleOutput pred, double lr);

  This method gets an input in, its correct output out, the current prediction pred, and the learning rate lr, and must update the model parameters according to the difference between out and pred (see the sketch after this list).
- Inference: this interface represents the inference algorithm, i.e., the algorithm that predicts an output structure for a given input and model. The inference method is

    void inference(Model model, ExampleInput input, ExampleOutput output);

- Dataset: this interface must be implemented to provide training examples, usually by reading them from a file.
- ExampleInput: this interface represents the input features of an example.
- ExampleOutput: this interface represents the output structure of an example.
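To make the hot spots concrete, here is a minimal sketch for a toy unigram tagger. The Toy* names are ours, and the real framework interfaces declare additional methods omitted here, so read this as an illustration of the contracts quoted above rather than a drop-in implementation.

    class ToyInput {                 // plays the role of ExampleInput
        int[] tokens;                // word ids of the sentence
        ToyInput(int[] tokens) { this.tokens = tokens; }
    }

    class ToyOutput {                // plays the role of ExampleOutput
        int[] tags;                  // one tag id per token
        ToyOutput(int size) { this.tags = new int[size]; }
    }

    class ToyModel {                 // plays the role of Model
        double[][] w;                // w[word][tag]: one weight per pair
        ToyModel(int numWords, int numTags) { w = new double[numWords][numTags]; }

        // Perceptron update: for every token whose predicted tag differs
        // from the correct one, move weight mass from the predicted tag
        // to the correct tag. Returns the loss (number of wrong tokens).
        double update(ToyInput in, ToyOutput out, ToyOutput pred, double lr) {
            double loss = 0;
            for (int t = 0; t < in.tokens.length; t++) {
                if (out.tags[t] != pred.tags[t]) {
                    w[in.tokens[t]][out.tags[t]] += lr;
                    w[in.tokens[t]][pred.tags[t]] -= lr;
                    loss += 1;
                }
            }
            return loss;
        }
    }

    class ToyInference {             // plays the role of Inference
        // Fill output with the highest-scoring tag for each token. A
        // unigram model decomposes per token; real tasks need Viterbi,
        // maximum spanning tree, and similar inference algorithms.
        void inference(ToyModel model, ToyInput input, ToyOutput output) {
            for (int t = 0; t < input.tokens.length; t++) {
                double[] scores = model.w[input.tokens[t]];
                int best = 0;
                for (int k = 1; k < scores.length; k++)
                    if (scores[k] > scores[best]) best = k;
                output.tags[t] = best;
            }
        }
    }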
In the following sequence diagram, we present an overview of the operations within a typical training loop.
The client code (the training script) must create the Dataset object and obtain the list of inputs and outputs. It then passes those to the training algorithm (a Perceptron object, for example), which executes the training loop. In a typical iteration of the training loop, the algorithm randomly selects a training example (the pair in, out), calls the inference(...) method of the Inference object to obtain a predicted structure pred, and finally updates the model weights considering the difference between the correct structure out and the predicted one pred.
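Under the same toy assumptions as the sketch above, the whole loop could look as follows (the framework's Perceptron class additionally handles epochs, averaging, reporting, and stopping criteria, which are omitted here):

    import java.util.Random;

    class ToyTrainingLoop {
        public static void main(String[] args) {
            // Hypothetical toy data: two "words", two tags, word i -> tag i.
            ToyInput[] inputs = { new ToyInput(new int[]{0, 1}),
                                  new ToyInput(new int[]{1, 0}) };
            ToyOutput[] outputs = new ToyOutput[inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                outputs[i] = new ToyOutput(inputs[i].tokens.length);
                for (int t = 0; t < inputs[i].tokens.length; t++)
                    outputs[i].tags[t] = inputs[i].tokens[t];
            }

            ToyModel model = new ToyModel(2, 2);
            ToyInference inference = new ToyInference();
            Random rand = new Random(42);

            for (int step = 0; step < 100; step++) {
                // 1. Randomly select a training example (the pair in, out).
                int i = rand.nextInt(inputs.length);
                // 2. Predict an output structure for the selected input.
                ToyOutput pred = new ToyOutput(inputs[i].tokens.length);
                inference.inference(model, inputs[i], pred);
                // 3. Update the weights from the difference between the
                //    correct structure out and the predicted one pred.
                model.update(inputs[i], outputs[i], pred, 1.0);
            }
        }
    }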
If you want to learn about a specific instantiation of the framework, have a look at the sub-packages of the package application. In the package driver, you can find the training scripts for each instantiation.
This implementation has been, and continues to be, developed with the help of several collaborators. I would like to thank everyone who has given relevant advice and ideas during the development of this project.