Implementation of the Lattice Recurrent Unit as described in this paper. I encourage you to check out the website for an overview of the results and observations described in the paper.
The code has been written in PyTorch and has two key components:
- Language Model: Given a set of sentences, the model learns to predict the next character (word or, more generally, token) conditioned on all the characters up to the current time step (see `class langModel` in `src/model.py`).
- Lattice: A Lattice Network (unlike LSTM and GRU) supports distinct outputs along depth and time. Hence we implemented `class Lattice` (in `src/model.py`), which supports recurrent units with two different outputs. Batches with variable-length sequences are allowed if they are converted to `torch.nn.utils.rnn.PackedSequence` (*); a packing sketch is shown below. In addition, `Lattice` also supports multiple layers within an RNN cell (`class LRUxCell` and `class HIGHWAYxCell` in `src/model.py`).
(*) Currently this only works for `batch_first=True`; I will make it `batch_first`-independent as soon as I get time.
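Below is a minimal sketch (not taken from this repo) of how a batch of variable-length sequences can be padded and packed into a `PackedSequence` with `batch_first=True` before being passed to `Lattice`. The token ids, lengths, and shapes are made up for illustration, and it uses `pad_sequence`, which appeared in PyTorch after v0.2, so with the pinned environment you may need to pad manually.

```python
# Illustrative only: packing a batch of variable-length sequences with
# batch_first=True, the layout the current Lattice implementation expects.
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three already-encoded sequences of different lengths (hypothetical token ids),
# ordered by decreasing length as pack_padded_sequence expects by default.
seqs = [torch.tensor([1, 4, 2, 7, 3]),
        torch.tensor([5, 2, 6]),
        torch.tensor([8, 1])]
lengths = [len(s) for s in seqs]

padded = pad_sequence(seqs, batch_first=True)                      # shape (batch, max_len) = (3, 5)
packed = pack_padded_sequence(padded, lengths, batch_first=True)   # torch.nn.utils.rnn.PackedSequence

# A PackedSequence like `packed` is the form Lattice can consume;
# pad_packed_sequence undoes the packing if the padded tensor is needed again.
unpacked, unpacked_lengths = pad_packed_sequence(packed, batch_first=True)
```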
This code is written in Python 3.6 and requires PyTorch (>=v0.2). I would suggest using the anaconda environment provided in `env/`, as this saves the hassle of installing all the requirements.
conda env create -n <env-name> -f env/torch-0.2.0-cuda80-py36-pandas.yaml
To activate the environment, run
source activate <env-name>
Alternatively, if you would like to save space or be adventurous, you could cherry-pick and install the missing requirements.
All the source files are in the sub-directory `src`:
cd src
I would suggest storing every dataset in a directory of its own, as the code creates multiple meta-data files essential to the training process.
A language model using various RNN units can be trained using `char.py`.
python char.py -data <path-to-data> \
-model <model-name> \
-rnn_size <rnn-size> \
-num_layers <num-layers> \
-num_unrolling <backprop-time> \
-save_dir <path-to-results> \
-cpk <checkpoint-name>
The models supported in this implementation are `lru`, `rglru`, `pslru`, `gru`, `lstm`, `highway`, and `glstm`.
For example, the training script can be called with the following command:
python char.py -data ../dataset/ptb/ptb.txt -model lru -rnn_size 500 -num_layers 2 -num_unrolling 50 -save_dir save/ptb/lru -cpk m
It would be useful to check out all the arguments of the training code by running
python char.py -h
Weights for the best model (based on validation loss) are saved as a pickle file at the end of training (or after every epoch if `-greedy_save 1` is used). These weights are used to initialize a language model from which characters can be sampled.
python generate.py -load <path-to-weights> -num_sample 1000
Note: Weights are stored in `-save_dir` with the suffix `_weights.p`
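As a quick sanity check, the saved weights can be inspected directly. The sketch below assumes the file produced by the example command above; the filename prefix coming from `-cpk` is an assumption here, and the exact structure of the pickled object is defined by `char.py`, so inspect it before relying on any particular layout.

```python
# Minimal sketch: peek at the pickled weights produced by char.py.
# The path assumes -save_dir save/ptb/lru and -cpk m from the example command,
# plus the _weights.p suffix mentioned above; adjust to your own run.
import pickle

with open("save/ptb/lru/m_weights.p", "rb") as f:
    weights = pickle.load(f)

print(type(weights))  # inspect the object; its exact layout is defined by char.py
```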
Using `-cuda <gpu-id>` with `char.py` and `generate.py` gives the option of choosing a device on a multi-GPU machine. If you wish to train the model on a CPU, use `<gpu-id> = -1`.
Note: Currently multi-GPU training is not supported.
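For reference, the convention of `-1` meaning CPU and a non-negative id selecting a specific GPU maps to a device roughly as in the sketch below. This is not the repo's actual code; it is a hypothetical helper using the `torch.device` API from newer PyTorch releases rather than the v0.2-era calls used here.

```python
import torch

def resolve_device(gpu_id: int) -> torch.device:
    """Hypothetical helper: map a <gpu-id> argument to a torch.device."""
    if gpu_id < 0 or not torch.cuda.is_available():
        return torch.device("cpu")            # -1 (or no CUDA) falls back to the CPU
    return torch.device("cuda:%d" % gpu_id)   # pick one GPU on a multi-GPU machine
```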
- A nice gist of the LRU Cell in TensorFlow by @simonnanty.