LSTM-with-attention classifier models for the Kaggle Quora Insincere Questions Classification competition dataset. See the competition page (https://www.kaggle.com/c/quora-insincere-questions-classification) for a definition of sincere and insincere questions.
Disclaimer: none of this code will win you the competition, probably ;-)
attention-sandwich-model/attention-sandwich-model.ipynb --- A number of the public kernels share the same code for a drop-in attention layer that sits on top of the two LSTMs making up the recurrent part of the model, and which does improve performance on this NLP classification task.
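For reference, a minimal sketch of that style of drop-in layer (the exact class circulating in the public kernels differs in its details, and the mask/bias handling is simplified away here):

```python
# Hypothetical re-creation of the public-kernel-style attention layer:
# score each timestep of an LSTM's output, softmax the scores over time,
# and return the attention-weighted sum as a fixed-size vector.
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class AttentionPooling(Layer):
    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        self.w = self.add_weight(name="att_w",
                                 shape=(int(input_shape[-1]), 1),
                                 initializer="glorot_uniform")
        super().build(input_shape)

    def call(self, h):
        e = K.squeeze(K.tanh(K.dot(h, self.w)), axis=-1)  # (batch, timesteps)
        a = K.expand_dims(K.softmax(e))                   # weights sum to 1 over time
        return K.sum(h * a, axis=1)                       # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```

It drops in directly after any recurrent layer built with return_sequences=True and feeds a Dense classification head.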
I wanted to dig around in the guts of how to implement an attention mechanism for NLP classification. In this notebook I implement an attention algorithm that sits between two LSTMs (or more, if we decide to keep stacking, why not) to create an attention sandwich. It makes the hidden states of the first LSTM at every timestep accessible to every timestep of the second LSTM, weighted by attention weights that are also learned. The advantage is that at each point in its sequence, the second LSTM (which we'll call, for no particular reason, LSTM_q) can look at a much wider input (context) than just its own hidden state and the current output of the previous LSTM (which we'll call LSTM_p). It also means the two LSTMs don't need to be aligned (i.e., they don't have to have the same number of timesteps).
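A minimal sketch of the sandwich in tf.keras; all sizes here are illustrative, and the built-in AdditiveAttention layer (Bahdanau-style, with learned parameters) stands in for the notebook's hand-rolled attention:

```python
from tensorflow.keras import layers, Model

MAXLEN, VOCAB, EMB_DIM = 70, 50000, 300  # hypothetical hyperparameters

inp = layers.Input(shape=(MAXLEN,))
emb = layers.Embedding(VOCAB, EMB_DIM)(inp)

# LSTM_p: the first slice of bread, emitting a hidden state per timestep
h_p = layers.LSTM(128, return_sequences=True)(emb)

# the filling: every timestep attends over *all* of LSTM_p's hidden states
# (query and value are both h_p here) and gets back a context vector
ctx = layers.AdditiveAttention()([h_p, h_p])

# LSTM_q: the second slice, seeing its aligned input plus the wider context
h_q = layers.LSTM(128)(layers.Concatenate()([h_p, ctx]))

out = layers.Dense(1, activation="sigmoid")(h_q)
model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Because the attention scores, rather than sequence positions, tie the two LSTMs together, the query and value sequences need not be the same length in the general case.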
The resulting model is quite slow to train, but even without ensembling it gets better results than the alternative Attention class in the public kernels. Whether this is simply due to having more units in my LSTMs I'm not sure, but I think it's worth persisting with.
An attempt to improve performance by running data augmentation through a fit generator. The generator makes noisy copies of each batch of embedded word vectors. Problems: the augmentation algorithm is sloooow and needs optimising. There are some indications that it improves performance a little, but it is currently too costly to run on a Kaggle kernel within the time limit. Work in progress.
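A minimal sketch of the generator idea, assuming the model consumes pre-embedded sequences of shape (batch, timesteps, emb_dim); the class name, noise scale, and batch size are illustrative, not the notebook's values:

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class NoisyEmbeddingBatches(Sequence):
    """Yields each batch twice: once as-is, once with Gaussian jitter
    added to every embedded word vector."""

    def __init__(self, x_emb, y, batch_size=512, noise_sd=0.01):
        self.x, self.y = x_emb, y
        self.batch_size, self.noise_sd = batch_size, noise_sd

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        xb, yb = self.x[sl], self.y[sl]
        # vectorised numpy noise; far cheaper than perturbing word-by-word
        noisy = xb + np.random.normal(0.0, self.noise_sd, xb.shape)
        return np.concatenate([xb, noisy]), np.concatenate([yb, yb])
```

An instance can be passed straight to model.fit (or fit_generator on older Keras) in place of the raw arrays.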
- (optional) create yourself a new, clean Anaconda environment with something like
conda create --name kaggle --clone myenv
where myenv is the environment you use for deep learning (i.e. tensorflow, keras, GPU-enabled if you have one)
- clone this repo into an appropriate place using
git clone [email protected]:nicksexton/quora-insincere-questions
cd quora-insincere-questions
- (if you don't have it already) install kaggle CLI with
pip install kaggle
. You'll then need to generate a Kaggle access token and copy/paste it into the CLI's config file (~/.kaggle/kaggle.json); this is documented in the instructions for the kaggle CLI API.
- download the dataset by running
kaggle competitions download -c quora-insincere-questions-classification
. Note that this will trigger a very large (~8GiB?) download, including four word embeddings files, so you may want to tweak this (see the kaggle CLI API docs).
- unzip the embeddings files
- the directory structure should be such that the input directory sits alongside the kernel directories, i.e.:
quora-insincere-questions/input/embeddings/...
and quora-insincere-questions/attention-sandwich-model/...
etc.