This anonymous repository contains the code for the core models and a description of the data used in LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training.
The folders number_encoder and number_tokenizer contain code for NumBed and NumTok (Sec. 3.1), respectively.
The folder phases/numerical_field contains the code for number pre-training (Sec. 3.2).
The folder phases/single_number contains the code for the toy task.
The folder phases/downstream_tasks contains the code for the downstream tasks (Sec. 4), including TAT-QA, TabFact, and CrediTrans.
The folder phases/empirical_study contains the code for the empirical studies (Sec. E in the appendix), including visualization of attention maps and embeddings from different transformer layers. Please rename attention.ipy123nb to attention.ipynb before opening the notebook, e.g. with the command below.
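A minimal rename command, assuming the notebook sits directly under phases/empirical_study (adjust the path if it lives elsewhere in your checkout):

mv phases/empirical_study/attention.ipy123nb phases/empirical_study/attention.ipynb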
Unzip the data.zip provided in the supplementary material and place the resulting data directory at the root of this repository, i.e. as LUNA/data; see the example below.
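For example, assuming the archive unpacks into a top-level data directory and the repository root is LUNA (adjust paths to your setup):

unzip data.zip
mv data LUNA/data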
To build the Docker image, run:
docker build -t luna:1.0 .
To launch the Docker container, run:
docker run --rm -it --shm-size=8g luna:1.0 /bin/bash
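If the image does not already bundle the unzipped data, one option is to bind-mount it at launch. The container-side path below is hypothetical and should be adjusted to wherever the code expects LUNA/data:

docker run --rm -it --shm-size=8g -v $(pwd)/data:/LUNA/data luna:1.0 /bin/bash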
To run each experiment, see the README in the corresponding directory under phases (as listed above).