This anonymous repository contains the code for the core models and a description of the data used in LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training.
The folders number_encoder and number_tokenizer contain code for NumBed and NumTok (Sec. 3.1), respectively.
The folder phases/numerical_field contains code for number pre-training (Sec. 3.2).
The folder phases/single_number contains code for the toy task.
The folder phases/downstream_tasks contains code for the downstream tasks (Sec. 4), including TAT-QA, TabFact, and CrediTrans.
The folder phases/empirical_study contains code for the empirical studies (Sec. E in the appendix), including visualization of attention maps and embeddings from different transformer layers. Please rename attention.ipy123nb to attention.ipynb.
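The rename above can be done from the shell. This is a minimal sketch: the helper name rename_notebook and the NB_DIR variable are made up for illustration, and it assumes the notebook sits directly under phases/empirical_study, which this README does not state explicitly.

```shell
# Hypothetical helper: rename attention.ipy123nb to attention.ipynb in a given directory.
rename_notebook() {
  dir="$1"
  # Only rename when the oddly named file is actually present.
  if [ -f "$dir/attention.ipy123nb" ]; then
    mv "$dir/attention.ipy123nb" "$dir/attention.ipynb"
  fi
}

# Assumption: the notebook lives directly under phases/empirical_study.
# Set NB_DIR to point elsewhere if it does not.
rename_notebook "${NB_DIR:-phases/empirical_study}"
```

If the file is not found, the helper does nothing, so it is safe to run more than once.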
Unzip the data in the supplementary material and put the data directory in this repo as LUNA/data.
To build the docker image, run:
docker build -t luna:1.0 .
To launch the docker container, run:
docker run --rm -it --shm-size=8g luna:1.0 /bin/bash
To run each experiment, see the README document in each directory under phases (as mentioned above).