This anonymous repository contains the code for the core models and a description of the data used in LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training.
The folders number_encoder and number_tokenizer contain code for NumBed and NumTok (Sec. 3.1), respectively.
The folder phases/numerical_field contains the code for number pre-training (Sec. 3.2).
The folder phases/single_number contains the code for the toy task.
The folder phases/downstream_tasks contains the code for the downstream tasks (Sec. 4), including TAT-QA, TabFact, and CrediTrans.
The folder phases/empirical_study contains the code for the empirical studies (Sec. E in the appendix), including visualization of attention maps and embeddings from different transformer layers. Please rename attention.ipy123nb to attention.ipynb before opening the notebook, e.g. with the command below.
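A minimal rename command, assuming the notebook sits directly under phases/empirical_study (adjust the path if it lives elsewhere in your checkout):

mv phases/empirical_study/attention.ipy123nb phases/empirical_study/attention.ipynb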
Unzip the data.zip provided in the supplementary material and place the resulting data directory at the root of this repository, i.e. as LUNA/data; see the example below.
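For example, assuming the archive unpacks into a top-level data directory and the repository root is LUNA (adjust paths to your setup):

unzip data.zip
mv data LUNA/data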
To build the Docker image, run:
docker build -t luna:1.0 .
To launch the Docker container, run:
docker run --rm -it --shm-size=8g luna:1.0 /bin/bash
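If the image does not already bundle the unzipped data, one option is to bind-mount it at launch. The container-side path below is hypothetical and should be adjusted to wherever the code expects LUNA/data:

docker run --rm -it --shm-size=8g -v $(pwd)/data:/LUNA/data luna:1.0 /bin/bash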
To run each experiment, see the README in the corresponding directory under phases (as listed above).