Fix multiNLI files
kohpangwei committed Mar 26, 2020
1 parent d5dd90b commit ae0a384
Showing 2 changed files with 10 additions and 6 deletions.
10 changes: 7 additions & 3 deletions README.md
@@ -68,7 +68,7 @@ Our code expects the following files/folders in the `[root_dir]/cub` directory:

- `data/waterbird_complete95_forest2water2/`

-You can download a tarball of this dataset [here](https://nlp.stanford.edu/data/waterbird_complete95_forest2water2.tar.gz).
+You can download a tarball of this dataset [here](https://nlp.stanford.edu/data/dro/waterbird_complete95_forest2water2.tar.gz).

A sample command to run group DRO on Waterbirds is:
`python run_expt.py -s confounder -d CUB -t waterbird_complete95 -c forest2water2 --lr 0.001 --batch_size 128 --weight_decay 0.0001 --model resnet50 --n_epochs 300 --reweight_groups --robust --alpha 0.01 --gamma 0.1 --generalization_adjustment 0`
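
For reference (not part of this commit): a minimal Python sketch of downloading and unpacking the Waterbirds tarball linked above. The `root_dir` path is an assumption, and depending on the archive layout you may need to move the extracted folder so that `data/waterbird_complete95_forest2water2/` sits under `[root_dir]/cub` as described.

```python
import os
import tarfile
import urllib.request

# Assumed paths (not from the commit): adjust root_dir to your setup.
root_dir = '/path/to/root_dir'
cub_dir = os.path.join(root_dir, 'cub')
os.makedirs(cub_dir, exist_ok=True)

# Download the fixed tarball URL from the diff above.
url = 'https://nlp.stanford.edu/data/dro/waterbird_complete95_forest2water2.tar.gz'
archive_path = os.path.join(cub_dir, 'waterbird_complete95_forest2water2.tar.gz')
urllib.request.urlretrieve(url, archive_path)

# Unpack next to the archive; afterwards, verify that
# [root_dir]/cub/data/waterbird_complete95_forest2water2/ exists as the README expects,
# and move the extracted folder if the archive is laid out differently.
with tarfile.open(archive_path, 'r:gz') as tar:
    tar.extractall(cub_dir)
```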
@@ -90,9 +90,13 @@ If you'd like to generate variants of this dataset, we have included the script
Our code expects the following files/folders in the `[root_dir]/multinli` directory:

- `data/metadata_random.csv`
-- `glue_data/MNLI/`
+- `glue_data/MNLI/cached_dev_bert-base-uncased_128_mnli`
+- `glue_data/MNLI/cached_dev_bert-base-uncased_128_mnli-mm`
+- `glue_data/MNLI/cached_train_bert-base-uncased_128_mnli`

-We have included the metadata file in `dataset_metadata/multinli` in this repository. The GLUE data can be downloaded with [this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e); please note that you only need to download MNLI and not the other datasets. The metadata file records whether each example belongs to the train/val/test dataset as well as whether it contains a negation word.
+We have included the metadata file in `dataset_metadata/multinli` in this repository. The metadata file records whether each example belongs to the train/val/test dataset as well as whether it contains a negation word.

+The `glue_data/MNLI` files are generated by the [huggingface Transformers library](https://github.com/huggingface/transformers) and can be downloaded [here](https://nlp.stanford.edu/data/dro/multinli_bert_features.tar.gz).

A sample command to run group DRO on MultiNLI is:
`python run_expt.py -s confounder -d MultiNLI -t gold_label_random -c sentence2_has_negation --lr 2e-05 --batch_size 32 --weight_decay 0 --model bert --n_epochs 20`
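
For reference (not part of this commit): a minimal sketch that checks whether the `[root_dir]/multinli` layout listed above is in place. The `root_dir` value is a placeholder.

```python
import os

root_dir = '/path/to/root_dir'  # placeholder, not from the commit
multinli_dir = os.path.join(root_dir, 'multinli')

# Relative paths taken from the README list above.
expected = [
    'data/metadata_random.csv',
    'glue_data/MNLI/cached_dev_bert-base-uncased_128_mnli',
    'glue_data/MNLI/cached_dev_bert-base-uncased_128_mnli-mm',
    'glue_data/MNLI/cached_train_bert-base-uncased_128_mnli',
]

# Report which of the expected files are present.
for rel_path in expected:
    full_path = os.path.join(multinli_dir, rel_path)
    status = 'ok' if os.path.exists(full_path) else 'MISSING'
    print(f'{status:7} {full_path}')
```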
6 changes: 3 additions & 3 deletions data/multinli_dataset.py
@@ -81,9 +81,9 @@ def __init__(self, root_dir,
# Load features
self.features_array = []
for feature_file in [
-    'cached_train_bert-base-uncased_128_mnli', # Train
-    'cached_dev_bert-base-uncased_128_mnli', # Val
-    'cached_dev_bert-base-uncased_128_mnli-mm' # Test
+    'cached_train_bert-base-uncased_128_mnli',
+    'cached_dev_bert-base-uncased_128_mnli',
+    'cached_dev_bert-base-uncased_128_mnli-mm'
]:

features = torch.load(
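
For reference (not part of this diff): a minimal sketch of loading the three cached feature files that the loop above iterates over. The directory path is an assumption, and unpickling these files may require the huggingface `transformers` package to be importable, since they were generated by it.

```python
import os
import torch

# Assumed location of the extracted multinli_bert_features tarball (not from the commit).
glue_dir = '/path/to/root_dir/multinli/glue_data/MNLI'

# Mirrors the loop in multinli_dataset.py: each file is a torch-saved list of
# per-example features for one split (train / val / test).
for split, feature_file in [
        ('train', 'cached_train_bert-base-uncased_128_mnli'),
        ('val', 'cached_dev_bert-base-uncased_128_mnli'),
        ('test', 'cached_dev_bert-base-uncased_128_mnli-mm')]:
    features = torch.load(os.path.join(glue_dir, feature_file))
    print(f'{split}: {len(features)} cached examples')
```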
