Skip to content

Latest commit

 

History

History
 
 

unsupervised_quality_estimation

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)

This page includes instructions for reproducing results from the paper Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)

Requirements:

Download Models and Test Data

Download translation models and test data from MLQE dataset repository.

Set up:

Given a testset consisting of source sentences and reference translations:

  • SRC_LANG: source language
  • TGT_LANG: target language
  • INPUT: input prefix, such that the file $INPUT.$SRC_LANG contains source sentences and $INPUT.$TGT_LANG contains the reference sentences
  • OUTPUT_DIR: output path to store results
  • MOSES_DECODER: path to mosesdecoder installation
  • BPE_ROOT: path to subword-nmt installation
  • BPE: path to BPE model
  • MODEL_DIR: directory containing the NMT model .pt file as well as the source and target vocabularies.
  • TMP: directory for intermediate temporary files
  • GPU: if translating with GPU, id of the GPU to use for inference
  • DROPOUT_N: number of stochastic forward passes

$DROPOUT_N is set to 30 in the experiments reported in the paper. However, we observed that increasing it beyond 10 does not bring substantial improvements.

Translate the data using standard decoding

Preprocess the input data:

for LANG in $SRC_LANG $TGT_LANG; do
  perl $MOSES_DECODER/scripts/tokenizer/tokenizer.perl -threads 80 -a -l $LANG < $INPUT.$LANG > $TMP/preprocessed.tok.$LANG
  python $BPE_ROOT/apply_bpe.py -c ${BPE} < $TMP/preprocessed.tok.$LANG > $TMP/preprocessed.tok.bpe.$LANG
done

Binarize the data for faster translation:

fairseq-preprocess --srcdict $MODEL_DIR/dict.$SRC_LANG.txt --tgtdict $MODEL_DIR/dict.$TGT_LANG.txt
--source-lang ${SRC_LANG} --target-lang ${TGT_LANG} --testpref $TMP/preprocessed.tok.bpe --destdir $TMP/bin --workers 4

Translate

CUDA_VISIBLE_DEVICES=$GPU fairseq-generate $TMP/bin --path ${MODEL_DIR}/${SRC_LANG}-${TGT_LANG}.pt --beam 5
--source-lang $SRC_LANG --target-lang $TGT_LANG --no-progress-bar --unkpen 5 > $TMP/fairseq.out
grep ^H $TMP/fairseq.out | cut -d- -f2- | sort -n | cut -f3- > $TMP/mt.out

Post-process

sed -r 's/(@@ )| (@@ ?$)//g' < $TMP/mt.out | perl $MOSES_DECODER/scripts/tokenizer/detokenizer.perl
-l $TGT_LANG > $OUTPUT_DIR/mt.out

Produce uncertainty estimates

Scoring

Make temporary files to store the translations repeated N times.

python ${SCRIPTS}/scripts/uncertainty/repeat_lines.py -i $TMP/preprocessed.tok.bpe.$SRC_LANG -n $DROPOUT_N
-o $TMP/repeated.$SRC_LANG
python ${SCRIPTS}/scripts/uncertainty/repeat_lines.py -i $TMP/mt.out -n $DROPOUT_N -o $TMP/repeated.$TGT_LANG

fairseq-preprocess --srcdict ${MODEL_DIR}/dict.${SRC_LANG}.txt $TGT_DIC --source-lang ${SRC_LANG}
--target-lang ${TGT_LANG} --testpref ${TMP}/repeated --destdir ${TMP}/bin-repeated

Produce model scores for the generated translations using --retain-dropout option to apply dropout at inference time:

CUDA_VISIBLE_DEVICES=${GPU} fairseq-generate ${TMP}/bin-repeated --path ${MODEL_DIR}/${LP}.pt --beam 5
 --source-lang $SRC_LANG --target-lang $TGT_LANG --no-progress-bar --unkpen 5 --score-reference --retain-dropout
 --retain-dropout-modules '["TransformerModel","TransformerEncoder","TransformerDecoder","TransformerEncoderLayer"]'
 TransformerDecoderLayer --seed 46 > $TMP/dropout.scoring.out

grep ^H $TMP/dropout.scoring.out | cut -d- -f2- | sort -n | cut -f2 > $TMP/dropout.scores

Use --retain-dropout-modules to specify the modules. By default, dropout is applied in the same places as for training.

Compute the mean of the resulting output distribution:

python $SCRIPTS/scripts/uncertainty/aggregate_scores.py -i $TMP/dropout.scores -o $OUTPUT_DIR/dropout.scores.mean
-n $DROPOUT_N

Generation

Produce multiple translation hypotheses for the same source using --retain-dropout option:

CUDA_VISIBLE_DEVICES=${GPU} fairseq-generate ${TMP}/bin-repeated --path ${MODEL_DIR}/${LP}.pt
 --beam 5 --source-lang $SRC_LANG --target-lang $TGT_LANG --no-progress-bar --retain-dropout
 --unkpen 5 --retain-dropout-modules TransformerModel TransformerEncoder TransformerDecoder
TransformerEncoderLayer TransformerDecoderLayer --seed 46 > $TMP/dropout.generation.out

grep ^H $TMP/dropout.generation.out | cut -d- -f2- | sort -n | cut -f3- > $TMP/dropout.hypotheses_

sed -r 's/(@@ )| (@@ ?$)//g' < $TMP/dropout.hypotheses_ | perl $MOSES_DECODER/scripts/tokenizer/detokenizer.perl
-l $TGT_LANG > $TMP/dropout.hypotheses

Compute similarity between multiple hypotheses corresponding to the same source sentence using Meteor evaluation metric:

python meteor.py -i $TMP/dropout.hypotheses -m <path_to_meteor_installation> -n $DROPOUT_N -o
$OUTPUT_DIR/dropout.gen.sim.meteor