This repository reproduces the experiments and figures from "Faster Minimum Bayes Risk Decoding with Confidence-based Pruning" by Julius Cheng and Andreas Vlachos, which won Best Short Paper at EMNLP 2023 😊.
This codebase has been updated from the original code used for the paper to use the Hugging Face ecosystem for improved reproducibility. The main difference is that the paper uses translation models trained from scratch, while this repo uses widely used pretrained models from Facebook. Also, the figures generated by this repo differ slightly in formatting from those in the paper.
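For orientation, the core idea of the paper is to prune weak hypotheses from the MBR candidate set early, using bootstrap resampling over a growing set of pseudo-references to estimate how likely each hypothesis is to end up as the winner. The sketch below is a minimal, self-contained illustration of that idea, not this repo's implementation; the `utility_matrix` input, the reference schedule, the bootstrap count, and the threshold convention are all illustrative placeholders.

```python
import numpy as np

def mbr_with_pruning(utility_matrix, schedule=(8, 16, 32, 64),
                     n_bootstrap=500, alpha=0.99, rng=None):
    """Simplified sketch of confidence-based pruning for MBR decoding.

    utility_matrix: [n_hypotheses, n_references] array where entry (h, r)
    is utility(hypothesis h, pseudo-reference r), precomputed as in the
    paper's setup. At each step we use a prefix of the references,
    bootstrap-resample them, and prune hypotheses that win too rarely.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n_hyps, n_refs = utility_matrix.shape
    alive = np.arange(n_hyps)  # indices of surviving hypotheses

    for n_used in schedule:
        n_used = min(n_used, n_refs)
        sub = utility_matrix[alive][:, :n_used]

        # Bootstrap: resample the reference columns with replacement and
        # count how often each surviving hypothesis has the best mean utility.
        wins = np.zeros(len(alive))
        for _ in range(n_bootstrap):
            cols = rng.integers(0, n_used, size=n_used)
            wins[np.argmax(sub[:, cols].mean(axis=1))] += 1

        # Keep hypotheses whose estimated win probability is at least
        # 1 - alpha (threshold convention is illustrative); always keep
        # the current empirical best.
        keep = (wins / n_bootstrap) >= (1 - alpha)
        keep[np.argmax(sub.mean(axis=1))] = True
        alive = alive[keep]

        if len(alive) == 1 or n_used == n_refs:
            break

    # Final decision among survivors, using all references.
    return alive[np.argmax(utility_matrix[alive].mean(axis=1))]

# Toy usage: 100 hypotheses, 64 pseudo-references, random utilities.
best = mbr_with_pruning(np.random.default_rng(1).random((100, 64)))
print("selected hypothesis index:", best)
```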
git clone [email protected]:juliusc/pruning_mbr.git
cd pruning_mbr
pip install .
Downloading models and datasets from Hugging Face requires a user access token. Set your access token as follows:
export HUGGING_FACE_HUB_TOKEN=<your token>
Learn more about access tokens in the Hugging Face documentation: https://huggingface.co/docs/hub/security-tokens
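Alternatively, you can authenticate from Python with the `huggingface_hub` library, which the Hugging Face packages used here depend on; paste your token in place of the placeholder:

```python
from huggingface_hub import login

# Equivalent to setting HUGGING_FACE_HUB_TOKEN for this session;
# replace the placeholder with your actual token.
login(token="<your token>")
```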
This set of instructions will run the experiments for one language pair and metric.
OUTPUT_DIR=output # Set an output directory of your choice.
LANGUAGE_PAIR=deen
METRIC=comet
cd pruning_mbr/experiments
# Generate the prerequisites for the experiments: hypotheses, pseudo-references,
# utility matrices, and evaluation scores.
python generate.py $OUTPUT_DIR $LANGUAGE_PAIR --metrics=$METRIC
# Generate statistics and plot for Figure 1.
python get_false_pruning_rates.py $OUTPUT_DIR $METRIC
# Generate Figure 2.
python get_decoding_stats.py $OUTPUT_DIR validation $METRIC
python plot_pruning_comparison.py \
$OUTPUT_DIR/decoding_stats.validation.comet.csv \
$OUTPUT_DIR/pruning_function_comparison.png
# Generate statistics for Table 1.
python get_decoding_stats.py $OUTPUT_DIR test $METRIC
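Both `get_decoding_stats.py` runs write a CSV to `$OUTPUT_DIR` (the validation one is passed to the plotting script above). If you want to inspect the results beyond the provided scripts, a short pandas snippet works; note that the test-set filename below is my guess, inferred from the validation naming pattern:

```python
import pandas as pd

# Filename assumed to follow the pattern of the validation CSV above;
# adjust to match your OUTPUT_DIR and actual output names.
df = pd.read_csv("output/decoding_stats.test.comet.csv")
print(df.head())        # peek at the recorded columns
print(df.describe())    # summary statistics across decoding runs
```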