This repository reproduces the experiments and figures from "Faster Minimum Bayes Risk Decoding with Confidence-based Pruning" by Julius Cheng and Andreas Vlachos, which won Best Short Paper at EMNLP 2023 😊.
This codebase has been updated from the original code used for the paper to use the Hugging Face ecosystem for improved reproducibility. The main difference is that the paper uses translation models trained from scratch, while this repo uses widely used pretrained models from Facebook. Also, the figures generated by this repo differ slightly in formatting from those in the paper.
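For orientation, the core idea of the paper is to prune weak hypotheses from the MBR candidate set early, using bootstrap resampling over a growing set of pseudo-references to estimate how likely each hypothesis is to end up as the winner. The sketch below is a minimal, self-contained illustration of that idea, not this repo's implementation; the `utility_matrix` input, the reference schedule, the bootstrap count, and the threshold convention are all illustrative placeholders.

```python
import numpy as np

def mbr_with_pruning(utility_matrix, schedule=(8, 16, 32, 64),
                     n_bootstrap=500, alpha=0.99, rng=None):
    """Simplified sketch of confidence-based pruning for MBR decoding.

    utility_matrix: [n_hypotheses, n_references] array where entry (h, r)
    is utility(hypothesis h, pseudo-reference r), precomputed as in the
    paper's setup. At each step we use a prefix of the references,
    bootstrap-resample them, and prune hypotheses that win too rarely.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    n_hyps, n_refs = utility_matrix.shape
    alive = np.arange(n_hyps)  # indices of surviving hypotheses

    for n_used in schedule:
        n_used = min(n_used, n_refs)
        sub = utility_matrix[alive][:, :n_used]

        # Bootstrap: resample the reference columns with replacement and
        # count how often each surviving hypothesis has the best mean utility.
        wins = np.zeros(len(alive))
        for _ in range(n_bootstrap):
            cols = rng.integers(0, n_used, size=n_used)
            wins[np.argmax(sub[:, cols].mean(axis=1))] += 1

        # Keep hypotheses whose estimated win probability is at least
        # 1 - alpha (threshold convention is illustrative); always keep
        # the current empirical best.
        keep = (wins / n_bootstrap) >= (1 - alpha)
        keep[np.argmax(sub.mean(axis=1))] = True
        alive = alive[keep]

        if len(alive) == 1 or n_used == n_refs:
            break

    # Final decision among survivors, using all references.
    return alive[np.argmax(utility_matrix[alive].mean(axis=1))]

# Toy usage: 100 hypotheses, 64 pseudo-references, random utilities.
best = mbr_with_pruning(np.random.default_rng(1).random((100, 64)))
print("selected hypothesis index:", best)
```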
git clone [email protected]:juliusc/pruning_mbr.git
cd pruning_mbr
pip install .
Downloading models and datasets from Hugging Face requires a user access token. Set your access token as follows:
export HUGGING_FACE_HUB_TOKEN=<your token>
Learn more about access tokens in the Hugging Face documentation: https://huggingface.co/docs/hub/security-tokens
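Alternatively, you can authenticate from Python with the `huggingface_hub` library, which the Hugging Face packages used here depend on; paste your token in place of the placeholder:

```python
from huggingface_hub import login

# Equivalent to setting HUGGING_FACE_HUB_TOKEN for this session;
# replace the placeholder with your actual token.
login(token="<your token>")
```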
This set of instructions will run the experiments for one language pair and metric.
OUTPUT_DIR=output # Set an output directory of your choice.
LANGUAGE_PAIR=deen
METRIC=comet
cd pruning_mbr/experiments
# Generate the prerequisites for the experiments: hypotheses, pseudo-references,
# utility matrices, and evaluation scores.
python generate.py $OUTPUT_DIR $LANGUAGE_PAIR --metrics=$METRIC
# Generate statistics and plot for Figure 1.
python get_false_pruning_rates.py $OUTPUT_DIR $METRIC
# Generate Figure 2.
python get_decoding_stats.py $OUTPUT_DIR validation $METRIC
python plot_pruning_comparison.py \
$OUTPUT_DIR/decoding_stats.validation.comet.csv \
$OUTPUT_DIR/pruning_function_comparison.png
# Generate statistics for Table 1.
python get_decoding_stats.py $OUTPUT_DIR test $METRIC
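Both `get_decoding_stats.py` runs write a CSV to `$OUTPUT_DIR` (the validation one is passed to the plotting script above). If you want to inspect the results beyond the provided scripts, a short pandas snippet works; note that the test-set filename below is my guess, inferred from the validation naming pattern:

```python
import pandas as pd

# Filename assumed to follow the pattern of the validation CSV above;
# adjust to match your OUTPUT_DIR and actual output names.
df = pd.read_csv("output/decoding_stats.test.comet.csv")
print(df.head())        # peek at the recorded columns
print(df.describe())    # summary statistics across decoding runs
```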