This repository contains the code for the experiments in the paper "Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding" (Jinnai et al., 2024).
The code is provided mostly as-is, with little refactoring.
git clone git@github.com:CyberAgentAILab/diverse-mbr
cd diverse-mbr
pip install -r requirements.txt
The code runs in two steps:
- sample.sh samples candidates.
- run_mbr.sh computes the MBR output from the sampled candidates.
./experiments/sample.sh -d [DATASET] -s [NUMBER OF SAMPLES]
./experiments/run_mbr.sh -d [DATASET] -s [NUMBER OF SAMPLES] -a [ALGORITHM]
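The idea behind the second step: MBR decoding treats the sampled candidates as pseudo-references and returns the candidate with the highest average utility against them. The sketch below illustrates plain MBR selection only. It assumes a toy unigram-F1 utility as a stand-in for a real metric such as BLEU, and `unigram_f1` and `mbr_select` are hypothetical names, not functions from this repository.

```python
from collections import Counter


def unigram_f1(hyp: str, ref: str) -> float:
    """Hypothetical toy utility: F1 over whitespace-token unigram overlap."""
    hyp_counts, ref_counts = Counter(hyp.split()), Counter(ref.split())
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: list[str]) -> str:
    """Hypothetical sketch: return the candidate with the highest expected utility."""

    def expected_utility(hyp: str) -> float:
        # The samples double as pseudo-references drawn from the model.
        return sum(unigram_f1(hyp, ref) for ref in candidates) / len(candidates)

    return max(candidates, key=expected_utility)


samples = ["the cat sat", "the cat sat down", "the cat sat on the mat", "a dog ran"]
print(mbr_select(samples))  # prints: the cat sat
```

Comparing each non-empty candidate with itself adds the same constant to every score, so it does not change which candidate wins.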
- Use sacrebleu to prepare the benchmark dataset.
mkdir -p ./dataset/wmt19-text
sacrebleu -t wmt19 -l en-de --echo src > ./dataset/wmt19-text/wmt19.en-de.en
sacrebleu -t wmt19 -l en-de --echo ref > ./dataset/wmt19-text/wmt19.en-de.de
- Sample candidates on WMT'19 En-De
./experiments/sample.sh -d wmt19.en-de
- Compute the Diverse MBR output on WMT'19 En-De
./experiments/run_mbr.sh -d wmt19.en-de -a diverse
- Compute the k-medoids MBR output on WMT'19 En-De
./experiments/run_mbr.sh -d wmt19.en-de -a kmmbr
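Diverse MBR returns a set of outputs rather than a single one. As an illustration only, the sketch below greedily picks k candidates, scoring each by its expected utility minus `lam` times its total similarity to the candidates already picked. This greedy, diversity-penalized selection is an assumption made for illustration and is not claimed to match the repository's `diverse` algorithm; `jaccard`, `diverse_select`, `k`, and `lam` are hypothetical, and Jaccard token overlap stands in for a real utility metric.

```python
# Hypothetical sketch: greedy diversity-penalized MBR selection (not this repository's code).


def jaccard(a: str, b: str) -> float:
    """Hypothetical toy similarity: Jaccard index of whitespace-token sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def diverse_select(candidates: list[str], k: int, lam: float) -> list[str]:
    """Greedily pick up to k candidates, trading expected utility against redundancy."""
    n = len(candidates)
    # Pairwise similarity matrix; each row's mean is that candidate's MBR score.
    sim = [[jaccard(a, b) for b in candidates] for a in candidates]
    mbr_score = [sum(row) / n for row in sim]

    selected: list[int] = []
    remaining = set(range(n))

    def gain(i: int) -> float:
        # Expected utility minus a penalty for overlap with already-chosen outputs.
        return mbr_score[i] - lam * sum(sim[i][j] for j in selected)

    while remaining and len(selected) < k:
        # Sorting makes ties break deterministically toward the lower index.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


samples = ["the cat sat", "the cat sat down", "the cat sat on the mat", "a dog ran", "a dog ran fast"]
print(diverse_select(samples, k=2, lam=0.0))  # prints: ['the cat sat', 'the cat sat down']
print(diverse_select(samples, k=2, lam=1.0))
```

With `lam=0.0` the selection reduces to taking the top-k candidates by expected utility, which here yields two near-duplicates; raising `lam` pushes the second pick into the dissimilar "a dog ran" cluster.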
BibTeX:
@article{jinnai2024generating,
title={Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding},
author={Yuu Jinnai and Ukyo Honda and Tetsuro Morimura and Peinan Zhang},
year={2024},
journal={arXiv preprint arXiv:2401.05054}
}
For any questions, feel free to open an issue or contact me by email.