Set up transformers following the instructions in its README.md (forking the repository first is recommended).

git clone git@github.com:huggingface/transformers.git
cd transformers
pip install -e .
pip install pandas

Get required metadata

curl https://cdn-datasets.huggingface.co/language_codes/language-codes-3b2.csv  > language-codes-3b2.csv
curl https://cdn-datasets.huggingface.co/language_codes/iso-639-3.csv > iso-639-3.csv

Clone the Tatoeba-Challenge repo inside the transformers directory

git clone git@github.com:Helsinki-NLP/Tatoeba-Challenge.git
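
Before converting, it can help to confirm that the files and checkout from the steps above are in place. The following is a minimal sketch, not part of the conversion script: the helper name `missing_prerequisites` is hypothetical, and it assumes the two CSV files and the Tatoeba-Challenge checkout sit in the directory you run it from, as the commands above place them.

```python
from pathlib import Path

# Paths taken from the setup commands; their location relative to the
# working directory is an assumption of this sketch.
REQUIRED_PATHS = (
    "language-codes-3b2.csv",
    "iso-639-3.csv",
    "Tatoeba-Challenge",
)


def missing_prerequisites(root="."):
    """Return the required paths that do not exist under `root`."""
    root = Path(root)
    return [name for name in REQUIRED_PATHS if not (root / name).exists()]


# Report anything that still needs to be downloaded or cloned.
missing = missing_prerequisites()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All prerequisites found.")
```

An empty list means every expected path exists; it does not verify that the files are complete or current.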

To convert a few models, call the conversion script from the command line:

python src/transformers/convert_marian_tatoeba_to_pytorch.py --models heb-eng eng-heb --save_dir converted

To convert many models, pass your list of Tatoeba model names to resolver.convert_models in a Python client or script.

from transformers.convert_marian_tatoeba_to_pytorch import TatoebaConverter
resolver = TatoebaConverter(save_dir='converted')
resolver.convert_models(['heb-eng', 'eng-heb'])
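
If the list of models lives in a text file, the same call can be driven from it. This is a minimal sketch, assuming a hypothetical file `models.txt` with one Tatoeba model name per line (blank lines and `#` comments are ignored). `read_model_names` and `convert_from_file` are illustrative helpers, not part of transformers; `TatoebaConverter` and `convert_models` are used only in the way this README already shows.

```python
from pathlib import Path


def read_model_names(path):
    """Read model names, one per line, skipping blanks and '#' comments."""
    names = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def convert_from_file(path, save_dir="converted"):
    """Convert every model listed in `path` (hypothetical helper)."""
    # Imported here so read_model_names works without transformers installed.
    from transformers.convert_marian_tatoeba_to_pytorch import TatoebaConverter

    resolver = TatoebaConverter(save_dir=save_dir)
    resolver.convert_models(read_model_names(path))


# Usage (file name is an assumption):
# convert_from_file("models.txt")
```

Keeping the list in a file makes it easier to rerun or resume a large batch without editing the script.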

Upload converted models

cd converted
transformers-cli login
for FILE in *; do transformers-cli upload "$FILE"; done

Modifications

  • To change the naming logic, change the code near os.rename. The model card creation code may also need to change.
  • To change the model card content, modify TatoebaConverter.write_model_card.