morrisalp/unikud


UNIKUD: Hebrew nikud with transformers

If you are accessing this repo via GitHub, please see the project page on DAGSHub for data files, pipelines and more.

Requirements

First install:

  • Conda
  • Rust compiler:
    • curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    • Reopen shell or run source $HOME/.cargo/env

Then create and activate the UNIKUD environment with:

  • conda env create -f environment.yml
  • conda activate unikud

You may then download the required data files using DVC:

  • dvc remote add origin https://dagshub.com/morrisalp/unikud.dvc
  • dvc pull -r origin
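Before running the steps above, it can help to confirm that the required command-line tools are reachable. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, `cargo` is used as a stand-in for "the Rust toolchain is installed", and it assumes `dvc` is on PATH once the `unikud` environment is active.

```python
import shutil

# Tools the setup steps rely on. "cargo" is an assumed proxy for the Rust
# compiler installed by rustup; "dvc" is assumed to come from the conda env.
REQUIRED_TOOLS = ("conda", "cargo", "dvc")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

If `cargo` is reported missing right after installing Rust, reopening the shell or sourcing `$HOME/.cargo/env` as described above is the likely fix.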

Data

Sources of data:

To preprocess data, run the preprocessing stage of the DVC pipeline (dvc repro preprocessing, as listed under Training).

Training

To reproduce the training pipeline, perform the following steps:

  1. Preprocess data:
  • dvc repro preprocessing
  2. Train ktiv male model:
  • dvc repro train-ktiv-male
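The two stages above can also be driven from a small script. This is a sketch under assumptions, not the project's own tooling: the function names `repro_commands` and `run_pipeline` are hypothetical, and only the stage names `preprocessing` and `train-ktiv-male` come from the steps listed here.

```python
import subprocess

# Stage names as given in the training steps, in the order they are run.
STAGES = ("preprocessing", "train-ktiv-male")


def repro_commands(stages=STAGES):
    """Build one `dvc repro <stage>` command per stage, preserving order."""
    return [["dvc", "repro", stage] for stage in stages]


def run_pipeline(stages=STAGES, dry_run=False):
    """Run each stage in turn; stop at the first failing stage."""
    for cmd in repro_commands(stages):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    run_pipeline(dry_run=True)
```

Calling `run_pipeline()` with `dry_run=False` executes the stages for real, so it should be run from the repository root with the `unikud` environment active.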

Training steps will automatically log to MLflow (via the Hugging Face Trainer object) if the following environment variables are set: MLFLOW_TRACKING_URI, MLFLOW_TRACKING_USERNAME, MLFLOW_TRACKING_PASSWORD.
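A quick way to see whether MLflow logging will be active is to check these three variables before training. The sketch below is illustrative only: the helper name `missing_mlflow_vars` is hypothetical, and treating an empty value as "not set" is an assumption rather than documented behavior.

```python
import os

# Variable names exactly as listed above.
MLFLOW_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_vars(env=None):
    """Return the MLflow variables that are absent or empty in `env`."""
    env = os.environ if env is None else env
    # Assumption: an empty string is treated the same as an unset variable.
    return [name for name in MLFLOW_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_mlflow_vars()
    if missing:
        print("MLflow variables not set:", ", ".join(missing))
    else:
        print("All MLflow variables are set.")
```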
