Transformers Example with Ignite

In this example, we show how to use Ignite to fine-tune a transformer model:

  • on one or more GPUs or TPUs
  • compute training/validation metrics
  • log the learning rate, metrics, etc.
  • save the best model weights
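
A minimal sketch of the pieces main.py presumably wires together: an Ignite Engine driving a Hugging Face model, with a Checkpoint handler keeping the best weights. All names and the dummy data below are illustrative; main.py is the authoritative implementation.

# A minimal, illustrative sketch -- not the actual main.py.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ignite.engine import Engine, Events
from ignite.handlers import Checkpoint, DiskSaver

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Dummy batch standing in for the real dataset used by the example.
batch = tokenizer(["a great movie", "a terrible movie"], return_tensors="pt", padding=True)
batch["labels"] = torch.tensor([1, 0])
data = [batch]

def train_step(engine, batch):
    model.train()
    optimizer.zero_grad()
    loss = model(**batch).loss  # HF models return the loss when labels are provided
    loss.backward()
    optimizer.step()
    return loss.item()

trainer = Engine(train_step)

# Keep only the best weights; scored on training loss here for brevity,
# the real example scores on a validation metric.
best_ckpt = Checkpoint(
    {"model": model},
    DiskSaver("/tmp/checkpoints", create_dir=True, require_empty=False),
    n_saved=1,
    score_function=lambda e: -e.state.output,
    score_name="neg_loss",
)
trainer.add_event_handler(Events.EPOCH_COMPLETED, best_ckpt)

trainer.run(data, max_epochs=1)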

Configurations:

  • single GPU
  • multiple GPUs on a single node
  • TPUs on Colab

Requirements:

Install all of the requirements using pip install -r requirements.txt.
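
If you only need the core pieces, a minimal baseline is presumably pytorch-ignite and transformers (an assumption; requirements.txt in the repository is authoritative):

# Assumed minimal baseline only; see requirements.txt for the full list
pip install pytorch-ignite transformers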

Usage:

Run the example on a single GPU:

python main.py run

If needed, adjust the batch size to your GPU device with the --batch_size argument.
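
For example (the batch size value here is illustrative):

python main.py run --batch_size=8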

The default model is bert-base-uncased. If you need to change it, use the --model argument; for details on which models can be used, refer here.

Example:

# Using DistilBERT, which has 40% fewer parameters than bert-base-uncased
python main.py run --model="distilbert-base-uncased"

For details on accepted arguments:

python main.py run -- --help

Distributed training

Single node, multiple GPUs

Let's start training on a single node with 2 GPUs:

# using torch.distributed.launch
python -u -m torch.distributed.launch --nproc_per_node=2 --use_env main.py run --backend="nccl"

or

# using function spawn inside the code
python -u main.py run --backend="nccl" --nproc_per_node=2
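
The in-code spawn presumably goes through ignite.distributed.Parallel, Ignite's backend-agnostic launcher. A minimal sketch (the training function body is illustrative):

# Sketch of spawning workers from inside the code via ignite.distributed.Parallel.
import ignite.distributed as idist

def training(local_rank, config):
    # Runs once in each spawned process; idist exposes backend-agnostic helpers.
    print(f"rank {idist.get_rank()} / world size {idist.get_world_size()}")

with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
    parallel.run(training, {})

Note that on recent PyTorch versions, torchrun supersedes torch.distributed.launch with the --use_env behavior built in, so the first command can also be written as torchrun --nproc_per_node=2 main.py run --backend="nccl".
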
Using Horovod as distributed backend

Please make sure Horovod is installed before running.

Let's start training on a single node with 2 GPUs:

# horovodrun
horovodrun -np=2 python -u main.py run --backend="horovod"

or

# using function spawn inside the code
python -u main.py run --backend="horovod" --nproc_per_node=2

Colab or Kaggle kernels, on 8 TPUs

# Setup the TPU environment (installs PyTorch/XLA)
import os
assert os.environ['COLAB_TPU_ADDR'], 'Make sure to select TPU from Edit > Notebook settings > Hardware accelerator'
VERSION = "nightly"
!curl -q https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION > /dev/null

# Run the example on 8 TPU cores
from main import run
run(backend="xla-tpu", nproc_per_node=8)

ClearML fileserver

If a ClearML server is used (i.e. the --with_clearml argument is passed), artifact uploading must be configured by modifying the ClearML configuration file ~/clearml.conf generated by clearml-init. According to the documentation, the output_uri argument can be configured in sdk.development.default_output_uri to point to the fileserver URI. If the server is self-hosted, the ClearML fileserver URI is http://localhost:8081.
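
A sketch of the relevant ~/clearml.conf fragment for a self-hosted server (HOCON syntax; verify the keys against the file generated by clearml-init):

sdk {
    development {
        # Upload artifacts to the ClearML fileserver
        default_output_uri: "http://localhost:8081"
    }
}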

For more details, see https://allegro.ai/clearml/docs/docs/examples/reporting/artifacts.html