ALBERT

Cloned from Google's ALBERT, which is only suitable for CPU, single-GPU, and TPU training. Following the approach of bert-multi-gpu, we made slight modifications to support multi-GPU fine-tuning on an AWS p3.16xlarge instance.

Modification Details

  1. Rewrote the Estimator and EstimatorSpec setup, because the original one is only suitable for a single training device.

  2. Adapted MirroredStrategy into the training process, as shown here. Note: the input data is batched by the global batch size, whereas the batch size set in the FLAGS parameters is the local (per-GPU) batch size. See the sketch after this list.

  3. Moved the optimizers, including AdamW and LAMB, into custom_optimization.py.

  4. The NVIDIA Collective Communications Library (NCCL) is required for the all-reduce options, as shown here.

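The sketch below illustrates the idea behind modifications 1, 2, and 4: a plain tf.estimator.Estimator driven by MirroredStrategy with NCCL all-reduce, fed by a dataset batched with the global batch size. It is a minimal, self-contained illustration assuming TensorFlow 1.15 with a toy model and input pipeline; it is not the code in run_multigpus_classifier.py, and it uses a plain Adam optimizer where the repository uses the AdamW/LAMB optimizers from custom_optimization.py.

    import tensorflow as tf  # assuming TensorFlow 1.15

    NUM_GPUS = 8
    LOCAL_BATCH_SIZE = 8                                # what the FLAGS batch size refers to
    GLOBAL_BATCH_SIZE = LOCAL_BATCH_SIZE * NUM_GPUS     # what the input pipeline batches by

    def input_fn():
        # Toy stand-in for the TFRecord pipeline; per modification 2 the dataset
        # is batched with the *global* batch size and split across the replicas.
        features = {"x": tf.random.uniform([1024, 16])}
        labels = tf.zeros([1024], tf.int32)
        ds = tf.data.Dataset.from_tensor_slices((features, labels))
        return ds.repeat().batch(GLOBAL_BATCH_SIZE, drop_remainder=True)

    def model_fn(features, labels, mode):
        # Toy stand-in for ALBERT. The key point of modification 1 is returning a
        # plain tf.estimator.EstimatorSpec rather than a TPUEstimatorSpec.
        logits = tf.layers.dense(features["x"], 2)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
        # The repository trains with the AdamW/LAMB optimizers from custom_optimization.py;
        # plain Adam is used here only to keep the sketch self-contained.
        train_op = tf.train.AdamOptimizer(5e-6).minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Modification 2: replicate the model on every visible GPU with MirroredStrategy.
    # Modification 4: NCCL performs the cross-GPU gradient all-reduce.
    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.NcclAllReduce())

    run_config = tf.estimator.RunConfig(
        train_distribute=strategy,
        model_dir="/tmp/albert_sketch",                 # placeholder output directory
        save_checkpoints_steps=1000)

    estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)
    estimator.train(input_fn=input_fn, max_steps=100)
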
Data and Evaluation scripts

SQuAD

GLUE

Simply use python3 download_glue_data.py to download all GLUE tasks.

Simple Fine-tuning

  1. Download and save the pre-trained models by running the bash script download_pretrained_models.sh.

  2. Use multi_run_albert_glue.sh to fine-tune ALBERT on GLUE and multi_run_albert_squad.sh to fine-tune on SQuAD.

  3. Don't forget to set up the file paths inside the bash files, for example:

    # GLUE task, ALBERT model size, pre-trained model version, and working directory.
    export TASK=CoLA
    export ALBERT_DIR=base
    export VERSION=2
    export CURRENT_PWD=/home/ubuntu
    
    export GLUE_DIR=${CURRENT_PWD}/glue_data
    export OUTPUT_DIR=${CURRENT_PWD}/albert_output/${TASK}_${ALBERT_DIR}_v${VERSION}
    
    # Local (per-GPU) batch size, max sequence length, learning rate,
    # warmup steps, and total training steps.
    export BS=8
    export MSL=128
    export LR=5e-06
    export WPSP=320
    export TSP=5336
    
    pip3 install numpy
    pip3 install -r requirements.txt
    
    sudo CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
        python3 -m albert.run_multigpus_classifier \
        --do_train=True \
        --do_eval=True \
        --strategy_type=mirror \
        --num_gpu_cores=2 \
        --data_dir=${GLUE_DIR} \
        --cached_dir=${CURRENT_PWD}/cached_albert_tfrecord \
        --task_name=${TASK} \
        --output_dir=${OUTPUT_DIR} \
        --max_seq_length=${MSL} \
        --train_step=${TSP} \
        --warmup_step=${WPSP} \
        --train_batch_size=${BS} \
        --learning_rate=${LR} \
        --albert_config_file=${CURRENT_PWD}/pretrained_model/albert_${ALBERT_DIR}_v${VERSION}/albert_config.json \
        --init_checkpoint=${CURRENT_PWD}/pretrained_model/albert_${ALBERT_DIR}_v${VERSION}/model.ckpt-best \
        --vocab_file=./30k-clean.vocab \
        --spm_model_file=./30k-clean.model
