MobileBERT

This directory contains code for MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. MobileBERT is a thin version of BERT_LARGE equipped with bottleneck structures and a carefully designed balance between self-attention and feed-forward networks.
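
The sketch below is only a rough illustration of the bottleneck idea, not the modeling code in this directory: each block keeps a wide inter-block feature size but squeezes the representation to a narrow intra-block width before running its (here omitted) self-attention and stacked feed-forward networks, then expands back with a residual connection. The 512/128 widths and the four feed-forward nets mirror the uncased_L-24_H-128_B-512_A-4_F-4 configuration; everything else is simplified.

import numpy as np

def dense(x, w, b):
    return x @ w + b

def bottleneck_block(h, params):
    # Toy MobileBERT-style block (a sketch, not modeling.py):
    # squeeze 512-d inter-block features to a 128-d intra-block width,
    # process them, and expand back with a residual connection.
    x = dense(h, params["down_w"], params["down_b"])          # 512 -> 128
    # Self-attention is omitted; only the stacked narrow FFNs are shown.
    for w, b in params["ffn"]:
        x = np.maximum(0.0, dense(x, w, b))                   # ReLU FFN at width 128
    return h + dense(x, params["up_w"], params["up_b"])       # 128 -> 512, residual

rng = np.random.RandomState(0)
inter, intra = 512, 128
params = {
    "down_w": 0.02 * rng.randn(inter, intra), "down_b": np.zeros(intra),
    "up_w": 0.02 * rng.randn(intra, inter), "up_b": np.zeros(inter),
    "ffn": [(0.02 * rng.randn(intra, intra), np.zeros(intra)) for _ in range(4)],
}
tokens = rng.randn(384, inter)                 # one sequence of 384 token features
print(bottleneck_block(tokens, params).shape)  # (384, 512)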

TensorFlow requirement: 1.15 (TensorFlow 2 is not supported).

Pre-trained checkpoints

Download the compressed file with the pre-trained weights:

MobileBERT Optimized Uncased English: uncased_L-24_H-128_B-512_A-4_F-4_OPT
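
After extracting the archive, one quick sanity check (assuming TF 1.15 and that the files were unpacked to /path/to/checkpoint, as used in the commands below) is to list a few of the variables stored in the checkpoint:

import tensorflow as tf  # TF 1.15

# Print the first few variables in the extracted MobileBERT checkpoint.
# Adjust the prefix to wherever the archive was unpacked.
for name, shape in tf.train.list_variables("/path/to/checkpoint/mobilebert.ckpt")[:10]:
    print(name, shape)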

Finetune with MobileBERT

pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:/path/to/mobilebert

SQuAD

We use the same script for fine-tuning on both SQuAD v1.1 and v2.0. The example below fine-tunes MobileBERT on the SQuAD v1.1 dataset with a TPU-v3-8; a quick way to spot-check the resulting predictions follows the command.

export DATA_DIR=/tmp/mobilebert/data_cache/
export INIT_CHECKPOINT=/path/to/checkpoint/
export OUTPUT_DIR=/tmp/mobilebert/experiment/
python3 run_squad.py \
  --bert_config_file=config/uncased_L-24_H-128_B-512_A-4_F-4_OPT.json \
  --data_dir=${DATA_DIR} \
  --do_lower_case \
  --do_predict \
  --do_train \
  --doc_stride=128 \
  --init_checkpoint=${INIT_CHECKPOINT}/mobilebert.ckpt \
  --learning_rate=4e-05 \
  --max_answer_length=30 \
  --max_query_length=64 \
  --max_seq_length=384 \
  --n_best_size=20 \
  --num_train_epochs=5 \
  --output_dir=${OUTPUT_DIR} \
  --predict_file=/path/to/squad/dev-v1.1.json \
  --train_batch_size=32 \
  --train_file=/path/to/squad/train-v1.1.json \
  --use_tpu \
  --tpu_name=${TPU_NAME} \
  --vocab_file=${INIT_CHECKPOINT}/vocab.txt \
  --warmup_proportion=0.1
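
With --do_predict set, run_squad.py writes its predictions to the output directory. Assuming it produces a predictions.json there, as the original BERT run_squad.py does, a few answers can be spot-checked with:

import json

# Spot-check predicted answers written by run_squad.py (assumed filename).
with open("/tmp/mobilebert/experiment/predictions.json") as f:
    predictions = json.load(f)  # maps SQuAD question id -> predicted answer text

for qid, answer in list(predictions.items())[:5]:
    print(qid, "->", answer)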

Export MobileBERT to TF-Lite format

export EXPORT_DIR='/path/to/tflite'
python3 run_squad.py \
  --use_post_quantization=true \
  --activation_quantization=false \
  --data_dir=${DATA_DIR}  \
  --output_dir=${OUTPUT_DIR} \
  --vocab_file=${INIT_CHECKPOINT}/vocab.txt \
  --bert_config_file=config/uncased_L-24_H-128_B-512_A-4_F-4_OPT.json \
  --train_file=/path/to/squad/train-v1.1.json \
  --export_dir=${EXPORT_DIR}
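
Once the export finishes, the resulting TF-Lite model (the exact filename under ${EXPORT_DIR} depends on the script; squad.tflite is only an assumed name below) can be smoke-tested with the TF-Lite interpreter by feeding zero-valued inputs and checking the output shapes:

import numpy as np
import tensorflow as tf  # TF 1.15

# Load the exported model; adjust the filename to whatever was written to ${EXPORT_DIR}.
interpreter = tf.lite.Interpreter(model_path="/path/to/tflite/squad.tflite")
interpreter.allocate_tensors()

# Feed zero-valued tensors of the expected shapes/dtypes as a smoke test.
for detail in interpreter.get_input_details():
    interpreter.set_tensor(detail["index"], np.zeros(detail["shape"], dtype=detail["dtype"]))
interpreter.invoke()

for detail in interpreter.get_output_details():
    print(detail["name"], interpreter.get_tensor(detail["index"]).shape)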

MobileBERT Pre-training & Distillation

Data processing

We use exactly the same data processing script as BERT to prepare the pre-training data. Please use the BERT script create_pretraining_data.py to obtain the tfrecords. For the datasets themselves, see https://github.com/google-research/bert#pre-training-data.
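
After create_pretraining_data.py has run, a small sanity check (assuming TF 1.15; the tfrecord path below is only a placeholder for whatever --output_file was used) is to parse one record and look at its feature names and lengths:

import tensorflow as tf  # TF 1.15

record_path = "/path/to/pretraining_data/tf_examples.tfrecord"  # your --output_file
for serialized in tf.python_io.tf_record_iterator(record_path):
    example = tf.train.Example.FromString(serialized)
    for name, feature in example.features.feature.items():
        values = feature.int64_list.value or feature.float_list.value
        print(name, len(values))
    break  # one record is enough for a sanity check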

Distillation

We conducted the distillation process on the pre-training data with a TPU-v3-256; an illustrative sketch of how the distillation weighting flags fit together follows the command.

export TEACHER_CHECKPOINT=/path/to/checkpoint/
export OUTPUT_DIR=/tmp/mobilebert/experiment/
python3 run_pretraining.py \
  --attention_distill_factor=1 \
  --bert_config_file=config/uncased_L-24_H-128_B-512_A-4_F-4_OPT.json \
  --bert_teacher_config_file=config/uncased_L-24_H-1024_B-512_A-4.json \
  --beta_distill_factor=5000 \
  --distill_ground_truth_ratio=0.5 \
  --distill_temperature=1 \
  --do_train \
  --first_input_file=/path/to/pretraining_data \
  --first_max_seq_length=128 \
  --first_num_train_steps=0 \
  --first_train_batch_size=4096 \
  --gamma_distill_factor=5 \
  --hidden_distill_factor=100 \
  --init_checkpoint=${TEACHER_CHECKPOINT} \
  --input_file=/path/to/pretraining_data \
  --layer_wise_warmup \
  --learning_rate=0.0015 \
  --max_predictions_per_seq=20 \
  --max_seq_length=512 \
  --num_distill_steps=240000 \
  --num_train_steps=500000 \
  --num_warmup_steps=10000 \
  --optimizer=lamb \
  --output_dir=${OUTPUT_DIR} \
  --save_checkpoints_steps=10000 \
  --train_batch_size=2048 \
  --use_einsum \
  --use_summary \
  --use_tpu \
  --tpu_name=${TPU_NAME}
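
The *_distill_factor, distill_temperature, and distill_ground_truth_ratio flags weight the individual terms of the distillation objective: feature-map transfer, attention transfer, and pre-training distillation against soft teacher labels. The sketch below is only an illustrative reading of how such a weighted objective can be assembled; it is not the code in run_pretraining.py, and the exact role of the beta/gamma factors there may differ.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_hidden, teacher_hidden,
                 student_attn_logits, teacher_attn_logits,
                 student_mlm_logits, teacher_mlm_logits, gold_ids,
                 hidden_factor=100.0, attention_factor=1.0,
                 ground_truth_ratio=0.5, temperature=1.0):
    # Illustrative weighted distillation objective (an assumption, not run_pretraining.py).
    # Feature-map transfer: mean squared error between student and teacher layer outputs.
    hidden_loss = np.mean((student_hidden - teacher_hidden) ** 2)
    # Attention transfer: KL divergence between per-head attention distributions.
    t_attn, s_attn = softmax(teacher_attn_logits), softmax(student_attn_logits)
    attn_loss = np.mean(np.sum(t_attn * (np.log(t_attn + 1e-9) - np.log(s_attn + 1e-9)), axis=-1))
    # Pre-training distillation: mix gold-label cross-entropy with temperature-softened
    # teacher labels according to ground_truth_ratio.
    log_probs = np.log(softmax(student_mlm_logits) + 1e-9)
    gold_ce = -np.mean(log_probs[np.arange(len(gold_ids)), gold_ids])
    soft_ce = -np.mean(np.sum(softmax(teacher_mlm_logits / temperature) * log_probs, axis=-1))
    mlm_loss = ground_truth_ratio * gold_ce + (1.0 - ground_truth_ratio) * soft_ce
    return hidden_factor * hidden_loss + attention_factor * attn_loss + mlm_loss

rng = np.random.RandomState(0)
n, heads, seq, vocab = 8, 4, 16, 100
print(distill_loss(
    rng.randn(n, 128), rng.randn(n, 128),
    rng.randn(heads, seq, seq), rng.randn(heads, seq, seq),
    rng.randn(n, vocab), rng.randn(n, vocab), rng.randint(0, vocab, size=n)))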