
Commit

ALBERT move
PiperOrigin-RevId: 284252608
0x0539 committed Dec 7, 2019
1 parent 9589348 commit 6ca4dfd
Showing 20 changed files with 9,049 additions and 0 deletions.
116 changes: 116 additions & 0 deletions .gitignore
@@ -0,0 +1,116 @@
# Initially taken from Github's Python gitignore file

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
156 changes: 156 additions & 0 deletions README.md
@@ -0,0 +1,156 @@
ALBERT
======

***************New October 31, 2019***************

Version 2 of the ALBERT models is released. TF-Hub modules are available:

- https://tfhub.dev/google/albert_base/2
- https://tfhub.dev/google/albert_large/2
- https://tfhub.dev/google/albert_xlarge/2
- https://tfhub.dev/google/albert_xxlarge/2

In this version, we apply the 'no dropout', 'additional training data', and 'long training time' strategies to all models. We train ALBERT-base for 10M steps and the other models for 3M steps.

The comparison of results to the v1 models is as follows:

| | Average | SQuAD1.1 | SQuAD2.0 | MNLI | SST-2 | RACE |
|----------------|----------|----------|----------|----------|----------|----------|
|V2 |
|ALBERT-base |82.3 |90.2/83.2 |82.1/79.3 |84.6 |92.9 |66.8 |
|ALBERT-large |85.7 |91.8/85.2 |84.9/81.8 |86.5 |94.9 |75.2 |
|ALBERT-xlarge |87.9 |92.9/86.4 |87.9/84.1 |87.9 |95.4 |80.7 |
|ALBERT-xxlarge |90.9 |94.6/89.1 |89.8/86.9 |90.6 |96.8 |86.8 |
|V1 |
|ALBERT-base |80.1 |89.3/82.3 | 80.0/77.1|81.6 |90.3 | 64.0 |
|ALBERT-large |82.4 |90.6/83.9 | 82.3/79.4|83.5 |91.7 | 68.5 |
|ALBERT-xlarge |85.5 |92.5/86.1 | 86.1/83.1|86.4 |92.4 | 74.8 |
|ALBERT-xxlarge |91.0 |94.8/89.3 | 90.2/87.4|90.8 |96.9 | 86.5 |

The comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, ALBERT-xxlarge v2 is slightly worse than v1, for two reasons: 1) training for an additional 1.5M steps (the only difference between the two models is 1.5M vs. 3M training steps) did not lead to significant performance improvement; 2) for v1, we did a small hyperparameter search over the parameter sets given by BERT, RoBERTa, and XLNet, whereas for v2 we simply adopt the parameters from v1, except for RACE, where we use a learning rate of 1e-5 and an [ALBERT DR](https://arxiv.org/pdf/1909.11942.pdf) (dropout rate for ALBERT in fine-tuning) of 0. The original (v1) RACE hyperparameters cause model divergence for v2 models. Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.



ALBERT is "A Lite" version of BERT, a popular unsupervised language
representation learning algorithm. ALBERT uses parameter-reduction techniques
that allow for large-scale configurations, overcome previous memory limitations,
and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see our paper:

[ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942)

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Release Notes
=============

- Initial release: 10/9/2019

Results
=======

Performance of ALBERT on the GLUE benchmark using a single-model setup on the
dev set:

| Models | MNLI | QNLI | QQP | RTE | SST | MRPC | CoLA | STS |
|-------------------|----------|----------|----------|----------|----------|----------|----------|----------|
| BERT-large | 86.6 | 92.3 | 91.3 | 70.4 | 93.2 | 88.0 | 60.6 | 90.0 |
| XLNet-large | 89.8 | 93.9 | 91.8 | 83.8 | 95.6 | 89.2 | 63.6 | 91.8 |
| RoBERTa-large | 90.2 | 94.7 | **92.2** | 86.6 | 96.4 | **90.9** | 68.0 | 92.4 |
| ALBERT (1M) | 90.4 | 95.2 | 92.0 | 88.1 | 96.8 | 90.2 | 68.7 | 92.7 |
| ALBERT (1.5M) | **90.8** | **95.3** | **92.2** | **89.2** | **96.9** | **90.9** | **71.4** | **93.0** |

Performance of ALBERT-xxlarge on the SQuAD and RACE benchmarks using a
single-model setup:

|Models | SQuAD1.1 dev | SQuAD2.0 dev | SQuAD2.0 test | RACE test (Middle/High) |
|--------------------------|---------------|---------------|---------------|-------------------------|
|BERT-large | 90.9/84.1 | 81.8/79.0 | 89.1/86.3 | 72.0 (76.6/70.1) |
|XLNet | 94.5/89.0 | 88.8/86.1 | 89.1/86.3 | 81.8 (85.5/80.2) |
|RoBERTa | 94.6/88.9 | 89.4/86.5 | 89.8/86.8 | 83.2 (86.5/81.3) |
|UPM | - | - | 89.9/87.2 | - |
|XLNet + SG-Net Verifier++ | - | - | 90.1/87.2 | - |
|ALBERT (1M) | 94.8/89.2 | 89.9/87.2 | - | 86.0 (88.2/85.1) |
|ALBERT (1.5M) | **94.8/89.3** | **90.2/87.4** | **90.9/88.1** | **86.5 (89.0/85.5)** |


Pre-trained Models
==================
TF-Hub modules are available:

- https://tfhub.dev/google/albert_base/1
- https://tfhub.dev/google/albert_large/1
- https://tfhub.dev/google/albert_xlarge/1
- https://tfhub.dev/google/albert_xxlarge/1

Example usage of the TF-Hub module:

```
import tensorflow_hub as hub

# `is_training`, `input_ids`, `input_mask`, and `segment_ids` come from the
# surrounding model code; the input tensors are int32 Tensors of shape
# [batch_size, seq_length].
tags = set()
if is_training:
  tags.add("train")
albert_module = hub.Module("https://tfhub.dev/google/albert_base/1", tags=tags,
                           trainable=True)
albert_inputs = dict(
    input_ids=input_ids,
    input_mask=input_mask,
    segment_ids=segment_ids)
albert_outputs = albert_module(
    inputs=albert_inputs,
    signature="tokens",
    as_dict=True)
# If you want to use the token-level output, use
# albert_outputs["sequence_output"] instead.
output_layer = albert_outputs["pooled_output"]
```

For a full example, see `run_classifier_with_tfhub.py`.
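
As an illustration only (this snippet is not part of the repository), a
classification head on top of `output_layer` might look like the following
TF1-style sketch, where `num_labels` is a hypothetical placeholder for the
number of classes in your task:

```
import tensorflow as tf  # TF 1.x graph-mode API, matching the hub module above

# Hypothetical sketch: a softmax classifier over the pooled ALBERT output.
logits = tf.layers.dense(
    output_layer,
    units=num_labels,  # assumed to be defined by the surrounding code
    kernel_initializer=tf.truncated_normal_initializer(stddev=0.02))
probabilities = tf.nn.softmax(logits, axis=-1)
```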

Pre-training Instructions
=========================
Use `run_pretraining.py` to pretrain ALBERT:

```
pip install -r albert/requirements.txt
python -m albert.run_pretraining \
  --output_dir="${OUTPUT_DIR}" \
  --export_dir="${EXPORT_DIR}" \
  --do_train \
  --do_eval \
  <additional flags>
```
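
For reference, a fuller invocation typically also points the script at the
pre-processed training data and the model configuration. The sketch below is
illustrative only: the additional flag names, values, and placeholder
environment variables are assumptions modeled on BERT-style pretraining flags,
so check the flag definitions in `run_pretraining.py` for the exact set in
your checkout.

```
# Illustrative sketch -- verify flag names against run_pretraining.py.
python -m albert.run_pretraining \
  --input_file="${INPUT_TFRECORDS}" \
  --albert_config_file="${ALBERT_CONFIG}" \
  --output_dir="${OUTPUT_DIR}" \
  --export_dir="${EXPORT_DIR}" \
  --do_train \
  --do_eval \
  --train_batch_size=512 \
  --max_seq_length=512 \
  --learning_rate=1e-4 \
  --num_train_steps=125000 \
  --num_warmup_steps=3125
```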

You should see some output like this:

```
***** Eval results *****
  global_step = ...
  loss = ...
  masked_lm_accuracy = ...
  masked_lm_loss = ...
  sentence_order_accuracy = ...
  sentence_order_loss = ...
```

Fine-tuning Instructions
========================
For XNLI, CoLA, MNLI, and MRPC, use `run_classifier_sp.py`:

```
pip install -r albert/requirements.txt
python -m albert.run_classifier_sp \
  --task_name=MNLI \
  <additional flags>
```

You can also fine-tune the model starting from TF-Hub modules using
`run_classifier_with_tfhub.py`:

```
pip install -r albert/requirements.txt
python -m albert.run_classifier_with_tfhub \
  --albert_hub_module_handle=https://tfhub.dev/google/albert_base/1 \
  <additional flags>
```
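
Assuming the script accepts any ALBERT TF-Hub handle, the v2 modules listed at
the top of this README can presumably be substituted in the same way, for
example:

```
python -m albert.run_classifier_with_tfhub \
  --albert_hub_module_handle=https://tfhub.dev/google/albert_base/2 \
  <additional flags>
```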
14 changes: 14 additions & 0 deletions __init__.py
@@ -0,0 +1,14 @@
# coding=utf-8
# Copyright 2018 The Google AI Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
