Merge branch 'develop' into develop

LongyanU · Jun 8, 2021 · f7185da
2 parents 703a35e + 2079a10
commit f7185da
Showing 63 changed files with 2,555 additions and 579 deletions.
diff --git a/README.md b/README.md
diff --git a/README_en.md b/README_en.md
@@ -1,59 +1,95 @@
 English | [简体中文](./README.md)
 
 <p align="center">
-  <img src="./docs/imgs/paddlenlp.png" width="520" height ="100" />
+  <img src="./docs/imgs/paddlenlp.png" width="718" height ="100" />
 </p>
 
----------------------------------------------------------------------------------
-
+------------------------------------------------------------------------------------------
 [![PyPI - PaddleNLP Version](https://img.shields.io/pypi/v/paddlenlp.svg?label=pip&logo=PyPI&logoColor=white)](https://pypi.org/project/paddlenlp/)
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/paddlenlp)](https://pypi.org/project/paddlenlp/)
 [![PyPI Status](https://pepy.tech/badge/paddlenlp/month)](https://pepy.tech/project/paddlenlp)
+![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
 ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
 ![GitHub](https://img.shields.io/github/license/paddlepaddle/paddlenlp)
 
-## Introduction
+## News  <img src="./docs/imgs/news_icon.png" width="40"/>
 
-PaddleNLP aims to accelerate NLP applications through powerful model zoo, easy-to-use API and high performance distributed training. It's also the NLP best practice for PaddlePaddle 2.0 API system.
+* [2021-06-07] **NLP Live Class** from Baidu has started!🔥🔥🔥 Click [HERE](https://aistudio.baidu.com/aistudio/course/introduce/24177) to join us!
+* [2021-06-04] [ERNIE-Gram](https://arxiv.org/abs/2010.12148) pretrained model has been released! Install v2.0.2 to try it.
+* [2021-05-20] PaddleNLP 2.0 has been officially released! :tada: For more information, please refer to the [Release Note](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.0.0).
+
+## Introduction
 
-## Features
+**PaddleNLP** is a powerful NLP library with **Awesome** pre-trained Transformer models and an easy-to-use interface, supporting a wide range of NLP tasks from research to industrial applications.
 
-* **Powerful Model Zoo for Rich Senario**
-  - Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Text Classification, Text Generation, Text Matching, Text Graph, Information Extraction, Machine Translation, General Dialogue and Question Answering etc.
 
-* **Easy-to-Use and End-to-End API**
-  - The API is fully integrated with PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, training and evaluation, which enables you to deal with text problems more productively.
+* **Easy-to-Use API**
+  - The API is fully integrated with the PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, using pre-trained Transformer models, and fast inference, which enables developers to deal with text problems more productively.
 
-* **High Performance and Distributed Training**
--  We provide a highly optimized ditributed training implementation for BERT with Fleet API, and mixed precision training strategy based on PaddlePaddle 2.0, it can fully utilize GPU clusters for large-scale model pre-training.
+* **Wide-range NLP Task Support**
+  - PaddleNLP supports NLP tasks from research to industrial applications, including Lexical Analysis, Text Classification, Text Matching, Text Generation, Information Extraction, Machine Translation, General Dialogue, and Question Answering.
 
+* **High Performance Distributed Training**
+  - We provide an industrial-level training pipeline for super-large-scale Transformer models, based on **Auto Mixed Precision** and the Fleet distributed training API of PaddlePaddle, which supports efficient pre-training of customized models.
 
 ## Installation
 
 ### Prerequisites
 
 * python >= 3.6
-* paddlepaddle >= 2.0.1
+* paddlepaddle >= 2.1
 
-More information about PaddlePaddle installation please refer to [PaddlePaddle Install](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html)
+For more information about PaddlePaddle installation, please refer to [PaddlePaddle's Website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html).
 
 ### PIP Installation
 
 ```
 pip install --upgrade paddlenlp -i https://pypi.org/simple
 ```
 
-### Install from Source
+## Easy-to-use API
 
+
+### Transformer API: Awesome Pre-trained Model Ecosystem
+
+We provide **15** network architectures and **67** pretrained models. These include not only the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations. We also welcome developers to contribute their own Transformer models! 🤗
+
+```python
+from paddlenlp.transformers import ErnieModel, ErnieGramModel, BertModel, AlbertModel, RobertaModel, ElectraModel, GPTForPretraining
+
+ernie = ErnieModel.from_pretrained('ernie-1.0')
+ernie_gram = ErnieGramModel.from_pretrained('ernie-gram')
+bert = BertModel.from_pretrained('bert-wwm-chinese')
+albert = AlbertModel.from_pretrained('albert-chinese-tiny')
+roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
+electra = ElectraModel.from_pretrained('chinese-electra-small')
+gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
 ```
-pip install --upgrade git+https://github.com/PaddlePaddle/PaddleNLP.git
 
-pip install --upgrade git+https://gitee.com/PaddlePaddle/PaddleNLP.git
+PaddleNLP also provides a unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling, and question answering.
+
+```python
+import paddle
+from paddlenlp.transformers import ErnieTokenizer, ErnieModel, ErnieForSequenceClassification, ErnieForTokenClassification, ErnieForQuestionAnswering
+
+tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
+text = tokenizer('natural language understanding')
+
+# Semantic Representation (ErnieModel returns sequence_output first, then pooled_output)
+model = ErnieModel.from_pretrained('ernie-1.0')
+sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
+# Text Classification and Matching
+model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
+# Sequence Labeling
+model = ErnieForTokenClassification.from_pretrained('ernie-1.0')
+# Question Answering
+model = ErnieForQuestionAnswering.from_pretrained('ernie-1.0')
 ```
 
-## Quick Start
+For more pretrained model usage, please refer to [Transformer API](./docs/model_zoo/transformers.rst)
+
 
-### Quick Dataset Loading
+### Dataset API: Rich Dataset Integration and Quick Loading
 
 ```python
 from paddlenlp.datasets import load_dataset
@@ -63,10 +99,9 @@ train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev",
 
 For more dataset API usage please refer to [Dataset API](./docs/datasets.md).
 
-### Pre-trained Text Embedding Loading
+### Embedding API: Quick Loading for Word Embedding
 
 ```python
-
 from paddlenlp.embeddings import TokenEmbedding
 
 wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
@@ -76,64 +111,53 @@ wordemb.cosine_sim("apple", "rail")
 >>> 0.29207364
 ```
 
-For more TokenEmbedding usage, please refer to [Embedding API](./docs/embeddings.md)
-
-### Rich Chinese Pre-trained Models
-
-```python
-from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPTForPretraining
-
-ernie = ErnieModel.from_pretrained('ernie-1.0')
-bert = BertModel.from_pretrained('bert-wwm-chinese')
-roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
-electra = ElectraModel.from_pretrained('chinese-electra-small')
-gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
-```
-
-For more pretrained model selection, please refer to [Transformer API](./docs/model_zoo/transformers.rst)
+For more `TokenEmbedding` usage, please refer to [Embedding API](./docs/embeddings.md)
 
-### Extract Feature Through Pre-trained Model
+### More API Usage
 
-```python
-import paddle
-from paddlenlp.transformers import ErnieTokenizer, ErnieModel
+- [Transformer API](./docs/model_zoo/transformers.rst)
+- [Data API](./docs/data.md)
+- [Dataset API](./docs/datasets.md)
+- [Embedding API](./docs/model_zoo/embeddings.md)
+- [Metrics API](./docs/metrics.md)
 
-tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
-model = ErnieModel.from_pretrained('ernie-1.0')
+For the full API reference, please see our [readthedocs](https://paddlenlp.readthedocs.io/) site.
 
-text = tokenizer('自然语言处理')
-pooled_output, sequence_output = model.forward(input_ids=paddle.to_tensor([text['input_ids']]))
-```
+## Rich Application Examples
 
-## Model Zoo and Applications
+PaddleNLP provides rich application examples covering mainstream NLP tasks to help developers accelerate problem solving.
 
-For model zoo introduction please refer to[PaddleNLP Model Zoo](./docs/model_zoo.md). As for applicaiton senario please refer to [PaddleNLP Examples](./examples/)。
+### NLP Basic Technique
 
 - [Word Embedding](./examples/word_embedding/)
 - [Lexical Analysis](./examples/lexical_analysis/)
-- [Named Entity Recognition](./examples/information_extraction/msra_ner/)
 - [Language Model](./examples/language_model/)
+- [Semantic Parsing (Text to SQL)](./examples/text_to_sql):star:
+
+
+### NLP Core Technique
+
 - [Text Classification](./examples/text_classification/)
-- [Text Gneeration](./examples/text_generation/)
-- [Semantic Maching](./examples/text_matching/)
-- [Text Graph](./examples/text_graph/erniesage/)
+- [Text Matching](./examples/text_matching/)
+- [Text Generation](./examples/text_generation/)
+- [Semantic Indexing](./examples/semantic_indexing/)
 - [Information Extraction](./examples/information_extraction/)
-- [General Dialogue](./examples/dialogue/)
-- [Machine Translation](./examples/machine_translation/)
-- [Machine Readeng Comprehension](./examples/machine_reading_comprehension/)
 
-## Advanced Application
+### NLP Application in Real System
 
-- [Model Compression](./examples/model_compression/)
+- [Sentiment Analysis](./examples/sentiment_analysis/skep/):star2:
+- [General Dialogue System](./examples/dialogue/)
+- [Machine Translation](./examples/machine_translation/)
+- [Simultaneous Translation](./examples/simultaneous_translation/)
+- [Machine Reading Comprehension](./examples/machine_reading_comprehension/)
 
-## API Usage
-
-- [Transformer API](./docs/model_zoo/transformers.rst)
-- [Data API](./docs/data.md)
-- [Dataset API](./docs/datasets.md)
-- [Embedding API](./docs/embeddings.md)
-- [Metrics API](./docs/metrics.md)
+### Extension Applications
 
+- [Text Knowledge Linking](./examples/text_to_knowledge/):star2:
+- [Machine Reading Comprehension](./examples/machine_reading_comprehension)
+- [Model Compression](./examples/model_compression/)
+- [Text Graph Learning](./examples/text_graph/erniesage/)
+- [Time Series Prediction](./examples/time_series/)
 
 ## Tutorials
 
@@ -149,10 +173,9 @@ Please refer to our official AI Studio account for more interactive tutorials: [
 
 * [Use TCN Model to predict COVID-19 confirmed cases](https://aistudio.baidu.com/aistudio/projectdetail/1290873)
 
-
 ## Community
 
-### Special Interest Group(SIG)
+### Special Interest Group (SIG)
 
 You are welcome to join the [PaddleNLP SIG](https://iwenjuan.baidu.com/?code=bkypg8) to contribute, e.g., datasets, models and toolkits.
 
@@ -166,6 +189,10 @@ Join our QQ Technical Group for technical exchange right now! ⬇️
   <img src="./docs/imgs/qq.png" width="200" height="200" />
 </div>
 
+## ChangeLog
+
+For more information about our releases, please refer to the [ChangeLog](./docs/changelog.md).
+
 ## License
 
 PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).
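
A note on the Embedding API hunk above: `wordemb.cosine_sim("apple", "rail")` reports a cosine similarity between two word vectors. The sketch below shows that computation in plain Python, as an illustration only. The `cosine_sim` helper and the three-dimensional toy vectors are hypothetical and say nothing about how `TokenEmbedding` implements it internally.

```python
import math


def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for real 300-dimensional word vectors.
apple = [1.0, 2.0, 2.0]
rail = [2.0, 0.0, 1.0]
score = cosine_sim(apple, rail)
print(score)
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, so the README's value of about 0.29 for "apple" and "rail" indicates weakly related words.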
diff --git a/docs/change_log.md b/docs/change_log.md
diff --git a/docs/get_started/quick_start.rst b/docs/get_started/quick_start.rst
@@ -12,7 +12,7 @@
 
 .. code-block::
 
-    >>> pip install --upgrade paddlenlp>=2.0.0rc -i https://pypi.org/simple
+    >>> pip install --upgrade paddlenlp -i https://pypi.org/simple
 
 2. Load a Pre-trained Model in One Line
 =======================================
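
Since this hunk drops the `>=2.0.0rc` specifier from the install command (an unquoted `>=2.0.0rc` is also parsed as an output redirection by POSIX shells), a script that depends on a particular release may want to check what is actually installed. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the README itself only requires Python 3.6) and using a hypothetical `installed_version` helper:

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("paddlenlp")
if version is None:
    print("paddlenlp is not installed")
else:
    print("paddlenlp", version)
```

The helper only reports the version string; comparing it against a minimum release is left to the caller.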

diff --git a/docs/index.rst b/docs/index.rst
@@ -6,7 +6,7 @@
 
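 - **Easy-to-use text-domain APIs**
 
-  - Provides domain APIs that span dataset loading, text preprocessing, model construction, model evaluation, and inference acceleration: the **Dataset API** for one-line loading of Chinese datasets, the Data API for flexible and efficient data preprocessing, the**Embedding API** with 60+ built-in pre-trained word embeddings, and the**Transformer API**, an ecosystem foundation of 50+ pre-trained models. Together they greatly improve the efficiency of NLP modeling and iteration.
+  - Provides domain APIs that span dataset loading, text preprocessing, model construction, model evaluation, and inference acceleration: the **Dataset API** for one-line loading of Chinese datasets, the Data API for flexible and efficient data preprocessing, the **Embedding API** with 60+ built-in pre-trained word embeddings, and the **Transformer API**, an ecosystem foundation of 50+ pre-trained models. Together they greatly improve the efficiency of NLP modeling and iteration.
 
 - **Application examples for many scenarios**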
 
@@ -44,8 +44,8 @@
    :maxdepth: 2
    :caption: Model Zoo
 
-   Pre-trained Models <model_zoo/transformers>
-   Basic Network Units <model_zoo/others>
+   Transformer Pre-trained Models <model_zoo/transformers>
+   Pre-trained Word Embeddings <model_zoo/embeddings>
 
 .. toctree::
    :maxdepth: 2