Merge branch 'develop' into develop
smallv0221 authored Jun 8, 2021
2 parents 703a35e + 2079a10 commit f7185da
Showing 63 changed files with 2,555 additions and 579 deletions.
148 changes: 75 additions & 73 deletions README.md

Large diffs are not rendered by default.

157 changes: 92 additions & 65 deletions README_en.md
@@ -1,59 +1,95 @@
English | [简体中文](./README.md)

<p align="center">
<img src="./docs/imgs/paddlenlp.png" width="520" height ="100" />
<img src="./docs/imgs/paddlenlp.png" width="718" height ="100" />
</p>

---------------------------------------------------------------------------------

------------------------------------------------------------------------------------------
[![PyPI - PaddleNLP Version](https://img.shields.io/pypi/v/paddlenlp.svg?label=pip&logo=PyPI&logoColor=white)](https://pypi.org/project/paddlenlp/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/paddlenlp)](https://pypi.org/project/paddlenlp/)
[![PyPI Status](https://pepy.tech/badge/paddlenlp/month)](https://pepy.tech/project/paddlenlp)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
![GitHub](https://img.shields.io/github/license/paddlepaddle/paddlenlp)

## Introduction
## News <img src="./docs/imgs/news_icon.png" width="40"/>

PaddleNLP aims to accelerate NLP applications through powerful model zoo, easy-to-use API and high performance distributed training. It's also the NLP best practice for PaddlePaddle 2.0 API system.
* [2021-06-07] **NLP Live Class** from Baidu has started!🔥🔥🔥 Click [HERE](https://aistudio.baidu.com/aistudio/course/introduce/24177) to join us!
* [2021-06-04] [ERNIE-Gram](https://arxiv.org/abs/2010.12148) pretrained model has been released! Install v2.0.2 to try it.
* [2021-05-20] PaddleNLP 2.0 has been officially released! :tada: For more information, please refer to the [Release Note](https://github.com/PaddlePaddle/PaddleNLP/releases/tag/v2.0.0).

## Introduction

## Features
**PaddleNLP** is a powerful NLP library with **Awesome** pre-trained Transformer models and an easy-to-use interface, supporting a wide range of NLP tasks from research to industrial applications.

* **Powerful Model Zoo for Rich Scenarios**
- Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Text Classification, Text Generation, Text Matching, Text Graph, Information Extraction, Machine Translation, General Dialogue and Question Answering, etc.

* **Easy-to-Use and End-to-End API**
- The API is fully integrated with PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, training and evaluation, which enables you to deal with text problems more productively.
* **Easy-to-Use API**
- The API is fully integrated with the PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, using awesome Transformer models, and fast inference, which enables developers to deal with text problems more productively.

* **High Performance and Distributed Training**
- We provide a highly optimized distributed training implementation for BERT with the Fleet API, together with a mixed precision training strategy based on PaddlePaddle 2.0; it can fully utilize GPU clusters for large-scale model pre-training.
* **Wide-range NLP Task Support**
- PaddleNLP supports NLP tasks from research to industrial applications, including Lexical Analysis, Text Classification, Text Matching, Text Generation, Information Extraction, Machine Translation, General Dialogue and Question Answering, etc.

* **High Performance Distributed Training**
- We provide an industrial-level training pipeline for super large-scale Transformer models, based on **Auto Mixed Precision** and the Fleet distributed training API of PaddlePaddle, which supports customized model pre-training efficiently.

## Installation

### Prerequisites

* python >= 3.6
* paddlepaddle >= 2.0.1
* paddlepaddle >= 2.1

For more information about PaddlePaddle installation, please refer to [PaddlePaddle Install](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html)
For more information about PaddlePaddle installation, please refer to [PaddlePaddle's Website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/conda/linux-conda.html)
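
The prerequisites above can be checked from Python before installing PaddleNLP. The following is a minimal sketch, not part of PaddleNLP: `parse_version` and `meets_minimum` are hypothetical helper names introduced here for illustration, and the comparison only looks at the leading numeric components of a version string (so a suffix such as `rc` is ignored).

```python
import sys


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '2.1' and '2.1.0' compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Prerequisite 1: python >= 3.6
print("python >= 3.6:", sys.version_info >= (3, 6))

# Prerequisite 2: paddlepaddle >= 2.1 (the package is imported as `paddle`)
try:
    import paddle
    print("paddlepaddle >= 2.1:", meets_minimum(paddle.__version__, "2.1"))
except ImportError:
    print("paddlepaddle is not installed")
```

The script only reports what it finds; it does not install or upgrade anything.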

### PIP Installation

```
pip install --upgrade paddlenlp -i https://pypi.org/simple
```

### Install from Source
## Easy-to-use API


### Transformer API: Awesome Pre-trained Model Ecosystem

We provide **15** network architectures and **67** pretrained models. These include not only all the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations. We also welcome developers to contribute their own Transformer models! 🤗

```python
from paddlenlp.transformers import *

ernie = ErnieModel.from_pretrained('ernie-1.0')
ernie_gram = ErnieGramModel.from_pretrained('ernie-gram')
bert = BertModel.from_pretrained('bert-wwm-chinese')
albert = AlbertModel.from_pretrained('albert-chinese-tiny')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
```
pip install --upgrade git+https://github.com/PaddlePaddle/PaddleNLP.git

pip install --upgrade git+https://gitee.com/PaddlePaddle/PaddleNLP.git
PaddleNLP also provides a unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling, question answering, etc.

```python
import paddle
from paddlenlp.transformers import (
    ErnieTokenizer,
    ErnieModel,
    ErnieForSequenceClassification,
    ErnieForTokenClassification,
    ErnieForQuestionAnswering,
)

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
text = tokenizer('natural language understanding')

# Semantic Representation
model = ErnieModel.from_pretrained('ernie-1.0')
# ErnieModel returns the per-token output first, then the pooled output
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
# Text Classification and Matching
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
# Sequence Labeling
model = ErnieForTokenClassification.from_pretrained('ernie-1.0')
# Question Answering
model = ErnieForQuestionAnswering.from_pretrained('ernie-1.0')
```

## Quick Start
For more pretrained model usage, please refer to [Transformer API](./docs/model_zoo/transformers.rst)


### Quick Dataset Loading
### Dataset API: Rich Dataset Integration and Quick Loading

```python
from paddlenlp.datasets import load_dataset
@@ -63,10 +99,9 @@ train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev",

For more dataset API usage please refer to [Dataset API](./docs/datasets.md).

### Pre-trained Text Embedding Loading
### Embedding API: Quick Loading for Word Embedding

```python

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
@@ -76,64 +111,53 @@ wordemb.cosine_sim("apple", "rail")
>>> 0.29207364
```
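
As its name suggests, `cosine_sim` scores two words by the cosine of the angle between their embedding vectors. The sketch below shows that computation in plain Python under stated assumptions: the standalone `cosine_sim` function and the 3-dimensional vectors are hypothetical stand-ins written for illustration, not PaddleNLP's implementation or real fastText embeddings.

```python
import math


def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical values chosen for illustration only).
apple = [1.0, 2.0, 0.0]
rail = [2.0, 0.0, 1.0]
print(cosine_sim(apple, rail))
```

Values near 1 indicate vectors pointing in similar directions, while values near 0 indicate unrelated directions.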

For more TokenEmbedding usage, please refer to [Embedding API](./docs/embeddings.md)

### Rich Chinese Pre-trained Models

```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPTForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
```

For more pretrained model selection, please refer to [Transformer API](./docs/model_zoo/transformers.rst)
For more `TokenEmbedding` usage, please refer to [Embedding API](./docs/embeddings.md)

### Extract Feature Through Pre-trained Model
### More API Usage

```python
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel
- [Transformer API](./docs/model_zoo/transformers.rst)
- [Data API](./docs/data.md)
- [Dataset API](./docs/datasets.md)
- [Embedding API](./docs/model_zoo/embeddings.md)
- [Metrics API](./docs/metrics.md)

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')
For the full API reference, please see our [readthedocs](https://paddlenlp.readthedocs.io/).

text = tokenizer('自然语言处理')
sequence_output, pooled_output = model.forward(input_ids=paddle.to_tensor([text['input_ids']]))
```
## Rich Application Examples

## Model Zoo and Applications
PaddleNLP provides rich application examples covering mainstream NLP tasks to help developers accelerate problem solving.

For an introduction to the model zoo, please refer to [PaddleNLP Model Zoo](./docs/model_zoo.md). For application scenarios, please refer to [PaddleNLP Examples](./examples/)
### NLP Basic Technique

- [Word Embedding](./examples/word_embedding/)
- [Lexical Analysis](./examples/lexical_analysis/)
- [Named Entity Recognition](./examples/information_extraction/msra_ner/)
- [Language Model](./examples/language_model/)
- [Semantic Parsing (Text to SQL)](./examples/text_to_sql):star:


### NLP Core Technique

- [Text Classification](./examples/text_classification/)
- [Text Generation](./examples/text_generation/)
- [Semantic Matching](./examples/text_matching/)
- [Text Graph](./examples/text_graph/erniesage/)
- [Text Matching](./examples/text_matching/)
- [Text Generation](./examples/text_generation/)
- [Semantic Indexing](./examples/semantic_indexing/)
- [Information Extraction](./examples/information_extraction/)
- [General Dialogue](./examples/dialogue/)
- [Machine Translation](./examples/machine_translation/)
- [Machine Reading Comprehension](./examples/machine_reading_comprehension/)

## Advanced Application
### NLP Application in Real System

- [Model Compression](./examples/model_compression/)
- [Sentiment Analysis](./examples/sentiment_analysis/skep/):star2:
- [General Dialogue System](./examples/dialogue/)
- [Machine Translation](./examples/machine_translation/)
- [Simultaneous Translation](./examples/simultaneous_translation/)
- [Machine Reading Comprehension](./examples/machine_reading_comprehension/)

## API Usage

- [Transformer API](./docs/model_zoo/transformers.rst)
- [Data API](./docs/data.md)
- [Dataset API](./docs/datasets.md)
- [Embedding API](./docs/embeddings.md)
- [Metrics API](./docs/metrics.md)
### Extension Application

- [Text Knowledge Linking](./examples/text_to_knowledge/):star2:
- [Machine Reading Comprehension](./examples/machine_reading_comprehension)
- [Model Compression](./examples/model_compression/)
- [Text Graph Learning](./examples/text_graph/erniesage/)
- [Time Series Prediction](./examples/time_series/)

## Tutorials

@@ -149,10 +173,9 @@ Please refer to our official AI Studio account for more interactive tutorials: [

* [Use TCN Model to predict COVID-19 confirmed cases](https://aistudio.baidu.com/aistudio/projectdetail/1290873)


## Community

### Special Interest Group(SIG)
### Special Interest Group (SIG)

You are welcome to join the [PaddleNLP SIG](https://iwenjuan.baidu.com/?code=bkypg8) to contribute, e.g. datasets, models and toolkits.

@@ -166,6 +189,10 @@ Join our QQ Technical Group for technical exchange right now! ⬇️
<img src="./docs/imgs/qq.png" width="200" height="200" />
</div>

## ChangeLog

For more information about our releases, please refer to the [ChangeLog](./docs/changelog.md)

## License

PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).
31 changes: 0 additions & 31 deletions docs/change_log.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/get_started/quick_start.rst
@@ -12,7 +12,7 @@

.. code-block::
>>> pip install --upgrade paddlenlp>=2.0.0rc -i https://pypi.org/simple
>>> pip install --upgrade paddlenlp -i https://pypi.org/simple
2. Load a Pretrained Model in One Step
========================================
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -6,7 +6,7 @@

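- **Easy-to-use text-domain APIs**

- Provides domain APIs covering dataset loading, text preprocessing, model building, model evaluation, and inference acceleration: the **Dataset API** for one-step loading of Chinese datasets, the Data API for flexible and efficient data preprocessing, the**Embedding API** with 60+ built-in pretrained word embeddings; and the**Transformer API**, which provides the foundational ecosystem of 50+ pretrained models. Together they can greatly improve the efficiency of modeling and iterating on NLP tasks.
- Provides domain APIs covering dataset loading, text preprocessing, model building, model evaluation, and inference acceleration: the **Dataset API** for one-step loading of Chinese datasets, the Data API for flexible and efficient data preprocessing, the **Embedding API** with 60+ built-in pretrained word embeddings; and the **Transformer API**, which provides the foundational ecosystem of 50+ pretrained models. Together they can greatly improve the efficiency of modeling and iterating on NLP tasks.

- **Application examples for multiple scenarios**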

@@ -44,8 +44,8 @@
:maxdepth: 2
:caption: Model Zoo

Pretrained Models <model_zoo/transformers>
Basic Network Building Blocks <model_zoo/others>
Transformer Pretrained Models <model_zoo/transformers>
Pretrained Word Embeddings <model_zoo/embeddings>

.. toctree::
:maxdepth: 2