- [2021-06-07] NLP Live Class from Baidu has started!🔥🔥🔥 Click HERE to join us!
- [2021-06-04] ERNIE-Gram pretrained model has been released! Install v2.0.2 to try it.
- [2021-05-20] PaddleNLP 2.0 has been officially released! 🎉 For more information, please refer to Release Note.
PaddleNLP is a powerful library for the text domain that aims to accelerate NLP applications through easy-to-use APIs, rich application examples, and high-performance distributed training. We also provide NLP best practices based on the PaddlePaddle 2.0 API system.
- Easy-to-Use and End-to-End API
  - The API is fully integrated with PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, transformer model loading, training and deployment, which enables you to deal with text problems more productively.
- Rich Application Examples
  - Our model zoo covers mainstream NLP applications, including Lexical Analysis, Text Classification, Text Generation, Text Matching, Text Graph, Information Extraction, Machine Translation, General Dialogue, and Question Answering.
- High Performance Distributed Training
  - We provide a highly optimized distributed training implementation for BERT based on the Fleet API, together with a mixed precision training strategy based on PaddlePaddle 2.0, so that large-scale model pre-training can fully utilize GPU clusters.
- python >= 3.6
- paddlepaddle >= 2.1
For more information about PaddlePaddle installation, please refer to PaddlePaddle Installation.
pip install --upgrade paddlenlp -i https://pypi.org/simple
We provide 15 network architectures and 67 pretrained models. These include not only the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations.
from paddlenlp.transformers import *
ernie = ErnieModel.from_pretrained('ernie-1.0')
ernie_gram = ErnieGramModel.from_pretrained('ernie-gram')
bert = BertModel.from_pretrained('bert-wwm-chinese')
albert = AlbertModel.from_pretrained('albert-chinese-tiny')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
PaddleNLP also provides a unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling, and question answering.
import paddle
from paddlenlp.transformers import (
    ErnieForQuestionAnswering,
    ErnieForSequenceClassification,
    ErnieForTokenClassification,
    ErnieModel,
    ErnieTokenizer,
)

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
text = tokenizer('natural language understanding')

# Semantic Representation
model = ErnieModel.from_pretrained('ernie-1.0')
# The model returns the per-token output first, then the pooled output.
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
# Text Classification and Matching
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0')
# Sequence Labeling
model = ErnieForTokenClassification.from_pretrained('ernie-1.0')
# Question Answering
model = ErnieForQuestionAnswering.from_pretrained('ernie-1.0')
For more pretrained model usage, please refer to Transformer API.
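The snippet above wraps a single tokenized text in a list to form a batch of one. With several texts of different lengths, the token ids usually have to be padded to a common length before they can be turned into one tensor. The following is a minimal pure-Python sketch of that idea. The helper `pad_batch`, the token ids, and the padding id `0` are illustrative assumptions, not part of PaddleNLP, which ships its own batching utilities in `paddlenlp.data`.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad every sequence to the length of the longest one.

    Returns the padded sequences and the original lengths.
    `pad_batch` is a hypothetical helper written for this sketch.
    """
    if not sequences:
        raise ValueError("expected at least one sequence")
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_id] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Made-up token ids standing in for tokenizer(...)['input_ids'] of three texts.
batch_ids = [[1, 647, 986, 2], [1, 75, 2], [1, 1599, 1183, 405, 2]]
padded_ids, lengths = pad_batch(batch_ids, pad_id=0)

print(padded_ids)
print(lengths)
```

The rectangular `padded_ids` list could then be passed to `paddle.to_tensor` in the same way as the single-example list above.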
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
For more dataset API usage, please refer to Dataset API.
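Once a split is loaded, a quick sanity check is to look at its label balance. The sketch below uses made-up records in place of a real split and assumes each example is a dict with `text` and `label` keys; that schema and the helper `label_distribution` are assumptions for illustration, so check the actual fields of the dataset you load.

```python
from collections import Counter

# Made-up records standing in for a loaded split; the 'text'/'label'
# keys are an assumed schema, not taken from the real dataset.
sample_split = [
    {"text": "The room was clean and quiet.", "label": 1},
    {"text": "The battery died after a week.", "label": 0},
    {"text": "Great service, will come again.", "label": 1},
]


def label_distribution(examples):
    """Count how many examples carry each label (hypothetical helper)."""
    return Counter(example["label"] for example in examples)


print(label_distribution(sample_split))
```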
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
wordemb.cosine_sim("king", "queen")
# 0.77053076
wordemb.cosine_sim("apple", "rail")
# 0.29207364
For more TokenEmbedding usage, please refer to Embedding API.
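For readers unfamiliar with the metric, the `cosine_sim` calls above compare two word vectors by the cosine of the angle between them. Here is a minimal pure-Python sketch of the standard formula, assuming that is what the library computes. The function `cosine_similarity` and the 3-dimensional toy vectors are made up for illustration and are not real fastText values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors, invented for this example.
king = [0.9, 0.1, 0.4]
queen = [0.8, 0.3, 0.5]
rail = [-0.2, 0.9, -0.1]

# Vectors pointing in similar directions score higher.
print(cosine_similarity(king, queen) > cosine_similarity(king, rail))  # prints True
```

The score ranges from -1 to 1, and values closer to 1 indicate vectors pointing in more similar directions.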
For the full API reference, please see our documentation on readthedocs.
PaddleNLP provides rich application examples covering mainstream NLP tasks to help developers solve problems faster.
- Sentiment Analysis🌟
- General Dialogue System
- Machine Translation
- Simultaneous Translation
- Machine Reading Comprehension
- Text Knowledge Linking🌟
- Model Compression
- Text Graph Learning
- Time Series Prediction
Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio
- What's Seq2Vec? shows how to build an LSTM model with a simple API and solve a sentiment analysis task.
- Sentiment Analysis with ERNIE shows how to use the pretrained ERNIE model to solve a sentiment analysis problem.
- Waybill Information Extraction with BiGRU-CRF Model shows how to use Bi-GRU plus CRF for an information extraction task.
- Waybill Information Extraction with ERNIE shows how to use ERNIE, the Chinese pretrained model, to improve information extraction performance.
You are welcome to join the PaddleNLP SIG and contribute, e.g. datasets, models and toolkits.
To connect with other users and contributors, join our Slack channel.
Join our QQ Technical Group for technical exchange right now! ⬇️
For more information about our releases, please refer to ChangeLog.
PaddleNLP is provided under the Apache-2.0 License.