English | 简体中文
PaddleNLP aims to accelerate NLP applications through powerful model zoo, easy-to-use API and high performance distributed training. It's also the NLP best practice for PaddlePaddle 2.0 API system.
-
Powerful Model Zoo for Rich Senario
- Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Text Classification, Text Generation, Text Matching, Text Graph, Information Extraction, Machine Translation, General Dialogue and Question Answering etc.
-
Easy-to-Use and End-to-End API
- The API is fully integrated with PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, training and evaluation, which enables you to deal with text problems more productively.
-
High Performance and Distributed Training
- We provide a highly optimized ditributed training implementation for BERT with Fleet API, and mixed precision training strategy based on PaddlePaddle 2.0, it can fully utilize GPU clusters for large-scale model pre-training.
- python >= 3.6
- paddlepaddle >= 2.0.1
More information about PaddlePaddle installation please refer to PaddlePaddle Install
pip install --upgrade paddlenlp -i https://pypi.org/simple
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
For more dataset API usage please refer to Dataset API.
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
wordemb.cosine_sim("king", "queen")
>>> 0.77053076
wordemb.cosine_sim("apple", "rail")
>>> 0.29207364
For more TokenEmbedding usage, please refer to Embedding API
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining
ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')
For more pretrained model selection, please refer to Transformer API
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')
text = tokenizer('自然语言处理')
pooled_output, sequence_output = model.forward(input_ids=paddle.to_tensor([text['input_ids']]))
For model zoo introduction please refer toPaddleNLP Model Zoo. As for applicaiton senario please refer to PaddleNLP Examples。
- Word Embedding
- Lexical Analysis
- Named Entity Recognition
- Language Model
- Text Classification
- Text Gneeration
- Semantic Maching
- Text Graph
- Information Extraction
- General Dialogue
- Machine Translation
- Machine Readeng Comprehension
Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio
-
What's Seq2Vec? shows how to use simple API to finish LSTM model and solve sentiment analysis task.
-
Sentiment Analysis with ERNIE shows how to exploit the pretrained ERNIE to solve sentiment analysis problem.
-
Waybill Information Extraction with BiGRU-CRF Model shows how to make use of Bi-GRU plus CRF to finish information extraction task.
-
Waybill Information Extraction with ERNIE shows how to use ERNIE, the Chinese pre-trained model improve information extraction performance.
Welcome to join PaddleNLP SIG for contribution, eg. Dataset, Models and Toolkit.
To connect with other users and contributors, welcome to join our Slack channel.
Join our QQ Technical Group for technical exchange right now! ⬇️
PaddleNLP is provided under the Apache-2.0 License.