English | 简体中文
PaddleNLP aims to accelerate NLP applications through powerful model zoo, easy-to-use API and high performance distributed training. It's also the NLP best practice for PaddlePaddle 2.0 API system.
-
Rich and Powerful Model Zoo
- Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue and Question Answering etc.
-
Easy-to-use and End-to-End API
- The API is fully integrated with PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, training and evaluation. which enables you to deal with text problems more productively.
-
High Performance and Distributed Training
- We provide a highly optimized ditributed training implementation for BERT with Fleet API, bnd based the mixed precision training strategy based on PaddlePaddle 2.0, it can fully utilize GPU clusters for large-scale model pre-training.
- python >= 3.6
- paddlepaddle >= 2.0.0
pip install --upgrade paddlenlp -i https://pypi.org/simple
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
print(wordemb.cosine_sim("king", "queen"))
>>> 0.63395125
wordemb.cosine_sim("arts", "train")
>>> 0.14792643
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining
ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')
For more pretrained model selection, please refer to Pretrained-Models
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')
text = tokenizer('natural language processing')
pooled_output, sequence_output = model.forward(input_ids=paddle.to_tensor([text['input_ids']]))
For model zoo introduction please refer toPaddleNLP Model Zoo. As for applicaiton senario please refer to PaddleNLP Examples。
- Word Embedding
- Lexical Analysis
- Name Entity Recognition
- Language Model
- Text Classification
- Text Gneeration
- Semantic Maching
- Text Graph
- Information Extraction
- General Dialogue
- Machine Translation
- Machine Readeng Comprehension
Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio
-
What's Seq2Vec? shows how to use simple API to finish LSTM model and solve sentiment analysis task.
-
Sentiment Analysis with ERNIE shows how to exploit the pretrained ERNIE to solve sentiment analysis problem.
-
Waybill Information Extraction with BiGRU-CRF Model shows how to make use of Bi-GRU plus CRF to finish information extraction task.
-
Waybill Information Extraction with ERNIE shows how to use ERNIE, the Chinese pre-trained model improve information extraction performance.
Join our QQ Technical Group for technical exchange right now! ⬇️
PaddleNLP is provided under the Apache-2.0 License.