English | 简体中文
PaddleNLP aims to accelerate NLP applications with a powerful model zoo, easy-to-use APIs, and high-performance distributed training. It is also the NLP best practice for the PaddlePaddle 2.0 API system.
- **Rich and Powerful Model Zoo**
  - Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue, Question Answering, etc.
- **Easy-to-use and End-to-End API**
  - The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, training and evaluation, which enables you to deal with text problems more productively.
- **High Performance and Distributed Training**
  - We provide a highly optimized distributed training implementation for BERT with the Fleet API. Combined with the mixed precision training strategy of PaddlePaddle 2.0, it can fully utilize GPU clusters for large-scale model pre-training.
- python >= 3.6
- paddlepaddle >= 2.0.0
```shell
pip install --upgrade paddlenlp -i https://pypi.org/simple
```
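To confirm the installation succeeded, a quick check (a minimal sketch; it assumes the package exposes the usual `__version__` attribute):

```python
import paddlenlp

# Print the installed version; the examples below assume paddlenlp >= 2.0.
print(paddlenlp.__version__)
```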
```python
from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
```
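Each split behaves like an ordinary dataset of examples that you can index and iterate over. A quick sanity check (a minimal sketch; the exact field names of a chnsenticorp example are an assumption, so inspect one example yourself):

```python
from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])

# Check the split sizes and look at one raw example.
print(len(train_ds), len(dev_ds), len(test_ds))
print(train_ds[0])  # e.g. a dict with the review text and its sentiment label
```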
```python
from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
print(wordemb.cosine_sim("king", "queen"))
>>> 0.63395125
wordemb.cosine_sim("arts", "train")
>>> 0.14792643
```
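Besides similarity scores, you can also retrieve the underlying vectors directly; a minimal sketch (assuming `TokenEmbedding.search` returns the stored vector for each queried word):

```python
from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")

# Look up raw embedding vectors; `search` accepts a single word or a list of words.
vectors = wordemb.search(["king", "queen"])
print(vectors.shape)  # expected (2, 300) for this 300-dimensional embedding
```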
```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')
```
For more pretrained models, please refer to Pretrained-Models.
```python
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')

# Tokenize the text and convert the token ids into a paddle tensor.
text = tokenizer('natural language processing')
input_ids = paddle.to_tensor([text['input_ids']])
# The model returns the per-token hidden states and the pooled representation.
sequence_output, pooled_output = model(input_ids)
```
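The pooled output is what you would typically feed into a task-specific head. As a hedged sketch of the next step, PaddleNLP also ships task classes such as `ErnieForSequenceClassification` that add a classification head on top of the same encoder (the head below is randomly initialized, so its logits are only meaningful after fine-tuning):

```python
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieForSequenceClassification

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
# Loads the pretrained encoder and adds an (untrained) 2-class classification head.
model = ErnieForSequenceClassification.from_pretrained('ernie-1.0', num_classes=2)

text = tokenizer('natural language processing')
logits = model(paddle.to_tensor([text['input_ids']]))
print(logits.shape)  # [1, 2]: one logit per sentiment class
```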
For a model zoo introduction, please refer to PaddleNLP Model Zoo. For application scenarios, please refer to PaddleNLP Examples.
- Word Embedding
- Lexical Analysis
- Named Entity Recognition
- Language Model
- Text Classification
- Text Generation
- Semantic Matching
- Text Graph
- Information Extraction
- General Dialogue
- Machine Translation
- Machine Reading Comprehension
Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio
- What's Seq2Vec? shows how to use the simple API to build an LSTM model and solve a sentiment analysis task.
- Sentiment Analysis with ERNIE shows how to exploit the pretrained ERNIE model to solve a sentiment analysis problem.
- Waybill Information Extraction with BiGRU-CRF Model shows how to use a Bi-GRU plus CRF model to finish an information extraction task.
- Waybill Information Extraction with ERNIE shows how to use ERNIE, the Chinese pre-trained model, to improve information extraction performance.
You are welcome to join the PaddleNLP SIG to contribute, e.g., datasets, models, and toolkits.
To connect with other users and contributors, you are welcome to join our Slack channel.
Join our QQ Technical Group for technical exchange right now! ⬇️
PaddleNLP is provided under the Apache-2.0 License.