News

[2021-06-04] The ERNIE-Gram pretrained model has been released! Install v2.0.2 to try it.
[2021-05-20] PaddleNLP 2.0 has been officially released! 🎉 For more information, please refer to the Release Note.

Introduction

PaddleNLP 2.0 aims to accelerate NLP applications through a powerful model zoo, easy-to-use APIs, and high-performance distributed training. We also provide NLP best practices based on the PaddlePaddle 2.0 API system.

Features

Easy-to-Use and End-to-End API
- The API is fully integrated with PaddlePaddle 2.0 high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, transformer model loading, training and deployment, which enables you to deal with text problems more productively.
Rich Application Examples
- Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Text Classification, Text Generation, Text Matching, Text Graph, Information Extraction, Machine Translation, General Dialogue, and Question Answering.
High Performance Distributed Training
- We provide a highly optimized distributed training implementation for BERT based on the Fleet API, together with a mixed precision training strategy built on PaddlePaddle 2.0. Combined, they can fully utilize GPU clusters for large-scale model pre-training.

Installation

Prerequisites

python >= 3.6
paddlepaddle >= 2.1

For more information about PaddlePaddle installation, please refer to PaddlePaddle Install.
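
The version requirements above can be checked before installing. The following is a minimal sketch that uses only the standard library. The helper names `parse_version`, `meets_minimum`, and `installed_version` are hypothetical and not part of PaddleNLP, and the simple parser only understands plain dotted numeric versions such as `2.1.0`.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted versions after zero-padding them to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("python >= 3.6:", sys.version_info >= (3, 6))
    found = installed_version("paddlepaddle")
    ok = found is not None and meets_minimum(found, "2.1")
    print("paddlepaddle >= 2.1:", ok)
```

The check looks up the distribution named `paddlepaddle`. A build installed under a different distribution name would need that name passed to `installed_version` instead.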

PIP Installation

pip install --upgrade paddlenlp -i https://pypi.org/simple

Quick Start

Quick Dataset Loading

from paddlenlp.datasets import load_dataset

train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])

For more dataset API usage please refer to Dataset API.
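
To illustrate how a loaded split might be inspected, the sketch below counts how often each label occurs. It assumes every example is a dict with `text` and `label` keys, which is an assumption about the `chnsenticorp` format and is not stated in this document. The `toy_split` list is a hypothetical stand-in so the snippet runs without downloading any data.

```python
from collections import Counter


def label_distribution(split, label_key="label"):
    """Count how often each label occurs in an iterable of example dicts."""
    return Counter(example[label_key] for example in split)


# Hypothetical stand-in for a dataset split (assumed dict-per-example layout).
toy_split = [
    {"text": "这家酒店很干净", "label": 1},      # "this hotel is very clean"
    {"text": "服务态度太差了", "label": 0},      # "the service was terrible"
    {"text": "房间宽敞，位置方便", "label": 1},  # "spacious room, handy location"
]

counts = label_distribution(toy_split)
print(counts)
```

With PaddleNLP installed, the same function should work on `train_ds`, provided its examples really carry a `label` field.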

Pre-trained Text Embedding Loading

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
print(wordemb.cosine_sim("king", "queen"))  # 0.77053076
print(wordemb.cosine_sim("apple", "rail"))  # 0.29207364

For more TokenEmbedding usage, please refer to Embedding API
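
`cosine_sim` reports a similarity score between two words. Assuming it is the standard cosine similarity between the two word vectors (the name suggests this, but the implementation is not shown here), the computation can be sketched in plain Python. The three-dimensional vectors are made-up toy values, not real fastText embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors chosen for illustration only.
king = [0.8, 0.6, 0.0]
queen = [0.6, 0.8, 0.0]
rail = [0.0, 0.0, 1.0]

print(cosine_similarity(king, queen))  # vectors point in similar directions
print(cosine_similarity(king, rail))   # orthogonal vectors
```

Scores near 1 mean the vectors point in nearly the same direction, while a score of 0 means they are orthogonal.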

Rich Chinese Pre-trained Models

from paddlenlp.transformers import (
    ErnieModel, BertModel, AlbertModel,
    RobertaModel, ElectraModel, GPTForPretraining,
)

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
albert = AlbertModel.from_pretrained('albert-chinese-tiny')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')

For the full selection of pretrained models, please refer to Transformer API.

Extract Feature Through Pre-trained Model

import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')

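# Tokenize the input ('自然语言处理' means "natural language processing").
text = tokenizer('自然语言处理')
# ErnieModel returns the per-token features first, then the pooled sentence feature.
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))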
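
The call above wraps a single tokenized text in a list to form a batch of one. To batch several texts of different lengths, the id sequences are typically padded to a common length first. The sketch below shows one way to do that in plain Python. The helper `pad_batch` is hypothetical, the token ids are made up, and the padding id of 0 is an assumption that should be checked against the tokenizer in use.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad each list of token ids to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Made-up token ids, not real ERNIE vocabulary entries.
batch = pad_batch([[1, 52, 73, 2], [1, 88, 2]])
print(batch)
```

The resulting rectangular nested list has the same shape as the `[text['input_ids']]` argument above, only with more than one row.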

More API Usage

Transformer API
Data API
Dataset API
Embedding API
Metrics API

For the full API reference, please see our documentation on Read the Docs.

Rich Text Application Examples

PaddleNLP provides rich application examples covering mainstream NLP tasks to help developers solve problems faster.

NLP Basic Technique

Word Embedding
Lexical Analysis
Language Model
Semantic Parsing (Text to SQL)⭐

NLP Core Technique

Text Classification
Text Matching
Text Generation
Semantic Indexing
Information Extraction

NLP Application in Real System

Sentiment Analysis🌟
General Dialogue System
Machine Translation
Simultaneous Translation
Machine Reading Comprehension

Extension Application

Text Knowledge Linking🌟
Machine Reading Comprehension
Model Compression
Text Graph Learning
Time Series Prediction

Tutorials

Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio

What's Seq2Vec? shows how to use a simple API to build an LSTM model and solve a sentiment analysis task.
Sentiment Analysis with ERNIE shows how to use the pretrained ERNIE model to solve a sentiment analysis problem.
Waybill Information Extraction with BiGRU-CRF Model shows how to use a Bi-GRU plus CRF model for an information extraction task.
Waybill Information Extraction with ERNIE shows how to use ERNIE, a Chinese pre-trained model, to improve information extraction performance.
Use TCN Model to predict COVID-19 confirmed cases

Community

Special Interest Group (SIG)

You are welcome to join the PaddleNLP SIG and contribute, e.g. datasets, models, and toolkits.

Slack

To connect with other users and contributors, join our Slack channel.

QQ

Join our QQ Technical Group for technical exchange right now! ⬇️

ChangeLog

For more information about our releases, please refer to the ChangeLog.

License

PaddleNLP is provided under the Apache-2.0 License.
