English | 简体中文



Introduction

PaddleNLP aims to accelerate NLP applications through a powerful model zoo, easy-to-use APIs, and high-performance distributed training. It is also the NLP best practice for the PaddlePaddle 2.0 API system.

Features

  • Rich and Powerful Model Zoo

    • Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue and Question Answering etc.
  • Easy-to-use and End-to-End API

    • The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases like data loading, text pre-processing, training and evaluation, which enables you to deal with text problems more productively.
  • High Performance and Distributed Training

    • We provide a highly optimized distributed training implementation for BERT with the Fleet API, and based on the mixed precision training strategy of PaddlePaddle 2.0, it can fully utilize GPU clusters for large-scale model pre-training (see the sketch below).
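As a minimal sketch of the mixed precision part only (illustrative; `model`, `optimizer` and `train_loader` are assumed to be defined elsewhere, e.g. a BERT model built with paddlenlp, and the model is assumed to return a scalar loss):

import paddle

# Loss scaler for mixed precision training (PaddlePaddle 2.0 AMP API)
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

for input_ids, labels in train_loader:
    with paddle.amp.auto_cast():          # run the forward pass in float16 where safe
        loss = model(input_ids, labels)   # assumed to return a scalar loss
    scaled = scaler.scale(loss)           # scale the loss to avoid float16 underflow
    scaled.backward()
    scaler.minimize(optimizer, scaled)    # unscale gradients and apply the update
    optimizer.clear_grad()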

Installation

Prerequisites

  • python >= 3.6
  • paddlepaddle >= 2.0.0
pip install "paddlenlp>=2.0.0rc"
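After installation, a quick sanity check (a minimal sketch, assuming paddlepaddle and paddlenlp were installed into the current environment):

import paddle
import paddlenlp

print(paddle.__version__)     # expected to be >= 2.0.0
print(paddlenlp.__version__)  # expected to be >= 2.0.0rc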

Quick Start

Quick Dataset Loading

from paddlenlp.datasets import ChnSentiCorp

train_ds, test_ds = ChnSentiCorp.get_datasets(['train','test'])
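The returned objects behave like ordinary indexable datasets, so they can be inspected directly (a minimal sketch; the exact field layout of each example may differ across paddlenlp versions):

print(len(train_ds), len(test_ds))  # number of examples in each split
print(train_ds[0])                  # first training example: text with its sentiment label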

Chinese Text Embedding Loading

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
wordemb.cosine_sim("国王", "王后")
>>> 0.63395125
wordemb.cosine_sim("艺术", "火车")
>>> 0.14792643
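Beyond similarity scores, the embedding object can also be queried for word vectors (a minimal sketch; the search method is assumed here to return the 300-dimensional vector of the given word as a numpy array):

vector = wordemb.search("国王")  # look up the embedding vector for a single word
print(vector.shape)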

Rich Chinese Pre-trained Models

from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')
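A minimal forward pass sketch with ERNIE (illustrative only; the tokenizer name and the two return values follow the paddlenlp.transformers API, and the input sentence is just an example):

import paddle
from paddlenlp.transformers import ErnieTokenizer

tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
inputs = tokenizer("欢迎使用PaddleNLP")              # token ids for one sentence
input_ids = paddle.to_tensor([inputs['input_ids']])  # add a batch dimension
sequence_output, pooled_output = ernie(input_ids)    # contextual token and sentence features
print(sequence_output.shape, pooled_output.shape)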

For more pretrained model choices, please refer to Pretrained-Models.

Model Zoo and Applications

Advanced Application

API Usage

Tutorials

Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio

Community

Join our QQ Technical Group for technical exchange right now!

License

PaddleNLP is provided under the Apache-2.0 License.