Skip to content

Commit

Permalink
first commit without legacy files
Browse files Browse the repository at this point in the history
  • Loading branch information
ZeyuChen committed Feb 5, 2021
1 parent cc1d27e commit 9192cd8
Show file tree
Hide file tree
Showing 302 changed files with 78,404 additions and 2 deletions.
123 changes: 123 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pycharm
.DS_Store
.idea/
FETCH_HEAD

# vscode
.vscode
35 changes: 35 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
- repo: https://github.com/PaddlePaddle/mirrors-yapf.git
sha: 0d79c0c469bab64f7229c9aca2b1186ef47f0e37
hooks:
- id: yapf
files: \.py$
- repo: https://github.com/pre-commit/pre-commit-hooks
sha: a11d9314b22d8f8c7556443875b731ef05965464
hooks:
- id: check-merge-conflict
- id: check-symlinks
- id: detect-private-key
files: (?!.*paddle)^.*$
- id: end-of-file-fixer
files: \.md$
- id: trailing-whitespace
files: \.md$
- repo: https://github.com/Lucas-C/pre-commit-hooks
sha: v1.0.1
hooks:
- id: forbid-crlf
files: \.md$
- id: remove-crlf
files: \.md$
- id: forbid-tabs
files: \.md$
- id: remove-tabs
files: \.md$
- repo: local
hooks:
- id: clang-format
name: clang-format
description: Format files with ClangFormat
entry: bash .clang_format.hook -i
language: system
files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|cuh|proto)$
3 changes: 3 additions & 0 deletions .style.yapf
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[style]
based_on_style = pep8
column_limit = 80
4 changes: 3 additions & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
Expand Down Expand Up @@ -186,7 +188,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
Expand Down
142 changes: 141 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1 +1,141 @@
# PaddleNLP
简体中文 | [English](./README_en.md)

<p align="center">
<img src="./docs/imgs/paddlenlp.png" width="520" height ="100" />
</p>

------------------------------------------------------------------------------------------

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## 简介

PaddleNLP 2.0拥有丰富的模型库、简洁易用的API与高性能的分布式训练的能力,旨在为飞桨开发者提升文本建模效率,并提供基于PaddlePaddle 2.0的NLP领域最佳实践。

## 特性

- **丰富的模型库**
- 涵盖了NLP主流应用相关的前沿模型,包括中文词向量、预训练模型、词法分析、文本分类、文本匹配、文本生成、机器翻译、通用对话、问答系统等,更多详细介绍请查看[PaddleNLP模型库](./docs/model_zoo.md)

- **简洁易用的API**
- 深度兼容飞桨2.0的高层API体系,提供可复用的文本建模模块,可大幅度减少数据处理、组网、训练环节的代码开发量,提升文本建模开发效率。

- **高性能分布式训练**
- 通过深度优化的混合精度训练策略与Fleet分布式训练API,可充分利用GPU集群资源,高效完成大规模预训练模型的分布式训练。


## 安装

### 环境依赖

- python >= 3.6
- paddlepaddle >= 2.0.0

```
pip install paddlenlp==2.0.0rc
```

## 快速开始

### 数据集快速加载

```python
from paddlenlp.datasets import ChnSentiCorp

train_ds, dev_ds, test_ds = ChnSentiCorp.get_datasets(['train', 'dev', 'test'])
```

可参考[Dataset文档](./docs/datasets.md)查看更多数据集。

### 一键加载中文词向量

```python
from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
print(wordemb.cosine_sim("国王", "王后"))
>>> 0.63395125
wordemb.cosine_sim("艺术", "火车")
>>> 0.14792643
```

内置50+中文词向量,更多使用方法请参考 [Embedding文档](./examples/word_embedding/README.md)


### 一键加载高质量中文预训练模型

```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPT2ForPretraining

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt2 = GPT2ForPretraining.from_pretrained('gpt2-base-cn')
```

请参考 [Pretrained-Models](./docs/transformers.md)查看目前支持的预训练模型。

## 模型库及其应用

PaddleNLP模型库整体介绍请参考文档[PaddleNLP Model Zoo](./docs/model_zoo.md).
模型应用场景介绍请参考[PaddleNLP Examples](./examples/README.md)

- [词向量](./examples/word_embedding/README.md)
- [词法分析](./examples/lexical_analysis/README.md)
- [语言模型](./examples/language_model)
- [文本分类](./examples/text_classification/README.md)
- [文本生成](./examples/text_generation/README.md)
- [语义匹配](./examples/text_matching/README.md)
- [命名实体识别](./examples/named_entity_recognition/README.md)
- [文本图学习](./examples/text_graph/erniesage/README.md)
- [通用对话](./examples/dialogue)
- [机器翻译](./examples/machine_translation)
- [阅读理解](./examples/machine_reading_comprehension)

## 进阶应用

- [模型压缩](./examples/model_compression/)

## API 使用文档

- [Transformer API](./docs/transformers.md)
* 基于Transformer结构相关的预训练模型API,包含ERNIE, BERT, RoBERTa, Electra等主流经典结构和下游任务。
- [Data API](./docs/data.md)
* 文本数据处理Pipeline的相关API说明。
- [Dataset API](./docs/datasets.md)
* 数据集相关API,包含自定义数据集,数据集贡献与数据集快速加载等功能说明。
- [Embedding API](./docs/embeddings.md)
* 词向量相关API,支持一键快速加载包预训练的中文词向量,VisulDL高维可视化等功能说明。
- [Metrics API](./docs/metrics.md)
* 针对NLP场景的评估指标说明,与飞桨2.0框架高层API兼容。


## 交互式Notebook教程

- [使用Seq2Vec模块进行句子情感分类](https://aistudio.baidu.com/aistudio/projectdetail/1283423)
- [如何通过预训练模型Fine-tune下游任务](https://aistudio.baidu.com/aistudio/projectdetail/1294333)
- [使用BiGRU-CRF模型完成快递单信息抽取](https://aistudio.baidu.com/aistudio/projectdetail/1317771)
- [使用预训练模型ERNIE优化快递单信息抽取](https://aistudio.baidu.com/aistudio/projectdetail/1329361)
- [使用Seq2Seq模型完成自动对联](https://aistudio.baidu.com/aistudio/projectdetail/1321118)
- [使用预训练模型ERNIE-GEN实现智能写诗](https://aistudio.baidu.com/aistudio/projectdetail/1339888)
- [使用TCN网络完成新冠疫情病例数预测](https://aistudio.baidu.com/aistudio/projectdetail/1290873)

更多教程参见[PaddleNLP on AI Studio](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)


## 社区贡献与技术交流

- 欢迎您加入PaddleNLP的SIG社区,贡献优秀的模型实现、公开数据集、教程与案例、外围小工具。
- 现在就加入PaddleNLP的QQ技术交流群,一起交流NLP技术吧!⬇️

<div align="center">
<img src="./docs/imgs/qq.png" width="200" height="200" />
</div>


## License

PaddleNLP遵循[Apache-2.0开源协议](./LICENSE)
Loading

0 comments on commit 9192cd8

Please sign in to comment.