Stars
Curated tutorials and resources for Large Language Models, AI Painting, and more.
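MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train chatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.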
This repo includes ChatGPT prompt curation to use ChatGPT better.
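A curated collection of open-source Chinese large language models, focusing on models that are smaller in scale, privately deployable, and cheaper to train, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.
This project covers the knowledge points and code implementations commonly tested in Machine Learning, Deep Learning, and NLP interviews, which are also the essential theoretical foundations for an algorithm engineer.
Chinese and English sensitive words, language detection, domestic and international mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of run-together English text, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …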
A guideline for building practical production-level deep learning systems to be deployed in real world applications.
GPT2 for Multiple Languages, including pretrained models, with a 1.5-billion-parameter Chinese pretrained model.
100+ Chinese Word Vectors: over a hundred kinds of pretrained Chinese word vectors.
AdvancedEAST is an algorithm for scene image text detection. It is primarily based on EAST, with significant improvements that make long-text predictions more accurate. https…
A TensorFlow implementation of EAST text detector
Text detection mainly based on the CTPN (connectionist text proposal network) model in TensorFlow, including ID card detection.
Code implementations of classical models for few-shot/one-shot learning, including siamese network, prototypical network, relation network, induction network
Chinese version of GPT2 training code, using BERT tokenizer.
XLNet: Generalized Autoregressive Pretraining for Language Understanding
A curated list of resources for Chinese NLP
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Natural Language Processing Tasks and References
Gathers machine learning and TensorFlow deep learning models for NLP problems, 1.13 < TensorFlow < 2.0
A TensorFlow Implementation of the Transformer: Attention Is All You Need
Collection of generative models, e.g. GAN, VAE in PyTorch and TensorFlow.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
TensorFlow code and pre-trained models for BERT
A system for quickly generating training data with weak supervision
all kinds of text classification models and more with deep learning
Large Scale Chinese Corpus for NLP
Collection of NSFW image URLs for the purposes of training an NSFW Image Classifier