Stars
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
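Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese personal name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment scores, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …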
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch Tutorial for Deep Learning Researchers
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search…
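100+ Chinese Word Vectors: over a hundred sets of pre-trained Chinese word vectors
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
A summary and overview of the knowledge that natural language processing (NLP) engineers need to build up, including interview questions, fundamentals, engineering skills, and more, to strengthen core competitiveness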
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Example models using DeepSpeed
Facilitating the design, comparison and sharing of deep text matching models.
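MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing continued pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.
A Chinese-language tutorial on data structures and algorithms in Python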
Image to prompt with BLIP and CLIP
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
Emu Series: Generative Multimodal Models from BAAI
Efficient Retrieval Augmentation and Generation Framework
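A PyTorch implementation of BERT for seq2seq tasks using the UniLM scheme; it now also handles automatic summarization, text classification, sentiment analysis, NER, part-of-speech tagging, and other tasks, and supports T5 models as well as GPT-2 for article continuation.
An elegant PyTorch implementation of transformers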
Recipes to train reward model for RLHF.
Implement Statistical Learning Methods (Li Hang) the hard way: a hardcore Python implementation of Li Hang's book "Statistical Learning Methods"
A Collection of BM25 Algorithms in Python