Stars
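A dataset of 3,000,000+ examples for semantic understanding and matching. Can be used for unsupervised contrastive learning, semi-supervised learning, and similar methods to build the best-performing pretrained models for the Chinese domain.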
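Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for plugging into langchain to load a local knowledge base for retrieval-augmented generation (RAG).
A small 0.2B Chinese chat model (ChatLM-Chinese-0.2B). Open-sources all code for the full pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, and RLHF optimization. Supports SFT fine-tuning on downstream tasks, with a fine-tuning example for triple information extraction.
A collection of industry practice articles on search, recommendation, advertising, and user growth (sources: Zhihu, Datafuntalk, and technical WeChat official accounts).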
Fantastic Data Engineering for Large Language Models
DALL·E Mini - Generate images from a text prompt
Chinese named entity recognition with a PyTorch-based GlobalPointer.
Open Academic Research on Improving LLaMA to SOTA LLM
Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models
[EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627
🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
A programming framework for agentic AI 🤖
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Resource, Evaluation and Detection Papers for ChatGPT
Question and Answer based on Anything.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
Chat凉宫春日 (Chat Haruhi Suzumiya): an open-source role-playing chatbot by Cheng Li, Ziang Leng, and others.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
The official code for the EMNLP 2022 paper "SCROLLS: Standardized CompaRison Over Long Language Sequences".
A quick guide (especially) for trending instruction finetuning datasets