Starred repositories
A comprehensive library for implementing LLMs, including a unified training pipeline and comprehensive model evaluation.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Build ChatGPT over your data, all with natural language
use python parse OFD file: ofd2img ofd2pdf pdf2ofd img2ofd ;(纯 python的ofd解析)
a Python toolbox loads 172 public time series datasets for machine/deep learning with a single line of code. Datasets from multiple domains including healthcare, financial, power, traffic, weather,…
Awesome Deep Learning for Time-Series Imputation, including a must-read paper list about applying neural networks to impute incomplete time series containing NaN missing values/data
PyGrinder: a Python toolkit for grinding data beans into the incomplete for real-world data simulation by introducing missing values with different missingness patterns, including MCAR (complete at…
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputatio…
整理开源的中文大语言模型,以规模较小、可私有化部署、训练成本较低的模型为主,包括底座模型,垂直领域微调及应用,数据集与教程等。
该仓库用于记录作者本人参加的各大数据科学竞赛的获奖方案源码以及一些新比赛的原创baseline. 主要涵盖:kaggle, 阿里天池,华为云大赛校园赛,百度aistudio,和鲸社区,datafountain等
用于从头预训练+SFT一个小参数量的中文LLaMa2的仓库;24G单卡即可运行得到一个具备简单中文问答能力的chat-llama2.
全自动大模型llm训练,无需微调知识,门槛极低。极其适合零基础的人使用(目前暂时只支持glm3,未来会增加更多模型)
本项目旨在分享大模型相关技术原理以及实战经验(大模型工程化、大模型应用落地)
⛽️「算法通关手册」:超详细的「算法与数据结构」基础讲解教程,从零基础开始学习算法知识,850+ 道「LeetCode 题目」详细解析,200 道「大厂面试热门题目」。