Stars
A collection of books covering C & C++, Git, Java, Keras, Linux, NLP, Python, Scala, TensorFlow, big data, recommender systems, databases, data mining, machine learning, deep learning, algorithms, and more.
A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.
A Comprehensive Toolkit for High-Quality PDF Content Extraction
🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
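LLM interview notes and answers: this repository mainly records interview questions and reference answers for large language model (LLM) algorithm engineers.
LLM chatbot with Retrieval-Augmented Generation using LlamaIndex. It demonstrates how to implement chunking, indexing, and source citation.
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.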
Example models using DeepSpeed
Recipes to train reward model for RLHF.
Apriori and FP-growth implementations in Python
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems.
Multimodal Question Answering in the Medical Domain: A Summary of Existing Datasets and Systems
Replication of the KDD'23 paper "Text Is All You Need: Learning Language Representations for Sequential Recommendation".
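This project shares the technical principles behind large models along with hands-on experience (large-model engineering and putting large-model applications into production).
📄 A collection of résumé templates suited to Chinese (LaTeX, HTML/JS, and so on), maintained by @hoochanlon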
A high-throughput and memory-efficient inference and serving engine for LLMs
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Emu Series: Generative Multimodal Models from BAAI
✨✨Latest Advances on Multimodal Large Language Models
A Collection of BM25 Algorithms in Python
Python PDF parser for scientific publications: content and figures