Stars
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
Fully open reproduction of DeepSeek-R1
Guides, papers, lectures, notebooks, and resources for prompt engineering
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Entropy Based Sampling and Parallel CoT Decoding
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
A comprehensive repository of reasoning tasks for LLMs (and beyond)
OCR, layout analysis, reading order, table recognition in 90+ languages
Convert PDF to markdown + JSON quickly with high accuracy
Monitor deep learning model training and hardware usage from your mobile phone
A multi-programming language benchmark for LLMs
DeepSeek LLM: Let there be answers
A quick guide (especially) for trending instruction finetuning datasets
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
CodeEdit App for macOS - Elevate your code editing experience. Open source, free forever.
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER). We have shared a pre-trained 9B parameter model.
High-Resolution Image Synthesis with Latent Diffusion Models
60+ Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Fast and memory-efficient exact attention