Stars
Efficient Triton Kernels for LLM Training
Introduction to Machine Learning Systems
PyTorch native quantization and sparsity for training and inference
Flash Attention in ~100 lines of CUDA (forward pass only)
Simple Transformer implementation in C and Python for educational purposes
A curated list for Efficient Large Language Models
DSPy: The framework for programming—not prompting—language models
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Learn CUDA Programming, published by Packt
An awesome repository of local AI tools
Source for https://fullstackdeeplearning.com
Fine-tune SantaCoder for Code/Text Generation.
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
📋 A list of open LLMs available for commercial use.
Deadlock detection for pthread_mutex [search keywords: lockdep pthread]
Integrate cutting-edge LLM technology quickly and easily into your apps