Stars
DeepEP: an efficient expert-parallel communication library
Hackable and optimized Transformers building blocks, supporting a composable construction.
Custom kernels in the Triton language for accelerating LLMs
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Optimized primitives for collective multi-GPU communication
bertmaher / llama2.so
Forked from karpathy/llama2.c
Inference Llama 2 with a model compiled to native code by TorchInductor
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Development repository for the Triton language and compiler
Distribute and run AI workloads magically in Python, like PyTorch for ML infra.
A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training and tools…
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind.
Use Neovim as a language server to inject LSP diagnostics, code actions, and more via Lua.
PyTorch domain library for recommendation systems
Pretrain, finetune ANY AI model of ANY size on multiple GPUs and TPUs with zero code changes.
❤️ Slim, Fast and Hackable Completion Framework for Neovim
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
A toolkit for developing and comparing reinforcement learning algorithms.
🌠 Dark powered asynchronous completion framework for neovim/Vim8
This repository is outdated and the new Boost Note app is available! We've launched a new Boost Note app that supports real-time collaborative writing. https://github.com/BoostIO/BoostNote-App
C/C++ language server supporting multi-million line code base, powered by libclang. Emacs, Vim, VSCode, and others with language server protocol support. Cross references, completion, diagnostics, …