Stars
A flexible and efficient training framework for large-scale alignment tasks
Run your deep learning workloads on Kubernetes more easily and efficiently.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Retrieval and Retrieval-augmented LLMs
Ongoing research training transformer models at scale
Tensors and Dynamic neural networks in Python with strong GPU acceleration
An Open Source Machine Learning Framework for Everyone
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM
Must-read papers and blogs on Speculative Decoding
A curated list of awesome LLM/VLM inference papers with code, such as FlashAttention, PagedAttention, and parallelism
Survey: a curated collection of papers and resources on large language model (LLM) related recommender systems
An easy-to-use framework for large scale recommendation algorithms.
Fast and memory-efficient exact attention
FlashInfer: Kernel Library for LLM Serving