Stars
(MSc Thesis) Investigating the application of gradient compression techniques used to speed up distributed deep learning
SIDCo: An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
Hi-Speed DNN Training with Espresso: Unleashing the Full Potential of Gradient Compression with Near-Optimal Usage Strategies (EuroSys '23)
GRACE - GRAdient ComprEssion for distributed deep learning
Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
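For context, a minimal sketch of the kind of rank-r low-rank compression step used by PowerSGD-style methods (the paper linked above): the gradient matrix is projected onto a thin right factor, the left factor is orthonormalised with one power-iteration step, and only the two thin factors are communicated. The function names, shapes, and single-iteration scheme are illustrative assumptions, not the repository's API.

```python
import torch

def low_rank_compress(grad: torch.Tensor, q: torch.Tensor):
    """Compress a 2-D gradient into two thin factors P and Q (one power-iteration step)."""
    p = grad @ q                       # (n x r): project onto the current right factor
    p, _ = torch.linalg.qr(p)          # orthonormalise the left factor
    q_new = grad.t() @ p               # (m x r): refit the right factor
    return p, q_new                    # communicate P and Q instead of the full gradient

def low_rank_decompress(p: torch.Tensor, q_new: torch.Tensor) -> torch.Tensor:
    """Reconstruct a rank-r approximation of the original gradient."""
    return p @ q_new.t()

# Usage: a 1024 x 512 gradient compressed to rank 4 (the two factors hold ~85x fewer values)
grad = torch.randn(1024, 512)
q = torch.randn(512, 4)                # warm-started right factor, kept between steps
p, q_new = low_rank_compress(grad, q)
approx = low_rank_decompress(p, q_new)
print(p.shape, q_new.shape, approx.shape)
```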
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
neuralmagic / nm-vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
Outline Server, developed by Jigsaw. The Outline Server is a proxy server that runs a Shadowsocks instance and provides a REST API for access key management.
[CVPR 2023] DepGraph: Towards Any Structural Pruning
AIFM: High-Performance, Application-Integrated Far Memory
Next-generation datacenter OS built on kernel bypass to speed up unmodified code while improving platform density and security
A web framework for building multi-user virtual reality experiences.
[ICLR 2018] Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
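For context, a minimal sketch of top-k gradient sparsification with local gradient accumulation, the core mechanism behind Deep Gradient Compression: only the largest-magnitude entries are communicated, and the unsent remainder is accumulated locally for later steps. The class name, compression ratio, and the omission of momentum correction and masking are simplifying assumptions rather than the paper's full method.

```python
import torch

class TopKCompressor:
    """Keep only the largest-magnitude gradient entries; accumulate the rest locally."""

    def __init__(self, ratio: float = 0.001):
        self.ratio = ratio            # fraction of entries communicated each step
        self.residual = None          # unsent gradient accumulated across steps

    def compress(self, grad: torch.Tensor):
        flat = grad.flatten()
        if self.residual is None:
            self.residual = torch.zeros_like(flat)
        acc = self.residual + flat                    # add back what was not sent before
        k = max(1, int(self.ratio * acc.numel()))
        _, indices = torch.topk(acc.abs(), k)         # indices of the largest magnitudes
        values = acc[indices]                         # keep their signed values
        self.residual = acc.clone()
        self.residual[indices] = 0.0                  # sent entries leave the local buffer
        return values, indices, grad.shape

    @staticmethod
    def decompress(values, indices, shape):
        flat = torch.zeros(shape.numel(), device=values.device)
        flat[indices] = values
        return flat.view(shape)

# Usage: only ~1% of a layer's gradient is communicated each step
comp = TopKCompressor(ratio=0.01)
grad = torch.randn(64, 128)
values, indices, shape = comp.compress(grad)
restored = TopKCompressor.decompress(values, indices, shape)
print(values.numel(), "of", grad.numel(), "entries sent")
```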
Mellanox / dpdk-vhost-vfe
Forked from DPDK/dpdk. Data Plane Development Kit.
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Ongoing research training transformer models at scale
PyTorch extensions for high performance and large scale training.
A code repository for a PyTorch C++ (LibTorch) tutorial.