- Cornell University
- Ithaca, NY
- https://byungsoo-oh.github.io/
- https://orcid.org/0000-0003-4949-1472
- https://www.linkedin.com/in/byungsoo-oh-800351140
Stars
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers (see the minimal sketch after this list).
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models.
Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"
Run Mixtral-8x7B models in Colab or on consumer desktops
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing the downtime due to failures.
A library to analyze PyTorch traces.
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference
A scalable and robust tree-based speculative decoding algorithm
Codebase for Aria - an Open Multimodal Native MoE
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
[TMLR 2024] Efficient Large Language Models: A Survey
A curated list for Efficient Large Language Models
[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)
nnScaler: Compiling DNN models for Parallel Training
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
Making Long-Context LLM Inference 10x Faster and 10x Cheaper
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
A hierarchical collective communications library with portable optimizations
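The memory-layer entry above is the one item in this list that spells out a mechanism, so here is a minimal sketch of a sparsely activated key-value memory layer under simple assumptions: a plain top-k softmax read over a trainable key/value table. All names and sizes here are illustrative, not the repo's actual API.

```python
# Minimal sketch (illustrative, not the repo's API) of a key-value memory
# layer: a query selects the top-k keys, and only those k value rows are
# read, so parameters grow with num_keys while per-token FLOPs stay small.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueMemoryLayer(nn.Module):
    def __init__(self, d_model: int, num_keys: int, topk: int = 4):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        # Trainable key and value tables; num_keys controls added capacity.
        self.keys = nn.Parameter(torch.randn(num_keys, d_model) / d_model ** 0.5)
        self.values = nn.Embedding(num_keys, d_model)
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                               # (B, S, D)
        scores = q @ self.keys.t()                           # (B, S, num_keys)
        top_scores, top_idx = scores.topk(self.topk, dim=-1) # (B, S, k)
        weights = F.softmax(top_scores, dim=-1)              # weights over k keys
        picked = self.values(top_idx)                        # (B, S, k, D)
        return (weights.unsqueeze(-1) * picked).sum(dim=-2)  # (B, S, D)

# Usage: a drop-in residual read alongside (or in place of) a dense FFN.
layer = KeyValueMemoryLayer(d_model=64, num_keys=4096, topk=4)
out = layer(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```

The key design point this illustrates is that only `topk` of the `num_keys` value rows are gathered per token, which is why capacity can scale without a matching increase in compute.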