
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Python 89 3 Updated Dec 23, 2024
Python 5 Updated Oct 22, 2024

Compiler for Dynamic Neural Networks

Python 44 2 Updated Nov 13, 2023

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense f…

Python 196 9 Updated Dec 12, 2024
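The mechanism described above can be sketched in plain Python: a set of trainable key/value slots where each query activates only its top-k keys, so extra parameters are added while per-token compute stays small. All names and sizes here are illustrative assumptions, not taken from the linked codebase.

```python
# Toy sketch of a sparsely activated memory layer: a trainable
# key-value lookup. Keys/values stand in for learned parameters;
# dimensions and top-k are illustrative.
import math
import random

random.seed(0)

DIM, NUM_SLOTS, TOP_K = 4, 8, 2

# In a real model these tables are learned; here they are random.
keys = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]
values = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def memory_lookup(query):
    # Score every key, but keep only the top-k slots (sparse activation).
    scores = [dot(query, k) for k in keys]
    top = sorted(range(NUM_SLOTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected scores only.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is a weighted sum of the selected value slots.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for d in range(DIM):
            out[d] += w * values[i][d]
    return out

print(memory_lookup([0.5, -0.2, 0.1, 0.9]))
```

Because only TOP_K of the NUM_SLOTS slots are touched per query, the memory table can grow large without a proportional increase in FLOPs.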

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.

Cuda 788 43 Updated Dec 28, 2024

Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"

Python 12 5 Updated Apr 13, 2024
Python 78 11 Updated Oct 9, 2024
Jupyter Notebook 74 6 Updated Nov 11, 2024

Run Mixtral-8x7B models in Colab or consumer desktops

Python 2,295 226 Updated Apr 8, 2024

NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…

Python 66 4 Updated Jan 3, 2025

A library to analyze PyTorch traces.

Python 317 45 Updated Dec 3, 2024

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,113 41 Updated Jan 3, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 393 29 Updated Dec 30, 2024

A scalable and robust tree-based speculative decoding algorithm

Python 325 37 Updated Aug 13, 2024

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 954 79 Updated Dec 18, 2024

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"

Python 788 38 Updated Sep 6, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

1,062 87 Updated Jan 3, 2025

A curated list for Efficient Large Language Models

Python 1,369 102 Updated Dec 30, 2024

[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)

Python 20 11 Updated Nov 18, 2024
Jupyter Notebook 134 7 Updated Mar 12, 2024

nnScaler: Compiling DNN models for Parallel Training

Python 86 13 Updated Dec 10, 2024

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Python 323 54 Updated Jan 3, 2025

Microsoft Collective Communication Library

58 6 Updated Nov 23, 2024

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Python 22,579 9,563 Updated Nov 8, 2024

Making Long-Context LLM Inference 10x Faster and 10x Cheaper

Python 338 36 Updated Jan 5, 2025

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 284,424 47,493 Updated Dec 2, 2024

A hierarchical collective communications library with portable optimizations

C++ 25 5 Updated Dec 8, 2024