
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Python 89 3 Updated Dec 23, 2024
Python 5 Updated Oct 22, 2024

Compiler for Dynamic Neural Networks

Python 44 2 Updated Nov 13, 2023

Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense f…

Python 196 9 Updated Dec 12, 2024
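The mechanism described above can be sketched in plain Python: a set of trainable key/value slots where each query activates only its top-k keys, so extra parameters are added while per-token compute stays small. All names and sizes here are illustrative assumptions, not taken from the linked codebase.

```python
# Toy sketch of a sparsely activated memory layer: a trainable
# key-value lookup. Keys/values stand in for learned parameters;
# dimensions and top-k are illustrative.
import math
import random

random.seed(0)

DIM, NUM_SLOTS, TOP_K = 4, 8, 2

# In a real model these tables are learned; here they are random.
keys = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]
values = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_SLOTS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def memory_lookup(query):
    # Score every key, but keep only the top-k slots (sparse activation).
    scores = [dot(query, k) for k in keys]
    top = sorted(range(NUM_SLOTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected scores only.
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is a weighted sum of the selected value slots.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for d in range(DIM):
            out[d] += w * values[i][d]
    return out

print(memory_lookup([0.5, -0.2, 0.1, 0.9]))
```

Because only TOP_K of the NUM_SLOTS slots are touched per query, the memory table can grow large without a proportional increase in FLOPs.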

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.

Cuda 788 43 Updated Dec 28, 2024

Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"

Python 12 5 Updated Apr 13, 2024
Python 78 11 Updated Oct 9, 2024
Jupyter Notebook 74 6 Updated Nov 11, 2024

Run Mixtral-8x7B models in Colab or consumer desktops

Python 2,295 226 Updated Apr 8, 2024

NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…

Python 66 4 Updated Jan 3, 2025

A library to analyze PyTorch traces.

Python 317 45 Updated Dec 3, 2024

📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥

1,113 41 Updated Jan 3, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 393 29 Updated Dec 30, 2024

A scalable and robust tree-based speculative decoding algorithm

Python 325 37 Updated Aug 13, 2024

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 954 79 Updated Dec 18, 2024

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"

Python 788 38 Updated Sep 6, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

1,062 87 Updated Jan 3, 2025

A curated list for Efficient Large Language Models

Python 1,369 102 Updated Dec 30, 2024

[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)

Python 20 11 Updated Nov 18, 2024
Jupyter Notebook 134 7 Updated Mar 12, 2024

nnScaler: Compiling DNN models for Parallel Training

Python 86 13 Updated Dec 10, 2024

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

Python 323 54 Updated Jan 3, 2025

Microsoft Collective Communication Library

58 6 Updated Nov 23, 2024

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Python 22,579 9,563 Updated Nov 8, 2024

Making Long-Context LLM Inference 10x Faster and 10x Cheaper

Python 338 36 Updated Jan 5, 2025

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Python 284,424 47,493 Updated Dec 2, 2024

A hierarchical collective communications library with portable optimizations

C++ 25 5 Updated Dec 8, 2024