Meta
- Menlo Park, CA
- https://dongheuw.github.io
- @dongheuw
Stars
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Analyze computation-communication overlap in V3/R1.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
PyTorch extensions for high performance and large scale training.
Ongoing research training transformer models at scale
A PyTorch native library for large-scale model training
VOCAL-UDF: Self-Enhancing Video Data Management System for Compositional Events with Large Language Models
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
Fast and memory-efficient exact attention
Code for paper "AI for radiographic COVID-19 detection selects shortcuts over signal"
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
Class file for University of Washington thesis formatting with LaTeX.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Awesome-LLM: a curated list of Large Language Models
Snowflake dataset containing statistics for 70 million queries over a 14-day period
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
LLM papers I'm reading, mostly on inference and model compression
Graph Compression using Quasi-stable Coloring
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions
Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, B…