- Palo Alto
- https://ryanlyn.ch
- in/ryantlynch
Stars
Fully open data curation for reasoning models
A low-latency & high-throughput serving engine for LLMs
🟣 Concurrency interview questions and answers to help you prepare for your next software architecture and design patterns interview in 2024.
What would you do with 1000 H100s...
SGLang is a fast serving framework for large language models and vision language models.
gatech-sysml / sarathi-serve
Forked from microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
Teaching and Learning Software Verification via SVF
Implementation of a Transformer, but completely in Triton
Development repository for the Triton language and compiler
Tensor Compute Primitives: Mid-level Intermediate Representation for Machine Learning Programs
An attempt at proxying vscode remote shell backend through cluster login nodes.
LLM Serving Performance Evaluation Harness
HeteroCL-MLIR dialect for accelerator design
Allo: A Programming Model for Composable Accelerator Design
Use of a Linux initramfs to fully automate the bootstrapping process
Scheduling application designed to mitigate some of the pain-points present throughout Georgia Tech's registration process.
Fast and memory-efficient exact attention
Utilities intended for use with Llama models.