Tsinghua University
Starred repositories
Domain-specific language designed to streamline the development of high-performance GPU/CPU kernels
Accelerated First Order Parallel Associative Scan
🚀 Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Unified KV Cache Compression Methods for Auto-Regressive Models
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Helpful tools and examples for working with flex-attention
A very fast and expressive template engine.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A web-based collaborative LaTeX editor
Python interface for MLIR - the Multi-Level Intermediate Representation
A throughput-oriented high-performance serving framework for LLMs
🔥 Top-Rated Web-Based Linux Server Management Tool. 1Panel features an intuitive web interface that seamlessly integrates server management and monitoring, container management, database administra…
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
OneDiff: An out-of-the-box acceleration library for diffusion models.
A fast communication-overlapping library for tensor parallelism on GPUs.
The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR python bindings.
Experimental projects related to TensorRT
PyTorch native quantization and sparsity for training and inference
A family of header-only, very fast and memory-friendly hashmap and btree containers.
A lightweight and highly efficient training framework for accelerating diffusion tasks.