🌊
Timing


Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 947 71 Updated Mar 7, 2025

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 108 32 Updated Jan 2, 2025
Python 393 32 Updated Mar 6, 2025

When it comes to optimizers, it's always better to be safe than sorry

Python 214 8 Updated Feb 23, 2025

Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.

Go 134,129 11,088 Updated Mar 21, 2025
Python 65 8 Updated Mar 19, 2025

Survey of Small Language Models from Penn State, ...

169 14 Updated Jan 16, 2025

Chat with multiple PDFs locally

Python 487 74 Updated Oct 11, 2024

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Python 523 28 Updated Feb 19, 2025

Quantized Attention on GPU

Python 45 Updated Nov 22, 2024

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,173 73 Updated Mar 21, 2025

The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Python 67 Updated Jan 23, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 438 26 Updated Feb 10, 2025

On-device AI across mobile, embedded and edge for PyTorch

C++ 2,616 489 Updated Mar 21, 2025

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Python 200 14 Updated Jan 6, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,936 629 Updated Mar 7, 2025

Advanced interview notes on large language models

38 1 Updated Feb 11, 2025

Python Intelligence Config Manager. A superset of hydra+pydantic+lsp

C 26 Updated Feb 4, 2025

FlagGems is an operator library for large language models implemented in Triton Language.

Python 456 73 Updated Mar 21, 2025

Odysseus: Playground of LLM Sequence Parallelism

Python 66 3 Updated Jun 17, 2024

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality

Python 3,739 282 Updated Aug 10, 2024
Python 3,578 333 Updated Feb 24, 2025

An Open Source Toolkit For LLM Distillation

Python 542 65 Updated Jan 7, 2025

A family of compressed models obtained via pruning and knowledge distillation

329 18 Updated Nov 13, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 555 40 Updated Feb 14, 2025

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,953 113 Updated Jul 29, 2024
Python 7 Updated Jun 26, 2024

The official evaluation suite and dynamic data release for MixEval.

Python 233 41 Updated Nov 10, 2024

Source code for the paper "On the Hallucination in Simultaneous Machine Translation"

Python 2 Updated Jun 1, 2024

[NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

Jupyter Notebook 128 12 Updated Jul 4, 2024