Analyze computation-communication overlap in V3/R1.

508 43 Updated Feb 27, 2025

Implementation for SimDINO/SimDINOv2

Python 38 1 Updated Feb 26, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 6,352 479 Updated Feb 27, 2025

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,005 63 Updated Feb 15, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 3,742 303 Updated Feb 27, 2025

PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437

Python 599 22 Updated Feb 25, 2025

An obsidian template vault for tracking your academic life.

TypeScript 187 8 Updated Feb 21, 2025

Redis for humans. 🌎🌍🌏

Python 1,123 62 Updated Feb 10, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Jupyter Notebook 2,604 225 Updated Feb 27, 2025

Official repo of paper LM2

Python 24 6 Updated Feb 13, 2025

A token pruning method that accelerates ViTs for various tasks while maintaining high performance.

Python 9 Updated Jan 7, 2025

[Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers

Python 12 Updated Dec 9, 2024

Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity"

Python 26 1 Updated Feb 16, 2025

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

Python 87 6 Updated Feb 27, 2025

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

113 2 Updated Feb 19, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 4,080 437 Updated Feb 20, 2025

Official code for paper: F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting

Python 35 1 Updated Jan 14, 2025

LookHere position encoding for ViTs (NeurIPS 2024)

Python 5 Updated Oct 21, 2024

[3DV'25] 3D Reconstruction with Spatial Memory

Python 929 45 Updated Feb 25, 2025

Efficient End2End Compiler for Mixed-Precision Deep Learning

Python 1 Updated Feb 8, 2025

Reproducible evaluation of NeRF and 3DGS methods

Python 253 11 Updated Feb 22, 2025
Jupyter Notebook 282 8 Updated Feb 25, 2025

This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive A…

Jupyter Notebook 7,344 969 Updated Feb 25, 2025

📦 Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) o…

TypeScript 11,538 500 Updated Feb 26, 2025

FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs

C++ 10,506 668 Updated Feb 27, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 354 23 Updated Feb 26, 2025

A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.

Python 126 12 Updated Feb 25, 2025

MoH: Multi-Head Attention as Mixture-of-Head Attention

Python 207 7 Updated Oct 29, 2024