- University of Connecticut
- CT, USA
- https://harveyp123.github.io
Stars
Official Implementation of "RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs"
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation (ICLR 2025)
Code for "Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective" (AAAI 2025)
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction.
YaRN: Efficient Context Window Extension of Large Language Models
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
EleutherAI / nanoGPT-mup
Forked from karpathy/nanoGPT. The simplest, fastest repository for training/finetuning medium-sized GPTs.
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
A minimal implementation of a LLaVA-style VLM with interleaved image, text, and video processing.
Sparse Backpropagation for Mixture-of-Expert Training
Official inference repo for FLUX.1 models
Lumina-T2X is a unified framework for Text to Any Modality Generation
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
A PyTorch native library for large model training
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Large Language Model Text Generation Inference
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality"
Ring attention implementation with flash attention