snowpeakz

Starred repositories

SGLang is a fast serving framework for large language models and vision language models.

Python 7,303 701 Updated Jan 14, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,365 134 Updated Jan 14, 2025

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 440 60 Updated Jan 7, 2025

C++ builds C++

C++ 24 Updated Nov 14, 2024

A Lightweight Recommendation System

Python 6,908 511 Updated Nov 8, 2023

Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.

Cuda 601 139 Updated Jan 7, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,838 471 Updated Jan 15, 2025

CUDA Core Compute Libraries

C++ 1,385 179 Updated Jan 15, 2025
C++ 503 90 Updated Jan 7, 2025

Material for gpu-mode lectures

Jupyter Notebook 3,480 348 Updated Jan 6, 2025

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda 135 26 Updated Jan 6, 2025

cuDF - GPU DataFrame Library

C++ 8,595 918 Updated Jan 15, 2025

A collection of compiler learning resources.

Python 2,235 340 Updated May 27, 2024

How to learn PyTorch and OneFlow

376 24 Updated Mar 22, 2024

How to optimize some algorithms in CUDA.

Cuda 1,823 151 Updated Jan 15, 2025

Fast and memory-efficient exact attention

Python 15,055 1,421 Updated Jan 14, 2025

Awesome-LLM: a curated list of Large Language Models

20,610 1,689 Updated Jan 13, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 5,998 1,037 Updated Jan 10, 2025

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021

Python 55 12 Updated Jul 21, 2021

Alluxio, data orchestration for analytics and machine learning in the cloud

Java 6,903 2,939 Updated Nov 27, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 137,592 27,568 Updated Jan 14, 2025

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,243 331 Updated May 16, 2023

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Python 317 47 Updated Apr 11, 2023

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ 467 39 Updated Mar 15, 2024

Transformer related optimization, including BERT, GPT

C++ 5,980 895 Updated Mar 27, 2024

Ongoing research training transformer models at scale

Python 11,092 2,477 Updated Jan 14, 2025

Inference code for Llama models

Python 57,208 9,658 Updated Aug 18, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,093 360 Updated Dec 9, 2023

ImageBind: One Embedding Space to Bind Them All

Python 8,474 783 Updated Jul 31, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,747 1,886 Updated Jul 26, 2024