SGLang is a fast serving framework for large language models and vision language models.

Python 9,280 885 Updated Feb 11, 2025

AMD SMI

C++ 53 31 Updated Feb 11, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 1,977 200 Updated Feb 11, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,736 4,615 Updated Feb 11, 2025

A minimal implementation of vllm.

Cuda 33 Updated Jul 27, 2024

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 210 17 Updated Feb 11, 2025

A framework for few-shot evaluation of language models.

Python 7,733 2,082 Updated Feb 11, 2025

The Paper List on Data Contamination for Large Language Models Evaluation.

89 3 Updated Jan 10, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 37,350 5,615 Updated Feb 11, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 30,808 12,646 Updated Feb 11, 2025

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 35,316 5,995 Updated Feb 11, 2025

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 292 25 Updated Oct 30, 2024

Python 19 13 Updated Jan 31, 2025

CUDA on non-NVIDIA GPUs

Rust 10,618 685 Updated Feb 7, 2025

Clspv is a compiler for OpenCL C to Vulkan compute shaders

LLVM 652 92 Updated Jan 31, 2025

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

C++ 248 35 Updated Feb 6, 2025

Online compiler for HIP and NVIDIA® CUDA® code to WebGPU

C++ 137 1 Updated Jan 8, 2025

C++ 117 54 Updated Feb 11, 2025

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 318 48 Updated Jan 2, 2025

ROCm BLAS marshalling library

C++ 131 81 Updated Feb 11, 2025

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library

Assembly 77 103 Updated Feb 11, 2025

Bag of Tricks for NN Quantization

Python 3 Updated Dec 9, 2024

Fast CUDA matrix multiplication from scratch

Cuda 624 81 Updated Dec 28, 2023

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Python 73 8 Updated Jan 2, 2024

MLIR 137 40 Updated Feb 11, 2025

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…

Python 701 52 Updated Feb 11, 2025

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ 299 55 Updated Feb 11, 2025

Stretching GPU performance for GEMMs and tensor contractions.

Python 231 154 Updated Feb 7, 2025

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 340 146 Updated Feb 11, 2025

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 622 55 Updated Jan 21, 2025