LiYu Lu luliyucoordinate

🏅

Focusing

PyTorch/TensorFlow/CUDA/HPC/more

282 followers · 57 following

Hangzhou

Lists (1)

✨ Inspiration

2 repositories

Stars

73 stars written in C++

ggerganov / llama.cpp

LLM inference in C/C++

C++ 72,780 10,485 Updated Feb 2, 2025

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 18,776 1,073 Updated Feb 3, 2025

triton-lang / triton

Development repository for the Triton language and compiler

C++ 14,241 1,763 Updated Feb 3, 2025

microsoft / BitNet

Official inference framework for 1-bit LLMs

C++ 12,685 886 Updated Dec 20, 2024

ggerganov / ggml

Tensor library for machine learning

C++ 11,707 1,104 Updated Jan 29, 2025

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,280 1,089 Updated Feb 2, 2025

scylladb / seastar

High performance server-side application framework

C++ 8,514 1,580 Updated Feb 2, 2025

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 6,109 1,053 Updated Feb 2, 2025

baidu / braft

An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.

C++ 4,037 893 Updated Oct 25, 2024

NVlabs / tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

C++ 3,856 475 Updated Jan 27, 2025

dblalock / bolt

10x faster matrix and vector operations

C++ 2,480 171 Updated Oct 12, 2022

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,456 146 Updated Jan 24, 2025

opencurve / curve

Curve is a sandbox project hosted by the CNCF Foundation. It's cloud-native, high-performance, and easy to operate. Curve is an open-source distributed storage system for block and shared file stor…

C++ 2,344 526 Updated Aug 13, 2024

microsoft / BlingFire

A lightning fast Finite State machine and REgular expression manipulation library.

C++ 1,834 131 Updated Dec 8, 2024

alibaba / async_simple

Simple, light-weight and easy-to-use asynchronous components

C++ 1,812 267 Updated Jan 23, 2025

flexflow / flexflow-train

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,750 234 Updated Feb 2, 2025

alibaba / yalantinglibs

A collection of modern C++ libraries, including coro_rpc, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple

C++ 1,660 254 Updated Jan 27, 2025

kevmo314 / scuda

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ 1,602 53 Updated Jan 28, 2025

Tencent / TurboTransformers

A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU.

C++ 1,504 199 Updated Jun 12, 2023

NVIDIA / cccl

CUDA Core Compute Libraries

C++ 1,409 185 Updated Feb 3, 2025

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,249 529 Updated Feb 2, 2025

gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,210 526 Updated Aug 21, 2024