aMarry

AcMary aMarry

3 followers · 34 following

DeepSeek-V3 Public
Forked from deepseek-ai/DeepSeek-V3

Python MIT License Updated Dec 27, 2024
CUDA-Learn-Notes Public
Forked from DefTruth/CUDA-Learn-Notes

📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).

Cuda GNU General Public License v3.0 Updated Dec 16, 2024
Cute-Gemm-Optimization Public
Forked from DD-DuDa/Cute-Learning

Makefile MIT License Updated Dec 16, 2024
cutlass Public
Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++ Other Updated Dec 16, 2024
cute-gemm Public
Forked from reed-lau/cute-gemm

C++ Updated Dec 16, 2024
Awesome-KV-Cache-Compression Public
Forked from October2001/Awesome-KV-Cache-Compression

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

MIT License Updated Dec 5, 2024
blog Public
Forked from huggingface/blog

Public repo for HF blog posts

Jupyter Notebook Updated Nov 20, 2024
flux Public
Forked from bytedance/flux

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ Apache License 2.0 Updated Oct 30, 2024
resource-stream Public
Forked from gpu-mode/resource-stream

GPU programming related news and material links

MIT License Updated Sep 23, 2024
triton Public
Forked from triton-lang/triton

Development repository for the Triton language and compiler

C++ MIT License Updated Jul 1, 2024
CUDALibrarySamples Public
Forked from NVIDIA/CUDALibrarySamples

CUDA Library Samples

Cuda Other Updated Apr 28, 2024
tiny-flash-attention Public
Forked from 66RING/tiny-flash-attention

flash attention tutorial written in python, triton, cuda, cutlass

Cuda Updated Apr 16, 2024
APPy Public
Forked from habanero-lab/APPy

APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GP…

Python MIT License Updated Feb 28, 2024
cutlass_fpA_intB_gemm Public
Forked from tlc-pack/cutlass_fpA_intB_gemm

A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer

C++ Apache License 2.0 Updated Feb 28, 2024
kernl Public
Forked from ELS-RD/kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook Apache License 2.0 Updated Feb 16, 2024
OneNeuralNetwork Public
Forked from matrix97317/OneNeuralNetwork

This is a cross-chip platform collection of operators and a unified neural network library.

Python Apache License 2.0 Updated Nov 3, 2023
Cpp-Templates-2ed Public
Forked from downdemo/Cpp-Templates-2ed

C++11/14/17/20 templates and generic programming, the most complex and difficult technical details of C++, indispensable in building infrastructure libraries.

C++ Apache License 2.0 Updated Sep 4, 2023
FlashAttention20 Public
Forked from kyegomez/FlashAttention20

Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels

Python MIT License Updated Jul 31, 2023
miniob Public
Forked from oceanbase/miniob

MiniOB is a mini database that helps developers learn how databases work.

C++ Mulan Permissive Software License, Version 2 Updated Jul 27, 2023
concurrentqueue Public
Forked from cameron314/concurrentqueue

A fast multi-producer, multi-consumer lock-free concurrent queue for C++11

C++ Other Updated Jun 19, 2023
TensorRT-in-Action Public
Forked from DD-DuDa/TensorRT-in-Action

TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.

Jupyter Notebook Apache License 2.0 Updated Jun 1, 2023
DeepLearningExamples Public
Forked from NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Jupyter Notebook Updated May 8, 2023
DeepLearningSystem Public
Forked from chenzomi12/AISystem

Deep Learning System core principles introduction.

Jupyter Notebook Apache License 2.0 Updated May 7, 2023
bitsandbytes Public
Forked from bitsandbytes-foundation/bitsandbytes

8-bit CUDA functions for PyTorch

Python MIT License Updated Apr 29, 2023
llama.onnx Public
Forked from wolf1981/llama.onnx

llama/alpaca onnx models, quantization and testcase

Python GNU General Public License v3.0 Updated Apr 19, 2023
how-to-optimize-gemm-1 Public
Forked from tpoisonooo/how-to-optimize-gemm

row-major matmul optimization

C++ GNU General Public License v3.0 Updated Apr 5, 2023
tvm_mlir_learn Public
Forked from BBuf/tvm_mlir_learn

tvm learn

Python Updated Mar 30, 2023
HPC-Learning-Notes Public
Forked from Eddie-Wang1120/HPC-Learning-Notes

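Study notes on high-performance computing, including the notes themselves and code demos for the related topics; continuously being improved. If you find it helpful, please give it a Star. It helps the author a lot, thank you!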

Jupyter Notebook Updated Mar 28, 2023
onnxruntime Public
Forked from microsoft/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

C++ MIT License Updated Mar 17, 2023
ray Public
Forked from ray-project/ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.

Python Apache License 2.0 Updated Feb 6, 2023
