- shanghai
Starred repositories
Next generation BLAS implementation for ROCm platform
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
Fast and memory-efficient exact attention
How do we integrate AI generation tools into actual work? | Wiki about AI painting | Prompts Engineering | Guide | Seeking Maintainer & Translator🙌
GNU toolchain for RISC-V, including GCC
FlagGems is an operator library for large language models implemented in Triton Language.
Official QEMU mirror. Please see https://www.qemu.org/contribute/ for how to submit changes to QEMU. Pull Requests are ignored. Please only use release tarballs from the QEMU website.
ROCm Platform Runtime: ROCr, an HPC-market-enhanced, HSA-based runtime
hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
Hook function calls by replacing PLT (Procedure Linkage Table) entries.
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Optimized primitives for collective multi-GPU communication
Demo project for building Python wheels for Linux with Travis-CI
ROCm / triton
Forked from triton-lang/triton
Development repository for the Triton language and compiler
A high-throughput and memory-efficient inference and serving engine for LLMs
High-speed Large Language Model Serving for Local Deployment
Shared Middle-Layer for Triton Compilation