bobliao foreverlms

📚

Studying

/**************/

20 followers · 82 following

Bytedance
Shanghai, China
UTC +08:00
https://blog.bobliao.xyz

flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Feb 12, 2025
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ Apache License 2.0 Updated Feb 10, 2025
cutlass Public
Forked from NVIDIA/cutlass

CUDA Templates for Linear Algebra Subroutines

C++ Other Updated Jan 18, 2025
CUDA-Learn-Notes Public
Forked from DefTruth/CUDA-Learn-Notes

📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda GNU General Public License v3.0 Updated Dec 29, 2024
cfx-article-src Public
Forked from ColfaxResearch/cfx-article-src

C++ Updated Dec 20, 2024
ZhiLight Public
Forked from zhihu/ZhiLight

A highly optimized inference acceleration engine for Llama and its variants.

C++ Apache License 2.0 Updated Dec 10, 2024
lightllm Public
Forked from ModelTC/lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python Apache License 2.0 Updated Dec 9, 2024
Cute-Learning Public
Forked from DD-DuDa/Cute-Learning

Makefile MIT License Updated Oct 31, 2024
armnn Public
Forked from ARM-software/armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn

C++ MIT License Updated Oct 24, 2024
cute-gemm-101 Public

Cuda Updated Oct 4, 2024
composable_kernel Public
Forked from ROCm/composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ Other Updated Sep 18, 2024
cuda-training-series Public
Forked from olcf/cuda-training-series

Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)

Cuda Updated Aug 19, 2024
flash-attention Public
Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python BSD 3-Clause "New" or "Revised" License Updated Aug 6, 2024
pytorch Public
Forked from pytorch/pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python Other Updated Jul 15, 2024
cutlass-kernels Public
Forked from ColfaxResearch/cutlass-kernels

Cuda MIT License Updated Jul 11, 2024
cuda_sgemm Public
Forked from njuhope/cuda_sgemm

Cuda Updated Apr 11, 2024
llm-numbers Public
Forked from ray-project/llm-numbers

Numbers every LLM developer should know

Updated Jan 16, 2024
INT8-Flash-Attention-FMHA-Quantization Public
Forked from jundaf2/INT8-Flash-Attention-FMHA-Quantization

Cuda Updated Sep 15, 2023
awesome-tensor-compilers Public
Forked from merrymercy/awesome-tensor-compilers

A list of awesome compiler projects and papers for tensor computation and deep learning.

Updated Apr 2, 2023
MegEngine Public
Forked from MegEngine/MegEngine

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation

C++ Apache License 2.0 Updated Feb 7, 2023
llvm-project Public
Forked from llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at…

Other Updated Feb 7, 2023
abc Public

code snippets

Jupyter Notebook Updated Jan 31, 2023
MNN Public
Forked from alibaba/MNN

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

C++ 2 Updated Jan 13, 2023
maxas Public
Forked from NervanaSystems/maxas

Assembler for NVIDIA Maxwell architecture

Sass MIT License Updated Jan 3, 2023
perf-ninja Public
Forked from dendibakh/perf-ninja

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ Updated Jan 1, 2023
MegPeak Public
Forked from MegEngine/MegPeak

C++ Apache License 2.0 Updated Dec 14, 2022
foreverlms.github.io Public

Personal blog, based on the izhengfan.github.io template

HTML 1 MIT License Updated Oct 20, 2022
gdb-dashboard Public
Forked from cyrus-and/gdb-dashboard

Modular visual interface for GDB in Python

Python MIT License Updated Oct 19, 2022
folly Public
Forked from facebook/folly

An open-source C++ library developed and used at Facebook.

C++ Apache License 2.0 Updated Oct 15, 2022
dev-sidecar Public
Forked from docmirror/dev-sidecar

Developer sidecar: fixes GitHub failing to load and accelerates GitHub, git clone, git release downloads, and Stack Overflow

JavaScript Mozilla Public License 2.0 Updated Aug 30, 2022
