Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML 4,534 525 Updated Oct 22, 2024

LeetCode 101: A guide to practicing LeetCode problems

8,919 1,180 Updated Dec 8, 2024

Tile primitives for speedy kernels

Cuda 1,902 92 Updated Jan 4, 2025

Fastest kernels written from scratch

Cuda 92 12 Updated Nov 30, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 7,148 664 Updated Jan 7, 2025

This project is about convolution operator optimization on GPU, including GEMM-based (implicit GEMM) convolution.

C++ 23 1 Updated Dec 27, 2024

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 328 68 Updated Sep 8, 2024
Python 181 9 Updated Dec 17, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,869 635 Updated Jan 5, 2025

An Easy-to-understand TensorOp Matmul Tutorial

C++ 305 34 Updated Sep 21, 2024

Yinghan's Code Sample

Cuda 297 54 Updated Jul 25, 2022

Fast CUDA matrix multiplication from scratch

Cuda 566 69 Updated Dec 28, 2023

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,329 132 Updated Jan 7, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,312 5,078 Updated Jan 7, 2025

Samples for CUDA Developers which demonstrate features in CUDA Toolkit

C 6,698 1,885 Updated Jul 26, 2024

CUDA Library Samples

Cuda 1,698 355 Updated Dec 22, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,092 1,048 Updated Jan 6, 2025

CUDA Core Compute Libraries

C++ 1,368 174 Updated Jan 7, 2025

A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)

C++ 18,901 3,067 Updated Jan 5, 2025

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Python 30,954 7,546 Updated Dec 19, 2024

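Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats, updated weekly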

CSS 22,699 1,795 Updated Jan 7, 2025

A Python Compiler Design Toolkit

Python 294 77 Updated Jan 7, 2025

Official implementation of S3Diff

Python 119 11 Updated Nov 9, 2024

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 307 35 Updated Dec 17, 2024

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Python 29,123 3,645 Updated Aug 6, 2024

A profiler to disclose and quantify hardware features on GPUs.

C++ 164 23 Updated May 15, 2022

An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.

Python 24 2 Updated Feb 20, 2024

[ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation

Python 149 17 Updated Mar 1, 2024