A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python 8,598 1,436 Updated Apr 4, 2025

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,021 156 Updated Mar 26, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,286 556 Updated Apr 5, 2025

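"Build a Large Language Model (From Scratch)" is an e-book that explores the principles and implementation of large language models in depth, suited to learners who want a thorough understanding of the architecture, training process, and application development of large models such as GPT. To make this highly valuable textbook accessible to more Chinese readers, I decided to translate it into Chinese and share it as open source on GitHub.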

658 142 Updated Apr 2, 2025

Material for gpu-mode lectures

Jupyter Notebook 4,178 420 Updated Feb 9, 2025

LLM inference in C/C++

C++ 77,665 11,317 Updated Apr 5, 2025

The Art of Debugging

C 865 39 Updated Aug 3, 2024

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

367 8 Updated Apr 2, 2025

Magnificent app which corrects your previous console command.

Python 91,316 3,669 Updated Jul 19, 2024

🚀🚀 "Large model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 18,343 2,062 Updated Apr 5, 2025

A faster int-to-int hashmap implemented in C++.

C++ 41 7 Updated Jan 6, 2025

Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance conc…

C 2,474 323 Updated Mar 7, 2025

LLM training in simple, raw C/CUDA

Cuda 26,243 3,018 Updated Oct 2, 2024

Systems for GenAI

129 8 Updated Mar 8, 2025

Development repository for the Triton language and compiler

MLIR 15,086 1,903 Updated Apr 5, 2025

Boosting 4-bit inference kernels with 2:4 Sparsity

Cuda 72 5 Updated Sep 4, 2024

A basic deep learning library, comparable to a very minimal version of PyTorch.

Python 13 2 Updated Mar 1, 2023

µcoro

C++ 135 18 Updated Jan 16, 2025

QRec: A Python Framework for quick implementation of recommender systems (TensorFlow Based)

Python 1,609 406 Updated Dec 26, 2023

Practice reimplementing classic recommender system models in PyTorch, such as MF, FM, DeepConn, MMOE, PLE, DeepFM, NFM, DCN, AFM, AutoInt, ONN, FiBiNET, DCN-v2, AFN, DCAP, and others

Python 597 124 Updated Mar 14, 2022

Notes about courses Dive into Deep Learning by Mu Li

Jupyter Notebook 3,528 562 Updated Apr 11, 2023

A model compilation solution for various hardware

MLIR 418 47 Updated Apr 4, 2025

row-major matmul optimization

C++ 616 86 Updated Sep 9, 2023

How to optimize some algorithms in CUDA.

Cuda 2,070 184 Updated Apr 5, 2025

C++ library for executors

C++ 501 75 Updated Sep 21, 2016

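AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks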

Jupyter Notebook 13,099 1,887 Updated Mar 29, 2025

The most concise PyTorch implementation of the Transformer model, with detailed comments

Jupyter Notebook 186 23 Updated Nov 13, 2023

🧡 Follow everything in one place

TypeScript 24,844 1,050 Updated Apr 5, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 5,997 518 Updated Apr 4, 2025

PyTorch domain library for recommendation systems

Python 2,081 489 Updated Apr 5, 2025