Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Learning material for CMU10-714: Deep Learning System
A cheatsheet of modern C++ language and library features.
Windows Calculator: A simple yet powerful calculator that ships with Windows
Software Architecture with C++, published by Packt
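Translation project for A Primer on Memory Consistency and Cache Coherence (Second Edition)
AISystem mainly covers AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks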
A simple bash script for switching between installed versions of CUDA.
An easy-to-use, header-only C++ wrapper for Linux' perf event API
Code base and slides for ECE408: Applied Parallel Programming on GPU.
Code repository with DAG node packages and visualization scripts
Example code showing how to write periodic threads in Linux
Ligra: A Lightweight Graph Processing Framework for Shared Memory
DAMOV is a benchmark suite and a methodical framework targeting the study of data movement bottlenecks in modern applications. It is intended to study new architectures, such as near-data processin…
rt-app emulates typical mobile and real-time systems use cases and gives runtime information
A hybrid cache sharing-partitioning tool for systems with Intel CAT support.
Evaluating different memory managers for dynamic GPU memory
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"; Chinese edition of The Art of Linear Algebra, PRs welcome.
Best Practices on Recommendation Systems
TFRecord parser using C++ and Protocol Buffers
Standalone TFRecord reader/writer with PyTorch data loaders
Unofficial implementation of lanenet model for real-time lane detection