-
Microsoft
-
Cute-Gemm-Optimization Public
Forked from DD-DuDa/Cute-Learning
Makefile MIT License Updated Nov 28, 2024 -
qserve Public
Forked from mit-han-lab/omniserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Python Apache License 2.0 Updated Nov 9, 2024 -
tiny-flash-attention Public
Forked from 66RING/tiny-flash-attention
flash attention tutorial written in python, triton, cuda, cutlass
Cuda Updated Jun 18, 2024 -
-
MatmulTutorial Public
Forked from KnowingNothing/MatmulTutorial
A Easy-to-understand TensorOp Matmul Tutorial
C++ Apache License 2.0 Updated Feb 27, 2024 -
-
how-to-optim-algorithm-in-cuda Public
Forked from BBuf/how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
Cuda Updated Jan 22, 2024 -
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Python BSD 3-Clause "New" or "Revised" License Updated Jan 21, 2024 -
-
Awesome-System-for-Machine-Learning Public
Forked from HuaizhengZhang/AI-System-School
A curated list of research in machine learning systems (MLSys). Paper notes are also provided.
MIT License Updated Dec 15, 2023 -
cuda_hgemm Public
Forked from Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
Cuda MIT License Updated Nov 7, 2023 -
TheArtofHPC_pdfs Public
Forked from VictorEijkhout/TheArtofHPC_pdfs
All pdfs of Victor Eijkhout's Art of HPC books and courses
Updated Nov 1, 2023 -
numpy-ml Public
Forked from ddbourgin/numpy-ml
Machine learning, in numpy
Python GNU General Public License v3.0 Updated Oct 29, 2023 -
flash_attention_inference Public
Forked from ShaYeBuHui01/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
C++ MIT License Updated Aug 31, 2023 -
-
Optimize_SGEMM_on_Nvidia_GPU Public
Implementations of SGEMM algorithm on Nvidia GPU using different tricks to optimize the performance.
Cuda Updated May 28, 2023 -
-
-
Supporting code for "Systematic improvement of neural network quantum states using Lanczos" (NeurIPS 2022)
-
DeepLearningExamples Public
Forked from NVIDIA/DeepLearningExamples
Deep Learning Examples
Python Updated Nov 4, 2022 -
physics_codes_publications Public
Forked from ryuikaneko/codes_for_my_publications
C MIT License Updated Oct 28, 2022 -
multi-gpu-programming-models Public
Forked from NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Cuda BSD 3-Clause "New" or "Revised" License Updated Oct 17, 2022 -
Linear-Algebra-and-Learning-from-Data Public
Forked from niuers/Linear-Algebra-and-Learning-from-Data
Solutions to the problems in the book: Linear Algebra and Learning from Data by Gilbert Strang, MIT
Jupyter Notebook Updated Sep 28, 2022 -
Optimize_DGEMM_on_Intel_CPU Public
Implementations of DGEMM algorithm using different tricks to optimize the performance.
-
oneDNN Public
Forked from oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
C++ Apache License 2.0 Updated Aug 12, 2022 -
neural_network_quantum_state Public
Neural Network Quantum State
-
oneMKL Public
Forked from uxlfoundation/oneMath
oneAPI Math Kernel Library (oneMKL) Interfaces
C++ Apache License 2.0 Updated Aug 9, 2022 -
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Aug 9, 2022 -
ResNet50 implementation for Food101 and ResNet9 model for CIFAR10 in PyTorch
-
ising-model-gpu Public
Accelerating Monte Carlo simulations of 2D Ising Model using Nvidia GPU