Stars
Fast and memory-efficient exact attention
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
A high-performance, zero-overhead, extensible Python compiler using LLVM
The released code for LA-MCTS and its application to Neural Architecture Search.
C++ Parallel Computing and Asynchronous Networking Framework
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
GUINNESS: A GUI-based binarized deep Neural NEtwork SyntheSizer toward an FPGA
A computer algebra system written in pure Python
The quickest way to start and publish your Jekyll-powered blog. 100% compatible with GitHub Pages.