Stars
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dependency Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
Benchmarking Deep Learning operations on different hardware
A Fast and Extensible DRAM Simulator, with built-in support for modeling many different DRAM technologies including DDRx, LPDDRx, GDDRx, WIOx, HBMx, and various academic proposals. Described in the…
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
Matrix-Vector Library Designed for Neural Network Construction. CUDA (GPU) support, OpenMP (multithreaded CPU) support, partial support of BLAS, expression-template-based implementation PTX code ge…
A C++ library for Deep Convolutional Neural Nets with Parallel Computing (OpenMP, CUDA and MPI)
gem5 version for garnet network backend for ASTRA-sim (http://github.com/astra-sim/astra-sim)
Speeding up our C++ code for a modular, simple, feed forward neural network library