bbshocking

10 followers · 30 following

Stars

nebuly-ai / optimate

A collection of libraries to optimise AI model performances

Python 8,371 636 Updated Jul 22, 2024

NVIDIA / libnvidia-container

NVIDIA container runtime library

C 889 216 Updated Jan 23, 2025

Uniswap / v3-core

🦄 🦄 🦄 Core smart contracts of Uniswap v3

TypeScript 4,517 2,792 Updated Nov 3, 2024

PersiaML / PERSIA

High performance distributed framework for training deep learning recommendation models based on PyTorch.

Rust 399 51 Updated Jan 24, 2025

EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 7,054 1,035 Updated Jan 22, 2025

F-Stack / f-stack

F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.

C 3,908 908 Updated Jan 10, 2025

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 39,026 4,353 Updated Jan 24, 2025

ParCoreLab / ComScribe

ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.

C++ 25 4 Updated Jul 6, 2023

milvus-io / milvus

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 32,008 3,007 Updated Jan 24, 2025

NVIDIA / nvcomp

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 565 80 Updated Sep 11, 2024

l-nic / chipyard

Forked from ucb-bar/chipyard

An Agile Chisel-Based SoC Design Framework

Scala 26 2 Updated Dec 29, 2021

sohutv / hotcaffeine

Hot coffee

JavaScript 188 9 Updated Feb 2, 2023

kaiyuyue / torchshard

Slicing a PyTorch Tensor Into Parallel Shards

Python 298 15 Updated Jul 27, 2021

gabaker / TARUC_Bench

A benchmark for testing PCIe and host/device memory bandwidth and communication contention on multi-GPU and multi-CPU systems.

C++ 9 1 Updated Jun 9, 2016

intelxed / xed

The X86 Encoder Decoder (XED) is a software library for encoding and decoding X86 (IA32 and Intel64) instructions

Python 1,435 149 Updated Nov 5, 2024

flame / blis

BLAS-like Library Instantiation Software Framework

C 2,356 372 Updated Jan 22, 2025

curtisseizert / CUDA-uint128

A 128 bit unsigned integer class for CUDA

C++ 43 16 Updated Jan 3, 2025

ceph / cbt

The Ceph Benchmarking Tool

Python 274 140 Updated Jan 17, 2025

ververica / flink-sql-benchmark

Java 106 51 Updated Jul 20, 2023

onnx / onnx-tensorrt

ONNX-TensorRT: TensorRT backend for ONNX

C++ 3,000 546 Updated Dec 3, 2024

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,244 528 Updated Jan 24, 2025

onnx / onnx-tensorflow

Tensorflow Backend for ONNX

Python 1,291 296 Updated Mar 28, 2024

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,794 1,903 Updated Jul 26, 2024

triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.

Python 447 76 Updated Jan 13, 2025

BDHU / CUDA_Device_Attribute_Generation

Automatically generate a C++ header file including Cuda device-specific parameters

C++ 3 Updated Jul 1, 2020

uber / aresdb

A GPU-powered real-time analytics storage and query engine.

Go 3,042 234 Updated Jul 13, 2024

yuhc / gpu-rodinia

Rodinia benchmark

C 168 90 Updated Apr 14, 2023

bytedance / effective_transformer

Running BERT without Padding

C++ 468 54 Updated Mar 18, 2022

virtual-kubelet / virtual-kubelet

Virtual Kubelet is an open source Kubernetes kubelet implementation.

Go 4,259 624 Updated Jan 20, 2025

apache / brpc

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, recommendation, etc. "brpc" mea…

C++ 16,727 4,006 Updated Jan 23, 2025
