A collection of libraries to optimise AI model performance

Python 8,371 636 Updated Jul 22, 2024

NVIDIA container runtime library

C 889 216 Updated Jan 23, 2025

🦄 🦄 🦄 Core smart contracts of Uniswap v3

TypeScript 4,517 2,792 Updated Nov 3, 2024

High performance distributed framework for training deep learning recommendation models based on PyTorch.

Rust 399 51 Updated Jan 24, 2025

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 7,054 1,035 Updated Jan 22, 2025

F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.

C 3,908 908 Updated Jan 10, 2025

Making large AI models cheaper, faster and more accessible

Python 39,026 4,353 Updated Jan 24, 2025

ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.

C++ 25 4 Updated Jul 6, 2023

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 32,008 3,007 Updated Jan 24, 2025

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 565 80 Updated Sep 11, 2024

An Agile Chisel-Based SoC Design Framework

Scala 26 2 Updated Dec 29, 2021

Hot coffee

JavaScript 188 9 Updated Feb 2, 2023

Slicing a PyTorch Tensor Into Parallel Shards

Python 298 15 Updated Jul 27, 2021

A benchmark for testing PCIe and host/device memory bandwidth and communication contention on multi-GPU and multi-CPU systems.

C++ 9 1 Updated Jun 9, 2016

The X86 Encoder Decoder (XED) is a software library for encoding and decoding X86 (IA32 and Intel64) instructions

Python 1,435 149 Updated Nov 5, 2024

BLAS-like Library Instantiation Software Framework

C 2,356 372 Updated Jan 22, 2025

A 128 bit unsigned integer class for CUDA

C++ 43 16 Updated Jan 3, 2025

The Ceph Benchmarking Tool

Python 274 140 Updated Jan 17, 2025

ONNX-TensorRT: TensorRT backend for ONNX

C++ 3,000 546 Updated Dec 3, 2024

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,244 528 Updated Jan 24, 2025

TensorFlow Backend for ONNX

Python 1,291 296 Updated Mar 28, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,794 1,903 Updated Jul 26, 2024

Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.

Python 447 76 Updated Jan 13, 2025

Automatically generate a C++ header file including CUDA device-specific parameters

C++ 3 Updated Jul 1, 2020

A GPU-powered real-time analytics storage and query engine.

Go 3,042 234 Updated Jul 13, 2024

Rodinia benchmark

C 168 90 Updated Apr 14, 2023

Running BERT without Padding

C++ 468 54 Updated Mar 18, 2022

Virtual Kubelet is an open source Kubernetes kubelet implementation.

Go 4,259 624 Updated Jan 20, 2025

brpc is an industrial-grade RPC framework using the C++ language, which is often used in high-performance systems such as Search, Storage, Machine learning, Advertisement, Recommendation etc. "brpc" mea…

C++ 16,727 4,006 Updated Jan 23, 2025