Stars
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
An easy-to-use PyTorch to TensorRT converter
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
A Datacenter Scale Distributed Inference Serving Framework
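AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks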
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Tensors and Dynamic neural networks in Python with strong GPU acceleration
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
Chinese translation, code implementations, and exercise solutions for The Elements of Statistical Learning (ESL).
A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8…
A repo for all spark examples using Rapids Accelerator including ETL, ML/DL, etc.
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
Deploy RT-DETR model based on OpenVINO.
Triton Python, C++ and Java client libraries, and gRPC-generated client examples for Go, Java and Scala.
All Algorithms implemented in Python
Collection of various algorithms in mathematics, machine learning, computer science and physics implemented in C++ for educational purposes.
C++ library based on TensorRT integration
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Deploy Detectron2 with Triton inference server
Sample codes for my CUDA programming book
A simple implementation of TensorRT YOLOv5 in Python/C++ 🔥
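500 Questions on Deep Learning: explains, in question-and-answer form, frequently asked topics in probability, linear algebra, machine learning, deep learning, computer vision, and more, to help the author and any readers who need it. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any shortcomings. To be continued... For collaboration, contact [email protected]. All rights reserved; infringement will be pursued. Tan 2018.06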
DeepRacer workshop content. This Guidance demonstrates how software developers can use an Amazon SageMaker Notebook instance to directly train and evaluate AWS DeepRacer models with full control
Simple samples for TensorRT programming