Stars
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
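A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.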
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
ONNXMLTools enables conversion of models to ONNX
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
A comprehensive collection of admin dashboard UIs (these are all you need): https://blog.csdn.net/m0_37499059/article/details/80519211
Visualizer for neural network, deep learning and machine learning models
The official GitHub mirror of the Chromium source
A tiny compiler for a language featuring LL(2) with Lexer, Parser, ASM-like codegen and VM. Complex enough to give you a flavour of how the "real" thing works whilst not being a mere toy example
This repository includes tutorials on how to use the TensorFlow estimator APIs to perform various ML tasks, in a systematic and standardised way
Distributed vector search for AI-native applications
Qihoo360 / floyd
Forked from PikaLabs/floyd
A Raft consensus implementation that is simple and understandable
pkuseg: a toolkit for multi-domain Chinese word segmentation
Mission: To provide a high-quality open content data structures textbook that is both mathematically rigorous and provides complete implementations.
FlatBuffers: Memory Efficient Serialization Library
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices)
oneAPI Threading Building Blocks (oneTBB)