- Vultureprime
- Bangkok, Thailand
- @KMatiDev1
Stars
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
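For illustration, a minimal sketch of compiling a PyTorch module with Thunder via its `thunder.jit` entry point; the actual speedup depends on the model and on which executors are available.

```python
# Minimal sketch: compile a small PyTorch module with Thunder (lightning-thunder).
import torch
import thunder

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(8, 1024)

jitted = thunder.jit(model)  # source-to-source compilation of the module
y = jitted(x)                # executed through Thunder's selected executors
```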
An Aspiring Drop-In Replacement for NumPy at Scale
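The "drop-in" idea is that existing NumPy code keeps working after swapping the import. A hedged sketch follows; the module name `cunumeric` is an assumption about which project this entry refers to.

```python
# Sketch of the drop-in pattern: change only the import, keep the NumPy code.
import cunumeric as np  # instead of: import numpy as np  (module name assumed)

a = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)
b = np.ones((1000, 1000))
c = a @ b          # same NumPy API, executed by the scaled-out backend
print(c.sum())
```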
Code I wrote for my AI & LLM workshops
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by the Qwen team at Alibaba Cloud.
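A minimal sketch of running a Qwen2.5-Coder instruct checkpoint with Hugging Face transformers; the checkpoint name is one of several released sizes.

```python
# Generate code with a Qwen2.5-Coder instruct model via transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```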
🌈 React for interactive command-line apps
🦄 Record your terminal and generate animated gif images or share a web player
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment.
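A hedged sketch of post-training quantization with Model Optimizer; the module path `modelopt.torch.quantization`, the `INT8_DEFAULT_CFG` config name, and the calibration loop shape are assumptions about the library's PTQ flow.

```python
# Sketch: calibrate and quantize a vision model with TensorRT Model Optimizer (assumed API).
import torch
import modelopt.torch.quantization as mtq

def forward_loop(model):
    # Calibration: run a few representative batches through the model.
    for _ in range(8):
        model(torch.randn(1, 3, 224, 224).cuda())

model = torch.hub.load("pytorch/vision", "resnet50", weights=None).cuda().eval()
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)  # inserts quantizers
```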
Experimental projects related to TensorRT
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
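The Mixture-of-Agents pattern itself is simple: several proposer models answer independently, then an aggregator model synthesizes a final response. The sketch below is conceptual; `query_model` and the model names are hypothetical placeholders, not the repository's API.

```python
# Conceptual sketch of Mixture-of-Agents (MoA): propose, then aggregate.
def query_model(model_name: str, prompt: str) -> str:
    """Hypothetical helper: send `prompt` to `model_name` and return its reply."""
    raise NotImplementedError

PROPOSERS = ["oss-model-a", "oss-model-b", "oss-model-c"]  # placeholder names
AGGREGATOR = "oss-model-d"

def mixture_of_agents(question: str) -> str:
    proposals = [query_model(m, question) for m in PROPOSERS]
    aggregation_prompt = (
        "Synthesize a single, high-quality answer from these candidate responses:\n\n"
        + "\n\n".join(f"Response {i + 1}: {p}" for i, p in enumerate(proposals))
        + f"\n\nQuestion: {question}"
    )
    return query_model(AGGREGATOR, aggregation_prompt)
```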
An AI search engine inspired by Perplexity
A framework for evaluating function calls made by LLMs
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
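At its core, evaluating a function call means comparing the call a model emits (function name plus arguments) against a ground-truth call. A generic sketch follows; the tool name and fields are illustrative, not the benchmark's exact format.

```python
# Generic sketch: check a model's emitted function call against the expected call.
import json

expected = {"name": "get_weather", "arguments": {"city": "Bangkok", "unit": "celsius"}}
model_output = '{"name": "get_weather", "arguments": {"city": "Bangkok", "unit": "celsius"}}'

def call_matches(raw: str, expected_call: dict) -> bool:
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # malformed output counts as a failed call
    return (
        call.get("name") == expected_call["name"]
        and call.get("arguments") == expected_call["arguments"]
    )

print(call_matches(model_output, expected))  # True
```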
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
[ACL 2024 Demo] SeaLLMs - Large Language Models for Southeast Asia
Contrastive Chain-of-Thought Prompting
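Contrastive chain-of-thought prompts pair a valid reasoning chain with an intentionally flawed one, so the model sees both what to do and what to avoid. The template below is illustrative, not the paper's exact wording.

```python
# Illustrative contrastive chain-of-thought prompt template.
prompt = """Q: A shop sells pens at 5 baht each. How much do 12 pens cost?

Correct reasoning: 12 pens × 5 baht = 60 baht. Answer: 60 baht.

Incorrect reasoning (avoid this): 12 + 5 = 17, so the answer is 17 baht.
This is wrong because the quantities should be multiplied, not added.

Q: A box holds 8 apples. How many apples are in 7 boxes?
Correct reasoning:"""
```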
Evaluate the accuracy of LLM-generated outputs
The Triton TensorRT-LLM Backend
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
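A hedged sketch of the high-level Python API, assuming the `tensorrt_llm.LLM` entry point; the model name is illustrative and engine building happens under the hood on first use.

```python
# Sketch: generate text with TensorRT-LLM's high-level LLM API.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

for output in llm.generate(["Explain KV caching in one sentence."], sampling_params):
    print(output.outputs[0].text)
```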
Fast and memory-efficient exact attention
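A minimal sketch of calling the fused kernel directly through flash-attn's Python API; inputs must be fp16/bf16 tensors on a CUDA device with shape (batch, seqlen, num_heads, head_dim).

```python
# Call FlashAttention's fused exact-attention kernel directly.
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)  # exact attention, memory-efficient
```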
GitHub Action for advanced repository traffic analysis and reporting