- Vultureprime
- Bangkok, Thailand
- @KMatiDev1
Stars
Fast and memory-efficient exact attention
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
An Aspiring Drop-In Replacement for NumPy at Scale
The Triton TensorRT-LLM Backend
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…
GitHub Action for advanced repository traffic analysis and reporting
Contrastive Chain-of-Thought Prompting
A framework for evaluating function calls made by LLMs