Stars
A Gradio web UI for Large Language Models with support for multiple inference backends.
This repo provides TensorFlow libraries and extended build tutorials for components that require compilation, as well as pre-compiled wheel files.
Arena-Hard-Auto: An automatic LLM benchmark.
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2…
A benchmark for emotional intelligence in large language models
RuLES: a benchmark for evaluating rule-following in language models
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
An Analytical Evaluation Board of Multi-turn LLM Agents
Minimalistic large language model 3D-parallelism training
[EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
The FunctionChain is a tool that simplifies and organizes the process of invoking OpenAI functions in your Node.js applications. With this toolkit, you can easily scaffold out and isolate all the O…
🛠 OpenAI function-calling tools for JS/TS
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast inference engine for Transformer models