Stars
DAMO-ConvAI: the official codebase for Alibaba DAMO Conversational AI.
A validation and profiling tool for AI infrastructure
LongRoPE is a novel method that extends the context window of pre-trained LLMs to 2048k tokens.
A library for advanced large language model reasoning
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
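A minimal sketch of the breadth-first search loop behind Tree of Thoughts, not the repo's actual API; `propose` and `evaluate` are hypothetical stand-ins for LLM calls that suggest next steps and score partial solutions:

```python
# Minimal sketch of a Tree-of-Thoughts search loop (BFS/beam variant).
# `propose(problem, thought)` and `evaluate(problem, thought)` are
# hypothetical stand-ins for LLM proposal and scoring calls.
def tree_of_thoughts(problem, propose, evaluate, breadth=5, depth=3):
    frontier = [""]  # partial solutions ("thoughts") explored so far
    for _ in range(depth):
        # Expand every partial solution with several candidate next steps.
        candidates = [t + step for t in frontier for step in propose(problem, t)]
        # Keep only the `breadth` most promising candidates (beam search).
        candidates.sort(key=lambda t: evaluate(problem, t), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]  # highest-scoring chain of thoughts
```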
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools so that you can focus on what matters.
Build high-quality LLM apps - from prototyping and testing to production deployment and monitoring.
Awesome-LLM-Prompt-Optimization: a curated list of advanced prompt optimization and tuning methods in Large Language Models
A library for prompt engineering and optimization (SAMMO = Structure-aware Multi-Objective Metaprompt Optimization)
DSPy: The framework for programming—not prompting—foundation models
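A minimal usage sketch, assuming a recent DSPy version and a configured OpenAI key; the model name is only an example:

```python
import dspy

# Configure the language model backend (model name is an example).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare *what* the program should do via a signature;
# DSPy builds and optimizes the actual prompt.
qa = dspy.Predict("question -> answer")
print(qa(question="What is the capital of France?").answer)
```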
Must-read papers on Large Language Models (LLMs) as optimizers and on automatic optimization for prompting LLMs.
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Extract full next-token probabilities via language model APIs
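A hedged sketch of the logit-bias trick such tools build on: binary-search the smallest bias that makes a token win greedy decoding, which reveals its logit gap to the top token. `query(prompt, logit_bias)` is a hypothetical wrapper returning the API's argmax token; the repo's real interface differs:

```python
# Recover a token's logit relative to the top token from an API that only
# returns the greedily sampled token. Assumes temperature-0 decoding.
# `query(prompt, logit_bias)` is a hypothetical API wrapper.
def relative_logit(query, prompt, token_id, lo=0.0, hi=100.0, iters=20):
    # At the threshold bias b*, logit(token) + b* == logit(top_token),
    # so the logit gap is exactly -b*.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if query(prompt, {token_id: mid}) == token_id:
            hi = mid  # bias suffices; try a smaller one
        else:
            lo = mid  # not enough bias yet
    return -hi  # approximate logit(token) - logit(top_token)
```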
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
A high-throughput and memory-efficient inference and serving engine for LLMs
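A minimal offline-inference sketch with vLLM; the model name is an example and needs a GPU with enough memory:

```python
from vllm import LLM, SamplingParams

# Load a model for offline batch inference (model name is an example).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)

for out in llm.generate(["Explain paged attention in one sentence."], params):
    print(out.outputs[0].text)
```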
LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath
An LLM-powered interactive coding assistant for data scientists and machine learning developers.
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
This is a collection of our research on efficient AI, covering hardware-aware NAS and model compression.
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
WebGLM: An Efficient Web-enhanced Question Answering System (KDD 2023)
A framework for few-shot evaluation of language models.
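A minimal sketch using the harness's Python entry point; the model and task are examples, and exact argument names may differ across harness versions:

```python
import lm_eval

# Evaluate a small HF model on one task (model/task are examples).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["lambada_openai"],
    num_fewshot=0,
)
print(results["results"])
```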
Fast and memory-efficient exact attention
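A minimal sketch of calling FlashAttention directly; it requires a CUDA GPU and fp16/bf16 tensors:

```python
import torch
from flash_attn import flash_attn_func

# Tensor shapes are (batch, seqlen, num_heads, head_dim).
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact attention computed without materializing the full attention matrix.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```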
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
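A minimal sketch with loralib: swap a dense layer for its LoRA counterpart, then train only the low-rank update:

```python
import torch.nn as nn
import loralib as lora

# A LoRA linear layer: the full weight W plus trainable low-rank
# matrices A and B of rank r (layer sizes are examples).
layer = lora.Linear(768, 768, r=16)
model = nn.Sequential(layer)

# Freeze every parameter except the LoRA A/B matrices.
lora.mark_only_lora_as_trainable(model)
```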
Awesome LLM compression research papers and tools.
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters