tawandakembo/awesome-local-ai

If you tried Jan Desktop and liked it, please also check out the following awesome collection of open source and/or local AI tools and solutions.

Your contributions are always welcome!

Lists

Inference Engine

| Repository | Description | Supported model formats | CPU/GPU Support | UI language | Platform Type |
|---|---|---|---|---|---|
| llama.cpp | Inference of LLaMA model in pure C/C++ | GGML/GGUF | Both | C/C++ | Text-Gen |
| Nitro | 3MB inference engine embeddable in your apps. Uses llama.cpp and more | Both | Both | | Text-Gen |
| ollama | CLI and local server. Uses llama.cpp | Both | Both | | Text-Gen |
| koboldcpp | A simple one-file way to run various GGML models with KoboldAI's UI | GGML | Both | C/C++ | Text-Gen |
| LoLLMS | Lord of Large Language Models Web User Interface. | Nearly ALL | Both | Python | Text-Gen |
| ExLlama | A more memory-efficient rewrite of the HF transformers implementation of Llama | AutoGPTQ/GPTQ | GPU | Python/C++ | Text-Gen |
| vLLM | vLLM is a fast and easy-to-use library for LLM inference and serving. | GGML/GGUF | Both | Python | Text-Gen |
| SGLang | 3-5x higher throughput than vLLM (Control flow, RadixAttention, KV cache reuse) | Safetensors / AWQ / GPTQ | GPU | Python | Text-Gen |
| LMDeploy | LMDeploy is a toolkit for compressing, deploying, and serving LLMs. | PyTorch / Turbomind | Both | Python/C++ | Text-Gen |
| TensorRT-LLM | Inference efficiently on NVIDIA GPUs | Python / C++ runtimes | Both | Python/C++ | Text-Gen |
| CTransformers | Python bindings for the Transformer models implemented in C/C++ using GGML library | GGML/GPTQ | Both | C/C++ | Text-Gen |
| llama-cpp-python | Python bindings for llama.cpp | GGUF | Both | Python | Text-Gen |
| llama2.rs | A fast llama2 decoder in pure Rust | GPTQ | CPU | Rust | Text-Gen |
| ExLlamaV2 | A fast inference library for running LLMs locally on modern consumer-class GPUs | GPTQ/EXL2 | GPU | Python/C++ | Text-Gen |
| LoRAX | Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs | Safetensors / AWQ / GPTQ | GPU | Python/Rust | Text-Gen |
| text-generation-inference | Inference serving toolbox with optimized kernels for each LLM architecture | Safetensors / AWQ / GPTQ | Both | Python/Rust | Text-Gen |

Inference UI

  • oobabooga - A Gradio web UI for Large Language Models.
  • LM Studio - Discover, download, and run local LLMs.
  • LocalAI - LocalAI is a drop-in replacement REST API that’s compatible with OpenAI API specifications for local inferencing.
  • FireworksAI - Experience the world's fastest LLM inference platform; deploy your own at no additional cost.
  • faraday.dev - Chat with AI characters offline. Runs locally, zero-configuration.
  • GPT4All - A free-to-use, locally running, privacy-aware chatbot.
  • LLMFarm - Llama and other large language models on iOS and macOS, offline, using the GGML library.
  • LlamaChat - LlamaChat allows you to chat with LLaMA, Alpaca and GPT4All models, all running locally on your Mac.
  • LLM as a Chatbot Service - LLM as a Chatbot Service.
  • FuLLMetalAi - Fullmetal.Ai is a distributed network of self-hosted Large Language Models (LLMs).
  • Automatic1111 - Stable Diffusion web UI.
  • ComfyUI - A powerful and modular stable diffusion GUI with a graph/nodes interface.
  • Wordflow - Run, share, and discover AI prompts in your browser.
  • petals - Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading.
  • ChatUI - Open source codebase powering the HuggingChat app.
  • AI-Mask - Browser extension to provide model inference to web apps. Backed by web-llm and transformers.js.
  • everything-rag - Interact with (virtually) any LLM on Hugging Face Hub with an easy-to-use, 100% local Gradio chatbot.
  • LmScript - UI for SGLang and Outlines.

Platforms / full solutions

  • H2OAI - H2OGPT. The fastest, most accurate AI Cloud Platform.
  • BentoML - BentoML is a framework for building reliable, scalable, and cost-efficient AI applications.
  • Predibase - Serverless LoRA Fine-Tuning and Serving for LLMs.

Developer tools

  • Jan Framework - At its core, Jan is a cross-platform, local-first and AI native application framework that can be used to build anything.
  • Pinecone - Long-Term Memory for AI.
  • PoplarML - PoplarML enables the deployment of production-ready, scalable ML systems with minimal engineering effort.
  • Datature - The All-in-One Platform to Build and Deploy Vision AI.
  • One AI - Making generative AI business-ready.
  • Gooey.AI - Create Your Own No Code AI Workflows.
  • Mixo.io - AI website builder.
  • Safurai - AI Code Assistant that saves you time in changing, optimizing, and searching code.
  • GitFluence - The AI-driven solution that helps you quickly find the right command. Get started with Git Command Generator today and save time.
  • Haystack - A framework for building NLP applications (e.g. agents, semantic search, question-answering) with language models.
  • LangChain - A framework for developing applications powered by language models.
  • gpt4all - A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue.
  • LMQL - LMQL is a query language for large language models.
  • LlamaIndex - A data framework for building LLM applications over external data.
  • Phoenix - Open-source tool for ML observability that runs in your notebook environment, by Arize. Monitor and fine tune LLM, CV and tabular models.
  • trypromptly - Create AI Apps & Chatbots in Minutes.
  • BentoML - BentoML is the platform for software engineers to build AI products.
  • LiteLLM - Call all LLM APIs using the OpenAI format.

User Tools

  • llmcord.py - Discord LLM Chatbot - Talk to LLMs with your friends!

Agents

  • SuperAGI - Open-source AGI infrastructure.
  • Auto-GPT - An experimental open-source attempt to make GPT-4 fully autonomous.
  • BabyAGI - Baby AGI is an autonomous AI agent developed using Python that operates through OpenAI and Pinecone APIs.
  • AgentGPT - Assemble, configure, and deploy autonomous AI Agents in your browser.
  • HyperWrite - HyperWrite helps you work smarter, faster, and with ease.
  • AI Agents - AI agents that power up your productivity.
  • AgentRunner.ai - Leverage the power of GPT-4 to create and train fully autonomous AI agents.
  • GPT Engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
  • GPT Prompt Engineer - Automated prompt engineering. It generates, tests, and ranks prompts to find the best ones.
  • MetaGPT - The Multi-Agent Framework: Given one line requirement, return PRD, design, tasks, repo.
  • Open Interpreter - Let language models run code. Have your agent write and execute code.
  • CrewAI - Cutting-edge framework for orchestrating role-playing, autonomous AI agents.

Training

  • FastChat - An open platform for training, serving, and evaluating large language models.
  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • BMTrain - Efficient Training for Big Models.
  • Alpa - Alpa is a system for training and serving large-scale neural networks.
  • Megatron-LM - Ongoing research training transformer models at scale.
  • Ludwig - Low-code framework for building custom LLMs, neural networks, and other AI models.
  • Nanotron - Minimalistic large language model 3D-parallelism training.
  • TRL - Language model alignment with reinforcement learning.
  • PEFT - Parameter-efficient fine-tuning (LoRA, DoRA, model merger and more).

LLM Leaderboard

Research

  • Attention Is All You Need (2017): Presents the original transformer model. It helps with sequence-to-sequence tasks, such as machine translation. [Paper]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018): Helps with language modeling and prediction tasks. [Paper]
  • FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (2022): An exact attention algorithm that makes transformers faster and more memory-efficient. [Paper]
  • Improving Language Understanding by Generative Pre-Training (2018): OpenAI's paper on GPT. [Paper]
  • Cramming: Training a Language Model on a Single GPU in One Day (2022): Focuses on getting the best possible performance out of minimal computing power. [Paper]
  • LaMDA: Language Models for Dialog Applications (2022): LaMDA is a family of Transformer-based neural language models by Google. [paper]
  • Training language models to follow instructions with human feedback (2022): Use human feedback to align LLMs. [paper]
  • TurboTransformers: An Efficient GPU Serving System For Transformer Models (PPoPP'21) [paper]
  • Fast Distributed Inference Serving for Large Language Models (arXiv'23) [paper]
  • An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs (arXiv'23) [paper]
  • Accelerating LLM Inference with Staged Speculative Decoding (arXiv'23) [paper]
  • ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (SC'20) [Paper]
  • TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition (2023) [Paper]

Community
