A list of free LLM inference resources accessible via API.

cheahjs/free-llm-api-resources

Free LLM API resources

This is a list of services that provide free access to, or credits towards, API-based LLM usage.

Note

Please don't abuse these services; otherwise we might lose them.

Warning

This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

Free Providers

| Provider | Provider Limits/Notes | Model Name | Model Limits |
| --- | --- | --- | --- |
| Groq | | Distil Whisper Large v3 | 7200 audio-seconds/minute, 2000 requests/day |
| | | Gemma 2 9B Instruct | 14400 requests/day, 15000 tokens/minute |
| | | Gemma 7B Instruct | 14400 requests/day, 15000 tokens/minute |
| | | LLaVA 1.5 7B | 14400 requests/day, 30000 tokens/minute |
| | | Llama 3 70B | 14400 requests/day, 6000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14400 requests/day, 15000 tokens/minute |
| | | Llama 3 8B | 14400 requests/day, 30000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14400 requests/day, 15000 tokens/minute |
| | | Llama 3.1 70B | 14400 requests/day, 20000 tokens/minute |
| | | Llama 3.1 8B | 14400 requests/day, 20000 tokens/minute |
| | | Llama 3.2 11B (Text Only) | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 11B Vision | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 1B | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 3B | 7000 requests/day, 7000 tokens/minute |
| | | Llama 3.2 90B (Text Only) | 7000 requests/day, 7000 tokens/minute |
| | | Llama Guard 3 8B | 14400 requests/day, 15000 tokens/minute |
| | | Mixtral 8x7B | 14400 requests/day, 5000 tokens/minute |
| | | Whisper Large v3 | 7200 audio-seconds/minute, 2000 requests/day |
| OpenRouter | | Gemma 2 9B Instruct | 20 requests/minute, 200 requests/day |
| | | Hermes 3 Llama 3.1 405B | 20 requests/minute, 200 requests/day |
| | | Liquid LFM 40B | 20 requests/minute, 200 requests/day |
| | | Llama 3 8B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.1 405B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.1 70B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.1 8B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.2 11B Vision Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.2 1B Instruct | 20 requests/minute, 200 requests/day |
| | | Llama 3.2 3B Instruct | 20 requests/minute, 200 requests/day |
| | | Mistral 7B Instruct | 20 requests/minute, 200 requests/day |
| | | Mythomist 7B | 20 requests/minute, 200 requests/day |
| | | OpenChat 7B | 20 requests/minute, 200 requests/day |
| | | Phi-3 Medium 128k Instruct | 20 requests/minute, 200 requests/day |
| | | Phi-3 Mini 128k Instruct | 20 requests/minute, 200 requests/day |
| | | Qwen 2 7B Instruct | 20 requests/minute, 200 requests/day |
| | | Toppy M 7B | 20 requests/minute, 200 requests/day |
| | | Zephyr 7B Beta | 20 requests/minute, 200 requests/day |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 1.5 Flash | 1000000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1000000 tokens/minute, 1500 requests/day, 5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1000000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1000000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Pro | 32000 tokens/minute, 50 requests/day, 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1000000 tokens/minute, 50 requests/day, 2 requests/minute |
| | | Gemini 1.0 Pro | 32000 tokens/minute, 1500 requests/day, 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute, 1500 requests/minute, 100 content/batch |
| | | embedding-001 | |
| Lambda Labs (Free Preview) | Free for a limited time | Nous Hermes 3 Llama 3.1 405B (FP8) | |
| | | Liquid LFM 40B | |
| Mistral (La Plateforme) | Free tier (Experiment plan) requires opting into data training. | Open and Proprietary Mistral models | 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month |
| Mistral (Codestral) | Currently free to use, monthly subscription based, requires phone number verification. | Codestral | 30 requests/minute, 2000 requests/day |
| HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 50 requests/hour (with an account) |
| SambaNova Cloud | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
| Cerebras | Waitlist. Free tier restricted to 8K context. | Llama 3.1 8B | 30 requests/minute, 60000 tokens/minute; 900 requests/hour, 1000000 tokens/hour; 14400 requests/day, 1000000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute, 60000 tokens/minute; 900 requests/hour, 1000000 tokens/hour; 14400 requests/day, 1000000 tokens/day |
| GitHub Models | Waitlist. Rate limits dependent on Copilot subscription tier. | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
| OVH AI Endpoints (Free Alpha) | Token expires every 2 weeks. | CodeLlama 13B Instruct | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Cloudflare Workers AI | 10000 tokens/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | Discolm German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Together | | Llama 3.2 11B Vision Instruct | Free for 2024 |
| Cohere | 20 requests/minute, 1000 requests/month | Command-R | Shared Limit |
| | | Command-R+ | |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 405B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview. 30 requests/minute |
| | | Gemini Flash Experimental | Experimental Gemini model. 10 requests/minute |
| | | Gemini Pro Experimental | |
| glhf.chat (Free Beta) | Email for API access | Any model on Hugging Face that is runnable on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | |
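
Most of the limits listed above are request counts per time window (for example, 20 requests/minute together with 200 requests/day). One way to stay within them is to throttle on the client side before sending anything. The following is a minimal sketch of a sliding-window limiter, not any provider's official client: the class name `SlidingWindowLimiter` and its methods are hypothetical, it counts requests only (not tokens or audio-seconds), and the provider's own enforcement remains authoritative.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter enforcing several (max_requests, window_seconds) rules at once."""

    def __init__(self, rules, clock=time.monotonic):
        # rules: iterable of (max_requests, window_seconds) pairs,
        # e.g. [(20, 60), (200, 86400)] for "20 requests/minute, 200 requests/day".
        self.rules = [(int(n), float(w)) for n, w in rules]
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.longest = max(w for _, w in self.rules)
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def wait_time(self):
        """Return the seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Requests older than the longest window no longer affect any rule.
        while self.sent and now - self.sent[0] >= self.longest:
            self.sent.popleft()
        delay = 0.0
        for max_requests, window in self.rules:
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= max_requests:
                # A slot frees up once the oldest of the last max_requests
                # accepted requests falls out of this window.
                release = recent[-max_requests] + window
                delay = max(delay, release - now)
        return delay

    def try_acquire(self):
        """Record a request and return True if it is allowed now, else return False."""
        if self.wait_time() > 0:
            return False
        self.sent.append(self.clock())
        return True


# Example: a limiter for "20 requests/minute, 200 requests/day".
example_limiter = SlidingWindowLimiter([(20, 60), (200, 24 * 60 * 60)])
```

A caller would check `try_acquire()` before each API call and, when it returns `False`, sleep for `wait_time()` seconds before retrying. Keeping every rule in one object means the per-minute and per-day caps are checked together, so a burst that fits the minute limit cannot silently exhaust the daily one.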

Providers with trial credits

| Provider | Credits | Requirements | Models |
| --- | --- | --- | --- |
| Together | $5 | | Various open models |
| Fireworks | $1 | | Various open models |
| Unify | $10 (+$40 for getting in contact) | | Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
| DeepInfra | $1.80 | | Various open models |
| NVIDIA NIM | 1000 API calls | | Various open models |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| NLP Cloud | $15 | Phone number verification | Various open models |
| Upstage | $10 for 3 months | | Solar Pro/Mini |
| Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Llama 3.2 3B Instruct |
| | | | Llama 3.2 90B Vision |
| | | | Llama 3.2 90B Vision Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen2-VL 72B Instruct |
| | | | Qwen2-VL 7B Instruct |
| | | | Qwen2.5 72B Instruct |