forked from BerriAI/litellm

Call all LLM APIs using the OpenAI format. Use Azure, OpenAI, Cohere, Anthropic, Ollama, VLLM, Sagemaker, HuggingFace, Replicate (100+ LLMs)


JaeDukSeo/litellm

This branch is 17376 commits behind BerriAI/litellm:main.

Latest commit: 2bd9b4a · Oct 21, 2023

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]

Schedule Demo · Feature Request

Docs · 100+ Supported Models · Demo Video

LiteLLM manages:

  • Translating inputs to the provider's completion and embedding endpoints
  • Guaranteeing consistent output: text responses are always available at ['choices'][0]['message']['content']
  • Exception mapping: common exceptions across providers are mapped to the OpenAI exception types
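As a sketch of what that consistent shape looks like (a mocked response dict here, not a live call), the same access path works no matter which provider served the request:

```python
# A mocked completion response in the OpenAI format. With LiteLLM, every
# provider's response exposes the text at the same path, so downstream code
# never needs to branch on the provider.
mock_response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "Hello! I'm doing well."},
            "finish_reason": "stop",
            "index": 0,
        }
    ],
    "model": "command-nightly",
    "usage": {"prompt_tokens": 7, "completion_tokens": 8, "total_tokens": 15},
}

# The guaranteed access path:
text = mock_response["choices"][0]["message"]["content"]
print(text)  # Hello! I'm doing well.
```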

10/05/2023: LiteLLM is adopting Semantic Versioning for all commits. Learn more
10/16/2023: Self-hosted OpenAI-proxy server Learn more

Usage

Open In Colab
pip install litellm
from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)

Streaming (Docs)

liteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in the response. Streaming is supported for OpenAI, Azure, Anthropic, and Huggingface models.

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2 call
result = completion(model="claude-2", messages=messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
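Each streamed chunk carries a partial delta. A common pattern, sketched here over mocked chunks since a live stream needs an API key, is to concatenate the delta contents into the full reply:

```python
# Mocked streaming chunks in the OpenAI delta format; a real iterator from
# completion(..., stream=True) yields chunks with the same shape.
chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {}}]},  # the final chunk may carry no content
]

parts = []
for chunk in chunks:
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        parts.append(delta["content"])

full_reply = "".join(parts)
print(full_reply)  # Hello, world
```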

OpenAI Proxy Server (Docs)

Create an OpenAI API-compatible server to call any non-OpenAI model (e.g. Huggingface, TogetherAI, Ollama, etc.)

This works for async + streaming as well.

litellm --model <model_name>

#INFO: litellm proxy running on http://0.0.0.0:8000

Running your model locally or on a custom endpoint? Set the --api-base parameter (see how)
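Once the proxy is up, any OpenAI-format client can talk to it. A minimal sketch of the request body, assuming the proxy's OpenAI-compatible /chat/completions route; the actual POST is commented out since it needs a running server:

```python
import json

# OpenAI-format request body; the proxy translates it for the underlying model.
# The model name here is a placeholder - the proxy routes to whatever --model
# you started it with.
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
}

body = json.dumps(payload)

# With the proxy running on http://0.0.0.0:8000, you would send it like:
# import urllib.request
# req = urllib.request.Request(
#     "http://0.0.0.0:8000/chat/completions",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read())

print(body)
```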

Self-host server (Docs)

  1. Clone the repo
git clone https://github.com/BerriAI/litellm.git
  2. Modify template_secrets.toml
[keys]
OPENAI_API_KEY="sk-..."

[general]
default_model = "gpt-3.5-turbo"
  3. Deploy
docker build -t litellm . && docker run -p 8000:8000 litellm

Supported Providers (Docs)

LiteLLM supports completion, streaming, async completion, and async streaming across the following providers (see the docs for the per-provider support matrix):

  • openai
  • cohere
  • anthropic
  • replicate
  • huggingface
  • together_ai
  • openrouter
  • vertex_ai
  • palm
  • ai21
  • baseten
  • azure
  • sagemaker
  • bedrock
  • vllm
  • nlp_cloud
  • aleph alpha
  • petals
  • ollama
  • deepinfra

Read the Docs

Logging Observability - Log LLM Input/Output (Docs)

LiteLLM exposes pre-defined callbacks to send data to LLMonitor, Langfuse, Helicone, Promptlayer, Traceloop, and Slack

import os

import litellm
from litellm import completion

## set env variables for logging tools
os.environ["PROMPTLAYER_API_KEY"] = "your-promptlayer-key"
os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["promptlayer", "llmonitor"] # log input/output to promptlayer, llmonitor

# openai call
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
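Beyond the built-in integrations, litellm also accepts plain Python functions as success callbacks (the docs describe the signature as kwargs, completion_response, start_time, end_time). A sketch, invoked here with mocked data rather than a live call:

```python
import datetime

logged = []

def log_latency(kwargs, completion_response, start_time, end_time):
    # Runs after every successful completion; here we record model + latency.
    logged.append({
        "model": kwargs.get("model"),
        "latency_s": (end_time - start_time).total_seconds(),
    })

# Registering it would look like:
#   litellm.success_callback = [log_latency]
# Below we invoke it directly with mocked arguments to show the shape.
start = datetime.datetime(2023, 10, 21, 12, 0, 0)
end = datetime.datetime(2023, 10, 21, 12, 0, 2)
log_latency({"model": "gpt-3.5-turbo"}, {"choices": []}, start, end)

print(logged[0])  # {'model': 'gpt-3.5-turbo', 'latency_s': 2.0}
```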

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally:

Step 1: Clone the repo

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .

Step 4: Submit a PR with your changes! 🚀

  • push your fork to your GitHub repo
  • submit a PR from there

Support / talk with founders

Why did we build this

  • Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

Contributors
