Use any LLM as a drop-in replacement for GPT. Works with Azure, OpenAI, Cohere, Anthropic, Ollama, vLLM, SageMaker, Hugging Face, Replicate (100+ LLMs)

πŸš… LiteLLM

Call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]


LiteLLM manages:

  • Translating inputs to the provider's completion and embedding endpoints
  • Consistent output: text responses are always available at ['choices'][0]['message']['content']
  • Exception mapping: common exceptions across providers are mapped to the OpenAI exception types
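As a small illustration of that output guarantee (using mock response data, not a live call), the reply text can always be read from the same path:

```python
# Mock response in the OpenAI format that LiteLLM normalizes every provider to.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hi! I'm doing well."}}
    ]
}

# Regardless of provider, the reply text lives at the same path:
text = response["choices"][0]["message"]["content"]
print(text)  # Hi! I'm doing well.
```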

Usage

pip install litellm
from litellm import completion
import os

## set ENV variables 
os.environ["OPENAI_API_KEY"] = "your-openai-key" 
os.environ["COHERE_API_KEY"] = "your-cohere-key" 

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)
print(response)

Streaming

LiteLLM supports streaming the model response back: pass stream=True to get a streaming iterator in the response. Streaming is supported for OpenAI, Azure, Anthropic, and Hugging Face models.

response = completion(model="gpt-3.5-turbo", messages=messages, stream=True)
for chunk in response:
    print(chunk['choices'][0]['delta'])

# claude 2
result = completion("claude-2", messages, stream=True)
for chunk in result:
    print(chunk['choices'][0]['delta'])
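A common follow-up is reassembling the streamed chunks into a single string. A minimal sketch, using mock chunks in the delta format shown above:

```python
# Mock streamed chunks mirroring the OpenAI delta format LiteLLM returns.
chunks = [
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {}}]},  # a final chunk may carry no content
]

# Concatenate the content of each delta, skipping empty ones.
full_text = "".join(
    chunk["choices"][0]["delta"].get("content", "") for chunk in chunks
)
print(full_text)  # Hello, world
```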

OpenAI Proxy Server

Spin up a local server that translates OpenAI API calls to any non-OpenAI model (e.g. Hugging Face, TogetherAI, Ollama, etc.)

This works for async + streaming as well.

litellm --model <model_name>

Running your model locally or on a custom endpoint? Set the --api-base parameter.
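For example, a client can talk to the proxy with nothing but the standard library. This is a hedged sketch: the URL, port (8000), and /chat/completions path are assumptions; check your proxy's startup output for the real address.

```python
import json
import urllib.request

# Assumed proxy address; adjust to what `litellm --model ...` prints on startup.
PROXY_URL = "http://0.0.0.0:8000/chat/completions"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request for the local proxy."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the proxy running, send it like so:
# with urllib.request.urlopen(build_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```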

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally: Step 1: Clone the repo

git clone https://github.com/BerriAI/litellm.git

Step 2: Navigate into the project, and install dependencies:

cd litellm
poetry install

Step 3: Test your change:

cd litellm/tests # pwd: Documents/litellm/litellm/tests
pytest .

Step 4: Submit a PR with your changes! πŸš€

  • Push your fork to your GitHub repo
  • Submit a PR from there

Learn more about how to make a PR

Support / talk with founders

Why did we build this

  • Need for simplicity: Our code started to get extremely complicated managing and translating calls between Azure, OpenAI, and Cohere

Contributors
