My Little GPT is a simple AI assistant that can use open-source models running locally on your computer, the Anthropic API, or the OpenAI API.
Check out the hosted version of My Little GPT here, deployed directly from the `hosted` branch of this repository.
*(Preview video: my-little-gpt-preview.mp4)*
I used to use ChatGPT through its premium subscription. Eventually I decided that I didn't want to keep spending $20 a month on it. However, I still value consistent access to GPT-4 for work and whatnot, so I decided to look into self-hosting.
There are some great open-source repos for this, but I was looking for a codebase that is only as complex as I need it to be, with a chat app that has the mobile UX I want. I ultimately decided to make my own end-to-end stack: my own little GPT.
I've open-sourced the codebase and written an installation guide to make it as easy as possible for others who are interested in self-hosting. I use My Little GPT to talk to open-source models locally on my computer for free, or to pay only for my usage of hosted model APIs.
Right now, My Little GPT supports a minimal, straightforward chat experience. Over time, I may continue to add new features to the app.
The links below go to the hosted version of My Little GPT, but the same paths work for your local version.
- Create an account (`/create-account`) or log in (`/login`)
- Go to settings (`/settings`) and enter at least one of the following:
  - Local Base URL: The base URL of any OpenAI-compatible server you want to use
    - If running locally, set this value to:
      - MacOS (Apple Silicon), or any other non-Docker setup: `http://localhost:8000/v1`
      - CPU (Docker): `http://llama-cpp-cpu:8000/v1`
      - NVIDIA GPU (Docker): `http://llama-cpp-cuda:8000/v1`
  - Anthropic API Key: An API key to use for the Anthropic API
    - Required to use any Anthropic models
  - OpenAI API Key: An API key to use for the OpenAI API
    - Required to use any OpenAI models
- Go to the chat page (`/chat`), select a model from the model picker in the top navbar, and send your first message!
  - If chatting with a local model, the first message sent after starting the inference server or switching local models may take up to a few minutes to process while the model loads into memory
  - A chat title is automatically generated by the model you are sending a message to (title quality may vary)
  - Feel free to change themes using the theme picker in the sidebar (themes provided by DaisyUI)
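To see what "OpenAI-compatible" means for the Local Base URL: the app talks to that server through the standard chat completions interface. Below is a minimal sketch of the endpoint and request body such a server expects, using the non-Docker base URL and the pre-configured `llama-3.1-small` alias from above (this only constructs the request for illustration; it does not send it):

```python
import json

# Non-Docker Local Base URL from the settings above
base_url = "http://localhost:8000/v1"

# Any OpenAI-compatible server exposes chat completions at this path
endpoint = base_url.rstrip("/") + "/chat/completions"

# Minimal request body: a model alias and a list of messages
payload = {
    "model": "llama-3.1-small",  # pre-configured local model alias
    "messages": [{"role": "user", "content": "Hello!"}],
}

print(endpoint)  # http://localhost:8000/v1/chat/completions
print(json.dumps(payload))
```

Any server that accepts this shape of request (llama.cpp's server, vLLM, and so on) should work as a Local Base URL.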
It is straightforward to add support for any provider supported by the Vercel AI SDK. Right now, My Little GPT supports the following:
- Anthropic: `claude-3-5-sonnet-20240620`, `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, `claude-3-haiku-20240307`
- OpenAI: `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`
The inference server can use any models compatible with llama.cpp, and comes configured with the following models:
- Meta Llama 3.1 8B Instruct
  - `llama-3.1-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `llama-3.1`: Quantized (`q8_0`) to be less than 9GB in size
- Mistral 7B Instruct v0.3
  - `mistral-7b-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `mistral-7b`: Quantized (`q8_0`) to be less than 8GB in size
- Qwen2 7B Instruct
  - `qwen2-7b-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `qwen2-7b`: Quantized (`q8_0`) to be less than 9GB in size
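The file sizes follow directly from the quantization. As a rough rule of thumb (approximate figures; actual GGUF files also include embeddings and metadata), llama.cpp's `q8_0` costs about 8.5 bits per weight and `q4_k_m` about 4.8. A quick sanity check for an 8B-parameter model:

```python
# Approximate bits per weight for llama.cpp quantization formats
BITS_PER_WEIGHT = {"q8_0": 8.5, "q4_k_m": 4.8}

params = 8.03e9  # Llama 3.1 8B parameter count (approximate)

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = params * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{quant}: ~{gb:.1f}GB")
```

This lines up with the "less than 9GB" and "less than 5GB" figures above for the 8B and 7B models.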
Edit the configuration file at `apps/llama-cpp/config.json` to add other models. Reference the llama-cpp-python docs for more info.
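For reference, entries in this file follow llama-cpp-python's multi-model server config. A hypothetical sketch of adding one model (the field names come from llama-cpp-python's server settings, but the path, alias, and chat format here are made up, and the repository's actual file may be structured differently):

```json
{
  "models": [
    {
      "model": "models/my-model-q4_k_m.gguf",
      "model_alias": "my-model-small",
      "chat_format": "chatml",
      "n_ctx": 4096,
      "n_gpu_layers": -1
    }
  ]
}
```

The `model_alias` is the name you would pick in the app's model picker, and `n_gpu_layers: -1` offloads all layers to the GPU when one is available.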
If you are just using hosted model APIs, any computer that supports Docker will probably work.
If you have at least 8GB of memory (RAM for CPU, or VRAM for GPU), you should be able to run models whose quantized versions are less than 8GB in size. See here for more information on the models that come pre-configured.
For higher inference speeds, an NVIDIA GPU or an Apple Silicon Mac (M1/M2/M3) is recommended.
- Docker Desktop
  - Installs Docker and Docker Compose along with a nice GUI
- Node.js
You can install the requirements yourself from the links above, or follow these instructions:

- Install Homebrew, then run the following in your terminal:

  ```
  brew install node@20
  ```

- Make sure your terminal shows that you are using the `base` environment, and that `python --version` prints a version of Python >= 3.10
```
git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt
```
First, set up environment variables for the chat app using `apps/web/.env.example`:

```
cp apps/web/.env.example apps/web/.env.local
```

Then install and build dependencies:

```
npm install
npm run build
```
- Open the Docker Desktop app to start the Docker daemon
- Run the following in your terminal from the root of the repository:

  ```
  npm run start
  ```
After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See the getting started section above for instructions on how to use My Little GPT.
All other platforms are supported via Docker.
- Docker Desktop
  - Installs Docker and Docker Compose along with a nice GUI
- (NVIDIA GPUs only) CUDA (12.5 supported out-of-the-box)
- (NVIDIA GPUs only) NVIDIA Container Toolkit
```
git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt
```
- Open the Docker Desktop app to start the Docker daemon
- Run the following in your terminal from the root of the repository:

  ```
  # Without a local inference server
  docker compose up

  # CPU
  docker compose --profile cpu up

  # NVIDIA GPU (CUDA)
  docker compose --profile cuda up
  ```
After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See the getting started section above for instructions on how to use My Little GPT.
By default, the API endpoint for creating a user on the backend is public. Please make sure to secure your self-hosted endpoints, or use the PocketBase admin dashboard to update the create rule for the `users` collection to Admin only (`null`).
ngrok provides one free stable domain (at the time of writing), so you can use your own local instance of My Little GPT anywhere you want, as long as you leave it running on your computer at home (I use it on my phone most of the time).
- Take note of your Authtoken
- Create your one free domain in the "Domains" tab under "Cloud Edge"
- Set up the ngrok configuration file `ngrok.yml`:

  ```
  # From the root of the repository
  cp apps/ngrok/template.yml ngrok.yml
  ```

- In `ngrok.yml`, replace `MY_AUTHTOKEN` with your Authtoken, and `MY_DOMAIN` (in two places) with the domain that ngrok generated for you
- (Optional, but highly recommended) Edit the `web` tunnel in `ngrok.yml` following the ngrok docs for securing endpoints using basic auth
  - This could look like adding the following to `ngrok.yml` under the `web` tunnel:

    ```
    basic_auth:
      - MY_USERNAME:MY_PASSWORD # Use something more secure than this
    ```

  - You would then be able to use your private instance like normal, passing basic auth credentials to the website via the URL, like this: `https://MY_USERNAME:[email protected]`
  - This prevents bad actors from abusing your public endpoints
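Putting the steps above together, the resulting `ngrok.yml` might look roughly like this. This is a sketch based on ngrok's agent config format, not the actual contents of `apps/ngrok/template.yml`; the tunnel name, `addr` value, and overall layout in the template may differ:

```yaml
version: "2"
authtoken: MY_AUTHTOKEN
tunnels:
  web:
    proto: http
    addr: 3000          # chat app port (assumed)
    domain: MY_DOMAIN   # your free ngrok domain
    basic_auth:
      - MY_USERNAME:MY_PASSWORD # Use something more secure than this
```

The important part is that `MY_AUTHTOKEN` and `MY_DOMAIN` are replaced with your real values, and that the `web` tunnel carries the `basic_auth` entry if you enabled it.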
MacOS (Apple Silicon), or any other non-Docker setup:

```
npm run ngrok:web
```

Docker:

```
# Without a local inference server
docker compose --profile ngrok-web up

# CPU
docker compose --profile cpu --profile ngrok-web up

# NVIDIA GPU (CUDA)
docker compose --profile cuda --profile ngrok-web up
```
After starting the web tunnel, you will be able to access your private instance of My Little GPT by visiting the domain generated for you by ngrok.
Contributions are welcome!
WARNING: The development workflow has only been tested on MacOS (Apple Silicon), sorry!
- `@my-little-gpt/llama-cpp`: llama-cpp-python inference server
- `@my-little-gpt/ngrok`: Helper scripts for starting ngrok tunnels via Docker
- `@my-little-gpt/pocketbase`: Backend for the chat app, made with PocketBase
- `@my-little-gpt/web`: Chat app, made with SvelteKit
- `@my-little-gpt/eslint-config`: ESLint config
- `@my-little-gpt/typescript-config`: TypeScript config
WARNING: `dev` script only tested on MacOS (Apple Silicon).

Install dependencies with `npm install`, then use the `npm run dev` command to start the following:

- `llama-cpp` server at `http://localhost:8000`
- `pocketbase` server at `http://localhost:8080`
- `web` server at `http://localhost:5173`
Edits to the codebase will trigger a "hot reload" of the web app.
WARNING: `build`, `start`, `ngrok:web`, and `ngrok:llama-cpp` scripts only tested on MacOS (Apple Silicon).

- `format`: Format the codebase using Prettier
- `lint`: Run a lint check with ESLint
- `check`: Run a type check with TypeScript
- `build`: Build and set up all apps and packages
- `build:force`: The `build` script, without using the build cache
- `start`: Start My Little GPT in production mode
- `ngrok:web`: Start My Little GPT with an ngrok tunnel to the chat app (localhost:3000)
- `ngrok:llama-cpp`: Start an ngrok tunnel to the llama-cpp server (localhost:8000)