My Little GPT is a simple AI assistant that can use open-source models running locally on your computer, the Anthropic API, or the OpenAI API.
Check out the hosted version of My Little GPT here, deployed directly from the `hosted` branch of this repository.
*(Preview video: my-little-gpt-preview.mp4)*
I used to use ChatGPT through its premium subscription. Eventually I decided that I didn't want to keep spending $20 a month on it. However, I still value consistent access to GPT-4 for work and whatnot, so I decided to look into self-hosting.
There are some great open-source repos for this, but I was looking for a codebase that is only as complex as I need it to be, with a chat app that has the mobile UX I want. I ultimately decided to make my own end-to-end stack: my own little GPT.
I've open-sourced the codebase and written an installation guide to make it as easy as possible for others who are interested in self-hosting. I use My Little GPT to talk to open-source models locally on my computer for free, or to pay only for my usage of hosted model APIs.
Right now, My Little GPT supports a minimal, straightforward chat experience. Over time, I may continue to add new features to the app.
The links below go to the hosted version of My Little GPT, but the same paths work for your local version.
- Create an account (`/create-account`) or log in (`/login`)
- Go to settings (`/settings`) and enter at least one of the following:
  - Local Base URL: The base URL of any OpenAI-compatible server you want to use
    - If running locally, set this value to:
      - MacOS (Apple Silicon), or any other non-Docker setup: `http://localhost:8000/v1`
      - CPU (Docker): `http://llama-cpp-cpu:8000/v1`
      - NVIDIA GPU (Docker): `http://llama-cpp-cuda:8000/v1`
  - Anthropic API Key: An API key to use for the Anthropic API
    - Required to use any Anthropic models
  - OpenAI API Key: An API key to use for the OpenAI API
    - Required to use any OpenAI models
- Go to the chat page (`/chat`), select a model from the model picker in the top navbar, and send your first message!
  - If chatting with a local model, the first message sent after starting the inference server or switching local models may take up to a few minutes to process while the model loads into memory
  - A chat title is automatically generated by the model you are sending a message to (title quality may vary)
  - Feel free to change themes using the theme picker in the sidebar (themes provided by DaisyUI)
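To see what "OpenAI-compatible" means for the Local Base URL: the app talks to that server through the standard chat completions interface. Below is a minimal sketch of the endpoint and request body such a server expects, using the non-Docker base URL and the pre-configured `llama-3.1-small` alias from above (this only constructs the request for illustration; it does not send it):

```python
import json

# Non-Docker Local Base URL from the settings above
base_url = "http://localhost:8000/v1"

# Any OpenAI-compatible server exposes chat completions at this path
endpoint = base_url.rstrip("/") + "/chat/completions"

# Minimal request body: a model alias and a list of messages
payload = {
    "model": "llama-3.1-small",  # pre-configured local model alias
    "messages": [{"role": "user", "content": "Hello!"}],
}

print(endpoint)  # http://localhost:8000/v1/chat/completions
print(json.dumps(payload))
```

Any server that accepts this shape of request (llama.cpp's server, vLLM, and so on) should work as a Local Base URL.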
It is straightforward to add support for any provider supported by the Vercel AI SDK. Right now, My Little GPT supports the following:
- Anthropic: `claude-3-5-sonnet-20240620`, `claude-3-opus-20240229`, `claude-3-sonnet-20240229`, `claude-3-haiku-20240307`
- OpenAI: `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`
The inference server can use any models compatible with llama.cpp, and comes configured with the following models:
- Meta Llama 3.1 8B Instruct
  - `llama-3.1-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `llama-3.1`: Quantized (`q8_0`) to be less than 9GB in size
- Mistral 7B Instruct v0.3
  - `mistral-7b-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `mistral-7b`: Quantized (`q8_0`) to be less than 8GB in size
- Qwen2 7B Instruct
  - `qwen2-7b-small`: Quantized (`q4_k_m`) to be less than 5GB in size
  - `qwen2-7b`: Quantized (`q8_0`) to be less than 9GB in size
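The file sizes follow directly from the quantization. As a rough rule of thumb (approximate figures; actual GGUF files also include embeddings and metadata), llama.cpp's `q8_0` costs about 8.5 bits per weight and `q4_k_m` about 4.8. A quick sanity check for an 8B-parameter model:

```python
# Approximate bits per weight for llama.cpp quantization formats
BITS_PER_WEIGHT = {"q8_0": 8.5, "q4_k_m": 4.8}

params = 8.03e9  # Llama 3.1 8B parameter count (approximate)

for quant, bpw in BITS_PER_WEIGHT.items():
    gb = params * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{quant}: ~{gb:.1f}GB")
```

This lines up with the "less than 9GB" and "less than 5GB" figures above for the 8B and 7B models.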
Edit the configuration file at `apps/llama-cpp/config.json` to add other models. Reference the llama-cpp-python docs for more info.
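For reference, entries in this file follow llama-cpp-python's multi-model server config. A hypothetical sketch of adding one model (the field names come from llama-cpp-python's server settings, but the path, alias, and chat format here are made up, and the repository's actual file may be structured differently):

```json
{
  "models": [
    {
      "model": "models/my-model-q4_k_m.gguf",
      "model_alias": "my-model-small",
      "chat_format": "chatml",
      "n_ctx": 4096,
      "n_gpu_layers": -1
    }
  ]
}
```

The `model_alias` is the name you would pick in the app's model picker, and `n_gpu_layers: -1` offloads all layers to the GPU when one is available.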
If you are just using hosted model APIs, any computer that supports Docker will probably work.
If you have at least 8GB of memory (RAM for CPU, or VRAM for GPU), you should be able to run models whose quantized versions are less than 8GB in size. See here for more information on the models that come pre-configured.
For higher inference speeds, an NVIDIA GPU or an Apple Silicon Mac (M1/M2/M3) is recommended.
- Docker Desktop
  - Installs Docker and Docker Compose along with a nice GUI
- Node.js
You can install the requirements yourself from the links above, or follow these instructions:

- Install Homebrew, then run the following in your terminal:

  ```
  brew install node@20
  ```

- Make sure your terminal shows that you are using the `base` environment, and that `python --version` prints a version of Python >= 3.10
```
git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt
```
First, set up environment variables for the chat app using `apps/web/.env.example`:

```
cp apps/web/.env.example apps/web/.env.local
```

Then install and build dependencies:

```
npm install
npm run build
```
- Open the Docker Desktop app to start the Docker daemon
- Run the following in your terminal from the root of the repository:

  ```
  npm run start
  ```
After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See the getting started section above for instructions on how to use My Little GPT.
All other platforms are supported via Docker.
- Docker Desktop
  - Installs Docker and Docker Compose along with a nice GUI
- (NVIDIA GPUs only) CUDA (12.5 supported out-of-the-box)
- (NVIDIA GPUs only) NVIDIA Container Toolkit
```
git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt
```
- Open the Docker Desktop app to start the Docker daemon
- Run the following in your terminal from the root of the repository:

  ```
  # Without a local inference server
  docker compose up

  # CPU
  docker compose --profile cpu up

  # NVIDIA GPU (CUDA)
  docker compose --profile cuda up
  ```
After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See the getting started section above for instructions on how to use My Little GPT.
By default, the API endpoint for creating a user on the backend is public. Please make sure to secure your self-hosted endpoints, or use the PocketBase admin dashboard to update the create rule for the `users` collection to Admin only (`null`).
ngrok provides one free stable domain (at the time of writing), so you can use your own local instance of My Little GPT anywhere you want, as long as you leave it running on your computer at home (I use it on my phone most of the time).
- Take note of your Authtoken
- Create your one free domain in the "Domains" tab under "Cloud Edge"
- Set up the ngrok configuration file `ngrok.yml`:

  ```
  # From the root of the repository
  cp apps/ngrok/template.yml ngrok.yml
  ```

- In `ngrok.yml`, replace `MY_AUTHTOKEN` with your Authtoken, and `MY_DOMAIN` (in two places) with the domain that ngrok generated for you
- (Optional, but highly recommended) Edit the `web` tunnel in `ngrok.yml` following the ngrok docs for securing endpoints using basic auth
  - This could look like adding the following to `ngrok.yml` under the `web` tunnel:

    ```
    basic_auth:
      - MY_USERNAME:MY_PASSWORD # Use something more secure than this
    ```

  - You would then be able to use your private instance like normal, passing basic auth credentials to the website via the URL, like this: `https://MY_USERNAME:[email protected]`
  - This prevents bad actors from abusing your public endpoints
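Putting the steps above together, the resulting `ngrok.yml` might look roughly like this. This is a sketch based on ngrok's agent config format, not the actual contents of `apps/ngrok/template.yml`; the tunnel name, `addr` value, and overall layout in the template may differ:

```yaml
version: "2"
authtoken: MY_AUTHTOKEN
tunnels:
  web:
    proto: http
    addr: 3000          # chat app port (assumed)
    domain: MY_DOMAIN   # your free ngrok domain
    basic_auth:
      - MY_USERNAME:MY_PASSWORD # Use something more secure than this
```

The important part is that `MY_AUTHTOKEN` and `MY_DOMAIN` are replaced with your real values, and that the `web` tunnel carries the `basic_auth` entry if you enabled it.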
MacOS (Apple Silicon), or any other non-Docker setup:

```
npm run ngrok:web
```

Docker:

```
# Without a local inference server
docker compose --profile ngrok-web up

# CPU
docker compose --profile cpu --profile ngrok-web up

# NVIDIA GPU (CUDA)
docker compose --profile cuda --profile ngrok-web up
```
After starting the web tunnel, you will be able to access your private instance of My Little GPT by visiting the domain generated for you by ngrok.
Contributions are welcome!
WARNING: The development workflow has only been tested on MacOS (Apple Silicon), sorry!
- `@my-little-gpt/llama-cpp`: llama-cpp-python inference server
- `@my-little-gpt/ngrok`: Helper scripts for starting ngrok tunnels via Docker
- `@my-little-gpt/pocketbase`: Backend for the chat app, made with PocketBase
- `@my-little-gpt/web`: Chat app, made with SvelteKit
- `@my-little-gpt/eslint-config`: ESLint config
- `@my-little-gpt/typescript-config`: TypeScript config
WARNING: `dev` script only tested on MacOS (Apple Silicon).

Install dependencies with `npm install`, then use the `npm run dev` command to start the following:

- `llama-cpp` server at `http://localhost:8000`
- `pocketbase` server at `http://localhost:8080`
- `web` server at `http://localhost:5173`
Edits to the codebase will trigger a "hot reload" of the web app.
WARNING: `build`, `start`, `ngrok:web`, and `ngrok:llama-cpp` scripts only tested on MacOS (Apple Silicon).

- `format`: Format the codebase using Prettier
- `lint`: Run a lint check with ESLint
- `check`: Run a type check with TypeScript
- `build`: Build and set up all apps and packages
- `build:force`: The `build` script, without using the build cache
- `start`: Start My Little GPT in production mode
- `ngrok:web`: Start My Little GPT with an ngrok tunnel to the chat app (localhost:3000)
- `ngrok:llama-cpp`: Start an ngrok tunnel to the llama-cpp server (localhost:8000)