🤖 My Little GPT

My Little GPT is a simple AI assistant that can use open-source models running locally on your computer, the Anthropic API, or the OpenAI API.

Table of contents

  • Previews
  • Background
  • Getting started
  • Local installation
  • Self-host
  • The code

Check out the hosted version of My Little GPT here, deployed directly from the hosted branch of this repository.

Previews

Screenshot (mobile): my-little-gpt-screenshot (image)

Screen recording: my-little-gpt-preview.mp4 (video)

🧐 Background

I used to use ChatGPT through its premium subscription. Eventually I decided that I didn't want to keep spending $20 a month on it. However, I still value consistent access to GPT-4 for work and whatnot, so I decided to look into self-hosting.

There are some great open-source repos for this, but I was looking for a codebase that is only as complex as I need it to be, with a chat app that has the mobile UX I want. I ultimately decided to make my own end-to-end stack – my own little GPT.

I've open-sourced the codebase and written an installation guide to make it as easy as possible for anyone interested in self-hosting it. I use My Little GPT to talk to open-source models locally on my computer for free, or pay only for my usage of hosted model APIs.

Right now, My Little GPT supports a minimal, straightforward chat experience. Over time, I may continue to add new features to the app.

📖 Getting started

The links below will go to the hosted version of My Little GPT, but the same paths will work for your local version.

  1. Create an account (/create-account) or log in (/login)
  2. Go to settings (/settings) and enter at least one of the following:
    • Local Base URL: The base URL of any OpenAI-compatible API you want to use (see the sanity check after this list)
      • If running locally, set this value to:
        • MacOS (Apple Silicon), or any other non-Docker setup: http://localhost:8000/v1
        • CPU (Docker): http://llama-cpp-cpu:8000/v1
        • NVIDIA GPU (Docker): http://llama-cpp-cuda:8000/v1
    • Anthropic API Key: An API key to use for the Anthropic API
      • Required to use any Anthropic models
    • OpenAI API Key: An API key to use for the OpenAI API
      • Required to use any OpenAI models
  3. Go to the chat page (/chat), select a model from the model picker in the top navbar, and send your first message!
    • If chatting with a local model, the first message sent after starting the inference server or switching local models may take a few minutes to process while the model loads into memory
    • A chat title is automatically generated by the model you are sending a message to (title quality may vary)
    • Feel free to change themes using the theme picker in the sidebar (themes provided by DaisyUI)
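
If requests to your Local Base URL seem to fail, you can sanity-check the endpoint from outside the app. These are generic probes that work against any OpenAI-compatible server, not features of My Little GPT:

    # List the models the inference server exposes (adjust the base URL to match your settings)
    curl http://localhost:8000/v1/models

    # Send a minimal chat completion (llama-3.1-small is one of the pre-configured aliases)
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "llama-3.1-small", "messages": [{"role": "user", "content": "Hello!"}]}'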

Available models

API providers

It is straightforward to support any provider supported by the Vercel AI SDK. Right now My Little GPT supports the following:

  • Anthropic: claude-3-5-sonnet-20240620, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
  • OpenAI: gpt-4o, gpt-4-turbo, gpt-4, gpt-3.5-turbo
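
If a key is rejected, it can help to verify it directly against the provider before debugging the app. These are the providers' standard endpoints and are unrelated to this codebase:

    # OpenAI: a valid key returns a JSON list of models
    curl https://api.openai.com/v1/models \
      -H "Authorization: Bearer $OPENAI_API_KEY"

    # Anthropic: a valid key returns a short JSON completion
    curl https://api.anthropic.com/v1/messages \
      -H "x-api-key: $ANTHROPIC_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -H "content-type: application/json" \
      -d '{"model": "claude-3-haiku-20240307", "max_tokens": 32, "messages": [{"role": "user", "content": "Hi"}]}'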

Local

The inference server can use any models compatible with llama.cpp, and comes configured with the following models:

  • Meta Llama 3.1 8B Instruct
    • llama-3.1-small: Quantized (q4_k_m) to be less than 5GB in size
    • llama-3.1: Quantized (q8_0) to be less than 9GB in size
  • Mistral 7B Instruct v0.3
    • mistral-7b-small: Quantized (q4_k_m) to be less than 5GB in size
    • mistral-7b: Quantized (q8_0) to be less than 8GB in size
  • Qwen2 7B Instruct
    • qwen2-7b-small: Quantized (q4_k_m) to be less than 5GB in size
    • qwen2-7b: Quantized (q8_0) to be less than 9GB in size

Edit the configuration file at apps/llama-cpp/config.json to add other models. Reference the llama-cpp-python docs for more info.
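
As a rough sketch only (field names taken from llama-cpp-python's model settings; verify against the schema this repo's config.json actually uses), a new entry in the models list could look something like the following, where model is the path to a GGUF file, model_alias is the name requests refer to, chat_format selects the prompt template, and n_gpu_layers: -1 offloads all layers to the GPU:

    {
      "model": "/models/my-model.Q4_K_M.gguf",
      "model_alias": "my-model-small",
      "chat_format": "chatml",
      "n_gpu_layers": -1
    }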

💻 Local installation

Hardware requirements

If you are just using hosted model APIs, any computer that supports Docker will probably work.

Local inference server

If you have at least 8GB of memory (RAM for CPU inference, or VRAM for GPU inference), you should be able to run the models with quantized versions under 8GB in size. See here for more information on the models that come pre-configured.

For higher inference speeds, an NVIDIA GPU or an Apple Silicon Mac (M1/M2/M3) is recommended.

MacOS (Apple Silicon)

Install requirements

You can install the requirements yourself from the links above, or follow these instructions:

Clone repository

git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt

Install and build dependencies

First, set up environment variables for the chat app using apps/web/.env.example:

cp apps/web/.env.example apps/web/.env.local

Then install and build dependencies:

npm install
npm run build

Start the chat app, backend, and inference server

  • Open the Docker Desktop app to start the Docker daemon

  • Run the following in your terminal from the root of the repository:

    npm run start

After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See the getting started section above for instructions on how to use My Little GPT.

Linux, MacOS, and Windows

All other platforms are supported via Docker.

Install requirements

Clone repository

git clone https://github.com/arrowban/my-little-gpt.git
cd my-little-gpt

Start the chat app, backend, and inference server

  • Open the Docker Desktop app to start the Docker daemon

  • Run the following in your terminal from the root of the repository:

    # Without a local inference server
    docker compose up
    
    # CPU
    docker compose --profile cpu up
    
    # NVIDIA GPU (CUDA)
    docker compose --profile cuda up

After everything starts up, visit http://localhost:3000/create-account to create an account on your local instance of My Little GPT. See the getting started section above for instructions on how to use My Little GPT.
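
To confirm the containers came up, standard Docker commands work; nothing here is specific to this project:

    # List the services in this compose project and their state
    docker compose ps

    # Tail logs if something looks wrong
    docker compose logs -f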

☁️ Self-host

Requirements

By default, the API endpoint for creating a user on the backend is public. Please make sure to secure your self-hosted endpoints, or use the PocketBase admin dashboard to update the create rule for the users table to Admin only (null).

ngrok

ngrok provides one free stable domain (at the time of writing), so you can use your own local instance of My Little GPT anywhere, as long as you leave it running on your computer at home (I use it from my phone most of the time).

Set up ngrok

  1. Create an ngrok account

  2. Take note of your Authtoken

  3. Create your one free domain in the "Domains" tab under "Cloud Edge"

  4. Set up the ngrok configuration ngrok.yml:

    # From the root of the repository
    cp apps/ngrok/template.yml ngrok.yml

  5. In ngrok.yml, replace MY_AUTHTOKEN with your Authtoken, and MY_DOMAIN (in two places) with the domain that ngrok generated for you

  6. (Optional, but highly recommended) Edit the web tunnel in ngrok.yml following the ngrok docs for securing endpoints using basic auth

    • This could look like adding the following to the ngrok.yml under the web tunnel:
      basic_auth:
        - MY_USERNAME:MY_PASSWORD # Use something more secure than this
    • You would then be able to use your private instance as normal, passing basic auth credentials to the website via the URL, like this: https://MY_USERNAME:MY_PASSWORD@MY_DOMAIN
    • This prevents bad actors from abusing your public endpoints
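
Once the tunnel is running (see the next section), you can confirm that basic auth is actually enforced; curl's -u flag sends the credentials, and the placeholders are the same as above:

    # Expect 401 without credentials
    curl -s -o /dev/null -w "%{http_code}\n" https://MY_DOMAIN

    # Expect 200 with valid credentials
    curl -s -o /dev/null -w "%{http_code}\n" -u MY_USERNAME:MY_PASSWORD https://MY_DOMAIN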

Start ngrok tunnels

MacOS (Apple Silicon)

    npm run ngrok:web

Linux, MacOS, and Windows

    # Without a local inference server
    docker compose --profile ngrok-web up

    # CPU
    docker compose --profile cpu --profile ngrok-web up

    # NVIDIA GPU (CUDA)
    docker compose --profile cuda --profile ngrok-web up

After starting the web tunnel, you will be able to access your private instance of My Little GPT by visiting the domain generated for you by ngrok.

πŸ§‘β€πŸ’» The code

Contributions are welcome!

WARNING: The development workflow has only been tested on MacOS (Apple Silicon), sorry!

Apps and Packages

  • @my-little-gpt/llama-cpp: llama-cpp-python inference server
  • @my-little-gpt/ngrok: Helper scripts for starting ngrok tunnels via Docker
  • @my-little-gpt/pocketbase: Backend for the chat app, made with PocketBase
  • @my-little-gpt/web: Chat app, made with SvelteKit
  • @my-little-gpt/eslint-config: ESLint config
  • @my-little-gpt/typescript-config: TypeScript config

Development

WARNING: dev script only tested on MacOS (Apple Silicon).

Install dependencies with npm install, then use the npm run dev command to start the following servers:

  • llama-cpp server at http://localhost:8000
  • pocketbase server at http://localhost:8080
  • web server at http://localhost:5173

Edits to the codebase will trigger a "hot reload" of the web app.

Other scripts

WARNING: build, start, ngrok:web, and ngrok:llama-cpp scripts only tested on MacOS (Apple Silicon).

  • format: Format the codebase using Prettier
  • lint: Run a lint check with ESLint
  • check: Run a type check with TypeScript
  • build: Build and set up all apps and packages
  • build:force: The build script, without using the build cache
  • start: Start My Little GPT in production mode
  • ngrok:web: Start My Little GPT with an ngrok tunnel to the chat app (localhost:3000)
  • ngrok:llama-cpp: Start ngrok tunnel to the llama-cpp server (localhost:8000)
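
For example, a typical pre-commit pass could chain the checks (plain npm usage; nothing project-specific assumed):

    npm run format
    npm run lint
    npm run check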