Skip to content
/ pgai Public
forked from nilp0inter/pgai

Bring AI models closer to your PostgreSQL data

License

Notifications You must be signed in to change notification settings

pruizpar/pgai

Repository files navigation

Timescale logo

pgai enables you to handle your AI workflows from your database

Docs SLACK Try Timescale for free

pgai simplifies the process of building similarity search, and Retrieval Augmented Generation(RAG) apps with PostgreSQL.

Directly from your existing PostgreSQL database, pgai empowers you to:

Timescale offers the following AI journeys:

To get the big picture, read PostgreSQL Hybrid Search Using pgvector.

pgai Prerequisites

Before you start working with pgai, you need:

How to get pgai

Use a pre-built Docker image

Follow these instructions to use pgai in docker with a pre-built image.

Enable pgai in a Timescale service

To enable pgai:

  1. Create a new Timescale Service.

    If you want to use an existing service, pgai is added as an available extension on the first maintenance window after the pgai release date.

  2. Connect to your Timescale service:

    psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>"
  3. Create the pgai extension:

    CREATE EXTENSION IF NOT EXISTS ai CASCADE;

    The CASCADE automatically installs the plpython3u and pgvector dependencies.

You now Use pgai with your API keys and Try out the AI models.

Use pgai with your API keys

Handling API keys when using pgai from psql

  1. Set your OpenAI key as an environment variable in your shell:

    OPENAI_API_KEY="this-is-my-super-secret-api-key-dont-tell"
  2. Connect to your database while setting a psql variable to your API key using a psql command line argument.

    psql -d "postgres://<username>:<password>@<host>:<port>/<database-name>" -v OPENAI_API_KEY=$OPENAI_API_KEY

    Your API key is now available as a psql variable in your psql session.

  3. Pass your API key to your parameterized query:

    SELECT * 
    FROM openai_list_models($1)
    ORDER BY created DESC
    \bind :OPENAI_API_KEY
    \g

    Use bind to pass the value of OPENAI_API_KEY as a parameterized variable.

The \bind metacommand is available in psql version 16+.

Handling API keys when using pgai from python

  1. In your Python environment, include the dotenv and postgres driver packages:

    pip install python-dotenv
    pip install psycopg2-binary
  2. Set your OpenAI key in a .env file or as an environment variable:

    OPENAI_API_KEY="this-is-my-super-secret-api-key-dont-tell"
    DB_URL="your connection string"
  3. Pass your API key as a parameter to your queries:

    import os
    from dotenv import load_dotenv
    
    load_dotenv()
    
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
    DB_URL = os.environ["DB_URL"]
    
    import psycopg2
    
    with psycopg2.connect(DB_URL) as conn:
        with conn.cursor() as cur:
            # pass the API key as a parameter to the query. don't use string manipulations
            cur.execute("SELECT * FROM openai_list_models(%s) ORDER BY created DESC", (OPENAI_API_KEY,))
            records = cur.fetchall()

    Do not use string manipulation to embed the key as a literal in the SQL query.

Usage

This section shows you how to use AI directly from your database using SQL.

List models

List the models supported by OpenAI functions in pgai.

SELECT * 
FROM openai_list_models($1)
ORDER BY created DESC
\bind :OPENAI_API_KEY
\g

The data returned looks like:

             id              |        created         |    owned_by     
-----------------------------+------------------------+-----------------
 gpt-4o-test-shared          | 2024-05-20 13:06:56-05 | system
 gpt-4o-2024-05-13           | 2024-05-10 14:08:52-05 | system
 gpt-4o                      | 2024-05-10 13:50:49-05 | system
 gpt-4-turbo-2024-04-09      | 2024-04-08 13:41:17-05 | system
 gpt-4-turbo                 | 2024-04-05 18:57:21-05 | system
 ...
(N rows)

Tokenize

To encode content and count the number of tokens returned:

  • Encode content into an array of tokens.

    SELECT openai_tokenize
    ( 'text-embedding-ada-002'
    , 'Timescale is Postgres made Powerful'
    );

    The data returned looks like:

                openai_tokenize             
    ----------------------------------------
     {19422,2296,374,3962,18297,1903,75458}
    (1 row)
    
  • Count the number of tokens generated:

    SELECT array_length
    ( openai_tokenize
      ( 'text-embedding-ada-002'
      , 'Timescale is Postgres made Powerful'
      )
    , 1
    );

    The data returned looks like:

     array_length 
    --------------
                7
    (1 row)
    

Detokenize

Turn tokenized content into natural language:

SELECT openai_detokenize('text-embedding-ada-002', array[1820,25977,46840,23874,389,264,2579,58466]);

The data returned looks like:

             openai_detokenize              
--------------------------------------------
 the purple elephant sits on a red mushroom
(1 row)

Embed

Generate embeddings using a specified model.

  • Request an embedding using a specific model.

    SELECT openai_embed
    ( $1
    , 'text-embedding-ada-002'
    , 'the purple elephant sits on a red mushroom'
    )
    \bind :OPENAI_API_KEY
    \g

    The data returned looks like:

                          openai_embed                      
    --------------------------------------------------------
     [0.005978798,-0.020522336,...-0.0022857306,-0.023699166]
    (1 row)
    
  • Specify the number of dimensions you want in the returned embedding:

    SELECT openai_embed
    ( $1
    , 'text-embedding-ada-002'
    , 'the purple elephant sits on a red mushroom'
    , _dimensions=>768
    )
    \bind :OPENAI_API_KEY
    \g

    This only works for certain models.

  • Pass a user identifier.

    SELECT openai_embed
    ( $1
    , 'text-embedding-ada-002'
    , 'the purple elephant sits on a red mushroom'
    , _user=>'bac1aaf7-4460-42d3-bba5-2957b057f4a5'
    )
    \bind :OPENAI_API_KEY
    \g
  • Pass an array of text inputs.

    SELECT openai_embed
    ( $1
    , 'text-embedding-ada-002'
    , array['Timescale is Postgres made Powerful', 'the purple elephant sits on a red mushroom']
    )
    \bind :OPENAI_API_KEY
    \g
  • Provide tokenized input.

    select openai_embed
    ( $1
    , 'text-embedding-ada-002'
    , array[1820,25977,46840,23874,389,264,2579,58466]
    )
    \bind :OPENAI_API_KEY
    \g

Chat_complete

Generate text or complete a chat:

  • Have an LLM generate text from a prompt:

    -- the following two metacommands cause the raw query results to be printed
    -- without any decoration
    \pset tuples_only on
    \pset format unaligned
    
    SELECT jsonb_pretty
    (
      openai_chat_complete
      ( $1
      , 'gpt-4o'
      , jsonb_build_array
        ( jsonb_build_object('role', 'system', 'content', 'you are a helpful assistant')
        , jsonb_build_object('role', 'user', 'content', 'what is the typical weather like in Alabama in June')
        )
      )
    )
    \bind :OPENAI_API_KEY
    \g

    The data returned looks like:

    {
        "id": "chatcmpl-9RgehyQ0aydAkQajrN6Oe0lepERKC",
        "model": "gpt-4o-2024-05-13",
        "usage": {
            "total_tokens": 332,
            "prompt_tokens": 26,
            "completion_tokens": 306
        },
        "object": "chat.completion",
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": "In Alabama, June typically ushers in the summer season with warm to hot temperatures and relatively high humidity. Here’s a general overview of what you can expect:\n\n1. **Temperature**: \n   - Average daytime highs usually range from the mid-80s to low 90s Fahrenheit (around 29-35°C).\n   - Nighttime temperatures often fall to the mid-60s to mid-70s Fahrenheit (18-24°C).\n\n2. **Humidity**:\n   - Humidity levels can be quite high, making the temperatures feel even warmer. The mix of heat and humidity can lead to a muggy atmosphere.\n\n3. **Rainfall**:\n   - June is part of the wet season for Alabama, so you can expect a fair amount of rainfall. Thunderstorms are relatively common, often in the afternoons and evenings.\n   - The precipitation can be sporadic, with sudden downpours that can clear up quickly.\n\n4. **Sunshine**:\n   - There are plenty of sunny days, though the sunshine can be intense. Ultraviolet (UV) levels are high, so sun protection is important.\n\n5. **Overall Climate**:\n   - Generally, the climate in Alabama in June is characterized by a typical Southeastern U.S. summer: hot, humid, and occasionally stormy. \n\nIf you’re planning a visit or activities in Alabama during June, it’s a good idea to stay hydrated, wear light clothing, and keep an eye on the weather forecast for any potential thunderstorms."
                },
                "logprobs": null,
                "finish_reason": "stop"
            }
        ],
        "created": 1716385851,
        "system_fingerprint": "fp_729ea513f7"
    }
  • Return the content as text from a specific message in the choices array.

    openai_chat_complete returns a jsonb object containing the response from the API. You can use jsonb operators and functions to manipulate the object returned. For example, the following query returns the content as text from the first message in the choices array.

    -- the following two metacommands cause the raw query results to be printed
    -- without any decoration
    \pset tuples_only on
    \pset format unaligned
    
    select openai_chat_complete
    ( 'gpt-4o'
    , :'OPENAI_API_KEY'
    , jsonb_build_array
      ( jsonb_build_object('role', 'system', 'content', 'you are a helpful assistant')
      , jsonb_build_object('role', 'user', 'content', 'what is the typical weather like in Alabama in June')
      )
    )->'choices'->0->'message'->>'content'
    ;

    The data returned looks like:

    In June, Alabama generally experiences warm to hot weather as it transitions into summer. Typical conditions include:
    
    1. **Temperatures**: Daytime highs usually range from the mid-80s to low 90s Fahrenheit (around 29-34°C). Nighttime lows typically range from the mid-60s to low 70s Fahrenheit (around 18-23°C).
    
    2. **Humidity**: June tends to be quite humid, which can make the temperatures feel even warmer. High humidity levels are characteristic of Alabama summers.
    
    3. **Precipitation**: June is part of the wetter season in Alabama, with regular afternoon thunderstorms being common. Rainfall can vary, but you can expect an average of about 4 to 5 inches (around 100-125 mm) of rain for the month.
    
    4. **Sunshine**: There are usually plenty of sunny days, although the frequent thunderstorms can lead to overcast skies at times.
    
    Overall, if you're planning to visit Alabama in June, be prepared for hot and humid conditions, and keep an umbrella or rain jacket handy for those afternoon storms.
    

Moderate

Check if content is classified as potentially harmful:

-- the following two metacommands cause the raw query results to be printed
-- without any decoration
\pset tuples_only on
\pset format unaligned

select jsonb_pretty
(
  openai_moderate
  ( $1
  , 'text-moderation-stable'
  , 'I want to kill them.'
  )
)
\bind :OPENAI_API_KEY
\g

The data returned looks like:

{
    "id": "modr-9RsN6qZWoZYm1AK4mtrKuEjfOcMWp",
    "model": "text-moderation-007",
    "results": [
        {
            "flagged": true,
            "categories": {
                "hate": false,
                "sexual": false,
                "violence": true,
                "self-harm": false,
                "self_harm": false,
                "harassment": true,
                "sexual/minors": false,
                "sexual_minors": false,
                "hate/threatening": false,
                "hate_threatening": false,
                "self-harm/intent": false,
                "self_harm_intent": false,
                "violence/graphic": false,
                "violence_graphic": false,
                "harassment/threatening": true,
                "harassment_threatening": true,
                "self-harm/instructions": false,
                "self_harm_instructions": false
            },
            "category_scores": {
                "hate": 0.2324090600013733,
                "sexual": 0.00001205232911161147,
                "violence": 0.997192919254303,
                "self-harm": 0.0000023696395601291442,
                "self_harm": 0.0000023696395601291442,
                "harassment": 0.5278584957122803,
                "sexual/minors": 0.00000007506431387582779,
                "sexual_minors": 0.00000007506431387582779,
                "hate/threatening": 0.024183575063943863,
                "hate_threatening": 0.024183575063943863,
                "self-harm/intent": 0.0000017161115692942985,
                "self_harm_intent": 0.0000017161115692942985,
                "violence/graphic": 0.00003399916022317484,
                "violence_graphic": 0.00003399916022317484,
                "harassment/threatening": 0.5712487697601318,
                "harassment_threatening": 0.5712487697601318,
                "self-harm/instructions": 0.000000001132860139030356,
                "self_harm_instructions": 0.000000001132860139030356
            }
        }
    ]
}

About Timescale

Timescale Cloud is a high-performance developer focused cloud that provides PostgreSQL services enhanced with our blazing fast vector search. Timescale services are built using TimescaleDB and PostgreSQL extensions, like this one. Timescale Cloud provides high availability, streaming backups, upgrades over time, roles and permissions, and great security.

TimescaleDB is an open-source time-series database designed for scalability and performance, built on top of PostgreSQL. It provides SQL support for time-series data, allowing users to leverage PostgreSQL's rich ecosystem while optimizing for high ingest rates and fast query performance. TimescaleDB includes features like automated data retention policies, compression and continuous aggregates, making it ideal for applications like monitoring, IoT, AI and real-time analytics.

About

Bring AI models closer to your PostgreSQL data

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • PLpgSQL 50.9%
  • Shell 21.4%
  • Makefile 20.7%
  • Dockerfile 7.0%