
[Store] Add support for ClickHouse #244


Open
wants to merge 1 commit into
base: main

lyrixx
Member

@lyrixx lyrixx commented Aug 1, 2025

| Q | A |
| --- | --- |
| Bug fix? | no |
| New feature? | yes |
| Docs? | no |
| Issues | |
| License | MIT |

We already have ClickHouse in our infrastructure, so I'm benchmarking it.
And let's share my code 🎉

As you can see, I left some extension points (for the metadata). AFAIK, it's not possible to
store structured / indexed JSON in CH, so we have to store it as a String instead.
We could use a better-typed structure, but Symfony cannot know the type, obviously.
So with a String, if we want to filter the results, we need to use [JSONExtractString](https://clickhouse.com/docs/sql-reference/functions/json-functions#jsonextractstring).
And it's slow!
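
To make the trade-off concrete, here is a minimal sketch (not part of the PR) of the two filter styles being compared. The helper names `whereOnJsonMetadata` and `whereOnTypedColumn` are hypothetical; only `JSONExtractString` and the `{name:Type}` placeholder syntax come from ClickHouse itself.

```php
<?php
declare(strict_types=1);

// Hypothetical guard: only accept plain identifiers, since they are
// interpolated into the SQL fragment.
function assertIdentifier(string $name): void
{
    if (1 !== preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
        throw new InvalidArgumentException(sprintf('Invalid identifier "%s".', $name));
    }
}

// Filter on a value kept inside the JSON text of the `metadata` String column.
// ClickHouse has to parse the JSON of every row it scans.
function whereOnJsonMetadata(string $key): string
{
    assertIdentifier($key);

    return sprintf("JSONExtractString(metadata, '%s') = {%s:String}", $key, $key);
}

// Filter on a dedicated, typed column (such as an extra `crawlId UUID` column).
// $type is assumed to be a trusted ClickHouse type name written by the developer.
function whereOnTypedColumn(string $column, string $type): string
{
    assertIdentifier($column);

    return sprintf('%s = {%s:%s}', $column, $column, $type);
}

echo whereOnJsonMetadata('crawlId'), PHP_EOL;
echo whereOnTypedColumn('crawlId', 'UUID'), PHP_EOL;
```

The second form needs the value to exist as its own column, which is what the extension points are for.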

So with the current extension points, I could write the following code:

class ClickHouseStore extends SymfonyClickHouseStore
{
    public function initialize(array $options = []): void
    {
        $sql = <<<'SQL'
            CREATE TABLE IF NOT EXISTS {{ table }} (
                id UUID,
                metadata String,
                embedding Array(Float32),
                crawlId UUID,
            ) ENGINE = MergeTree()
            ORDER BY id
        SQL;

        $this->execute('POST', $sql);
    }

    protected function formatVectorDocument(VectorDocument $document): array
    {
        $formatted = parent::formatVectorDocument($document);

        $formatted['crawlId'] = $document->metadata['crawlId'];

        return $formatted;
    }
}

And then:

$clickHouseStore = new ClickHouseStore(
    httpClient: HttpClient::createForBaseUri($_SERVER['CLICKHOUSE_URI']),
    databaseName: 'pocai',
    tableName: $tableName,
);

$documents = $clickHouseStore->query(
    $vector,
    [
        'where' => 'crawlId = {crawlId:String} AND id != {currentId:UUID} AND score < 0.1',
        'params' => [
            'crawlId' => $crawlPoId,
            'currentId' => $row['id'],
        ],
    ],
);
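
For context, the `{crawlId:String}` and `{currentId:UUID}` placeholders are ClickHouse query parameters. Over the ClickHouse HTTP interface, each value is passed as a `param_<name>` entry in the URL query string. The sketch below assumes the store forwards `params` that way; the PR's actual implementation may differ, and `buildClickHouseQueryString` and the UUID values are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns a database name and named parameters into the
 * query string expected by the ClickHouse HTTP interface.
 *
 * @param array<string, string|int|float> $params
 */
function buildClickHouseQueryString(string $database, array $params): string
{
    $query = ['database' => $database];

    foreach ($params as $name => $value) {
        // ClickHouse matches `param_crawlId` with the `{crawlId:Type}` placeholder.
        $query['param_'.$name] = (string) $value;
    }

    return http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

// Example values only; real IDs would come from the application.
echo buildClickHouseQueryString('pocai', [
    'crawlId' => '0b9f0c1e-5a3d-4c2b-9e7f-1a2b3c4d5e6f',
    'currentId' => '7d1e2f3a-4b5c-4d6e-8f90-a1b2c3d4e5f6',
]), PHP_EOL;
```

Because the values travel separately from the SQL text, the `where` string can stay a constant and does not need manual escaping.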

If the PR could be accepted, I'll write tests.

@lyrixx lyrixx force-pushed the store-clickhouse branch 3 times, most recently from ef38fe0 to af54406 Compare August 1, 2025 12:52
@chr-hertel
Member

the need for extension is not really ClickHouse specific, is it?
I think we need an extension point somewhere here anyways. a low level mapping/hydration thingy.
with #18 we have a similar topic i'd say - and with #158 we also have an issue of making too many assumptions. totally open for ideas. :)

@lyrixx
Member Author

lyrixx commented Aug 1, 2025

> the need for extension is not really ClickHouse specific, is it?

Indeed 👍🏼 The current StoreInterface is a bit limited, but I eventually managed to do what I need to POC something.

> I think we need an extension point somewhere here anyways. a low level mapping/hydration thingy.

Yes!


Should I work on tests? edit: done ✅

@lyrixx lyrixx force-pushed the store-clickhouse branch from af54406 to c45e8aa Compare August 1, 2025 13:28
Comment on lines +135 to +136
return $this
    ->httpClient

Contributor

Suggested change:
- return $this
-     ->httpClient
+ return $this->httpClient

Comment on lines +87 to +88
$httpClient = $this->createMock(HttpClientInterface::class);
$response = $this->createMock(ResponseInterface::class);
Contributor

Let's use MockHttpClient?

@Kocal Kocal added Store Issues & PRs about the AI Store component and removed Store Issues & PRs about the AI Store component labels Aug 4, 2025
@fabpot fabpot changed the title [Store] Add support for ClickHouse Add support for ClickHouse Aug 4, 2025
@Kocal Kocal added the Store Issues & PRs about the AI Store component label Aug 4, 2025
@fabpot fabpot changed the title Add support for ClickHouse [Store] Add support for ClickHouse Aug 4, 2025
Labels
Store Issues & PRs about the AI Store component

4 participants