Add support for JSON schema #6

Open
oskardudycz opened this issue Jul 6, 2024 · 6 comments
Labels
good first issue Good for newcomers

Comments

@oskardudycz
Contributor

This is needed not only for validation but also for detecting whether a property is an array, and for correctly resolving nested queries when the element is inside an object.
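A minimal sketch of that detection in TypeScript, assuming the schema is plain JSON Schema and that only `type`, `properties` and `items` matter. It walks a dotted path such as `orders.sku` and steps into an array's `items` whenever the path continues past an array. `resolvePropertySchema` and `isArrayProperty` are hypothetical names, not Pongo's API.

```typescript
// Minimal subset of JSON Schema used by this sketch (assumption: only
// `type`, `properties` and `items` are inspected).
type JsonSchema = {
  type?: string | string[];
  properties?: Record<string, JsonSchema>;
  items?: JsonSchema;
};

function isArraySchema(schema: JsonSchema): boolean {
  const { type } = schema;
  return Array.isArray(type) ? type.includes("array") : type === "array";
}

// Resolves a dotted path like "address.city" or "orders.sku" to the schema
// of that property. Returns undefined when the schema doesn't describe it.
function resolvePropertySchema(
  schema: JsonSchema,
  path: string,
): JsonSchema | undefined {
  let current: JsonSchema | undefined = schema;
  for (const segment of path.split(".")) {
    // Past an array, the rest of the path applies to its elements.
    while (current !== undefined && isArraySchema(current) && current.items !== undefined) {
      current = current.items;
    }
    current = current?.properties?.[segment];
    if (current === undefined) return undefined;
  }
  return current;
}

// True when the property at `path` is declared as an array.
function isArrayProperty(schema: JsonSchema, path: string): boolean {
  const property = resolvePropertySchema(schema, path);
  return property !== undefined && isArraySchema(property);
}

// Hypothetical collection schema used for illustration.
const userSchema: JsonSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    tags: { type: "array", items: { type: "string" } },
    address: {
      type: "object",
      properties: { city: { type: "string" } },
    },
    orders: {
      type: "array",
      items: { type: "object", properties: { sku: { type: "string" } } },
    },
  },
};

console.log(isArrayProperty(userSchema, "tags")); // true
console.log(isArrayProperty(userSchema, "name")); // false
console.log(isArrayProperty(userSchema, "address.city")); // false
console.log(resolvePropertySchema(userSchema, "orders.sku")?.type); // string
```

Positional segments such as `tags.0`, as well as `anyOf`, `oneOf` and `$ref`, aren't handled here; a real resolver would need them.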

@vr-varad

vr-varad commented Jul 9, 2024

@oskardudycz I would like to work on this issue.

@oskardudycz
Contributor Author

@vr-varad great, much appreciated! Feel invited! 🙂

Here's how I see the breakdown. Each step can be a separate Pull Request (and that's what I suggest 😉):

1. Add the capability to pass a JSON schema to Pongo. It should be compatible with both MongoDB's schema format and regular JSON Schema. From what I saw, MongoDB's schema allows defining a BSON type (`bsonType`), which isn't valid in regular JSON Schema and doesn't apply here anyway, as Pongo doesn't use BSON.
2. Store the JSON schema in the database. I think it'd be good to define a schema registry table, more or less like this:
   ```sql
   CREATE TABLE json_schema_registry (
       id SERIAL PRIMARY KEY,
       schema_name VARCHAR(255) NOT NULL,
       version INT NOT NULL,
       schema JSONB NOT NULL,
       created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
       updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
       UNIQUE (schema_name, version)
   );
   ```

   This should enable storing multiple schema versions. We'd also need to extend the document collection table with a nullable schema version column, as each document can have a different schema (or none). By default, the schema name can be derived from the collection name, and the initial version can be 1.
3. Use the schema to validate documents on insert or update. I think this should be optional: if someone already validates in the application layer, they may not want the extra roundtrip and would prefer to keep the performance gain (I'm not sure how Mongo handles that). Schemas should be cached in memory so they don't have to be read on each operation.
4. Use it in query parsing. In Mongo, you can write a query like this:

   ```typescript
   collection.find({ tags: 'tag1' });
   ```

   Without knowing whether `tags` is an array, we can't tell whether to do a string equality comparison or to check if the provided value is inside the array. Doing both is not always performant. A schema helps here, because then we know the property's type.
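For step 1, one way to accept both flavours is to normalize MongoDB-style schemas into plain JSON Schema before storing them. The sketch below rewrites `bsonType` into `type` recursively and copies every other keyword unchanged. The BSON-to-JSON mapping (for example, treating `date` and `objectId` as strings) is an assumption about how Pongo would serialize such values, and `normalizeSchema` is a hypothetical name.

```typescript
// A schema node that may use MongoDB's `bsonType` or JSON Schema's `type`.
// Any other keyword (required, enum, minimum, ...) is carried through.
type SchemaNode = {
  bsonType?: string | string[];
  type?: string | string[];
  properties?: Record<string, SchemaNode>;
  items?: SchemaNode;
  [keyword: string]: unknown;
};

// Assumed mapping from BSON type aliases to JSON Schema types. Pongo stores
// plain JSON, so BSON-only types map to their JSON representation.
const BSON_TO_JSON_TYPE = new Map<string, string>([
  ["string", "string"],
  ["objectId", "string"], // assumption: ids are stored as strings
  ["date", "string"], // assumption: dates are stored as ISO-8601 strings
  ["int", "integer"],
  ["long", "integer"],
  ["double", "number"],
  ["decimal", "number"],
  ["number", "number"],
  ["bool", "boolean"],
  ["object", "object"],
  ["array", "array"],
  ["null", "null"],
]);

function toJsonType(bsonType: string): string {
  const mapped = BSON_TO_JSON_TYPE.get(bsonType);
  if (mapped === undefined) {
    throw new Error(`Unsupported bsonType: ${bsonType}`);
  }
  return mapped;
}

// Recursively replaces `bsonType` with `type`; plain JSON Schema nodes pass
// through unchanged, so both flavours end up in one format.
function normalizeSchema(node: SchemaNode): SchemaNode {
  const { bsonType, properties, items, ...rest } = node;
  const result: SchemaNode = { ...rest };

  if (bsonType !== undefined) {
    result.type = Array.isArray(bsonType)
      ? Array.from(new Set(bsonType.map(toJsonType))) // int + long -> integer once
      : toJsonType(bsonType);
  }
  if (properties !== undefined) {
    const children: Record<string, SchemaNode> = {};
    for (const [key, child] of Object.entries(properties)) {
      children[key] = normalizeSchema(child);
    }
    result.properties = children;
  }
  if (items !== undefined) {
    result.items = normalizeSchema(items);
  }
  return result;
}

// Example MongoDB-style schema.
const mongoStyle: SchemaNode = {
  bsonType: "object",
  required: ["name"],
  properties: {
    name: { bsonType: "string" },
    age: { bsonType: ["int", "long"] },
    tags: { bsonType: "array", items: { bsonType: "string" } },
  },
};

const normalized = normalizeSchema(mongoStyle);
console.log(JSON.stringify(normalized));
```

Rejecting unknown aliases (rather than silently dropping them) keeps a schema written for MongoDB from being stored with constraints Pongo can't enforce.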
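For step 3, the in-memory cache could be keyed by schema name and version, mirroring the `UNIQUE (schema_name, version)` constraint in the registry table. This sketch assumes a registered version is never modified after it's written. `SchemaLoader`, `SchemaCache` and `validateBeforeWrite` are hypothetical names, and `isValid` stands in for a real JSON Schema validator.

```typescript
// Hypothetical loader. In Pongo it could run something like:
//   SELECT schema FROM json_schema_registry
//   WHERE schema_name = $1 AND version = $2
// Resolving to null means "no such schema".
type SchemaLoader = (name: string, version: number) => Promise<object | null>;

// Caches the promise rather than the resolved value, so concurrent writes
// that miss the cache share a single registry roundtrip.
class SchemaCache {
  private readonly entries = new Map<string, Promise<object | null>>();
  private readonly load: SchemaLoader;

  constructor(load: SchemaLoader) {
    this.load = load;
  }

  get(name: string, version: number): Promise<object | null> {
    const key = `${name}@${version}`;
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = this.load(name, version).catch((error: unknown) => {
        // Don't keep failed lookups, so the next call retries.
        this.entries.delete(key);
        throw error;
      });
      this.entries.set(key, entry);
    }
    return entry;
  }
}

// Hypothetical options: validation is opt-in, and documents without a
// schema version (NULL column) are written as-is.
type ValidationOptions = {
  validate: boolean;
  schemaName: string;
  schemaVersion: number | null;
};

async function validateBeforeWrite(
  cache: SchemaCache,
  options: ValidationOptions,
  document: unknown,
  isValid: (schema: object, document: unknown) => boolean,
): Promise<void> {
  // Returning early skips both the registry lookup and the validation cost.
  if (!options.validate || options.schemaVersion === null) return;

  const schema = await cache.get(options.schemaName, options.schemaVersion);
  if (schema !== null && !isValid(schema, document)) {
    throw new Error(
      `Document doesn't match schema ${options.schemaName} v${options.schemaVersion}`,
    );
  }
}

// Usage with a stub loader that counts registry reads.
let loads = 0;
const cache = new SchemaCache(async (name, version) => {
  loads++;
  return { title: `${name} v${version}` };
});

void cache.get("users", 1);
void cache.get("users", 1); // served from the cache, no second read
void cache.get("users", 2);
console.log(loads); // 2
```

With validation switched off, `validateBeforeWrite` never touches the cache, which matches the idea that teams validating in the application layer shouldn't pay for the extra roundtrip.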
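For step 4, once the schema says whether `tags` is an array, `{ tags: 'tag1' }` can be translated into a single JSONB containment check instead of two. In PostgreSQL, a document whose `tags` is an array contains `{"tags": ["tag1"]}` but not `{"tags": "tag1"}`, so the value has to be shaped differently per type. The sketch assumes documents live in a JSONB column named `data` and only looks at top-level properties; `buildEqualityFilter` and `CollectionSchema` are hypothetical names.

```typescript
// Minimal slice of a collection's JSON Schema (assumption: only the
// top-level `properties[*].type` is consulted).
type CollectionSchema = {
  properties?: Record<string, { type?: string | string[] }>;
};

type SqlFilter = { sql: string; params: string[] };

// Builds a parameterized WHERE fragment for an equality filter such as
// { tags: 'tag1' }, assuming a JSONB column named `data`.
function buildEqualityFilter(
  field: string,
  value: string | number | boolean,
  schema?: CollectionSchema,
): SqlFilter {
  const declared = schema?.properties?.[field]?.type;
  const types: string[] =
    declared === undefined ? [] : Array.isArray(declared) ? declared : [declared];

  // The same value, shaped for a scalar property and for an array property.
  const scalarMatch = JSON.stringify({ [field]: value });
  const arrayMatch = JSON.stringify({ [field]: [value] });

  if (types.length > 0 && !types.includes("array")) {
    // Declared as a scalar: containment acts as plain equality.
    return { sql: "data @> $1::jsonb", params: [scalarMatch] };
  }
  if (types.length === 1 && types[0] === "array") {
    // Declared as an array: check that it contains the value.
    return { sql: "data @> $1::jsonb", params: [arrayMatch] };
  }
  // No schema, or a mixed type: both shapes must be checked (slower).
  return {
    sql: "(data @> $1::jsonb OR data @> $2::jsonb)",
    params: [scalarMatch, arrayMatch],
  };
}

// Hypothetical collection schema.
const schema: CollectionSchema = {
  properties: { tags: { type: "array" }, status: { type: "string" } },
};

console.log(buildEqualityFilter("tags", "tag1", schema));
console.log(buildEqualityFilter("status", "active", schema));
console.log(buildEqualityFilter("tags", "tag1")); // no schema: both checks
```

Using `@>` in every branch should keep the generated SQL usable with a GIN index on `data`; the schema mainly removes the `OR` between the two shapes.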

@vr-varad thoughts?

If you're open, we can tackle that gradually or split the work.

I'm also happy to expand if something I wrote is unclear or if you disagree with some of that.

oskardudycz added the good first issue label Jul 9, 2024
@vr-varad

vr-varad commented Jul 9, 2024

@oskardudycz I'm excited to start by tackling the first item – adding JSON schema capability to Pongo. I'll work on this and create a Pull Request soon for your feedback.

@oskardudycz
Contributor Author


Thank you, looking forward to that!

@oskardudycz
Contributor Author

FYI, I added some initial work around the schema. It doesn't include the collection schema yet, just the list of collections with their names and types: #73.

@vr-varad

@oskardudycz I have also done some work on it, but I wasn't sure whether it would work.
