Query: parse as AST #6788

Open
wants to merge 63 commits into
base: v5/develop

rasteiner
Contributor

@rasteiner rasteiner commented Nov 12, 2024

Description

This PR introduces a new query parser and "runners" that let Kirby "compile" queries into ASTs so that subsequent executions are faster. In initial tests, the runners are about 30% faster (28% for Interpreted; 35% for Transpiled, the more experimental of the two).

Summary of changes

  • Many new classes in Kirby\Query:

    • Kirby\Query\Parser: takes care of first splitting a query string into tokens of specific types and then parsing these into a tree of AST nodes
    • Kirby\Query\AST: all kinds of nodes a query can consist of and the logic for how they can be resolved
    • Kirby\Query\Visitor: classes that help resolve (visit) an AST and turn nodes either directly into processed results (Interpreter) or into PHP code representations (Transpiler)
    • Kirby\Query\Runner: classes that take a query and, with the help of the above, either process the result directly (Interpreted, with an in-memory cache) or turn it into PHP code and run it (Transpiled, cached in memory, with the generated PHP code also stored as files so the same code can run again in future requests)
  • The general process is:

    1. Query string is split into a flat sequence of tokens (Kirby\Query\Parser\Tokenizer).
    2. Tokens are parsed into a recursive abstract syntax tree (AST) (Kirby\Query\Parser\Parser).
    3. AST is then visited by an Interpreter that directly evaluates the query, or a Transpiler that transpiles it into PHP code.
    4. The whole process and the caching is handled by the two "runner" classes: Interpreted/Transpiled
  • The original Kirby\Query\Query class remains in place and chooses which runner to use based on the query.runner option.

  • All previous Query classes have been deprecated and will be removed in v7. Until then, setting query.runner to legacy keeps them in use.
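
The general process above (tokenize, parse, visit) can be sketched in a few lines. This is only an illustration under simplifying assumptions: it handles nothing but dotted identifier chains, represents the AST as nested arrays, and the function names tokenize, parse and evaluate are hypothetical stand-ins, not the classes added in this PR.

```php
<?php

// 1. Tokenizer: split the query string into a flat list of tokens.
function tokenize(string $query): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($query);

    while ($offset < $length) {
        // \G anchors each match at the current offset
        if (preg_match('/\G[A-Za-z_][A-Za-z0-9_]*/', $query, $m, 0, $offset) === 1) {
            $tokens[] = ['type' => 'identifier', 'value' => $m[0]];
        } elseif (preg_match('/\G\./', $query, $m, 0, $offset) === 1) {
            $tokens[] = ['type' => 'dot', 'value' => '.'];
        } else {
            throw new InvalidArgumentException("Unexpected character at offset $offset");
        }

        $offset += strlen($m[0]);
    }

    return $tokens;
}

// 2. Parser: a single left-to-right pass without backtracking
// that builds a nested AST out of the tokens.
function parse(array $tokens): array
{
    $position = 0;

    // Consume the next token if it has the expected type, otherwise fail
    $expect = function (string $type) use ($tokens, &$position): string {
        $token = $tokens[$position] ?? null;

        if ($token === null || $token['type'] !== $type) {
            throw new InvalidArgumentException("Expected $type at token $position");
        }

        $position++;
        return $token['value'];
    };

    $node = ['kind' => 'variable', 'name' => $expect('identifier')];

    while ($position < count($tokens)) {
        $expect('dot');
        $node = ['kind' => 'member', 'object' => $node, 'name' => $expect('identifier')];
    }

    return $node;
}

// 3. Interpreter: visit the AST recursively and evaluate it against the data.
function evaluate(array $node, array $data): mixed
{
    if ($node['kind'] === 'variable') {
        return $data[$node['name']] ?? null;
    }

    $object = evaluate($node['object'], $data);

    return match (true) {
        is_array($object)  => $object[$node['name']] ?? null,
        is_object($object) => $object->{$node['name']} ?? null,
        default            => null,
    };
}

$ast = parse(tokenize('user.address.city'));
echo evaluate($ast, ['user' => ['address' => ['city' => 'Springfield']]]), PHP_EOL;
// prints "Springfield"
```

Because the AST is plain data once parsing is done, it can be kept around and evaluated again against different data without tokenizing or parsing a second time.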

Reasoning

Separating the parsing and execution steps allows for more flexibility and better performance, since it allows us to cache the AST, either as PHP code or in memory during a request (or in some other out-of-process memory cache).

Adding the new runners as optional and keeping the old parser in place as the default (legacy) minimizes the risk of issues and hopefully lets us still ship this in v5, changing the default only in v6.

Additional context

The parser is a predictive recursive descent parser, making the parsing step O(n). As such, it can't support ambiguous code, which leads to the breaking change described below.

Changelog

Enhancements

  • Kirby queries run more reliably and with better performance thanks to the new AST-based parsing
    • query.runner option to switch to interpreted or transpiled mode, the latter caches queries as PHP files
    • Closures in Kirby queries can now accept arguments
    • Queries support subscript access notation, e.g. page[site.pageMethodName]

Breaking changes

The new query runners are stricter in parsing queries. Ambiguous terms aren't supported anymore. In some edge cases, this will require you to access an array/object via subscript notation, and in rare cases on the top level via the new this keyword, e.g. this["identifier.with dots and spaces"].
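
To illustrate what subscript access with the this keyword means for a key that cannot be written as a plain identifier, here is a rough sketch. It only covers the narrow form name["key"], and the helper resolveSubscript is hypothetical; the actual runners parse subscripts as part of the full grammar rather than with a regular expression.

```php
<?php

// Hypothetical helper, for illustration only: resolves name["key"],
// where `this` stands for the top-level data.
function resolveSubscript(string $query, array $data): mixed
{
    if (preg_match('/^([A-Za-z_]\w*)\["([^"]*)"\]$/', $query, $matches) !== 1) {
        throw new InvalidArgumentException('Unsupported query: ' . $query);
    }

    [, $name, $key] = $matches;

    // `this` refers to the top-level data; any other name is looked up in it
    $target = $name === 'this' ? $data : ($data[$name] ?? null);

    return is_array($target) ? ($target[$key] ?? null) : null;
}

echo resolveSubscript(
    'this["Shipping Address"]',
    ['Shipping Address' => '742 Evergreen Terrace']
), PHP_EOL;
// prints "742 Evergreen Terrace"

echo resolveSubscript(
    'user["first name"]',
    ['user' => ['first name' => 'Homer']]
), PHP_EOL;
// prints "Homer"
```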

Docs

Config option query.runner:

<?php 

return [
  // 'query.runner' => 'legacy', // default
  // 'query.runner' => 'interpreted', 
  // 'query.runner' => 'transpiled',
];

Receive arguments in closures:

$query = new Query('(foo) => foo.homer');
$data  = [];

$bar = $query->resolve($data);
$bar = $bar(['homer' => 'simpson']);
$this->assertSame('simpson', $bar);

Queries are still first resolved directly against the data context: for example, when your query is null and your data is ['null' => 'foo'], the result is 'foo'. The same applies to user.username with ['user.username' => 'foo'].
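
The lookup order described here can be pictured as a guard in front of the parser. The following is a minimal sketch under assumptions: the function name resolveWithShortcut and the $parseAndRun callback are hypothetical and only stand in for whatever the selected runner does.

```php
<?php

// Hypothetical helper: an exact key match in the data wins
// before the query is parsed at all.
function resolveWithShortcut(string $query, array $data, callable $parseAndRun): mixed
{
    if (array_key_exists($query, $data) === true) {
        return $data[$query];
    }

    // No direct match: hand over to the actual parser/runner
    return $parseAndRun($query, $data);
}

// Stand-in for the real runner, only to make the sketch executable
$runner = fn (string $query, array $data): string => 'parsed: ' . $query;

echo resolveWithShortcut('null', ['null' => 'foo'], $runner), PHP_EOL;
// prints "foo"
echo resolveWithShortcut('user.username', ['user.username' => 'foo'], $runner), PHP_EOL;
// prints "foo"
echo resolveWithShortcut('user.username', ['user' => []], $runner), PHP_EOL;
// prints "parsed: user.username"
```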

Ready?

No! This is a work in progress. The proposed API could change.
Also, for some reason VS Code decided to merge the main branch into this one... As this PR isn't really meant to be merged "as is", I'll just ignore this mistake: this is mainly a place for discussion.

  • In-code documentation (wherever needed)
  • Unit tests for fixed bug/feature
  • Tests and CI checks all pass

For review team

  • Add changes & docs to release notes draft in Notion

@rasteiner
Contributor Author

Updates

  • the syntax user\.username has been implemented for now. The corresponding test has been updated to use that instead of the ambiguous user.username.
    Similarly, the following syntax is supported too: user\\\.username, for when the actual array key is really user\.username. (The first backslash escapes the second, the third escapes the dot.)
  • intercept has been implemented.
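
The backslash escaping from the first update can be illustrated with a small path splitter. This is a sketch only: splitEscapedPath is a hypothetical helper that treats a backslash as "take the next character literally", which is an assumption about the intended semantics and not the tokenizer code from this PR.

```php
<?php

// Hypothetical helper: split a path on unescaped dots.
// A backslash makes the following character literal.
function splitEscapedPath(string $path): array
{
    $segments = [];
    $current  = '';
    $length   = strlen($path);

    for ($i = 0; $i < $length; $i++) {
        $char = $path[$i];

        if ($char === '\\' && $i + 1 < $length) {
            // Take the next character literally and skip over it
            $current .= $path[++$i];
        } elseif ($char === '.') {
            // An unescaped dot closes the current segment
            $segments[] = $current;
            $current    = '';
        } else {
            // Regular character (or a trailing backslash)
            $current .= $char;
        }
    }

    $segments[] = $current;

    return $segments;
}

// Raw query: user\.username (one backslash)
print_r(splitEscapedPath('user\.username'));
// one segment: user.username

// Raw query: user\\\.username (three backslashes, written as six in PHP source)
print_r(splitEscapedPath('user\\\\\\.username'));
// one segment: user\.username

// Raw query: user.username (no escaping)
print_r(splitEscapedPath('user.username'));
// two segments: user, username
```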

Discussion

In regards to intercept, all evaluated objects and variables are now passed through intercept. Even stuff like literal strings or numbers. It could be worth discussing if this is actually useful or just slow. One idea could be to only intercept objects on which a member field or method is accessed. Like:

  • page('foo') would stay page('foo')
  • page('foo').title would become intercept(page('foo')).title.

@distantnative
Member

One idea could be to only intercept objects on which a member field or method is accessed.
This would be an option if it also includes objects down the chain: E.g. page.files.first would also need the option to intercept the Files and the File object.

@rasteiner
Contributor Author

rasteiner commented Nov 13, 2024

One idea could be to only intercept objects on which a member field or method is accessed.

This would be an option if it also includes objects down the chain: E.g. page.files.first would also need the option to intercept the Files and the File object.

Sure, page.files.first would be:

intercept(
  intercept(page).files
).first

page.files.first.delete would be:

intercept(
  intercept(
    intercept(page).files
  ).first
).delete

also foo.callMeBack(() => page('bar')).delete would be:

intercept(
  intercept(foo).callMeBack(() => page('bar'))
).delete
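
The wrapping rule in these examples is mechanical: every member access wraps whatever is to its left in one more intercept call. A tiny sketch of that rule follows; wrapWithIntercept is a hypothetical function that emits the pseudo-notation used in this comment as a string, not the PHP code the Transpiler generates.

```php
<?php

// Hypothetical helper: build the nested intercept notation
// for a root expression followed by a chain of member accesses.
function wrapWithIntercept(string $root, array $members): string
{
    $code = $root;

    foreach ($members as $member) {
        // Only the object on which a member is accessed gets intercepted
        $code = "intercept($code).$member";
    }

    return $code;
}

echo wrapWithIntercept("page('foo')", []), PHP_EOL;
// prints "page('foo')"
echo wrapWithIntercept('page', ['files', 'first']), PHP_EOL;
// prints "intercept(intercept(page).files).first"
echo wrapWithIntercept('page', ['files', 'first', 'delete']), PHP_EOL;
// prints "intercept(intercept(intercept(page).files).first).delete"
```

An expression without any member access stays untouched, which matches the idea of not intercepting literals or bare function results.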

@rasteiner
Contributor Author

rasteiner commented Nov 15, 2024

Having implemented the "user\.username" syntax, I've now realized that, while passing the unit test, I didn't really solve the underlying problem.
By allowing us to escape the dot, there's now an exception explicitly for the dot, but every other special symbol (even whitespace) would still break the parsing:

$query = new Query('Shipping Address'); 
$this->assertSame('742 Evergreen Terrace', $query->resolve(['Shipping Address' => '742 Evergreen Terrace']));

I see 4 options:

  1. We ignore this: the only special character allowed in array keys remains the dot
  2. We escape literally everything, kinda like in regex
  3. We introduce an index operator (i.e. foo["Shipping Address"] or {"Shipping Address"})
  4. We introduce a new kind of string literal which is interpreted as identifier, like in MySQL queries the back ticks for complex column names `Shipping Address`

The index operator obviously gives the most bang for the buck as it allows any kind of expression as index (not just strings), but it either requires an unconventional syntax (the {} brackets in the example above) or, for the special case of wanting to access a "root level variable", we need to introduce a reserved keyword for the global context (foo in the example above).

@distantnative distantnative changed the title [v5] Feature / Perfomance: compiled queries Query: parse as AST Nov 20, 2024
@rasteiner
Contributor Author

I've implemented the intercept logic as described above (see test here)

Other notable change would be moving the "resolver cache" (the cache that maps query strings to closures) to the Kirby\Query\Query class.
Before this change the cache was hidden away in the Runner classes, but we want that cache to be both static (so that one can simply create a new Query instance and benefit from the cache) as well as under the control of Query (should the "Query::entries" change during a request).
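
A static resolver cache of this kind could look roughly like the sketch below. Everything in it is an assumption for illustration: CachedQuery, its compile step (a plain dotted-path lookup) and the $compilations counter are hypothetical and only show why a new instance benefits from closures built by earlier instances.

```php
<?php

// Hypothetical class, not Kirby\Query\Query: maps query strings to closures
// in a static cache shared by all instances during a request.
final class CachedQuery
{
    /** @var array<string, Closure> */
    private static array $resolvers = [];

    // Counts compile steps, only to make the cache effect visible
    public static int $compilations = 0;

    public function __construct(private string $query)
    {
    }

    public function resolve(array $data = []): mixed
    {
        // Compile only on the first use of this exact query string
        $resolver = self::$resolvers[$this->query] ??= self::compile($this->query);

        return $resolver($data);
    }

    // Lets the owning class drop all closures, e.g. when global entries change
    public static function flush(): void
    {
        self::$resolvers = [];
    }

    // Stand-in for parsing plus interpreting or transpiling
    private static function compile(string $query): Closure
    {
        self::$compilations++;
        $keys = explode('.', $query);

        return static function (array $data) use ($keys): mixed {
            $value = $data;

            foreach ($keys as $key) {
                if (is_array($value) === false || array_key_exists($key, $value) === false) {
                    return null;
                }

                $value = $value[$key];
            }

            return $value;
        };
    }
}

$data = ['user' => ['name' => 'Homer']];

echo (new CachedQuery('user.name'))->resolve($data), PHP_EOL; // prints "Homer"
echo (new CachedQuery('user.name'))->resolve($data), PHP_EOL; // prints "Homer" (cached closure)
echo CachedQuery::$compilations, PHP_EOL;                     // prints "1"
```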

@distantnative
Member

@rasteiner we are currently trying to wrap some things up for the v5 beta, that's why I'm not active here much these days, but will get back to it probably next week! :)

@distantnative distantnative force-pushed the v5/compiled-queries branch 2 times, most recently from b3998f5 to 4c4521c Compare January 18, 2025 18:15
@distantnative distantnative force-pushed the v5/compiled-queries branch 2 times, most recently from 41b2678 to d961f46 Compare January 19, 2025 10:57
@distantnative distantnative force-pushed the v5/compiled-queries branch 2 times, most recently from bc90c22 to c717f87 Compare January 19, 2025 20:12
 - cast data objects to arrays before passing to query runner
 - allow integers as identifiers of array keys when used after a dot
 - don't emit warnings when an array key is missing
- Implements the intercept mechanism for both runners
- Renames the AST Node classes with a "Node" suffix to avoid confusion with some PHP internal classes (like `Closure` -> `ClosureNode`)
@distantnative distantnative marked this pull request as ready for review January 24, 2025 22:30
@distantnative distantnative requested a review from a team January 24, 2025 22:31
@distantnative distantnative added this to the 5.0.0-beta.3 milestone Jan 24, 2025
* @deprecated 6.0.0
* @codeCoverageIgnore
*/
private function resolve_legacy(array|object $data = []): mixed
Suggested change
private function resolve_legacy(array|object $data = []): mixed
private function resolveLegacy(array|object $data = []): mixed

@distantnative distantnative modified the milestones: 5.0.0-beta.3, 5.1.0 Feb 10, 2025
3 participants