Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unneccessary sort operation when merge output is already known to be sorted. #8728

Open
nicktobey opened this issue Jan 9, 2025 · 0 comments
Labels
sql Issue with SQL

Comments

@nicktobey
Copy link
Contributor

Example:

create table onepk(pk int primary key, c0 int);
create table onepk2 like onepk;
analyze table onepk update histogram on (pk) using data '{"row_count": 1000}';
analyze table onepk2 update histogram on (pk) using data '{"row_count": 1000}';
describe plan select onepk.c0 from onepk join onepk2 using (pk) order by pk;

Observed output:

+-----------------------------------------+
| plan                                    |
+-----------------------------------------+
| Project                                 |
|  ├─ columns: [onepk.c0]                 |
|  └─ Sort(onepk.pk ASC)                  |
|      └─ MergeJoin                       |
|          ├─ cmp: (onepk.pk = onepk2.pk) |
|          ├─ IndexedTableAccess(onepk)   |
|          │   ├─ index: [onepk.pk]       |
|          │   ├─ filters: [{[NULL, ∞)}]  |
|          │   └─ columns: [pk c0]        |
|          └─ IndexedTableAccess(onepk2)  |
|              ├─ index: [onepk2.pk]      |
|              ├─ filters: [{[NULL, ∞)}]  |
|              └─ columns: [pk]           |
+-----------------------------------------+

Expected output: the Sort node is extraneous and can be removed: the merge join is guaranteed to produce outputs in the same order as the chosen indexes. Including a sort operation can significantly slow down execution if there's also a LIMIT clause, since all results must be pulled in order to sort them.

Solution: if the sort order is a prefix of either of the indexes used in the merge join, it should be safe to remove.

We should make sure we also handle the case of ORDER BY pk DESC by reversing both indexes used in the merge join.

@nicktobey nicktobey added the sql Issue with SQL label Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql Issue with SQL
Projects
None yet
Development

No branches or pull requests

1 participant