Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] FULL OUTER JOIN not commutative, and not giving correct results #8735

Closed
samjewell opened this issue Jan 10, 2025 · 0 comments · Fixed by dolthub/go-mysql-server#2814
Closed
Labels
analyzer bug Something isn't working customer issue good repro Easily reproducible bugs sql Issue with SQL

Comments

@samjewell
Copy link

samjewell commented Jan 10, 2025

I love go-mysql-server - thanks for all your work on this project.

I got an LLM to help me write this bug report. Forgive me if it's a little verbose.

Description:

I've discovered a bug in the implementation of FULL OUTER JOIN in go-mysql-server. The order of tables in the JOIN clause affects the result set, which is incorrect behaviour for a FULL OUTER JOIN. This operation should be commutative, meaning the order of tables should not impact the output.

Steps to reproduce

  1. Create and populate two tables:
CREATE TABLE teams (
  team VARCHAR(100),
  namespace VARCHAR(100)
);
INSERT INTO teams(team, namespace) VALUES ('sam', 'sam1');
INSERT INTO teams(team, namespace) VALUES ('sam', 'sam2');
INSERT INTO teams(team, namespace) VALUES ('janos', 'janos1');

CREATE TABLE traces (
  namespace VARCHAR(100),
  value INT
);
INSERT INTO traces(namespace, value) VALUES ('janos1', '400');
INSERT INTO traces(namespace, value) VALUES ('0', '500');
  1. Run the following query:
SELECT
  team,
  sum(value)
FROM traces
FULL OUTER JOIN teams
  ON teams.namespace = traces.namespace
GROUP BY team;
  1. Run the same query with the table order reversed:
SELECT
  team,
  sum(value)
FROM teams
FULL OUTER JOIN traces
  ON teams.namespace = traces.namespace
GROUP BY team;

Expected behaviour:

Both queries should return the same result set with three rows:

team sum
NULL 500
janos 400
sam NULL

This can be seen in this Postgres SQL Fiddle:
https://sqlfiddle.com/postgresql/online-compiler?id=a88eed6b-123b-4864-8005-d270326ef577

Actual behavior:

The first query returns only two rows:

team sum
NULL 500
janos 400

While the second query returns a different two rows:

team sum
janos 400
sam NULL

This behavior indicates that the FULL OUTER JOIN implementation is not commutative, which is incorrect. A correct FULL OUTER JOIN should return all rows from both tables, regardless of the order in which they are specified in the query.

Additional information:

I've learned that FULL OUTER JOIN isn't actually implemented in MySQL itself, but it is in Postgres, so I've compared with Postgres behaviour (and my own logic/intuition) to inform this bug report.
The correct behavior has been verified using PostgreSQL, which returns the expected three-row result set for both queries.

This bug affects the completeness and accuracy of query results when using FULL OUTER JOIN in go-mysql-server

I think the bug may only appear when NULLs are present in the final set of results, or when GROUP BY is used, or both. I don't know if this is the smallest reproduction case, or if a smaller case would also reproduce this (but I have worked hard to make it very small, yet easy to understand).

Environment

go-mysql-server version: v0.19.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer bug Something isn't working customer issue good repro Easily reproducible bugs sql Issue with SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants