Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid duplicating complex expression in comparisons #34172

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jul 5, 2024

When comparing a nullable expression to a non-nullable one, a NULL result always
represent a difference.

This makes it possible to avoid duplicating the nullable expression by mapping
the NULL result to a FALSE (when comparing for equality).

Fixes #34165.

@ranma42
Copy link
Contributor Author

ranma42 commented Jul 5, 2024

This change can already take care of most of the worst offenders found in #34048 🥳

@ranma42
Copy link
Contributor Author

ranma42 commented Jul 5, 2024

I'll add some tests that check this transformation specifically
EDIT: done 👍

@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch 2 times, most recently from 8a4e1bf to 144b7e0 Compare July 13, 2024 06:55
body = _sqlExpressionFactory.OrElse(
_sqlExpressionFactory.AndAlso(body, _sqlExpressionFactory.AndAlso(leftIsNotNull, rightIsNotNull)),
_sqlExpressionFactory.AndAlso(leftIsNull, rightIsNull));
if (leftNullable && rightNullable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranma42 can you please add a comment here explaining the logic, i.e. that duplication is bad except for columns, plus columns may make usage of indexes which arbitrary expressions (usually) won't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 4c3a542 (#34172) which is aimed at addressing this

@@ -1003,7 +1003,10 @@ public override async Task Json_collection_index_in_predicate_using_constant(boo
"""
SELECT [j].[Id]
FROM [JsonEntitiesBasic] AS [j]
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') <> N'Foo' OR JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') IS NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a place where I'm a bit hesitant about this change... The SQL Server docs specifically document using an indexed computed column over JSON_VALUE as a way to speed up queries filtering inside a JSON document; unless I'm mistaken, these queries would likely stop using such an index if we switch to the CASE translation (maybe in this specific test it doesn't matter because of the inequality, but you get whar I'm saying).

In a perfect world, we'd vary our translation based on knowledge that an indexed computed column exists for this expression, but we're pretty far away from doing that at the moment.

Thoughts?

Copy link
Contributor Author

@ranma42 ranma42 Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe it is very likely that the CASE translation will not take advantage of indexes, but I would expect the same to be true for the original version as well, as it is performing a <> comparison (maybe it would use the index to include all of the NULL values 🤔, but then it would still have to scan all of the non-null values and filter each of them).

For equality in predicates the translation should already be
WHERE JSON_VALUE([j].[OwnedCollectionRoot], '$[0].Name') = N'Foo'
which should effectively take advantage of indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

So I'm trying to understand whether there are cases - and which ones - in which this PR causes a perf regression because the switch to CASE doesn't use an index. If there are such cases (and after all, we do avoid the CASE translation for columns because of this), we should think carefully - I'm not sure whether the optimization to remove double evaluation for some cases outweighs the (potentially severe) regression triggered by not using an index. A conservative approach would wait until we could know more reliably whether an index would be used on an expression (e.g. because we're aware of expression indexes/indexed computed columns).

I know I'm being very cautious here, I'm thinking about the perf regressions brought about by the switch from IN+constants to OPENJSON in 8.0 - that change improved general perf for many queries, but also caused severe regressions for others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are cases in which the translation could cause a regression; the main one I can think of (which is the one currently avoided by the column handling) is the following (and similar ones):

.Where(e => !e.BoolA != e.NullableBoolB)

This is

SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE "e"."BoolA" = "e"."NullableBoolB" OR "e"."NullableBoolB" IS NULL

Sqlite (and litely other SQL providers) would take advantage of an index on NullableBoolB (assuming BoolA and NullableBoolB are actually columns from different tables).

When using the CASE, this becomes

SELECT "e"."Id"
FROM "Entities1" AS "e"
WHERE CASE
    WHEN "e"."BoolA" <> "e"."NullableBoolB" THEN 0
    ELSE 1
END

and the index cannot be used anymore.

I pushed ranma42@ecdd12e to show what happens when the CASE transformation is used whenever it is valid.

With #34166 this could possibly affect a few more tests, but if I am not mistaken, this boolean comparison (negated-different-from) is the only case in which a "good" WHERE would regress (at least according to optimizations rules similar to those of sqlite).

Copy link
Contributor Author

@ranma42 ranma42 Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, obviously you could also do the same on json values:

.Where(e => !e.MyJsonColumn.BoolA != e.MyOtherJsonColumn.NullableBoolB)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of checking for a simple column, the right check would be whether the emitted operand is = vs !=? (aka if the WHERE predicate has some chances of being optimized)

@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch 2 times, most recently from 4a9993e to aeca728 Compare July 29, 2024 20:40
@ranma42
Copy link
Contributor Author

ranma42 commented Jul 29, 2024

I pushed a new version of the branch to solve the merge conflicts.
As I was at it, I also changed the logic behind the activation of the CASE transformation; it now only activates if it is valid (not on nullable vs nullable) and it causes no de-optimization (aka it is only allowed on predicates if the comparison is an inequality).

Comment on lines 2847 to 2865
WHEN [c].[Region] = N'ASK' AND [c].[Region] IS NOT NULL THEN CAST(1 AS bit)
WHEN [c].[Region] = N'ASK' THEN CAST(1 AS bit)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice side-effect, but we might want to ensure that this kind of optimization happens regardless of this PR (and possibly not only on comparisons 🤔 )

When comparing a nullable expression to a non-nullable one, a  `NULL` result  always
represent a difference.

This makes it possible to avoid duplicating the nullable expression by mapping
the `NULL` result to a `FALSE` (when comparing for equality).

Fixes dotnet#34165.
@ranma42 ranma42 force-pushed the avoid-equal-duplication-34165 branch from aeca728 to 0f67128 Compare December 23, 2024 17:40
@ranma42 ranma42 requested a review from a team as a code owner December 23, 2024 17:40
@@ -2251,7 +2251,10 @@ SELECT MIN([o0].[HourlyRate])
FROM [Order] AS [o0]
WHERE [o0].[CustomerId] = [o].[CustomerId]) AS [CustomerMinHourlyRate], MIN([o].[HourlyRate]) AS [HourlyRate], COUNT(*) AS [Count]
FROM [Order] AS [o]
WHERE [o].[Number] <> N'A1' OR [o].[Number] IS NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this regress performance, at least for the case where Number is NULL? Is it worth making an exception for non-complex expressions, and not do the CASE translation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, similar to this conversation above: #34172 (review)

@roji
Copy link
Member

roji commented Dec 24, 2024

/cc @maumar

@roji
Copy link
Member

roji commented Dec 28, 2024

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate alternative translations for (in)equality comparison
2 participants