Disregard parameter names in ExpressionEqualityComparer.GetHashCode
#30755
base: main
`GetHashCode` now bases the hash code of each parameter expression on its position in the parameter list of the containing lambda(s), so that otherwise identical lambda expressions that differ only in their parameter names yield the same hash code. Fixes dotnet#30697
@aradalvand note the test failures. |
Yeah, a test would be good - precisely the case you gave in the issue (i.e. assert that both equality and hashcode are the same). |
Can you provide more details? CompareParameter currently looks like this:
private bool CompareParameter(ParameterExpression a, ParameterExpression b)
    => _parameterScope != null
        && _parameterScope.TryGetValue(a, out var mapped)
            ? mapped.Name == b.Name
            : a.Name == b.Name;
What exact problem are you seeing here? |
@aradalvand any response to my comment above? Also, you need to agree to the CLA as per the message above. |
Not sure it's worth it. Nevermind. |
@aradalvand it's a bit of a shame to abandon this after the effort we already put into this... |
To respond to this:
private bool CompareParameter(ParameterExpression a, ParameterExpression b)
    => _parameterScope != null
        && _parameterScope.TryGetValue(a, out var mapped)
            ? mapped == b
            : a.Name == b.Name; |
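The practical difference between the two variants quoted in this thread (`mapped.Name == b.Name` versus `mapped == b`) only shows up when a lambda body refers to a `ParameterExpression` instance other than the one in its parameter list. The following is a minimal sketch of such a tree, using only `System.Linq.Expressions`; the class and method names are hypothetical and not part of EF Core.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper for illustration only; not part of EF Core.
public static class SameNameParameters
{
    // Builds a lambda shaped like x => x + 1, except that the "x" used in the body
    // is a different ParameterExpression instance from the "x" that is declared.
    public static LambdaExpression BuildWithLookalike()
    {
        var declared = Expression.Parameter(typeof(int), "x");
        var lookalike = Expression.Parameter(typeof(int), "x");

        return Expression.Lambda(
            Expression.Add(lookalike, Expression.Constant(1)),
            declared);
    }

    // Reference check: what a `mapped == b` style comparison would rely on.
    public static bool BodyUsesDeclaredInstance(LambdaExpression lambda)
        => ReferenceEquals(((BinaryExpression)lambda.Body).Left, lambda.Parameters[0]);

    // Name check: what a `mapped.Name == b.Name` style comparison would rely on.
    public static bool BodyUsesDeclaredName(LambdaExpression lambda)
        => ((ParameterExpression)((BinaryExpression)lambda.Body).Left).Name
            == lambda.Parameters[0].Name;
}
```

For the tree returned by `BuildWithLookalike()`, the name check succeeds while the reference check does not, so a comparer that switched from names to references would stop treating such a tree as equal to its well-formed counterpart.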
@dotnet-policy-service agree |
In principle you are right - within the lambda, it's expected to see the same instance of ParameterExpression as the one in the lambda's parameter list, wherever that parameter is used (IIRC that's what the compiler produces, in any case). However, there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name. If we make your proposed change, the comparer would start to return false, leading to re-compilation and perf degradation.
One important review comment that's much more important than all the others: your PR currently introduces state on ExpressionEqualityComparer (_lambdaParameters), which has one instance (ExpressionEqualityComparer.Instance) shared concurrently by multiple threads. This is why the equality logic (as opposed to the hashing logic) is encapsulated in the ExpressionComparer struct, which is instantiated for each invocation of Equals - this allows a private instance of _parameterScope which doesn't conflict across multiple concurrent invocations. If we introduce similar state for hash code calculation, we must do the same and move the entire code into the ExpressionComparer struct.
But looking over this again... I'm simply not sure about the added value of introducing all the parameter names/indexes into the hash code; we're doing extra calculations, lookups and an instantiation (of _lambdaParameters), although there's no requirement for the hash code to contain the new information... I think the main point here is for the hash code to disregard the lambda parameter names; rather than doing that by hashing their indexes instead, we should consider simply disregarding all parameter names, regardless of whether they're lambda or not. Remember again, that there's no requirement for the hash code to be different for differing expression trees - only for it to be equal for equal ones.
So disregarding all names would allow queries with different lambda parameter names to be cached as the same query, with a much simpler implementation, less runtime work, and without any real disadvantage (except that query trees whose only difference is parameter names get the same hash code, and therefore get bucketized together in the cache). How does that sound? |
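As a rough illustration of the "disregard all parameter names" idea, here is a minimal sketch of a hash computation that never looks at `ParameterExpression.Name`. It is an assumption-laden toy, not EF Core's implementation: `NameAgnosticHash` and `HashingVisitor` are hypothetical names, and only a few node kinds contribute extra detail. All mutable state lives in a visitor created per call, in the spirit of the concurrency concern raised above.

```csharp
#nullable enable
using System;
using System.Linq.Expressions;

// Hypothetical, simplified hasher for illustration; EF Core's real comparer covers far more cases.
public static class NameAgnosticHash
{
    public static int Compute(Expression expression)
    {
        // A fresh visitor per call keeps the running hash local to this invocation,
        // so concurrent callers never share mutable state.
        var visitor = new HashingVisitor();
        visitor.Visit(expression);
        return visitor.ToHashCode();
    }

    private sealed class HashingVisitor : ExpressionVisitor
    {
        private HashCode _hash;

        public int ToHashCode() => _hash.ToHashCode();

        public override Expression? Visit(Expression? node)
        {
            if (node is not null)
            {
                // Every node contributes its kind and its CLR type.
                _hash.Add(node.NodeType);
                _hash.Add(node.Type);
            }

            return base.Visit(node);
        }

        protected override Expression VisitConstant(ConstantExpression node)
        {
            _hash.Add(node.Value);
            return node;
        }

        protected override Expression VisitMember(MemberExpression node)
        {
            _hash.Add(node.Member);
            return base.VisitMember(node);
        }

        // VisitParameter is intentionally not overridden: a parameter contributes only
        // its node kind and type (added in Visit above), never its name.
    }
}
```

With this sketch, `NameAgnosticHash.Compute` should return the same value for `x => x + 1` and `y => y + 1` within a process. Trees that differ in other ways may still collide, which is acceptable for a hash code as long as equality remains the final arbiter.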
When you say "there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name", you mean in the same expression tree, right? Just want to make sure I understand your point correctly.
Sure, we won't do it. You explained to me here that it's not necessary, I was initially under the impression that it is. I'll update the code accordingly once I have the time to get to this again, sorry, I've just been really busy!
Yeah I think that makes sense, that seems to be the most reasonable approach. Given that |
@aradalvand any plans on continuing work on this? |
Okay, I made the changes we talked about, could you please take a look and confirm if it's all good? |
Fixes #30697
Description:
The `ExpressionEqualityComparer.GetHashCode` method now bases the hash of each `ParameterExpression` on its index (position) in the parameter lists of the containing lambda(s), as opposed to its name (in addition, of course, to its type, which is true for any expression).
It's worth noting that the `Equals` method already does essentially the same thing. Namely, it expects the parameter lists of the two given lambdas to be identical in number, type, and order, and then maps each parameter in lambda "a" to its counterpart with the same index/position in lambda "b" for later equality checks, meaning it's effectively doing the equivalent of what is now being done in `GetHashCode`:
efcore/src/EFCore/Query/ExpressionEqualityComparer.cs
Lines 393 to 422 in 07284ac
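The positional mapping described above can be sketched roughly as follows. This is a simplified illustration, not the code referenced in the permalink: `PositionalLambdaComparer` is a hypothetical name, only lambda, parameter, binary, and constant nodes are handled, and `CompareParameter` mirrors the name-based version quoted in the conversation. The comparer is a struct created per call, so its parameter scope is never shared between invocations.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical, heavily reduced comparer for illustration; not EF Core's implementation.
public static class PositionalLambdaComparer
{
    public static bool AreEqual(Expression? a, Expression? b)
        => new Comparer().Compare(a, b);

    private struct Comparer
    {
        // Maps each parameter of lambda "a" to the parameter at the same position in lambda "b".
        private Dictionary<ParameterExpression, ParameterExpression>? _parameterScope;

        public bool Compare(Expression? a, Expression? b)
        {
            if (ReferenceEquals(a, b))
            {
                return true;
            }

            if (a is null || b is null || a.NodeType != b.NodeType || a.Type != b.Type)
            {
                return false;
            }

            return (a, b) switch
            {
                (LambdaExpression la, LambdaExpression lb) => CompareLambda(la, lb),
                (ParameterExpression pa, ParameterExpression pb) => CompareParameter(pa, pb),
                (BinaryExpression ba, BinaryExpression bb)
                    => object.Equals(ba.Method, bb.Method)
                        && Compare(ba.Left, bb.Left)
                        && Compare(ba.Right, bb.Right),
                (ConstantExpression ca, ConstantExpression cb) => object.Equals(ca.Value, cb.Value),
                _ => false // node kinds outside this sketch are treated as unequal
            };
        }

        private bool CompareLambda(LambdaExpression a, LambdaExpression b)
        {
            if (a.Parameters.Count != b.Parameters.Count)
            {
                return false;
            }

            _parameterScope ??= new();

            for (var i = 0; i < a.Parameters.Count; i++)
            {
                if (a.Parameters[i].Type != b.Parameters[i].Type)
                {
                    return false;
                }

                _parameterScope[a.Parameters[i]] = b.Parameters[i];
            }

            try
            {
                return Compare(a.Body, b.Body);
            }
            finally
            {
                // Leave the scope as it was found once this lambda has been compared.
                foreach (var parameter in a.Parameters)
                {
                    _parameterScope.Remove(parameter);
                }
            }
        }

        // Falls back to comparing names when the parameter is not bound by an enclosing lambda.
        private bool CompareParameter(ParameterExpression a, ParameterExpression b)
            => _parameterScope != null && _parameterScope.TryGetValue(a, out var mapped)
                ? mapped.Name == b.Name
                : a.Name == b.Name;
    }
}
```

Under this sketch, `x => x + 1` and `y => y + 1` compare as equal because `x` is mapped to `y` by position, whereas comparing only the two bodies falls back to parameter names and reports them as different.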
That said, we still fall back to the parameter name in cases where the expression passed to `GetHashCode` is not a full lambda (say, only `someLambda.Body`), because we have to. This is, once again, precisely what happens in `Equals` as well; specifically here:
efcore/src/EFCore/Query/ExpressionEqualityComparer.cs
Line 471 in 07284ac
This is my very first time contributing to EF so feel free to let me know if I've made any mistakes.
A few things:
- In `ExpressionEqualityComparer`, `for` loops seem to have been preferred to `foreach` loops even when the latter could've also been used. I'm not sure why. I went with `foreach` in this newly-added code because the logic of the loop is dead simple. But let me know if this somehow matters and you want a `for` instead.
- In the `CompareParameter` method shown below, you're doing `mapped.Name == b.Name`, as opposed to just `mapped == b`, while the latter would work too and is what actually makes sense. Should I fix this too?
efcore/src/EFCore/Query/ExpressionEqualityComparer.cs
Lines 469 to 470 in 07284ac