Disregard parameter names in `ExpressionEqualityComparer.GetHashCode` #30755

aradalvand · 2023-04-24T21:00:16Z

Description:

The ExpressionEqualityComparer.GetHashCode method now bases the hash of each ParameterExpression on its index (position) in the parameter lists of the containing lambda(s), rather than on its name. The parameter's type still contributes to the hash, as it does for any expression.

It's worth noting that the Equals method already does essentially the same thing. It expects the parameter lists of the two given lambdas to match in count, type, and order, and then maps each parameter in lambda "a" to its counterpart at the same index in lambda "b" for later equality checks. GetHashCode now does the equivalent:

efcore/src/EFCore/Query/ExpressionEqualityComparer.cs

Lines 393 to 422 in 07284ac

    
    private bool CompareLambda(LambdaExpression a, LambdaExpression b)
    {
        var n = a.Parameters.Count;

        if (b.Parameters.Count != n)
        {
            return false;
        }

        _parameterScope ??= new Dictionary<ParameterExpression, ParameterExpression>();

        for (var i = 0; i < n; i++)
        {
            var (p1, p2) = (a.Parameters[i], b.Parameters[i]);

            if (p1.Type != p2.Type)
            {
                for (var j = 0; j < i; j++)
                {
                    _parameterScope.Remove(a.Parameters[j]);
                }

                return false;
            }

            if (!_parameterScope.TryAdd(p1, p2))
            {
                throw new InvalidOperationException(CoreStrings.SameParameterInstanceUsedInMultipleLambdas(p1.Name));
            }
        }
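
As a rough illustration of that positional mapping, here is a standalone sketch. `PositionalParameterMap` and `TryBuild` are hypothetical names made up for this example; this is not the EF Core code, only the same idea in isolation: pair parameters by index, and bail out if the counts or types differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper for illustration only; not part of EF Core.
public static class PositionalParameterMap
{
    // Pairs each parameter of `a` with the parameter at the same index in `b`.
    // Returns false when the parameter lists differ in count or in any type.
    public static bool TryBuild(
        LambdaExpression a,
        LambdaExpression b,
        out Dictionary<ParameterExpression, ParameterExpression> map)
    {
        map = new Dictionary<ParameterExpression, ParameterExpression>();

        if (a.Parameters.Count != b.Parameters.Count)
        {
            return false;
        }

        for (var i = 0; i < a.Parameters.Count; i++)
        {
            if (a.Parameters[i].Type != b.Parameters[i].Type)
            {
                return false;
            }

            // Names play no role here: only position and type matter.
            map.Add(a.Parameters[i], b.Parameters[i]);
        }

        return true;
    }
}
```

With this, `x => x + 1` and `y => y + 1` produce a map from `x` to `y`, while a lambda taking a `string` does not match one taking an `int`.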

That said, we still fall back to the parameter name when the expression passed to GetHashCode is not a full lambda (for example, only someLambda.Body), because we have to. This is, once again, precisely what happens in Equals as well; specifically here:

efcore/src/EFCore/Query/ExpressionEqualityComparer.cs

Line 471 in 07284ac

: a.Name == b.Name;

This is my very first time contributing to EF so feel free to let me know if I've made any mistakes.

A few things:

- I noticed that (at least) in the ExpressionEqualityComparer, for loops seem to have been preferred to foreach loops even when the latter could also have been used. I'm not sure why. I went with foreach in this newly added code because the logic of the loop is dead simple. But let me know if this somehow matters and you want a for instead.
- I'm not sure if I should add a test for this. It doesn't seem like a substantial enough change in my estimation. But let me know if I should.
- One small thing I noticed (which is functionally inconsequential as far as I can tell but semantically weird) is that in the CompareParameter method shown below, you're doing mapped.Name == b.Name, as opposed to just mapped == b, while the latter would work too and is what actually makes sense. Should I fix this too?

efcore/src/EFCore/Query/ExpressionEqualityComparer.cs

Lines 469 to 470 in 07284ac

&& _parameterScope.TryGetValue(a, out var mapped)
? mapped.Name == b.Name

I've read the guidelines for contributing and seen the walkthrough
I've posted a comment on an issue with a detailed description of how I am planning to contribute and got approval from a member of the team
The code builds and tests pass locally (also verified by our automated build checks)
Commit messages follow this format:

        Summary of the changes
        - Detail 1
        - Detail 2

        Fixes #bugnumber

Tests for the changes have been added (for bug fixes / features)
Code follows the same patterns and style as existing code in this repo

`GetHashCode` now bases the hash code of each parameter expression on its position in the parameter list of the containing lambda(s), so that otherwise identical lambda expressions that differ only in parameter names yield the same hash code. Fixes dotnet#30697

src/EFCore/Query/ExpressionEqualityComparer.cs

roji · 2023-04-25T15:02:37Z

@aradalvand note the test failures.

roji · 2023-04-25T15:03:42Z

> I'm not sure if I should add a test for this. It doesn't seem like a substantial enough change in my estimation. But let me know if I should.

Yeah, a test would be good - precisely the case you gave in the issue (i.e. assert that both equality and hashcode are the same).

roji · 2023-04-25T15:05:59Z

> One small thing I noticed (which is functionally inconsequential as far as I can tell but semantically weird) is that in the CompareParameter method shown below, you're doing mapped.Name == b.Name, as opposed to just mapped == b, while the latter would work too and is what actually makes sense. Should I fix this too?

Can you provide more details? CompareParameter currently looks like this:

private bool CompareParameter(ParameterExpression a, ParameterExpression b)
    => _parameterScope != null
        && _parameterScope.TryGetValue(a, out var mapped)
            ? mapped.Name == b.Name
            : a.Name == b.Name;

What exact problem are you seeing here?

roji · 2023-05-25T15:06:17Z

@aradalvand any response to my comment above?

Also, you need to agree to the CLA as per the message above.

aradalvand · 2023-05-25T19:50:50Z

Not sure it's worth it. Nevermind.

roji · 2023-05-26T06:16:35Z

@aradalvand it's a bit of a shame to abandon this after the effort we already put into this...

aradalvand · 2023-05-26T09:19:01Z

To respond to this:
This check is basically: Is b the corresponding parameter for a?
The value coming out of TryGetValue(a, out var mapped) (that is, mapped) will be b itself, so a simple reference equality check makes more sense than comparing the names. Comparing the names works, but it is a weird thing to do when we know mapped is going to be the same object as b.
It could just be:

private bool CompareParameter(ParameterExpression a, ParameterExpression b)
    => _parameterScope != null
        && _parameterScope.TryGetValue(a, out var mapped)
            ? mapped == b
            : a.Name == b.Name;

aradalvand · 2023-05-26T09:24:53Z

@dotnet-policy-service agree

roji · 2023-05-30T18:57:18Z

> This check is basically: Is b the corresponding parameter for a?

In principle you are right - within the lambda, it's expected to see the same instance of ParameterExpression as the one in the lambda's parameter list, wherever that parameter is used (IIRC that's what the compiler produces, in any case). However, there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name. If we make your proposed change, the comparer would start to return false, leading to re-compilation and perf degradation.
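
To make that scenario concrete, here is a minimal sketch of a hand-built tree in which the lambda's parameter list and its body use two different ParameterExpression instances that share a name. `SameNameParameters` and `Build` are hypothetical names for this example only. The sketch only constructs the tree; it makes no claim about how such a tree behaves when compiled or translated.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper for illustration only; not part of EF Core.
public static class SameNameParameters
{
    // Builds a lambda whose body refers to a different ParameterExpression
    // instance than the one in its parameter list; both are named "x".
    public static (ParameterExpression Declared, ParameterExpression Used, LambdaExpression Lambda) Build()
    {
        var declared = Expression.Parameter(typeof(int), "x");
        var used = Expression.Parameter(typeof(int), "x");

        // The body uses `used`, while the parameter list contains `declared`.
        var body = Expression.Add(used, Expression.Constant(1));
        var lambda = Expression.Lambda(body, declared);

        return (declared, used, lambda);
    }
}
```

Here `declared.Name == used.Name` is true, but `declared == used` is a reference comparison and is false, which is the difference between the current check and the proposed one.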

One review comment that matters much more than all the others: your PR currently introduces state on ExpressionEqualityComparer (_lambdaParameters), which has one instance (ExpressionEqualityComparer.Instance) shared concurrently by multiple threads. This is why the equality logic (as opposed to the hashing logic) is encapsulated in the ExpressionComparer struct, which is instantiated for each invocation of Equals. That gives each invocation a private _parameterScope which doesn't conflict across multiple concurrent invocations. If we introduce similar state for hash code calculation, we must do the same and move the entire code into the ExpressionComparer struct.
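
The pattern described above can be sketched as follows. This is a simplified, hypothetical comparer (`SketchLambdaComparer`, `ParameterListsMatch` and `Invocation` are names invented for the example), not the actual ExpressionEqualityComparer. It only shows the shape: a shared singleton with no mutable fields, and a struct created per call that owns the scratch dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical, simplified comparer; names are illustrative, not EF Core's.
public sealed class SketchLambdaComparer
{
    // One instance shared by all callers, so it must hold no mutable fields.
    public static readonly SketchLambdaComparer Instance = new SketchLambdaComparer();

    public bool ParameterListsMatch(LambdaExpression a, LambdaExpression b)
    {
        // Fresh state for this call only; nothing is stored on the shared instance.
        var invocation = new Invocation();
        return invocation.Compare(a, b);
    }

    private struct Invocation
    {
        // Scratch state private to a single call.
        private Dictionary<ParameterExpression, ParameterExpression> _parameterScope;

        public bool Compare(LambdaExpression a, LambdaExpression b)
        {
            if (a.Parameters.Count != b.Parameters.Count)
            {
                return false;
            }

            _parameterScope ??= new Dictionary<ParameterExpression, ParameterExpression>();

            for (var i = 0; i < a.Parameters.Count; i++)
            {
                if (a.Parameters[i].Type != b.Parameters[i].Type)
                {
                    return false;
                }

                _parameterScope[a.Parameters[i]] = b.Parameters[i];
            }

            return true;
        }
    }
}
```

Because the dictionary lives in the per-call struct, concurrent calls through `SketchLambdaComparer.Instance` never touch the same dictionary.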

But looking over this again... I'm simply not sure about the added value of introducing all the parameter names/indexes into the hash code; we're doing extra calculations, lookups and an instantiation (of _lambdaParameters), although there's no requirement for the hash code to contain the new information... I think the main point here is for the hash code to disregard the lambda parameter names; rather than doing that by hashing their indexes instead, we should consider simply disregarding all parameter names, regardless of whether they're lambda or not. Remember again, that there's no requirement for the hash code to be different for differing expression trees - only for it to be equal for equal ones. So disregarding all names would allow queries with different lambda parameter names to be cached as the same query, with a much simpler implementation, less runtime work, and without any real disadvantage (except that query trees whose only difference is parameter names get the same hash code, and therefore get bucketized together in the cache).
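
A minimal sketch of what a name-blind hash could look like, assuming a visitor-based approach. `NameBlindHasher` is a hypothetical class written for this example and is far coarser than the real comparer: it only mixes in each node's NodeType and Type, plus constant values. It is not what ExpressionEqualityComparer does; it only shows that leaving names out keeps the hash equal for trees that differ only in parameter names.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical sketch, not EF Core's implementation.
public sealed class NameBlindHasher : ExpressionVisitor
{
    private HashCode _hash;

    public static int Compute(Expression expression)
    {
        // A new visitor per call, so no state is shared between callers.
        var hasher = new NameBlindHasher();
        hasher.Visit(expression);
        return hasher._hash.ToHashCode();
    }

    public override Expression Visit(Expression node)
    {
        if (node != null)
        {
            // Parameter names are deliberately never added to the hash.
            _hash.Add(node.NodeType);
            _hash.Add(node.Type);
        }

        return base.Visit(node);
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        _hash.Add(node.Value);
        return node;
    }
}
```

With this sketch, `x => x + 1` and `y => y + 1` hash to the same value within a process. Trees that differ in other ways may or may not collide, which is acceptable for a hash code as long as equal trees always agree.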

How does that sound?

aradalvand · 2023-05-31T06:09:48Z

> In principle you are right - within the lambda, it's expected to see the same instance of ParameterExpression as the one in the lambda's parameter list, wherever that parameter is used (IIRC that's what the compiler produces, in any case). However, there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name. If we make your proposed change, the comparer would start to return false, leading to re-compilation and perf degradation.

When you say "there's nothing technically preventing people from constructing expression trees where there are two different ParameterExpression instances with the same name", you mean in the same expression tree, right? Just want to make sure I understand your point correctly.
And in those cases, would those two ParameterExpression instances technically be pointing to the same parameter? I actually don't know this, I'm curious; if so, then I guess you'd be right.

> But looking over this again... I'm simply not sure about the added value of introducing all the parameter names/indexes into the hash code; we're doing extra calculations

Sure, we won't do it. You explained to me here that it's not necessary; I was initially under the impression that it was. I'll update the code accordingly once I have the time to get to this again. Sorry, I've just been really busy!

> we should consider simply disregarding all parameter names, regardless of whether they're lambda or not

Yeah I think that makes sense, that seems to be the most reasonable approach. Given that GetHashCode isn't conclusive — which is the key point I didn't know before — there's not really any reason at all for this additional ceremony. You're spot on.

roji · 2023-07-07T19:13:34Z

@aradalvand any plans on continuing work on this?

aradalvand · 2023-07-08T03:18:29Z

Okay, I made the changes we talked about, could you please take a look and confirm if it's all good?

roji requested changes Apr 25, 2023

roji self-assigned this May 25, 2023

roji added the waiting-for-response label May 25, 2023

aradalvand closed this May 25, 2023

aradalvand reopened this May 26, 2023

Use a HashSet and don't add parameter index to the hash

102844c

ajcvickers removed the waiting-for-response label Jan 27, 2024

ranma42 mentioned this pull request Jun 30, 2024

ExpressionEqualityComparer and Parameter Names #34125

Closed
