Seeing as this repo inherits a lot of code from https://github.com/ekzhu/datasketch, it should be noted that the Mersenne prime hashing implementation used in both repos causes overflows and potentially more hash collisions than intended:
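To make the overflow concrete, here is a minimal sketch in pure Python. It assumes the common MinHash setup of a Mersenne prime modulus `p = 2**61 - 1`, coefficients `a` and `b` drawn below `p`, and 32-bit input hash values held in unsigned 64-bit integers. The function names are hypothetical, and the uint64 wraparound is emulated with a bit mask rather than copied from either repo's code.

```python
# Assumed parameters (typical MinHash setup, not copied from either repo):
# a Mersenne prime modulus p = 2**61 - 1 and 32-bit input hash values.
MERSENNE_PRIME = (1 << 61) - 1
UINT64_MASK = (1 << 64) - 1


def exact_affine_mod_p(a: int, b: int, h: int) -> int:
    """(a*h + b) mod p using Python's arbitrary-precision ints (no overflow)."""
    return (a * h + b) % MERSENNE_PRIME


def wrapped_affine_mod_p(a: int, b: int, h: int) -> int:
    """Same formula, but emulating uint64 arithmetic: the product and the sum
    silently wrap modulo 2**64 before the final reduction mod p."""
    product = (a * h) & UINT64_MASK
    total = (product + b) & UINT64_MASK
    return total % MERSENNE_PRIME


# a close to p and h close to 2**32: the true product needs about 93 bits.
a, b, h = MERSENNE_PRIME - 1, 0, (1 << 32) - 1
print(exact_affine_mod_p(a, b, h))
print(wrapped_affine_mod_p(a, b, h))
```

Because `a` can be close to `2**61` and `h` close to `2**32`, the true product needs about 93 bits, so any version computed in 64-bit integers ends up reducing a different number modulo `p`.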
While the resulting numbers are not the same, this does not seem like conclusive proof to me that you get more collisions. You will simply get them on different values, since in both cases the modulo squashes everything into the same value range.
The pure Python implementation has the (very big) disadvantage of being considerably slower.
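One way to keep fixed-width arithmetic without the overflow is to never form a product wider than 64 bits. The sketch below uses a swapped-in technique, not taken from either repo: it splits `a` into 32-bit halves and uses `2**61 ≡ 1 (mod p)` to fold high bits back onto the low 61 bits. It is written in pure Python for readability, but every intermediate stays below `2**64`, so the same steps could in principle be expressed with uint64 array operations. The function names are hypothetical, and no speed claim is made here.

```python
import random

MERSENNE_PRIME = (1 << 61) - 1  # p = 2**61 - 1 (assumed modulus)
LOW32 = (1 << 32) - 1
LOW29 = (1 << 29) - 1


def reduce_mersenne(x: int) -> int:
    """Reduce 0 <= x < 2**64 modulo p, using 2**61 ≡ 1 (mod p)."""
    # Fold bits 61..63 onto the low 61 bits; the result is at most p + 7.
    x = (x & MERSENNE_PRIME) + (x >> 61)
    # A single conditional subtraction brings it into [0, p).
    return x - MERSENNE_PRIME if x >= MERSENNE_PRIME else x


def affine_mod_p_64bit(a: int, b: int, h: int) -> int:
    """(a*h + b) mod p for a, b < p and h < 2**32, no intermediate >= 2**64."""
    a_hi, a_lo = a >> 32, a & LOW32   # a = a_hi * 2**32 + a_lo, a_hi < 2**29
    lo = a_lo * h                     # < 2**64
    hi = a_hi * h                     # < 2**61
    # hi * 2**32 mod p: the low 29 bits of hi land in bits 32..60, and the
    # bits that would spill past bit 60 (hi >> 29) fold back to the bottom.
    hi_shifted = ((hi & LOW29) << 32) + (hi >> 29)  # < 2**61 + 2**32
    total = reduce_mersenne(hi_shifted) + reduce_mersenne(lo) + b  # < 3p
    return reduce_mersenne(total)


# Spot-check against Python's arbitrary-precision arithmetic.
rng = random.Random(0)
for _ in range(1000):
    a = rng.randrange(1, MERSENNE_PRIME)
    b = rng.randrange(MERSENNE_PRIME)
    h = rng.randrange(1 << 32)
    assert affine_mod_p_64bit(a, b, h) == (a * h + b) % MERSENNE_PRIME
```

Splitting `a` rather than `h` keeps both partial products under `2**64` while needing only one fold per partial result.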
I agree that this is not conclusive proof of more collisions. However, it seems like a bug to me to purport to do affine transforms modulo Mersenne primes when that is not what the code is doing.
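The distinction matters because, for a prime `p` and `1 <= a < p`, the map `h -> (a*h + b) mod p` is a permutation of `{0, ..., p-1}`; that is the property an affine transform modulo a Mersenne prime is supposed to provide. The toy-sized sketch below assumes an 8-bit "machine word" and `p = 2**5 - 1 = 31` in place of uint64 and `2**61 - 1`, so every input can be checked exhaustively. The exact map hits every residue once, while the wrapped version does not.

```python
# Toy parameters (assumptions for illustration): an 8-bit word stands in for
# uint64, and the Mersenne prime 2**5 - 1 stands in for 2**61 - 1.
P = (1 << 5) - 1          # 31
WORD_MASK = (1 << 8) - 1


def exact(a: int, b: int, h: int) -> int:
    """(a*h + b) mod p with no overflow."""
    return (a * h + b) % P


def wrapped(a: int, b: int, h: int) -> int:
    """Same formula, but the product and the sum wrap modulo 2**8 first."""
    product = (a * h) & WORD_MASK
    total = (product + b) & WORD_MASK
    return total % P


a, b = 30, 0
exact_outputs = [exact(a, b, h) for h in range(P)]
wrapped_outputs = [wrapped(a, b, h) for h in range(P)]

print(sorted(exact_outputs) == list(range(P)))   # exact map is a permutation
print(len(set(wrapped_outputs)), "distinct values out of", P)
print(wrapped(a, b, 3), wrapped(a, b, 18))       # both map to 28
```

Whether this turns into measurably more collisions at full size is the open question from the previous comment; the toy only shows that the computed function is no longer the advertised one.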
Currently, the implementation is doing the following: