Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash functions #118938

Merged
merged 20 commits into from
Jan 8, 2025
Merged

Hash functions #118938

merged 20 commits into from
Jan 8, 2025

Conversation

idegtiarenko
Copy link
Contributor

This change adds infrastructure to add more hash functions easier as well as md5 implementation to demonstrate it.

@idegtiarenko idegtiarenko added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Dec 18, 2024
@idegtiarenko idegtiarenko requested a review from nik9000 December 18, 2024 11:00
Copy link
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@idegtiarenko
Copy link
Contributor Author

While this is open we might add couple more functions as well.

This digests are defined and used in ES:

private static final ThreadLocal<MessageDigest> MD5_DIGEST = createThreadLocalMessageDigest("MD5");
private static final ThreadLocal<MessageDigest> SHA_1_DIGEST = createThreadLocalMessageDigest("SHA-1");
private static final ThreadLocal<MessageDigest> SHA_256_DIGEST = createThreadLocalMessageDigest("SHA-256");
private static final ThreadLocal<MessageDigest> SHA_512_DIGEST = createThreadLocalMessageDigest("SHA-512");

Following algorithms seems to be available in several JVMs I checked:

SHA-1
SHA-224
SHA-256
SHA-384
SHA-512/256
SHA-512/224
SHA-512
SHA3-224
SHA3-384
SHA3-256
SHA3-512

Copy link
Contributor

@ivancea ivancea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!


private static final HashFunction MD5 = HashFunction.create("MD5");

@FunctionInfo(returnType = "keyword", description = "Computes MD5 hash of the input.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add the md5Hash test as an examples = @Example(...) here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


@Override
public Expression replaceChildren(List<Expression> newChildren) {
return new Md5(source(), field);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new Md5(source(), field);
return new Md5(source(), newChildren.get(0));

Edit: Oh, looks like EsqlNodeSubclassTests.testReplaceChildren failed successfully with this in CI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


private static final HashFunction MD5 = HashFunction.create("MD5");

@FunctionInfo(returnType = "keyword", description = "Computes MD5 hash of the input.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@FunctionInfo(returnType = "keyword", description = "Computes MD5 hash of the input.")
@FunctionInfo(returnType = "keyword", description = "Computes the MD5 hash of the input.")

@@ -87,12 +87,28 @@ private static TestCaseSupplier createTestCase(String algorithm, boolean forceLi
});
}

static List<TestCaseSupplier> createHashFunctionTestCases(String algorithm) {
return List.of(createHashFunctionTestCase(algorithm, DataType.KEYWORD), createHashFunctionTestCase(algorithm, DataType.TEXT));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use TestCaseSupplier.forUnaryStrings(...) instead of manually creating the TestCaseSuppliers. It will also take care of both KEYWORD and TEXT types

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add some extra test cases, to double check both the:

  • One with a FROM command
  • Something like FROM x | EVAL input = "...", hash = MD5(input)? Trying to "break" the replaceChildren() comment bug with it
  • MD5(123::keyword)
  • Maybe having MD5() as a STATS grouping


private static final HashFunction MD5 = HashFunction.create("MD5");

@FunctionInfo(returnType = "keyword", description = "Computes MD5 hash of the input.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


@Override
public Expression replaceChildren(List<Expression> newChildren) {
return new Md5(source(), field);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@costin
Copy link
Member

costin commented Jan 8, 2025

Let's get this in - this will be handy for security queries!

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could you link the other new hash functions in? Maybe add an example for them?

@@ -43,6 +43,7 @@ include::layout/left.asciidoc[]
include::layout/length.asciidoc[]
include::layout/locate.asciidoc[]
include::layout/ltrim.asciidoc[]
include::layout/md5.asciidoc[]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link the other new functions?

@idegtiarenko idegtiarenko merged commit fd1be8c into elastic:main Jan 8, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

@idegtiarenko idegtiarenko deleted the hash_functions branch January 8, 2025 15:46
idegtiarenko added a commit to idegtiarenko/elasticsearch that referenced this pull request Jan 8, 2025
This change adds md5, sha1 and sha256 hash functions.
elasticsearchmachine pushed a commit that referenced this pull request Jan 9, 2025
This change adds md5, sha1 and sha256 hash functions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants