[ESQL] EXACT_ Functions #119388

Open
BenB196 opened this issue Dec 31, 2024 · 2 comments

Labels
:Analytics/ES|QL (AKA ESQL), >enhancement, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))

Comments

BenB196 commented Dec 31, 2024

Description

Feature Request

Introduce the concept of EXACT_-prefixed functions that operate similarly to their "approximate" counterparts but prioritize accuracy over speed.

Use-case

Elasticsearch clearly documents that functions such as COUNT_DISTINCT are approximate, trading accuracy for speed. However, since ES|QL is intended to serve a large number of use cases, it would be nice to be able to make the opposite trade and give up speed for accuracy. Today this is only partially possible with COUNT_DISTINCT, because its precision has an upper bound. It would be useful to have a shorthand function built into ES|QL, such as EXACT_COUNT_DISTINCT, that does this. Accurate counts are already possible in the regular DSL, but they are somewhat verbose because they require scripted metrics. Having a simpler equivalent in ES|QL would be a nice value-add.
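To make the trade-off concrete, here is a minimal Python sketch contrasting an exact distinct count with a bounded-memory approximation. It is only an illustration: the function names are hypothetical, and the approximation is a simple K-minimum-values (KMV) estimator used as a stand-in, not the algorithm Elasticsearch uses for COUNT_DISTINCT.

```python
import hashlib
import heapq


def exact_count_distinct(values):
    """Exact distinct count: memory grows with the number of distinct values."""
    return len(set(values))


def _hash64(value):
    """Map a value to a 64-bit integer hash (hypothetical helper)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def kmv_count_distinct(values, k=256):
    """Approximate distinct count that keeps only the k smallest hashes."""
    heap = []     # max-heap via negation, holds the k smallest hashes seen
    kept = set()  # the same hashes, for fast duplicate checks
    for value in values:
        h = _hash64(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        # Fewer than k distinct hashes seen, so no estimation is needed.
        return len(heap)
    kth_smallest = -heap[0]
    # Estimate from how densely the k smallest hashes fill the hash space.
    return (k - 1) * 2**64 // kth_smallest


values = [i % 1000 for i in range(10_000)]
print(exact_count_distinct(values))
print(kmv_count_distinct(values))
```

The exact version must remember every distinct value, while the sketch caps memory at `k` hashes and accepts an estimation error in return. That is the same kind of trade an EXACT_ variant would reverse.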

BenB196 added the >enhancement and needs:triage (Requires assignment of a team area label) labels Dec 31, 2024
tvernum added the :Analytics/ES|QL (AKA ESQL) label and removed the needs:triage (Requires assignment of a team area label) label Jan 2, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Jan 2, 2025
@nik9000
Member

nik9000 commented Jan 2, 2025

Sounds like a good idea. I like the EXACT_ prefix. It's informative and compatible with what we have. More traditional databases have gone with APPROXIMATE_ prefixes for the approximate versions but we didn't do that years and years ago when we built our aggs. Arguably an approximate algorithm is a better default at the scales we're talking about anyway. But that doesn't matter. The EXACT_ prefix gives us a good recipe for when we add exact versions of these functions.

I'm not sure when we will build them, but here's a list of exact counterparts for the approximate algorithms I remember:

  • EXACT_COUNT_DISTINCT
  • EXACT_PERCENTILE
  • EXACT_MEDIAN_ABSOLUTE_DEVIATION
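As a rough illustration of what the exact versions of the last two would compute, here is a small Python sketch. The function names are hypothetical, and the linear interpolation between ranks is an assumption; the thread does not specify how an eventual EXACT_PERCENTILE would handle values that fall between ranks.

```python
import math
import statistics


def exact_percentile(values, p):
    """Exact p-th percentile (0 to 100), interpolating linearly between ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)  # exactness requires seeing every value
    rank = (p / 100) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def exact_median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    median = statistics.median(values)
    return statistics.median(abs(v - median) for v in values)


latencies = [1, 1, 2, 2, 4, 6, 9]
print(exact_percentile(latencies, 50))
print(exact_median_absolute_deviation(latencies))
```

Both functions need the full, sorted set of values, which is the cost an approximate algorithm avoids by summarizing the data instead.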
