Skip to content

Do not serialize EsIndex in plan #119580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Jan 21, 2025

Conversation

idegtiarenko
Copy link
Contributor

Certain plan classes (such as EsRelation, EsSourceExec, EsQueryExec) contain and serialize the entire EsIndex instance.
This instance might contain huge mapping that is never used in plan. This change replaces EsIndex usage with name and indexNameWithModes to minimize the size of the serialized plan.

Closes: #112998

@idegtiarenko idegtiarenko added >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v9.0.0 v8.18.0 labels Jan 6, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@@ -2644,7 +2644,6 @@ private void assertEmptyEsRelation(LogicalPlan plan) {
assertThat(plan, instanceOf(EsRelation.class));
EsRelation esRelation = (EsRelation) plan;
assertThat(esRelation.output(), equalTo(NO_FIELDS));
assertTrue(esRelation.index().mapping().isEmpty());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it is possible to replace this assertion in assertEmptyEsRelation with any alternative?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assertion on output above is sufficient. Asserting the empty mapping here looks like this was over constraining, I think.

@idegtiarenko idegtiarenko marked this pull request as ready for review January 6, 2025 15:09
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/EsRelation.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsSourceExec.java
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find - left a question around the usage of IndexModes map; not clear why that has to be serialized?

IndexMode indexMode,
Map<String, IndexMode> indexNameWithModes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this map needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used in LocalExecutionPlanner:

Map<String, IndexMode> indicesWithModes = localSourceExec.indexNameWithModes();
if (indicesWithModes.size() != 1) {
throw new IllegalArgumentException("can't plan [" + join + "], found more than 1 index");
}
var entry = indicesWithModes.entrySet().iterator().next();
if (entry.getValue() != IndexMode.LOOKUP) {
throw new IllegalArgumentException("can't plan [" + join + "], found index with mode [" + entry.getValue() + "]");
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.

Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes map altogether from the EsRelation and related classes?

Happy to assist with moving that to an earlier stage!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already is an index mode check in the Analyzer:

if (plan.indexMode().equals(IndexMode.LOOKUP)) {
String indexResolutionMessage = null;
var indexNameWithModes = esIndex.indexNameWithModes();
if (indexNameWithModes.size() != 1) {
indexResolutionMessage = "invalid ["
+ table
+ "] resolution in lookup mode to ["
+ indexNameWithModes.size()
+ "] indices";
} else if (indexNameWithModes.values().iterator().next() != IndexMode.LOOKUP) {
indexResolutionMessage = "invalid ["
+ table
+ "] resolution in lookup mode to an index in ["
+ indexNameWithModes.values().iterator().next()
+ "] mode";
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it. The check is indeed already performed and could be removed - it's just there out of defensiveness as far as I can see.

What cannot be removed is the line

 var entry = indicesWithModes.entrySet().iterator().next(); 

which, in case of aliases, contains the actual lookup index name in the entry's key.

Looks to me like we should add a comment and leave this as-is. (Well, and maybe remove the defensive double validation, but it also doesn't hurt.)

I'll confirm with @idegtiarenko if this is what we want to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. IndexName could actually be alias or pattern, while map entries include resolved index names

Copy link
Contributor

@alex-spies alex-spies Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@costin , we decided removing this map is too large in scope for this PR. Ievgen opened a follow-up issue: #120210

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?

Copy link
Contributor Author

@idegtiarenko idegtiarenko Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I would be able to pick up followup immediately.
Also keeping this updated required multiple conflict resolutions.
I would really prefer to merge this earlier.

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -126,12 +129,13 @@ public void testDeeplyNestedFields() throws IOException {
* with a single root field that has many children, grandchildren etc.
*/
public void testDeeplyNestedFieldsKeepOnlyOne() throws IOException {
ByteSizeValue expected = ByteSizeValue.ofBytes(9425804);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test file beautifully demonstrates the improvements from the change. Awesome, in this particularly bad case we got a reduction by 99.996%!

IndexMode indexMode,
Map<String, IndexMode> indexNameWithModes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.

Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes map altogether from the EsRelation and related classes?

Happy to assist with moving that to an earlier stage!

@alex-spies alex-spies self-requested a review January 9, 2025 16:31
public EsRelation(Source source, EsIndex index, List<Attribute> attributes, IndexMode indexMode, boolean frozen) {
public EsRelation(
Source source,
String indexName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many cases, there will not be a concrete index name, but an index pattern. I think it's slightly misleading to call this indexName for this reason, as it gives the impression as if this was one singular index. E.g. FROM logs* will have logs* as the index name, but actually that's not the name of any index.

Suggested change
String indexName,
String indexPattern,

IndexMode indexMode,
Map<String, IndexMode> indexNameWithModes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it. The check is indeed already performed and could be removed - it's just there out of defensiveness as far as I can see.

What cannot be removed is the line

 var entry = indicesWithModes.entrySet().iterator().next(); 

which, in case of aliases, contains the actual lookup index name in the entry's key.

Looks to me like we should add a comment and leave this as-is. (Well, and maybe remove the defensive double validation, but it also doesn't hurt.)

I'll confirm with @idegtiarenko if this is what we want to do.

public EsRelation(
Source source,
String indexName,
IndexMode indexMode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the indexMode attribute has become obsolete. It's required in the UnresolvedRelation, but if we keep track of indexNameWithModes, the latter has more complete information obtained from actual field caps calls and should be preferred.

I think we only use this indexMode to check if a relation corresponds to a lookup index. I think we could just have a helper function instead isLookupIndex which checks the indexNameWithModes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try to replace its usages with the map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have 2 checks for TIME_SERIES:

that is originally configured in:

I am thinking of deferring that to a separate change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue for that #120210

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

IndexMode indexMode,
Map<String, IndexMode> indexNameWithModes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
@alex-spies
Copy link
Contributor

@costin:

The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?

@idegtiarenko and I went over this, and the map is not redundant; compared to EsIndex.indexMode, this contains data retrieved from the field caps response and is used in some places where we need to know the concrete indices that this EsIndex instance represents. And the actual (not presumed) index modes of said indices.

Removing this, or at least reducing it to the absolutely required minimum, is desired but more effort (quite a bit of disentangling different usages there); and we'd like to postpone this effort. The present PR delivers on the goal of reducing the redundantly serialized data greatly in many cases.

So I'd prefer getting this PR in, live with the additional transport version bump, and enjoy the benefit of leaner serialization already.

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneColumns.java
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton, @idegtiarenko ! I think this is a great improvement.

I have only very minor comments; this can be merged as-is, please treat the comments as optional.

When backporting, I think it's important we run the PR with test-full-bwc to make sure that we don't accidentally break bwc with 8.17, 8.16 etc. On the off chance that we break bwc with e.g. 8.15, we might need to skip a test or so based on a capability.
To ensure this, maybe it's better to remove the auto-backport label and do the backport manually.

}

public EsSourceExec(Source source, EsIndex index, List<Attribute> attributes, QueryBuilder query, IndexMode indexMode) {
public EsSourceExec(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: we could make the constructor's parameter order here consistent with that of EsQueryExec's, in that the attributes come before the query.

private final QueryBuilder query;
private final Expression limit;
private final List<Attribute> attrs;
private final List<Stat> stats;

public EsStatsQueryExec(
Source source,
EsIndex index,
String indexName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consistency

Suggested change
String indexName,
String indexPattern,

If you decide to align this, let's not forget about the nodeString as well.

@@ -2644,7 +2644,6 @@ private void assertEmptyEsRelation(LogicalPlan plan) {
assertThat(plan, instanceOf(EsRelation.class));
EsRelation esRelation = (EsRelation) plan;
assertThat(esRelation.output(), equalTo(NO_FIELDS));
assertTrue(esRelation.index().mapping().isEmpty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assertion on output above is sufficient. Asserting the empty mapping here looks like this was over constraining, I think.

Comment on lines +150 to +155
/**
* Test the size of serializing a plan like
* FROM index* | LIMIT 10 | KEEP one_single_field
* with an index pattern pointing to a hundred actual indices with rather long names
*/
public void testIndexPatternTargetingMultipleIndices() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, thank you for the test case!

@idegtiarenko idegtiarenko added test-full-bwc Trigger full BWC version matrix tests v8.16.0 and removed auto-backport Automatically create backport pull requests when merged v8.16.0 v8.18.0 labels Jan 20, 2025
# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@idegtiarenko idegtiarenko added auto-backport Automatically create backport pull requests when merged v8.18.0 labels Jan 21, 2025
@idegtiarenko idegtiarenko merged commit 5ab81a9 into elastic:main Jan 21, 2025
16 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

@idegtiarenko idegtiarenko deleted the do_not_serialize_es_index branch January 21, 2025 08:16
idegtiarenko added a commit that referenced this pull request Jan 21, 2025
Certain plan classes (such as EsRelation, EsSourceExec, EsQueryExec) contain and serialize the entire EsIndex instance.
This instance might contain huge mapping that is never used in plan. This change replaces EsIndex usage with indexPattern and indexNameWithModes to minimize the size of the serialized plan.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) test-full-bwc Trigger full BWC version matrix tests v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESQL: Remove EsIndex from plan and serialization
4 participants