-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Do not serialize EsIndex in plan #119580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not serialize EsIndex in plan #119580
Conversation
Hi @idegtiarenko, I've created a changelog YAML for you. |
@@ -2644,7 +2644,6 @@ private void assertEmptyEsRelation(LogicalPlan plan) { | |||
assertThat(plan, instanceOf(EsRelation.class)); | |||
EsRelation esRelation = (EsRelation) plan; | |||
assertThat(esRelation.output(), equalTo(NO_FIELDS)); | |||
assertTrue(esRelation.index().mapping().isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it is possible to replace this assertion in assertEmptyEsRelation
with any alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assertion on output
above is sufficient. Asserting the empty mapping here looks like this was over constraining, I think.
Pinging @elastic/es-analytical-engine (Team:Analytics) |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/EsRelation.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsSourceExec.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find - left a question around the usage of IndexModes map; not clear why that has to be serialized?
IndexMode indexMode, | ||
Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this map needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in LocalExecutionPlanner:
Lines 564 to 571 in 2ff1d76
Map<String, IndexMode> indicesWithModes = localSourceExec.indexNameWithModes(); | |
if (indicesWithModes.size() != 1) { | |
throw new IllegalArgumentException("can't plan [" + join + "], found more than 1 index"); | |
} | |
var entry = indicesWithModes.entrySet().iterator().next(); | |
if (entry.getValue() != IndexMode.LOOKUP) { | |
throw new IllegalArgumentException("can't plan [" + join + "], found index with mode [" + entry.getValue() + "]"); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.
Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes
map altogether from the EsRelation
and related classes?
Happy to assist with moving that to an earlier stage!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already is an index mode check in the Analyzer:
elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
Lines 239 to 255 in 4ac8d55
if (plan.indexMode().equals(IndexMode.LOOKUP)) { | |
String indexResolutionMessage = null; | |
var indexNameWithModes = esIndex.indexNameWithModes(); | |
if (indexNameWithModes.size() != 1) { | |
indexResolutionMessage = "invalid [" | |
+ table | |
+ "] resolution in lookup mode to [" | |
+ indexNameWithModes.size() | |
+ "] indices"; | |
} else if (indexNameWithModes.values().iterator().next() != IndexMode.LOOKUP) { | |
indexResolutionMessage = "invalid [" | |
+ table | |
+ "] resolution in lookup mode to an index in [" | |
+ indexNameWithModes.values().iterator().next() | |
+ "] mode"; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it. The check is indeed already performed and could be removed - it's just there out of defensiveness as far as I can see.
What cannot be removed is the line
var entry = indicesWithModes.entrySet().iterator().next();
which, in case of aliases, contains the actual lookup index name in the entry's key.
Looks to me like we should add a comment and leave this as-is. (Well, and maybe remove the defensive double validation, but it also doesn't hurt.)
I'll confirm with @idegtiarenko if this is what we want to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. IndexName could actually be alias or pattern, while map entries include resolved index names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I would be able to pick up followup immediately.
Also keeping this updated required multiple conflict resolutions.
I would really prefer to merge this earlier.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/EsRelation.java
Outdated
Show resolved
Hide resolved
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -126,12 +129,13 @@ public void testDeeplyNestedFields() throws IOException { | |||
* with a single root field that has many children, grandchildren etc. | |||
*/ | |||
public void testDeeplyNestedFieldsKeepOnlyOne() throws IOException { | |||
ByteSizeValue expected = ByteSizeValue.ofBytes(9425804); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test file beautifully demonstrates the improvements from the change. Awesome, in this particularly bad case we got a reduction by 99.996%!
IndexMode indexMode, | ||
Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.
Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes
map altogether from the EsRelation
and related classes?
Happy to assist with moving that to an earlier stage!
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
public EsRelation(Source source, EsIndex index, List<Attribute> attributes, IndexMode indexMode, boolean frozen) { | ||
public EsRelation( | ||
Source source, | ||
String indexName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many cases, there will not be a concrete index name, but an index pattern. I think it's slightly misleading to call this indexName
for this reason, as it gives the impression as if this was one singular index. E.g. FROM logs*
will have logs*
as the index name, but actually that's not the name of any index.
String indexName, | |
String indexPattern, |
IndexMode indexMode, | ||
Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it. The check is indeed already performed and could be removed - it's just there out of defensiveness as far as I can see.
What cannot be removed is the line
var entry = indicesWithModes.entrySet().iterator().next();
which, in case of aliases, contains the actual lookup index name in the entry's key.
Looks to me like we should add a comment and leave this as-is. (Well, and maybe remove the defensive double validation, but it also doesn't hurt.)
I'll confirm with @idegtiarenko if this is what we want to do.
public EsRelation( | ||
Source source, | ||
String indexName, | ||
IndexMode indexMode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the indexMode
attribute has become obsolete. It's required in the UnresolvedRelation
, but if we keep track of indexNameWithModes
, the latter has more complete information obtained from actual field caps calls and should be preferred.
I think we only use this indexMode
to check if a relation corresponds to a lookup index. I think we could just have a helper function instead isLookupIndex
which checks the indexNameWithModes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me try to replace its usages with the map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have 2 checks for TIME_SERIES
:
Line 35 in ab26077
if (plan.indexMode() == IndexMode.TIME_SERIES) { |
Line 184 in d953079
if (esQueryExec.indexMode() == IndexMode.TIME_SERIES) { |
that is originally configured in:
Line 484 in 212e16b
IndexMode.TIME_SERIES, |
I am thinking of deferring that to a separate change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened an issue for that #120210
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
IndexMode indexMode, | ||
Map<String, IndexMode> indexNameWithModes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map seems similar to EsIndex - it's detailed information that isn't required.
Nit: doing it in a separate PR will require a transport bump; maybe consider this complete but not merge, do the follow-up PR and then merge them at the same time to have only one increment?
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/AnalyzerTests.java
@idegtiarenko and I went over this, and the map is not redundant; compared to Removing this, or at least reducing it to the absolutely required minimum, is desired but more effort (quite a bit of disentangling different usages there); and we'd like to postpone this effort. The present PR delivers on the goal of reducing the redundantly serialized data greatly in many cases. So I'd prefer getting this PR in, live with the additional transport version bump, and enjoy the benefit of leaner serialization already. |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneColumns.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton, @idegtiarenko ! I think this is a great improvement.
I have only very minor comments; this can be merged as-is, please treat the comments as optional.
When backporting, I think it's important we run the PR with test-full-bwc
to make sure that we don't accidentally break bwc with 8.17, 8.16 etc. On the off chance that we break bwc with e.g. 8.15, we might need to skip a test or so based on a capability.
To ensure this, maybe it's better to remove the auto-backport
label and do the backport manually.
} | ||
|
||
public EsSourceExec(Source source, EsIndex index, List<Attribute> attributes, QueryBuilder query, IndexMode indexMode) { | ||
public EsSourceExec( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: we could make the constructor's parameter order here consistent with that of EsQueryExec
's, in that the attributes
come before the query
.
private final QueryBuilder query; | ||
private final Expression limit; | ||
private final List<Attribute> attrs; | ||
private final List<Stat> stats; | ||
|
||
public EsStatsQueryExec( | ||
Source source, | ||
EsIndex index, | ||
String indexName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consistency
String indexName, | |
String indexPattern, |
If you decide to align this, let's not forget about the nodeString
as well.
@@ -2644,7 +2644,6 @@ private void assertEmptyEsRelation(LogicalPlan plan) { | |||
assertThat(plan, instanceOf(EsRelation.class)); | |||
EsRelation esRelation = (EsRelation) plan; | |||
assertThat(esRelation.output(), equalTo(NO_FIELDS)); | |||
assertTrue(esRelation.index().mapping().isEmpty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the assertion on output
above is sufficient. Asserting the empty mapping here looks like this was over constraining, I think.
/** | ||
* Test the size of serializing a plan like | ||
* FROM index* | LIMIT 10 | KEEP one_single_field | ||
* with an index pattern pointing to a hundred actual indices with rather long names | ||
*/ | ||
public void testIndexPatternTargetingMultipleIndices() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, thank you for the test case!
# Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java
💚 Backport successful
|
Certain plan classes (such as EsRelation, EsSourceExec, EsQueryExec) contain and serialize the entire EsIndex instance. This instance might contain huge mapping that is never used in plan. This change replaces EsIndex usage with indexPattern and indexNameWithModes to minimize the size of the serialized plan.
Certain plan classes (such as EsRelation, EsSourceExec, EsQueryExec) contain and serialize the entire
EsIndex
instance.This instance might contain huge
mapping
that is never used in plan. This change replacesEsIndex
usage withname
andindexNameWithModes
to minimize the size of the serialized plan.Closes: #112998