Do not serialize EsIndex in plan #119580

Open
wants to merge 13 commits into
base: main

Conversation

idegtiarenko
Contributor

Certain plan classes (such as EsRelation, EsSourceExec, and EsQueryExec) contain and serialize the entire EsIndex instance.
This instance might contain a huge mapping that is never used in the plan. This change replaces the EsIndex usage with name and indexNameWithModes to minimize the size of the serialized plan.

Closes: #112998
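
To illustrate why dropping the mapping matters, here is a minimal sketch that compares the two payload shapes using plain `java.io` streams. It is not the ES|QL serialization code: `PlanSizeSketch`, its `IndexMode` enum, the method names, and the synthetic `field_N` mapping are all hypothetical stand-ins, and the real classes write through Elasticsearch's own stream types.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class PlanSizeSketch {
    /** Hypothetical stand-in for the real IndexMode enum. */
    public enum IndexMode { STANDARD, LOOKUP }

    /** Old shape, simplified: name, per-index modes, and every mapped field. */
    public static byte[] writeWithMapping(String name, Map<String, IndexMode> modes, Map<String, String> mapping) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeNameAndModes(out, name, modes);
            out.writeInt(mapping.size());
            for (Map.Entry<String, String> field : mapping.entrySet()) {
                out.writeUTF(field.getKey());   // field name
                out.writeUTF(field.getValue()); // field type
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** New shape, simplified: only the name and the per-index modes. */
    public static byte[] writeNameAndModesOnly(String name, Map<String, IndexMode> modes) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeNameAndModes(out, name, modes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private static void writeNameAndModes(DataOutputStream out, String name, Map<String, IndexMode> modes)
        throws IOException {
        out.writeUTF(name);
        out.writeInt(modes.size());
        for (Map.Entry<String, IndexMode> entry : modes.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeUTF(entry.getValue().name());
        }
    }

    /** Builds a synthetic mapping with the given number of keyword fields. */
    public static Map<String, String> syntheticMapping(int fieldCount) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (int i = 0; i < fieldCount; i++) {
            mapping.put("field_" + i, "keyword");
        }
        return mapping;
    }

    public static void main(String[] args) {
        Map<String, IndexMode> modes = Map.of("logs", IndexMode.STANDARD);
        int full = writeWithMapping("logs", modes, syntheticMapping(10_000)).length;
        int slim = writeNameAndModesOnly("logs", modes).length;
        System.out.println("with mapping: " + full + " bytes, name and modes only: " + slim + " bytes");
    }
}
```

In this sketch the slim payload stays constant while the full payload grows with the number of mapped fields, which is the effect the description is after.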

@idegtiarenko idegtiarenko added the >enhancement, Team:Analytics, auto-backport, :Analytics/ES|QL, v9.0.0, and v8.18.0 labels Jan 6, 2025
@elasticsearchmachine
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@@ -2644,7 +2644,6 @@ private void assertEmptyEsRelation(LogicalPlan plan) {
        assertThat(plan, instanceOf(EsRelation.class));
        EsRelation esRelation = (EsRelation) plan;
        assertThat(esRelation.output(), equalTo(NO_FIELDS));
        assertTrue(esRelation.index().mapping().isEmpty());
Contributor Author

Not sure if it is possible to replace this assertion in assertEmptyEsRelation with any alternative?

@idegtiarenko idegtiarenko marked this pull request as ready for review January 6, 2025 15:09
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/EsRelation.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsQueryExec.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/EsSourceExec.java
Member

@costin costin left a comment

Great find - left a question around the usage of IndexModes map; not clear why that has to be serialized?

IndexMode indexMode,
Map<String, IndexMode> indexNameWithModes,
Member

Why is this map needed?

Contributor Author

It is used in LocalExecutionPlanner:

    Map<String, IndexMode> indicesWithModes = localSourceExec.indexNameWithModes();
    if (indicesWithModes.size() != 1) {
        throw new IllegalArgumentException("can't plan [" + join + "], found more than 1 index");
    }
    var entry = indicesWithModes.entrySet().iterator().next();
    if (entry.getValue() != IndexMode.LOOKUP) {
        throw new IllegalArgumentException("can't plan [" + join + "], found index with mode [" + entry.getValue() + "]");
    }

Contributor

If this is the only thing that requires that: I believe this check could happen much earlier. Potentially already in the analyzer on the coordinator node. We shouldn't need to wait until the local planning to determine that an index isn't a lookup index.

Could you maybe check if it's feasible to move that to an earlier place and, thus, remove the indexNameWithModes map altogether from the EsRelation and related classes?

Happy to assist with moving that to an earlier stage!

Contributor

There already is an index mode check in the Analyzer:

    if (plan.indexMode().equals(IndexMode.LOOKUP)) {
        String indexResolutionMessage = null;
        var indexNameWithModes = esIndex.indexNameWithModes();
        if (indexNameWithModes.size() != 1) {
            indexResolutionMessage = "invalid ["
                + table
                + "] resolution in lookup mode to ["
                + indexNameWithModes.size()
                + "] indices";
        } else if (indexNameWithModes.values().iterator().next() != IndexMode.LOOKUP) {
            indexResolutionMessage = "invalid ["
                + table
                + "] resolution in lookup mode to an index in ["
                + indexNameWithModes.values().iterator().next()
                + "] mode";
        }
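
For reference, the check quoted above can be read as a small pure function over the resolved modes. The following is a self-contained sketch of that logic only, assuming a hypothetical `LookupResolutionCheck` class, a stand-in `IndexMode` enum, and an `Optional<String>` return in place of the nullable message; it is not the Analyzer's actual code.

```java
import java.util.Map;
import java.util.Optional;

public class LookupResolutionCheck {
    /** Hypothetical stand-in for the real IndexMode enum. */
    public enum IndexMode { STANDARD, TIME_SERIES, LOOKUP }

    /**
     * Returns an error message if a table referenced in lookup mode does not
     * resolve to exactly one index in LOOKUP mode, or empty if it is valid.
     */
    public static Optional<String> validate(String table, Map<String, IndexMode> indexNameWithModes) {
        if (indexNameWithModes.size() != 1) {
            return Optional.of(
                "invalid [" + table + "] resolution in lookup mode to [" + indexNameWithModes.size() + "] indices"
            );
        }
        IndexMode mode = indexNameWithModes.values().iterator().next();
        if (mode != IndexMode.LOOKUP) {
            return Optional.of(
                "invalid [" + table + "] resolution in lookup mode to an index in [" + mode + "] mode"
            );
        }
        return Optional.empty();
    }
}
```

If this validation already runs during analysis on the coordinator, a later repeat of the same check in local planning would only be a safety net, which is the premise behind dropping the map from the serialized plan.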

private final EsIndex index;
private final String indexName;
private final IndexMode indexMode;
private final Map<String, IndexMode> indexNameWithModes;
private final List<Attribute> attrs;
private final boolean frozen;
Member

frozen is no longer needed - since you're refactoring the classes, please remove this field from the Es classes.

# Conflicts:
#	server/src/main/java/org/elasticsearch/TransportVersions.java
@@ -126,12 +129,13 @@ public void testDeeplyNestedFields() throws IOException {
* with a single root field that has many children, grandchildren etc.
*/
public void testDeeplyNestedFieldsKeepOnlyOne() throws IOException {
ByteSizeValue expected = ByteSizeValue.ofBytes(9425804);
Contributor

This test file beautifully demonstrates the improvements from the change. Awesome, in this particularly bad case we got a reduction by 99.996%!

Comment on lines +158 to +159
public static final TransportVersion ESQL_SKIP_ES_INDEX_SERIALIZATION = def(8_823_00_0);
public static final TransportVersion ESQL_REMOVE_ES_RELATION_FROZEN = def(8_824_00_0);
Contributor

nit: couldn't that be a single new transport version?
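
As a rough illustration of what a single version would look like, the sketch below gates both wire changes (no mapping payload, no frozen flag) behind one constant. Everything here is hypothetical: `VersionGatedWriteSketch`, the plain `int` version, the `SLIM_ES_RELATION` id, and the placeholder mapping count stand in for the real `TransportVersion` constants and stream classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionGatedWriteSketch {
    /** Hypothetical single version id covering both wire-format changes. */
    public static final int SLIM_ES_RELATION = 8_823_000;

    /** Writes the slim format to new peers and the legacy format to older ones. */
    public static byte[] write(int peerVersion, String indexName, boolean frozen) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(indexName);
            if (peerVersion < SLIM_ES_RELATION) {
                // Older peers still expect a mapping and the frozen flag;
                // an empty mapping (count 0) is used here as a placeholder.
                out.writeInt(0);
                out.writeBoolean(frozen);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With one constant there is a single branch per reader and writer; two constants would only be needed if the changes had to ship or be backported independently.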

@alex-spies alex-spies self-requested a review January 9, 2025 16:31
Successfully merging this pull request may close these issues.

ESQL: Remove EsIndex from plan and serialization
4 participants