Introduce BlockLoaderTestCase #119415

Merged
merged 3 commits into from
Jan 8, 2025

Conversation

lkts
Contributor

@lkts lkts commented Dec 31, 2024

This PR proposes extracting block loader tests into a separate hierarchy and breaking the connection between synthetic source tests and block loader tests. See #115257 for motivation.

This approach leverages the existing data generation infrastructure used for LogsDB testing. As a result, there is no need to implement generation of mappings/values again, and block loader tests get to check all possible mapping parameter combinations (provided they are supported in data generation). KeywordFieldBlockLoaderTests is implemented as an example.

Next steps:

  • Move everything in org.elasticsearch.logsdb.datageneration to org.elasticsearch.datageneration since it is not specific to LogsDB now.
  • Remove now-obsolete block loader tests from KeywordFieldMapperTests.
  • When more tests are migrated to this approach, data generation needs to be extended with more field types and mapping parameters. Specifically we already need ignore_above for keyword and we'll need ignore_malformed fairly soon.
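
To make the ignore_above point concrete, here is a minimal, self-contained sketch of the kind of expected-value computation a keyword block loader test might need once that parameter is generated. The class and method names are hypothetical and are not taken from this PR. It assumes values are read from doc values, so nulls are skipped, strings longer than ignore_above are dropped, and the rest are de-duplicated and sorted. Using Java string length and natural String ordering is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of the PR: models the expected result of
// loading a keyword field from doc values when ignore_above is configured.
final class KeywordExpectedValues {

    /**
     * Returns null when nothing survives, a single String when one value
     * survives, and a sorted List of unique Strings otherwise.
     * Assumption: values longer than ignoreAbove never reach doc values.
     */
    static Object expected(List<String> values, int ignoreAbove) {
        // TreeSet both de-duplicates and sorts the surviving values.
        TreeSet<String> kept = new TreeSet<>();
        for (String value : values) {
            if (value != null && value.length() <= ignoreAbove) {
                kept.add(value);
            }
        }
        if (kept.isEmpty()) {
            return null;
        }
        if (kept.size() == 1) {
            return kept.first();
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<String> input = Arrays.asList("b", "a", "b", null, "too-long-value");
        System.out.println(expected(input, 5));
    }
}
```

A real test would compare this kind of expectation against what the block loader returns for the generated mapping and value.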

@lkts lkts added the >test (Issues or PRs that are addressing/adding tests), :Analytics/ES|QL (AKA ESQL), and :StorageEngine/Mapping (The storage related side of mappings) labels Dec 31, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), Team:StorageEngine, and v9.0.0 labels Dec 31, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

return new DataSourceResponse.DynamicMappingGenerator(isObject -> false);
}
}))
.build();
Member


I wonder how much it's worth changing the interface of this one so it feels less "logsdb" and more "everything". OTOH, that's a thing you do when you are further along than this. So keep this for now.

Contributor Author


Is there something specific that caught your eye? Reading this does require some knowledge of data generation, but I don't really see it screaming "logsdb" (except the namespace, which I will change).

Member


DataSourceHandlers isn't really a vocabulary word I know. But it's fine. DynamicMappingGenerator sounds quite sensible.

}

return list;
}
Member


This'll almost certainly float to the superclass when you make more of these.

var mapping = mappingGenerator.generate(template);
var mappingXContent = XContentBuilder.builder(XContentType.JSON.xContent()).map(mapping.raw());

var syntheticSource = randomBoolean();
Member


Seeing this makes me wonder how much we should parameterize vs randomize the mappings. It's more work to make a parameterized array, but it feels like a couple of nested for loops could spit out the valid cases and we could see them. If there aren't thousands of them we could run them on every execution.

No need to change anything in this PR. I'd be fine flipping later if you think it's a good idea. Or never. It's probably fine.
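
For illustration, the nested-loop idea could look roughly like the sketch below. It is not code from this PR: the `KeywordMappingCases` class, the `Case` record, and the `isValid` predicate are hypothetical, and the four boolean parameters are just an example subset of what a keyword mapping can vary. Which combinations count as valid is left to the caller rather than assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: enumerate mapping parameter combinations explicitly
// instead of picking one at random per test run.
final class KeywordMappingCases {

    /** One combination of keyword mapping parameters plus the source mode. */
    record Case(boolean docValues, boolean store, boolean index, boolean syntheticSource) {}

    /** Builds every combination and keeps those the caller considers valid. */
    static List<Case> enumerate(Predicate<Case> isValid) {
        boolean[] flags = { false, true };
        List<Case> cases = new ArrayList<>();
        for (boolean docValues : flags) {
            for (boolean store : flags) {
                for (boolean index : flags) {
                    for (boolean syntheticSource : flags) {
                        Case candidate = new Case(docValues, store, index, syntheticSource);
                        if (isValid.test(candidate)) {
                            cases.add(candidate);
                        }
                    }
                }
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        // Accept everything here; a real test would filter unsupported combinations.
        List<Case> all = enumerate(candidate -> true);
        System.out.println(all.size() + " cases");
        all.forEach(System.out::println);
    }
}
```

With only a handful of boolean parameters the list stays small enough to print and to run in full on every execution, which is the visibility benefit described above.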

Contributor Author


Yes, this is a good question. My statement here is definitely more "I need mappings and values and there is already this thing that knows how to do that" than "randomizing is the way to go". I'll keep it but I can definitely see some evolution.

Member

@dnhatn dnhatn left a comment


LGTM, thanks Sasha!


var fieldValue = generator.generateValue();

Object blockLoaderResult = setupAndInvokeBlockLoader(mapperService, fieldValue);
Member


Can we randomly inject extra fields and/or documents to mimic the issue in #117792? That's for follow-ups, though.

Contributor Author


We could, but I wonder if we should cover this in more integration-level tests.

Contributor

@limotova limotova left a comment


LGTM..!

Member

@martijnvg martijnvg left a comment


Thanks @lkts for working on this! LGTM, and this is a good step in the direction of removing block loader tests from field mapper tests.

@lkts lkts merged commit c158692 into elastic:main Jan 8, 2025
16 checks passed
@lkts lkts deleted the block_loader_tests branch January 8, 2025 19:46
elasticsearchmachine pushed a commit that referenced this pull request Jan 8, 2025