Skip to content

Create an array-based version of ChunkedToXContentBuilder #119063

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

thecoop
Copy link
Member

@thecoop thecoop commented Dec 19, 2024

Use fixed-size arrays, but the methods and semantics of ChunkedToXContentBuilder

Fixes #118647

@thecoop thecoop force-pushed the array-xcontent-chunks branch from 11e0d0e to 919f514 Compare December 19, 2024 11:41
@original-brownbear
Copy link
Contributor

I'm sorry @thecoop but I fail to see what this is trying to achieve.
Yes this is probably somewhat faster than the previous version but also obviously slower than the original version without the builder. Again, the chunked x-content logic is only applicable to a very narrow set of use-cases and any API that makes it look nicer (somewhat subjective to begin with, but I get where this is coming from) by adding more push-style logic before the pull-style iteration will slow things down.
Is it really worth all this noise and cost iterating on a builder implementation when the previous implementation will always be faster and really since this is transport thread logic and about as hot as anything ever gets in ES, that is all that matters pretty much. If anything, I'd rather look into writing custom code for search response and bulk response instead of chaining iterators since both of these are considerable CPU consumers.
Sorry but looking at benchmarks and profiling in general we want faster here and any abstraction layer means slower (and slower means less stable because we're on transport threads!).
=> in light of the cost of all of this, is there any reason to not just revert to the old version and be done with this? :)

@thecoop
Copy link
Member Author

thecoop commented Dec 19, 2024

Can you point to how this will be slower? The objects returned are the same types returned by the previous ChunkedToXContentHelper methods.

@original-brownbear
Copy link
Contributor

This one in isolation isn't much slower but even this still has the negative effect of duplicating existing implementations in ChunkedToXContentHelper (and reintroduction of a method previously deleted from that) which is just needless overhead as well (larger table to look up from for the virtual calls).
My point here really was just: we had something that worked reasonably fast why not just go back to that and be happy with it and optimize further rather than go through all this noise? :) Also, note that we started from issues with slow search response serialization, this doesn't remove the slow builder nor addresses search responses in any way?
A fix for #118647 should deal with the slow builder, not just (re-)add duplicate code for one specific path?

@thecoop
Copy link
Member Author

thecoop commented Dec 19, 2024

This is a WIP - the code here is an example of the changes I would like to make. I am continuing to refactor the old builder to the new one now.

I want to do this change because it makes the code easier to understand and less error-prone, especially given the obtuseness of some exceptions when you mis-match start and end blocks. The methods here are designed to minimise the chances of doing that by doing the start and end for you, in the same style as ChunkedToXContentBuilder (as I encountered when doing some changes in elasticsearch-internal).

I also did these changes as there was little documentation or comments specifying the performance-critical nature of this code. May I suggest adding assertions for specific types, and extensive comments explaining what is going on, if these are necessary to ensure certain calls do not turn megamorphic?

@original-brownbear
Copy link
Contributor

@thecoop I finally got around to be profiling this stuff in some detail.
Now I understand why even this PR doesn't get us back to "normal" for search performance. In fact, even though you'd think it'd return to the old numbers, it doesn't.
The reason is that prior to introducing the builder we had a situation where we pretty much always had the array based iterator at the top level of the chunking and it inlined. We already got regressions out of moving to the builder in early October for bulk (regressions in search though!) because that polluted the profile. The regression was just relatively small and we missed it because of the ongoing Lucene 10 work :(

=> there's no point in messing with this stuff on a per response basis. I'd suggest we asap revert the builder stuff for the time being. Reverting to the old code brings back better than before performance numbers in my testing because part of the Lucene 10 improvements became invisible with the builder introduction on the bulk response.

@thecoop thecoop closed this Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label >refactoring v9.0.0 WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ChunkedToXContentBuilder should be removed
3 participants