Store token trees in contiguous Vec instead of as a tree #18327
Conversation
Related: https://rust-lang.zulipchat.com/#narrow/channel/185405-t-compiler.2Frust-analyzer/topic/mbe.20transcription

I tried this before (I also saw no perf impact like you are seeing, with a slight memory usage decrease IIRC, but the branch seems to be lost now), yet for some reason I was unable to actually integrate it without rewriting the mbe crate at the time. Can't recall why that was. But neat either way. Though I guess this will heavily clash with #17830 now 😬 (wonder when I'll be able to get back to that PR, probably not this year)
Perhaps that was the hardness of traversing the tree correctly? It was very ugly in my early attempts, but once I discovered the secret sauce (the right helpers) it became easy.
Might very well have been the traversal, yes.
Force-pushed from 4bda4a9 to 3796848.
☔ The latest upstream changes (presumably #17954) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 3796848 to 9ff252b.
☔ The latest upstream changes (presumably #18366) made this pull request unmergeable. Please resolve the merge conflicts.
Force-pushed from 448b361 to a84450b.
I am generally fine with the changes, just one question. The iterator/cursor stuff does clean up some things as well, which is nice!
I expected this to be faster (due to fewer allocations and better cache locality), but benchmarked it is not (nor is it slower). Memory usage, however, drops by ~50mb (for `analysis-stats .`). I guess tt construction is just not hot. This also makes it easier to use even less memory for token trees by compressing equal spans, which I plan to do right after. Some workflows are more easily expressed with a flat tt, while some are better expressed with a tree. With the right helpers, though (finding them was mostly a matter of trial and error), even the worst workflows become very easy indeed.
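For context, here is a minimal sketch of the flat layout and a cursor over it. All names (`FlatToken`, `TopSubtree`, `Cursor`, `Span`) are hypothetical stand-ins, not the actual `tt` crate API; the key trick is that each subtree records how many buffer entries it spans, so a cursor can either step over it in O(1) or descend into it:

```rust
#![allow(dead_code)] // illustrative sketch, not the real `tt` types

#[derive(Debug, Clone, Copy)]
struct Span(u32); // stand-in for the real span type

#[derive(Debug, Clone)]
enum FlatToken {
    Ident(String, Span),
    Punct(char, Span),
    // A subtree stores how many buffer entries it occupies (including
    // itself), so traversal can step over it without recursing.
    SubtreeOpen { delimiter: char, len: u32, span: Span },
}

// All token trees of one expansion live in a single contiguous buffer.
struct TopSubtree(Vec<FlatToken>);

// A cursor over the flat buffer that understands subtree boundaries.
struct Cursor<'a> {
    tokens: &'a [FlatToken],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(top: &'a TopSubtree) -> Self {
        Cursor { tokens: &top.0, pos: 0 }
    }

    // Next token at the current nesting level, skipping whole subtrees.
    fn bump(&mut self) -> Option<&'a FlatToken> {
        let token = self.tokens.get(self.pos)?;
        self.pos += match token {
            FlatToken::SubtreeOpen { len, .. } => *len as usize,
            _ => 1,
        };
        Some(token)
    }

    // Descend into a subtree instead of stepping over it.
    fn enter(&mut self) -> Option<&'a FlatToken> {
        let token = self.tokens.get(self.pos)?;
        self.pos += 1;
        Some(token)
    }
}

fn main() {
    // `(a + b)` as a flat buffer: the subtree has len = 4, itself plus
    // the three tokens it contains.
    let top = TopSubtree(vec![
        FlatToken::SubtreeOpen { delimiter: '(', len: 4, span: Span(0) },
        FlatToken::Ident("a".into(), Span(1)),
        FlatToken::Punct('+', Span(2)),
        FlatToken::Ident("b".into(), Span(3)),
    ]);
    let mut cursor = Cursor::new(&top);
    // `bump` skips the whole subtree, so this prints only `SubtreeOpen`.
    while let Some(token) = cursor.bump() {
        println!("{token:?}");
    }
}
```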
Force-pushed from a84450b to ceba289.
We should immediately mark them as finished, on the first entry.

The funny (or sad) part is that this bug was pre-existing, but prior to rust-lang#18327 it caused us to generate bindings non-stop, 65535 of them, until we hit the hardcoded repetition limit, and then throw it all away. And it was so Blazingly Fast that nobody noticed. With rust-lang#18327, however, this is still what happens, except that now, instead of *merging* the fragments into the result, we write them on demand. Meaning that by the time we hit the limit, we've already written all previous entries.

"This is a minor change," I thought to myself when I was writing it, "and it's actually for the better, so who cares." Minor change? Not so fast. This caused us to emit 65535 repetitions, all of which the MBE infra needs to handle when calling other macros with the expansion, convert to a rowan tree, etc., which resulted in a *massive* hang.

The test (and also `analysis-stats`) used to crash with a stack overflow on this macro, because we were dropping some crazily deep rowan tree. Now they work properly.

Because I am lazy, and also because I could not find the exact conditions that cause a macro to match but with a missing binding, I just copied all the macros from tracing. Easy.
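To illustrate the control-flow difference, here is a hedged sketch with made-up helper names (the real transcriber in the `mbe` crate is far more involved):

```rust
const REPETITION_LIMIT: usize = 65535;

// Old behavior: fragments were accumulated and merged into the result
// after the loop, so hitting the limit just threw them all away.
// Wasteful, but Blazingly Fast and invisible from the outside.
fn transcribe_buffered(binding_finished: impl Fn(usize) -> bool) -> Option<Vec<u32>> {
    let mut fragments = Vec::new();
    for i in 0..=REPETITION_LIMIT {
        if binding_finished(i) {
            return Some(fragments);
        }
        if i == REPETITION_LIMIT {
            return None; // limit hit: discard everything accumulated
        }
        fragments.push(i as u32);
    }
    unreachable!()
}

// New behavior: fragments are written into the output buffer on demand,
// so by the time the limit is hit, 65535 entries are already in `out`,
// and everything downstream (nested macro calls, rowan conversion) has
// to chew through them.
fn transcribe_on_demand(out: &mut Vec<u32>, binding_finished: impl Fn(usize) -> bool) {
    for i in 0..=REPETITION_LIMIT {
        if binding_finished(i) || i == REPETITION_LIMIT {
            return;
        }
        out.push(i as u32);
    }
}

fn main() {
    // The bug: a missing binding never reports "finished".
    let never_finished = |_i: usize| false;

    assert_eq!(transcribe_buffered(never_finished), None);

    let mut out = Vec::new();
    transcribe_on_demand(&mut out, never_finished);
    assert_eq!(out.len(), REPETITION_LIMIT); // all 65535 entries survive
}
```

Hence the fix: report the missing binding as finished on the very first entry, so neither path spins up to the limit at all.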
As part of my attempts to optimize #18074, I noticed that token trees account for a significant portion of our memory usage (around 200mb out of 1650mb). As such, I went to optimize them.

`TokenTree` is quite large (64 bytes), and a significant part of this is spans. I came up with an idea to shrink spans, but it will be easier to do if token trees are contiguous in memory. Given that this seemed like something worth doing perf-wise anyway, I went to do this :)

I expected this to be faster (due to fewer allocations and better cache locality), but benchmarked it is not (nor is it slower). I guess tt construction is just not hot. Memory usage, however, drops by ~50mb (for `analysis-stats .`), probably due to `TokenTree` shrinking to 48 bytes.

Some workflows are more easily expressed with a flat tt, while some are better expressed with a tree. With the right helpers, though (finding them was mostly a matter of trial and error), even the worst workflows become very easy indeed.
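As for the planned span compression, one plausible shape for it is run-length encoding spans in a vec parallel to the token buffer, since consecutive tokens very often share a span. This is purely a hypothetical sketch under that assumption, not what this PR or any follow-up actually implements:

```rust
// Hypothetical sketch: instead of one span per token, store
// (span, run_length) pairs in a vec parallel to the token buffer.

#[derive(Debug, Clone, Copy, PartialEq)]
struct Span(u32); // stand-in for the real span type

#[derive(Default)]
struct SpanRuns {
    runs: Vec<(Span, u32)>, // (span, number of consecutive tokens with it)
}

impl SpanRuns {
    // Append the span of the next token, extending the last run if equal.
    fn push(&mut self, span: Span) {
        match self.runs.last_mut() {
            Some((last, count)) if *last == span => *count += 1,
            _ => self.runs.push((span, 1)),
        }
    }

    // Look up the span of the token at `index` in the flat buffer.
    fn get(&self, mut index: usize) -> Option<Span> {
        for &(span, count) in &self.runs {
            if index < count as usize {
                return Some(span);
            }
            index -= count as usize;
        }
        None
    }
}

fn main() {
    let mut spans = SpanRuns::default();
    for s in [Span(7), Span(7), Span(7), Span(9)] {
        spans.push(s);
    }
    assert_eq!(spans.get(2), Some(Span(7)));
    assert_eq!(spans.get(3), Some(Span(9)));
    // 4 tokens, but only 2 stored entries.
    assert_eq!(spans.runs.len(), 2);
}
```

A real implementation would likely binary-search prefix sums rather than scan linearly, but the point stands: a contiguous token buffer makes a parallel, compressed span table straightforward.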