Stackalloc localloc #112168
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
@hez2010 fyi. Expect this may blow up in some tests with stack overflows; we'll see. Also, I forgot to exclude sites in handlers, so things may blow up there too.
Some preliminary notes. @dotnet/jit-contrib @davidwrighton @jkotas, interested in your feedback. This builds on #104906 (JIT-introduced).
Do you have good examples in BCL or other real-world code where this kicks in?
The typical uses of stackalloc in the BCL are constant-sized stackallocs or stackalloc+ArrayPool combos. Is your concern the unsafe nature of stackalloc uses in the BCL, i.e. an unbounded stackalloc that may slip through code review? BCL stackallocs and stackalloc+ArrayPool combos have other safety problems:
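For context, the two BCL patterns mentioned above might look like this sketch (illustrative only; method names and the 256-byte threshold are made up, not taken from any particular BCL method):

```csharp
using System;
using System.Buffers;

static class StackallocPatterns
{
    // Pattern 1: constant-sized stackalloc -- safe because the size
    // is a compile-time constant with a known, small upper bound.
    static int Checksum(ReadOnlySpan<byte> data)
    {
        Span<byte> scratch = stackalloc byte[64]; // fixed size
        data.Slice(0, Math.Min(data.Length, scratch.Length)).CopyTo(scratch);
        int sum = 0;
        foreach (byte b in scratch) sum += b;
        return sum;
    }

    // Pattern 2: stackalloc + ArrayPool combo -- stack for small n,
    // rented heap array for large n, returned to the pool afterwards.
    static void Process(int n)
    {
        byte[]? rented = null;
        Span<byte> buffer = n <= 256
            ? stackalloc byte[256]
            : (rented = ArrayPool<byte>.Shared.Rent(n));
        try
        {
            // ... use buffer.Slice(0, n) ...
        }
        finally
        {
            if (rented != null)
                ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```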
For the BCL use cases in particular, it may be more interesting to work on #52065 and base this optimization on top of it:
If we had a malloca-like API, I think this specific example could be converted to it as an optimization in Roslyn as well.
Sometimes we may need to return a buffer to its caller, so we cannot use stack allocation in those cases. Some typical scenarios where this may kick in, once we have support for gcref arrays, look like:
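A minimal illustration of the escape constraint described above (method names are hypothetical): a buffer that is returned outlives the frame and must stay on the heap, while one consumed locally is a candidate for the transformation.

```csharp
using System;

static class EscapeExamples
{
    // Escapes: the array is returned to the caller,
    // so it cannot be converted to a stack allocation.
    static int[] MakeBuffer(int n) => new int[n];

    // Does not escape: the array is never stored or returned,
    // so the JIT could stack allocate it (via localloc) when
    // the size is suitably bounded.
    static int SumSquares(int n)
    {
        int[] tmp = new int[n];
        for (int i = 0; i < n; i++)
            tmp[i] = i * i;
        int sum = 0;
        foreach (int v in tmp)
            sum += v;
        return sum;                  // only the scalar escapes
    }
}
```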
Mostly this was an exploration of how hard it would be to enable the transformation in the JIT, and to contemplate what else might need to be addressed. I have started scouting around for potential impact, but it will take a while to get a useful set of data. I also need to build a better automated analysis for categorizing the things that block and unblock allocation (at least for the first blocker) and make sure we're not missing anything simple in our analysis.

With this PR as is, on one large internal application that has likely been extensively hand tuned, there are roughly 22K Tier1 optimized methods, 2.3K methods with array creation sites, and 4.1K total array allocation sites. 2 of the arrays are stack allocated (not sure if via localloc). I don't have a breakdown yet of what blocks the other 4K.

For some context, on this same application, conditional escape analysis for enumerators kicks in for around 200 methods.
Remaining failures all look like stack overflows: there needs to be a per-instance size limit as well as a dynamic size limit. So it seems this sort of transformation is feasible. Adding a per-instance limit will introduce conditional heap/stack allocation, so that seems like an easy next step.
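A source-level analogue of the conditional heap/stack allocation mentioned above (a sketch only; the per-instance limit value and names are hypothetical, and the actual transformation happens in JIT IR, not C#):

```csharp
using System;

static class ConditionalAllocSketch
{
    const int PerInstanceLimitBytes = 1024; // hypothetical per-site limit

    // Moral equivalent of what the JIT would emit for `new int[n]`
    // once a per-instance size limit guards the stack path.
    static int Demo(int n)
    {
        Span<int> a = n <= PerInstanceLimitBytes / sizeof(int)
            ? stackalloc int[n]      // small enough: localloc on the stack
            : new int[n];            // too big: fall back to the heap
        for (int i = 0; i < a.Length; i++)
            a[i] = i;
        return a.Length;
    }
}
```

A dynamic limit (total stack space consumed by such allocations per frame or per thread) would need an additional runtime check beyond what this sketch shows.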
Experiment with turning non-escaping `new (nongc)[n]` into stackallocs. Also enable `new (nongc)[100]` if the allocation site is within a loop, also via stackalloc. Currently there is no restriction on how big the allocation can be (that will have to change).
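In source terms, the transformation this experiment targets is roughly the following (an illustrative before/after; the JIT performs this on its IR, and the names here are made up):

```csharp
using System;

static class BeforeTransform
{
    // Candidate: non-GC element type, variable size, never escapes.
    static int Sum(ReadOnlySpan<int> src)
    {
        int[] tmp = new int[src.Length]; // heap allocation today
        src.CopyTo(tmp);
        int sum = 0;
        foreach (int v in tmp) sum += v;
        return sum;
    }
}

static class AfterTransform
{
    // What the JIT would turn it into, morally: a localloc-backed buffer.
    static int Sum(ReadOnlySpan<int> src)
    {
        Span<int> tmp = stackalloc int[src.Length]; // currently unbounded
        src.CopyTo(tmp);
        int sum = 0;
        foreach (int v in tmp) sum += v;
        return sum;
    }
}
```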