[native/mono] Use mono_jit_thread_attach() #9937

Draft
wants to merge 3 commits into
base: main

Conversation

jonpryor
Member

TODO: what's the explanation?

…o_jit_thread_attach

Get all those CI "ignore networking error" fixes!

Make CI Green Again™
(…wait.)
@lateralusX
Member

lateralusX commented Mar 21, 2025

Analysis of switching to mono_jit_thread_attach in dotnet/android:

TL;DR

dotnet/android already runs the majority of its threads as cooperative suspend aware threads (GC safe on return from thread attach), and hybrid suspend will pre-emptively suspend threads that are currently in GC safe mode. Together, this indicates that changing the remaining calls from mono_thread_attach to mono_jit_thread_attach should be a low-risk change.

The full story:

The Mono runtime can run under different suspend models: pre-emptive, hybrid, or coop suspend. In the past, Xamarin Android (mono/mono) used the pre-emptive suspend model, but starting with .NET 6, dotnet Android switched to the hybrid suspend model.

The major difference between these two is how threads are suspended and resumed when a GC is triggered. Pre-emptive suspend relies on signals, meaning that any thread attached to the runtime can be suspended at any location in code, including bad areas where suspension could cause side effects (like while holding low-level locks). The Mono embedding APIs were originally designed with this suspend model in mind.

The hybrid suspend model, on the other hand, is a combination of the pre-emptive and cooperative suspend models. For this discussion, the interesting part is that threads running in GC unsafe mode under hybrid suspend need to hit safe points to be suspended. If a thread never hits a safe point, the runtime will (by default) wait for that thread forever, hanging the GC in its STW ("stop the world") phase. A safe point is a location in code where the thread will yield execution and wait for the GC to complete and resume it. A safe point is just a location where a thread tells the runtime that it's in a GC safe region, promising not to touch any managed memory or call any runtime functions, as described here: https://www.mono-project.com/docs/advanced/runtime/docs/coop-suspend/#gc-safe-mode.

Switching back and forth between GC unsafe and GC safe mode is mainly taken care of by the runtime. For example, calling a p/invoke marks the thread as being in GC safe mode while the p/invoke runs; the same applies to internal runtime waits, hitting safe points in C# code, etc. A thread running in GC unsafe mode is executing managed or runtime code and needs to hit a safe point before it can be suspended by the GC.

A thread can start out in either GC unsafe or GC safe mode. The following lists a couple of scenarios:

  • mono_thread_attach: for backward compatibility with the Mono embedding API and embedders, the thread is attached to the runtime in GC unsafe mode, meaning that it needs to reach a safe point to be suspended.
  • mono_jit_thread_attach: the thread gets attached to the runtime in GC safe mode.
  • mono_jit_init/mono_jit_init_version: the thread calling these functions to initialize the runtime will be put in GC safe mode.
  • Native-to-managed wrappers (unmanaged-callers-only methods, reverse p/invoke function pointers, GetFunctionPointerForDelegate, etc.) will attach unattached threads so that they are in GC safe mode on return.
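
To make the list above concrete, here is a minimal toy model in C of which GC mode a thread ends up in after each attach path. It does not call the Mono API; the enum and function names (attach_path, mode_after_attach, can_stall_gc_when_parked) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* GC mode of a thread as seen by the runtime (toy model, not Mono's types). */
typedef enum { GC_UNSAFE, GC_SAFE } gc_mode;

/* The attach paths from the list above. Enum names are hypothetical. */
typedef enum {
    VIA_MONO_THREAD_ATTACH,     /* mono_thread_attach */
    VIA_MONO_JIT_THREAD_ATTACH, /* mono_jit_thread_attach */
    VIA_RUNTIME_INIT,           /* mono_jit_init / mono_jit_init_version */
    VIA_NATIVE_TO_MANAGED       /* reverse p/invoke and similar wrappers */
} attach_path;

/* Mode the thread is left in once the attach path returns. */
static gc_mode mode_after_attach(attach_path path)
{
    return path == VIA_MONO_THREAD_ATTACH ? GC_UNSAFE : GC_SAFE;
}

/* Under hybrid suspend, a thread that parks in native code and never
 * reaches a safe point can only stall a GC if it is in GC unsafe mode. */
static bool can_stall_gc_when_parked(attach_path path)
{
    return mode_after_attach(path) == GC_UNSAFE;
}
```

In this model only the mono_thread_attach path leaves a thread in a state where parking in native code could stall a GC.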

A thread that is running in GC safe mode must be switched to GC unsafe mode when re-entering managed or runtime code. When calling through the native-to-managed wrappers, this is taken care of by the wrapper. When calling through the Mono embedding APIs, each individual API needs to decide (based on what it does) whether to switch to GC unsafe and then back to the state the thread had when entering the API (which could actually be GC unsafe if the API was called in GC unsafe mode).
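
The save-and-restore pattern described above can be sketched as follows. This is a toy model, not the runtime's implementation: toy_thread, toy_enter_gc_unsafe, toy_exit_gc_unsafe and toy_embedding_api are hypothetical names, and the real runtime tracks considerably more state per thread.

```c
/* GC mode of a thread (toy model, not Mono's types). */
typedef enum { GC_UNSAFE, GC_SAFE } gc_mode;

typedef struct {
    gc_mode mode;
} toy_thread;

/* Switch to GC unsafe and return the mode the caller was in. */
static gc_mode toy_enter_gc_unsafe(toy_thread *t)
{
    gc_mode previous = t->mode;
    t->mode = GC_UNSAFE;
    return previous;
}

/* Restore whatever mode the caller had on entry. */
static void toy_exit_gc_unsafe(toy_thread *t, gc_mode previous)
{
    t->mode = previous;
}

/* Hypothetical embedding API that needs to touch managed state.
 * Returns the mode the thread was in while the work was done. */
static gc_mode toy_embedding_api(toy_thread *t)
{
    gc_mode previous = toy_enter_gc_unsafe(t);
    gc_mode mode_during_call = t->mode; /* GC_UNSAFE at this point */
    /* ... work on managed objects would happen here ... */
    toy_exit_gc_unsafe(t, previous);
    return mode_during_call;
}
```

Restoring the saved mode, rather than unconditionally switching back to GC safe, is what lets the same API be called correctly from a thread that was already in GC unsafe mode.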

A thread's GC mode is critical when running a GC, since the GC needs to do an STW in order to proceed with GC work. The hybrid suspend model's STW is a little more complex in how it operates compared to both pre-emptive and coop suspend, but it mainly boils down to two phases. In the first phase, all threads attached to the runtime are checked. Threads currently in GC safe mode are ignored in this phase, while all threads in GC unsafe mode are waited upon until they reach a safe point. This is normally where we see deadlocks in ANRs, due to threads not reaching safe points in a timely manner. Once the first phase is done (all threads in GC unsafe mode have reached safe points), the second phase considers all threads still in GC safe mode and pre-emptively suspends them (using signals).

We have identified several ANRs (Application Not Responding) on Android where we have seen threads attached to the runtime with callstacks like this:
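
The two phases can be sketched as a small single-threaded simulation. This is a simplified model under stated assumptions, not the runtime's actual STW code: the types and toy_stop_the_world are hypothetical, each thread's behavior is reduced to a flag, and a thread that would make the real runtime wait forever is reported by index instead of hanging.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { GC_UNSAFE, GC_SAFE } gc_mode;
typedef enum { RUNNING, SUSPENDED_AT_SAFE_POINT, SUSPENDED_BY_SIGNAL } suspend_state;

typedef struct {
    gc_mode mode;
    bool reaches_safe_point; /* only meaningful for GC unsafe threads */
    suspend_state state;
} toy_thread;

/* Returns -1 when every thread was suspended. Otherwise returns the index
 * of the first GC unsafe thread that never reaches a safe point, which is
 * where the real runtime would (by default) keep waiting. */
static int toy_stop_the_world(toy_thread *threads, size_t count)
{
    /* Phase 1: skip GC safe threads, wait for GC unsafe threads. */
    for (size_t i = 0; i < count; i++) {
        if (threads[i].mode != GC_UNSAFE)
            continue;
        if (!threads[i].reaches_safe_point)
            return (int)i; /* STW cannot complete */
        threads[i].state = SUSPENDED_AT_SAFE_POINT;
    }
    /* Phase 2: pre-emptively suspend threads still in GC safe mode. */
    for (size_t i = 0; i < count; i++) {
        if (threads[i].mode == GC_SAFE)
            threads[i].state = SUSPENDED_BY_SIGNAL;
    }
    return -1;
}
```

In this model a thread parked in native code is harmless when it is in GC safe mode (phase 2 suspends it by signal), but the same thread in GC unsafe mode stops phase 1 from ever finishing.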

"queue-1-2" tid=8105 Native
  #00  pc 0x000000000006a0c0  /system/lib64/libc.so (__rt_sigsuspend+4)
  #01  pc 0x0000000000029684  /system/lib64/libc.so (sigsuspend+44)
  #02  pc 0x00000000001ff994  /data/app/<app.bundle.id>-_I4CSOAWam382fA8t14IEg==/lib/arm64/libmonosgen-2.0.so (suspend_signal_handler+200)
  #03  pc 0x00000000000005dc  [vdso:000000737837f000]
  #04  pc 0x000000000001dae8  /system/lib64/libc.so (syscall+24)
  #05  pc 0x00000000000e1ee4  /system/lib64/libart.so (art::ConditionVariable::WaitHoldingLocks(art::Thread*)+152)
  #06  pc 0x0000000000392794  /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, long, int, bool, art::ThreadState)+632)
  #07  pc 0x0000000000394288  /system/lib64/libart.so (art::Monitor::Wait(art::Thread*, art::mirror::Object*, long, int, bool, art::ThreadState)+252)
  #08  pc 0x00000000001dcadc  /system/framework/arm64/boot.oat (java.lang.Object.wait [DEDUPED]+140)
  #09  pc 0x00000000001fcebc  /system/framework/arm64/boot.oat (java.lang.Thread.parkFor$+428)
  #10  pc 0x0000000000608cd8  /system/framework/arm64/boot.oat (java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await+808)
  #11  pc 0x00000000005de8ec  /system/framework/arm64/boot.oat (java.util.concurrent.LinkedBlockingQueue.take+156)
  #12  pc 0x00000000005e9e1c  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor.getTask+492)
  #13  pc 0x00000000005ec2b0  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor.runWorker+240)
  #14  pc 0x00000000005fb114  /system/framework/arm64/boot.oat (java.util.concurrent.ThreadPoolExecutor$Worker.run+68)
  #15  pc 0x00000000001fd13c  /system/framework/arm64/boot.oat (java.lang.Thread.run+76)
  #16  pc 0x0000000000509384  /system/lib64/libart.so (art_quick_invoke_stub+580)
  #17  pc 0x00000000000d8078  /system/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+200)
  #18  pc 0x0000000000431120  /system/lib64/libart.so (art::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::ArgArray*, art::JValue*, char const*)+104)
  #19  pc 0x00000000004322ac  /system/lib64/libart.so (art::InvokeVirtualOrInterfaceWithJValues(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, jvalue*)+432)
  #20  pc 0x0000000000458e8c  /system/lib64/libart.so (art::Thread::CreateCallback(void*)+1140)
  #21  pc 0x00000000000678b4  /system/lib64/libc.so (__pthread_start(void*)+36)
  #22  pc 0x000000000001ef24  /system/lib64/libc.so (__start_thread+68)

This thread is attached to the runtime, since it is running our suspend signal handler, but it also seems to be waiting inside some Java thread pool. This works under pre-emptive suspend. However, if the same thread has been attached to the runtime under hybrid suspend and ends up waiting like this outside of managed and runtime code, then the thread must be in GC safe mode or it will violate the runtime's hybrid suspend model: a thread in GC unsafe mode needs to reach a safe point in a timely manner, something the callstack above will probably never do, blocking the completion of the STW.

It turns out that the dotnet/android codebase still has two locations that could attach threads using mono_thread_attach, while the majority of threads are attached either as the runtime init thread, by marshalled methods using mono_jit_thread_attach, or by native-to-managed wrappers. Threads attached using one of these three mechanisms will be in GC safe mode, meaning they should either reach safe points or be pre-emptively suspended under the hybrid suspend model. After analyzing the code paths in dotnet/android that end up in calls to mono_thread_attach, it is still not clear whether they are reachable in real-world scenarios. However, since we have ANRs that point to issues suspending threads, we have seen threads with callstacks waiting outside the runtime, and the majority of threads attached to the runtime in dotnet Android seem to attach as cooperative suspend aware (GC safe mode on return), it would make sense to standardize and attach all threads as cooperative suspend aware in the dotnet/android repo.

As part of this analysis, I also looked over all Mono APIs used by dotnet/runtime, analyzing whether they correctly switch to GC unsafe when called, whether the APIs are cooperative suspend aware, and whether they are only safe to call during init or before running managed code. The fact that hybrid suspend will pre-emptively suspend threads that are running in GC safe mode reduces the issues with using APIs that currently won’t enter GC unsafe (but probably should) or that are not cooperative suspend aware, passing raw GC objects as parameters or return values.

Since we already run the majority of threads as cooperative suspend aware threads (GC safe on return from thread attach) in dotnet/android, and since hybrid suspend will pre-emptively suspend threads that are currently in GC safe mode, changing the remaining calls from mono_thread_attach to mono_jit_thread_attach should be a low-risk change.

For completeness, this is the list of Mono embedding APIs used by dotnet/android and their state regarding switching to GC unsafe, being cooperative suspend aware, and potential implications. APIs without comments should be safe to call under any suspend model. APIs marked as “init-only” should be called before the runtime starts or before running managed code. They are either not thread safe when changing runtime state, are used during runtime initialization, or need to be in place before running managed code. APIs marked with “Can’t be called under coop suspend model.” normally use raw GC objects as parameters or return values. These APIs can’t be called under the cooperative suspend model, but since the hybrid suspend model will pre-emptively suspend threads in GC safe mode, it can still scan a thread's active stack and registers, so it should be able to handle direct GC references on the stack or in registers for all attached runtime threads. The last category, “Should transition to GC unsafe.”, is mainly APIs that should do a GC unsafe transition internally but currently don’t. This is something that should probably be fixed in the runtime, and until that is done, these APIs can’t be safely called under the coop suspend model. They should, however, still be safe under hybrid suspend, since threads in GC safe mode will be pre-empted.

| Mono API | GC Unsafe | Cooperative | Comment |
| --- | --- | --- | --- |
| mono_add_internal_call | No | Yes | init-only |
| mono_alc_get_default_gchandle | No | Yes | |
| mono_array_new | Yes | No | Can’t be called under coop suspend model. |
| mono_assembly_get_image | Yes | Yes | |
| mono_assembly_load_from_full | Yes | Yes | |
| mono_assembly_load_full | Yes | Yes | |
| mono_assembly_load_full_alc | Yes | Yes | |
| mono_assembly_loaded | Yes | Yes | |
| mono_assembly_name_free | Yes | Yes | |
| mono_assembly_name_get_culture | Yes | Yes | |
| mono_assembly_name_get_name | Yes | Yes | |
| mono_assembly_name_new | Yes | Yes | |
| mono_assembly_open_full | Yes | Yes | |
| mono_check_corlib_version | Yes | Yes | |
| mono_class_from_mono_type | Yes | Yes | |
| mono_class_from_name | Yes | Yes | |
| mono_class_get | Yes | Yes | |
| mono_class_get_field_from_name | Yes | Yes | |
| mono_class_get_image | No | Yes | |
| mono_class_get_method_from_name | Yes | Yes | |
| mono_class_get_name | Yes | Yes | |
| mono_class_get_namespace | Yes | Yes | |
| mono_class_get_type | No | Yes | |
| mono_class_get_type_token | No | Yes | |
| mono_class_is_subclass_of | Yes | Yes | |
| mono_class_vtable | Yes | Yes | |
| mono_config_is_server_mode | No | Yes | |
| mono_debug_init | No | Yes | init-only |
| mono_debug_open_image_from_memory | Yes | Yes | |
| mono_debugger_agent_unhandled_exception | Yes | No | Can’t be called under coop suspend model. |
| mono_dl_fallback_register | No | Yes | init-only |
| mono_domain_foreach | Yes | Yes | |
| mono_domain_get | No | Yes | |
| mono_domain_get_id | No | Yes | |
| mono_domain_set | Yes | Yes | |
| mono_error_get_message | No | Yes | Should transition to GC unsafe. |
| mono_field_get_value | Yes | No | Can’t be called under coop suspend model. |
| mono_field_set_value | Yes | No | Can’t be called under coop suspend model. |
| mono_field_static_set_value | Yes | Yes | |
| mono_gc_register_bridge_callbacks | No | Yes | init-only |
| mono_gc_wait_for_bridge_processing | Yes | Yes | |
| mono_get_byte_class | No | Yes | |
| mono_get_method | No | Yes | Should transition to GC unsafe. |
| mono_get_root_domain | No | Yes | |
| mono_get_runtime_build_info | No | Yes | |
| mono_guid_to_string | No | Yes | |
| mono_image_get_name | No | Yes | |
| mono_image_loaded | Yes | Yes | |
| mono_image_open_from_data_alc | Yes | Yes | |
| mono_image_open_from_data_with_name | Yes | Yes | |
| mono_image_strerror | No | Yes | |
| mono_install_assembly_preload_hook | No | Yes | init-only |
| mono_install_assembly_preload_hook_v3 | No | Yes | init-only |
| mono_jit_init_version | No | Yes | |
| mono_jit_parse_options | No | Yes | init-only |
| mono_jit_set_aot_mode | No | Yes | init-only |
| mono_jit_set_trace_options | No | Yes | init-only |
| mono_jit_thread_attach | No | Yes | |
| mono_method_full_name | Yes | Yes | |
| mono_method_get_unmanaged_callers_only_ftnptr | Yes | Yes | |
| mono_object_get_class | Yes | No | Can’t be called under coop suspend model. |
| mono_profiler_create | No | Yes | init-only |
| mono_reflection_assembly_get_assembly | No | No | Can’t be called under coop suspend model. |
| mono_reflection_type_from_name | Yes | Yes | |
| mono_reflection_type_get_type | Yes | No | Can’t be called under coop suspend model. |
| mono_runtime_init | No | Yes | |
| mono_runtime_invoke | Yes | No | Can’t be called under coop suspend model. |
| mono_runtime_set_main_args | No | Yes | init-only |
| mono_set_crash_chaining | No | Yes | init-only |
| mono_set_signal_chaining | No | Yes | init-only |
| mono_set_use_llvm | No | Yes | init-only |
| mono_string_chars | No | No | Can’t be called under coop suspend model. |
| mono_string_length | No | No | Can’t be called under coop suspend model. |
| mono_string_new | Yes | No | Can’t be called under coop suspend model. |
| mono_string_to_utf8 | Yes | No | Can’t be called under coop suspend model. |
| mono_thread_attach | No | No | Can’t be called under coop suspend model. |
| mono_thread_create | Yes | Yes | |
| mono_trace_set_level_string | No | Yes | init-only |
| mono_trace_set_log_handler | No | Yes | init-only |
| mono_trace_set_mask_string | No | Yes | init-only |
| mono_trace_set_print_handler | No | Yes | init-only |
| mono_trace_set_printerr_handler | No | Yes | init-only |
| mono_type_get_name_full | No | Yes | Should transition to GC unsafe. |
| mono_type_get_object | Yes | No | Can’t be called under coop suspend model. |
| mono_unhandled_exception | Yes | No | Can’t be called under coop suspend model. |
| mono_value_copy_array | No | No | Can’t be called under coop suspend model. |
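
As a rough illustration of how the comment categories in the list map to suspend models, here is a small lookup sketch in C. It encodes only a handful of rows, and the types and functions (api_note, suspend_model, toy_find_api, toy_callable_under) are hypothetical helpers, not part of Mono. The “init-only” timing restriction is recorded but not enforced by this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Comment categories from the list (names are hypothetical). */
typedef enum { NOTE_NONE, NOTE_INIT_ONLY, NOTE_NO_COOP, NOTE_SHOULD_SWITCH } api_note;
typedef enum { SUSPEND_PREEMPTIVE, SUSPEND_HYBRID, SUSPEND_COOP } suspend_model;

typedef struct {
    const char *name;
    api_note note;
} api_entry;

/* A few rows copied from the list above. */
static const api_entry toy_api_table[] = {
    { "mono_add_internal_call",  NOTE_INIT_ONLY },
    { "mono_assembly_get_image", NOTE_NONE },
    { "mono_error_get_message",  NOTE_SHOULD_SWITCH },
    { "mono_runtime_invoke",     NOTE_NO_COOP },
    { "mono_string_new",         NOTE_NO_COOP },
};

static const api_entry *toy_find_api(const char *name)
{
    size_t count = sizeof toy_api_table / sizeof toy_api_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(toy_api_table[i].name, name) == 0)
            return &toy_api_table[i];
    }
    return NULL;
}

/* Returns 1 if the API is considered callable under the model,
 * 0 if not, and -1 if the API is not in the toy table. */
static int toy_callable_under(const char *name, suspend_model model)
{
    const api_entry *entry = toy_find_api(name);
    if (entry == NULL)
        return -1;
    if (model != SUSPEND_COOP)
        return 1; /* pre-emptive and hybrid can still suspend by signal */
    return (entry->note == NOTE_NO_COOP || entry->note == NOTE_SHOULD_SWITCH) ? 0 : 1;
}
```

For example, toy_callable_under("mono_runtime_invoke", SUSPEND_HYBRID) yields 1 while the same call with SUSPEND_COOP yields 0, matching the reasoning that hybrid suspend pre-empts threads in GC safe mode.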

@lateralusX
Member

@jonpryor should we proceed with this PR?
