Skip to content

Conversation

KJTsanaktsidis
Copy link
Contributor

@KJTsanaktsidis KJTsanaktsidis commented Nov 12, 2023

This fixes some of the issues with ASAN I outlined in https://bugs.ruby-lang.org/issues/20001

With this merged, make now works on my machine with the following:

CC=clang ./configure cppflags="-fsanitize=address -fno-omit-frame-pointer" optflags=-O2 LDFLAGS="-fsanitize=address -fno-omit-frame-pointer" --prefix="$(realpath installed)"
ASAN_OPTIONS=use_sigaltstack=0:detect_leaks=0:abort_on_error=1 make

Without these fixes, I get crashes in various steps which use the built miniruby (e.g. parsing rdoc or runing extconf files) because VALUE's on the machine stack are not getting properly marked (because they are instead living in ASAN fake stacks).

This seems to build fine with ASAN both enabled and disabled on linux/mac, and builds without asan on windows (i wasn't brave enough to try turning asan on with msvc heh).

[Bug #20001]

@KJTsanaktsidis KJTsanaktsidis marked this pull request as draft November 12, 2023 05:28
@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_asan_pt_1 branch 4 times, most recently from 2ea84e7 to 68686fc Compare November 12, 2023 10:57
* @param[in] addr A pointer somewhere on the stack, near its bottom.
*/
void ruby_init_stack(volatile VALUE *addr);
void ruby_init_stack(volatile void *addr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a public header; changing VALUE to void is not a breaking change I think (and where this is called inside ruby, it's called with a (void *) cast anyway today). However, removing volatile would break a program which calls this with a volatile variable I think.

It doesn't need to be volatile (we never dereference this address anyway), but it doesn't do any harm I suppose.

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_asan_pt_1 branch 5 times, most recently from 1be89ca to 2eb64ea Compare November 12, 2023 21:26
@KJTsanaktsidis KJTsanaktsidis marked this pull request as ready for review November 12, 2023 22:36
The implementation of `native_thread_init_stack` for the various
threading models can use the address of a local variable as part of the
calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because
  glibc (and maybe other libcs) can store its own data on the stack
  before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of
  the memory mapping which contains the variable

However, the local being used for this is actually allocated _inside_
the `native_thread_init_stack` frame; that means the caller might
allocate a VALUE on the stack that actually lies outside the bounds
stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores
VALUEs on the stack must be drilled down into the call to
`native_thread_init_stack` to be used in the calculation. This probably
doesn't _really_ matter for the win32 case (they'll be in the same
memory mapping so VirtualQuery should return the same thing), but
definitely could matter for the pthreads case.

[Bug #20001]
Where a local variable is used as part of the stack bounds detection, it
has to actually be on the stack. ASAN can put local variable on "fake
stacks", however, with addresses in different memory mappings. This
completely destroys the stack bounds calculation, and can lead to e.g.
things not getting GC marked on the machine stack or stackoverflow
checks that always fail.

The __asan_addr_is_in_fake_stack helper can be used to get the _real_
stack address of such variables, and thus perform the stack size
calculation properly

[Bug #20001]
This is preparing for a more specialised, asan-aware version of
gc_mark_maybe which needs some additional context passed through.

[Bug #20001]
__has_feature is a clang-ism, and GCC has a different way to tell if
sanitizers are enabled. For this reason, I don't want to spray
__has_feature all over the codebase for other places where conditional
compilation based on sanitizers is required.

[Bug #20001]
ASAN leaves a pointer to the fake frame on the stack; we can use the
__asan_addr_is_in_fake_stack API to work out the extent of the fake
stack and thus mark any VALUEs contained therein.

[Bug #20001]
@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/fix_asan_pt_1 branch from 2eb64ea to 75ebdbd Compare January 10, 2024 11:14
@KJTsanaktsidis KJTsanaktsidis merged commit d10bc3a into ruby:master Jan 12, 2024
@KJTsanaktsidis KJTsanaktsidis deleted the ktsanaktsidis/fix_asan_pt_1 branch January 12, 2024 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant