Closed
Bug 1477490
Opened 6 years ago
Closed 6 years ago
AddressSanitizer: stack-buffer-overflow with WRITE of size 360 through [@ patched_BaseThreadInitThunk] in WindowsDllBlocklist
Categories
(Core :: mozglue, defect)
Tracking
()
RESOLVED
FIXED
mozilla64
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | --- | wontfix |
firefox64 | --- | fixed |
People
(Reporter: decoder, Assigned: away)
References
(Blocks 2 open bugs)
Details
(Keywords: crash, regression)
Attachments
(3 files)
The attached crash information was submitted via the ASan Nightly Reporter on mozilla-central-asan-nightly revision 63.0a1-20180721100146-https://hg.mozilla.org/mozilla-central/rev/9daa53881b7ae80bf6b093dac5d7744cf7fd18b1.
For detailed crash information, see attachment.
This is from a Windows ASan test build that Emanuel manually downloaded from Treeherder. Putting this under mozglue because WindowsDllBlocklist.cpp is the only code that is ours and on the stack. I also don't know how to get better symbols for the rest of the stack on Windows.
Reporter | ||
Comment 1•6 years ago
|
||
Reporter | ||
Comment 2•6 years ago
|
||
David, can you help answering the symbols question maybe as a start?
Flags: needinfo?(dmajor)
Reporter | ||
Comment 3•6 years ago
|
||
Also CCing some people that might know what this code exactly does. Any help is greatly appreciated.
We hook the Windows thread init function, so patched_BaseThreadInitThunk appears at the bottom of nearly every thread and probably isn't at fault here.
To get anything more out of this we'd either need to get a memory dump (does ASan Nightly collect such things?), and/or fix bug 1426176, and/or at least get the exact version of kernel32/ntdll/etc and disassemble those addresses from a local copy.
Flags: needinfo?(dmajor)
Comment 5•6 years ago
|
||
Hmm, if I'm reading this correctly, dumpbin says that
1) ntdll.dll+0x18001f3f2 corresponds to TppWorkerThread (0x18001f320),
2) KERNEL32.DLL+0x180013033 corresponds to BaseThreadInitThunk (0x180013020) and
3) ntdll.dll+0x180071430 corresponds to RtlUserThreadStart (0x180071410)
All are located in the Guard CF Function Table which seems to be related to Control Flow Guard (but that might just be a consequence of using CFG at all).
We don't seem to have symbols for clang_rt.asan_dynamic-x86_64.dll so we're probably missing the most important part.
Can you link me the exact Treeherder build that you used? I wonder if I could guess at a function name just from reading clang_rt.asan_dynamic-x86_64.dll disassembly.
Comment 7•6 years ago
|
||
Sure, it was this one: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=9daa53881b7ae80bf6b093dac5d7744cf7fd18b1&selectedJob=189310358
The file is also identical to the one included in the build of clang you currently get from mach bootstrap (30a8aceddc0675ca-clang.tar.bz2).
I don't understand this stack. These offsets like ntdll+0x180071430 are unreasonably large. If I take out 0x180000000 then ntdll+71430 is a sensible function on my machine. So OK, maybe there is some rebasing going on or something, who knows. But then clang_rt_asan_dynamic_x86_64+0x2f2ca still doesn't make sense, it's not instruction-aligned.
Also noteworthy is that ASan thinks this is "thread T16777215" or in other words 0xFFFFFF. Maybe ASan's bookkeeping is messed up somehow.
Emanuel, are you able to reproduce this? If so, can you share what steps you took? (I didn't get any problems from just starting that build locally) Even better, could you try running under WinDbg and see if it triggers a break?
Comment 9•6 years ago
|
||
Regarding the offsets, could they be a consequence of CFG-related indirection? The offsets in ntdll and kernel32 all corresponded to functions in the Guard CF Function Table (according to dumpbin). I don't know what to make of the alignment though - especially the odd-aligned offsets (do they correspond to the current instruction or the next one?).
I don't have a consistent STR, though I'll try to find some. I got a crash once trying to download the latest (regular) Nightly so there might be some connection with network activity - I also got a few crashes scrolling a Reddit post, but I wonder if I was hovering over links while scrolling and triggering some prefetching mechanism.
I tried attaching the Visual Studio debugger to a running process once and immediately got an exception (which I could choose to ignore), but I wasn't sure whether that was simply a quirk of ASAN.
Assignee | ||
Comment 10•6 years ago
|
||
On Win64, ASan uses a page fault handler to map shadow memory as needed, because mapping it all upfront would exceed OS limits. So you do get a lot of benign exceptions when running under a debugger. If you hit a non-continuable exception, then you know you're actually in trouble.
Comment 11•6 years ago
|
||
I'm having difficulty triggering these crashes on a newer build. On the build I was using previously, I stopped getting browser crashes but was still getting tab crashes, but only with my usual set of pinned tabs open. So this may require a very specific set of circumstances to trigger, but I'll keep browsing around to see if it happens again.
Assignee | ||
Comment 12•6 years ago
|
||
Any chance you could try that old build under a debugger?
Comment 13•6 years ago
|
||
I tried for a while but couldn't get it to crash, ugh. I'll try again tomorrow.
Updated•6 years ago
|
Group: core-security → core-security-release
Comment 14•6 years ago
|
||
Hmm, I should mention that I *have* gotten other crashes, but they're mostly assertion failures at [1]. I've also gotten a few stack-buffer-underflow crashes (very similar to the crashes here and equally unsymbolized).
I wonder if these crashes all have the same underlying cause. None of them have been very reproducible but the assertion failures seem to happen more frequently if something else in the system is using a lot of CPU.
[1] https://github.com/llvm-mirror/compiler-rt/blob/master/lib/asan/asan_thread.cc#L350
Comment 15•6 years ago
|
||
I set up a local build of clang to get symbols for the ASAN runtime; unfortunately the llvm-symbolizer doesn't seem to be working (not sure why). Here's the stack according to MSVC for one of the assertion failures:
> clang_rt.asan_dynamic-x86_64.dll!__sanitizer::internal__exit(int exitcode) Line 748 C++
clang_rt.asan_dynamic-x86_64.dll!__sanitizer::Die() Line 60 C++
clang_rt.asan_dynamic-x86_64.dll!__asan::AsanCheckFailed(const char * file, int line, const char * cond, unsigned __int64 v1, unsigned __int64 v2) Line 77 C++
clang_rt.asan_dynamic-x86_64.dll!__sanitizer::CheckFailed(const char * file, int line, const char * cond, unsigned __int64 v1, unsigned __int64 v2) Line 81 C++
clang_rt.asan_dynamic-x86_64.dll!__asan::AsanThread::GetStackFrameAccessByAddr(unsigned __int64 addr, __asan::AsanThread::StackFrameAccess * access) Line 350 C++
clang_rt.asan_dynamic-x86_64.dll!__asan::GetStackAddressInformation(unsigned __int64 addr, unsigned __int64 access_size, __asan::StackAddressDescription * descr) Line 205 C++
clang_rt.asan_dynamic-x86_64.dll!__asan::AddressDescription::AddressDescription(unsigned __int64 addr, unsigned __int64 access_size, bool shouldLockThreadRegistry) Line 465 C++
clang_rt.asan_dynamic-x86_64.dll!__asan::ErrorGeneric::ErrorGeneric(unsigned int tid, unsigned __int64 pc_, unsigned __int64 bp_, unsigned __int64 sp_, unsigned __int64 addr, bool is_write_, unsigned __int64 access_size_) Line 396 C++
clang_rt.asan_dynamic-x86_64.dll!__asan::ReportGenericError(unsigned __int64 pc, unsigned __int64 bp, unsigned __int64 sp, unsigned __int64 addr, bool is_write, unsigned __int64 access_size, unsigned int exp, bool fatal) Line 463 C++
clang_rt.asan_dynamic-x86_64.dll!__asan_wrap_memset(void * dst, int v, unsigned __int64 size) Line 764 C++
ntdll.dll!LdrpInitializeThread() Unknown
ntdll.dll!_LdrpInitialize() Unknown
ntdll.dll!LdrpInitialize() Unknown
ntdll.dll!LdrInitializeThunk() Unknown
Assignee | ||
Comment 16•6 years ago
|
||
I'd be very curious to see a full-memory dump if you hit it again.
Comment 17•6 years ago
|
||
Here's the dump for that stack. Do you need the local build I made?
Comment 18•6 years ago
|
||
I managed to get a crash on a fresh profile, here's the build, profile and minidump+heap saved by MSVC: https://mega.nz/#!vF9HxaKR!OjU_s-JWJGa-ydHI5ck8dhlPYmAkBjuqSIlWJvqGURw
Ther's no ASAN report since it was a content process crash and I forgot to turn the sandbox off (I tried turning it off but then I couldn't get it to crash anymore, alas).
Assignee | ||
Comment 19•6 years ago
|
||
Thanks Emanuel!
OK, I'm a little hazy on some details but I think I have a rough idea of what's happening.
Windows is creating a new thread and ntdll!LdrpInitializeThread is calling memset to zero out some data structure on the stack. That call gets redirected to __asan_wrap_memset. ASan thinks that area is a poison/redzone (maybe leftover from a previous thread that occupied that memory?) and tries to report an error.
Then, when attempting to get a backtrace for the first complaint, ASan runs into some other internal inconsistencies, and trips a fatal assertion. I don't know or care much about the details here because fixing the poisoning problem would make this moot.
We should ask the ASan folks whether there are any known issues around stack memory re-use.
In the meantime we might be able to work around this with the `replace_intrin:0` option or by adding `interceptor_name:memset` to a suppressions file.
Assignee | ||
Comment 20•6 years ago
|
||
> We should ask the ASan folks whether there are any known issues around stack memory re-use.
"Stack memory re-use" was a bad phrase here, as it sounds too much like a common C++ bug. What I really mean is issues with the OS creating a new thread using the stack memory area from a previous thread.
Reporter | ||
Comment 21•6 years ago
|
||
No need to keep this s-s, this is a bug with ASan on Windows.
I am currently testing with `replace_intrin:0` as suggested by :dmajor. I am not sure if this will make the crashes go away entirely but so far, it seems more stable. I would opt for making a patch that adds this to the ASan default options for reporter on Windows only, just to see if the crashes disappear. That would improve the overall stability on Windows for the reporter nightlies.
Group: core-security-release
Comment 23•6 years ago
|
||
Hey Aaron, anything we should be concerned about here?
Flags: needinfo?(aklotz)
Comment 24•6 years ago
|
||
I believe these crashes were fixed / worked around by disabling ASAN's stack instrumentation in bug 1479385.
Updated•6 years ago
|
Comment 26•6 years ago
|
||
I can reproduce on Windows 10 64bit using https://www.ibatt.ru/cart/step1/ This is a pretty common crash in bughunter. I don't have access to bug 1481745. If someone could cc me that would be great.
Comment 27•6 years ago
|
||
oops, mine was an underflow. sorry. Let me know if there is anything else you would like me to do.
Reporter | ||
Comment 28•6 years ago
|
||
Bob, do I need to do anything more than loading the URL in comment 26 to reproduce?
Flags: needinfo?(bob)
Comment 29•6 years ago
|
||
I just used nightly opt asan, enabled pop ups for the site and then used web console to open a tab from the loaded page, then in that new tab opened developer console then executed setInterval('opener.document.location.reload()', 30000) to load the original tab once every thirty seconds.
One attempt failed quickly with
==8620==AddressSanitizer CHECK failed: Z:\task_1536324217\build\src\build\build-clang\build-clang\src\llvm\projects\compiler-rt\lib\asan\asan_thread.cc:350 "((ptr[0] == kCurrentStackFrameMagic)) != (0)" (0x0, 0x0)
Another failed while I was away with
==7740==ERROR: AddressSanitizer: stack-use-after-scope on address 0x003b319ff860 at pc 0x7ff97fddfae2 bp 0x003b319fef80 sp 0x003b319fefc8
WRITE of size 56 at 0x003b319ff860 thread T16777215
==7740==WARNING: Failed to use and restart external symbolizer!
#0 0x7ff97fddfb0a in __asan_wrap_memset Z:\task_1536324217\build\src\build\build-clang\build-clang\src\llvm\projects\compiler-rt\lib\sanitizer_common\sanitizer_common_interceptors.inc:764
#1 0x7ff9a7df1aa4 in RtlUnicodeStringToAnsiString+0x2e4 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180031aa4)
#2 0x7ff9a7e330c0 in LdrInitializeThunk+0x100 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x1800730c0)
#3 0x7ff9a7e3301a in LdrInitializeThunk+0x5a (C:\WINDOWS\SYSTEM32\ntdll.dll+0x18007301a)
#4 0x7ff9a7e32fcd in LdrInitializeThunk+0xd (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180072fcd)
Address 0x003b319ff860 is a wild pointer.
SUMMARY: AddressSanitizer: stack-use-after-scope Z:\task_1536324217\build\src\build\build-clang\build-clang\src\llvm\projects\compiler-rt\lib\sanitizer_common\sanitizer_common_interceptors.inc:764 in __asan_wrap_memset
Shadow bytes around the buggy address:
0x01ed33a3feb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3fec0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3fed0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3fee0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3fef0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x01ed33a3ff00: 00 00 00 00 f1 f1 f1 f1 f8 f8 f8 f8[f8]f8 f3 f3
0x01ed33a3ff10: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3ff20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3ff30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3ff40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x01ed33a3ff50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
The stack is useless but you should be able to find something with this via the reload kludge.
Flags: needinfo?(bob)
Reporter | ||
Comment 30•6 years ago
|
||
Unhiding :bc's comments so :dmajor can read through them. The issue here is not s-s anyways, this is a bug in ASan itself and not an issue in our codebase.
Assignee | ||
Comment 31•6 years ago
|
||
bc, I have a proposed fix/workaround here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4bff18d6a54 -- could you take it for a spin and see if it addresses these issues? (The two BoR builds should be the same, I think that was just a misclick)
Flags: needinfo?(bob)
Comment 32•6 years ago
|
||
It's running. I've set asanreporter.clientid to bclary@mozilla.com. I'll leave it running for a while.
Flags: needinfo?(bob)
Comment 33•6 years ago
|
||
When I just checked on it, I found tab crashed but no information in the output related to asan. Perhaps it sent an asan report, but I can't check on it. Hopefully that helped and hopefully this fix will land upstream soon.
Reporter | ||
Comment 34•6 years ago
|
||
I've checked the results in the backend, there is one report from :bc but it is an OOM related crashed. No sign of the stack issues anymore, so I guess we can consider this fixed as soon as we have the new toolchain. Thanks :dmajor!
Assignee | ||
Comment 35•6 years ago
|
||
Committed upstream at 342652 (with a bonus change to the interceptor at 342649). I'll cherry-pick these into our tree. Note to self, don't forget to remove the asan-stack workaround.
Assignee: nobody → dmajor
Assignee | ||
Comment 36•6 years ago
|
||
Comment 37•6 years ago
|
||
Comment on attachment 9010806 [details]
Bug 1477490: Merge upstream ASan patch to unpoison thread stacks.
Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9010806 -
Flags: review+
Comment 38•6 years ago
|
||
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8846c0a176c8
Merge upstream ASan patch to unpoison thread stacks. r=ted
Comment 39•6 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox64:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Comment 40•6 years ago
|
||
David, would this patch be a good candidate for uplifting to 63 beta?
Flags: needinfo?(dmajor)
Assignee | ||
Comment 41•6 years ago
|
||
There's no real benefit since the asan reporter project is nightly-only.
Flags: needinfo?(dmajor)
Comment 42•6 years ago
|
||
Thanks David
Updated•6 years ago
|
status-firefox62:
--- → unaffected
status-firefox-esr60:
--- → unaffected
Updated•5 years ago
|
Blocks: asan-maintenance
You need to log in
before you can comment on or make changes to this bug.
Description
•