Closed
Bug 1397301
Opened 7 years ago
Closed 3 years ago
Crash in sandbox::SharedMemIPCClient::DoCall (Avast)
Categories
(Core :: Security: Process Sandboxing, defect, P3)
Tracking
()
People
(Reporter: philipp, Unassigned)
References
Details
(Keywords: crash, Whiteboard: sb+)
Crash Data
This bug was filed from the Socorro interface and is report bp-99a8ec3d-932f-4645-9e19-ee3060170906. ============================================================= Crashing Thread (34) Frame Module Signature Source 0 firefox.exe sandbox::SharedMemIPCClient::DoCall(sandbox::CrossCallParams*, sandbox::CrossCallReturn*) security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc:105 1 firefox.exe sandbox::CrossCall<sandbox::SharedMemIPCClient, wchar_t*, unsigned int, sandbox::InOutCountedBuffer>(sandbox::SharedMemIPCClient&, unsigned int, wchar_t* const&, unsigned int const&, sandbox::InOutCountedBuffer const&, sandbox::CrossCallReturn*) security/sandbox/chromium/sandbox/win/src/crosscall_client.h:366 2 firefox.exe TargetNtQueryAttributesFile security/sandbox/chromium/sandbox/win/src/filesystem_interception.cc:233 3 ntdll.dll LdrpResolveFileName 4 ntdll.dll LdrpSearchPath 5 ntdll.dll RtlSubAuthorityCountSid 6 ntdll.dll LdrpFindOrMapDll 7 ntdll.dll LdrpFindOrMapDll 8 mozglue.dll patched_LdrLoadDll mozglue/build/WindowsDllBlocklist.cpp:779 9 kernelbase.dll NlsConvertStringToIntegerW 10 kernelbase.dll LoadLibraryExA 11 kernel32.dll LoadLibraryA 12 @0x8f0012 13 kernel32.dll BaseThreadInitThunk 14 ntdll.dll RtlUserThreadStart these crashes seem to be more frequent on 64bit versions of firefox, so they are now on the rise after we started automatically migrating users to win64 on 56.0b9.
Reporter | ||
Comment 1•7 years ago
|
||
on nightly the signature got more frequent after 55.0a1 build 20170307030205, maybe something is regressing since then and tickling this crash more often.
Comment 2•7 years ago
|
||
The ones I've looked at are complaining that answer at [1] is null, in what I think are SSE instructions being used for the memcpy. The thing is in the dump it isn't. At least for a 64-bit process. So, I wonder if there is some confusion between 32-bit and 64-bit somehow during the migration. [1] https://hg.mozilla.org/releases/mozilla-beta/file/97bee4069ef3/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#l105
Looking at this report: https://crash-stats.mozilla.com/report/index/6e9aaa8b-344d-496d-9273-e84dd0170910#tab-details The failure code is EXCEPTION_ACCESS_VIOLATION_READ on address 0, but the instruction is a write: firefox!sandbox::SharedMemIPCClient::DoCall+0xc2 00007ff7`55349352 0f2903 movaps xmmword ptr [rbx],xmm0 Note that rbx (aka `answer`) ends in hex digit "8" so it's not 16-byte aligned. I suspect the reason code is bogus and this is actually an alignment error in the movaps. Spot-checking several reports from that build they all have rbx ending in 8. Either the compiler is wrong to assume that `answer` is 16-byte aligned, or the caller is wrong in not living up to this requirement. My guess is the former, but I defer to Bob in case he knows something about this code that I don't.
Flags: needinfo?(bobowencode)
Comment 4•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #3) > Looking at this report: > https://crash-stats.mozilla.com/report/index/6e9aaa8b-344d-496d-9273- > e84dd0170910#tab-details > > The failure code is EXCEPTION_ACCESS_VIOLATION_READ on address 0, but the > instruction is a write: > > firefox!sandbox::SharedMemIPCClient::DoCall+0xc2 > 00007ff7`55349352 0f2903 movaps xmmword ptr [rbx],xmm0 > > Note that rbx (aka `answer`) ends in hex digit "8" so it's not 16-byte > aligned. I suspect the reason code is bogus and this is actually an > alignment error in the movaps. Spot-checking several reports from that build > they all have rbx ending in 8. > > Either the compiler is wrong to assume that `answer` is 16-byte aligned, or > the caller is wrong in not living up to this requirement. My guess is the > former, but I defer to Bob in case he knows something about this code that I > don't. Thanks for looking at this, I don't know of anything about this code that would change your opinion. Do you think this could be confusion between 32-bit/64-bit, parent/child during migration, it seemed to spike in Beta 9? I guess maybe not given that answer always seems to be a stack allocated local variable in the child, so maybe this spike is just down to running more 64-bit versions. Looking at the SANDBOX_FAILED_LAUNCH telemetry, we do seem to be getting a lot more launch errors for 64-bit as well.
Flags: needinfo?(bobowencode) → needinfo?(dmajor)
(In reply to Bob Owen (:bobowen) from comment #4) > Do you think this could be confusion between 32-bit/64-bit, parent/child > during migration, it seemed to spike in Beta 9? > I guess maybe not given that answer always seems to be a stack allocated > local variable in the child, so maybe this spike is just down to running > more 64-bit versions. My first guess would have been that in 56b9 something tickled the compiler into using `movaps` instructions where it didn't before, but my disassembly of 56b8 says this is not true. My second guess would be that until 56b8 the pointer math worked out such that rbx _was_ always aligned in practice. Do you know how to exercise this code? Could you try breaking on the movaps in 56b8 and see what your rbx looks like and how it was calculated?
Flags: needinfo?(dmajor)
> My second guess would be that until 56b8 the pointer math worked out such
> that rbx _was_ always aligned in practice.
Hm, actually, this isn't true either. The crash has been seen on rare occasion going back to version 55 nightlies.
Maybe it's right that it's more correlated with the win64 rollout than the actual contents of the builds.
I don't have access to memory dumps but it'd be interesting to know what the stack frames/addresses look like. IIRC the stack must be aligned to 16 bytes at function calls on Win64. Maybe someone lower in the stack messed this up and SharedMemIPCClient is merely the first thing that falls over when we're unaligned.
Comment 8•7 years ago
|
||
So, it appears to be at that weird line in the stacks (for example from the crash referenced in comment 3): 12 @0x6e0012 That's where the addresses start ending with 8.
Flags: needinfo?(dmajor)
(In reply to Bob Owen (:bobowen) from comment #8) > So, it appears to be at that weird line in the stacks (for example from the > crash referenced in comment 3): > > 12 @0x6e0012 > > That's where the addresses start ending with 8. If the callee of BaseThreadInitThunk is some random hex address then this sounds like someone is injecting code/threads into our process. :-( Any chance the stack still contains the name of the DLL they're trying to load? (Just to confirm that it's something shady.) And I'd be curious to see what `!address 0x6e0012` says -- we might be able to expand our patched_BaseThreadInitThunk code to 64-bit builds.
Flags: needinfo?(dmajor) → needinfo?(bobowencode)
Comment 10•7 years ago
|
||
I don't see anything in the stack (although I'm not all that familiar with poking around in it). In the dump for that crash windbg actually says 0x6e0013 0:031> !address 0x6e0013 Mapping file section regions... Mapping module regions... Mapping PEB regions... Mapping TEB and stack regions... *** ERROR: Symbol file could not be found. Defaulted to export symbols for mswsock.dll - Mapping heap regions... Mapping page heap regions... Mapping other regions... Mapping stack trace database regions... Mapping activation context regions... Usage: <unknown> Base Address: 00000000`006e0000 End Address: 00000000`006e1000 Region Size: 00000000`00001000 ( 4.000 kB) State: 00001000 MEM_COMMIT Protect: 00000040 PAGE_EXECUTE_READWRITE Type: 00020000 MEM_PRIVATE Allocation Base: 00000000`006e0000 Allocation Protect: 00000040 PAGE_EXECUTE_READWRITE Content source: 0 (invalid), length: 5b1f9b5
Flags: needinfo?(bobowencode)
Comment 11•7 years ago
|
||
Interestingly in the crash dump for this report [1], when I trace the stack it tries to load symbols for snxhk64.dll which apparently is an Avast DLL. [1] https://crash-stats.mozilla.com/report/index/4b3b1ea4-9152-461d-8d37-33b690170912
![]() |
||
Comment 12•7 years ago
|
||
(In reply to Bob Owen (:bobowen) from comment #10) > Protect: 00000040 PAGE_EXECUTE_READWRITE It sounds like this thread would be blocked if we expand the patched_BaseThreadInitThunk filtering to 64-bits (i.e. remove ifdef X86) because it fails the PAGE_EXECUTE_READ requirement at ShouldBlockThread: https://dxr.mozilla.org/mozilla-central/rev/f9a5e9ed62103c84e4cde915f4d08f1ce71be83e/mozglue/build/WindowsDllBlocklist.cpp#782,784,794
![]() |
||
Comment 13•7 years ago
|
||
> It sounds like this thread would be blocked if we expand the
> patched_BaseThreadInitThunk filtering to 64-bits (i.e. remove ifdef X86)
Do you want to take on this patch? I can review.
I'm worried about uplifting this to 56 so near release though. The 32-bit version of this code didn't cause any major regressions, but touching these codepaths is always risky. We may have to be prepared to pull it if it causes more problems than it solves.
Comment 14•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #13) > > It sounds like this thread would be blocked if we expand the > > patched_BaseThreadInitThunk filtering to 64-bits (i.e. remove ifdef X86) > > Do you want to take on this patch? I can review. > > I'm worried about uplifting this to 56 so near release though. The 32-bit > version of this code didn't cause any major regressions, but touching these > codepaths is always risky. We may have to be prepared to pull it if it > causes more problems than it solves. Thanks for your help dmajor,
Whiteboard: sb?
Comment 15•7 years ago
|
||
(posted too soon) ... Thanks for your help dmajor, sounds like this would be a useful change. Not sure over uplifting, as this crash doesn't seem too bad and has dropped since Fx56b9, but maybe we should try and get it into 57 though.
Comment 16•7 years ago
|
||
I don't know if this is related, but in Lenovo DLL bug 1369361, we are seeing a problem where the 64-bit Lenovo DLLs are not blocked even though they have been on the DLL blocklist since Firefox 36 bug 1123778. Those crash reports have an annotation saying "BlocklistInitFailed: 1". That annotation points to a LdrLoadDll detour that dmajor added four years ago (Firefox 27 bug 951827).
Blocks: win64-migration
status-firefox-esr52:
--- → wontfix
![]() |
||
Comment 17•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #16) > I don't know if this is related, but in Lenovo DLL bug 1369361, we are > seeing a problem where the 64-bit Lenovo DLLs are not blocked even though > they have been on the DLL blocklist since Firefox 36 bug 1123778. I think the root cause of bug 1369361 is "something external is breaking our blocklist" which means it's unrelated to Lenovo software per se. Those crashes are merely fallout: we're unable to block Lenovo because we're unable to block anyone at all. And this bug is about thread injection so it's probably unrelated as well. > Those > crash reports have an annotation saying "BlocklistInitFailed: 1". That > annotation points to a LdrLoadDll detour that dmajor added four years ago > (Firefox 27 bug 951827). FTR the LdrLoadDll hook existed long before that. In bug 951827 we merely tweaked it.
![]() |
||
Updated•7 years ago
|
Priority: -- → P3
Whiteboard: sb? → sb+
Comment 18•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #13) > > It sounds like this thread would be blocked if we expand the > > patched_BaseThreadInitThunk filtering to 64-bits (i.e. remove ifdef X86) > > Do you want to take on this patch? I can review. > > I'm worried about uplifting this to 56 so near release though. The 32-bit > version of this code didn't cause any major regressions, but touching these > codepaths is always risky. We may have to be prepared to pull it if it > causes more problems than it solves. Bob, do you plan to work on this BaseThreadInitThunk patch for 58? There are about 500 crash reports from Beta 57. We started to migrate 1% of 32-bit Firefox 56.0 users to 64-bit 56.0.1 yesterday, so we will probably see more of these crashes on the Release channel.
status-firefox58:
--- → affected
Flags: needinfo?(bobowencode)
Comment 19•7 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #18) ... > Bob, do you plan to work on this BaseThreadInitThunk patch for 58? > > There are about 500 crash reports from Beta 57. We started to migrate 1% of > 32-bit Firefox 56.0 users to 64-bit 56.0.1 yesterday, so we will probably > see more of these crashes on the Release channel. Yes I'm already working on bug 1372823. I tried to land it once, but it causes intermittent oranges to become permanent, probably because of underlying races. The first of those was fairly easy to fix because it was just an incorrect assertion the next one is a little harder to track down.
Flags: needinfo?(bobowencode)
Updated•7 years ago
|
Comment 20•7 years ago
|
||
¡Hola Bob! Ended up here from https://support.mozilla.org/questions/1179750 where the user is crashing like https://crash-stats.mozilla.com/report/index/8406e2a8-f9b0-41f7-9986-9e9ce1171016 in Release Version 56.0.1 Build ID 20171002220106 What shall I suggest there? Reinstall 32-bit Firefox? ¡Gracias! Alex
Flags: needinfo?(bobowencode)
Comment 21•7 years ago
|
||
(In reply to alex_mayorga from comment #20) ... > What shall I suggest there? > > Reinstall 32-bit Firefox? Either that or find a different anti-virus, it looks like it is Avast injecting a thread to load their DLL that is causing the crash. They are messing up the alignment in the call stack. It would be good if they could report the issue to Avast. I think that the fix landed in bug 1372823 would stop it, but we're not going to uplift this to 57 at this stage.
Flags: needinfo?(bobowencode)
Comment 22•7 years ago
|
||
There are still some crash reports after BaseThreadInitThunk bug 1372823 landed in Nightly 58, but the volume is very low. There have only been 97 reports from Beta 58 and 6 from Nightly 59.
status-firefox59:
--- → affected
Summary: Crash in sandbox::SharedMemIPCClient::DoCall → Crash in sandbox::SharedMemIPCClient::DoCall (Avast)
Comment 23•3 years ago
|
||
Hello! I am closing this issue with RESOLVED->WORKSFORME since there where no changes made and there is no recent crash signatures in the past 6 months.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•