Closed
Bug 1287437
Opened 8 years ago
Closed 8 years ago
LeakSanitizer has encountered a fatal error
Categories
(Core :: Security: Process Sandboxing, defect)
RESOLVED
DUPLICATE
of bug 1287971
People
(Reporter: cbook, Unassigned)
References
(Blocks 1 open bug, )
Details
(Keywords: regression, Whiteboard: [MemShrink][sblc2])
We've recently been seeing a lot of noise in ASan test logs. It isn't fatal; it just rides along in the error log whenever a job fails:
https://treeherder.mozilla.org/logviewer.html#?job_id=10612543&repo=fx-team#L3256
02:37:04 INFO - ==2278==LeakSanitizer has encountered a fatal error.
02:37:08 INFO - -----------------------------------------------------
02:37:08 INFO - Suppressions used:
02:37:08 INFO -   count      bytes template
02:37:08 INFO -      40        986 libc.so
02:37:08 INFO -     836      26672 nsComponentManagerImpl
02:37:08 INFO -      52       7072 mozJSComponentLoader::LoadModule
02:37:08 INFO -       1        384 pixman_implementation_lookup_composite
02:37:08 INFO -     360      15936 libfontconfig.so
02:37:08 INFO -       1         32 libdl.so
02:37:08 INFO -      26       6492 libglib-2.0.so
02:37:08 INFO -       8        224 libresolv.so
02:37:08 INFO - -----------------------------------------------------
Comment 1•8 years ago (Reporter)
Andrew: do you know where to look in hg to find what caused this?
Flags: needinfo?(continuation)
Comment 2•8 years ago
(In reply to Carsten Book [:Tomcat] from comment #1)
> andrew: do you know where to look in hg what caused this ?
I'm not sure what you mean by that.
Anyways, this sounds kind of bad. I wonder if we're even actually checking for leaks right now. Did we bump the version of Clang we use for ASan builds recently? I'll try to look at logs to see when this started showing up.
I also wonder if the "WARNING - Can't figure out symbols_url from installer_url" is related.
Blocks: LSan
Updated•8 years ago
Product: Core → Testing
Comment 3•8 years ago (Reporter)
(In reply to Andrew McCreight [:mccr8] from comment #2)
> (In reply to Carsten Book [:Tomcat] from comment #1)
> > andrew: do you know where to look in hg what caused this ?
>
> I'm not sure what you mean by that.
>
Oh, I meant whether an in-tree change could have caused this, and so whether it's something we can back out.
Comment 4•8 years ago
Fortunately, it looks like a recent regression.
I don't see the fatal error in this push on m-c:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=ef5f932101e5b833b2429407cb0873471b4d764e
But I do see it in the next one:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=711963e8daa312ae06409f8ab5c06612cb0b8f7b
This is the set of changes that landed in the second one:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=711963e8daa312ae06409f8ab5c06612cb0b8f7b
Flags: needinfo?(continuation)
Comment 5•8 years ago
I'll try to figure out what regressed this.
We should also make the tree turn orange when this happens.
Assignee: nobody → continuation
Comment 6•8 years ago
I bisected down to this push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=54e9af18d31426a6474d584add8f487d99848854
The only suspicious commit in that push is bug 1286324.
Jed, could you take a look at this? If you can't fix it soon, please consider backing out your patch, as people may be introducing new LSan leaks. I'm not sure if LSan is actually going to report anything or not. (It might just be dying after it would do the report or something.)
I'll file a separate bug for making this turn the tree orange and work on that.
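For the tree to turn orange, something has to notice the message in the log. A minimal sketch of such a check in C, assuming a plain substring match on the marker text from the log excerpt in this bug is enough. The function name is hypothetical and this is not the test harness's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker text copied from the log excerpt in this bug. */
static const char kLsanFatalMarker[] =
    "LeakSanitizer has encountered a fatal error";

/* Hypothetical helper: returns 1 if the log line contains the marker,
 * 0 otherwise. The caller must pass a NUL-terminated string. */
int line_has_lsan_fatal_error(const char *line) {
    assert(line != NULL);
    return strstr(line, kLsanFatalMarker) != NULL;
}
```

A harness could call this on every output line of a job and mark the job as failed on the first match.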
Updated•8 years ago
Keywords: regression
Whiteboard: [MemShrink]
Updated•8 years ago
Component: General → Security: Process Sandboxing
Product: Testing → Core
Comment 7•8 years ago
I've confirmed locally that backing out bug 1287877 makes the LeakSanitizer fatal error message go away.
Comment 8•8 years ago
Maybe you could take a look at this, Julian, if Jed isn't around? Thanks.
Flags: needinfo?(julian.r.hector)
Comment 9•8 years ago
Jed's looking at it. Anyways, the bigger problem seems to be that we're not running LeakSanitizer in the content process. Jed's patch just made it so that we got some alert about it rather than silently failing.
Flags: needinfo?(julian.r.hector)
Comment 10•8 years ago
That patch changes the sandbox policy so that fork() fails with EPERM instead of just crashing, so anything reasonable that LSan was doing that could observe it would have already been broken before that patch. It's possible that mozharness wouldn't notice that kind of crash, because ASan builds have no crash reporter, but there would have been log messages on stderr (starting with "Sandbox: ") and I don't see any in the logs for the “before” m-c build.
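To illustrate what "fails with EPERM" looks like from the caller's side, here is a minimal C sketch. It is not LSan's code or the sandbox's; the enum and function names are hypothetical, and it only shows how a caller could tell a denied fork() apart from other failures:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome codes for this sketch. */
enum fork_outcome { FORK_OK = 0, FORK_DENIED = 1, FORK_OTHER_ERROR = 2 };

/* Classify a fork() result: pid is the return value in the parent,
 * err is errno as saved right after a failed call (0 on success). */
enum fork_outcome classify_fork_result(pid_t pid, int err) {
    /* Sanity check: a failed call must come with an error code. */
    assert(pid >= 0 || err != 0);
    if (pid >= 0) {
        return FORK_OK;
    }
    return err == EPERM ? FORK_DENIED : FORK_OTHER_ERROR;
}

/* Fork a child that exits immediately, reap it, and report the outcome.
 * Under a policy that rejects fork() with EPERM, this returns FORK_DENIED
 * and the calling process keeps running. */
enum fork_outcome try_fork(void) {
    pid_t pid = fork();
    int err = (pid < 0) ? errno : 0;
    if (pid == 0) {
        _exit(0); /* child: nothing to do in this sketch */
    }
    if (pid > 0) {
        int status;
        /* Retry if the wait is interrupted by a signal. */
        while (waitpid(pid, &status, 0) == -1 && errno == EINTR) {
        }
    } else {
        fprintf(stderr, "fork() failed: %s\n", strerror(err));
    }
    return classify_fork_result(pid, err);
}
```

The point of the sketch is that a caller which checks the return value can log the failure and continue, which is the kind of behavior that would let a runtime such as LSan print its own error message.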
One weird thing here: suppose the sanitizer runtime managed to block SIGSYS, possibly by trying to block all signals. (We have symbol interposition for sigprocmask and pthread_sigmask to force SIGSYS to stay unblocked for exactly this reason, but that wouldn't apply if the runtime makes the sigprocmask syscall directly.) In that case the kernel will unblock the signal *and* reset its disposition before delivering it, which means the process will immediately exit: no log messages, no crash reporting, nothing. So that could result in weird breakage that wouldn't show up as a test failure or even be obvious to a human reading the logs.
But a look at the compiler-rt source doesn't show anything that might be doing this besides TSan, which is known to be incompatible with sandboxing for various reasons (and will disable it: bug 1182565).
Comment 11•8 years ago
Yeah, it looks like LSan is just not running at all on Nightly. I filed bug 1287971 for that.
Comment 12•8 years ago
This and bug 1287971 are going to have the same solution, it looks like: disable sandboxing if ASan (and therefore LSan) is used.
(Vague summary of bug 1287971 comment #9: the same syscall is causing both of these bugs. It's weirder than just a plain fork(), but that doesn't really matter. Bug 1287971 is that the syscall immediately and silently killed the process, rather than crashing it very noisily as intended, because the last paragraph of comment #10 is wrong. *This* bug is that the syscall now fails in such a way that LSan is able to complain about it.)
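One way "disable sandboxing if ASan is used" could be expressed is a compile-time check. The following C sketch is not the actual patch; BUILT_WITH_ASAN and sandbox_should_enable are hypothetical names. It relies on Clang's __has_feature(address_sanitizer) and GCC's __SANITIZE_ADDRESS__ macro:

```c
#include <assert.h>
#include <stdbool.h>

/* Detect an ASan build. Clang exposes __has_feature(address_sanitizer);
 * GCC defines __SANITIZE_ADDRESS__ when -fsanitize=address is used. */
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define BUILT_WITH_ASAN 1
#  endif
#endif
#if !defined(BUILT_WITH_ASAN) && defined(__SANITIZE_ADDRESS__)
#  define BUILT_WITH_ASAN 1
#endif
#ifndef BUILT_WITH_ASAN
#  define BUILT_WITH_ASAN 0
#endif

static_assert(BUILT_WITH_ASAN == 0 || BUILT_WITH_ASAN == 1,
              "BUILT_WITH_ASAN must be 0 or 1");

/* Hypothetical policy hook: enable the sandbox only when it was requested
 * and the binary was not built with ASan (and therefore LSan). */
bool sandbox_should_enable(bool requested) {
    return requested && !BUILT_WITH_ASAN;
}
```

In a regular build, sandbox_should_enable(true) returns true; in a build compiled with -fsanitize=address it returns false, so the sanitizer runtime is free to make the syscalls it needs.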
Flags: needinfo?(jld)
Updated•8 years ago
Whiteboard: [MemShrink] → [MemShrink][sblc2]
Comment 13•8 years ago
Looking at some before/after failed test jobs on TH, this seems to have been fixed by https://hg.mozilla.org/mozilla-central/rev/8d2a4af272e3 as expected.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE