Bug 1191492
Opened 9 years ago
Closed 9 years ago
AddressSanitizer: heap-buffer-overflow during incremental GC
Categories
(Core :: JavaScript: GC, defect)
Tracking
RESOLVED
FIXED
mozilla42
Release | Tracking | Status
---|---|---
firefox40 | --- | unaffected |
firefox41 | --- | unaffected |
firefox42 | --- | fixed |
firefox-esr38 | --- | unaffected |
b2g-v2.0 | --- | unaffected |
b2g-v2.0M | --- | unaffected |
b2g-v2.1 | --- | unaffected |
b2g-v2.1S | --- | unaffected |
b2g-v2.2 | --- | unaffected |
b2g-v2.2r | --- | unaffected |
b2g-master | --- | fixed |
People
(Reporter: bc, Assigned: bzbarsky)
References
(Blocks 1 open bug)
Details
(Keywords: csectype-bounds, regression, sec-high, Whiteboard: [asan][post-critsmash-triage][b2g-adv-main2.5-])
Attachments
(3 files)
Description•9 years ago
1. Use Spider to scan cnn.com or load ~20 urls from cnn.com manually with Flash disabled or click-to-play (see Bug 1191489). ;-)
See the attached log for the urls loaded in this scan. The actual crashing url will change depending on the phase of the moon, etc.
2. AddressSanitizer: heap-buffer-overflow on address 0x6170005da608 at pc 0x7fa71f769ebb bp 0x7ffcfea05450 sp 0x7ffcfea05448
WRITE of size 8 at 0x6170005da608 thread T0
#0 0x7fa71f769eba in remove /builds/slave/m-cen-l64-asan-000000000000000/build/src/obj-firefox/js/src/../../dist/include/mozilla/LinkedList.h:208
#1 0x7fa71f769eba in unboxedLayout /builds/slave/m-cen-l64-asan-000000000000000/build/src/js/src/vm/UnboxedObject.cpp:278
#2 0x7fa71f769eba in js::ObjectGroup::sweep(js::AutoClearTypeInferenceStateOnOOM*) /builds/slave/m-cen-l64-asan-000000000000000/build/src/js/src/vm/TypeInference.cpp:4119
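For readers unfamiliar with the failure mode: `mozilla::LinkedListElement::remove()` unlinks an element by writing pointer fields in both of its neighbours, so an 8-byte WRITE like the one above is the signature of unlinking an element whose `prev`/`next` still point at memory that has already been swept. A minimal sketch of that shape, using a simplified stand-in type rather than the real LinkedList.h or UnboxedLayout code:

```cpp
// Build with: clang++ -fsanitize=address -g sketch.cpp && ./a.out
// Simplified stand-in for a doubly-linked list element; NOT the real
// mozilla::LinkedListElement, just the same unlinking shape.
struct Elem {
  Elem* prev = nullptr;
  Elem* next = nullptr;

  void remove() {
    // Two 8-byte pointer stores into *neighbouring* elements. If a
    // neighbour has already been freed, ASan flags the store: as a
    // heap-use-after-free, or as a heap-buffer-overflow once the freed
    // memory has been recycled into another allocation.
    next->prev = prev;
    prev->next = next;
    prev = next = nullptr;
  }
};

int main() {
  Elem head;                       // list head kept on the stack
  Elem* b = new Elem;
  Elem* c = new Elem;
  head.next = b; b->prev = &head;  // head <-> b <-> c
  b->next = c;   c->prev = b;

  delete c;      // the neighbour dies first (the sweep-ordering hazard)
  b->remove();   // writes 8 bytes into c's freed storage; ASan fires here
  delete b;
  return 0;
}
```

If that is the mechanism here (the stack alone cannot prove it), the element being unlinked would be the UnboxedLayout reached from ObjectGroup::sweep(), with a neighbouring layout freed earlier in the incremental sweep.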
Reporter
Comment 1•9 years ago
fyi, this was on nightly only and not beta or aurora
Comment 2•9 years ago
bc: is this a result from the new Bughunter+ASAN combination?
Flags: needinfo?(bob)
Reporter
Comment 3•9 years ago
Not yet. It is the result of me test-running asan builds and trying to figure out the steps required to add support to Bughunter. I figured I'd file bugs as I see them during my experimenting rather than wait. One thing I've learned for sure is that Bughunter needs to load multiple urls in order to find GC-related issues, but that will have to wait for the moment.
Flags: needinfo?(bob)
Reporter
Updated•9 years ago
Attachment #8643914 - Attachment mime type: text/x-log → text/plain
Comment 4•9 years ago
Bug 1191465 is another report of similar-sounding GC sweeping crashes.
Updated•9 years ago
Flags: needinfo?(terrence)
Comment 5•9 years ago
The stack in comment 0 involves unboxed objects, but of course that could be incidental.
Comment 6•9 years ago
This stack is right in the middle of UnboxedObject, which might implicate fdf5862a8c00 as well. Flagging Brian to see if he can spot anything.
Flags: needinfo?(bhackett1024)
Comment 7•9 years ago
I'm marking this sec-high because, whatever is happening, it seems very discoverable, and the GC is involved, so who knows what sort of badness is happening.
Keywords: sec-high
Comment 8•9 years ago
Bob, would it be possible for you to bisect this a little using inbound builds? Thanks.
Flags: needinfo?(bob)
Reporter
Comment 9•9 years ago
Created attachment 8644315 [details]
new error with today's cnn
fyi, I've started bisecting and decided to make sure I could still reproduce with the same build I used yesterday (2015-08-05) and found this new error which is still GC sweep related but different from what I found yesterday. I should have an inbound range soon.
Comment 10•9 years ago
(In reply to Bob Clary [:bc:] from comment #9)
> Created attachment 8644315 [details]
> new error with today's cnn
>
> fyi, I've started bisecting and decided to make sure I could still reproduce
> with the same build I used yesterday (2015-08-05) and found this new error
> which is still GC sweep related but different from what I found yesterday. I
> should have an inbound range soon.
\o/
With a signature that moves around this much, is hard to repro locally, and has no obvious candidates in the regression range, bisection is basically the only way we're ever going to find the cause.
Reporter
Comment 11•9 years ago
fwiw, on my bisection I saw only the original failure.
Good
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1438624016/
Bad
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1438624017/
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8ad982618f06&tochange=2f16fb18314a
-> Bug 1181908
bz is on vacation, so can someone else decide if this makes sense?
Flags: needinfo?(bob)
Comment 12•9 years ago
Not a ton, but it's as good a candidate as anything else in our short list and /much/ more likely than the majority of patches that don't touch spidermonkey at all.
I vote that we back out and see if the crashes go away on tip.
Flags: needinfo?(terrence)
Comment 13•9 years ago
Bug 1181908 has been backed out from m-c and I'll be triggering new nightlies shortly. Will hold off on resolving this bug until we know for sure that it's fixed.
Reporter
Comment 14•9 years ago
Ryan, is http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/1438882244/ the correct build? I can still reproduce the asan error with it.
Flags: needinfo?(ryanvm)
Reporter
Comment 16•9 years ago
Ok, under the assumption that the bisection using a 1-level-deep scan of cnn.com was insufficient to deterministically find the crash, I'll start a bisection using a two-level scan. This will take some time, however.
Reporter
Comment 17•9 years ago
The try build at https://treeherder.mozilla.org/#/jobs?repo=try&revision=d66d293b4498, where I backed out http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00, did not help; it still reproduced the error. I'll continue bisecting.
Reporter
Comment 18•9 years ago
(In reply to Bob Clary [:bc:] from comment #17)
> The try build at https://treeherder.mozilla.org/#/jobs?repo=try&revision=d66d293b4498,
> where I backed out http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00, did not
> help; it still reproduced the error. I'll continue bisecting.
I think I tested the wrong build here. I redownloaded and retested with a 1-level scan and *did not* reproduce the error. I'll do several more runs and then a 2-level scan to confirm, but it does look like reverting http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00 fixed the issue.
Reporter
Comment 19•9 years ago
Running the 1-level scan 10 times did not reproduce this error, though it did find another, unrelated error for HashKey. Reverting http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00 does fix this bug.
Comment 20•9 years ago
I backed that out in https://hg.mozilla.org/mozilla-central/rev/d6ea652c5799
I'll leave it to you to determine if that's enough to close this bug. :)
Flags: needinfo?(bob)
Comment 21•9 years ago
(In reply to Bob Clary [:bc:] from comment #19)
> Running the 1-level scan 10 times did not reproduce this error, though it did
> find another, unrelated error for HashKey. Reverting
> http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00 does fix this bug.
Yeah, that makes a lot more sense given the stacks.
Reporter
Comment 22•9 years ago
I tested with the build in http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/latest/ from http://hg.mozilla.org/mozilla-central/rev/3e51753a099f and can still reproduce this.
Reporter
Comment 23•9 years ago
fwiw, I tried and failed to get a local asan build so I could bisect locally, and am going back to bisecting inbound asan builds from tinderbox in the hope I can find the real regressor. It will take a while though, so don't wait up.
Reporter
Comment 24•9 years ago
I confirmed the regression range by scanning cnn 1 level deep 10 times for each iteration; the regression range is:
Good
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1438624016/
Bad
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1438624017/
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8ad982618f06&tochange=2f16fb18314a
-> Bug 1181908
which mccr8 confirmed in bug 1191465. I don't know why I failed to confirm this with try builds.
Flags: needinfo?(bob)
Comment 25•9 years ago
Regressor has been identified, so I'm clearing the ni? on bhackett.
Jesse and Decoder, we had a massive regression (this and all of the many crashes in bug 1191465) from bug 1181908, apparently caused by content JS. It would be good to figure out why none of the fuzzers seemed to have detected this. That other bug messes with JS compiler options, so maybe something more needs to be exposed to the fuzzers? Of course, it will be easier to figure out once somebody knows why that bug caused these crashes.
Blocks: 1181908
Flags: needinfo?(bhackett1024)
Updated•9 years ago
Flags: needinfo?(choller)
Updated•9 years ago
Flags: needinfo?(jruderman)
Reporter
Comment 26•9 years ago
I retested scanning cnn 1 level deep with today's mozilla-central asan build and could not reproduce the error. Fixed by the backout of Bug 1181908. We could probably unhide this and dupe it to Bug 1181908 if you want.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 27•9 years ago
I have found this crash with fuzzing as well and I remember trying to get a proper test for it. The problem was that the test was very intermittent and failed most reduction attempts. I have 29 reports for this particular bug in FuzzManager but none of them reproduced properly for me. Maybe we could improve this somehow if we figure out why it's so hard to hit.
Flags: needinfo?(choller)
Comment 28•9 years ago
DOMFuzz is partially offline at the moment. Did jsfunfuzz hit this, and did it create any reduced testcases?
Flags: needinfo?(jruderman) → needinfo?(gary)
Comment 29•9 years ago
(In reply to Jesse Ruderman from comment #28)
> DOMFuzz is partially offline at the moment. Did jsfunfuzz hit this, and did
> it create any reduced testcases?
Nope, certainly not on a large enough volume to generate notice.
Flags: needinfo?(gary)
Comment 30•9 years ago
Can we get this TSan'd? If it's that intermittent, it might be a race.
Flags: needinfo?(nfroyd)
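For reference, this is what the suggestion amounts to: ThreadSanitizer instruments every memory access and reports pairs of unsynchronized accesses from different threads. A minimal racy program it flags, purely illustrative and unrelated to this bug's code:

```cpp
// Build with: clang++ -fsanitize=thread -g race.cpp -pthread && ./a.out
#include <thread>

int counter = 0;  // shared and deliberately unsynchronized

int main() {
  // Two threads increment the same int with no atomics or locks;
  // TSan reports a data race on `counter` at the ++ below.
  auto bump = [] { for (int i = 0; i < 100000; ++i) ++counter; };
  std::thread t1(bump);
  std::thread t2(bump);
  t1.join();
  t2.join();
  return 0;
}
```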
Comment 31•9 years ago
For what it is worth, bug 1191628 is a decoder-reported intermittent test case that is almost certainly another version of this issue. I'm not sure how hard it would be to run TSan on the shell.
Comment 32•9 years ago
(In reply to Julian Seward [:jseward] from comment #30)
> Can we get this TSan'd? If it's that intermittent, it might be a race.
It looks like we know what's going on here without TSan's involvement.
Flags: needinfo?(nfroyd)
Updated•9 years ago
Assignee: nobody → bzbarsky
status-b2g-v2.0: --- → unaffected
status-b2g-v2.0M: --- → unaffected
status-b2g-v2.1: --- → unaffected
status-b2g-v2.1S: --- → unaffected
status-b2g-v2.2: --- → unaffected
status-b2g-v2.2r: --- → unaffected
status-b2g-master: --- → fixed
status-firefox-esr38: --- → unaffected
Target Milestone: --- → mozilla42
Updated•9 years ago
Group: core-security → core-security-release
Updated•9 years ago
Whiteboard: [asan] → [asan][post-critsmash-triage]
Updated•9 years ago
Group: core-security-release
Updated•9 years ago
Whiteboard: [asan][post-critsmash-triage] → [asan][post-critsmash-triage][b2g-adv-main2.5-]
Updated•8 years ago
Keywords: csectype-bounds
Updated•5 years ago
Blocks: asan-maintenance