Closed Bug 1191492 Opened 9 years ago Closed 9 years ago

AddressSanitizer: heap-buffer-overflow during incremental GC

Categories

(Core :: JavaScript: GC, defect)

x86_64
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla42
Tracking Status
firefox40 --- unaffected
firefox41 --- unaffected
firefox42 --- fixed
firefox-esr38 --- unaffected
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- unaffected
b2g-v2.1S --- unaffected
b2g-v2.2 --- unaffected
b2g-v2.2r --- unaffected
b2g-master --- fixed

People

(Reporter: bc, Assigned: bzbarsky)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-bounds, regression, sec-high, Whiteboard: [asan][post-critsmash-triage][b2g-adv-main2.5-])

Attachments

(3 files)

Attached file cnn-nightly-asan-no-flash.log (deleted) —
1. Use Spider to scan cnn.com or load ~20 urls from cnn.com manually with Flash disabled or click-to-play (see Bug 1191489). ;-) See the attached log for the urls loaded in this scan. The actual crashing url will change depending on the phase of the moon, etc.

2. AddressSanitizer: heap-buffer-overflow on address 0x6170005da608 at pc 0x7fa71f769ebb bp 0x7ffcfea05450 sp 0x7ffcfea05448
WRITE of size 8 at 0x6170005da608 thread T0
    #0 0x7fa71f769eba in remove /builds/slave/m-cen-l64-asan-000000000000000/build/src/obj-firefox/js/src/../../dist/include/mozilla/LinkedList.h:208
    #1 0x7fa71f769eba in unboxedLayout /builds/slave/m-cen-l64-asan-000000000000000/build/src/js/src/vm/UnboxedObject.cpp:278
    #2 0x7fa71f769eba in js::ObjectGroup::sweep(js::AutoClearTypeInferenceStateOnOOM*) /builds/slave/m-cen-l64-asan-000000000000000/build/src/js/src/vm/TypeInference.cpp:4119
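For readers less familiar with the code: frame #0 is the doubly-linked-list unlink in mozilla/LinkedList.h, which writes through the node's neighbours' pointers. Below is a minimal, self-contained sketch of that pattern (illustrative only; simplified stand-in types, not the real MFBT or UnboxedLayout code) showing why unlinking a node whose neighbour has already been freed, e.g. earlier in the same GC sweep, turns into an invalid 8-byte pointer store of the kind ASan flags here.

#include <cassert>
#include <cstdio>

// Simplified stand-in for an intrusive doubly-linked list node, loosely
// modeled on mozilla::LinkedListElement. Illustrative only; not Gecko code.
struct Node {
  Node* prev = nullptr;
  Node* next = nullptr;

  // Unlink this node. Note that this writes *through* the neighbouring
  // nodes: if either neighbour's memory was already reclaimed (say, its
  // owner was swept and freed before this node was unlinked), these two
  // pointer stores become invalid 8-byte writes on a 64-bit build --
  // consistent with the "WRITE of size 8" at LinkedList.h:208 above.
  void remove() {
    prev->next = next;
    next->prev = prev;
    prev = next = nullptr;
  }
};

// Circular list with a sentinel so prev/next are never null for live nodes.
struct List {
  Node sentinel;
  List() { sentinel.prev = sentinel.next = &sentinel; }
  void insertBack(Node* n) {
    n->prev = sentinel.prev;
    n->next = &sentinel;
    sentinel.prev->next = n;
    sentinel.prev = n;
  }
};

int main() {
  List layouts;   // hypothetical name; think "list of unboxed layouts"
  Node a, b;
  layouts.insertBack(&a);
  layouts.insertBack(&b);

  // Safe here: both of a's neighbours (the sentinel and b) are still alive
  // when a.remove() runs. The hazard during a sweep is the ordering where a
  // neighbour is freed first, so remove() then stores into dead heap memory.
  a.remove();
  assert(layouts.sentinel.next == &b);
  puts("unlinked a; neighbours were updated in place");
  return 0;
}

The real question, of course, is why the layout list ends up containing an entry whose neighbour was already freed; the sketch only illustrates why ASan reports the failure as a write inside LinkedList.h rather than at the point where the neighbouring object died.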
FYI, this was on Nightly only, not on Beta or Aurora.
bc: is this a result from the new Bughunter+ASAN combination?
Flags: needinfo?(bob)
Not yet. It is the result of my experiments with running ASan builds and trying to figure out the steps required to add support to Bughunter. I figured I'd file bugs as I see them during my experimenting rather than wait. One thing I've learned for sure is that Bughunter needs to load multiple urls in order to find GC-related issues, but that will have to wait for the moment.
Flags: needinfo?(bob)
Attachment #8643914 - Attachment mime type: text/x-log → text/plain
Bug 1191465 is another report of similar-sounding GC sweeping crashes.
Flags: needinfo?(terrence)
The stack in comment 0 involves unboxed objects, but of course that could be incidental.
This stack is right in the middle of UnboxedObject, which might implicate fdf5862a8c00 as well. Flagging Brian to see if he can spot anything.
Flags: needinfo?(bhackett1024)
I'm marking this sec-high because whatever is happening seems very discoverable, and the GC is involved, so who knows what sort of badness is happening.
Keywords: sec-high
Bob, would it be possible for you to bisect this a little using inbound builds? Thanks.
Flags: needinfo?(bob)
Attached file new error with today's cnn (deleted) —
FYI, I've started bisecting. I decided to make sure I could still reproduce with the same build I used yesterday (2015-08-05) and found this new error, which is still GC-sweep related but different from what I found yesterday. I should have an inbound range soon.
(In reply to Bob Clary [:bc:] from comment #9)
> Created attachment 8644315 [details]
> new error with today's cnn
>
> FYI, I've started bisecting. I decided to make sure I could still reproduce
> with the same build I used yesterday (2015-08-05) and found this new error,
> which is still GC-sweep related but different from what I found yesterday.
> I should have an inbound range soon.

\o/ With a signature that moves around this much, is hard to repro locally, and has no obvious candidates in the regression range, bisection is basically the only way we're ever going to find the cause.
Not a ton, but it's as good a candidate as anything else in our short list and /much/ more likely than the majority of patches that don't touch spidermonkey at all. I vote that we back out and see if the crashes go away on tip.
Flags: needinfo?(terrence)
Bug 1181908 has been backed out from m-c and I'll be triggering new nightlies shortly. Will hold off on resolving this bug until we know for sure that it's fixed.
Ryan, is http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-asan/1438882244/ the correct build? I can still reproduce the asan error with it.
Flags: needinfo?(ryanvm)
Looks like it
Flags: needinfo?(ryanvm)
OK, under the assumption that a bisection using a 1-level deep scan of cnn.com is insufficient to deterministically find the crash, I'll start a bisection using a two-level scan. This will take some time, however.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d66d293b4498, where I backed out http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00, did not help; it still reproduced the error. I'll continue the bisection.
(In reply to Bob Clary [:bc:] from comment #17)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d66d293b4498, where
> I backed out http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00, did not
> help; it still reproduced the error. I'll continue the bisection.

I think I tested the wrong build here. I redownloaded and retested with a 1-level scan and *did not* reproduce the error. I'll do several more, then a 2-level scan, to confirm, but it does look like reverting http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00 fixed the issue.
Running the 1-level scan 10 times did not reproduce this error, though it did find another unrelated error for HashKey. Reverting http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00 does fix this bug.
I backed that out in https://hg.mozilla.org/mozilla-central/rev/d6ea652c5799. I'll leave it to you to determine if that's enough to close this bug. :)
Flags: needinfo?(bob)
(In reply to Bob Clary [:bc:] from comment #19)
> Running the 1-level scan 10 times did not reproduce this error, though it
> did find another unrelated error for HashKey. Reverting
> http://hg.mozilla.org/mozilla-central/rev/fdf5862a8c00 does fix this bug.

Yeah, that makes a lot more sense given the stacks.
FWIW, I tried and failed to get a local ASan build so I could bisect locally, and am going back to bisecting inbound ASan builds from tinderbox in the hope that I can get the real regressor. It will take a while though, so don't wait up.
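As an aside, for anyone who wants to retry the local build: a Linux x86_64 ASan build is mostly a matter of a clang toolchain plus the address-sanitizer configure switch. A rough .mozconfig sketch follows; it is assembled from memory of the ASan build documentation of this era rather than from anything in this bug, so treat the exact option set as an assumption:

# sketch of a .mozconfig for a local Linux x86_64 ASan build (option set is an assumption)
export CC=clang
export CXX=clang++

ac_add_options --enable-address-sanitizer
ac_add_options --disable-jemalloc          # let ASan own the allocator
ac_add_options --disable-crashreporter
ac_add_options --disable-elf-hack
ac_add_options --enable-debug-symbols
ac_add_options --enable-optimize="-O1 -fno-omit-frame-pointer"

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-asan

Then ./mach build and run obj-asan/dist/bin/firefox against the url list.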
I confirmed the regression range by scanning cnn.com 1 level deep 10 times against each build. The regression range is:

Good: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1438624016/
Bad: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-asan/1438624017/

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8ad982618f06&tochange=2f16fb18314a -> Bug 1181908, which mccr8 confirmed in bug 1191465. I don't know why I failed to confirm this with try builds.
Flags: needinfo?(bob)
Regressor has been identified, so I'm clearing the ni? on bhackett. Jesse and Decoder, we had a massive regression (this and all of the many crashes in bug 1191465) from bug 1181908, apparently caused by content JS. It would be good to figure out why none of the fuzzers seemed to have detected this. That other bug messes with JS compiler options, so maybe something more needs to be exposed to the fuzzers? Of course, it will be easier to figure out once somebody knows why that bug caused these crashes.
Blocks: 1181908
Flags: needinfo?(bhackett1024)
Flags: needinfo?(choller)
Flags: needinfo?(jruderman)
I retested by scanning cnn.com 1 level deep with today's mozilla-central ASan build and could not reproduce the error. Fixed by the backout of Bug 1181908. We could probably unhide this and dupe it to Bug 1181908 if you want.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
I have found this crash with fuzzing as well and I remember trying to get a proper test for it. The problem was that the test was very intermittent and failed most reduction attempts. I have 29 reports for this particular bug in FuzzManager but none of them reproduced properly for me. Maybe we could improve this somehow if we figure out why it's so hard to hit.
Flags: needinfo?(choller)
DOMFuzz is partially offline at the moment. Did jsfunfuzz hit this, and did it create any reduced testcases?
Flags: needinfo?(jruderman) → needinfo?(gary)
(In reply to Jesse Ruderman from comment #28)
> DOMFuzz is partially offline at the moment. Did jsfunfuzz hit this, and did
> it create any reduced testcases?

Nope, certainly not on a large enough volume to generate notice.
Flags: needinfo?(gary)
Can we get this TSan'd? If it's that intermittent, it might be a race.
Flags: needinfo?(nfroyd)
For what it is worth, bug 1191628 is a decoder-reported intermittent test case that is almost certainly another version of this issue. I'm not sure how hard it would be to run TSan on the shell.
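For the record, running TSan on the standalone shell is mostly a matter of a clang build with -fsanitize=thread. A rough sketch of how one might build it (the configure invocation is from memory of this era's js/src build system and is an assumption, not something tested for this bug):

cd js/src
autoconf2.13              # or autoconf-2.13; this era's build still wants autoconf 2.13
mkdir build-tsan && cd build-tsan
CC=clang CXX=clang++ \
CFLAGS="-fsanitize=thread -g" CXXFLAGS="-fsanitize=thread -g" LDFLAGS="-fsanitize=thread" \
  ../configure --enable-debug --enable-optimize
make -j8
# the shell ends up at dist/bin/js; run the intermittent test under it and
# let TSan report any races it sees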
(In reply to Julian Seward [:jseward] from comment #30)
> Can we get this TSan'd? If it's that intermittent, it might be a race.

It looks like we know what's going on here without TSan's involvement.
Flags: needinfo?(nfroyd)
Group: core-security → core-security-release
Whiteboard: [asan] → [asan][post-critsmash-triage]
Group: core-security-release
Whiteboard: [asan][post-critsmash-triage] → [asan][post-critsmash-triage][b2g-adv-main2.5-]