Closed
Bug 1264300
Opened 9 years ago
Closed 9 years ago
Rooting hazard with Maybe<AutoClearTypeInferenceStateOnOOM> in ObjectGroup::sweep
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla49
People
(Reporter: jandem, Assigned: jonco)
References
(Blocks 1 open bug)
Details
(Keywords: sec-high, Whiteboard: [post-critsmash-triage][adv-main47+])
Attachments
(3 files, 4 obsolete files)
(deleted),
patch
|
sfink
:
review+
abillings
:
approval-mozilla-aurora+
abillings
:
approval-mozilla-beta+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jonco
:
review+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jonco
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
decoder found a hard-to-reduce OOM crash. I think what's happening is that ObjectGroup::sweep does:
Maybe<AutoClearTypeInferenceStateOnOOM> fallbackOOM;
The rooting analysis doesn't know the destructor of this thing can GC, so various callers can call this with unrooted pointers on the stack.
Reporter | ||
Comment 1•9 years ago
|
||
Note that we have some (top)crashes in ObjectGroup::sweep and related code, so maybe fixing this will help there as well.
Reporter | ||
Comment 2•9 years ago
|
||
sfink, is there anything we can do to make the analysis understand the situation in comment 0, and related footguns?
Flags: needinfo?(sphink)
Reporter | ||
Comment 3•9 years ago
|
||
Hm, the GC stack looks like this:
* ~Maybe
* ~AutoClearTypeInferenceStateOnOOM
* Zone::discardJitCode
* ZoneCellIter::ZoneCellIter
* evictNursery
...
I'm not sure if it's caused by the Maybe<>, mrgiggles tells me Zone::discardJitCode and ~AutoClearTypeInferenceStateOnOOM also cannot GC, but maybe that's just because all callers currently suppress GC or something?
Reporter | ||
Updated•9 years ago
|
status-firefox46:
--- → affected
status-firefox47:
--- → affected
tracking-firefox46:
--- → ?
tracking-firefox47:
--- → ?
Too late for a fix for 46, tracking for 47+ since it's rated sec-high.
Comment 5•9 years ago
|
||
Ok, this is a weird one. There was a bug in the analysis where it lost track of the GC call possibility (due to an internally-generated destructor). I fixed that, and immediately got hazard reports for something that looks very much like this. But when I looked closer, it seemed to me like it was a false positive -- the things the analysis were worried about actually passed in nullptr for the AutoClearTypeInferenceStateOnOOM* parameter, so in actual practice it should never be able to GC. I "fixed" it by telling the analysis that when that oom* parameter was nullptr, that it wouldn't GC. But this bug suggests that the annotation is incorrect. (In a debug build, this annotation will now cause an assert crash if it really does GC, but given that it may only happen on OOM I'm skeptical that will help.)
Can I see a more complete crash stack? I want to know if this was from a pre write barrier, or a GC. Is SweepArenaList in the stack or not?
Flags: needinfo?(sphink) → needinfo?(jdemooij)
Reporter | ||
Comment 6•9 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> But when I looked closer, it seemed to me like it was a
> false positive -- the things the analysis were worried about actually passed
> in nullptr for the AutoClearTypeInferenceStateOnOOM* parameter, so in actual
> practice it should never be able to GC.
I think most of these functions construct an AutoClearTypeInferenceStateOnOOM "on demand" if their |oom| argument is nullptr. For the case I was looking at we did it here I think:
https://dxr.mozilla.org/mozilla-central/rev/6adc822f5e27a55551faeb6c47a9bd8b0859a23b/js/src/vm/TypeInference.cpp#4190-4191
See also the stack in comment 3, the ~Maybe was the one in ObjectGroup::sweep IIRC. I don't have the test handy but I can try to reproduce again on Linux if really needed.
Flags: needinfo?(jdemooij) → needinfo?(sphink)
Comment 7•9 years ago
|
||
Ok, seems like the stack the analysis was reporting wasn't possible in practice, but the stack here is.
It starts at NativeObject.cpp's UpdateShapeTypeAndValue which calls ObjectGroup::newScript() which ends up calling ObjectGroup::sweep(), which creates an on-demand AutoClearTypeInferenceStateOnOOM (ugh; I missed that). In its destructor, it can call discardJitCode, which makes a ZoneCellIter, which evicts the nursery. :(
In this particular case, we don't *need* to evict the nursery, because we're only iterating over scripts which are not currently nursery-allocated.
Flags: needinfo?(sphink)
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #7)
This has come about because I changed Zone::discardJitCode to use ZoneCellIter rather than ZoneCellIterUnderGC in bug 1244412 (it's not always called under GC). Previously this OOM situation would have resulted in an assertion failure in debug builds, but would have worked (I think) in release builds.
We can fix ZoneCellIter to not evict the nursery unnecessarily which is worth doing for its own sake. But it sounds like we should also fix the analysis as well.
Assignee | ||
Comment 9•9 years ago
|
||
Only evict the nursery when iterating nursery allocated things.
Attachment #8748106 -
Flags: review?(sphink)
Comment 10•9 years ago
|
||
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily
Review of attachment 8748106 [details] [diff] [review]:
-----------------------------------------------------------------
Heh. This is what I finally figured out that we'd want, and then I saw your review request. Perfect!
I think I'll probably tell the analysis about this by doing an AutoSuppressGCAnalysis higher up in the stack, when we still know IsNurseryAllocable is going to be false. But I'm still trying to fix the hazard bug that is once again preventing it from reporting this problem.
Thanks, Jon!
Attachment #8748106 -
Flags: review?(sphink) → review+
Assignee | ||
Comment 11•9 years ago
|
||
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily
[Security approval request comment]
How easily could an exploit be constructed based on the patch? Difficult and would require triggering OOM at just the right moment.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.
Which older supported branches are affected by this flaw? Aurora only.
If not all supported branches, which bug introduced the flaw? Bug 1244412.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Same patch should apply.
How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8748106 -
Flags: sec-approval?
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #11)
> Which older supported branches are affected by this flaw? Aurora only.
It seems the regressing patch has been uplifted to beta too now.
Comment 13•9 years ago
|
||
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily
Approval Request Comment: see comments 11 and 12.
[Feature/regressing bug #]: Bug 1244412
[User impact if declined]: theoretically exploitable UAF-type thing
[Describe test coverage new/current, TreeHerder]: this is probably showing up as an intermittent orange. There's an unreducible fuzz test that hits it.
[Risks and why]: low risk. It avoids doing some unnecessary and dangerous work, but it also asserts that the work was not needed.
[String/UUID change made/needed]: none
Attachment #8748106 -
Flags: approval-mozilla-beta?
Attachment #8748106 -
Flags: approval-mozilla-aurora?
Comment 14•9 years ago
|
||
sec-approval+ for trunk. I'm approving for others since it is early in the cycle.
Updated•9 years ago
|
Attachment #8748106 -
Flags: sec-approval?
Attachment #8748106 -
Flags: sec-approval+
Attachment #8748106 -
Flags: approval-mozilla-beta?
Attachment #8748106 -
Flags: approval-mozilla-beta+
Attachment #8748106 -
Flags: approval-mozilla-aurora?
Attachment #8748106 -
Flags: approval-mozilla-aurora+
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Assignee | ||
Comment 15•9 years ago
|
||
Comment 16•9 years ago
|
||
Assignee: nobody → jcoppeard
status-firefox-esr45:
--- → unaffected
Comment 17•9 years ago
|
||
Target Milestone: --- → mozilla49
Comment 18•9 years ago
|
||
has problems uplifting to beta:
grafting 343376:7229e6a595af "Bug 1264300 - Don't evict the nursery unnecessarily in ZoneCellIter r=sfink a=abillings"
merging js/src/jsgcinlines.h
warning: conflicts while merging js/src/jsgcinlines.h! (edit, then use 'hg resolve --mark')
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 19•9 years ago
|
||
(In reply to Carsten Book [:Tomcat] from comment #18)
The patch depends on subsequent changes made in bug 1259042. I'll request uplift for that too.
Depends on: 1259042
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 20•9 years ago
|
||
This also depends on the changes in bug 1263572.
Depends on: 1263572
Assignee | ||
Comment 21•9 years ago
|
||
So it seems like we do need to do a minor GC when discarding JIT code after all (see bug 1271110). Is the issue in this bug confined to UpdateShapeTypeAndValue? If so it seem like we should just fix the rooting there.
Flags: needinfo?(sphink)
Comment 22•9 years ago
|
||
It is not constrained to that function.
Right now, it leaks out into all pre barriers on anything at all. But I think that one is relatively easy to resolve.
I'm not sure how many other things are affected. I'll run it through the analysis.
Comment 23•9 years ago
|
||
You're right about that function, though; it is trivially rootable. (The only caller already has Rooteds, so it's not even adding any roots.)
I snuck that into the bug 1259850 patch, but it doesn't change anything here.
One reason that ~AutoClearTypeInferenceStateOnOOM needs to GC is that discardJitCode() needs to empty the store buffer (and it now does, again, as of bug 1271110.) Bug 1272167 seems to suggest that we might be able to do away with that requirement if we can use RelocatablePtr for everything stored in ICs, and run their destructors. Or jandem mentions allocating from the "per-Zone allocator" over there, but I don't understand what that means yet. Perhaps that makes it easier to run the destructors for just the types we need?
Alternatively, we could make yet another Ptr thing that modifies the post barriers to always insert the whole JSScript into the store buffer instead of pointing into the IonScript/BaselineScript. UnstablePtr or something. DontLookAtMePtr.
Flags: needinfo?(sphink)
Updated•9 years ago
|
Blocks: GC.stability
Assignee | ||
Comment 24•9 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #23)
Another approach would be to make BaselineScript a GC thing. This might actually be the simplest fix too.
Assignee | ||
Comment 25•9 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #24
...and an even simpler approach might be delay freeing the memory until after the next minor GC, by putting it on a list somewhere.
Assignee | ||
Comment 26•9 years ago
|
||
The problem is that there can be storebuffer entries pointing into the fallback stub space and we don't run any destructors when freeing this space. This patch defers freeing the memory allocated to the fallback stub space until after the next minor GC, unless we are already collecting.
If you think this is OK I'll r? jandem too
Attachment #8754418 -
Flags: review?(sphink)
Assignee | ||
Comment 27•9 years ago
|
||
Ditto for IonScript. Also removes the problematic calls to evictNursery.
Attachment #8754419 -
Flags: review?(sphink)
Assignee | ||
Comment 28•9 years ago
|
||
Comment on attachment 8754418 [details] [diff] [review]
bug1264300-baseline-script
So this causes xpcshell tests to crash while marking the storebuffer... cancelling review.
Attachment #8754418 -
Flags: review?(sphink)
Assignee | ||
Updated•9 years ago
|
Attachment #8754419 -
Flags: review?(sphink)
Assignee | ||
Comment 29•9 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #28)
I fixed the xpcshell problem but now I get mochitest failures like "Timed out while polling clipboard for pasted data"... on Windows 7 only.
Comment 30•9 years ago
|
||
That's not your fault. That's from Win7 testing on AWS that's enabled on Try only at the moment.
Assignee | ||
Comment 31•9 years ago
|
||
Updated patches.
Attachment #8754418 -
Attachment is obsolete: true
Attachment #8754419 -
Attachment is obsolete: true
Attachment #8755482 -
Flags: review?(sphink)
Assignee | ||
Comment 32•9 years ago
|
||
Attachment #8755483 -
Flags: review?(sphink)
Comment 33•9 years ago
|
||
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2
Review of attachment 8755482 [details] [diff] [review]:
-----------------------------------------------------------------
Whoa, ok, given the existing machinery to postpone freeing until after sweeping, this fits in remarkably cleanly. This looks pretty solid.
Attachment #8755482 -
Flags: review?(sphink) → review+
Comment 34•9 years ago
|
||
Comment on attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2
Review of attachment 8755483 [details] [diff] [review]:
-----------------------------------------------------------------
It's so nice to see all those "evict nursery here because of complicated action at a distance" calls go.
Attachment #8755483 -
Flags: review?(sphink) → review+
https://hg.mozilla.org/releases/mozilla-beta/rev/cb27eacbe04a
(This was the original patch that failed to uplift to beta in comment 18. Unsure what that means for the status flags in here, and whether the new patch will also need to be uplifted.)
Flags: needinfo?(jcoppeard)
Assignee | ||
Comment 36•9 years ago
|
||
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2
Requesting additional review for JIT changes.
Attachment #8755482 -
Flags: review?(jdemooij)
Assignee | ||
Comment 37•9 years ago
|
||
Comment on attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2
Requesting additional review for JIT changes.
Attachment #8755483 -
Flags: review?(jdemooij)
Assignee | ||
Comment 38•9 years ago
|
||
(In reply to Wes Kocher (:KWierso) from comment #35)
We should uplift the new patches too.
Flags: needinfo?(jcoppeard)
Reporter | ||
Comment 40•9 years ago
|
||
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2
Review of attachment 8755482 [details] [diff] [review]:
-----------------------------------------------------------------
Nice idea. Thanks a lot for fixing this!
::: js/src/jit/BaselineJIT.h
@@ +607,5 @@
>
> +namespace JS {
> +
> +template <>
> +struct DeletePolicy<js::jit::BaselineScript>
Why do we need this DeletePolicy, I think that's mostly used for UniquePtr etc?
@@ +609,5 @@
> +
> +template <>
> +struct DeletePolicy<js::jit::BaselineScript>
> +{
> + DeletePolicy(JSRuntime* rt) : rt_(rt) {}
Nit: explicit
Attachment #8755482 -
Flags: review?(jdemooij) → review+
Reporter | ||
Updated•9 years ago
|
Attachment #8755483 -
Flags: review?(jdemooij) → review+
Assignee | ||
Comment 41•9 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #40)
> Comment on attachment 8755482 [details] [diff] [review]
> Why do we need this DeletePolicy, I think that's mostly used for UniquePtr
> etc?
Yes, BaselineCompiler::compile uses a UniquePtr while initialising a BaselineScript.
> Nit: explicit
Thanks, that would have been another static analysis build failure for sure.
Assignee | ||
Comment 42•9 years ago
|
||
Combined patch, carrying r+.
Attachment #8755482 -
Attachment is obsolete: true
Attachment #8755483 -
Attachment is obsolete: true
Attachment #8756334 -
Flags: review+
Assignee | ||
Comment 43•9 years ago
|
||
Comment on attachment 8756334 [details] [diff] [review]
bug1264300-defer-free-combined
[Security approval request comment]
How easily could an exploit be constructed based on the patch? I'd say very difficult, this requires triggering OOM while type information is being swept lazily.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.
Which older supported branches are affected by this flaw? Everything back to 31.
If not all supported branches, which bug introduced the flaw? The problem this patch fixes goes all the way back to Bug 619558.
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Backports should be straightforward.
How likely is this patch to cause regressions; how much testing does it need? Unlikely but we should let this bake for a little while before uplifting.
Attachment #8756334 -
Flags: sec-approval?
Comment 44•9 years ago
|
||
sec-approval+ for trunk. You should ask for Aurora approval. I don't think this should go to Beta since we're too close to release.
Updated•9 years ago
|
Attachment #8756334 -
Flags: sec-approval? → sec-approval+
Assignee | ||
Comment 45•9 years ago
|
||
Comment 46•9 years ago
|
||
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•8 years ago
|
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Updated•8 years ago
|
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main47+]
Updated•8 years ago
|
Group: javascript-core-security → core-security-release
Assignee | ||
Comment 47•8 years ago
|
||
Approval Request Comment
[Feature/regressing bug #]: Bug 619558
[User impact if declined]: Potential crash / security vulnerability.
[Describe test coverage new/current, TreeHerder]: On central since 26th May.
[Risks and why]: Low.
[String/UUID change made/needed]: None
Attachment #8759123 -
Flags: review+
Attachment #8759123 -
Flags: approval-mozilla-aurora?
Comment 48•8 years ago
|
||
Attachment #8759123 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 49•8 years ago
|
||
Updated•8 years ago
|
Group: core-security-release
Comment 51•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•