Closed Bug 1474659 Opened 6 years ago Closed 5 years ago

Store ArrayBuffer metadata in its own arena

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox71 --- fixed

People

(Reporter: mhowell, Assigned: tjr)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file, 3 obsolete files)

This bug was broken out from bug 1052582; see discussion over there, starting from bug 1052582 comment 18.
CC'ing jonco, as he has opinions about the direction here. I would like to point out other points in the design space. 1. Leave everything as-is, with arenas shared between inline ArrayBuffers and JSObjects (of matching size). 2. Take this patch as-is. 3. Stop using inline ArrayBuffer storage entirely. 4. This patch, but cut down on the number of AllocKinds (reduces wasted space in ArrayBuffer arenas, but increases the wasted space in the inline storage). It's midway between eliminating inline ArrayBuffers and this patch. 5. Implement another class of storage for GC-controlled memory that isn't just malloc buffers and switch the inline ArrayBuffer data to that. (This storage would presumably be used for other things as well.) 6. Mask all inline storage accesses to stay within their AllocKind's size class. (You'd want to prevent access to the length/offset/pointer header as well, so it might not be a totally simple mask? Or maybe it is, if it's an unsigned offset from the start of the data.) That last one is reminiscent of Spectre mitigations, and might be prohibitively expensive. In terms of security, we'd probably also want to consider how much we're gaining even from the Arena segregation -- if the attacker can get to an adjacent arena just as easily, with its own AllocKind, we might not win much unless we separate out the storage even more. But having large chunks (1MB?) dedicated only to ArrayBuffer data seems pretty expensive, so that could push us towards something more like #5.
(In reply to Steve Fink [:sfink] [:s:] from comment #3) Thanks for the write up Steve. I'm trying to understand the security implications here. I guess that ArrayBuffers are problematic because if you can corrupt the length you can use them to read/write other memory in the arena. Is that right? So the idea idea of putting them in their own arenas is to avoid a buffer overrun in a neighbouring JSObject of a different kind from overwriting the length. Alternatively a UAF could have the same effect. This patch would do that. (It wouldn't protect against an overrun or UAF of an ArrayBuffer, obviously). (It seems that a corrupted ArrayBuffer could access arbitrary memory, but it would be harder to do damage if the data was stored externally as you would have to work out where your target is relative to the data without crashing the VM.) I don't like the extra overhead this adds but the security part seems like a a good idea. Some questions: Do we know what the distribution of array buffer sizes is like? I'm wondering if we could get away with fewer array buffer kinds. Preferably two rather than four. Does this apply to other types of JSObject (e.g. TypedObjects, Arrays)? If so then are we going to have to do a similar thing for them? I don't think it's feasible to do this if there are going to be many cases.
(In reply to Jon Coppeard (:jonco) from comment #4) > Does this apply to other types of JSObject (e.g. TypedObjects, Arrays)? If > so then are we going to have to do a similar thing for them? I don't think > it's feasible to do this if there are going to be many cases. For UAF mitigations, ideally we would address all objects that allow control over the object in a way similar to arrays. (That is, the attacker can write easily controlled bytes at a specific offset while controlling the object's size and therefore memory placement.) So strings and array-like objects are easy initial targets. After that, the hitlist drops off dramatically. I believe Chrome needed to segment functions with multiple or variable arguments (or something like that?) that allowed this capability, but that was the only additional one they identified to mitigate?
Sorry about the delay getting back to this. I still want to do it, but I'm unclear on what the next steps need to be. (In reply to Jon Coppeard (:jonco) (PTO until 1st October) from comment #4) > Do we know what the distribution of array buffer sizes is like? I'm > wondering if we could get away with fewer array buffer kinds. Preferably > two rather than four. I don't know, and I'm not immediately sure how to find out, but I like the idea of using fewer AllocKinds, so I can try to get some numbers on this. Is that what needs to happen next here?
Priority: -- → P2
Flags: needinfo?(jcoppeard)
(In reply to Matt Howell [:mhowell] from comment #6) > I don't know, and I'm not immediately sure how to find out, but I like the > idea of using fewer AllocKinds, so I can try to get some numbers on this. Is > that what needs to happen next here? Yes, I think so. I'd suggest putting some tracing in and run the browser and have a browse around. Doesn't have to but particularly exact but it would be good to get a feel for the distribution of array buffer sizes.
Flags: needinfo?(jcoppeard)

I worked on this a bit. I updated the patch (only one needed now): patch

I ran Talos and the results weren't great: Try Run / Talos
Some 2-3% regressions. And yet, you'll find this was actually the best try run I had...

Then I tried to only apply the separation to non-System Principal: patch
Talos for that was significantly worse: Talos

Then I noticed that previously the data was able to be background-finalized but we had lost that capability, so I added it back: patch
That (plus the system principal patch) also did poorly on Talos

And then I ran it with background finalization but without the system principal exception: Try Talos
I was able to get a really nice win on tabswitch on windows (20%) but a bunch of regressions elsewhere - again more than the very first try run.

Next I'm going to try with fewer AllocKinds as suggested.

Assignee: mhowell → tom

2 AllocKinds Talos got a lot of ~3% regressions across the board; but also a lot of 3% improvements on a11yr specifically.

There's also the now-typical large medium-confident regressions on perf_reftest but those have such huge std dev I'm not sure what to make of them...

Hey Steve, Jon - do you have any suggestions for things to try based on these Talos runs?

Flags: needinfo?(sphink)
Flags: needinfo?(jcoppeard)

(In reply to Tom Ritter [:tjr] from comment #10)

Hey Steve, Jon - do you have any suggestions for things to try based on these Talos runs?

First of all, I think you should get some better baseline numbers. I've come to strongly distrust comparisons against recent mozilla-central (especially when the patch under test is based off an old mozilla-central; I don't think it takes your base into account, it just looks at the last 2 days.) So before making any decisions, can you make a dummy change to e596664275d5e3e2fdcb7fa8d1447289f99269c3 and push that to try with --rebuild 7 (or --rebuild-talos 7, depending on which command you use). It looks like all of your pushes are based on that rev. Then you can change the comparisons to be based off of that. (And you'll have to create a dummy change. I usually append something to README.txt. It's annoying, but the decision job will just refuse to run anything otherwise.)

Partly why I'm skeptical is that I looked at the awsy memory results on that first talos link, and they didn't show much change, but did have /improvement/ of 2.3%, which I can't explain. It's also based on only 2 runs with your patch though, so in addition to adding in a proper baseline, you should probably do some more retriggers.

That said, I doubt we have enough ArrayBuffers to make up any significant percentage of total browser memory usage, so I don't expect awsy to show all that much. (And yet, if that's the case, then maybe there's no problem taking this...)

Anyway, it seems from the results you have so far, there's a perf blocker. I suspect with better measurement that particular issue will just disappear, and we can focus on the more relevant memory usage stuff.

Flags: needinfo?(sphink)
Flags: needinfo?(jcoppeard)

(In reply to Steve Fink [:sfink] [:s:] from comment #11)

First of all, I think you should get some better baseline numbers.

Fair enough. Here's a try run of my base commit.

Base compared to initial patch - Very favorable! Only seemed to affect Windows. On x64: 6% improvement on tabpaint, 9% on tsvg_static. 6% regression on tsvgr_opacity, 10% on about_preferences_basic. 4-6.5% on time_to_session_store_window_restored_ms for Windows x86. 2.4% regression on Linux64 AWSY.

Base Compared to excluding system principal - Huge 32% regression on glvideo Mean tick time across 100 ticks on OSX. 8% regression on perf_reftest for Win10 x64, 2% and 7% regressions on Windows x86 tests, 6% improvement on OSX ts_paint_webext. No notable AWSY regressions.

Base compared to Background Finalization - No important Talos regressions or improvements. 2% Linux x64 AWSY regression.

Base compared to Background Finalization + System Principal Exclusion 6% improvement on tsvg_static for Win x64, 1 small regression for Linux and OSX each, 2 small for x86 Windows. No notable AWSY regressions.

Base compared to Two AllocKinds - 12% regression on time_to_session_store_window_restored_ms for Windows x86. AWSY: A 5% Winx86 regression and a 10% x86 improvement. A 2% OSX improvement.

This... doesn't really seem very bad.

Flags: needinfo?(sphink)
Attachment #8991063 - Attachment is obsolete: true
Attachment #8991063 - Flags: review?(sphink)
Attachment #8991064 - Attachment is obsolete: true
Attachment #8991064 - Flags: review?(sphink)

The reason for doing this is to get ArrayBufferObjects allocated into their own arenas.

The specific enum values were chosen to avoid breaking assumptions about where
certain values fall in the list, such as OBJECT_FIRST == FUNCTION.

This improves performance.

Depends on D40227

(In reply to Tom Ritter [:tjr] from comment #14)
It looks like the first patch has the array buffer alloc kinds marked as background finalizable, so what does the second patch actually do? Array buffers are background finalized so I don't think we need foreground finalized versions of these arena kinds. Was there some bug working out the arena kind that meant we picked the wrong kind?

(In reply to Jon Coppeard (:jonco) from comment #15)

(In reply to Tom Ritter [:tjr] from comment #14)
It looks like the first patch has the array buffer alloc kinds marked as background finalizable, so what does the second patch actually do?

I hate to disagree with you on your own code; but that isn't how I read it. The boolean 'BGFinal' seems to say "Can this be Background Finalizable?' but I think it means "Is this already finalized?" (So a 'true' means you can't background finalize, and a 'false' means you can.)

If we follow ArrayBufferObject::create* into NewObjectWithClassProto and then into NewObjectWithClassProtoCommon it checks if (CanBeFinalizedInBackground(allocKind, clasp)). CanBeFinalizedInBackground in turn says return !IsBackgroundFinalized(kind) which just returns the bool from the initial table. So 'false' means "I can be background finalized". (I think.)

In the first patch I mark then as true and in the second patch I add the new _BG types and mark the original as false.

Array buffers are background finalized so I don't think we need foreground finalized versions of these arena kinds.

The reason I also have the foreground types is because of how the Background types are determined. In NewObjectWithClassProtoCommon it calls GetBackgroundAllocKind which just adds 1 to the allockind. So to keep that codepath the same; I create foreground types that won't be used and background types that would.

However typing this all out; it seems like I could just return the BG types in my GetArrayBufferGCObjectKind function added in the first patch and that ought to work too; eliminating the foreground types.

I'll do that if this all looks correct to you.

Flags: needinfo?(jcoppeard)

(In reply to Tom Ritter [:tjr] from comment #16)

The boolean 'BGFinal' seems to say "Can this be Background Finalizable?' but I think it means "Is this already finalized?" (So a 'true' means you can't background finalize, and a 'false' means you can.)

No, true means this kind is finalized in the background, false means it's finalized in the foreground (on the main thread).

If we follow ArrayBufferObject::create* into NewObjectWithClassProto and then into NewObjectWithClassProtoCommon it checks if (CanBeFinalizedInBackground(allocKind, clasp)). CanBeFinalizedInBackground in turn says return !IsBackgroundFinalized(kind) which just returns the bool from the initial table. So 'false' means "I can be background finalized".

Oh yeah so reading it again this code is is really confusing.

What happens is that the caller passes in the foreground alloc kind and if possible we change it to the background alloc kind by adding one to it. This happens by | if (CanBeFinalizedInBackground()) { allocKind = GetBackgroundAllocKind() } | or similar logic. That's why CanBeFinalizedInBackground only returns true if the alloc kind it was passed is not already background alloc kind.

However typing this all out; it seems like I could just return the BG types in my GetArrayBufferGCObjectKind function added in the first patch and that ought to work too; eliminating the foreground types.

Yes I think that would be better if it works.

I'll file for cleaning up the background finalizable confusion.

Flags: needinfo?(jcoppeard)

(In reply to Jon Coppeard (:jonco) from comment #17)

However typing this all out; it seems like I could just return the BG types in my GetArrayBufferGCObjectKind function added in the first patch and that ought to work too; eliminating the foreground types.

Yes I think that would be better if it works.

Ah okay; you're right then; my initial patch is all that's needed. It also means that my summary of Talos performance was slightly off in its descriptions.

(This is still accurate) Base compared to initial patch - Very favorable! Only seemed to affect Windows. On x64: 6% improvement on tabpaint, 9% on tsvg_static. 6% regression on tsvgr_opacity, 10% on about_preferences_basic. 4-6.5% on time_to_session_store_window_restored_ms for Windows x86. 2.4% regression on Linux64 AWSY.

(This is still accurate) Base Compared to excluding system principal - Huge 32% regression on glvideo Mean tick time across 100 ticks on OSX. 8% regression on perf_reftest for Win10 x64, 2% and 7% regressions on Windows x86 tests, 6% improvement on OSX ts_paint_webext. No notable AWSY regressions.

(This code is behaving the same way as 'Base compared to initial patch') Base compared to Background Finalization - No important Talos regressions or improvements. 2% Linux x64 AWSY regression.

(This is behaving the same way as 'Base Compared to excluding system principal') Base compared to Background Finalization + System Principal Exclusion 6% improvement on tsvg_static for Win x64, 1 small regression for Linux and OSX each, 2 small for x86 Windows. No notable AWSY regressions.

(This is still accurate) Base compared to Two AllocKinds - 12% regression on time_to_session_store_window_restored_ms for Windows x86. AWSY: A 5% Winx86 regression and a 10% x86 improvement. A 2% OSX improvement.

Attachment #9082283 - Attachment is obsolete: true

Sorry, I've been ignoring this.

Code-wise, the change looks good. And I don't anticipate any significant performance differences, and for the most part it seems like your tests confirm that.

So it really boils down to the question of memory usage. A 2% regression in exchange for security... it doesn't seem out of the question, but it doesn't seem like something we should obviously take, either.

I'm realizing I'm just not the right person to be deciding. I'd like to punt that decision over to people closer to memshrink & fission. The wheel of fate points at erahm.

That said, I am most interested in that final experiment, of the two AllocKind solution. Those numbers sound off, which makes me suspect there's some excess variance in there. You did a decent number of pushes, so I'm not sure what to say. I guess I can pawn that off on erahm as well, though you could also retrigger another dozen times or so to be sure.

Flags: needinfo?(sphink) → needinfo?(erahm)

These regressions are possibly in the noise, a minor regression on Linux is probably okay as we have some other larger wins in the pipeline (bug 1470591 for example). If we see larger regressions or regressions on Windows in particular we might want to consider backing out, but overall I'm fine with giving it a shot. Lets shoot for trying this out early in 71 so that we can get some more stable perf data and some time to bake on Nightly.

Flags: needinfo?(erahm)
Attachment #9082282 - Attachment description: Bug 1474659 Add dedicated AllocKinds just for ArrayBufferObjects. r?sfink → Bug 1474659 Add dedicated AllocKinds just for ArrayBufferObjects. r?jonco
Blocks: 1577549
Pushed by tritter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0b78b172fc1a Add dedicated AllocKinds just for ArrayBufferObjects. r=jonco
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

We were kind of expecting some sort of regression (memory, performance potentially) to show up from this. Talos was indicative but inconclusive - did we see anything?

Flags: needinfo?(igoldan)

(In reply to Tom Ritter [:tjr] from comment #23)

We were kind of expecting some sort of regression (memory, performance potentially) to show up from this. Talos was indicative but inconclusive - did we see anything?

Marian, have you seen any AWSY changes related to this bug?

Flags: needinfo?(igoldan) → needinfo?(marian.raiciof)

Ionut, i checked the alerts from awsy but i see nothing related to this bug.

Flags: needinfo?(marian.raiciof)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: