Sandbox woff2 in ots using RLBox
Categories
(Core :: Security: RLBox, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox96 | --- | fixed |
People
(Reporter: deian, Assigned: deian)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 2 obsolete files)
(deleted),
text/x-phabricator-request
|
Details |
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
Assignee | ||
Comment 2•3 years ago
|
||
Depends on D126435
Depends on D126435
Assignee | ||
Comment 3•3 years ago
|
||
This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.
Comment 4•3 years ago
|
||
Nice!
(In reply to Deian Stefan from comment #3)
This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.
Having them in the same sandbox seems fine to me.
Updated•3 years ago
|
Comment 5•3 years ago
|
||
(In reply to Deian Stefan from comment #3)
This patch puts woff2 and brotli in a sandbox. We could expose the brotli functions as callbacks if the memory overhead is worth the trade-off.
Note that brotli is also used by a few other parts of the code: see https://searchfox.org/mozilla-central/search?q=brotli%2Fdecode.h. So as I understand it, the patch here will mean we have two copies of brotli (including its big static dictionary, https://searchfox.org/mozilla-central/source/modules/brotli/common/dictionary.c), one copy compiled to WASM and the other as native code.
That seems unfortunate; if we're going to WASM-ify brotli for woff2 use, we should also use that version for the HTTP compression code (and the layout inspector code) instead of carrying two "compiled" versions of it.
So for woff2 purposes it seems fine to have brotli in the same sandbox, but I think we should ensure that it is also usable for the other modules that want to access brotli-decode functionality.
Assignee | ||
Comment 6•3 years ago
|
||
Depends on D126435
Depends on D126435
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
New patches expose brotli to the sandboxed code. We can revisit all of this once we put brotli in a sandbox too :)
Try in the works: https://treeherder.mozilla.org/jobs?repo=try&revision=8d764cb0e60d1a893fe42a9575ffc5e38cb13aeb
Assignee | ||
Comment 8•3 years ago
|
||
AFAICT the try failures are unrelated to these set of patches.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
I addressed the remaining comments. One thing that's not super obvious to me is if we want the sandbox to be a thread_local or if we want to create a single sandbox and tear it down when the process exits. Any thoughts?
Comment 10•3 years ago
|
||
(In reply to Deian Stefan from comment #9)
I addressed the remaining comments. One thing that's not super obvious to me is if we want the sandbox to be a thread_local or if we want to create a single sandbox and tear it down when the process exits. Any thoughts?
Can we do one sandbox per font decoding instance, with a decode failure if we can't reserve the contiguous memory? That's what we ended up settling on for OGG [1], with the idea that we're soon planning to make the heaps growable so the address space overhead will be less of an issue over time.
Assignee | ||
Comment 11•3 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #10)
Can we do one sandbox per font decoding instance, with a decode failure if we can't reserve the contiguous memory? That's what we ended up settling on for OGG [1], with the idea that we're soon planning to make the heaps growable so the address space overhead will be less of an issue over time.
Done.
Assignee | ||
Comment 12•3 years ago
|
||
Looking at https://searchfox.org/mozilla-central/source/config/recurse.mk#217 do we need to add some bits to recurse.mk too now?
Assignee | ||
Comment 13•3 years ago
|
||
Assignee | ||
Comment 14•3 years ago
|
||
(In reply to Deian Stefan from comment #12)
Looking at https://searchfox.org/mozilla-central/source/config/recurse.mk#217 do we need to add some bits to recurse.mk too now?
We do. Updated part 1.
Assignee | ||
Comment 15•3 years ago
|
||
On windows, we're failing at an invoke here with WASM_RT_TRAP_CALL_INDIRECT_TYPE_MISMATCH
errors. Here is an example: https://treeherder.mozilla.org/logviewer?job_id=356511111&repo=try&lineNumber=5931
I got this reproduced locally in a vm, but my ability to use windows is pathetic. @shravan is this something obvious to you? (I'm wondering if it's the same issue we ran into with hunspell but it triggers a bit differently.)
Assignee | ||
Comment 16•3 years ago
|
||
Fixed up windows. Issue was rlbox callback with *_t types. (WIP try: https://treeherder.mozilla.org/jobs?repo=try&revision=c38e83380001fb03144a29b3334d50b67365898a&selectedTaskRun=KRBbrcypR3S45aCAYgu_1A.0) And just kicked off a try fuzzy to be sure:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6c88e8f7dd1a22e80a682d092ec9e7218ccac77
Comment 17•3 years ago
|
||
To see what impact this might have on performance, I did a local build with instrumentation added around the call to SanitizeOpenTypeData that is part of loading a webfont resource. I collected the list of resource sizes and time spent in SanitizeOpenTypeData when visiting the Google Fonts home page, which loads fairly small subsets of a number of fonts, and also the Source Han Serif page on TypeKit, which includes some larger resources.
Results (from a MacBook Pro) are shown at https://docs.google.com/spreadsheets/d/1YvAfNot3QO0BaB3iggUBxrt3GS38jUM5xAKqA5YNblY/edit?usp=sharing, sorted by WOFF2 resource size, which compares the time taken with a current mozilla-central build with the time when the patches here are applied. (In both cases this is using a default Opt -- but not PGO -- build.)
Although the figures vary quite a bit (and would doubtless show variability across multiple runs), it's clear that there is a noticeable overhead as a result of the sandboxing/wasm-translation; the total time taken to decode the 50-plus resources is about 30% more. However, note that for small resources the per-font decoding time is typically less than a millisecond, so the 30% -- or even 50% or 100% -- regression is scarcely going to be noticeable among all the work of loading the page. Larger font resources, in the 100K-plus size range, may show a regression of several milliseconds, but these are resources that (on my connection, at least) have spent dozens or hundreds of ms downloading, so in terms of the overall webfont-loading process, the regression in decoding speed seems pretty minor to me.
Comment 18•3 years ago
|
||
Agree that the overhead is probably acceptable. Could you post a profile of a 100x WOFF2 loop so that we can see if there are any obvious hotspots we could optimize further?
Comment 19•3 years ago
|
||
Updated the sheet at https://docs.google.com/spreadsheets/d/1YvAfNot3QO0BaB3iggUBxrt3GS38jUM5xAKqA5YNblY/edit?usp=sharing to have numbers from a shippable build, and wrapped the WOFF2-decompression in a 100-iteration loop to make it easier to see things.
What's clear from these numbers -- where small resources show a huge regression, percentage-wise (even though it's less than a millisecond per iteration), while larger resources show a much smaller percentage -- is that the overhead of setting up the sandbox each time is quite significant on small resources.
This can also be clearly seen in a profile: https://share.firefox.dev/3BSsEJk is a profile captured while loading https://fonts.google.com, which displays small subsets of a number of fonts. create_sandbox
and destroy_sandbox
(to a lesser extent) show up as dominating the processing of small font resources.
Given that small font resources are very common on the web, I think we should reconsider the strategy of spinning up a new sandbox for each call. Maybe we could make it thread-local, so that when a given thread is asked to process several resources in succession, it doesn't pay the overhead of a fresh sandbox every time?
Comment 20•3 years ago
|
||
The downside of minting thread-local sandboxes is that the sandboxes allocate a fair amount of memory (~325k I think), and so it's not good for browser memory usage to just have them sitting around.
One strawman idea would be the following:
- We have a shared (locked) nsTArray of spare sandboxes
- When a thread needs a sandbox, it grabs one from the spares. If there are no spares, it mints one.
- When a thread is done with its sandbox, it pushes it back to the spares.
- When the array is non-empty and hasn't been touched for 3 seconds, a timer clears the array.
Jonathan, does that seem like it would strike the right balance based on how this stuff gets used?
Comment 21•3 years ago
|
||
My 2 cents - either option could work.
I think thread locals can work fine, but I think we would want to the same, destroy sandbox if unused for 3 seconds sort of behavior even on thread locals, so that we don't leave sandboxes hanging around. Specifically, it would have to operate like a a shared_pointer thread_local, but the destructor would need to delay execution by 3 seconds.
Fwiw, we did try the sandbox pool approach in our original paper and it worked quite well when we measured performance of sandboxing zlib, libjpeg and other very commonly used libraries.
I'll let both of you make the final call here.
Comment 22•3 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #20)
One strawman idea would be the following:
- We have a shared (locked) nsTArray of spare sandboxes
- When a thread needs a sandbox, it grabs one from the spares. If there are no spares, it mints one.
- When a thread is done with its sandbox, it pushes it back to the spares.
- When the array is non-empty and hasn't been touched for 3 seconds, a timer clears the array.
That sounds like it'd be reasonable. Generally, if there's just a single font resource involved in a given page-load or reflow, the extra millisecond or so to set up a sandbox is pretty insignificant; the concern would be when reflow ends up requesting a dozen separate resources, so as long as we get to (largely) re-use sandboxes in that case we should be fine.
Using thread locals seems nice in that it avoids any need for the (lock-protected) array, and the total number of sandboxes automatically matches the number of threads using them (aside: I'm looking into whether bumping up the background-thread pool count may be helpful). But I don't have a strong view here; AFAICS either approach should work pretty much equally well, so I'm happy for the implementer to choose which way to do it. :)
Assignee | ||
Comment 23•3 years ago
|
||
Thread pool support added. Pretty much the strawman implementation, except we don't always clear the whole array.
Try auto: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7ba732f3f16759a46da4333871beb907885a97f
Assignee | ||
Comment 24•3 years ago
|
||
Kicked off try with the new threadpool: https://treeherder.mozilla.org/jobs?repo=try&revision=7a475c93cad9dd0388e6671ce2696ae3acfdf265
Assignee | ||
Updated•3 years ago
|
Comment 25•3 years ago
|
||
Comment on attachment 9242609 [details]
Bug 1732201 - Part 3: Turn on Wasm sandbox support for RLBoxed woff2
Revision D126436 was moved to bug 1742916. Setting attachment 9242609 [details] to obsolete.
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 26•3 years ago
|
||
Assignee | ||
Comment 27•3 years ago
|
||
Comment 28•3 years ago
|
||
Comment 29•3 years ago
|
||
Backed out for causing web-platform-tests failures on header-totalsfntsize-001.xht
- Backout link
- Push with failures
- Failure Log
- Failure line: TEST-UNEXPECTED-FAIL | /css/WOFF2/header-totalsfntsize-001.xht | assertion count 1 is more than expected 0 assertions
Comment 30•3 years ago
|
||
Comment 31•3 years ago
|
||
bugherder |
Assignee | ||
Updated•3 years ago
|
Description
•