Sandbox libexpat using RLBox
Categories
(Core :: Security: RLBox, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox96 | --- | fixed |
People
(Reporter: deian, Assigned: deian)
References
(Blocks 2 open bugs)
Details
(Keywords: perf-alert)
Attachments
(1 file, 10 obsolete files)
(deleted),
text/x-phabricator-request
|
Details |
Assignee | ||
Comment 1•4 years ago
|
||
Assignee | ||
Comment 2•4 years ago
|
||
AFAICT none of the tests in html/parser exercise the stream code that uses expat. I'd love some suggestion for how to at least sanity test this.
EDIT: figured it out. view source of some xml file works
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 3•4 years ago
|
||
Depends on D102851
Assignee | ||
Comment 4•4 years ago
|
||
Depends on D102851
Assignee | ||
Comment 5•4 years ago
|
||
Sorry for the two Part 2
s. I think we should split out the two uses of libexpat so we can back out things more easily.
So let's hold off on D104658 for a bit. I'll upload a new version of the nsExpatDriver that turns on sandboxing too (and addresses @tjr's comments). For this we need to preload the library much earlier. I did this, but ran into a RLBox assertion when registering callback (could not find an empty slot in sandbox function table
). We are creating quite a few sandboxes so this might be the reason, but @shravanrn and I need to sit down and see what's up.
Assignee | ||
Comment 6•4 years ago
|
||
Tom: I was thinking of creating a single sandbox for nsExpatDriver vs. per parser. This might be an okay first step, but we're still curious what the attacker model is here (i.e., do we want something finer grained)?
Comment 7•4 years ago
|
||
(In reply to Deian Stefan from comment #6)
Tom: I was thinking of creating a single sandbox for nsExpatDriver vs. per parser. This might be an okay first step, but we're still curious what the attacker model is here (i.e., do we want something finer grained)?
I think the attacker model is basically that somebody injects malformed XML in order to trigger a memory hazard and take over the content process. To that end, I'm not sure what multiple sandboxes would buy us (for libexpat or for any other library I can think of).
What is the memory overhead of each sandbox? If it's non-trivial, we should seriously consider just having a single sandbox per process shared across all RLBoxed libraries. Any reason not to do that?
Assignee | ||
Comment 8•4 years ago
|
||
Sounds good to me. I think we can coalesce things into a single sandbox (beyond expat), but right now the interface tied to library is easier to manage and reason about (e.g., which callbacks can be called by what library). We may want to chat about this in more detail.
For nsExpatDriver, specifically, I think we want two sandboxes (one with callbacks for system principal, one not). I assume like the other usage of expat (specifically this comment), we can't rely on thread locals to keep track of the driver (this
)?
Comment 9•4 years ago
|
||
(In reply to Deian Stefan from comment #8)
Sounds good to me. I think we can coalesce things into a single sandbox (beyond expat), but right now the interface tied to library is easier to manage and reason about (e.g., which callbacks can be called by what library).
Perhaps we could keep the interface the same, but simply have create_sandbox share the lucet/wasmtime sandbox under the hood (probably via TLS)?
We may want to chat about this in more detail.
Sounds good, we can talk on Tuesday.
For nsExpatDriver, specifically, I think we want two sandboxes (one with callbacks for system principal, one not).
Can you say more about this?
Comment 10•4 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #9)
For nsExpatDriver, specifically, I think we want two sandboxes (one with callbacks for system principal, one not).
Can you say more about this?
In particular: I believe the separate codepath for system principal code in nsExpatDriver comes from bug 1539759. The idea there was that attackers could use langpack DTDs to inject markup into privileged documents, and so we modified the parsing code for privileged documents to disallow that. I don't think there's any reason why it would be dangerous for a compromised libexpat (presumably parsing a non-system-principal document containing the exploit) to call the system-principal version of the callbacks.
In general I think it would be helpful to have Peter take a look at the patches here and give a general take as to whether they make sense.
Comment 11•4 years ago
|
||
In general I think we should be wary about colocating wasm libraries I think. If we assume an attacker can control the inside of a wasm sandbox, manipulate it fully, and return bad values; then a library A that introduces a vulnerability allowing takeover would allow the attacker to manipulate the result returned by library B. In theory library B's values should be sanity checked to prevent anything too-bad from happening but in practice we aren't putting in all the checks presently. (Bug 1693991) With Fission you won't be able to attack cross-site for the most part, so the risk is lower also.
That said this attack is going to be pretty darn complicated to pull off; and I don't think it's unreasonable to err on the side of performance until we have an inkling that this has become more concerning.
Comment 12•4 years ago
|
||
Yeah. I think it's really a question of the memory cost, primarily on windows, which we won't have a good understanding of until we actually land rlbox support on windows.
For now, I think the best approach is to keep the sandboxes conceptually separate (per library I think, rather than per instantiation of the library). That aligns with the design of the rlbox API, and gives us the most flexibility. Later on, if the memory usage is significant, we can coalesce the sandboxes under the hood per my suggestion in comment 9.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 13•4 years ago
|
||
Depends on D106254
Assignee | ||
Comment 14•4 years ago
|
||
The second patch will need a tiny fix once we fix Bug 1699028, but otherwise I think we're set for another glance (will follow up on phab).
Assignee | ||
Comment 15•4 years ago
|
||
Assignee | ||
Comment 16•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
|
||
The default handler and character-data handler callbacks are identical
and some Windows compilers just reconciled them into a single function.
This, unfortunately, resulted in a RLBox runtime error: the same
callback was registered twice. This patch removes the duplicate handler
implementation and just sets the character-data handler callback as the
default handler.
Depends on D106254
Depends on D106254
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 18•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 19•3 years ago
|
||
Comment 20•3 years ago
|
||
Comment 21•3 years ago
|
||
Backed out for causing several test failures.
Failure logs:
- https://treeherder.mozilla.org/logviewer?job_id=353117859&repo=autoland
- https://treeherder.mozilla.org/logviewer?job_id=353118745&repo=autoland
- https://treeherder.mozilla.org/logviewer?job_id=353124162&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/ab54fa20c599a000a04361c6ab6674c2bdaac987
Comment 22•3 years ago
|
||
(In reply to Pulsebot from comment #20)
Backout by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ab54fa20c599
Backed out 4 changesets for assertion and bc failures on
browser_translation_bing.js.
== Change summary for alert #31711 (as of Wed, 06 Oct 2021 06:38:13 GMT) ==
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
2% | google-docs fnbpaint | linux1804-64-shippable-qr | warm webrender | 257.02 -> 262.75 |
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
2% | google-docs fnbpaint | linux1804-64-shippable-qr | warm webrender | 261.82 -> 255.94 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31711
Updated•3 years ago
|
Assignee | ||
Comment 23•3 years ago
|
||
With wasmboxing not enabled (just applying first three patches), I think all the windows failures are unrelated to these sets of patches: https://treeherder.mozilla.org/jobs?revision=f62f11e713cf8867b30572c0e22e2698d1332f87&repo=try&selectedTaskRun=Y1Gk7mquRzuSeP8BbReEmg.0
Going to investigate the alternative flags that we set when we enable wasmboxing
Assignee | ||
Comment 24•3 years ago
|
||
Depends on D106254
Depends on D106254
Comment 25•3 years ago
|
||
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #22)
Regressions:
Ratio Test Platform Options Absolute values (old vs new) 2% google-docs fnbpaint linux1804-64-shippable-qr warm webrender 257.02 -> 262.75 Improvements:
Ratio Test Platform Options Absolute values (old vs new) 2% google-docs fnbpaint linux1804-64-shippable-qr warm webrender 261.82 -> 255.94 For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31711
Bas, could use your insight on this. This patch compiles libexpat to WASM to sandbox it. It was backed out, which caused an associated regression (which reverted post-backout). Round-tripping through WASM definitely adds some overhead, and this regression would suggest that libexpat is on the hot path. However, it's a bit odd to me that the regression only surfaced on linux (and only on gdocs). Do you have a sense of how concerned we should be and how much investigation we should do? What next steps would you recommend?
Sandboxing expat is certainly not required, but it would be nice.
Comment 26•3 years ago
|
||
OK nevermind, it's actually straightforward to see non-trivial libexpat usage by just loading a regular gdoc (no need to mess around with automation).
STR:
(1) Run a shippable build
(2) Start the firefox profiler
(3) Open (or shift+reload) https://docs.google.com/document/d/1TNnya6B8pyomDK2F1R9CL3dY10OAmqWlnCxsWyOBDVQ/edit#heading=h.gjgriz8s7dlc
(4) capture profile
(5) Filter samples by MOZ_XML_PARSE
On Nightly OSX, I get around 45 milliseconds spent in expat for SVG parsing. That's pretty significant, and adding normal WASM overhead to that may not be acceptable. That said, a bunch of the work is in callbacks, so the true overhead might be less.
To do a proper comparison, I backfilled shippable OSX builds for the revision this landed, and the one prior:
https://treeherder.mozilla.org/jobs?repo=autoland&revision=1a720cffc019aff6f92a974b681e7190199406a0&searchStr=OSX%2Cshippable
https://treeherder.mozilla.org/jobs?repo=autoland&revision=aa67c5a8e93c72482b4204fe401bef662987cfbe&searchStr=OSX%2Cshippable
When those complete, I'll profile them side by side to get a better picture.
Comment 27•3 years ago
|
||
Without rlbox, ~55ms under MOZ_XML_PARSE: https://share.firefox.dev/2Z3CCtU
With rlbox, ~76ms under MOZ_XML_PARSE: https://share.firefox.dev/3aFtyh8
Unfortunately, looks like expat is moderately hot for SVG parsing. :-(
Assignee | ||
Comment 28•3 years ago
|
||
We'll take a look at where the perf overhead is coming from.
Another path: Maybe we can turn it on for nsHtml5StreamParser for in the short term, but not for nsExpatDriver (which we can put behind a flag). The downside is we'd have two copies of the library.
Comment 29•3 years ago
|
||
So I dug into the profiles a bit. Zooming on on the XML parsing in imgRequest::OnDataAvailable (i.e. SVG parsing), it looks the difference is mainly in the callback overhead from the wasmboxed libexpat to nsExpatDriver::Handle{Start,End}Element:
Seems like there's a bunch of malloc/free at the boundary, both within lookup_symbol and outside of it, as well as some locking. This all strikes me as stuff that can be optimized out. Expat is only ever accessed on the main thread, so the locking shouldn't be necessary, and there's almost certainly a way to design the system such that wasm can repeatedly invoke a callback without mallocing each time.
I was concerned when I first profiled that we were just dealing with the overhead of executing more instructions to implement wasm semantics. But if it's primarily just issues like those described above, I'm much more optimistic that we can make it work.
Comment 30•3 years ago
|
||
Oh interesting! Yup, the move to static linking will definitely remove lookup_symbol related overheads. Other sources of malloc at the boundary would still have to be addressed though.
Assignee | ||
Comment 31•3 years ago
|
||
Depends on D106254
Depends on D106254
Updated•3 years ago
|
Assignee | ||
Comment 32•3 years ago
|
||
Depends on D130148
Depends on D130148
Assignee | ||
Comment 33•3 years ago
|
||
Depends on D130238
Assignee | ||
Comment 34•3 years ago
|
||
Depends on D130239
Updated•3 years ago
|
Comment 35•3 years ago
|
||
I did some more profiling with the static linking changes and the bump allocator. This time I did five refreshes for each profile to smooth out the noise. It appears that if we eliminate the pthread overhead (by stubbing out the synchronization we don't need) as well as the allocation and copying at the boundary (by simply passing the strings directly into the callbacks), we should get within about 1-2% of raw parse performance.
Here are profiles with a bunch of equivalent work filtered out from each, and the aforementioned overhead filtered out from the RLBox profile.
Without RLBox: 98ms (base parse overhead of 282ms) https://share.firefox.dev/3GZY7xd
With RLBox: 103ms (base parse overhead of 333ms) https://share.firefox.dev/3bEZ1AM
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 36•3 years ago
|
||
Comment 37•3 years ago
|
||
I did quite a bit of performance measurement on the latest patches (which eliminate all the boundary allocation, copying, and locking, at least on 64-bit). The upshot is that, for the gdocs testcase I measured, RLBox introduces an SVG parsing overhead of about 7% on 64-bit platforms (~80% of our users), and about 20% on 32-bit platforms.
The testcase I focused on was the sprite sheet loaded by gdocs. Gdocs actually loads a number of different sprite sheets. During core page-load, it loads both a small high-priority sheet (5k compressed, 21k uncompressed) and a large low-priority sheet (80k compressed, 335k uncompressed). Then several seconds later (after the doc is largely rendered), it also loads 2 more sheets similar to the large one. The small sheet takes about a millisecond or so to parse, the large sheet takes about 20 milliseconds. Given that the large sheet it marked as low priority, it seems like there we ought to be able to defer parsing it until after initial page load — but that's somewhat out of scope here.
I took measurements by repeatedly loading a testcase 10 times to eliminate noise. I tested on both the gdoc from comment 26, as well as just the sprite sheet — amplified by concatenating the contents 10 times. I did some spot-checking of other test-cases, but nothing exhaustive. The profiles are as follows:
32-bit windows:
- Baseline
** loading gdoc (10x): https://share.firefox.dev/3kfVZrm
** loading amplified-sprites (10x): https://share.firefox.dev/3mUGoyT - RLBoxed Expat
** loading gdoc (10x): https://share.firefox.dev/3EY0WwJ
** loading amplified-sprites (10x): https://share.firefox.dev/3o9je7w
64-bit windows:
- Baseline
** loading gdoc (10x): https://share.firefox.dev/3H7a5VQ
** loading amplified-sprites (10x): https://share.firefox.dev/3EUn0IA - RLBoxed Expat
** loading gdoc (10x): https://share.firefox.dev/3kiUxV2
** loading amplified-sprites (10x): https://share.firefox.dev/3BWZ4Co
64-bit mac:
- Baseline:
** loading amplified-sprites (10x): https://share.firefox.dev/3H6UqFJ - RLBoxed Expat:
** loading amplified-sprites (10x): https://share.firefox.dev/3qyTVPd
Whether or not this overhead is acceptable depends on how frequently users encounter large SVG images. Bas believes that SVG is becoming more important, but that doesn't necessarily mean SVG images. For example, google slides spends much more time in SVG rendering ( https://share.firefox.dev/3qh5gmz ) that google docs, but has identical parse overhead (presumably because all of the slide-specific SVG construction is done programmatically).
I dug into both the tp5 and tp6 pagesets (snapshotted a few years ago), and found very few SVG images. So most widespread cost of shipping this today is likely to be a ~2ms regression loading gsuite — which is probably acceptable, and also probably something we can turn into a 20ms improvement via the priority handling discussed above.
In summary, I think we should ship this.
Comment 38•3 years ago
|
||
If all the XML parsing is RLBoxed, how much does this affect to initial startup? Frontend has quite a bit xml and in the cases startup cache is empty, all that needs to be parsed.
Comment 39•3 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #38)
If all the XML parsing is RLBoxed, how much does this affect to initial startup? Frontend has quite a bit xml and in the cases startup cache is empty, all that needs to be parsed.
I believe we have cold startup perf tests running in CI, so the fact that we didn't see them in comment 22 (when the RLBoxing overhead of these patches was about 5 times higher than it is now) suggests we're probably ok. I suspect the total volume of actual semantic XML is quite a bit less than the 335k SVG sprite sheet (but definitely let me know if I'm off-base).
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 40•3 years ago
|
||
Does this affect innerHTML speed on XHTML documents?
Found an old test http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.xhtml
xhtml is already quite a bit slower than html http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html so I'd rather not slow it even more. But maybe we're lucky and RLBox somehow speeds up memory accesses or something :)
Comment 41•3 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #37)
I dug into both the tp5 and tp6 pagesets (snapshotted a few years ago), and found very few SVG images. So most widespread cost of shipping this today is likely to be a ~2ms regression loading gsuite — which is probably acceptable, and also probably something we can turn into a 20ms improvement via the priority handling discussed above.
This is now tracked in bug 1741224.
I'll profile startup and the testcases from comment 40 when I get a minute.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 42•3 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #38)
If all the XML parsing is RLBoxed, how much does this affect to initial startup? Frontend has quite a bit xml and in the cases startup cache is empty, all that needs to be parsed.
I profiled startup with the patches. It appears that, even for a fresh profile, all of the browser chrome is already read from the prototype cache (presumably we do this as part of the build now), so we don't hit expat.
We do spend about 7 ms parsing SVG, which suggests this might introduce a sub-millisecond impact to startup.
(In reply to Olli Pettay [:smaug] from comment #40)
Does this affect innerHTML speed on XHTML documents?
Found an old test http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.xhtml
xhtml is already quite a bit slower than html http://mozilla.pettay.fi/moztests/innerhtml_nonsimple_nostyle_test.html so I'd rather not slow it even more. But maybe we're lucky and RLBox somehow speeds up memory accesses or something :)
I profiled this as well, and as a matter of fact, the testcase does get slightly faster with rlbox — probably because mallocing in the sandbox is cheaper and there's not much real parsing work to be done.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Comment 43•3 years ago
|
||
Comment on attachment 9246234 [details]
Bug 1688452 - Turn on Wasm sandboxing for libexpat
Revision D128652 was moved to bug 1741995. Setting attachment 9246234 [details] to obsolete.
Assignee | ||
Comment 44•3 years ago
|
||
Try without turning on sandbox: https://treeherder.mozilla.org/jobs?repo=try&revision=213f71b0f39c7a1c3b6fb91234e90e7365f9202
Try with it enabled: https://treeherder.mozilla.org/#/jobs?repo=try&revision=489812dc8dacebaa84e31937b0c20dedbb1a16af
Comment 45•3 years ago
|
||
Comment 46•3 years ago
|
||
Backed out changeset 7893bbd002e0 (Bug 1688452) for causing bustages in nsRLBoxExpatDriver.
Backout link: https://hg.mozilla.org/integration/autoland/rev/c73bcc465e0c2bce7debb0a86277e2dcb27444e4
Push with failures, failure log.
Comment 47•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 48•3 years ago
|
||
Backed out for causing mochitest failures in test_blockParsing & wpt failures
Backout link: https://hg.mozilla.org/integration/autoland/rev/72b186a7935f3cf4c2c6d7cbf10fde98523b48f1
Comment 49•3 years ago
|
||
Comment 50•3 years ago
|
||
bugherder |
Assignee | ||
Updated•3 years ago
|
Description
•