Closed Bug 941818 Opened 11 years ago Closed 9 years ago

Implement DOM workers using JS helper threads

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX

People

(Reporter: mccr8, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

bhackett had this idea. Per-runtime data structures like the atoms table are hard to share between runtimes, so if we could move DOM workers into a shared runtime, or the main thread's runtime, running them on separate JS workers, then we could reduce the overhead of having a DOM worker. bhackett's work on off-thread Ion compilation (bug 785905) has been moving us closer to being able to do this, but more needs to be done beyond that.
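To make the scenario concrete, here is a minimal sketch of the main-thread side (the file name 'empty.js' is a placeholder of my own, not from this bug): each worker spawned this way currently gets a complete, separate JSRuntime, so per-runtime structures like the atoms table are duplicated once per worker.

  // Sketch only: 'empty.js' is a placeholder script, used to illustrate the
  // per-worker baseline cost. Each of these workers currently gets its own
  // full JSRuntime, duplicating the atoms table and friends.
  var workers = [];
  for (var i = 0; i < 8; i++) {
    var w = new Worker('empty.js');
    w.onmessage = function (e) { console.log('worker said: ' + e.data); };
    workers.push(w);
  }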
Depends on: 941819
Depends on: 941820
Whiteboard: [MemShrink]
Summary: Implement DOM workers using JS workers → Implement DOM workers using JS helper threads
Whiteboard: [MemShrink] → [MemShrink:P2]
Thinking and measuring some more related to this, I think it would be better if we still had separate runtimes, but shared things between them when possible. With an empty DOM worker (i.e. Worker('foo.js') where foo.js is empty), if we shared atoms and script data between runtimes then we would save a good deal of memory. I don't think merging workers into a single runtime would improve much on this in either the empty or non-empty worker case, since we share so little data between zones as it is.

Looking at about:memory info for a memory-usage-minimized empty worker on x64:

1.10 MB (01.15%) -- worker(script.js, 0x7fab38fd9800)
├──0.47 MB (00.50%) -- runtime
│  ├──0.25 MB (00.26%) ── gc-marker
│  ├──0.09 MB (00.09%) ── script-data
│  ├──0.06 MB (00.07%) ── atoms-table
│  ├──0.06 MB (00.07%) ── runtime-object
│  ├──0.00 MB (00.00%) ── dtoa
│  ├──0.00 MB (00.00%) ── interpreter-stack
│  ├──0.00 MB (00.00%) ── temporary
│  ├──0.00 MB (00.00%) ── contexts
│  ├──0.00 MB (00.00%) ── script-sources
│  ├──0.00 MB (00.00%) ++ code
│  ├──0.00 MB (00.00%) ── math-cache
│  └──0.00 MB (00.00%) ── regexp-data
├──0.25 MB (00.27%) -- zone(0x7fab35c75000)
│  ├──0.21 MB (00.22%) -- compartment(web-worker)
│  │  ├──0.08 MB (00.08%) ++ objects
│  │  ├──0.08 MB (00.08%) ++ shapes
│  │  ├──0.05 MB (00.05%) ++ scripts
│  │  └──0.01 MB (00.01%) ++ sundries
│  ├──0.04 MB (00.04%) ── unused-gc-things
│  └──0.01 MB (00.01%) ── sundries/gc-heap
├──0.22 MB (00.23%) -- zone(0x7fab38fdc800)
│  ├──0.20 MB (00.22%) ++ strings
│  ├──0.01 MB (00.01%) ++ sundries
│  ├──0.01 MB (00.01%) ── unused-gc-things
│  └──0.00 MB (00.00%) ++ compartment(web-worker-atoms)
├──0.12 MB (00.12%) -- zone(0x7fab34dae800)
│  ├──0.07 MB (00.07%) -- compartment(web-worker)
│  │  ├──0.03 MB (00.04%) ++ shapes
│  │  ├──0.02 MB (00.02%) ++ objects
│  │  └──0.01 MB (00.01%) ++ sundries
│  ├──0.04 MB (00.04%) ── unused-gc-things
│  ├──0.01 MB (00.01%) ── type-pool
│  └──0.00 MB (00.00%) ── sundries/gc-heap
└──0.03 MB (00.03%) ++ gc-heap

An empty worker isn't actually empty: there are scripts and such created in the runtime, though I don't know what they're there for. The first zone above has this stuff, the second zone has the atoms, and I don't know what the third zone is for.

Anyway, sharing the atoms with the main runtime would eliminate:

- The worker's atoms zone (.22 MB)
- The runtime's atoms table (.06 MB)
- Most of the runtime object itself, the bits used for common names and static strings (.04 MB)

Sharing the script data table with the main runtime (which in general will only help if workers are running overlapping code) would eliminate .09 MB.

I don't know why the gc-marker is taking up .25 MB; that should be fixed. If it is, and the above sharing is done too, the memory consumption for the worker would go down to roughly .44 MB.
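As a quick check on the arithmetic above (no new measurements, just the figures from the dump, and assuming the gc-marker buffer really can be shrunk to roughly nothing):

  // All figures in MB, copied from the about:memory dump above.
  var total       = 1.10;  // worker(script.js) total
  var atomsZone   = 0.22;  // the worker's atoms zone
  var atomsTable  = 0.06;  // the runtime's atoms table
  var runtimeBits = 0.04;  // common names / static strings in the runtime object
  var scriptData  = 0.09;  // shared script data table
  var gcMarker    = 0.25;  // gc-marker buffer, assuming it can be mostly eliminated
  var remaining   = total - (atomsZone + atomsTable + runtimeBits + scriptData + gcMarker);
  console.log(remaining.toFixed(2));  // "0.44"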
Oh, and with .37 MB being taken up by the non-atoms zones in the worker, which we'd be using even if the worker were part of the main runtime, there would only be about .44 - .37 => .07 MB of bookkeeping overhead left from having a separate worker runtime.
> I don't know why the gc-marker is taking up .25 MB and that should be fixed.

It's the GC mark buffer. Apparently it is needed; see bug 921224, where I greatly reduced its size.
Keeping separate runtimes also has the benefit of not requiring me to write any new code ;)

Thanks for measuring this. Anything I can do to help?
(In reply to Brian Hackett (:bhackett) (Vacation until early February) from comment #1)
> An empty worker isn't actually empty, there are scripts and such created in
> the runtime though I don't know what they're there for. The first zone
> above has this stuff, the second zone has the atoms, and I don't know what
> the third zone is for.

The first zone is probably the self-hosting stuff, and the third is for the content global. In bug 921213, nbp proposed to merge the two. If we shared the atoms between runtimes, that would leave just one zone.

> Sharing the script data table with the main runtime (which in general will
> only help if workers are running overlapping code) would eliminate .09 MB.

What about LazyScripts? Sharing those, if possible, would reduce parsing time for common scripts.

> I don't know why the gc-marker is taking up .25 MB and that should be fixed.
> If it is, and the above sharing done too, the memory consumption for the
> worker would go down to roughly .44 MB.

(In reply to Brian Hackett (:bhackett) (Vacation until early February) from comment #2)
> Oh, and with .37 MB being taken up by the non-atoms zones in the worker,
> which we'd be using even if the worker were part of the main runtime, there
> would only be about .44 - .37 => .07 MB of bookkeeping overhead left
> incurred by having a separate worker runtime.

In one runtime, we might be able to share the self-hosting compartment completely. I'm not sure about this, though, because we have a rarely-used mechanism for running code *in* the self-hosting compartment instead of only cloning things over on access.

In any case, once bug 900292 lands (with proper support for retrieving the source instead of needlessly storing a copy in memory), the remaining overhead should go down quite a bit. Though I have to say that 50 KB of scripts and 90 KB of shared script data for the self-hosting stuff is suspiciously low.
(In reply to Till Schneidereit [:till] from comment #5)
> The first zone is probably the self-hosting stuff, and the third is for the
> content global.

OK, thanks. What are the circumstances when we run code in the self-hosting compartment? It seems like we should only need to create a single copy of the self-hosting values in the main runtime, and clone off of the main runtime's self-hosting compartment from within worker runtimes. That depends on the contents of the self-hosting compartment being stable, though --- we'd need to protect against code running in the compartment or delazification if those could occur.

Lazification in the self-hosting compartment would complicate things, though maybe it wouldn't be too bad. Either allow cloning LazyScripts when reading the self-hosting compartment from a worker thread, or dispatch an event to the main thread so it can delazify the script while the worker waits.

Removing the self-hosting zone would drop the total size of the empty worker to .19 MB or so.
(In reply to Brian Hackett (:bhackett) (Vacation until early February) from comment #6)
> What are the circumstances when we run code in the self-hosting compartment?

The only real use case we have right now is bug 899361. Well, the only use case for running code that's not initialization-related. The Intl stuff is initialized lazily, and runs in parts as required, since bug 919872 landed.

For the first use case, we could conceivably find another solution. The lazy Intl initialization is hard to work around, though. The question is if it can somehow be special-cased. I think the script execution in that case is always triggered from C++, so maybe we could take a lock here? There are, I think, three different sub-objects of Intl that are lazily initialized, so this slow path would be taken at most three times per runtime (or, in a better future, process). CCing Waldo to weigh in on this.

> It seems like we should only need to create a single copy of the
> self-hosting values in the main runtime, and clone off of the main runtime's
> self-hosting compartment from within worker runtimes. That depends on the
> contents of the self-hosting compartment being stable, though --- we'd need
> to protect against code running in the compartment or delazification if
> those could occur. Lazification in the self-hosting compartment would
> complicate things, though maybe it wouldn't be too bad. Either allow
> cloning LazyScripts when reading the self-hosting compartment from a worker
> thread, or dispatch an event to the main thread so it can delazify the
> script while the worker waits.

I think we could forbid relazification in the self-hosting compartment without meaningful downsides. I'd still really like to enable lazy parsing, if possible. Sharing with web workers is more important, though. OTOH, maybe the same solution can be used for the Intl initialization issue?

> Removing the self-hosting zone would drop the total size of the empty worker
> to .19 MB or so.

That sounds fantastic.
Flags: needinfo?(jwalden+bmo)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #4)
> Thanks for measuring this. Anything I can do to help?

One question right now (re the self-hosting lazification stuff above): are there any cases right now where the main thread can block while waiting for the worker to do something?
Currently there is only one: we block the main thread briefly while we wait for a worker to block itself and generate a memory report. We use all sorts of tricks to make that blocking time as short as possible, but there is still a small window where the main thread is blocked.

Soon, though, we will have another for Shumway. The design there is to post a message to a worker and block the main thread while we wait for it to respond. We can't use any fancy tricks in this case, so the main thread could conceivably block for an extended amount of time.
(In reply to Till Schneidereit [:till] from comment #7)
> (In reply to Brian Hackett (:bhackett) (Vacation until early February) from
> comment #6)
> > What are the circumstances when we run code in the self-hosting compartment?
>
> The only real use case we have right now is bug 899361. Well, the only use
> case for running code that's not initialization-related. The Intl stuff is
> initialized lazily, and runs in parts as required, since bug 919872 landed.

The lazy Intl initialization bits don't run in the self-hosted compartment. They're cloned out of it just like most other self-hosted code. So I don't think they pose a problem for this at all.

> For the first use case, we could conceivably find another solution.

A C++-implemented self-hosting builtin is possible. You still need to store a WeakMap with stuff from all compartments somehow, if you were to do it that way. It's just much messier than being able to define it in JS, for the most part.

> I think the script execution in that case is always triggered from C++, so
> maybe we could take a lock here? There are, I think, three different sub-
> objects of Intl that are lazily initialized, so this slow path would be
> taken at most three times per runtime (or, in a better future, process).

No to the first, yes to the second. For the first, it would be triggered by

  var nft = Intl.NumberFormat(...);
  nft.format(...); // triggered here

Basically anything that needs to elaborate the internal properties of an Intl object will trigger one of those three lazy-init bits, once each per global, until they're all initialized.
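For reference, a minimal sketch of those three lazy-init triggers (assuming, on my part, that the three lazily initialized sub-objects are Collator, NumberFormat, and DateTimeFormat); construction itself is cheap, and the lazy init only runs on first use of each, once per global:

  // Sketch only: assumes the three lazily initialized Intl sub-objects are
  // Collator, NumberFormat, and DateTimeFormat.
  var coll = new Intl.Collator('en');
  coll.compare('a', 'b');               // first use -> lazy init for Collator

  var nf = new Intl.NumberFormat('en');
  nf.format(1234.5);                    // first use -> lazy init for NumberFormat

  var dtf = new Intl.DateTimeFormat('en');
  dtf.format(new Date());               // first use -> lazy init for DateTimeFormat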
Flags: needinfo?(jwalden+bmo)
(In reply to Andrew McCreight [:mccr8] from comment #0)
> bhackett had this idea. Per-runtime data structures like the atoms table
> are hard to share between runtimes,

It was hard, but we do this now.

> so if we can move DOM workers into a
> shared runtime, or the main thread's runtime, on separate JS workers, then we
> could reduce the overhead of having a DOM worker. bhackett's work on
> off-thread Ion compilation (bug 785905) has been moving us closer to being
> able to do this, but more needs to be done beyond that.

This isn't really feasible. We'd have to convert the entire engine over to ExclusiveContext, and even that would not guarantee thread safety by any significant margin. Given that we've gotten the vast majority of the memory benefits through a simpler mechanism, I'm going to WONTFIX this.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX