Open
Bug 1164539
Opened 10 years ago
Updated 2 years ago
[meta] postMessage during document load is slow
Categories
(Core :: General, defect, P5)
Tracking
()
NEW
tracking-b2g | backlog |
People
(Reporter: wilsonpage, Unassigned, NeedInfo)
References
(Depends on 3 open bugs)
Details
(Keywords: meta, perf, Whiteboard: [platform])
Attachments
(4 files)
window.postMessage() between window and iframe during document load appears to be delayed.
TEST-CASE: https://github.com/wilsonpage/post-message-startup-perf
1. git clone
2. open index.html in desktop (or flash app to phone)
3. Check console for timings
4. Press 'reconnect' button in app
5. Check console for timings
---
We have observed the test to be up to 10x slower at startup.
Reporter | ||
Comment 1•10 years ago
|
||
Chrome seems to have consistent postMessage timings, during load and loaded.
Comment 2•10 years ago
|
||
I wonder if we should just have a separate runnable queue for window <-> window messaging.
Called around the end of a main thread task.
Reporter | ||
Comment 3•10 years ago
|
||
Split test-case [1] into BroadcastChannel example and window.postMessage() example.
BroadcastChannel appears to have a similar delay on document load.
[1] https://github.com/wilsonpage/post-message-startup-perf
Comment 4•10 years ago
|
||
Implementing https://html.spec.whatwg.org/multipage/comms.html#posted-message-task-source would help here.
Reporter | ||
Updated•10 years ago
|
Comment 5•10 years ago
|
||
Sounds like also Chrome just uses main thread the way we do, so using a separate queue for
post messages might be too regression risky.
What I don't understand where we spend time between posting the message and receiving the message on the other end, and why chrome doesn't take any time there even during page load.
Reporter | ||
Updated•10 years ago
|
Blocks: nga-toolkit-communication
Reporter | ||
Updated•10 years ago
|
Comment 6•10 years ago
|
||
From the profile [1] Wilson captured, I see two big chunks at B2G process:
- [3602, 3662] system app handling mozbrowserloadend event
- [3677, 3758] nsRefreshDriver::Tick
The long refresh driver tick seems relate to bug 1110625, bug 1163471 is trying to fix it.
[1] https://people.mozilla.org/~bgirard/cleopatra/#report=d6b4d410427c59d9240294571fc6ca7eb93c3941
Comment 7•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #6)
> From the profile [1] Wilson captured, I see two big chunks at B2G process:
>
> - [3602, 3662] system app handling mozbrowserloadend event
> - [3677, 3758] nsRefreshDriver::Tick
>
> The long refresh driver tick seems relate to bug 1110625, bug 1163471 is
> trying to fix it.
>
> [1]
> https://people.mozilla.org/~bgirard/cleopatra/
> #report=d6b4d410427c59d9240294571fc6ca7eb93c3941
I just learnt from Cervantes that broadcast channel message is dispatched to listeners by B2G process background thread, not main thread, and will be handled in content process main thread. So these two chunks are not important at all.
Comment 8•10 years ago
|
||
Since the sender and receiver are the same content process, yield after postMessage() as early as possible to receive the message earlier. Off load JS parsing and CSS rule matching should also help since this is during app startup.
I'll double check the profile once I reach a PC.
Comment 9•10 years ago
|
||
I see 4 postMessage() and 4 onmessage() in process Contactsproto:13409, I assume they're in pair:
1. [3426, 3493]
1a. [3427, 3432] nsInputStreamPump::OnStateStop() / threads.js is still running after postMessage()
1b. [3438, 3481] nsHtml5TreeOpExecutor::RunFlushLoop() / list.js, contacts.js, and styles
1c. [3482, 3491] Timer::Fire() / PresShell::Flush(Flush_Layout)
2. [3632, 3781]
2a. [3640, 3658] PCompositor::RecvDidComposite() / PresShell::Paint
2b. [3660, 3672] nsInputStreamPump::OnStateStart() / AppsService.js
2c. [3687, 3759] nsInputStreamPump::OnStateStop() / parse threads.js and run
2d. [3766, 3779] nsHTML5TreeOpExecutor::RunFlushLoop() / contacts.js
3. [3752, 3814]
2c.
2d.
3a. [3783, 3795] nsRefreshDriver::Tick / PresShell::Flush(Flush_InterruptibleLayout)
3b. [3800, 3813] nsRefreshDriver::Tick / PresShell::Paint
4. [3854, 3864]
One thing I noticed is 1a and 2c seems exactly the same thing, parsing and loading threads.js but from 2 different iframes. :smaug, is it ok to skip the 2nd parsing?
Another idea came up is the nested OOP so running scripts at outer iframe does not block the inner iframe to receive broadcast message. I just asked Kanru, it can be enabled by the pref "dom.ipc.tabs.nested.enabled", but it's not complete yet, he says using IndexedDB and Blob could crash it.
Flags: needinfo?(bugs)
Reporter | ||
Comment 10•10 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #9)
> One thing I noticed is 1a and 2c seems exactly the same thing, parsing and
> loading threads.js but from 2 different iframes. :smaug, is it ok to skip
> the 2nd parsing?
This could be fixed by JS 'bytecode-caching' (bug 1164389)
> Another idea came up is the nested OOP so running scripts at outer iframe
> does not block the inner iframe to receive broadcast message. I just asked
> Kanru, it can be enabled by the pref "dom.ipc.tabs.nested.enabled", but it's
> not complete yet, he says using IndexedDB and Blob could crash it.
I don't really understand what this means :)
Comment 11•10 years ago
|
||
(In reply to Wilson Page [:wilsonpage] from comment #10)
> > Another idea came up is the nested OOP so running scripts at outer iframe
> > does not block the inner iframe to receive broadcast message. I just asked
> > Kanru, it can be enabled by the pref "dom.ipc.tabs.nested.enabled", but it's
> > not complete yet, he says using IndexedDB and Blob could crash it.
>
> I don't really understand what this means :)
When there's only one process for nested iframes, if the main thread is running JS of outer iframe, even the message is arrived, it will be queued til the JS finishs. But if there're two processes (one for each iframe), then the inner iframe will not affect by outer iframe running whatever JS.
Comment 12•10 years ago
|
||
We are nowhere near having support for oop-iframe in general, and it will probably never work in case
the iframe is same-domain as its parent.
And yes, something like Bug 1164389 could help quite a bit when loading large script libraries to
several iframes.
But for now, any chance to split threads.js to smaller pieces? Some small part which initiates the communication between iframes and so, and then load large library when needed?
Flags: needinfo?(bugs)
Reporter | ||
Comment 13•10 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #12)
> But for now, any chance to split threads.js to smaller pieces? Some small
> part which initiates the communication between iframes and so, and then load
> large library when needed?
This is completely possible. I've already done some tests with smaller pieces and we can see some improvements in app load time. If/when we do have byte-code caching, it might be better to use the single large library so the other frames get super quick load?
Comment 14•10 years ago
|
||
(In reply to Olli Pettay [:smaug] (Reviewing overload, no new review requests before June 7, please) from comment #12)
> We are nowhere near having support for oop-iframe in general, and it will
> probably never work in case
> the iframe is same-domain as its parent.
Could you please explain me a bit, I don't understand why it won't work in this case? Thanks.
Comment 15•10 years ago
|
||
Because the web relies on that JS in window A and B run in the same thread in case A and B can access each others (which is the case when A for example has B in an iframe and A and B are from the same domain).
Reporter | ||
Comment 16•10 years ago
|
||
(In reply to Olli Pettay [:smaug] (Reviewing overload, no new review requests before June 7, please) from comment #15)
> Because the web relies on that JS in window A and B run in the same thread
> in case A and B can access each others (which is the case when A for example
> has B in an iframe and A and B are from the same domain).
If iframes are 'sandboxed' windows can't access each others' internal state. Could run in separate processes/threads under these conditions?
Reporter | ||
Comment 17•10 years ago
|
||
My crude diagnosis so far:
1. Script parse is quite expensive and should be minimised on startup. Byte-code caching (bug 1164389) could help here.
2. Creating new documents is quite expensive, especially on b2g where injected scripts do stuff to the document (eg. BrowserElementChildPreload.js bug 1170644)
3. There's some random GC happening on startup sometimes and this is blocking.
4. For an empty document we're painting too often (need gfx contact for more info here)
5. 'Main Thread IO!' is happening a lot (dunno if that's normal)
6. BroadcastChannel.postMessage takes longer to reach .onmessage handler than window.postMessage. This is like due to requiring two tasks to process as opposed to one.
7. Layout may be invalidated when injecting hidden iframe into document (probably shouldn't?).
Latest profile: http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
Reporter | ||
Updated•9 years ago
|
Blocks: nga-performance
Comment 18•9 years ago
|
||
Bug 1154987 and bug 1166173 would also be helpful.
Reporter | ||
Comment 19•9 years ago
|
||
benwa: Can you take a look at comment 17 and see if there is anything unusual gfx wise during startup of this demo app [1]. The app has no styling and simply loads one <iframe> from initial markup and then another via script.
Right now it's taking to long for us to receive a data from the last iframe via postMessage and we're trying to figure out why.
[1] https://github.com/wilsonpage/contacts_prototype/tree/perf/using-threads-library
Flags: needinfo?(bgirard)
Comment 20•9 years ago
|
||
What if move broadcast message listener to a worker, can we do the work off main thread?
Comment 21•9 years ago
|
||
(In reply to Wilson Page [:wilsonpage] from comment #17)
> My crude diagnosis so far:
I think your diagnosis is correct and would be nice to measure each point of this list.
> 2. Creating new documents is quite expensive, especially on b2g where
> injected scripts do stuff to the document (eg. BrowserElementChildPreload.js
> bug 1170644)
Are we planning to rewrite it in C++? Would this help?
NI fabrice for this.
> 5. 'Main Thread IO!' is happening a lot (dunno if that's normal)
Can you give us a list of those?
Flags: needinfo?(fabrice)
Reporter | ||
Comment 22•9 years ago
|
||
(In reply to Andrea Marchesini (:baku) from comment #21)
> (In reply to Wilson Page [:wilsonpage] from comment #17)
> > My crude diagnosis so far:
>
> I think your diagnosis is correct and would be nice to measure each point of
> this list.
>
> > 2. Creating new documents is quite expensive, especially on b2g where
> > injected scripts do stuff to the document (eg. BrowserElementChildPreload.js
> > bug 1170644)
>
> Are we planning to rewrite it in C++? Would this help?
> NI fabrice for this.
>
> > 5. 'Main Thread IO!' is happening a lot (dunno if that's normal)
>
> Can you give us a list of those?
You can see them in the profile linked to [1] in comment 17. Make sure you have the 'Content (Contactsprotot:9304)' process highlighted (blue).
[1] http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
Comment 23•9 years ago
|
||
The main thread I/O is a bug in the profiler. All the IPC calls cause it, and BenWa said on IRC to just not care about it (except it there is actual real file I/O or some such happening).
Comment 24•9 years ago
|
||
Kanru, do you think the loading of BrowserElementChildPreload.js can be a problem here?
any plan to rewrite Browser Element API in C++?
Flags: needinfo?(fabrice) → needinfo?(kchen)
Comment 25•9 years ago
|
||
(In reply to Andrea Marchesini (:baku) from comment #24)
> Kanru, do you think the loading of BrowserElementChildPreload.js can be a
> problem here?
> any plan to rewrite Browser Element API in C++?
I don't think the loading of BrowserElementChildPreload.js could be a problem other than the sync reflow it causes. Yes, there was a plan to rewrite Browser Element API in C++ and part of it has already been converted. Looks like it's time to convert the remaining parts to C++ too.
Flags: needinfo?(kchen)
Comment 26•9 years ago
|
||
Is rewriting the Browser Element API in C++ the only thing we can do here?
Comment 27•9 years ago
|
||
Split threads.js to smaller pieces so that we don't load massive amount scripts before doing anything.
Bug 1154987 may help, if there are spare cores on the device, and Bug 1164389 should help when
loading the same script to several documents.
From comment 17 (4) is an interesting case. Someone more familiar with layout or graphics should take a look. Why are we painting so often?
(I don't know what (7) from that list means. visibility: hidden or display: none; or what?)
Reporter | ||
Comment 28•9 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #27)
> From comment 17 (4) is an interesting case. Someone more familiar with
> layout or graphics should take a look. Why are we painting so often?
I NI'd :benwa, but no response as of yet.
> (I don't know what (7) from that list means. visibility: hidden or display:
> none; or what?)
`display: none`
Comment 29•9 years ago
|
||
When CC or GC happens, are we dealing with a new process or loading some new html document to existing process? Since CC/GC are normally postponed while starting up a new process.
Reporter | ||
Comment 30•9 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #29)
> When CC or GC happens, are we dealing with a new process or loading some new
> html document to existing process? Since CC/GC are normally postponed while
> starting up a new process.
My test-case just works with iframes, so assuming all in same process.
Comment 31•9 years ago
|
||
Well, Contactsproto is some separate process.
Comment 32•9 years ago
|
||
Yes, please ignore the main thread IO unless there's actual file activity happening in the timeline there.
(In reply to Wilson Page [:wilsonpage] from comment #19)
> benwa: Can you take a look at comment 17 and see if there is anything
> unusual gfx wise during startup of this demo app [1]. The app has no styling
> and simply loads one <iframe> from initial markup and then another via
> script.
I don't see anything wrong. I just see that app running too much scripts and that being the bottleneck to starting fast. There's a fraction of gfx work to put our a frame here and there but nothing that really holds back the timeline.
Let me know if you have something in specific you want information on.
Flags: needinfo?(bgirard)
Reporter | ||
Comment 33•9 years ago
|
||
Oli can you relay your paint concerns to Benwa?
Flags: needinfo?(bugs)
Comment 34•9 years ago
|
||
I think it is more about layout and invalidation stuff. So, some layout developer might be better to answer why we need to paint so often.
dholbert, maybe you have some input to comment 17 (4).
Flags: needinfo?(bugs) → needinfo?(dholbert)
Comment 35•9 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #34)
> dholbert, maybe you have some input to comment 17 (4).
(In reply to Wilson Page [:wilsonpage] from comment #17)
> 4. For an empty document we're painting too often (need gfx contact for more
> info here)
> http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
We're painting 60 times a second; for an emptyish document, I'd hope we could skip some of those. We're spending nonzero time in style recalculation / display list construction, so we apparently think there are *some* style changes that are happening & need flushing (which trigger repaints).
I can't really figure out anything more from this profile, though, since it doesn't have C++ backtraces. Do we have any sort of sampling profiler (like the gecko profiler add-on) which works on B2G?
Flags: needinfo?(dholbert)
Comment 36•9 years ago
|
||
You can use -f stackwalk to get stackwalk on b2g. This will give you less accurate sampling but with more details. The cause for the style flush is coming from the (previous?) tick. So there's definitely something occurring.
Comment 37•9 years ago
|
||
Also, immediately before each style flush, the waterfall shows a teeny "scripts" blip, which corresponds to this section:
http://mxr.mozilla.org/mozilla-central/source/layout/base/nsRefreshDriver.cpp?rev=f52c18aac7ce&mark=1540-1567#1540
... which I think means we're running some requestAnimationFrame (rAF) callbacks.
Those rAF scripts could conceivably be restyling something in a way that makes us flush styles.
Alternately/also: since this is at document load-time, we might be progressively loading the document's stylesheets, and we're getting more and more of the styles loaded every refresh-driver tick, and that impacts the <body> or something in such a way that we think we need to repaint.
(These are semi-shots in the dark; not sure if either is on the right track here.)
Comment 38•9 years ago
|
||
We should have all of the UA style sheets available in the style sheet cache, so I wouldn't expect we need to wait for any loads there. If there are other style sheets loaded by the page/iframe then that's another matter.
Reporter | ||
Updated•9 years ago
|
Reporter | ||
Comment 39•9 years ago
|
||
Devtools performance profile of an empty document with an empty <iframe hidden> sending a child-ping<->parent-pong postMessage() during document load.
We can see it's taking >70ms to receive the 'pong' message back from the parent window. This is almost instant when the document is idle.
There is a small 'layout' event and a 'paint' event that could be partially responsible for blocking the 'message' event here.
There are a succession of small layout and recalc-style events at the start of the profile which don't seem necessary considering we don't have any nodes to layout (excl. <html> and <body>).
Reporter | ||
Comment 40•9 years ago
|
||
I will follow this up with a proper Gecko profile, just waiting for B2G to build...
Reporter | ||
Comment 41•9 years ago
|
||
(In reply to Daniel Holbert [:dholbert] (vacation 8/27-9/13) from comment #35)
> (In reply to Olli Pettay [:smaug] from comment #34)
> > dholbert, maybe you have some input to comment 17 (4).
> (In reply to Wilson Page [:wilsonpage] from comment #17)
> > 4. For an empty document we're painting too often (need gfx contact for more
> > info here)
> > http://people.mozilla.org/~bgirard/cleopatra/#report=b5ea5763e27bcc8238d4e15ef8af2b67c84c7fa2
>
> We're painting 60 times a second; for an emptyish document, I'd hope we
> could skip some of those. We're spending nonzero time in style recalculation
> / display list construction, so we apparently think there are *some* style
> changes that are happening & need flushing (which trigger repaints).
>
> I can't really figure out anything more from this profile, though, since it
> doesn't have C++ backtraces. Do we have any sort of sampling profiler (like
> the gecko profiler add-on) which works on B2G?
How can I get C++ backtraces included in my profile?
Flags: needinfo?(dholbert)
Reporter | ||
Comment 42•9 years ago
|
||
It appears from this profile that this function within BrowserElementChildPreload.js [1] is blocking the main thread for 33ms.
I'm not exactly sure what the job of this function is, but wondering if it could be delayed until after we've got content painted?
[1] https://github.com/mozilla/gecko-dev/blob/master/dom/browser-element/BrowserElementChildPreload.js#L1583
Reporter | ||
Comment 43•9 years ago
|
||
After our app has finished doing its thing we can see ~50ms is spent in System app:
33ms - spent in window management, 22ms of which is statusbar.js which is causing a sync reflow.
11ms - Spent inside the l10n.js responding to DOM mutations.
Reporter | ||
Comment 44•9 years ago
|
||
Between BrowserElementChild and BrowserElementParent scripts we're adding ~100ms to the critical-path of every fxos app. Although this work must be done, I'm sure majority of it isn't as critical as letting the app render its content.
FULL PROFILE: https://people.mozilla.org/~bgirard/cleopatra/#report=9b2be1600fbc5bb79d5468ffc515896905a26631
Reporter | ||
Comment 45•9 years ago
|
||
Other interesting parts:
52ms - Painting empty document
49ms - Layout of empty document (Flush_InterruptibleLayout)
30ms - In clock_gettime (system/lib/libc.so)
Reporter | ||
Comment 46•9 years ago
|
||
Comment 47•9 years ago
|
||
(In reply to Wilson Page [:wilsonpage] from comment #42)
> Created attachment 8657076 [details]
> ping-pong-gecko-profile.png
>
> It appears from this profile that this function within
> BrowserElementChildPreload.js [1] is blocking the main thread for 33ms.
>
> I'm not exactly sure what the job of this function is, but wondering if it
> could be delayed until after we've got content painted?
>
> [1]
> https://github.com/mozilla/gecko-dev/blob/master/dom/browser-element/
> BrowserElementChildPreload.js#L1583
That is bug 926452
Comment 48•9 years ago
|
||
> > I can't really figure out anything more from this profile, though, since it
> > doesn't have C++ backtraces. Do we have any sort of sampling profiler (like
> > the gecko profiler add-on) which works on B2G?
>
> How can I get C++ backtraces included in my profile?
I don't know (on b2g), sorry. (I was asking the same question, RE a sampling profiler, which would have c++ backtraces)
Flags: needinfo?(dholbert)
Comment 49•9 years ago
|
||
(In reply to Daniel Holbert [:dholbert] (vacation 8/27-9/13) from comment #48)
> > How can I get C++ backtraces included in my profile?
>
> I don't know (on b2g), sorry. (I was asking the same question, RE a sampling
> profiler, which would have c++ backtraces)
You need to include "stackwalk" [1] in the start command, e.g., ./profile.sh -f stackwalk,js
[1] https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler#Stackwalk
Comment 50•9 years ago
|
||
Wilson, the clock_gettime() you asked me yesterday is the regression from bug 1166207 and Cervantes is fixing it.
Updated•9 years ago
|
blocking-b2g: --- → 2.5+
Updated•9 years ago
|
Flags: needinfo?(dbaron)
Comment 51•9 years ago
|
||
Making it a P1. This bug is a must have for Music App, and as listed, it makes the app 10 times slower.
Priority: -- → P1
Comment 52•9 years ago
|
||
Given that this is a meta bug, moving from DOM to General.
Component: DOM → General
Comment 53•9 years ago
|
||
After Cervantes fixes bug 1166207, does Music app still have performance regression bugs like what comment 51 mentions?
Flags: needinfo?(hkoka)
Whiteboard: [platform]
Reporter | ||
Comment 54•9 years ago
|
||
(In reply to Ken Chang[:ken] from comment #53)
> After Cervantes fixes bug 1166207, does Music app still have performance
> regression bugs like what comment 51 mentions?
We've worked around the issue by using a tip from :smaug that gives us synchronous postMessage-like behaviour using window.dispatchEvent().
As I said this is a 'workaround' and masks the problem. Our postMessage task is just one in a long queue of 'runnables' during document load. This means message can't be delivered until the other tasks ahead of it have completed.
To speed this up we have some options:
- Do less stuff on startup (tricky when you have UI to draw)
- Move some of these blocking tasks of main-thread (dependencies of this bug tracking some of these efforts)
- Somehow prioritise postMessage tasks over others on the main-thread
We'd need someone with in-depth platform knowledge in this area to take this further.
Updated•9 years ago
|
Keywords: meta
Summary: postMessage during document load is slow → [meta] postMessage during document load is slow
Comment 55•9 years ago
|
||
(In reply to Ken Chang[:ken] from comment #53)
> After Cervantes fixes bug 1166207, does Music app still have performance
> regression bugs like what comment 51 mentions?
As Wilson mentioned in the recent comment, we have a workaround and as a result this bug will not block Music app for 2.5.
I am Ni'ing Francisco also here to make sure Messages and Contacts are not affected by this one.
Thanks
hema
Flags: needinfo?(hkoka) → needinfo?(fjordano)
Updated•9 years ago
|
Updated•9 years ago
|
Flags: needinfo?(fjordano) → needinfo?(francisco)
Comment 56•9 years ago
|
||
I've been double checking with Julien and Oleg and they confirm they are not using the postMessage until visually completed.
This will change after 2.5, but definitely we won't be blocking here for the branching.
Flags: needinfo?(francisco)
Comment 57•9 years ago
|
||
(In reply to Francisco Jordano [:arcturus] [:francisco] from comment #56)
> I've been double checking with Julien and Oleg and they confirm they are not
> using the postMessage until visually completed.
>
> This will change after 2.5, but definitely we won't be blocking here for the
> branching.
Based on the above and the implementation of workaround, this is not currently blocking for Nov 2nd RA for 2.5 release. However, please put this on the priority list for the next release.
Thanks
Hema
blocking-b2g: 2.5+ → ---
Comment 58•9 years ago
|
||
[Tracking Requested - why for this release]:
Adding the tracking flag as this is a Priority to handle post 2.5
tracking-b2g:
--- → backlog
Comment 59•6 years ago
|
||
Decreasing the priority as no update for the last 2 years on this bug.
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage
about the priority meaning.
Priority: P1 → P5
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•