Network loads seem to be started very late in this profile
Categories: Core :: DOM: Service Workers, defect, P2
People: Reporter: mstange, Assigned: smaug
References: Depends on 2 open bugs, Blocks 1 open bug
Keywords: perf:pageload; Whiteboard: [necko-triaged]
Attachments: 4 files, 2 obsolete files
Attachment (deleted): patch
Attachment (deleted): patch
Attachment (deleted): patch (baku: feedback+)
Attachment (deleted): text/x-phabricator-request
Please take a look at the network charts in this profile: http://bit.ly/2WgurnM, or in this profile: http://bit.ly/2WaMOL0
These are profiles from bug 1522311.
It seems to me that what's happening here is that the content process knows about network loads it wants to do many seconds before the parent process knows about them. And then the actual network loads start extremely late, long after they could have started. In fact, the network is idle during most of the time when the user is looking at a blank screen.
What causes these network loads to start in the end? It looks like it's not happening until the content process main thread becomes available. Do these loads require a jump through the content process event loop before they get forwarded to the parent process?
Comment hidden (obsolete)
Comment 2 • 6 years ago (Assignee)
http://bit.ly/2DsPtIw smells like service workers, but I could be very wrong.
Comment 3 • 6 years ago (Assignee)
Is this possibly a regression from https://bug1300659.bmoattachments.org/attachment.cgi?id=8808282 or did that make the situation worse?
Updated • 6 years ago (Assignee)
Comment 4 • 6 years ago
I may sound a little pushy, but I think BT would clearly help here. If you are using a custom build, I would love to help make the BT patch work and try to get something out of the instrumented build.
Comment 5 • 6 years ago
To provide some general SW context:
- Service Worker interception currently happens in the child, although the e10s refactor will change that and is nearly done.
- The parent only gets told about network requests if the ServiceWorker decides not to intercept.
- The interception gets reset on the main thread (in the content process) because that's where the network request was happening in the first place.
- fetch() calls made from workers do get bounced off the main thread of the content process
And to provide some context from the imgur SW:
- It appears to implement a "fetch" event handler, so the SW has to be spun up for all requests if it's not already spun up. (It actually appears to implement 2 "fetch" handlers? The minified/mangled source doesn't make it super clear what's going on when trivially beautified.)
- It appears to use pass-through fetch, which means it effectively ends up doing event.respondWith(fetch(event.request)). That means the Response (headers) has to come down to the content process main thread, then bounce to the SW to resolve the promise, then get bounced back to the main thread to actually get the data flowing. The good news is the input stream stuff should end up efficiently streaming, although things could be further improved.
So in general, if the main thread in the content process is experiencing a lot of contention, yeah, it's a problem.
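To make the pass-through pattern concrete, here is a minimal sketch (not imgur's actual code). In a real service worker the browser provides `self.addEventListener` and `fetch`; both are stubbed here (`self_`, `fetch_`, `intercept` are invented names) so the control flow can be run and traced anywhere.

```javascript
// Stub the SW globals so this runs outside a browser.
const listeners = {};
const self_ = { addEventListener: (type, fn) => { listeners[type] = fn; } };
const fetch_ = async (request) => ({ url: request.url, status: 200 });

self_.addEventListener("fetch", (event) => {
  // Pass-through: the Response headers come down to the content
  // process main thread, bounce to the SW to resolve this promise,
  // then bounce back to the main thread before body data flows.
  event.respondWith(fetch_(event.request));
});

// Drive one intercepted request through the handler.
function intercept(url) {
  let responded;
  listeners["fetch"]({
    request: { url },
    respondWith: (p) => { responded = p; },
  });
  return responded; // a Promise for the stubbed Response
}

intercept("https://i.imgur.com/example.jpg")
  .then((r) => console.log(r.status)); // logs 200
```

Each of the "bounces" in the comment is a main-thread round trip, which is why contention there stalls every intercepted load.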
Comment 6 • 6 years ago
And to answer :smaug's question, it's certainly possible that the interactions of a highly contended main thread and the throttled event queue could make things worse.
Comment 7 • 6 years ago (Reporter)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> So in general, if the main thread in the content process is experiencing a lot of contention, yeah, it's a problem.
As a rule of thumb, the main thread of the content process is always contended. If we are round-tripping through it, we add arbitrary delays to everything. I think it's extremely important to address this problem.
Comment 8 • 6 years ago
(In reply to Markus Stange [:mstange] from comment #7)
> As a rule of thumb, the main thread of the content process is always contended. If we are round-tripping through it, we add arbitrary delays to everything. I think it's extremely important to address this problem.
Yes, this is a known issue. If you'd like to help, please take a look at bug 1330826 and bug 1351231. Thanks!
Comment 9 • 6 years ago (Reporter)
Thanks for the pointers! And thanks for figuring out what causes the problem on this page.
Updated • 6 years ago (Reporter)
Comment 10 • 6 years ago
Talking with :smaug on IRC right now, I should also mention that the SW also seemed to potentially be doing Cache API lookups. These happen via PBackground and can 100% be impacted by PBackground contention (WebRTC can block this for dumb reasons) or disk I/O contention and would also stall the pass-through fetches. In the short term I'm hoping to add MOZ_LOG stuff for ServiceWorkers and the Cache API which can help get us more profiler markers with :jesup's show-MOZ_LOG-as-markers fix thing to better understand rather than guess. (Has that landed?)
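The Cache-API-then-network pattern described above can be illustrated with a small hypothetical sketch. The `cachesStub` and `networkFetch` names are invented stand-ins for the real SW globals `caches` and `fetch`; in the real implementation every `caches.match()` is a round trip over PBackground.

```javascript
// Stub of the Cache API with one pre-populated entry.
const cachesStub = {
  store: new Map([
    ["https://example.com/app.js", { status: 200, fromCache: true }],
  ]),
  async match(url) { return this.store.get(url); },
};
const networkFetch = async (url) => ({ status: 200, fromCache: false });

async function respond(url) {
  // Even a cache hit pays the PBackground round trip (and possibly
  // disk I/O); a miss falls through to the network fetch.
  return (await cachesStub.match(url)) || networkFetch(url);
}

respond("https://example.com/app.js")
  .then((r) => console.log(r.fromCache)); // logs true
```

The point of the comment is that both branches of `respond` sit on contended channels: the cache branch on PBackground and disk, the network branch on the main-thread fetch path.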
Updated • 6 years ago (Assignee)
Comment 11 • 6 years ago
ytausky, would this be interesting for you to work on? The idea is probably to have two ways to dispatch worker runnables: with and without the throttleable queue. For networking runnables we should not use the throttleable queue.
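A toy model of the two dispatch paths this suggests (all names invented, not Gecko's actual classes): a throttleable queue for low-urgency worker runnables, and a direct queue that networking runnables would use so they are never stuck behind throttled work.

```javascript
class WorkerRunnableDispatcher {
  constructor() {
    this.direct = [];
    this.throttleable = [];
  }
  dispatch(runnable, { throttleable = true } = {}) {
    (throttleable ? this.throttleable : this.direct).push(runnable);
  }
  // The main thread drains the direct queue first; throttleable
  // runnables only run once no direct work remains.
  drainAll() {
    const ran = [];
    while (this.direct.length || this.throttleable.length) {
      const q = this.direct.length ? this.direct : this.throttleable;
      ran.push(q.shift()());
    }
    return ran;
  }
}

const d = new WorkerRunnableDispatcher();
d.dispatch(() => "console-api-message");                   // throttleable
d.dispatch(() => "open-channel", { throttleable: false }); // networking
console.log(d.drainAll()); // ["open-channel", "console-api-message"]
```

Under this scheme a network-open runnable dispatched late still runs before any backlog of throttleable messages, which is the behavior the bug is after.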
Comment 12 • 6 years ago
The priority flag is not set for this bug.
:selenamarie, could you have a look please?
Updated • 6 years ago
Updated • 6 years ago
Comment 13 • 6 years ago (Assignee)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=de6e79b3125ed00fa523c798d2e94973899c5b0d
Because of the debugger stuff, the queue handling in workers is a tad weird.
Normal postMessage always uses mMainThreadDebuggeeEventTarget.
So this messaging queue is in fact used only by the console API for now.
Comment 14 • 6 years ago (Assignee)
remote: View your change here:
remote: https://hg.mozilla.org/try/rev/74d184574e47797bc0a8414b6acf6c3692c2a018
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=74d184574e47797bc0a8414b6acf6c3692c2a018
remote: recorded changegroup in replication log in 0.041s
Comment 15 • 6 years ago
Sorry for not being responsive, I just came back from vacation. I'm assuming :smaug is working on this now? Please ping me again if I should still take over.
Comment 16 • 6 years ago (Assignee)
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da1222000d145e9979b01d9d25739129b711259f
remote: recorded changegroup in replication log in 0.048s
push complete
Comment 17 • 6 years ago (Assignee)
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9361b86d35c305d1c1771b91639bfd5391409b42
Comment 18 • 6 years ago (Assignee)
"ForMessaging" is a bit of a weird name currently, but that is mostly because postMessage ends up using the mMainThreadDebuggeeEventTarget queue (which is super weird).
There are two perhaps unexpected cases:
(1) LifeCycleEventWatcher uses the lower-priority queue so that .installing gets set correctly on the main thread (and the case is sort of about messaging).
(2) TopLevelWorkerFinishedRunnable uses the lower-priority queue so that all the other runnables dispatched during shutdown (some of which are dispatched without worker queues) are handled first.
Workers' event loop handling is a tad weird even without the patch. Some of the runnables go to the throttleable queue, meaning basically that they have lower priority, but some runnables bypass that and just use the normal NS_DispatchToMainThread. That feels error-prone.
The patch should let us create certain objects which require the main thread sooner, or open network connections earlier, etc. But there is still main-thread usage, and the main thread is mostly blocked during the whole page load, so eventually we should remove all the unneeded main-thread usage.
Comment 19 • 6 years ago
Comment 20 • 6 years ago (Assignee)
try push, pref off
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6061f4bae4a336c7ed99e19e0be6c51af148613
Comment 21 • 6 years ago (Assignee)
try push, pref on
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=218bc59a2959982880545c1a9da430fc597251f6
remote: recorded changegroup in replication log in 0.046s
Comment 22 • 6 years ago (Assignee)
If the main thread is busy handling runnables in the normal-priority queue, control-type messages from workers may be postponed to run after those. That can lead to bad performance if the page expects workers to be able to proceed simultaneously with the main thread.
This patch makes the control messages use the medium-high-priority queue in order to try to ensure they are handled in a timely manner.
The pref dom.worker.use_medium_high_event_queue can be set to false to disable this new behavior.
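A sketch of the ordering idea in this patch. The priority labels echo some of Gecko's event-queue priority names, but the queue class itself and its method names are invented for illustration: a MediumHigh bucket lets worker control runnables run ahead of Normal-priority page-load work without outranking High-priority events.

```javascript
// Simplified subset of priority levels, highest first.
const ORDER = ["High", "MediumHigh", "Normal", "Low"];

class PrioritizedQueue {
  constructor() {
    this.buckets = new Map(ORDER.map((p) => [p, []]));
  }
  dispatch(priority, runnable) {
    this.buckets.get(priority).push(runnable);
  }
  // Run one runnable from the highest non-empty bucket.
  runNext() {
    for (const p of ORDER) {
      const q = this.buckets.get(p);
      if (q.length) return q.shift()();
    }
    return null;
  }
}

const mq = new PrioritizedQueue();
mq.dispatch("Normal", () => "pageload-task");
mq.dispatch("MediumHigh", () => "worker-control-message");
console.log(mq.runNext()); // "worker-control-message"
```

With the pref enabled, worker control messages land in the MediumHigh bucket instead of Normal, so a busy page-load backlog no longer delays them.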
Comment 23 • 6 years ago
Comment 24 • 6 years ago (bugherder)
Updated • 6 years ago
Updated • 3 years ago