Network loads seem to be started very late in this profile
Categories: Core :: DOM: Service Workers, defect, P2
People: Reporter: mstange, Assigned: smaug
References: Depends on 2 open bugs, Blocks 1 open bug
Keywords: perf:pageload; Whiteboard: [necko-triaged]
Attachments: 4 files, 2 obsolete files
Attachment (deleted): patch
Attachment (deleted): patch
Attachment (deleted): patch (baku: feedback+)
Attachment (deleted): text/x-phabricator-request
Please take a look at the network charts in this profile: http://bit.ly/2WgurnM, or in this profile: http://bit.ly/2WaMOL0
These are profiles from bug 1522311.
It seems to me that what's happening here is that the content process knows about network loads it wants to do many seconds before the parent process knows about them. And then the actual network loads start extremely late, long after they could have started. In fact, the network is idle during most of the time when the user is looking at a blank screen.
What causes these network loads to start in the end? It looks like it's not happening until the content process main thread becomes available. Do these loads require a jump through the content process event loop before they get forwarded to the parent process?
Comment hidden (obsolete)
Comment 2 • 6 years ago (Assignee)
http://bit.ly/2DsPtIw smells like service workers, but I could be very wrong.
Comment 3 • 6 years ago (Assignee)
Is this possibly a regression from https://bug1300659.bmoattachments.org/attachment.cgi?id=8808282 or did that make the situation worse?
Updated • 6 years ago (Assignee)
Comment 4 • 6 years ago
I may sound a little pushy, but I think BT would clearly help here. If you are using a custom build, I would love to help make the BT patch work and try to get something out of the instrumented build.
Comment 5 • 6 years ago
To provide some general SW context:
- Service Worker interception currently happens in the child, although the e10s refactor will change that and is nearly done.
- The parent only gets told about network requests if the ServiceWorker decides not to intercept.
- The interception gets reset on the main thread (in the content process) because that's where the network request was happening in the first place.
- fetch() calls made from workers do get bounced off the main thread of the content process
And to provide some context from the imgur SW:
- It appears to implement a "fetch" event handler, so the SW has to be spun up for all requests if it's not already spun up. (It actually appears to implement 2 "fetch" handlers? The minified/mangled source doesn't make it super clear what's going on when trivially beautified.)
- It appears to use pass-through fetch, which means it effectively ends up doing event.respondWith(fetch(event.request)). That means the Response (headers) has to come down to the content process main thread, then bounce to the SW to resolve the promise, then get bounced back to the main thread to actually get the data flowing. The good news is the input stream stuff should end up efficiently streaming, although things could be further improved.
So in general, if the main thread in the content process is experiencing a lot of contention, yeah, it's a problem.
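To make the pass-through pattern concrete, here is a minimal sketch (not imgur's actual code). In a real service worker the browser provides `self.addEventListener` and `fetch`; both are stubbed here (`self_`, `fetch_`, `intercept` are invented names) so the control flow can be run and traced anywhere.

```javascript
// Stub the SW globals so this runs outside a browser.
const listeners = {};
const self_ = { addEventListener: (type, fn) => { listeners[type] = fn; } };
const fetch_ = async (request) => ({ url: request.url, status: 200 });

self_.addEventListener("fetch", (event) => {
  // Pass-through: the Response headers come down to the content
  // process main thread, bounce to the SW to resolve this promise,
  // then bounce back to the main thread before body data flows.
  event.respondWith(fetch_(event.request));
});

// Drive one intercepted request through the handler.
function intercept(url) {
  let responded;
  listeners["fetch"]({
    request: { url },
    respondWith: (p) => { responded = p; },
  });
  return responded; // a Promise for the stubbed Response
}

intercept("https://i.imgur.com/example.jpg")
  .then((r) => console.log(r.status)); // logs 200
```

Each of the "bounces" in the comment is a main-thread round trip, which is why contention there stalls every intercepted load.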
Comment 6 • 6 years ago
And to answer :smaug's question, it's certainly possible that the interactions of a highly contended main thread and the throttled event queue could make things worse.
Comment 7 • 6 years ago (Reporter)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> So in general, if the main thread in the content process is experiencing a lot of contention, yeah, it's a problem.
As a rule of thumb, the main thread of the content process is always contended. If we are round-tripping through it, we add arbitrary delays to everything. I think it's extremely important to address this problem.
Comment 8 • 6 years ago
(In reply to Markus Stange [:mstange] from comment #7)
> As a rule of thumb, the main thread of the content process is always contended. If we are round-tripping through it, we add arbitrary delays to everything. I think it's extremely important to address this problem.
Yes, this is a known issue. If you'd like to help, please take a look at bug 1330826 and bug 1351231. Thanks!
Comment 9 • 6 years ago (Reporter)
Thanks for the pointers! And thanks for figuring out what causes the problem on this page.
Updated • 6 years ago (Reporter)
Comment 10 • 6 years ago
Talking with :smaug on IRC right now, I should also mention that the SW also seemed to potentially be doing Cache API lookups. These happen via PBackground and can 100% be impacted by PBackground contention (WebRTC can block this for dumb reasons) or disk I/O contention and would also stall the pass-through fetches. In the short term I'm hoping to add MOZ_LOG stuff for ServiceWorkers and the Cache API which can help get us more profiler markers with :jesup's show-MOZ_LOG-as-markers fix thing to better understand rather than guess. (Has that landed?)
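The Cache-API-then-network pattern described above can be illustrated with a small hypothetical sketch. The `cachesStub` and `networkFetch` names are invented stand-ins for the real SW globals `caches` and `fetch`; in the real implementation every `caches.match()` is a round trip over PBackground.

```javascript
// Stub of the Cache API with one pre-populated entry.
const cachesStub = {
  store: new Map([
    ["https://example.com/app.js", { status: 200, fromCache: true }],
  ]),
  async match(url) { return this.store.get(url); },
};
const networkFetch = async (url) => ({ status: 200, fromCache: false });

async function respond(url) {
  // Even a cache hit pays the PBackground round trip (and possibly
  // disk I/O); a miss falls through to the network fetch.
  return (await cachesStub.match(url)) || networkFetch(url);
}

respond("https://example.com/app.js")
  .then((r) => console.log(r.fromCache)); // logs true
```

The point of the comment is that both branches of `respond` sit on contended channels: the cache branch on PBackground and disk, the network branch on the main-thread fetch path.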
Updated • 6 years ago (Assignee)
Comment 11 • 6 years ago
ytausky, would this be interesting for you to work on? The idea is probably to have two ways to dispatch worker runnables: with and without the throttleable queue. For networking runnables we should not use the throttleable queue.
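A toy model of the two dispatch paths this suggests (all names invented, not Gecko's actual classes): a throttleable queue for low-urgency worker runnables, and a direct queue that networking runnables would use so they are never stuck behind throttled work.

```javascript
class WorkerRunnableDispatcher {
  constructor() {
    this.direct = [];
    this.throttleable = [];
  }
  dispatch(runnable, { throttleable = true } = {}) {
    (throttleable ? this.throttleable : this.direct).push(runnable);
  }
  // The main thread drains the direct queue first; throttleable
  // runnables only run once no direct work remains.
  drainAll() {
    const ran = [];
    while (this.direct.length || this.throttleable.length) {
      const q = this.direct.length ? this.direct : this.throttleable;
      ran.push(q.shift()());
    }
    return ran;
  }
}

const d = new WorkerRunnableDispatcher();
d.dispatch(() => "console-api-message");                   // throttleable
d.dispatch(() => "open-channel", { throttleable: false }); // networking
console.log(d.drainAll()); // ["open-channel", "console-api-message"]
```

Under this scheme a network-open runnable dispatched late still runs before any backlog of throttleable messages, which is the behavior the bug is after.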
Comment 12 • 6 years ago
The priority flag is not set for this bug.
:selenamarie, could you have a look please?
Updated • 6 years ago
Updated • 6 years ago
Comment 13 • 6 years ago (Assignee)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=de6e79b3125ed00fa523c798d2e94973899c5b0d
Because of the debugger stuff, the queue handling in workers is a tad weird.
Normal postMessage always uses mMainThreadDebuggeeEventTarget.
So this messaging queue is in fact used only by the console API for now.
Comment 14 • 6 years ago (Assignee)
remote: View your change here:
remote: https://hg.mozilla.org/try/rev/74d184574e47797bc0a8414b6acf6c3692c2a018
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=74d184574e47797bc0a8414b6acf6c3692c2a018
remote: recorded changegroup in replication log in 0.041s
Comment 15 • 6 years ago
Sorry for not being responsive, I just came back from vacation. I'm assuming :smaug is working on this now? Please ping me again if I should still take over.
Comment 16 • 6 years ago (Assignee)
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=da1222000d145e9979b01d9d25739129b711259f
remote: recorded changegroup in replication log in 0.048s
push complete
Comment 17 • 6 years ago (Assignee)
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9361b86d35c305d1c1771b91639bfd5391409b42
Comment 18 • 6 years ago (Assignee)
"ForMessaging" is a bit of a weird name currently, but that is mostly because postMessage ends up using the mMainThreadDebuggeeEventTarget queue (which is super weird).
There are two perhaps unexpected cases:
(1) LifeCycleEventWatcher uses the lower-priority queue so that .installing gets set correctly on the main thread (and the case is sort of about messaging).
(2) TopLevelWorkerFinishedRunnable uses the lower-priority queue so that all the other runnables dispatched during shutdown (some of which are dispatched without worker queues) are handled first.
Workers' event loop handling is a tad weird even without the patch. Some of the runnables go to the throttleable queue, meaning basically that they have lower priority, but some runnables bypass that and just use the normal NS_DispatchToMainThread. That feels error-prone.
The patch should let us create certain objects which require the main thread sooner, or open network connections earlier, etc. But there is still main-thread usage, and the main thread is mostly blocked during the whole page load, so eventually we should remove all the unneeded main-thread usage.
Comment 19 • 6 years ago
Comment 20 • 6 years ago (Assignee)
try push, pref off
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6061f4bae4a336c7ed99e19e0be6c51af148613
Comment 21 • 6 years ago (Assignee)
try push, pref on
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=218bc59a2959982880545c1a9da430fc597251f6
remote: recorded changegroup in replication log in 0.046s
Comment 22 • 6 years ago (Assignee)
If the main thread is busy handling runnables in the normal-priority queue, control-type messages from workers may be postponed to run after those. That can lead to bad performance if the page expects workers to be able to proceed simultaneously with the main thread.
This patch makes the control messages use the medium-high-priority queue in order to try to ensure they are handled in a timely manner.
The pref dom.worker.use_medium_high_event_queue can be set to false to disable this new behavior.
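A sketch of the ordering idea in this patch. The priority labels echo some of Gecko's event-queue priority names, but the queue class itself and its method names are invented for illustration: a MediumHigh bucket lets worker control runnables run ahead of Normal-priority page-load work without outranking High-priority events.

```javascript
// Simplified subset of priority levels, highest first.
const ORDER = ["High", "MediumHigh", "Normal", "Low"];

class PrioritizedQueue {
  constructor() {
    this.buckets = new Map(ORDER.map((p) => [p, []]));
  }
  dispatch(priority, runnable) {
    this.buckets.get(priority).push(runnable);
  }
  // Run one runnable from the highest non-empty bucket.
  runNext() {
    for (const p of ORDER) {
      const q = this.buckets.get(p);
      if (q.length) return q.shift()();
    }
    return null;
  }
}

const mq = new PrioritizedQueue();
mq.dispatch("Normal", () => "pageload-task");
mq.dispatch("MediumHigh", () => "worker-control-message");
console.log(mq.runNext()); // "worker-control-message"
```

With the pref enabled, worker control messages land in the MediumHigh bucket instead of Normal, so a busy page-load backlog no longer delays them.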
Comment 23 • 6 years ago
Comment 24 • 6 years ago (bugherder)
Updated • 6 years ago
Updated • 3 years ago