IPC profiling should differentiate between IPC overhead and event loop delay
Categories
(Core :: Gecko Profiler, enhancement, P3)
Tracking
firefox80: fixed
People
(Reporter: mstange, Assigned: squib)
References
Details
Attachments
(1 file, 1 obsolete file)
Here's a profile with IPC information: https://perfht.ml/2v8P502
In that profile, many of the IPCIn and IPCOut markers take upwards of 10ms. This can be very misleading! Users of the profiler might assume that this means the messages took a long time just to cross the process boundary. But that's not what's happening! Instead, these durations are defined as follows:
- The start time is when the SendXXX function was called on the origin thread.
- The end time is when the RecvXXX function is called on the destination thread.
And the latter only happens once the destination thread becomes available. So we're seeing the sum of the time spent in IPC overhead and the time spent waiting in the target thread's event loop for existing work and other pending tasks to finish.
We really need a breakdown of these two numbers, because these two phases are actionable in very different ways.
The raw IPC overhead is not something we can do a lot about other than by reworking the IPC system.
But the event loop delay is very actionable, in the following ways:
- Events can be reordered by assigning different IPC message priorities.
- The work that the thread is busy with can potentially be sped up, removed, or moved to a different time or thread.
- The message can be targeted at a different thread that is less busy.
Until we have this breakdown, "IPC profiling" can really only be used to analyze event loop loads and message ordering, and not to analyze IPC overhead.
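To make the requested breakdown concrete, here's a minimal sketch (illustrative only; the struct and field names are made up, not real Gecko code) that assumes the profiler also recorded a timestamp for when the message was enqueued on the destination thread's event loop:

```cpp
// Illustrative only: not real Gecko code. Assumes a hypothetical extra
// timestamp, enqueueTime, recorded when the message lands on the destination
// thread's event loop, in addition to the existing send/recv endpoints.
#include <cstdio>

struct IPCMarkerSpan {
  double sendTime;     // SendXXX called on the origin thread (ms)
  double enqueueTime;  // message enqueued on the destination thread (ms)
  double recvTime;     // RecvXXX starts running on the destination thread (ms)
};

int main() {
  IPCMarkerSpan span{0.0, 1.2, 11.7};

  double total = span.recvTime - span.sendTime;           // what markers show today
  double ipcOverhead = span.enqueueTime - span.sendTime;  // crossing the boundary
  double eventLoopDelay = span.recvTime - span.enqueueTime;  // waiting in the queue

  std::printf("total=%.1fms ipc=%.1fms event loop=%.1fms\n", total,
              ipcOverhead, eventLoopDelay);
  return 0;
}
```

With a third timestamp along these lines, the two phases could be shown as separate spans instead of one opaque duration.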
Assignee
Comment 1•5 years ago
Hmm, how important is it to measure the IPC overhead? I admit I chose somewhat arbitrarily where we record each side of the message, and I could see an argument for simply moving the logging to a more appropriate spot. On the other hand, it's not that much more work to log both (though it probably raises the priority of implementing a better UI in perf.html).
I had hoped that we could just assume that IPC overhead is (roughly) constant and approximate the event loop time by taking currentDuration - minimumDuration for a given message, but maybe that's overly optimistic...
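For what it's worth, a minimal sketch of that approximation (hypothetical data and names, not profiler code), treating the fastest observed duration for a message type as the constant IPC overhead:

```cpp
// Illustrative only: approximate event loop delay for one message type by
// assuming the raw IPC overhead is roughly constant and equal to the fastest
// duration observed for that message type. Not real profiler code.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  // Observed total durations (ms) for several instances of one message type.
  std::vector<double> durations{0.4, 0.5, 7.9, 12.3, 0.6};

  double minimumDuration =
      *std::min_element(durations.begin(), durations.end());
  for (double currentDuration : durations) {
    double approxEventLoopDelay = currentDuration - minimumDuration;
    std::printf("duration=%.1fms -> approx event loop delay=%.1fms\n",
                currentDuration, approxEventLoopDelay);
  }
  return 0;
}
```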
Reporter
Comment 2•5 years ago
(In reply to Jim Porter (:squib) from comment #1)
> I had hoped that we could just assume that IPC overhead is (roughly) constant and approximate the event loop time by taking currentDuration - minimumDuration for a given message, but maybe that's overly optimistic...
I think we don't really know whether this is overly optimistic or not. Having actual data would let us see how much of a problem IPC overhead is in practice and how much it varies. Bas and Randell seem to think it's quite an important value to drive down.
Personally, I'm somewhat optimistic that IPC overhead is not too much of a problem, and I'm hoping that having both values side-by-side will let us see that IPC overhead is usually dwarfed by event loop delay. But I might be wrong.
Comment 3•5 years ago
I would also hope it's dwarfed by event loop delay; Bas seems to have measured cases where it's not, though if he used the current profiler data to do so, those measurements would be inflated by including the event loop delay.
Assignee
Comment 4•5 years ago
Ok, that sounds reasonable to me! As far as prioritizing this, I think we're focusing on Fenix stuff for now, so this will probably happen after that unless it's needed for Fenix-related work. Markus, is this blocking any of your work?
Comment 5•5 years ago
Note that the following queues are involved here:
- the queue connecting the thread calling SendXXX() and the IPC IO thread (ProcessLink::mIOLoop, a MessageLoop queue); it can be blocked by a large IO chunk on the IPC thread, but that doesn't matter that much
- the output queue of the sending channel (platform specific), Channel::ChannelImpl::output_queue_, blocked by all same-or-higher-priority messages already in that queue
- the socket or pipe between the processes, with its own buffers and scheduling
- optionally, the message can be held on the receiving side in MessageChannel::mPending (a MessageQueue holding MessageTasks) when the target thread is synchronously processing a message (or under other conditions I don't completely follow at the moment)
- the queue between the IPC IO thread and the target thread, where the RecvXXX() call is eventually dispatched; this is a normal thread queue (MessageChannel::mListener->GetMessageEventTarget() receiving the MessageTask, an nsIRunnable) or another MessageLoop queue, depending on what the actual target is (my knowledge is weak here)
All of these should be accounted for, or at least the source process's output_queue_ and the target process's target-thread queue.
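One way to picture instrumenting these hand-off points is sketched below; the enum values and helper function are purely hypothetical and not existing Gecko profiler APIs:

```cpp
// Purely hypothetical: RecordIPCStage() and IPCStage are not existing Gecko
// profiler APIs. The idea is to timestamp a message at each hand-off point
// listed above so a profile can show where the time actually went.
#include <chrono>
#include <cstdint>
#include <cstdio>

enum class IPCStage : uint8_t {
  SendCalled,          // SendXXX() on the origin thread
  SenderIOEnqueued,    // handed to the sender's IPC IO thread
  WrittenToChannel,    // left output_queue_ for the socket/pipe
  ReceiverIOReceived,  // read by the receiver's IPC IO thread
  TargetEnqueued,      // MessageTask queued on the target thread
  RecvStarted,         // RecvXXX() starts running
};

// In a real patch this would emit a profiler marker; here it just logs.
void RecordIPCStage(int32_t aSeqno, IPCStage aStage) {
  auto now = std::chrono::steady_clock::now().time_since_epoch();
  auto us = std::chrono::duration_cast<std::chrono::microseconds>(now).count();
  std::printf("msg %d reached stage %d at %lld us\n", aSeqno,
              static_cast<int>(aStage), static_cast<long long>(us));
}

int main() {
  RecordIPCStage(42, IPCStage::SendCalled);
  RecordIPCStage(42, IPCStage::RecvStarted);
  return 0;
}
```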
Reporter
Comment 6•5 years ago
(In reply to Jim Porter (:squib) from comment #4)
> Ok, that sounds reasonable to me! As far as prioritizing this, I think we're focusing on Fenix stuff for now, so this will probably happen after that unless it's needed for Fenix-related work. Markus, is this blocking any of your work?
Sounds good to me! It's not blocking any of my Fenix-related work.
Comment 7•5 years ago
Somewhat related: we discussed the same issue, but with network markers, with Honza during the All Hands.
Assignee
Comment 8•5 years ago
Taking this to work on it.
Assignee
Comment 9•5 years ago
Assignee
Comment 10•5 years ago
Comment on attachment 9140305 [details]
Bug 1613496 - Add IPC profiler markers to track when messages are processed by the IO thread queues
Honza + Markus: Does this look roughly like what you'd expect? It's still a bit sketchy, mainly in how I plan to store this data. Right now I'm just saying "phase 0" is when the IPC message is in SendXXX/RecvXXX, and "phase 1" is when the message is in the IO thread queue. I need to come up with a better name, though.
This also needs some UI work on the profiler frontend, but you should be able to at least see IPC markers in the UI if you enable the "IPC Messages" flag and add "Gecko_IOThread" to the list of profiled threads. We may want some way of automatically profiling that thread when the "IPC Messages" feature is enabled, or go one step further and record profile markers for IPC messages for every thread, no matter whether it's specifically being profiled or not. There might be a bug on that last suggestion already...
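For reference, here is a rough sketch of the kind of per-message payload described above (hypothetical names; the actual patch may differ):

```cpp
// Hypothetical sketch of the marker data described above; names are
// illustrative and may not match the actual patch.
#include <cstdint>
#include <cstdio>
#include <string>

// Which part of the message's journey a marker pair brackets.
enum class MessagePhase : uint8_t {
  Endpoint = 0,          // inside SendXXX on the sender / RecvXXX on the receiver
  TransferringData = 1,  // sitting in an IO thread queue between the endpoints
};

struct IPCMarkerPayload {
  int32_t otherPid;         // process on the other end of the channel
  int32_t messageSeqno;     // lets the frontend pair up matching markers
  std::string messageType;  // e.g. a PContent message name
  MessagePhase phase;       // which phase this marker covers
  bool isSending;           // direction from this process's point of view
};

int main() {
  IPCMarkerPayload payload{1234, 42, "PContent::Msg_Example",
                           MessagePhase::Endpoint, true};
  std::printf("%s phase=%d\n", payload.messageType.c_str(),
              static_cast<int>(payload.phase));
  return 0;
}
```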
Assignee
Comment 11•4 years ago
Just a quick update so folks know where things stand: this patch is just about done, so anyone on POSIX can try it out to see how things work. I've also mostly completed the front-end side here: https://github.com/firefox-devtools/profiler/pull/2535. However, we'll probably want to think about how to present this information in a friendlier way, since it's kind of like drinking from the firehose right now.
Stuff remaining to do in this bug at the time of this comment:
- Windows implementation
- Consider what to do about getting markers for threads not being profiled (especially ones like Gecko_IOThread)
Assignee
Comment 12•4 years ago
Note: This should be rolled into the previous commit before landing.
This pulls some repeated logic into AddIPCProfilerMarker, but it also changes the behavior slightly: now we only emit a marker for messages with a routing ID. This should be even better than the old way, since messages with no routing ID are all special types for internal IPC channel bookkeeping that we don't care about (at least, not right now...).
Depends on D70790
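A hedged sketch of the behavior change described in this commit message (the real AddIPCProfilerMarker in Gecko's IPC code does not look exactly like this; constants and types here are placeholders):

```cpp
// Illustrative only: the real AddIPCProfilerMarker lives in Gecko's IPC code
// and does not look exactly like this. It shows the behavior change described
// above: skip messages without a routing ID, since those are internal channel
// bookkeeping.
#include <cstdint>
#include <cstdio>

// Placeholder; the real constant comes from the IPC message headers.
constexpr int32_t MSG_ROUTING_NONE = -2;

struct Message {
  int32_t routingId;
  uint32_t type;
};

void AddIPCProfilerMarker(const Message& aMsg) {
  if (aMsg.routingId == MSG_ROUTING_NONE) {
    // Special internal messages (channel setup, goodbye, etc.) are skipped.
    return;
  }
  // The real helper would record a profiler marker here; this just logs.
  std::printf("IPC marker for message type %u\n", aMsg.type);
}

int main() {
  AddIPCProfilerMarker({MSG_ROUTING_NONE, 1});  // no marker
  AddIPCProfilerMarker({7, 42});                // marker recorded
  return 0;
}
```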
Comment 13•4 years ago
There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:squib, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee
Comment 14•4 years ago
This is waiting on https://github.com/firefox-devtools/profiler/pull/2535 to land, since landing the backend before the frontend supports it will cause some weird UI.
Also, here is a try build rebased onto a fairly-recent m-c: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0813ed3ad8a3c8d538f38ed230f8d1c2e4a70015
Comment 15•4 years ago
Comment 16•4 years ago
bugherder