Closed Bug 581930 Opened 14 years ago Closed 14 years ago

[E10S] IPC messages overhead in TabChild/Parent progress listener

Category: Core :: IPC, defect
Platform: x86 Linux
Priority: Not set  Severity: normal
Resolution: RESOLVED DUPLICATE of bug 582840
People: (Reporter: romaxa, Unassigned)
Keywords: perf
Attachments: (3 files)

I've found that on a mobile device we load some pages ~2x slower.

I was testing on HS.fi and found two problems:
1) The time between the LoadURL call and the Title signal is ~2x longer on e10s Fennec compared to normal Fennec. Most of that time is spent sending progress messages from the child to the parent.

For hs.fi, between start of loading and the title change we see approximately:
>TabChild:OnStatusChange   - 350 calls
>TabChild:OnStateChange    - 356 calls
>TabChild:OnProgressChange - 118 calls

I think we should do something about it.

2) The second problem is plugins: the time between the title change and the first paint increased (by about 2 seconds) because of plugin initialization and loading...

I think it would be nice to delay plugin loading: paint the initial page content first, and only then start plugin initialization (process launch, etc.).
56  - Progress
237 - OnStateChange::511 state:c0010 - 9 different flags/status combinations
168 - OnStatusChange - 9 different status/message combinations

Here are the timing results:
Normal fennec:
START->TITLE - 4 sec
TITLE->PAINT - 6 sec

E10S fennec:
START->TITLE - 7 sec
TITLE->PAINT - 8 sec

E10S fennec - with "status/state/progress" messages disabled:
START->TITLE - 4.3 sec
TITLE->PAINT - 7.2 sec
I think we must simplify the progress system: send only valid (useful) messages, send them only when the parent is listening for them, and don't repeatedly send the same messages with the same values.
Also, if I disable progress/state/status message sending in TabChild, Fennec still works, and the progress/start/stop messaging still functions because of the content.js script loaded in the child process...

The question is: do we still need this TabChild/TabParent progress messaging system? Is it still in use?
(In reply to comment #4)
> Also, if I disable progress/state/status message sending in TabChild, Fennec
> still works, and the progress/start/stop messaging still functions because of
> the content.js script loaded in the child process...
> 
> The question is: do we still need this TabChild/TabParent progress messaging
> system? Is it still in use?

We actually want to stop using the webprogress message in content.js, if possible. We would ideally only use the TabParent/TabChild webprogress message system.
This information does provide some interesting clues about the effect of sending messages (message load) on web page load times.
Then we should definitely simplify and reduce the amount of messaging between parent and child, because sending IPC messages like

  43 x 'Transferring data from www.hs.fi…'
  45 x 'Waiting for www.hs.fi…'
  54 x 'Waiting for hs.fi…'

within 3 seconds is definitely wrong behavior.
> 
> We actually want to stop using the webprogress message in content.js, if

The content.js approach looks much more flexible to me, because we can filter useless messages on the content side and send only useful status info to the UI over the IPC bus.
Attached file Here are the messages in original order (deleted)
Filtering also will not fully help here: it is possible to filter duplicate messages like this:
...
>Func:OnStateChange::511 state:10001, status:0
>Func:OnStateChange::511 state:10001, status:0
....

but it is not possible to filter these messages universally:
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for hs.fi…'
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for blogit.hs.fi…'
>Func:OnStatusChange::544 status:804b0006, msg:'Transferring data from blogit.hs.fi…'
>Func:OnStateChange::511 state:10004, status:0
>Func:OnProgressChange::529 cs:2561, ms:2561, ct:1159528, mt:1159528
>Func:OnStateChange::511 state:10010, status:0
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for hs.fi…'
>Func:OnStatusChange::544 status:804b000a, msg:'Waiting for blogit.hs.fi…'
>Func:OnStatusChange::544 status:804b0006, msg:'Transferring data from blogit.hs.fi…'
>Func:OnStateChange::511 state:10004, status:0

I think the content-side approach is better suited for this case...


Also, we could create a new browser webListener IPC interface and send only specific messages, like "start", "progress" (at most once per second), and "stop". A sketch of that throttling idea is below.
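(A minimal sketch of the "progress at most once per second" idea, assuming a hypothetical SendNotifyProgressChange and member names; this is not the actual TabChild API, just an illustration of the throttle.)

#include <chrono>

class ProgressThrottle {
public:
  // Returns true if at least one second has passed since the last
  // forwarded progress message; callers send IPC only when this is true.
  bool ShouldSend() {
    auto now = std::chrono::steady_clock::now();
    if (now - mLastSend < std::chrono::seconds(1))
      return false;            // drop intermediate progress updates
    mLastSend = now;
    return true;
  }

private:
  std::chrono::steady_clock::time_point mLastSend{};
};

// Usage (hypothetical): in OnProgressChange, forward at most ~1 message/sec:
//   if (mThrottle.ShouldSend())
//     SendNotifyProgressChange(aCurTotal, aMaxTotal);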
These are async messages, right? 350 async messages shouldn't really show up as a factor. 350 sync messages sure would, though.
(In reply to comment #10)
> These are async messages, right? 350 async messages shouldn't really show up as
> a factor. 350 sync messages sure would, though.

The progress messages sent in the Fennec code (chrome/content/bindings/browser.js) are all async.

(In reply to comment #7)
>   43 x 'Transferring data from www.hs.fi…'
>   45 x 'Waiting for www.hs.fi…'
>   54 x 'Waiting for hs.fi…'

We should not even send those messages, since Fennec does not display any status messages. We really need a way to make the filter (set via WebProgress.addProgressListener) work in the child.
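(A rough sketch of what honoring that filter in the child could look like; the mask values and surrounding names are illustrative placeholders, not the actual nsIWebProgress constants.)

#include <cstdint>

// Placeholder notification-mask bits; the real constants live on
// nsIWebProgress, these values are illustrative only.
enum NotifyFlags : uint32_t {
  NOTIFY_PROGRESS = 1u << 0,
  NOTIFY_STATUS   = 1u << 1,
  NOTIFY_STATE    = 1u << 2,
};

struct ChildProgressFilter {
  uint32_t mNotifyMask = 0;   // what the parent actually asked for

  bool WantsStatus() const   { return (mNotifyMask & NOTIFY_STATUS) != 0; }
  bool WantsProgress() const { return (mNotifyMask & NOTIFY_PROGRESS) != 0; }
};

// Usage (hypothetical): in the child-side OnStatusChange, bail out early
// instead of sending an IPC message the parent will never display:
//   if (!mFilter.WantsStatus())
//     return NS_OK;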
(In reply to comment #10)
> These are async messages, right? 350 async messages shouldn't really show up as
> a factor. 350 sync messages sure would, though.

We have a mobile device with a single processor, and the number of messages per second on the IPC channel causes a noticeable perf hit on the CPU, bus, etc...
We need to know much more precisely what this means: sending an async message should be really cheap, so that you can pass thousands per second with minimal overhead.  The sequence of actions involved is:

1) content process main thread posts to the I/O event queue: one lock, thread continues immediately
2) context-switch to the I/O thread
3) I/O thread locks and pulls events off the queue, posts them to the IPC pipe. If there are multiple enqueued messages they can all be processed in one go

4) chrome process I/O thread unblocks, receives messages, posts them to the main-thread event queue (one lock)
5) chrome process main thread receives messages as part of normal event processing

If you schedule all four threads involved for every one of those messages, that's a lot of context switching. But that *shouldn't* be the case, and if it is we should figure out how to just fix it. If you're at all CPU-bound, multiple progress events can be batched at almost any of these junction points, so that you can process multiple events in one thread context. That's one of the many advantages of asynchronous processing.
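(Below is a self-contained sketch of that batching pattern, not Mozilla's actual I/O-thread code: the producer enqueues messages one at a time, and the consumer drains everything that has accumulated in a single wakeup, so many messages cost roughly one context switch rather than one each.)

#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy message queue: the producer ("content main thread") enqueues,
// the consumer ("I/O thread") drains everything available per wakeup.
class BatchedQueue {
public:
  void Post(std::string msg) {
    {
      std::lock_guard<std::mutex> lock(mMutex);   // one lock, then continue
      mQueue.push_back(std::move(msg));
    }
    mCond.notify_one();
  }

  // Blocks until at least one message is queued, then returns all of them.
  std::vector<std::string> TakeAll() {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [this] { return !mQueue.empty(); });
    std::vector<std::string> batch(mQueue.begin(), mQueue.end());
    mQueue.clear();
    return batch;
  }

private:
  std::mutex mMutex;
  std::condition_variable mCond;
  std::deque<std::string> mQueue;
};

int main() {
  BatchedQueue queue;
  std::thread producer([&] {
    for (int i = 0; i < 1000; ++i)
      queue.Post("progress " + std::to_string(i));
    queue.Post("done");
  });

  int received = 0, wakeups = 0;
  bool done = false;
  while (!done) {
    for (auto& msg : queue.TakeAll()) {   // whole batch per wakeup
      ++received;
      if (msg == "done") done = true;
    }
    ++wakeups;
  }
  producer.join();
  std::cout << received << " messages in " << wakeups << " wakeups\n";
}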
Probably the problem here is not the large number of messages, but the message content...
I've made a simple filter on OnStateChange:
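+  // Only forward the state change if the flags differ from the previous notification.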
+  if (mPrevStateFlags != aStateFlags) {
+    SendNotifyStateChange(aStateFlags, aStatus);
+    mPrevStateFlags = aStateFlags;
+  }

With that filter and StatusChange/ProgressChange disabled: 7 sec -> 4.5 sec.

If I enable ProgressChange messages, it goes from 4.5 sec -> 5.3 sec.
If I also enable StatusChange messages (string content), it goes from 5.3 sec -> 7 sec.
Added bug 581958 to help filter webprogress messages
One general note: no matter how fast our IPC code is, sending messages will never be free (obviously).  We shouldn't be sending unnecessary or superfluous messages, just like we shouldn't be delivering unnecessary or superfluous observer notifications, running unnecessary or superfluous timers, etc.

To investigate perf here more, we need a few things.  First, the attached patch, which gives us a rough idea of the best-case raw perf of our C++ IPC code.  This test is best-case because it runs with a quiet event loop, no XPCOM.  For reference, below are some numbers I get on my desktop machine (16 HW thread Xeon X5570 @ 2.93GHz, tons of RAM) with an opt build.  I see rather high variance test-to-test, it's worth trying a few times; these numbers seem to be fairly typical.  (Note that you need to build --enable-ipdl-tests to run this.)

$ ~/mozilla/mp-opt/dist/bin/run-mozilla.sh ~/mozilla/mp-opt/dist/bin/ipdlunittest  TestLatency
TEST-INFO| running test | TestLatency
  timing resolution: 9e-08 seconds
[snip]
TEST-PASS | TestLatency | 
  average #ping-pong/sec:        13245.5
  average #ping5-pong5/sec:      7180.98
  average #RPC call-answer/sec:  25095.7
  average #spams/sec:            537804

The most relevant numbers here are "#RPC call-answer/sec", which tests making an RPC parent->child in a tight loop ('sync' would be about the same), and "#spams/sec", sending async messages parent->child in a tight loop.  We're doing 25k/sec and 500k/sec respectively on my machine.  I would expect these numbers to be about an order of magnitude lower on the N900.
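(For reference, this is the general shape of such a tight-loop throughput measurement; it is not the actual TestLatency code, just the arithmetic behind the numbers: send N messages and divide by elapsed wall-clock time.)

#include <chrono>
#include <cstdio>

// Generic tight-loop throughput test: invoke send() N times, divide by
// elapsed time.  In the real test, send() would issue an IPDL message.
template <typename SendFn>
double MessagesPerSecond(int iterations, SendFn send) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    send();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return iterations / elapsed.count();
}

int main() {
  auto fakeSend = [] { /* stand-in for an async IPC send */ };
  std::printf("average #spams/sec: %.1f\n",
              MessagesPerSecond(100000, fakeSend));
}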

Second, we want similar best-case numbers for messagemanager spam.  Note that smaug has a basic test for this at [1].  The numbers will be strictly lower for this test than for #spams/sec above because of JS+JSON+XPCOM-event-loop overhead.  On my machine, I repeatedly get around 1,000 messages in 90-100 ms, so ~10,000/sec.

Third, we need a more detailed profile of the bottlenecks.  Which test setup to look at first depends on which numbers look most skewed compared to desktop.  Oleg, can you help with this?  We have a few options for profiling: something like Vlad's function timer setup for finding bottlenecks; zwol's (awesome) PerfMeasurement tool in bug 568863; and (obviously) "perf" or oprofile or something similar.

I know of a few existing dumb overheads in the IPC code.  One is bug 552009: we use atomic refcounting even though we don't need it.  The second isn't filed yet, but the problem is that in RPCChannels we make a full copy of each Message that goes through in order to provide runtime reflection on the IPC message stack, when we only need to copy the Message::Header, which is much, much cheaper.  (There's also an unnecessary Message copy that happens from the I/O thread to the main thread, and there may be others.)  The last is that the messagemanager JSON-stringifies its message object (apparently).  Benjamn has a pure-IPDL alternative to JSON [2] that, in addition to almost certainly being faster than toJSON(), is strictly more powerful: it can serialize self-referential objects.
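(A toy illustration of why copying only the header is so much cheaper than copying the whole Message; this is not the real IPC::Message layout, just a sketch.)

#include <cstdint>
#include <vector>

// Toy model of an IPC message: a small fixed-size header plus a payload
// that can be arbitrarily large.
struct Header {
  uint32_t payloadSize;
  uint32_t routingId;
  uint32_t messageType;
};

struct Message {
  Header header;
  std::vector<uint8_t> payload;   // potentially many kilobytes
};

// Copying just the header for message-stack reflection: a few fixed bytes.
Header CopyHeader(const Message& msg) {
  return msg.header;
}

// Copying the whole message also duplicates the entire payload buffer.
Message CopyWholeMessage(const Message& msg) {
  return msg;   // allocates and copies payloadSize bytes as well
}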

[1] http://hg.mozilla.org/mozilla-central/file/d1183b4f57b1/dom/ipc/test.xul#l151
[2] http://hg.mozilla.org/mozilla-central/file/d1183b4f57b1/js/ipc/PObjectWrapper.ipdl#l52
Er oops, [2] was supposed to be http://hg.mozilla.org/mozilla-central/file/f9ce465b7276/js/jetpack/PJetpack.ipdl#l46 .
Keywords: perf
Pushed spam test: http://hg.mozilla.org/mozilla-central/rev/01cd3b3fc201
Bug 582840 is fixing that code.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
No longer blocks: 581532
(In reply to comment #16)
> $ ~/mozilla/mp-opt/dist/bin/run-mozilla.sh
> ~/mozilla/mp-opt/dist/bin/ipdlunittest  TestLatency
> TEST-INFO| running test | TestLatency
>   timing resolution: 9e-08 seconds
> [snip]
> TEST-PASS | TestLatency | 
>   average #ping-pong/sec:        13245.5
>   average #ping5-pong5/sec:      7180.98
>   average #RPC call-answer/sec:  25095.7
>   average #spams/sec:            537804
> 
> The most relevant numbers here are "#RPC call-answer/sec", which tests making
> an RPC parent->child in a tight loop ('sync' would be about the same), and
> "#spams/sec", sending async messages parent->child in a tight loop.  We're
> doing 25k/sec and 500k/sec respectively on my machine.  I would expect these
> numbers to be about an order of magnitude lower on the N900.
> 

FTR, here are some apparently typical results from this test on the N900 (copied by hand >:( )

$ LD_LIBRARY_PATH="/home/user/fennec" ./ipdlunittest TestLatency
TEST-INFO | running test | TestLatency
  timing resolution: 1e-09 seconds [YAH RIGHT!!!  TimeStamp_posix isn't dealing well with really slow devices that tell clock_getres() fibs]
[snip]
TEST-PASS | TestLatency |
  average #ping-pong/sec:        1121.82
  average #ping5-pong5/sec:      270.162
  average #RPC call-answer/sec:  1436.92
  average #spams/sec:            7557.47