Closed
Bug 926447
Opened 11 years ago
Closed 3 years ago
sendAsyncMessage is expensive
Categories
(Core :: DOM: Content Processes, defect)
Core
DOM: Content Processes
Tracking
()
RESOLVED
INACTIVE
People
(Reporter: vingtetun, Unassigned)
References
Details
(Keywords: perf)
Attachments
(1 file)
(deleted),
text/plain
|
Details |
I was looking at the time spent in the JS handler of a mozbrowser iframe and I noticed than most of the time is spent in sendAsyncMessage. Not sure if this is the code to clone the data or not but it would help a little bit startup time to make it faster.
The time spend in that code is a few ms each time.
Smaug can you suggest someone to help here?
Flags: needinfo?(bugs)
Comment 1•11 years ago
|
||
Is the call itself expensive irrespective of how the type of message sent? Thinker Li had opened bug 915733 to try and improve our IPC performance; there's not been any action there but it might be worth linking these two bugs.
Comment 2•11 years ago
|
||
Is there a testcase for this, something which can be run on desktop?
Would need to get some profile here.
Flags: needinfo?(21)
Reporter | ||
Comment 3•11 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #2)
> Is there a testcase for this, something which can be run on desktop?
> Would need to get some profile here.
Gabriele is much better at doing profile than me. Can you help here?
Flags: needinfo?(21) → needinfo?(gsvelto)
Comment 4•11 years ago
|
||
I'm happy to profile, if I had a testcase for desktop.
Flags: needinfo?(bugs)
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #4)
> I'm happy to profile, if I had a testcase for desktop.
The thing is that on desktop you won't probably notice the 3ms taken in sendAsyncMessage :/
Comment 6•11 years ago
|
||
That doesn't matter. If I have a test case to profile, the profiler tells what is taking most of the
time. Whether it is 30ms or 30ns doesn't matter.
Though running the slow thing several times helps profiling.
Comment 7•11 years ago
|
||
(In reply to Vivien Nicolas (:vingtetun) (:21) from comment #3)
> Gabriele is much better at doing profile than me. Can you help here?
Instead of using the profiler whose granularity might not be enough it might be worth doing some ad-hoc timing of sendAsyncMessage method at the Gecko level. I'll try to do that and report here with the timings (and the patch I will use for taking them).
Comment 8•11 years ago
|
||
I would still like to see a testcase for this.
(Note, I'm not using Gecko profiler but Zoom which gives better granularity, and on OSX one could
use Shark or whatever has replaced Shark in the recent OSX releases.)
Comment 9•11 years ago
|
||
I guess something like
data:, function perftest() { for (var i = 0; i < 10000; ++i) { sendAsyncMessage("foo"); } }; content.setInterval(perftest, 50);
in the test-ipc.xul
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/test.xul
might work.
Or do I need some more complicated message?
Comment 10•11 years ago
|
||
Based on the profile:
- we should webidly message manager at some point, but not quite possible yet. Need to wait
for window to be webidlfied. Webidlfication should make us spend less time in xpconnect.
- most time is still spend under DispatchAsyncMessage
- creating and destroying nsString takes time. Need to optimize that. IPDL at least used to require
nsStrings and didn't accept nsAString, but need to check if that is fixed these days.
- 35+% is IPDL/IPC level stuff. Plenty of allocation and deallocation.
Comment 11•11 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #10)
> - most time is still spend under DispatchAsyncMessage
> - creating and destroying nsString takes time. Need to optimize that. IPDL
> at least used to require
> nsStrings and didn't accept nsAString, but need to check if that is
> fixed these days.
It's not. Probably ought to be, and shouldn't be too awful to fix.
> - 35+% is IPDL/IPC level stuff. Plenty of allocation and deallocation.
Are there any hotspots for this?
Comment 12•11 years ago
|
||
Message ctor is rather heavy, well Pickle under it. Allocation is just always slow.
Could we recycle Messages? or stack allocate?
ProcessLink::SendMessage seems to be also very malloc-happy.
So in general it looks like we should do some recycling, and not rely on malloc to give good performance.
Comment 13•11 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #10)
> - 35+% is IPDL/IPC level stuff. Plenty of allocation and deallocation.
ProcessLink::SendMessage() is creating a runnable method for each invocation, that constructor in turns seems to be creating quite some stuff:
http://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/task.h#l333
Flags: needinfo?(gsvelto)
Comment 14•11 years ago
|
||
Currently there's no point for IPDL to use nsAString& instead of nsString& because it immediately saves the value into a nsString - string sharing should make this as cheap as we can get at the moment.
I'm interested in the allocation situation, though: other than a single Message I didn't think we'd have much allocation involved for this stuff.
Reporter | ||
Comment 15•11 years ago
|
||
(In reply to Olli Pettay [:smaug] from comment #9)
> I guess something like
> data:, function perftest() { for (var i = 0; i < 10000; ++i) {
> sendAsyncMessage("foo"); } }; content.setInterval(perftest, 50);
> in the test-ipc.xul
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/test.xul
> might work.
> Or do I need some more complicated message?
Yeah this sounds good enough. We are not doing anything fancy, just sending stuff like sendAsyncMessage('browser-element-api:call', {'msg': 'loadstart'});
sendAsyncMessage('browser-element-api:call', {'msg': 'loadstart', 'backgroundColor': '#000000'});
The difference with your testcase is that there is always some data in our case.
Comment 16•11 years ago
|
||
OK, I'll profile with a case when there is some data.
Comment 17•11 years ago
|
||
data:, function perftest() { for (var i = 0; i < 10000; ++i) { sendAsyncMessage("foo", {'msg': 'loadstart'}); } }; content.setInterval(perftest, 50);
And with that the profile is quite different, since structured clone serialization takes as much
time as dispatching the message. Someone from jseng team should look at that.
Comment 18•11 years ago
|
||
Reporter | ||
Comment 19•11 years ago
|
||
Nicolas, do you know anything about that?
Flags: needinfo?(nicolas.b.pierron)
Comment 20•11 years ago
|
||
Where is setAsyncMessage defined? I cannot find it anywhere in the code base.
(In reply to Olli Pettay [:smaug] from comment #9)
> data:, function perftest() { for (var i = 0; i < 10000; ++i) {
> sendAsyncMessage("foo"); } }; content.setInterval(perftest, 50);
> in the test-ipc.xul
> http://mxr.mozilla.org/mozilla-central/source/dom/ipc/test.xul
> might work.
Note that xul code is not running with the compileAndGo flag in the JS Engine, which means that it will not be compiled with IonMonkey and will only remain in Baseline. But I don't think this would be the issue here.
(In reply to Olli Pettay [:smaug] from comment #18)
> Created attachment 817466 [details]
> jsstructuredclone part of the profile
Looking at the profile and at the description of sendAsyncMessage[1], I guess this might be related to what Yoric reported during the JS work week (see Bug 873293)
Jason, do you have any idea what we could do better? I guess that we can move the Rooted outside the loop of JSStructuredCloneWriter::write, as we did in the interpreter. The Rooted class are adding them-self into a linked list of mark-able memory area, moving them outside the loop will reduce the cost. Also, we might be able to do a switch on the type tag in JSStructuredCloneWriter::startWrite, instead of doing tons of else-if.
Vivien, are the message always identical, is that always the same object or is it a new similar one all the time?
[1] https://wiki.mozilla.org/Content_Process_Event_Handlers
Flags: needinfo?(nicolas.b.pierron) → needinfo?(jorendorff)
Comment 21•11 years ago
|
||
I don't think we've made any serious effort to profile or optimize the structured cloning code.
One part that is definitely unnecessarily slow is how we serialize arrays. We treat them like objects and make an array in memory of all their properties.
Another possibly slow part is the JSObject::getGeneric call in JSStructuredCloneWriter::write. We could probably optimize that away in most cases.
Anyone could hack on this. Doesn't take a JS engine hacker. Knowing how to work a profiler is more relevant.
Flags: needinfo?(jorendorff)
Comment 22•11 years ago
|
||
fwiw, I timed JSStructuredCloneWriter::write() during app startup and it's a very low overhead compared to everything else: at most 3ms overall.
Updated•10 years ago
|
Component: IPC → DOM: Content Processes
Comment 23•3 years ago
|
||
If I read this bug right, there were several options analyzed here. Can I assume safely that the most interesting ones are probably already fixed or at least recorded elsewhere? At least quite some work seems to have been done on JSStructuredCloneWriter
since then.
Flags: needinfo?(bugs)
Comment 24•3 years ago
|
||
Structured cloning is still a bit slow, but IIRC it was optimized a bit for quantum.
And the particular code path which this bug is about is going away.
We can file new bugs when needed for JSActors
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(bugs)
Resolution: --- → INACTIVE
You need to log in
before you can comment on or make changes to this bug.
Description
•