Closed Bug 1331988 Opened 8 years ago Closed 8 years ago

10.74% tabpaint (linux64) regression on push 8d8e6116ed6d (Tue Jan 17 2017)

Categories

(Core :: DOM: Content Processes, defect)

53 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- unaffected
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: jmaher, Assigned: kats)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push 8d8e6116ed6d. As author of one of the patches included in that push, we need your help to address this regression. Regressions: 11% tabpaint summary linux64 opt 74.05 -> 82.01 You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=4831 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → DOM: Content Processes
Product: Firefox → Core
I'll bisect the patchset to see whether it came from the first half or the second half. If it's the first half I don't think there's much we can do. The second we might be able to improve.
Assignee: nobody → bugmail
Flags: needinfo?(bugmail)
I ended up doing a series of try pushes: Parts 1-2: https://treeherder.mozilla.org/#/jobs?repo=try&revision=43f11336e0c95924a49d06efb0dd6f9505d2c75f Parts 1-3: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9fdb790d9e3f2b2e337309706af66594164d8969 Parts 1-4: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4f86817b4dfbb94786d15ee81c7dbd5bbb36a3f1 Parts 1-5: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ac04fd8179e2483b139b3f5c7b08f3306fe1b14 From the windows numbers (I didn't bother manually adding the linux jobs) it seems clear that the regression comes from part 4. What part 4 is doing is sending the InitRenderingState IPC message from TabParent to TabChild nearer to the tab creation, which in theory could delay the paint a little bit. However it's an async IPC message so it really shouldn't block the TabParent by anything noticeable. It's more likely that the processing of the InitRenderingState on the child side is slowing things down, because it makes a sync IPC to the compositor in order to get the compositor options. If my theory is correct we should be able to mitigate this by getting the compositor options into the parent side as part of existing communications with the compositor, and send it to the child as part of InitRenderingState. That way the child wouldn't need to do the sync IPC. I'm writing a patch to see if that works.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4) > I ended up doing a series of try pushes: > > Parts 1-2: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=43f11336e0c95924a49d06efb0dd6f9505d2c75f > Parts 1-3: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=9fdb790d9e3f2b2e337309706af66594164d8969 > Parts 1-4: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=4f86817b4dfbb94786d15ee81c7dbd5bbb36a3f1 > Parts 1-5: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=2ac04fd8179e2483b139b3f5c7b08f3306fe1b14 > > From the windows numbers (I didn't bother manually adding the linux jobs) it > seems clear that the regression comes from part 4. What part 4 is doing is > sending the InitRenderingState IPC message from TabParent to TabChild nearer > to the tab creation, which in theory could delay the paint a little bit. > However it's an async IPC message so it really shouldn't block the TabParent > by anything noticeable. It's more likely that the processing of the > InitRenderingState on the child side is slowing things down, because it > makes a sync IPC to the compositor in order to get the compositor options. > > If my theory is correct we should be able to mitigate this by getting the > compositor options into the parent side as part of existing communications > with the compositor, and send it to the child as part of InitRenderingState. > That way the child wouldn't need to do the sync IPC. I'm writing a patch to > see if that works. Does any of this work affecting the startup time of a content process? We don't have a Talos test to measure that yet so I'm just curious if we should worry about that on a theoretic level. It is very important for e10s-multi not to regress that number any further. As far as I understand these patches are more about about changing the order of things when a new tab is opened (with possibly a new content process startup) and does not really add any significant extra work. Just want to double check it...
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #5) > > Does any of this work affecting the startup time of a content process? We > don't > have a Talos test to measure that yet so I'm just curious if we should worry > about > that on a theoretic level. It depends on what you mean by startup time. The original patches shift a little bit of code that was happening when the tab was first "shown" to earlier, when the tab is being created. It's possible that if a content process is starting up with a tab that's not yet "shown" the bit of shifted code will slow it down slightly. Most of the slowdown would come, again, from the sync IPC call which I'm trying to mitigate by piggybacking it on other pre-existing IPC messages.
Also here's a try push with my first attempt at fixing this: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1a022ed5d533915662861c1292b1078eaf94cbc - it seems to show some improvement but maybe not fully restored? I'll do more retriggers but I have a pretty full day today with interviews and meetings so I'm not sure how much progress I'll make before tomorrow.
Update: I'm having a very hard time actually getting talos numbers from try to see if my patch helps or not. I ran into two bugs already (see dependencies) and the backlog of jobs on the linux machines is very high and so it takes a while to get anywhere.
I reproduced the regression on my local Linux machine and chased it down. It looks like what is happening is that because we run InitRendering sooner, TabChild::mRemoteFrame is St earlier than it would be otherwise. And between the "early" and "late" points, there is a call to TabChild::GetDPI. That call does an early return if mRemoteFrame is not set, but does a sync IPC call if it *is* set. This sync IPC accounts for most of the extra time in my local setup. We can probably mitigate this by sending the DPI value as part of InitRendering instead of requiring the sync IPC call.
Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=12264cc38e6c47fcc0244582c9c187e6a45f57ba confirms the above by fixing the regression. However I had second thoughts about the approach I used. I think it might be better to have GetDPI and GetDefaultScale continue to return -1 until RecvShow is called. That will maintain the same behaviour as before while also fixing the regression. It also avoids duplicating the sending of the dpi/default scale values over, since they are sent over as part of the ShowInfo struct in RecvShow already.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dcf2993f28aadc7afad9834150e2024c75f3dc45 shows the talos regression fixed on win8. there's too much of a linux talos backlog but i did a try push for correctness at https://treeherder.mozilla.org/#/jobs?repo=try&revision=376b8e7f0d4adc7efcf30ba2d72f2dffea55498d which is also green.
Comment on attachment 8829500 [details] Bug 1331988 - Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC. https://reviewboard.mozilla.org/r/106586/#review107612
Attachment #8829500 - Flags: review?(dvander) → review+
Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d37cbfeaec72 Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC. r=dvander
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Thanks for fixing this :kats, I have verified this on the graphs. should we uplift this to Aurora as the regression was merged there yesterday morning?
Comment on attachment 8829500 [details] Bug 1331988 - Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC. Approval Request Comment [Feature/Bug causing the regression]: bug 1331509 [User impact if declined]: slightly slower time to paint when a new tab is created from the parent process [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: joel verified the talos regression is fixed, see previous comment [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: not particularly [Why is the change risky/not risky?]: the change is fairly small and the code paths involved are well understood [String changes made/needed]: none
Attachment #8829500 - Flags: approval-mozilla-aurora?
Comment on attachment 8829500 [details] Bug 1331988 - Don't provide DPI and related values until the TabChild has received the the ShowInfo, to avoid doing sync IPC. Fix a perf regression related to tabpaint. Aurora53+.
Attachment #8829500 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: