Closed
Bug 951863
Opened 11 years ago
Closed 11 years ago
Investigate why un-PGOing nsDebugImpl.cpp caused a 5-10% performance improvement
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: benjamin, Assigned: away)
References
Details
(Keywords: perf)
In bug 946877 I set the build flags to remove nsDebugImpl.cpp from LTCG/PGO. Quite by accident, it appears that this improved performance by almost 10% on some benchmarks. This is kinda surprising and frankly a little worrying to me, since it means that we could bet getting huge "free" perf gains out of this.
See https://groups.google.com/d/topic/mozilla.dev.tree-management/ZkRWKK-2Gew/discussion for the automated PGO emails.
dmajor, next year could you spend some time in the code or profiling or both to figure out why this would actually be happening? I'd love to know:
* whether we're miscompiling the file and that accounts for any of the speedup
* if it's something special about this particular file, or whether we should investigate removing other files from the PGO build if we know that they are always cold
* Any details we can figure out about how the compiles differ
Reporter | ||
Comment 2•11 years ago
|
||
Bug 946710 is a performance *regression* from this changeset on the Dromaeo CSS tests. Please add that to your investigation!
jmaher, did the Dromaeo perf regression happen on beta as well, or only on inbound/central?
Flags: needinfo?(jmaher)
Comment 3•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2)
> Bug 946710 is a performance *regression* from this changeset on the Dromaeo
> CSS tests. Please add that to your investigation!
No, this changeset was not in the regression range for bug 946710. It was in the improvement range (bug 946710 comment 5).
Comment 4•11 years ago
|
||
For reference, here is the dev-tree-management thread from bug 946877 landing on inbound:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/4l74Jca7irw
Comment 5•11 years ago
|
||
I had a case of misreading the data- higher is better for dromaeo and this changeset improved it. In bug 946710 we identified a regression which this specific changeset reverted- more history in bug 946710. Possibly the regression there could help identify the reason for improvement here!
Flags: needinfo?(jmaher)
Reporter | ||
Comment 6•11 years ago
|
||
Or this cset just had general perf improvements which didn't actually fix the cause of the previous regression, just masked it behind an unrelated improvement...
I am able to reproduce this locally using these builds:
No PGO for nsDebugImpl: https://tbpl.mozilla.org/?tree=Try&rev=b5397a95155a
With PGO for nsDebugImpl: https://tbpl.mozilla.org/?tree=Try&rev=b8e18a7cfed7
I see a consistent 15-20% difference on the a11yr_paint subtest dhtml.html (slightly hacked to remove dependency on Talos machinery): http://hg.mozilla.org/build/talos/file/2bcf422011d1/talos/page_load_test/a11y/dhtml.html
Now the fun part, let's see what the profiler says...
I am seeing extremely poor codegen on certain PGO builds from December. The compiler fails to inline tons of trivial functions in important codepaths.
Example from nsBlockFrame::ReflowDirtyLines:
if (line.next() != end_lines())
2084 1013f7af 8b55a0 mov edx,dword ptr [ebp-60h]
2084 1013f7b2 8b02 mov eax,dword ptr [edx]
2084 1013f7b4 8d8d58ffffff lea ecx,[ebp-0A8h]
2084 1013f7ba 51 push ecx
2084 1013f7bb 8b4d9c mov ecx,dword ptr [ebp-64h]
2084 1013f7be 894594 mov dword ptr [ebp-6Ch],eax
2084 1013f7c1 e83a220200 call xul!nsBlockFrame::end_lines (10161a00)
2084 1013f7c6 8b10 mov edx,dword ptr [eax]
2084 1013f7c8 52 push edx
2084 1013f7c9 8d4d94 lea ecx,[ebp-6Ch]
2084 1013f7cc e8dfe90000 call xul!nsLineList_iterator::operator!= (1014e1b0)
2084 1013f7d1 84c0 test al,al
2084 1013f7d3 0f8520010000 jne xul!nsBlockFrame::ReflowDirtyLines+0x369 (1013f8f9)
On a good build, that reduces to:
2084 1009b244 8b4d8c mov ecx,dword ptr [ebp-74h]
2084 1009b247 8b03 mov eax,dword ptr [ebx]
2084 1009b249 83c150 add ecx,50h
2084 1009b24c 3bc1 cmp eax,ecx
2084 1009b24e 0f855a010000 jne xul!nsBlockFrame::ReflowDirtyLines+0x35e (1009b3ae)
On mozilla-beta, PGO started to go awry with the December 9 merge payload, and the "improvement" from bsmedberg's change was a return to nominal inlining behavior: http://graphs.mozilla.org/graph.html#tests=[[223,53,25]]&sel=none&displayrange=90&datatype=running
On mozilla-inbound in bug 946710, there was a string of bad-PGO builds starting with a unified-build change, and ending with a (different) unified-build backout: http://graphs.mozilla.org/graph.html#tests=[[72,63,25]]&sel=none&displayrange=90&datatype=running
It seems that we are at a tipping point, and either PGO changes or unification changes can push us one way or the other. Perhaps the sheer size of xul is hitting some kind of internal limit in the optimizer?
Looks like all branches are back to normal. I don't know of any further action for this bug other than to keep an eye on things. It does worry me that I can still induce the bad codegen; that means we might see this problem again.
Assignee | ||
Comment 10•11 years ago
|
||
If the bad builds start appearing again, we can revisit this and try un-PGOing and/or de-unifying more files.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•