Closed Bug 951863 Opened 11 years ago Closed 11 years ago

Investigate why un-PGOing nsDebugImpl.cpp caused a 5-10% performance improvement

Categories

(Core :: General, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: benjamin, Assigned: away)

References

Details

(Keywords: perf)

In bug 946877 I set the build flags to remove nsDebugImpl.cpp from LTCG/PGO. Quite by accident, it appears that this improved performance by almost 10% on some benchmarks. This is kinda surprising and frankly a little worrying to me, since it means that we could bet getting huge "free" perf gains out of this. See https://groups.google.com/d/topic/mozilla.dev.tree-management/ZkRWKK-2Gew/discussion for the automated PGO emails. dmajor, next year could you spend some time in the code or profiling or both to figure out why this would actually be happening? I'd love to know: * whether we're miscompiling the file and that accounts for any of the speedup * if it's something special about this particular file, or whether we should investigate removing other files from the PGO build if we know that they are always cold * Any details we can figure out about how the compiles differ
Keywords: perf
Bug 946710 is a performance *regression* from this changeset on the Dromaeo CSS tests. Please add that to your investigation! jmaher, did the Dromaeo perf regression happen on beta as well, or only on inbound/central?
Flags: needinfo?(jmaher)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > Bug 946710 is a performance *regression* from this changeset on the Dromaeo > CSS tests. Please add that to your investigation! No, this changeset was not in the regression range for bug 946710. It was in the improvement range (bug 946710 comment 5).
For reference, here is the dev-tree-management thread from bug 946877 landing on inbound: https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/4l74Jca7irw
I had a case of misreading the data- higher is better for dromaeo and this changeset improved it. In bug 946710 we identified a regression which this specific changeset reverted- more history in bug 946710. Possibly the regression there could help identify the reason for improvement here!
Flags: needinfo?(jmaher)
Or this cset just had general perf improvements which didn't actually fix the cause of the previous regression, just masked it behind an unrelated improvement...
I am able to reproduce this locally using these builds: No PGO for nsDebugImpl: https://tbpl.mozilla.org/?tree=Try&rev=b5397a95155a With PGO for nsDebugImpl: https://tbpl.mozilla.org/?tree=Try&rev=b8e18a7cfed7 I see a consistent 15-20% difference on the a11yr_paint subtest dhtml.html (slightly hacked to remove dependency on Talos machinery): http://hg.mozilla.org/build/talos/file/2bcf422011d1/talos/page_load_test/a11y/dhtml.html Now the fun part, let's see what the profiler says...
I am seeing extremely poor codegen on certain PGO builds from December. The compiler fails to inline tons of trivial functions in important codepaths. Example from nsBlockFrame::ReflowDirtyLines: if (line.next() != end_lines()) 2084 1013f7af 8b55a0 mov edx,dword ptr [ebp-60h] 2084 1013f7b2 8b02 mov eax,dword ptr [edx] 2084 1013f7b4 8d8d58ffffff lea ecx,[ebp-0A8h] 2084 1013f7ba 51 push ecx 2084 1013f7bb 8b4d9c mov ecx,dword ptr [ebp-64h] 2084 1013f7be 894594 mov dword ptr [ebp-6Ch],eax 2084 1013f7c1 e83a220200 call xul!nsBlockFrame::end_lines (10161a00) 2084 1013f7c6 8b10 mov edx,dword ptr [eax] 2084 1013f7c8 52 push edx 2084 1013f7c9 8d4d94 lea ecx,[ebp-6Ch] 2084 1013f7cc e8dfe90000 call xul!nsLineList_iterator::operator!= (1014e1b0) 2084 1013f7d1 84c0 test al,al 2084 1013f7d3 0f8520010000 jne xul!nsBlockFrame::ReflowDirtyLines+0x369 (1013f8f9) On a good build, that reduces to: 2084 1009b244 8b4d8c mov ecx,dword ptr [ebp-74h] 2084 1009b247 8b03 mov eax,dword ptr [ebx] 2084 1009b249 83c150 add ecx,50h 2084 1009b24c 3bc1 cmp eax,ecx 2084 1009b24e 0f855a010000 jne xul!nsBlockFrame::ReflowDirtyLines+0x35e (1009b3ae) On mozilla-beta, PGO started to go awry with the December 9 merge payload, and the "improvement" from bsmedberg's change was a return to nominal inlining behavior: http://graphs.mozilla.org/graph.html#tests=[[223,53,25]]&sel=none&displayrange=90&datatype=running On mozilla-inbound in bug 946710, there was a string of bad-PGO builds starting with a unified-build change, and ending with a (different) unified-build backout: http://graphs.mozilla.org/graph.html#tests=[[72,63,25]]&sel=none&displayrange=90&datatype=running It seems that we are at a tipping point, and either PGO changes or unification changes can push us one way or the other. Perhaps the sheer size of xul is hitting some kind of internal limit in the optimizer?
Looks like all branches are back to normal. I don't know of any further action for this bug other than to keep an eye on things. It does worry me that I can still induce the bad codegen; that means we might see this problem again.
If the bad builds start appearing again, we can revisit this and try un-PGOing and/or de-unifying more files.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
Blocks: 946710
You need to log in before you can comment on or make changes to this bug.