Closed Bug 1458480 Opened 6 years ago Closed 6 years ago

4.41 - 5.13% Heap Unclassified (windows10-64, windows7-32) regression on push c460245ddcff (Mon Apr 30 2018)

Categories

(Core :: Graphics: Layers, defect, P1)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 - disabled
firefox62 + fixed

People

(Reporter: igoldan, Assigned: rhunt)

References

Details

(Keywords: perf, regression)

Attachments

(1 file)

We have detected an awsy regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d28c827d1e8538b7dd7c957edd927212c32cbcad&tochange=c460245ddcff73d4a9b24a5f6fb47bb288ca2746

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  5%  Heap Unclassified windows10-64 opt stylo     41,256,503.18 -> 43,371,387.44
  5%  Heap Unclassified windows10-64 pgo stylo     41,697,218.97 -> 43,738,137.94
  5%  Heap Unclassified windows7-32 pgo stylo      34,595,052.62 -> 36,221,533.48
  4%  Heap Unclassified windows7-32 opt stylo      34,174,280.37 -> 35,680,062.83


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=13011

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/AWSY/Tests
Component: Untriaged → Graphics: Layers
Product: Firefox → Core
:rhunt Can you take a look over fixing these big memory regressions?
Flags: needinfo?(rhunt)
I will look to see if there is anything we can do here, but this is most likely overhead from enabling tiling when we aren't hardware accelerated on windows.

Tiling can consume more memory because of the tile pool, especially when painting quicker. We ran into a similar issue with OSX and OMTP, bug 1424325.

This change should also have some significant performance benefits making it a speed-memory tradeoff, but it appears we are no longer testing Skia in talos on windows. Skia is still being tested in AWSY on windows, so only half of the changes are visible here. I've filed bug 1458638 about that.

I'll try and confirm this is tiling overhead and not a bug.
Regardless of the cause, I would assume this is at a minimum us not measuring something we *should* be if it can have this much of an effect on heap unclassified?
(In reply to Ryan VanderMeulen [:RyanVM] from comment #3)
> Regardless of the cause, I would assume this is at a minimum us not
> measuring something we *should* be if it can have this much of an effect on
> heap unclassified?

Yes that's true. Nothing comes to mind that could be the cause of this. I'll give DMD a try and see if that sheds some light on this.

I'm having a hard time getting AWSY working locally for me to debug this. Currently getting:

0:09.44 CRASH: pid:10556. Test:marionette.py. Minidump anaylsed:False. Signature:[None]
Crash dump filename: c:\users\rhunt\appdata\local\temp\tmphhous6.mozrunner\minidumps\7cef1e0e-9d8c-4c7b-b176-f33130061395.dmp
MINIDUMP_STACKWALK not set, can't process dump.
...
IOError: Process crashed (Exit code: 1) (Reason: [Errno 10054] An existing connection was forcibly closed by the remote host)

This is on both my windows machines on the latest m-c (2d95d70298e2) with clean builds.

Not sure what's up. I'll keep investigating.
Blocks: 1454980
Flags: needinfo?(rhunt)
Forgot to keep the ni?.
Flags: needinfo?(rhunt)
(In reply to Ryan Hunt [:rhunt] from comment #4)
> I'm having a hard time getting AWSY working locally for me to debug this.
> Currently getting:
> 
> 0:09.44 CRASH: pid:10556. Test:marionette.py. Minidump anaylsed:False.
> Signature:[None]
> Crash dump filename:
> c:\users\rhunt\appdata\local\temp\tmphhous6.mozrunner\minidumps\7cef1e0e-
> 9d8c-4c7b-b176-f33130061395.dmp
> MINIDUMP_STACKWALK not set, can't process dump.
> ...
> IOError: Process crashed (Exit code: 1) (Reason: [Errno 10054] An existing
> connection was forcibly closed by the remote host)
> 
> This is on both my windows machines on the latest m-c (2d95d70298e2) with
> clean builds.
> 
> Not sure what's up. I'll keep investigating.

In theory you should be able to load that dmp into Visual Studio [1] and at least get us a stack.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/Debugging_a_minidump
Assignee: nobody → rhunt
Priority: -- → P1
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #6)
> (In reply to Ryan Hunt [:rhunt] from comment #4)
> > I'm having a hard time getting AWSY working locally for me to debug this.
> > Currently getting:
> > 
> > 0:09.44 CRASH: pid:10556. Test:marionette.py. Minidump anaylsed:False.
> > Signature:[None]
> > Crash dump filename:
> > c:\users\rhunt\appdata\local\temp\tmphhous6.mozrunner\minidumps\7cef1e0e-
> > 9d8c-4c7b-b176-f33130061395.dmp
> > MINIDUMP_STACKWALK not set, can't process dump.
> > ...
> > IOError: Process crashed (Exit code: 1) (Reason: [Errno 10054] An existing
> > connection was forcibly closed by the remote host)
> > 
> > This is on both my windows machines on the latest m-c (2d95d70298e2) with
> > clean builds.
> > 
> > Not sure what's up. I'll keep investigating.
> 
> In theory you should be able to load that dmp into Visual Studio [1] and at
> least get us a stack.
> 
> [1]
> https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/
> Debugging_a_minidump

I had to hack some python files to prevent the minidump from being deleted, but with that I was able to see what was crashing and filed bug 1460077.

After that I ran into a startup crash with DMD which I eventually figured out was bug 1426499.

Also ran into that MSVC ICE from an update for good measure.

But now I've finally got AWSY + DMD working locally for me and can reproduce the difference in heap unclassified. Right now it looks like the top unreported blocks are in the skia glyph cache. I'll continue investigating.

I'll drop the ni? as it makes more sense to just be assigned to the bug.
Flags: needinfo?(rhunt)
(In reply to Ryan Hunt [:rhunt] from comment #7)
> (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment
> #6)
> > (In reply to Ryan Hunt [:rhunt] from comment #4)
> > > I'm having a hard time getting AWSY working locally for me to debug this.
> > > Currently getting:
> > > 
> > > 0:09.44 CRASH: pid:10556. Test:marionette.py. Minidump anaylsed:False.
> > > Signature:[None]
> > > Crash dump filename:
> > > c:\users\rhunt\appdata\local\temp\tmphhous6.mozrunner\minidumps\7cef1e0e-
> > > 9d8c-4c7b-b176-f33130061395.dmp
> > > MINIDUMP_STACKWALK not set, can't process dump.
> > > ...
> > > IOError: Process crashed (Exit code: 1) (Reason: [Errno 10054] An existing
> > > connection was forcibly closed by the remote host)
> > > 
> > > This is on both my windows machines on the latest m-c (2d95d70298e2) with
> > > clean builds.
> > > 
> > > Not sure what's up. I'll keep investigating.
> > 
> > In theory you should be able to load that dmp into Visual Studio [1] and at
> > least get us a stack.
> > 
> > [1]
> > https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/
> > Debugging_a_minidump
> 
> I had to hack some python files to prevent the minidump from being deleted,
> but with that I was able to see what was crashing and filed bug 1460077.
> 
> After that I ran into a startup crash with DMD which I eventually figured
> out was bug 1426499.
> 
> Also ran into that MSVC ICE from an update for good measure.
> 
> But now I've finally got AWSY + DMD working locally for me and can reproduce
> the difference in heap unclassified. Right now it looks like the top
> unreported blocks are in the skia glyph cache. I'll continue investigating.
> 
> I'll drop the ni? as it makes more sense to just be assigned to the bug.

So I thought I was reproducing the regression, but it turns out I was comparing
direct2d, to skia + tiling. Comparing skia to skia + tiling turned up no big
differences locally for me.

Oddly enough, when locally comparing direct2d to skia + tiling, the heap
unclassified regression was 15% on tabs open which is identical to what we're
seeing here. But I can't see how we were running direct2d on awsy before this
patch, or that the patch could have disabled that.

I'm going to attempt a try run with DMD to see if I can get some insight here.
I'll also add some diagnostics to ensure no weird direct2d/skia switch happened
here.
I was able to get some DMD reports from a baseline and new try run. Stack fixing didn't work correctly, but I was able to hack it to work locally for me.

It appears the skia glyph cache really is the thing to blame here. I'll revive my patches that add a memory reporter here and confirm that fixes this regression.

As for why the skia glyph cache is the problem here, I'm not 100% sure. The glyph cache is not per thread, so we may be thrashing it with the eviction policy that skia uses. If this is truly an issue we can investigate how chrome handles this, as they use skia in a similar way here.
Attached patch skia-memory-reporter.patch (deleted) β€” β€” Splinter Review
This adds a memory reporter based on a counter exposed by skia for their font cache. The counter is computed analytically and they use for detecting when they are running into the font cache limit, so I think it should be accurate enough to use.
Attachment #8976534 - Flags: review?(lsalzman)
Attachment #8976534 - Flags: review?(n.nethercote)
Attachment #8976534 - Flags: review?(lsalzman) → review+
Comment on attachment 8976534 [details] [diff] [review]
skia-memory-reporter.patch

Review of attachment 8976534 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/SkMemoryReporter.cpp
@@ +16,5 @@
> +                                 nsISupports* aData,
> +                                 bool aAnonymize)
> +{
> +  MOZ_COLLECT_REPORT(
> +    "explicit/skia-fontcache", KIND_HEAP, UNITS_BYTES,

"explicit/skia-font-cache"?

@@ +17,5 @@
> +                                 bool aAnonymize)
> +{
> +  MOZ_COLLECT_REPORT(
> +    "explicit/skia-fontcache", KIND_HEAP, UNITS_BYTES,
> +    SkGraphics::GetFontCacheUsed(),

The downside of this is that DMD won't be informed about the allocations, and will claim they are unreported. But unless Skia has support for custom allocators (does it?) this is the best that can be done.
Attachment #8976534 - Flags: review?(n.nethercote) → review+
(In reply to Nicholas Nethercote [:njn] from comment #11)
> Comment on attachment 8976534 [details] [diff] [review]
> skia-memory-reporter.patch
> 
> Review of attachment 8976534 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/thebes/SkMemoryReporter.cpp
> @@ +16,5 @@
> > +                                 nsISupports* aData,
> > +                                 bool aAnonymize)
> > +{
> > +  MOZ_COLLECT_REPORT(
> > +    "explicit/skia-fontcache", KIND_HEAP, UNITS_BYTES,
> 
> "explicit/skia-font-cache"?
>

Fixed.

> @@ +17,5 @@
> > +                                 bool aAnonymize)
> > +{
> > +  MOZ_COLLECT_REPORT(
> > +    "explicit/skia-fontcache", KIND_HEAP, UNITS_BYTES,
> > +    SkGraphics::GetFontCacheUsed(),
> 
> The downside of this is that DMD won't be informed about the allocations,
> and will claim they are unreported. But unless Skia has support for custom
> allocators (does it?) this is the best that can be done.

Skia does appear to support implementing custom memory functions and it seems we have customized this a little bit [1]. Is there something we could do here? I didn't see anything in the adding a memory reporter  doc.

My initial attempt at a memory reporter was traversal based for this reason, but unfortunately turned out to be too invasive for this third party library. They use a custom arena allocator here which keeps a linked list of memory blocks using encoded pointers and lambdas that needed to be modified to be sane. [2] The patches worked, but would not be maintainable.

[1] https://searchfox.org/mozilla-central/rev/da499aac682d0bbda5829327b60a865cbc491611/gfx/skia/skia/src/ports/SkMemory_mozalloc.cpp#37
[2] https://searchfox.org/mozilla-central/rev/da499aac682d0bbda5829327b60a865cbc491611/gfx/skia/skia/src/core/SkArenaAlloc.h#68
> Skia does appear to support implementing custom memory functions

Excellent! Take a look at xpcom/base/CountingAllocatorBase.h, which implements a helper class exactly for this situation. gfx/thebes/gfxAndroidPlatform.cpp is an example use that you should be able to use as a guide.
(In reply to Nicholas Nethercote [:njn] from comment #13)
> > Skia does appear to support implementing custom memory functions
> 
> Excellent! Take a look at xpcom/base/CountingAllocatorBase.h, which
> implements a helper class exactly for this situation.
> gfx/thebes/gfxAndroidPlatform.cpp is an example use that you should be able
> to use as a guide.

On further examination, it appears that the sk_malloc functions are not used consistently. Sometimes they are used [1], and sometimes they are not [2].

[1] https://searchfox.org/mozilla-central/rev/da499aac682d0bbda5829327b60a865cbc491611/gfx/skia/skia/src/core/SkMask.cpp#44
[2] https://searchfox.org/mozilla-central/rev/da499aac682d0bbda5829327b60a865cbc491611/gfx/skia/skia/src/core/SkArenaAlloc.cpp#141

The SkArenaAlloc is the one that's most important for this use case because it's used by the font cache. I am not sure why they aren't consistent with this. Lee might know the answer to this, but he is on PTO.

I think for now I'll try and land this as is. If we get an idea for a better solution, we'll do it.
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f556e56e1653
Add a skia memory reporter and report on font cache usage. r=lsalzman, r=njn
https://hg.mozilla.org/mozilla-central/rev/f556e56e1653
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Ionut, can you confirm that this patch made the expected impact on heap unclassified?
Flags: needinfo?(igoldan)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #17)
> Ionut, can you confirm that this patch made the expected impact on heap
> unclassified?

Yes, I can confirm that:

== Change summary for alert #13296 (as of Fri, 18 May 2018 13:52:17 GMT) ==

Improvements:

  8%  Heap Unclassified windows7-32 pgo stylo     36,110,815.05 -> 33,336,291.38
  7%  Heap Unclassified windows7-32 opt stylo     35,919,514.89 -> 33,329,121.56
  5%  Heap Unclassified windows10-64 opt stylo    43,167,784.68 -> 40,916,520.79
  4%  Heap Unclassified osx-10-10 opt stylo       72,350,957.79 -> 69,241,654.98
  4%  Heap Unclassified windows10-64 pgo stylo    43,212,202.06 -> 41,528,134.66

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13296
Flags: needinfo?(igoldan)
Please request Beta approval on this when you get a chance.
Flags: needinfo?(rhunt)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> Please request Beta approval on this when you get a chance.

Did this memory regression affect beta? The patch causing this regression was a feature flip for nightly only [1], so I don't think this should affect beta.

The new memory reporter is simple so it's not a big deal to uplift, but I just want to be sure it's needed.

[1] https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d28c827d1e8538b7dd7c957edd927212c32cbcad&tochange=c460245ddcff73d4a9b24a5f6fb47bb288ca2746
Flags: needinfo?(rhunt)
That's a really good point. Let's just have this ride the trains then.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: