1487906 - [meta] Network panel recording overhead

Reporter

Description

•

6 years ago

Meta bug to keep filing bugs with various root causes.

Loading cnn with ads and network recording: https://perfht.ml/2N8Ljv2

- Looks like of 33sec loading, 9sec on main frame are spent rendering the network panel (probably more, as layout isn’t included)
- Displaylist rendering also busts frame budget with 30ms
- Every frame on the main thread takes about 150-500ms

:Harald Kirschner :digitarald

Reporter

Comment 1

•

6 years ago

Dan, any idea on why DisplayList is so expensive (not sure if this is layout related)?

Flags: needinfo?(dholbert)

Daniel Holbert [:dholbert]

Comment 2

•

6 years ago

Maybe mattwoodrow might have ideas on why the DisplayList phase is expensive here?

FWIW: when I run the STR, I can see the network panel getting populated with ~500 rows and ~8 columns, rapidly with many of the cells being individually scrollable (revealed by the fact that they're ellipsized), and with lots of the cells' content being updated on each re-render.

Here are two long DisplayList phases, for reference:
   * 34ms long: https://perfht.ml/2LJMJaQ
   * 25ms long: https://perfht.ml/2LM36n2

Flags: needinfo?(dholbert) → needinfo?(matt.woodrow)

Daniel Holbert [:dholbert]

Comment 3

•

6 years ago

One interesting note: in both of those long displaylist phases from comment 2, we're spending a significant fraction of our time in BaseAllocator::malloc. (16ms of the 34ms section; 7ms of the 25ms section).

Matt Woodrow (:mattwoodrow)

Comment 4

•

6 years ago

There is indeed a lot of sadness in this profile, mainly (heap) memory allocations.

It looks like there are a lot of table cells that have an element with hidden overflow and text-overflow:ellipsis, which appears to be a bit slow.

* Every element with hidden overflow, and ellipsis does a heap allocation for the 'TextOverflow' object to figure out if we need the ellipsis to be rendered. It seems like this could be on the stack.

* overflow:hidden introduces a clip, and we track each of these in mClipChainsToDestroy (for retained-dl). Retained-dl isn't enabled for the parent process, so this is a bit of a waste.
- Enabling parent process RDL (layout.display-list.retain.chrome) might help here.
- We could probably store the linked list of clip chains using pointers on the items themselves, rather than in an auxilliary heap allocation.

* We keep a hashtable of allocated clips so that we can avoid duplicates. Useful sometimes, but each clip is actually different here, so checking and inserting into the hashtable doesn't help us.
- We might be able to guess which clips are unlikely to show up again and skip the hashtable.
- Again, RDL might help, since we'd at least hit the re-use path on later paints.

* Looks like the code is using -moz-element for background images, which is doing something slow allocating a URI object. No idea why, but -moz-element isn't well optimized. Avoiding this if possible would be good.

* Table display list building allocates nsTArrays, which is more slow allocations. We have AutoTArray<1>, but maybe it should use a bigger constant.

* Looks like the ellipsis code builds a hashtable and then clears it? Wasteful, we could probably do better.

* Something is causing a lot of the cell contents to be considered animated (some of the time?), which triggers an allocation. Using layers.draw-borders shows us occasionally using separate layers for every line, which is unlikely to be helpful.
- Is something using will-change or a transform/opacity animation? Not sure how to inspect the markup for this.

* Invalidation of tables is known to be problematic (though unrelated to the display list times). Paint flashing shows a lot of repainting on this, which is likely slow.

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 5

•

6 years ago

CC'ing Miko and Doug, one of whom might have time to look into this a bit more.

Doug Thayer [:dthayer] (he/him)

Comment 6

•

6 years ago

Taking a look.

Assignee: nobody → dothayer

Status: NEW → ASSIGNED

Whiteboard: [fxperf:p1]

Daniel Holbert [:dholbert]

Comment 7

•

6 years ago

Matt: do you think a fade-out filter (like we have at the end of tab-titles in the tabstrip), plus clipping, might be more efficient than overflow:hidden here?  Or is there anything else that might be more efficient for us to do on the devtools side here?

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 8

•

6 years ago

(In reply to Daniel Holbert [:dholbert] from comment #7)
> Matt: do you think a fade-out filter (like we have at the end of tab-titles
> in the tabstrip), plus clipping, might be more efficient than
> overflow:hidden here?  Or is there anything else that might be more
> efficient for us to do on the devtools side here?

Unlikely, filters are very heavyweight.

I think we should just do better with our handling of lots of clips :)

The only suggestion I have would be to remove overflow:hidden if you know the content won't/can't overflow, since then you avoid getting a scroll frame and the clip. Dynamic changes in/out of this state might be expensive though.

Flags: needinfo?(matt.woodrow)

Doug Thayer [:dthayer] (he/him)

Updated

•

6 years ago

Depends on: 1489588

Doug Thayer [:dthayer] (he/him)

Updated

•

6 years ago

Depends on: 1489702

Daniel Holbert [:dholbert]

Comment 9

•

6 years ago

Here's a new recording of comment 0's STR in latest nightly (which has dthayer's fixes for dependent bugs that have been filed so far):
https://perfht.ml/2CPgTKx

As in comment 0, this is a clean load of cnn.com, with content blocking disabled (which I had to manually disable since we block by default now).  The above profile URL is set to only show the main thread (chrome process), to focus on network panel's performance.

This particular load only took 9 seconds, which is much faster than in comment 0, though that's probably due to changes in content on cnn.com and perhaps different hardware between me & Harald.

The longest DisplayList phases I can find there are these 3, all longer than one frame (16ms)
 - 19ms long: https://perfht.ml/2x7cT2S
 - 18ms long: https://perfht.ml/2x89oZF
 - 18ms long: https://perfht.ml/2x8xpjl

(...and there are quite a few others in the 10ms-16ms range, too).

These may not be the lowest hanging fruit anymore, though. I see other somewhat-larger contributors to long refresh-driver-tick times - in particular, a long Rasterize:
 - 29ms long: https://perfht.ml/2xaEaBl

...and some long restyles:
 - 25ms long: https://perfht.ml/2x9vsmN
 - 25ms long: https://perfht.ml/2x9UiTI

I'm noticing that the first 5-8ms of each restyle here are in rayon_core::latch::LockLatch::wait (which wraps pthread_cond_wait) -- is that expected/scary?

Flags: needinfo?(emilio)

Emilio Cobos Álvarez (:emilio)

Comment 10

•

6 years ago

That means that we're doing work on the worker threads, so no.

You can toggle the sequential styling checkbox in the profiler or include StyleThreads to see what's going on during that time.

Flags: needinfo?(emilio)

Daniel Holbert [:dholbert]

Comment 11

•

6 years ago

Thanks. I captured a profile with "sequential styling" checkbox ticked. From that profile, here's a:
 - 40ms long restyle: https://perfht.ml/2CR2JbW
 - 38ms long restyle: https://perfht.ml/2CV1vfM

Daniel Holbert [:dholbert]

Updated

•

6 years ago

Depends on: 1491111

Daniel Holbert [:dholbert]

Comment 12

•

6 years ago

I filed bug 1491111 for the slow restyles.

:Harald Kirschner :digitarald

Reporter

Comment 13

•

6 years ago

Honza, Nicolas,

I just found bug 1392869 #c0 with a benchmark https://blooming-hamlet-93701.herokuapp.com/. It seems to be a great stress test for network logging as well (even though it is done for debugger).

Network panel: https://perfht.ml/2P6Vo9z
Console panel with Network logs: https://perfht.ml/2P9DxPh

Both panels exhibit 2s hangs during the end of the test, mostly time spend in react (not layout, like in past findings).

Are there any actionable follow-up bugs that we should attach to this bug that could improve the situation?

Flags: needinfo?(odvarko)

Flags: needinfo?(nchevobbe)

Nicolas Chevobbe [:nchevobbe]

Comment 14

•

6 years ago

The last hang we are seeing is probably the update on all the messages to display the response info (e.g. the status code and the time it took to complete the request).

we could only update the messages that are visible at the moment the update comes in, but that would mean having a way to update when the user scrolls to messages that weren't visible when the update came. This could be done with IntersectionObserver and windowing, even if we need to be careful of the lag it could produce while the user scrolls.

Flags: needinfo?(nchevobbe)

Jan Honza Odvarko [:Honza] (always need-info? me)

Comment 15

•

6 years ago

(In reply to Nicolas Chevobbe from comment #14)
> The last hang we are seeing is probably the update on all the messages to
> display the response info (e.g. the status code and the time it took to
> complete the request).
> 
> we could only update the messages that are visible at the moment the update
> comes in, but that would mean having a way to update when the user scrolls
> to messages that weren't visible when the update came. This could be done
> with IntersectionObserver and windowing, even if we need to be careful of
> the lag it could produce while the user scrolls.
Yes, the other option is react-virtualized [1], but having the same
async-scrolling issue.

Honza

[1] https://github.com/bvaughn/react-virtualized

Flags: needinfo?(odvarko)

Mike Conley (:mconley) (:⚙️)

Comment 16

•

6 years ago

This seems like it might have a pageload impact, specifically for web developers who are trying to measure the performance of their pages. Putting into the qf triage queue.

Whiteboard: [fxperf:p1] → [fxperf:p3][qf]

Mike Conley (:mconley) (:⚙️)

Comment 17

•

6 years ago

dthayer is on leave until early January, so clearing this for now.

Assignee: dothayer → nobody

Status: ASSIGNED → NEW

Mike Conley (:mconley) (:⚙️)

Comment 18

•

6 years ago

Hey digitarald, do we have a more recent sense of how bad this is?

We triaged this today, and the problem is basically that the DevTools are running in the parent process. Moving them outside of the parent process would probably really help here. Does the DevTools team have plans to render the web developer toolbox OOP?

Flags: needinfo?(hkirschner)

:Harald Kirschner :digitarald

Reporter

Comment 19

•

6 years ago

> Hey digitarald, do we have a more recent sense of how bad this is?

Still bad (on a high end machine): https://perfht.ml/2UVqpkg

> Does the DevTools team have plans to render the web developer toolbox OOP?

We don't have near-term or medium-term goals to move off-main-thread. Alex can add more insights if this would help with our Fission work; which our P1 priority for refactoring atm.

Flags: needinfo?(hkirschner) → needinfo?(poirot.alex)

Jean Gong :jgong

Updated

•

6 years ago

Whiteboard: [fxperf:p3][qf] → [fxperf:p3][qf:p3:responsiveness]

:Harald Kirschner :digitarald

Reporter

Updated

•

5 years ago

Depends on: 1557862

:Harald Kirschner :digitarald

Reporter

Updated

•

5 years ago

Blocks: netmonitor-perf
No longer blocks: devtools-performance

Type: enhancement → task

Priority: -- → P2

Summary: [meta] Reduce overhead of Network panel recording → [meta] Network panel recording overhead

:Harald Kirschner :digitarald

Reporter

Updated

•

5 years ago

Depends on: 1599604

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

2 years ago

Performance Impact: --- → P3

Keywords: perf:responsiveness

Whiteboard: [fxperf:p3][qf:p3:responsiveness] → [fxperf:p3]

Alexandre Poirot [:ochameau]

Comment 24

•

2 years ago

A little while ago, bug 1529018, bug 1358414 and bug 1533764 should have improved reflow durations.
More recently bug 1439509 helped free lots of memory while the netmonitor was opened.
Otherwise bug 1753177 is about to introduce viewport for the console.
We should be able to do the same in the netmonitor to drastically reduce reflow durations.

Flags: needinfo?(poirot.alex)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3