Open Bug 1487906 Opened 6 years ago Updated 2 years ago

[meta] Network panel recording overhead

Categories

(DevTools :: Netmonitor, task, P2)

task

Tracking

(Performance Impact:low)

Performance Impact low

People

(Reporter: Harald, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, perf:responsiveness, Whiteboard: [fxperf:p3])

Meta bug to keep filing bugs with various root causes.

Loading cnn with ads and network recording: https://perfht.ml/2N8Ljv2

- Looks like of 33sec loading, 9sec on main frame are spent rendering the network panel (probably more, as layout isn’t included)
- Displaylist rendering also busts frame budget with 30ms
- Every frame on the main thread takes about 150-500ms
Dan, any idea on why DisplayList is so expensive (not sure if this is layout related)?
Flags: needinfo?(dholbert)
Maybe mattwoodrow might have ideas on why the DisplayList phase is expensive here?

FWIW: when I run the STR, I can see the network panel getting populated with ~500 rows and ~8 columns, rapidly with many of the cells being individually scrollable (revealed by the fact that they're ellipsized), and with lots of the cells' content being updated on each re-render.

Here are two long DisplayList phases, for reference:
   * 34ms long: https://perfht.ml/2LJMJaQ
   * 25ms long: https://perfht.ml/2LM36n2
Flags: needinfo?(dholbert) → needinfo?(matt.woodrow)
One interesting note: in both of those long displaylist phases from comment 2, we're spending a significant fraction of our time in BaseAllocator::malloc. (16ms of the 34ms section; 7ms of the 25ms section).
There is indeed a lot of sadness in this profile, mainly (heap) memory allocations.

It looks like there are a lot of table cells that have an element with hidden overflow and text-overflow:ellipsis, which appears to be a bit slow.

* Every element with hidden overflow, and ellipsis does a heap allocation for the 'TextOverflow' object to figure out if we need the ellipsis to be rendered. It seems like this could be on the stack.

* overflow:hidden introduces a clip, and we track each of these in mClipChainsToDestroy (for retained-dl). Retained-dl isn't enabled for the parent process, so this is a bit of a waste.
    - Enabling parent process RDL (layout.display-list.retain.chrome) might help here.
    - We could probably store the linked list of clip chains using pointers on the items themselves, rather than in an auxilliary heap allocation.

* We keep a hashtable of allocated clips so that we can avoid duplicates. Useful sometimes, but each clip is actually different here, so checking and inserting into the hashtable doesn't help us.
    - We might be able to guess which clips are unlikely to show up again and skip the hashtable.
    - Again, RDL might help, since we'd at least hit the re-use path on later paints.

* Looks like the code is using -moz-element for background images, which is doing something slow allocating a URI object. No idea why, but -moz-element isn't well optimized. Avoiding this if possible would be good.

* Table display list building allocates nsTArrays, which is more slow allocations. We have AutoTArray<1>, but maybe it should use a bigger constant.

* Looks like the ellipsis code builds a hashtable and then clears it? Wasteful, we could probably do better.

* Something is causing a lot of the cell contents to be considered animated (some of the time?), which triggers an allocation. Using layers.draw-borders shows us occasionally using separate layers for every line, which is unlikely to be helpful.
    - Is something using will-change or a transform/opacity animation? Not sure how to inspect the markup for this.

* Invalidation of tables is known to be problematic (though unrelated to the display list times). Paint flashing shows a lot of repainting on this, which is likely slow.
Flags: needinfo?(matt.woodrow)
CC'ing Miko and Doug, one of whom might have time to look into this a bit more.
Taking a look.
Assignee: nobody → dothayer
Status: NEW → ASSIGNED
Whiteboard: [fxperf:p1]
Matt: do you think a fade-out filter (like we have at the end of tab-titles in the tabstrip), plus clipping, might be more efficient than overflow:hidden here?  Or is there anything else that might be more efficient for us to do on the devtools side here?
Flags: needinfo?(matt.woodrow)
(In reply to Daniel Holbert [:dholbert] from comment #7)
> Matt: do you think a fade-out filter (like we have at the end of tab-titles
> in the tabstrip), plus clipping, might be more efficient than
> overflow:hidden here?  Or is there anything else that might be more
> efficient for us to do on the devtools side here?

Unlikely, filters are very heavyweight.

I think we should just do better with our handling of lots of clips :)

The only suggestion I have would be to remove overflow:hidden if you know the content won't/can't overflow, since then you avoid getting a scroll frame and the clip. Dynamic changes in/out of this state might be expensive though.
Flags: needinfo?(matt.woodrow)
Here's a new recording of comment 0's STR in latest nightly (which has dthayer's fixes for dependent bugs that have been filed so far):
https://perfht.ml/2CPgTKx

As in comment 0, this is a clean load of cnn.com, with content blocking disabled (which I had to manually disable since we block by default now).  The above profile URL is set to only show the main thread (chrome process), to focus on network panel's performance.

This particular load only took 9 seconds, which is much faster than in comment 0, though that's probably due to changes in content on cnn.com and perhaps different hardware between me & Harald.

The longest DisplayList phases I can find there are these 3, all longer than one frame (16ms)
 - 19ms long: https://perfht.ml/2x7cT2S
 - 18ms long: https://perfht.ml/2x89oZF
 - 18ms long: https://perfht.ml/2x8xpjl

(...and there are quite a few others in the 10ms-16ms range, too).

These may not be the lowest hanging fruit anymore, though. I see other somewhat-larger contributors to long refresh-driver-tick times - in particular, a long Rasterize:
 - 29ms long: https://perfht.ml/2xaEaBl

...and some long restyles:
 - 25ms long: https://perfht.ml/2x9vsmN
 - 25ms long: https://perfht.ml/2x9UiTI

I'm noticing that the first 5-8ms of each restyle here are in rayon_core::latch::LockLatch::wait (which wraps pthread_cond_wait) -- is that expected/scary?
Flags: needinfo?(emilio)
That means that we're doing work on the worker threads, so no.

You can toggle the sequential styling checkbox in the profiler or include StyleThreads to see what's going on during that time.
Flags: needinfo?(emilio)
Thanks. I captured a profile with "sequential styling" checkbox ticked. From that profile, here's a:
 - 40ms long restyle: https://perfht.ml/2CR2JbW
 - 38ms long restyle: https://perfht.ml/2CV1vfM
Depends on: 1491111
I filed bug 1491111 for the slow restyles.
Honza, Nicolas,

I just found bug 1392869 #c0 with a benchmark https://blooming-hamlet-93701.herokuapp.com/. It seems to be a great stress test for network logging as well (even though it is done for debugger).

Network panel: https://perfht.ml/2P6Vo9z
Console panel with Network logs: https://perfht.ml/2P9DxPh

Both panels exhibit 2s hangs during the end of the test, mostly time spend in react (not layout, like in past findings).

Are there any actionable follow-up bugs that we should attach to this bug that could improve the situation?
Flags: needinfo?(odvarko)
Flags: needinfo?(nchevobbe)
The last hang we are seeing is probably the update on all the messages to display the response info (e.g. the status code and the time it took to complete the request).

we could only update the messages that are visible at the moment the update comes in, but that would mean having a way to update when the user scrolls to messages that weren't visible when the update came. This could be done with IntersectionObserver and windowing, even if we need to be careful of the lag it could produce while the user scrolls.
Flags: needinfo?(nchevobbe)
(In reply to Nicolas Chevobbe from comment #14)
> The last hang we are seeing is probably the update on all the messages to
> display the response info (e.g. the status code and the time it took to
> complete the request).
> 
> we could only update the messages that are visible at the moment the update
> comes in, but that would mean having a way to update when the user scrolls
> to messages that weren't visible when the update came. This could be done
> with IntersectionObserver and windowing, even if we need to be careful of
> the lag it could produce while the user scrolls.
Yes, the other option is react-virtualized [1], but having the same
async-scrolling issue.

Honza

[1] https://github.com/bvaughn/react-virtualized
Flags: needinfo?(odvarko)
This seems like it might have a pageload impact, specifically for web developers who are trying to measure the performance of their pages. Putting into the qf triage queue.
Whiteboard: [fxperf:p1] → [fxperf:p3][qf]
dthayer is on leave until early January, so clearing this for now.
Assignee: dothayer → nobody
Status: ASSIGNED → NEW
Hey digitarald, do we have a more recent sense of how bad this is?

We triaged this today, and the problem is basically that the DevTools are running in the parent process. Moving them outside of the parent process would probably really help here. Does the DevTools team have plans to render the web developer toolbox OOP?
Flags: needinfo?(hkirschner)
> Hey digitarald, do we have a more recent sense of how bad this is?

Still bad (on a high end machine): https://perfht.ml/2UVqpkg

> Does the DevTools team have plans to render the web developer toolbox OOP?

We don't have near-term or medium-term goals to move off-main-thread. Alex can add more insights if this would help with our Fission work; which our P1 priority for refactoring atm.
Flags: needinfo?(hkirschner) → needinfo?(poirot.alex)
Whiteboard: [fxperf:p3][qf] → [fxperf:p3][qf:p3:responsiveness]
Depends on: 1557862
Blocks: netmonitor-perf
No longer blocks: devtools-performance
Type: enhancement → task
Priority: -- → P2
Summary: [meta] Reduce overhead of Network panel recording → [meta] Network panel recording overhead
Depends on: 1599604
Performance Impact: --- → P3
Whiteboard: [fxperf:p3][qf:p3:responsiveness] → [fxperf:p3]

A little while ago, bug 1529018, bug 1358414 and bug 1533764 should have improved reflow durations.
More recently bug 1439509 helped free lots of memory while the netmonitor was opened.
Otherwise bug 1753177 is about to introduce viewport for the console.
We should be able to do the same in the netmonitor to drastically reduce reflow durations.

Flags: needinfo?(poirot.alex)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.