Open Bug 1438989 Opened 6 years ago Updated 1 year ago

Allow to cancel page-icon fetches if the menu requesting them closes, or give those events a lower priority

Categories

(Toolkit :: Places, defect, P3)

defect

Tracking

()

Performance Impact low

People

(Reporter: mconley, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, perf:frontend, perf:responsiveness, Whiteboard: [snt-scrubbed][places-performance])

I just had a sluggish tab close, and was able to capture a profile:

https://perfht.ml/2ByeAcr

The profile is dominated with a "NotifyIconObservers::Run" function that seems to be doing a lot on the main thread. About 301ms worth.

That seems bad.
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #0)

> The profile is dominated with a "NotifyIconObservers::Run" function that
> seems to be doing a lot on the main thread. About 301ms worth.

In this profile I see permitUnload taking 361ms, and spinning the event loop during that time, causing PageIconProtocolHandler.js to run some random code (in small chunks that probably wouldn't cause jank).
(In reply to Florian Quèze [:florian] from comment #1)
> (In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and
> needinfos) from comment #0)
> 
> > The profile is dominated with a "NotifyIconObservers::Run" function that
> > seems to be doing a lot on the main thread. About 301ms worth.
> 
> In this profile I see permitUnload taking 361ms, and spinning the event loop
> during that time, causing PageIconProtocolHandler.js to run some random code
> (in small chunks that probably wouldn't cause jank).

I take back "(in small chunks that probably wouldn't cause jank)", this is just a supposition, I don't see clearly in the profile if the time spent in mozilla::places::NotifyIconObservers::Run is happening in one chunk or in several small chunks.
Keywords: perf
Priority: -- → P2
Whiteboard: [fxperf][qf] → [fxperf][qf][fxsearch]
I talked to mstange IRL about this, and I think I can help clear this up.

The stack in perf.html says:

...
  ...
    newChannel2/<
      serveIcon

mak was concerned because newChannel2 does not call serveIcon directly - it is supposed to queue it to run asynchronously via PlacesUtils.favicons.getFaviconDataForPage.

However, what I learned today is that the /< at the end of newChannel2 isn't just noise and shouldn't be ignored; the /< means that the next frame beneath it was defined as a lambda within the /< function. So this is saying that serveIcon is a lambda inside newChannel2 - _not_ that newChannel2 calls serveIcon.
Almost: newChannel2/< means "this is an anonymous lambda that was created inside a function called newChannel2". So in your case, the anonymous lambda is the caller, and serveIcon is called by that lambda.
Got it, thanks!
That makes more sense.

(In reply to Markus Stange [:mstange] from comment #8)
> NotifyIconObservers runnables were dispatched *before* the onPageChanged
> notification was dispatched.

NotifyIconObservers is a runnable dispatched by the async thread to the main thread, and when it runs it call onPageChanged.
the order looks correct regarding that.
Excluding the high priority queue, since I don't think we even have one for these runnables, it sounds like:
NotifyIconObserver is already enqueued, we close the tab and start spinning, notifyIconObserver runs immediately, it sends onPageChanged, someone collects that and requires an updated icon (newChannel2).
=> At this point we dispatch a runnable to the async thread, that sends back a runnable on the main thread (serveIcon).
It sounds like only at this point the jank timer event gets enqueued, otherwise it would be served before serveIcon (that is what mstange says as well, IIUC).
I'm still not sure why the timer event is not between the 2 main thread events... maybe the scheduler decided to run our async thread before the thread that dispatches the timer event.

I'm still not sure what to do here, apart from adding artificial delays.
FWIW Looking at the profile it doesn't look to me like bug 1340498 will help, unfortunately. It looks to me like the time spent in XPConvert is due to converting for the callback, and not for the relatively simple data in the notification. (Also I suspect we didn't inline everything all the way down to nsNavHistory::SendPageChangedNotification, so if the notifications were taking time, I'd expect samples with those frames to be present.)
Giving this an [fxperf:p2] for now. I think we could at least experiment with making the callback less costly. It looks like changing the aData parameter of the callback to a jsval, and manually constructing a Uint8Array from the buffer might be worth exploring?
Whiteboard: [fxperf][qf][fxsearch] → [fxperf:p2][qf][fxsearch]
Whiteboard: [fxperf:p2][qf][fxsearch] → [fxperf:p2][qf:p1][qf:f61][fxsearch]
Whiteboard: [fxperf:p2][qf:p1][qf:f61][fxsearch] → [fxperf:p2][qf:p1][qf:f64][fxsearch]
Whiteboard: [fxperf:p2][qf:p1][qf:f64][fxsearch] → [fxperf:p2][qf:p1:f64][fxsearch]
I've come back around to this, and I think there's a piece of the puzzle missing.

Mainly, I think before I closed my tab, I must have opened up the Bookmarks menu:

https://perfht.ml/2K8sT94

Which, I would imagine, queued up a bunch of favicon requests inside a PlacesView.

So right off the bat, we have a large chunk of favicon requests in flight, and Places is about to get busy retrieving those.

Secondly, I must have been on a page with a beforeunload event handler set. That's why when I decided to close the tab, it started to spin the event loop. Once it started to spin the event loop, I guess that's when it started to service all of the icon events that were in the queue. Once once those events were serviced did the main thread service the message from content saying, "I don't have any beforeunload events".

A few ideas:

1) Since the bookmark menu was likely closed when I closed the tab, I wonder if it makes sense to cancel any in-flight favicon requests from a PlacesView for which the menu has closed. I don't even know if that's possible.

2) I don't know how possible it is, but it might make sense to decrease the priority of favicon events for the Bookmarks menu in the parent process. That menu (and maybe some other menus) are all likely to request a huge slew of favicons all at once... it's probably more important to service content process messages than it is to service favicon events. That might have helped us here, since the "go ahead and close" message from the content process would have taken priority over the favicon loads.

Hey mak, is my mental model here correct? And do you know if it's possible to cancel in-flight favicon loads?

Also cc'ing Mossop, since I think he's been working with favicons lately, and might have some insight here.
Flags: needinfo?(mak77)
(In reply to Mike Conley (:mconley) (:⚙️) (Catching up on needinfos / reviews) from comment #16)
> 1) Since the bookmark menu was likely closed when I closed the tab, I wonder
> if it makes sense to cancel any in-flight favicon requests from a PlacesView
> for which the menu has closed. I don't even know if that's possible.

Technically it's possible. Currently Places views for menus are kept alive, so the menus can be reopened faster, but I imagine image requests are redone every time. There's not much linking the 2 though, we may need to associate those icon loads to a loadgroup and cancel that in browserPlacesViews.js::_onPopupHidden handler (the trees may want something similar). This would probably not cancel the database fetching though, we'd need to forward cancellation from the ui to the channel created by PageIconProtocolHandler.js and from there to nsIFaviconService::getFaviconDataForPage through a canceler function.
So, while it's technically feasible, it sounds like quite some work.

> 2) I don't know how possible it is, but it might make sense to decrease the
> priority of favicon events for the Bookmarks menu in the parent process.

Or more in general for Places views. I'd have nothing against that, but I don't know if technically we already support re-prioritization of the main-thread events queue. Off-hand it sounds like it would be easier to mark certain events as critical/important, than to mark many events as not-critical. Thus, maybe we should give the onbeforeunload event an high priority, rather than reduce priority of everything else?
Flags: needinfo?(mak77)
Okay, thanks mak.

I'm downgrading this to a fxperf:p3 and a qf:p3 because it looks like the favicon loading was being kicked off by the bookmarks menu. I think it's still worth lowering the priority somehow of these favicon requests, but I don't think this is something that needs to be addressed asap.
Whiteboard: [fxperf:p2][qf:p1:f64][fxsearch] → [fxperf:p3][qf:p3:f64][fxsearch]
Whiteboard: [fxperf:p3][qf:p3:f64][fxsearch] → [fxperf:p3][qf:p3][fxsearch]
Performance Impact: --- → P3
Whiteboard: [fxperf:p3][qf:p3][fxsearch] → [fxperf:p3][fxsearch]
Severity: normal → S3
Whiteboard: [fxperf:p3][fxsearch] → [fxsearch]

I think using memory-mapped IO, and a separate read-only connection could help here.
Not sure how feasible would be to cancel those requests, or make those runnables have a lower priority (one may imagine a read-only connection using a lower priority thread for its queries, maybe?)

Summary: Long-running NotifyIconObservers::Run on main-thread in parent process when closing a tab → Allow to cancel page-icon fetches if the menu requesting them closes, or give those events a lower priority
Priority: P2 → P3
Whiteboard: [fxsearch] → [snt-scrubbed][places-performance]
You need to log in before you can comment on or make changes to this bug.