Closed Bug 1401817 Opened 7 years ago Closed 6 years ago

ActivitySteam's screenshotting appears to hang the main process for multiple seconds, apparently after visiting links

Categories

(Firefox :: New Tab Page, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Firefox 60
Iteration:
61.2 - Apr 9
Tracking Status
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix

People

(Reporter: markh, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [thumbnails] [AS61MVP])

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1401236 +++

From that bug:

---8<---

I can't seem to trigger this issue reliably, so that's not the best...

In any case, I noticed in recently Nightlies that the main process would hang for multiple seconds occasionally.

Here's a profile of the issue happening:

https://perfht.ml/2hfamef

---8<---

This bug was originally attributed to Sync due to nested event loop spinning done by Sync. However, we are now somewhat confident that the actual problem is in ActivityStream. From IRC:

> 11:25 AM <tcsc> it looks like it is the screenshots still looking at that profile
> 11:25 AM <tcsc> if you invert the call stack it helps
> 11:26 AM <tcsc> specifically this function http://searchfox.org/mozilla-central/source/browser/extensions/activity-stream/lib/Screenshots.jsm#29
> 11:28 AM <•markh> tcsc: I wonder how big the string is?
> 11:28 AM <•markh> I mean, that only looks bad for massive strings?
> 11:29 AM <tcsc> yeah, not sure. hopefully it's not a full size screenshot of anything
> 11:30 AM <tcsc> i'm surprised there's no alternate way of doing this. i'm also surprised that slice() is in the profile and String.fromCharCode on the result isn't

I apologize for the vagueness of the above.
> i'm also surprised that slice() is in the profile and String.fromCharCode on the result isn't
Didn't you see the comment? ;) "Convert bytes to a string using *extremely* fast String.fromCharCode"

The screenshots are probably from top sites, and those are 560x380 for me for 2xdpi based on
https://searchfox.org/mozilla-central/source/browser/app/profile/firefox.js#1620-1624

Whereas activity stream uses them in a 96x96 context. If we assume most people will be using activity stream instead of tiles, then the screenshots could be 192x192 instead, so almost 6x smaller.
I think you could also use try using File and FileReader.prototype.readAsDataURL, which should be async (you'd need to import these via Cu.importGlobalProperties, but they are supported there).
I am a bit worried about marking this wontfix for 57...  The hang makes Firefox totally unusable for multiple seconds, which seems like it would be quite bad after all the energy spent on 57 thus far.

Is there reason to believe I am the only one affected by this?

At the moment, I have disabled AS in about:config so that I can go back to good performance, but I doubt most people would know to do so.

Tim, could you reconsider a fix here for 57, or else elaborate on why wontfix is appropriate?
Flags: needinfo?(tspurway)
Does this happen for all your screenshots? In a test profile, I just wiped all the thumbnails and favicons from a test profile that would show 12 top sites. (thumbnails are only fetched for sites without rich icons):

rm -rf obj/tmp/scratch_user/{favicons.sqlite,thumbnails} && MOZ_PROFILER_STARTUP=1 ./mach run

The total time for processing all 12 top sites screenshots data for me is 26ms. (searched for Screenshots.jsm)

However if I do try with smaller pageThumbs prefs, the screenshots captured are at 240x192 (120x96 @2x):

rm -rf obj/tmp/scratch_user/{favicons.sqlite,thumbnails} && MOZ_PROFILER_STARTUP=1 ./mach run --setpref toolkit.pageThumbs.minHeight=96 --setpref toolkit.pageThumbs.minWidth=96

The total measured time in Screenshots.jsm is 0ms, so there is a benefit of reducing those sizes.

jryans, if you can consistently get it to hang, could you try setting those prefs (might need a restart).
Flags: needinfo?(jryans)
Oh, one caveat of using 96x96 for pageThumbs is that Activity Stream also uses screenshots as a fallback for Highlights bookmarks without preview_image_url, and that looks to be 224x122. Although in most cases, Highlights should be showing history items *with* preview_image_url, and that will be at most 448x448px.

So if setting to 96x96 pageThumbs prefs doesn't help, it might actually be highlights caching the preview_image_url.
Oh, and if you get in a state where it is hanging because it's processing screenshots, you should be able to see what urls it's trying to screenshot:

Cu.import("resource://gre/modules/BackgroundPageThumbs.jsm"); BackgroundPageThumbs._captureQueue.map(v => v.url).join("\n")
I suppose another quick sanity check is do you have any custom pref values for:

browser.pagethumbnails.*
toolkit.pageThumbs.*
(In reply to Ed Lee :Mardak from comment #7)
> I suppose another quick sanity check is do you have any custom pref values
> for:
> 
> browser.pagethumbnails.*
> toolkit.pageThumbs.*

The sizing prefs are all set to their defaults (280x190) currently, and browser.pagethumbnails.storage_version is listed as modified to "3".

I'll try the more in-depth tests you mentioned.
Using m-c from today (2017-09-25) at rev 33b7b8e81b4befcba503c0e48cd5370aeb715085.

For these tests, I used a copy of my default profile, which is large, with many open tabs across many open browser windows.

All of the following are tests of Firefox startup, as suggested by comments in this bug, so that’s a bit different than the hang from visiting links that triggered the initial bug report.

Notes
-----

* `thumbnails` is less obvious to clear for a real profile, since it’s stored separately in the Caches directory (on macOS)

* Highlights and TopSites seem to make many requests for the same URLs
  * Are they fetching separately for every browser window?
  * HighlightsFeed.fetchImage was called ~42 times for ~9 URLs = 378 times
  * TopSitesFeed._fetchIcon was called ~21 times for ~11 URLs = 231 times
  * This doesn’t _seem_ to trigger duplicate _thumbnail captures_, but it wasn’t clear to me if this is expected

* Even with toolkit.pageThumbs.minWidth / toolkit.pageThumbs.minHeight set to 96, PageThumbUtils.getThumbnailSize returns [ 366, 206 ], which is more than 96 * 2 or [ 192, 192 ].
  * My main screen is 2560 x 1440, and screenSizeDivisor is 7, giving 366 x 206, so it ignores the prefs and uses these sizes instead

Startup profile A
-----------------

* Removed thumbnails and favicon.sqlite
* Start Firefox with default capture sizes
* Capture queue grows up to 20, waited for it drain to 0, then I collected the profile

https://perfht.ml/2ht43au

Work associated with Screenshots: 5,125 ms

Startup profile B
-----------------

* Changed toolkit.pageThumbs.minWidth / toolkit.pageThumbs.minHeight to 96
* Removed thumbnails and favicon.sqlite
* Start Firefox with adjusted capture sizes
* Capture queue grows up to 20, waited for it drain to 0, then I collected the profile

https://perfht.ml/2hsxQAi

Work associated with Screenshots: 4,793 ms

Startup profile C
-----------------

* Hacked PTU.getThumbnailSize to always return [ 192, 192 ] and not use the prefs
  * Some thumbnails still captured at 448 x 448, so I guess that’s Highlights with preview?
* Removed thumbnails and favicon.sqlite
* Start Firefox with adjusted capture sizes
* Capture queue grows up to 20, waited for it drain to 0, then I collected the profile

https://perfht.ml/2hsY1qz

Work associated with Screenshots: 3,373 ms

Summary
-------

It seems like reducing thumbnail size does help, but even at smaller sizes, we still appear to do multiple seconds of work (for my profile at least).
Flags: needinfo?(jryans)
Thanks for the profiles and details. The multiple calls to fetchImage/_fetchIcon are known and a result of initialization and opening an initial about:newtab/about:home that could be optimized. Those duplicate calls are short circuited to reuse an existing promise that actually fetches the screenshot (although the duplicate requests make probably unnecessary places queries).

Comparing A to C roughly reading out the times to process each image and calculating the ratios:

372	113	0.3037634409
124	26	0.2096774194
238	53	0.2226890756
127	32	0.2519685039
28	14	0.5000000000
144	59	0.4097222222
81	31	0.3827160494
97	70	0.7216494845
272	84	0.3088235294

145	142	0.9793103448
222	181	0.8153153153
1143	909	0.7952755906
302	298	0.9867549669
580	369	0.6362068966
1204	784	0.6511627907

The first set definitely seem like the top sites taking less time to process with the lower resolution while highlights aren't affected as much as they're not affected by the thumbnail size.

Although taking a step back, why can't we just use images without passing around data URIs? Probably something with being in the content process and not wanting to expose content accessible history-based URIs?
Flags: needinfo?(usarracini)
jyrans, any custom prefs for "javascript.*" ? And OS/machine/etc details?

Still pretty confused how my machine can process 12 top site screenshots ~60x faster (assuming you're also using a development machine for these measurements). I believe this is my machine http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-2.9-15-late-2016-retina-display-touch-bar-specs.html

We also have some reference machines to get profiles with as well to see how things behave there too.
Flags: needinfo?(jryans)
Attached file about:support report (deleted) —
(In reply to Ed Lee :Mardak from comment #11)
> jyrans, any custom prefs for "javascript.*" ? 

All javascript.* prefs are set to defaults, except "javascript.options.asyncstack" is set to "false".  I don't think that's an issue here because "false" should make it _faster_, if it has any affect at all.  Also, "false" is the default value for Release and Beta.

> And OS/machine/etc details?
> 
> Still pretty confused how my machine can process 12 top site screenshots
> ~60x faster (assuming you're also using a development machine for these
> measurements). I believe this is my machine
> http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-
> 2.9-15-late-2016-retina-display-touch-bar-specs.html

Yes, I am also using a Mozilla-supplied development machine.

Computer
--------

macOS 10.12.6
MacBook Pro (15-inch, 2016)
2.9 GHz Intel Core i7
16 GB RAM

https://support.apple.com/kb/SP749?locale=en_US&viewlocale=en_US

Firefox
-------

Nightly 58
No legacy add-ons installed, WebExtensions only.

about:support report attached.
Flags: needinfo?(jryans)
Okay, I think I'm closer to understanding the issue.

With more logging calls, I discovered that a _single_ call to `_bytesToString` is actually fast for me as well, about ~1 ms.

However, the caching behavior in Screenshots.jsm isn't actually working. The _capture_ step only happens once, but the rest of `getScreenshotForURL` to read the file and convert to string is repeated many times for the same URL.

To detect this, I tagged the `link` object coming into `maybeGetAndSetScreenshot` with an ID.  Even though only 20 thumbnails are captured, we actually see 240 unique link objects.

In addition, it looks like Highlights works correctly: although it does call `maybeGetAndSetScreenshot` multiple times per URL, it reuses the same link object for a given URL, so the caching works as expected.

However, with TopSites, it seems to generate a unique link object on each call, so the caching can't work properly.

Does this help you spot the problem?
Flags: needinfo?(edilee)
Flags: needinfo?(tspurway)
(In reply to Ed Lee :Mardak from comment #10)

> Although taking a step back, why can't we just use images without passing
> around data URIs? Probably something with being in the content process and
> not wanting to expose content accessible history-based URIs?


That's right, we can't load remote URLs for images on the newtab page for security concerns, so we had to use data URIs.
Flags: needinfo?(usarracini)
Thanks. It looks like there's some concurrency issues where multiple early requests happen at the same time and both get copies of links without screenshots, and both decide to get screenshots as updating the cache doesn't update the copy.
Assignee: nobody → edilee
Flags: needinfo?(edilee)
Assignee: edilee → nobody
Priority: P2 → P3
Whiteboard: [thumbnails]
Iteration: --- → 60.4 - Mar 12
Priority: P3 → P2
Whiteboard: [thumbnails] → [thumbnails] [AS60MVP]
Blocks: 1437659
Iteration: 60.4 - Mar 12 → ---
Priority: P2 → P3
Iteration: --- → 61.1 - Mar 26
Whiteboard: [thumbnails] [AS60MVP] → [thumbnails] [AS61MVP]
Iteration: 61.1 - Mar 26 → ---
Iteration: --- → 61.2 - Apr 9
Priority: P3 → P2
Blocks: 1445085
I just checked on latest m-c and copied over a places.sqlite without favicons/thumbnails to see the thumbnailing behavior, and looks like there's just a single request for each top site / highlight now. Whereas if I check out m-c that has the fix from bug 1401236 (the bug this was duplicate from), there were multiple requests being made.

Additionally, 6 months ago when this was filed, highlights would more proactively update its contents after regular link visits resulting in thumbnail requests while now the regular highlights updates are delayed.

For the thumbnails that do end up being needed after, it will result in the page / image being loaded in the background. One fix to reduce the size of thumbnails and related processing is bug 1432330.

r1cky, a separate thing I noticed is that now that we have tippy top icons, we will likely unnecessarily fetch a thumbnail when it would soon be replaced by a tippy top icon. Filed bug 1449062 to try to be smarter in avoiding the thumbnail request.

Overall, things seem better now with additional potential fixes to make things better.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Forgot to mention that because highlights updates are delayed, instead of potentially every visited link needing a highlight, only the selected highlights items that would actually get shown to the user would result in thumbnails now.
Blocks: 1449792
Target Milestone: --- → Firefox 60
No longer blocks: 1449792
Component: Activity Streams: Newtab → New Tab Page
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: