Fennec - first page load on cold start is very slow, Gecko “hangs” for 3 seconds on entry level phone
Categories
(Firefox for Android Graveyard :: General, defect, P2)
Tracking
(firefox-esr60 wontfix, firefox-esr68 affected, firefox65 wontfix, firefox66 wontfix, firefox68 wontfix, firefox69 wontfix, firefox70 wontfix)
People
(Reporter: mark.paxman99, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: regression)
Attachments
(6 files)
Comment 1•6 years ago
|
||
Comment 4•6 years ago
|
||
Comment 7•6 years ago
|
||
Comment 9•6 years ago
|
||
Reporter | ||
Comment 10•6 years ago
|
||
Reporter | ||
Comment 11•6 years ago
|
||
Updated•6 years ago
|
Comment 12•6 years ago
|
||
Reporter | ||
Comment 13•6 years ago
|
||
Updated•6 years ago
|
Reporter | ||
Comment 14•6 years ago
|
||
Comment 15•6 years ago
|
||
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Reporter | ||
Comment 18•6 years ago
|
||
Done with today's Nightly based on suggestion in https://bugzilla.mozilla.org/show_bug.cgi?id=1507253#c4
Is it any use?
This is on my slow entry level phone, Snapdragon 412
Timings were very very approximately
0 sec launch up Nightly from adb
~10 sec first paint
~14 sec initial page load complete
~18 sec hit reload button manually to trigger a reload
~23 sec reload complete
~45 sec hit quit button to close & save profile
For my own future reference I used
./adb shell am start -n org.mozilla.fennec_aurora/.App --es env0 MOZ_PROFILER_STARTUP=1 --es env1 MOZ_PROFILER_STARTUP_ENTRIES=8000000 --es env2 MOZ_PROFILER_SHUTDOWN=/storage/emulated/0/profile.json
from platform-tools folder
Reporter | ||
Comment 19•6 years ago
|
||
From the Marker Table there seems to be a 2.5 second period, from 4.343 seconds to 6.776 seconds, when nothing is happening. Perhaps waiting on network stuff??? But what do I know :)
Reporter | ||
Comment 20•6 years ago
|
||
Yeah I did some more checking and it's repeatable.
seconds
from launch
2.639 Page load start of www.mozilla.org
2.798 First network requests for mozilla.org elements
**** these elements take 4000 ms to load! - why??? ****
4.3 Gecko runs out of things to do and sits waiting for network requests to complete
6.8 First data comes in from the network, Gecko gets busy again
So I think the issue is these first network requests take ~4000 ms to complete eg first load after launch, "Load 3: https://www.mozilla.org/media/js/BUNDLES/site.ec263d31b78e.js" takes 3937.6 ms.
On the second load which I triggered manually at 17.7 seconds the same element "Load 3: https://www.mozilla.org/media/js/BUNDLES/site.ec263d31b78e.js" takes only 145 ms
So I think on first load from launch it takes ~4000 ms to get the first network data in compared to ~150 ms for same data on subsequent manual loads.
And if I look at adb logcat timing:- with the profiler active Fennec is somewhat slower than without, so with the profiler running, on launch mozilla.org takes ~8.6 seconds from "page load start" to "page load stop" in adb logcat vs ~4 seconds on subsequent manually triggered loads. Without the profilee running the figures are around ~7.2 and ~3 seconds. Delta between first and subsequent loads is ~4 seconds.
This is on my slow phone, Snapdragon 412.
Reporter | ||
Comment 21•6 years ago
|
||
Take stuff in comment 19 and comment 20 with a pinch of salt
I don't believe what the startup profiler is telling me. For example during the "manually triggered" load at 17.7 seconds it tells me that xxx231d.png takes 2950 ms to load which seems absurd. I tried a test using WebIDE without startup profiling and that element was loading in 2 ms from cache or 74 ms from the network on the same phone.
So I don't know if startup profiling is enormously slowing down the network stack, or if the profiler is reporting bogus results, or what.
I tried the same profiling using my super fast Galaxy S7 and got even weirder results for the first load, lots of 4000 ms delay in the networking and Fennec was super slow.
Gaaaaah it's doing my head in. Any help appreciated.
Comment 22•6 years ago
|
||
AFAIK the profiler certainly has some overhead, especially on mobile.
Reporter | ||
Comment 23•6 years ago
|
||
Yeah I understand that, but here's a profile for my Galaxy S7 which is perhaps 4x faster than my Snapdragon 412 phone, but the load times with the startup profiler running are even longer, the early load "Load 3: ...b78e.js" is a cache hit and yet it reports 2188 ms for that, then "Load 20: ...e034.svg" seems to be downloaded via the network and it takes 5465 ms, for 143 bytes :)
If I look in the Marker Table view there are two entries for "Load 3: ...b78e.js", one is 1.3 ms which could be the real cache hit, then one of 2188 ms which is odd?
And again at 16.938 seconds I when I did the manual reload I get two entries for b78e.js, one of 1.8 ms and one of 70 ms.
I would expect the overhead to be perhaps a consistent percentage of load time, but the numbers here seem bonkers.
I think the startup profiler either causes or reports two loads of each element, one "real", ms or tens of ms and then one "wrong" and enormously long, 100s or 1000s of ms. On two different phones one old one new-ish.
Perhaps I'm using it wrong :)
Help appreciated!
Reporter | ||
Comment 24•6 years ago
|
||
Perf-html.io reports that the startup profiler has 1 ms samples but I don't think that's true. I think they are of the order of 10 microseconds.
I need several million samples eg 4000000 to fully record my entire ~30 second cycle which is:- launch Fennec from adb ; load mozilla.org automatically ; wait a couple of seconds ; reload mozilla.org manually ; wait a couple of seconds ; quit Fennec.
You can see from comment 18 I've been using eight million samples 8000000 in my adb launch. If they really were 1 ms samples that would be over 2 hours of recording :) In fact it's only half a minute.
I guess if the profiler is sampling 100 times faster than you expect it might explain the large performance overhead and the screwy numbers for the network activity etc?
Something odd with the startup profiler.
Reporter | ||
Comment 25•6 years ago
|
||
And the number of samples requested in the adb startup command affect the duration of at least some of the network requests. The screengrab is of two startup profiles with 500k samples (upper) and 8000k samples (lower) and they have been aligned to show the same manual page load which I do about 15 seconds into the adb triggered test run with startup profiling enabled.
The two loads should be substantially the same but they're not, the lower trace has network requests taking ~4000 ms vs ~1000 ms on the upper trace. 1000 ms is still stupid long, it should be milliseconds.
But I think startup profiling affects/slows Gecko's networking code in some way????
I can file a separate bug if you can tell me what to say ;)
Reporter | ||
Comment 26•6 years ago
|
||
Re comment 24 I tried changing the sample interval using MOZ_PROFILER_STARTUP_INTERVAL and it didn't make any difference to the recording time for a given number of samples or to the slow network request issue of comment 25. So I don't think changing MOZ_PROFILER_STARTUP_INTERVAL has any actual effect on the startup profiler and it stays at its ~10 microsecond value.
Reporter | ||
Comment 27•6 years ago
|
||
I'm keen to progress this (I know you're all very busy). Can you connect me with the devtools guys, particularly if there's anyone who can continue the investigation into MOZ_PROFILER_STARTUP_INTERVAL in Fennec. It seems we need to keep peeling back onion layers.
Thanks.
Comment 28•6 years ago
|
||
I've noticed similar behavior (although less pronounced, generally ~1100ms on moto g5) in geckoview_example when loading pages after the initial about:blank.
I use this as the target url https://acreskeymoz.github.io/ where I show a pop-up with the reported performance metrics.
When there's a delay (i.e. the ~1100ms), it's always in the tcp connection. Even if I switch to a different target url and then attach remote debug tools to query performance api I still see the variation in the tcp connection time.
I'm looking at wireshark logs to see what connections are being made and will be disabling some of these features via pref.
It might also be useful to try and capture this with SimplePerf (caveat: I haven't tried this yet, but will).
Reporter | ||
Comment 29•6 years ago
|
||
Hi Andrew, sounds similar
I ran your target url on my slow Snapdragon 412 phone with GeckoView (reference-browser) and got the numbers 121 0 2 25 25 0 0 on first launch and 328 16 2 224 114 33 75 on manual reload. Those are typical, I ran it a few times and not much variation.
Yes, my Big Pause happens after the initial about:blank on both Fennec and Geckoview. The Big Pause happens a few hundred milliseconds after the "real" page load begins, mozilla.org or whatever.
My Big Pause happens whether I load the page over the network or from local storage.
If the page is very simple (eg just some text) and the page load is only a few hundred milliseconds, the page load can complete before the Big Pause. Then start up is very fast, e.g. 3 seconds from launch to progress bar completes and I think the pause happens without the user being aware. But if the page is more complex and runs into the pause the start up immediately goes up dramatically to 8-10 seconds because the pause adds to the page load time rather than happening after. (I think)
For example, www.york.ac.uk/teaching/cws/wws/webpage1.html cold loads in 3 seconds on my slow phone, very very fast. But if I quickly do a manual reload as soon as it's loaded, the manual reload suffers the Big Pause. The Big Pause doesn't seem to affect browser functionality like pinch zoom or the menus.
Timing for reference-browser from adb goes roughly
0.0 launch
1.7 Chrome startup finished
2.3 RUNNING
2.4 start loading about:blank
2.7 page show about:blank
2.7 start loading mozilla.org
3.0 PAUSE begins
6.0 PAUSE ends
7.5 page load stop
whereas a manual reload of mozilla.org takes around a second (nice and fast, GeckoView!)
I think the pause might be the content process hanging, at least I assume [C] in adb logcat is Content process. Excerpt from cold start below, the Big Pause is from 41.175 seconds to 44.335 so 3.2 seconds, and the first event after the pause always seems to be a [C] event. So, content process hanging???????????????
02-07 18:01:41.143 14276 14294 D GeckoViewContent[C]: handleEvent: pagehide
02-07 18:01:41.144 14276 14294 D GeckoViewAutoFill: Clearing auto-fill
02-07 18:01:41.156 14276 14294 D GeckoViewContent[C]: onLocationChange
02-07 18:01:41.157 14276 14294 D GeckoViewProgress[C]: onLocationChange: location=https://www.mozilla.org/en-US/, flags=0
02-07 18:01:41.158 14276 14294 D GeckoViewProgress[C]: ProgressTracker changeLocation https://www.mozilla.org/en-US/
02-07 18:01:41.163 14206 14224 D GeckoViewNavigation: onLocationChange
02-07 18:01:41.165 14206 14224 D GeckoViewProgress: onLocationChange: location=https://www.mozilla.org/en-US/, flags=0
02-07 18:01:41.175 14206 14224 D GeckoViewProgress: onSecurityChange
02-07 18:01:44.335 14276 14294 D GeckoViewProgress[C]: onProgressChange 0/0 176610/176610
02-07 18:01:44.336 14276 14294 D GeckoViewProgress[C]: ProgressTracker handleProgress null 176610/176610
02-07 18:01:44.337 14276 14294 D GeckoViewProgress[C]: ProgressTracker updateProgress
02-07 18:01:44.339 14276 14294 D GeckoViewProgress[C]: ProgressTracker updateProgress data={"prev":10.000057220458984,"uri":"https://www.mozilla.org/en-US/","locationChange":true,"pageStart":true,"pageStop":false,"firstPaint":false,"pageShow":false,"parsed":false,"totalReceived":176611,"totalExpected":176611} progress=30.105762481689453
Reporter | ||
Comment 30•6 years ago
|
||
Some fun, I dusted of my ancient Moto E1 which is so slow I can nearly hear the transistors switching.
I found that I can plug it into WebIDE, fire up Reference Browser, and start a WebIDE recording while the Big Pause is going on. On this Moto the Big Pause happens about 7 seconds after launch and has about 7 seconds duration. WebIDE starts recording about 10 seconds after launch so captures about 4 seconds of the Big Pause.
So this profile goes something like
time from time from WebIDE
launch recording start
0 sec - launch
7 sec - Big Pause begins
10 sec 0 sec WebIDE starts recording
14 sec 4 sec Big Pause ends
23 sec 13 sec Page load complete (mozilla.org)
30 sec 20 sec Manual page reload
So I think the first 4 seconds of the profile recording are of the Big Pause. Any use? (doesn't look like it to me, just devtools stuff).
Comment 31•6 years ago
|
||
Interesting. So that first 4 seconds ... what is going on? I didn't see anything in the content thread.
Maybe try to capture the parent process as well.
What's I find strange about this is that you can reproduce it by simply swipe-killing the app.
There's a similar problem that I'm looking at but reproducing it requires the clearing of the app's data (includes user profile, network cache, safebrowsing data etc). We do this with adb shell pm clear
.
You could try disabling some of the preferences that could affecting the load.
These are some that are disabled by some of the automated tests:
https://searchfox.org/mozilla-central/source/testing/profiles/perf/user.js#23
It's not always easy to load from profile on Android, but there are steps and gotchas here: https://bugzilla.mozilla.org/show_bug.cgi?id=1525661
Comment 32•6 years ago
|
||
I wonder whether that might be related to safebrowsing and/or the URL classifier?
A year ago we've had bug 1420641, where some abnormally long safebrowsing initialisation would block page loads for a few minutes after startup, so maybe even the regular initialisation takes a few seconds.
It can't be extension startup (where a blocking WebRequest listener, as commonly used by ad blockers and the like, will hold all network connections between the moment the extension starts loading until it has initialised far enough to process those requests and decide what to do with them, see bug 1503721 more info), because the reference browser doesn't even have any user-facing extension support.
Reporter | ||
Comment 33•6 years ago
|
||
(In reply to Andrew Creskey from comment #31)
There's a similar problem that I'm looking at but reproducing it requires the clearing of the app's data (includes user profile, network cache, safebrowsing data etc). We do this with
adb shell pm clear
.
IIUC when you delete Firefox app data it deletes the decompressed library files. The first launch after this requires Fennec/GV to decompress them again - which takes ages. Could this be what you see?? Issue below is about Focus & cache clearing but I remember discussing it for Fennec & app data clearing too (I'm sure you know all this already)
https://github.com/mozilla-mobile/focus-android/issues/3853
It's not always easy to load from profile on Android, but there are steps and gotchas here: https://bugzilla.mozilla.org/show_bug.cgi?id=1525661
I am out of my depth with this stuff!
@Andrew, Jan
I tried flipping a few of the safebrowsing prefs including phishing, with no change to the delay. Ditto I noodled with the timeouts on the URL classifier prefs, same story.
I am intrigued by the URL classifier. I didn't see an obvious way of turning it off. Any suggestions?
Comment 34•6 years ago
|
||
IUC when you delete Firefox app data it deletes the decompressed library files.
As of bug 1486524, that is no longer the case.
Any suggestions?
No idea. We really need some way of getting a profile of all relevant threads and with working symbols.
Reporter | ||
Comment 35•6 years ago
|
||
Yep, the answer is "be patient" which I'm not very good at! Wait for the devtools guys to take a look at the startup profiling stuff. Cheers.
Comment 36•6 years ago
|
||
Mark -- I'm not sure if it's the case in this bug, but in my case the page load delay is at least mostly caused by the NSS library's cert DB. See: https://bugzilla.mozilla.org/show_bug.cgi?id=1529044
There is a preference for this but it's not exposed.
I can send you apk's that have it enabled, if you like.
I would also like to know if the pause also occurs if you let the browser sit idle (say 1 minute) before loading the target url.
Reporter | ||
Comment 37•6 years ago
|
||
Andrew - send me the apk and I'll have a play. Not sure about the 1 minute, haven't looked for it. Sounds intriguing.
Comment 38•6 years ago
|
||
Mark - here you'll find two apks of geckoview_example:
gv_local_log.apk
- this is effectively a baseline build
gv_local_no_cert_db.apk
- this build has the certdb's disabled
https://drive.google.com/open?id=1QYgFcP_ZKM9jC1tWhfysm1KqN3gvtTNz
I would be curious as to whether you see a performance difference in this case.
Reporter | ||
Comment 39•6 years ago
|
||
Nailed it. I think. Your tweaked gv_local_no_cert_db.apk does not show the slowness in launch that "stock" geckoview_example does.
Here's what I did: installed the various APKs. For each APK, repeatedly run the following sequence
1 launch it from adb and tell it to go to mozilla.org
2 start logcat and watch the zerdatime stamps for the "main" page load start/stop -> "launch load"
3 when page has fully loaded, trigger a manual reload using 3 dot menu,reload -> "manual load"
4 kill geckoview_example from the Android task manager
go back to (1)
With "stock" geckoview (tonight's api 16 build from taskcluster or your baseline) I see the following on my slow phone which is similar to your Moto G5.
After install, the first "launch load" is slow-ish (~3 seconds). I assume cache is being populated. The subsequent "manual load" is fast (~1.2 seconds). Then once I've killed it and relaunched it, subsequent "launch loads" are fast (~2.0 seconds) though not as fast as the "manual loads". I can go round the kill & relaunch cycle several times and the "launch loads" stay fast (~2.0 seconds) BUT after a while (a few minutes) they switch to being slow (~5.0 seconds) and remain like that indefinitely. If I uninstall and reinstall geckoview_example or (I think) clear data & cache and Force Stop, the fast "launch loads" return for a few minutes.
With your special build gv_local_no_cert_db.apk this switch from fast "launch loads" to slow "launch loads" doesn't seem to happen. I've tested for about an hour now which is more than enough to make the switch on stock geckoview_example so I am reasonably confident it's a fix.
I see the same issue & fix on my Galaxy S7 but everything happens somewhat faster.
Here's an example of your "stock" variant gv_local_log.apk on my slow BQ Aquaris, table is run number / launch load time / manual load time
BQ Aquaris X5
fresh install
1 2.8 1.3 <---- first "launch load" is slightly slow, cache being populated?
2 2.2 1.3 <---- "launch loads" stay fast for a few minutes
3 2.1 1.3
4 2.1 1.0
5 2.7 1.0
wait few minutes
6 4.9 1.0 <---- switches to slow "launch load"
7 5.0 1.1
8 5.0 1.0
wait few minutes
9 5.3 1.1 <---- slowness persists, I think indefinitely
uninstall and reinstall same apk
10 2.8 1.0 <---- first "launch load" is slightly slow, cache being populated?
11 2.1 1.0 <---- back to fast "launch load"
wait few minutes
12 5.1 1.0 <---- oops, gone slow again.
13 5.0 1.0
14 4.9 1.0
I won't post a table for your fixed gv_local_no_cert_db.apk because it's boring, it looks exactly the same only there isn't the switch to ~5 second slow "launch loads", it stays at ~2 seconds indefinitely (I think).
I will continue to do some checks, it's very tedious but necessary.
So thanks very much! This one has been driving me crazy for months!
Comment 40•6 years ago
|
||
Mark I really appreciate the high-quality work you're putting into this one.
Can you still reproduce the abnormally slow loads when the app is swipe-killed?
Reporter | ||
Comment 41•6 years ago
|
||
The short anser is "Yes", in comment 39 I was swipe killing the app between each test and the slow load persisted between tests. Doing more tests now, I think we are missing something.
Reporter | ||
Comment 42•6 years ago
|
||
OK this was boring
Observations:-
yesterday's geckoview_example.apk from Taskcluster; mozilla.org on my slow Snapdragon 412 phone
- "launch load" is fast after clean install (~2 seconds)
- stays fast for a few seconds or minutes
- then drops down to slow and stays there indefinitely (~5.5 seconds)
- swipe kill doesn't fix (I was swipe killing between each run)
- Force Stop doesn't fix
- Clear data from Android App menu does fix
- then stays fast for a few seconds or minutes
- rinse, repeat
Your special build gv_local_no_cert_db.apk; same page, same phone
- stays fast indefinitely
- its "launch load" speed MIGHT be slightly faster than stock geckoview_example when it's behaving itself, e.g. geckoview_example typically >2.0 seconds, gv_local_no_cert_db typically <2.0 seconds
I also tried with your German eBay page, similar story, but your gv_local_no_cert_db could be A LOT faster on launch loads, eg ~5 seconds vs ~10 seconds. Lots of noise though.
BQ Aquaris X5; Snapdragon 412; Android 7
Clock time, GMT ; "launch load", seconds ; "manual load", second
1615 install yesterday's geckoview_example from taskcluster
1615 3.2 1.0
1615 2.3 1.3
1616 2.1 1.0
1616 2.1 1.1
1617 leave geckoview_example frontmost, phone plugged into ADB
1620 4.9 1.1
1620 4.0 1.1
1621 Force Stop geckoview_example
1622 5.3 1.1
1622 5.3 1.8
1622 5.6 0.9
1622 leave geckoview_example frontmost, phone plugged into ADB
1632 unlock phone
1632 6.7 1.0
1632 5.6 1.1
1633 5.7 1.0
1633 leave geckoview_example frontmost, phone plugged into ADB
1637 5.0 1.0
1637 5.0 0.9
1637 Clear data and force stop
1638 2.6 1.1
1638 2.2 1.1
1639 2.2 1.8
1639 2.0 1.0
1639 leave geckoview_example frontmost, phone plugged into ADB
1705 unlock phone
1705 5.2 1.0
1706 5.5 1.0
1707 Clear data and force stop
1707 2.4 1.6
1707 5.4 1.0
1708 Clear data and force stop
1708 8.0 1.0
1708 2.2 1.4
1708 2.2 1.3
1709 2.1 1.8
1709 2.1 1.3
1715 uninstall & replace with gv_local_no_cert_db.apk
1716 3.0 1.4
1716 2.9 1.4
1717 1.9 1.1
1717 1.9 1.2
1718 1.8 1.0
1718 leave geckoview_example frontmost, phone plugged into ADB
1724 unlock phone
1724 2.0 1.2
1724 1.8 1.2
1725 6.5 1.6
1725 1.9 1.2
1726 1.9 1.2
1726 leave geckoview_example frontmost, phone plugged into ADB
1733 unlock phone
1733 1.8 1.2
1733 2.2 1.1
1734 1.8 1.7
1734 2.7 1.1
1735 leave geckoview_example frontmost, phone plugged into ADB
1739 2.0 1.2
1739 1.8 1.6
1740 1.8 1.2
1740 leave geckoview_example frontmost, phone plugged into ADB
1748 2.5 1.2
1748 2.2 1.2
1748 2.0 1.2
1748 leave geckoview_example frontmost, phone plugged into ADB
1755 2.4 1.1
1755 1.9 1.1
1755 1.6 1.1
===========================================================
load https://m.ebay-kleinanzeigen.de/s-anzeigen/auf-zeit-wg-berlin/zimmer/c199-l3331
1756 uninstall & replace with yesterday's geckoview_example from taskcluster
1757 13.3 15.5
1758 16.0 13.0
1759 13.0 10.0
1800 15.0 18.0
1800 leave geckoview_example frontmost, phone plugged into ADB
1805 13.1 6.0
1806 11.1 12.0
1806 uninstall and replace with gv_local_no_cert_db.apk
1807 11.3 10.0
1808 7.9 8.8
1809 5.0 8.8
1809 4.5 10.0
1810 5.7 7.0
1810 leave geckoview_example frontmost, phone plugged into ADB
1816 5.5 5.8
1817 8.4 8.8
1818 5.4 5.0
1818 leave geckoview_example frontmost, phone plugged into ADB
1820 8.8 8.6
1821 5.4 6.3
1822 uninstall & replace with yesterday's geckoview_example from taskcluster
1823 23.0 7.2
1824 8.8 17.0
1825 16.0 7.8
1826 10.0 8.2
1827 11.5 12.0
1831 11.0 11.9
1832 10.8 6.9
Reporter | ||
Comment 43•6 years ago
|
||
Andrew, it's more complex than I thought, I knew we were missing something. This transition from fast "first launch" to slow "first launch" ONLY happens with the geckoview_example.apk I download from taskcluster. It does not happen with either of your builds from comment 38.
i.e.
geckoview_example.apk, api 16 from https://tools.taskcluster.net/index/gecko.v2.mozilla-central.nightly.latest.mobile/android-api-16-opt - switches from fast "launch load" to slow "launch load" after a couple of minutes. I am using today's version which I downloaded at 1030 GMT 22 Feb 2018, but I see the same with yesterday's version and one from a few weeks ago.
gv_local_log.apk - does NOT show the switch
gv_local_no_cert_db.apk - does NOT show the switch
My bad, I was checking against my downloaded versions of geckoview_example not against your baseline. Sorry.
I've checked this as thoroughly as I can, I've used 3 different vintages of geckoview_example.apk from taskcluster and switched between the various different versions.
So what does that mean?
Supporting data below, previously I compared gv_local_no_cert_db.apk with geckoview_example.apk from taskcluster, in the tests below I compared gv_local_log.apk with geckoview_example.apk
www.mozilla.org
uninstall & install gv_local_log.apk
1008 2.9 1.3
1009 2.0 1.3
1009 1.9 1.1
1010 2.4 1.1
1010 leave geckoview_example frontmost, phone plugged into ADB
1013 1.9 1.2
1013 1.8 1.1
1013 leave geckoview_example frontmost, phone plugged into ADB
1018 2.2 1.2
1018 leave geckoview_example frontmost, phone plugged into ADB
1023 2.4 1.1
1027 1.9 1.1
1028 2.1 1.1
uninstall & install today's geckoview_example from taskcluster
(geckoview_example-21-feb-api-16-opt.apk)
1031 2.5 1.0
1031 2.3 4.5
1032 5.5 1.0 <--- launch load goes slow and stays that way
1033 5.1 1.6
1034 5.5 1.1
Clear data & force stop
1035 9.0 1.0
1035 2.1 1.0
1036 1.7 0.9
1037 5.6 1.1 <--- launch load goes slow
uninstall & re-install gv_local_log.apk
1038 2.4 0.8
1038 1.9 1.2
1039 2.4 1.1
1039 leave geckoview_example frontmost, phone plugged into ADB
1042 2.2 1.0
1042 leave geckoview_example frontmost, phone plugged into ADB
1048 2.3 1.1
Reporter | ||
Comment 44•6 years ago
|
||
Me again, FWIW Fennec Nightly shows same behaviour fast "launch load" switching to slow after a few minutes, Clear Data & Force Stop brings back fast again for a few minutes then switches to slow again.
This is standard Fennec Nightly downloaded via Play Store, my version is a few weeks old now but what the hey.
It's doing my head in
Fennec Nightly, slow phone, mozilla.org
clear data, force stop
1124 3.3 2.0
1124 2.4 1.6
1125 2.4 1.4
1126 2.4 1.3
1126 6.5 1.7 <---- "launch load" goes slow
1127 6.4 1.5
1127 6.3 1.4
1128 6.3 1.6
clear data, force stop
1129 3.7 1.5
1129 2.6 1.4
1130 3.0 2.0
leave 15 mins
1146 6.3 1.7 <---- "launch load" goes slow
1147 6.4 1.9
clear data, force stop
1148 1.9 2.3
1148 2.3 1.6
leave 10 mins
1158 6.3 1.8 <---- "launch load" goes slow
1159 6.3 1.6
leave 20 mins
1220 6.4 1.6
Comment 45•6 years ago
|
||
Ok, those are interesting results - there is definitely something else going on here!
I've needinfo'ed myself to profile the cases where load performance drops after a delay (in taskcluster builds).
Btw, we'll be able to profile and capture all threads on Android with remote dev tools once this is complete: https://bugzilla.mozilla.org/show_bug.cgi?id=1529076
Reporter | ||
Comment 46•6 years ago
|
||
Just to close this out for now, I tested your two builds from comment 38 head to head. i.e. answered your original question!
I tried a few sites, guardian.co.uk seems as good as any.
I don't think I see any difference in "launch load" speed between the two builds. There are some weirdly fast outliers but eyeball average is 13-14 seconds load time for both builds. I think I would see the several second difference you mentioned over on the other thread. But I haven't gone to too much effort.
We need some better diagnostic tools than an idiot (aka "me") with a stopwatch!
local_log.apk
www.guardian.co.uk
1446 15 14
1447 5.2 13
1448 10 13
1449 14 6.0
1637 12.8 12.2
1637 14.4 12.7
1638 5.2 10.4
1639 10.6 5.2
gv_local_no_cert_db.apk
www.guardian.co.uk
1641 19.5 5.0
1642 14.0 14.0
1643 15.0 13.2
1644 14.5 13.2
1645 15.5 12.0
1645 14.6 11.7
1646 5.5 13.9
Comment 47•6 years ago
|
||
I wrote a small script to drive these tests on geckoview_example:
https://gist.github.com/acreskeyMoz/9430c66db5d5dfba1c6391efcc2cff7a
Since GeckoViewActivity logs the page load start and stop zerdatimes, I capture that to file from logcat and extracted it with grep -E 'page load stop|page load start'
Pasting the row of zerdatimes to calculate the difference shows the load time for www.mozilla.org
(the other logs are for app load and the about:blank)
https://docs.google.com/spreadsheets/d/1acVc5osNsnanNiHtYAp38kgzCyeOgBRka9G3Lmedc0g/edit#gid=0
(this is using the nightly geckoview_example from the taskcluster link you provided.)
I'm seeing a lot of very slow loads on what's an existing profile and with presumably some of the resources in the local network disk cache.
And some quicker loads slipping in.
Reporter | ||
Comment 48•6 years ago
|
||
Andrew - great - I think you see exactly the same as me.
I assume each vertical column is a "run" with a GV reset at the top and consecutive loops of your script going down the page. And each loop seems to take ~30 seconds.
I think we both see this:-
- <GV reset>
- initial slow launch load 2500 - 3500 ms (because no cached data? and/or bug 1529044?)
- N x 30 second cycles of FAST launch loads, each load ~2000 ms
- switch to SLOW launch loads which persist indefinitely, each load ~4000 ms
N is typically 0 to 9. N=0 is confusing because we don't see the behavior switch FAST to SLOW.
For the three columns of your spreadsheet I see N=2; N=0; N=9.
I think in reality it's not a question of number of loops, it's an actual elapsed time since reset, I think if I leave GV idle the switch occurs anyway after e.g. zero to five minutes.
SLOW is always ~2000 ms slower than FAST, the delta seems a fixed time so even if I use a very complicated page the times become eg 10000 ms FAST and 12000 ms SLOW.
If I use a much faster phone everything speeds up proportionally so the difference between FAST and SLOW decreases somewhat, so I don't think it's a fixed timeout or handshake, it's CPU speed related.
and, this stuff is tentative I haven't really been forensic on it:-
- geckoview_example from taskcluster has the switch from FAST to SLOW after a few minutes (N from zero to ~10)
- gv_local_log.apk from comment 38 is always FAST (N>10 or maybe infinity?)
- (not sure about Fennec, I'll do some tests later)
Even a FAST launch load takes longer than a "normal use" load (tapping the reload button) e.g. for mozilla.org a FAST launch load is ~2000 ms vs ~1200 ms for a "normal use" load. (*). Perhaps this is expected, there might be background stuff going on, but I don't know what that would be in geckoview_example.
So I think there might be three issues
- bug 1529044 slow initial page load, adds ~1000 ms to first page visit after GV reset
- this bug, slow launch loads after some elapsed time since GV reset, adds ~2000 ms to every "launch load"
- (*) bug X, even when both above are taken account of, launch loads still always ~800 ms slower than "normal use" loads
Comment 49•6 years ago
|
||
(In reply to Mark from comment #48)
I assume each vertical column is a "run" with a GV reset at the top and consecutive loops of your script going down the page. And each loop seems to take ~30 seconds.
Yes, with each vertical column starting by clearing all the app data via adb shell pm clear org.mozilla.geckoview_example
I think we both see this:-
- <GV reset>
- initial slow launch load 2500 - 3500 ms (because no cached data? and/or bug 1529044?)
I think the initial slow load is both those things: nothing in cache and bug 1529044, as well as other components doing startup tasks.
- N x 30 second cycles of FAST launch loads, each load ~2000 ms
- switch to SLOW launch loads which persist indefinitely, each load ~4000 ms
Glad I was able to reproduce it, this will be interesting to hunt down.
N is typically 0 to 9. N=0 is confusing because we don't see the behavior switch FAST to SLOW.
For the three columns of your spreadsheet I see N=2; N=0; N=9.
I think in reality it's not a question of number of loops, it's an actual elapsed time since reset, I think if I leave GV idle the switch occurs anyway after e.g. zero to five minutes.
SLOW is always ~2000 ms slower than FAST, the delta seems a fixed time so even if I use a very complicated page the times become eg 10000 ms FAST and 12000 ms SLOW.
If I use a much faster phone everything speeds up proportionally so the difference between FAST and SLOW decreases somewhat, so I don't think it's a fixed timeout or handshake, it's CPU speed related.
and, this stuff is tentative I haven't really been forensic on it:-
- geckoview_example from taskcluster has the switch from FAST to SLOW after a few minutes (N from zero to ~10)
- gv_local_log.apk from comment 38 is always FAST (N>10 or maybe infinity?)
I'll kick off more runs with local builds to get a better idea.
- (not sure about Fennec, I'll do some tests later)
Even a FAST launch load takes longer than a "normal use" load (tapping the reload button) e.g. for mozilla.org a FAST launch load is ~2000 ms vs ~1200 ms for a "normal use" load. (*). Perhaps this is expected, there might be background stuff going on, but I don't know what that would be in geckoview_example.
As we're going to collect in Bug 1529333, there are a lot of components still working when the browser starts accepting input.
Another factor is the network cache -- there's a memory cache and a disk cache. The memory cache is lost when the process is killed.
So I think there might be three issues
- bug 1529044 slow initial page load, adds ~1000 ms to first page visit after GV reset
- this bug, slow launch loads after some elapsed time since GV reset, adds ~2000 ms to every "launch load"
- (*) bug X, even when both above are taken account of, launch loads still always ~800 ms slower than "normal use" loads
Agreed, with some component of bug X covered in Bug 1529333 and memory cache, but still possibly something there :)
Reporter | ||
Comment 50•6 years ago
|
||
Let me know how I can help.
Re bug 1529333, I also filed bug 1506471 ages ago. Clearly there's a lot going on at launch, some of it undesirable :). I think it needs a benchmark to help improve that UX.
Comment 51•6 years ago
|
||
Thanks Mark, will do.
Right now I think our best coverage for this problem is here https://health.graphics/android, under Nimbledroid. These are pages tested right after app launch on a clean profile.
And soon the more detail raptor test suite will cover this case:
https://bugzilla.mozilla.org/show_bug.cgi?id=1518975
Comment 52•6 years ago
|
||
I ran this test a few more times on a new local build and on the nightly --
https://docs.google.com/spreadsheets/d/1acVc5osNsnanNiHtYAp38kgzCyeOgBRka9G3Lmedc0g/edit#gid=0&range=K1
I think it's fair to say that the nightly has very different performance characteristics from my local release build.
The big pause is very evident in Nightly from those numbers.
I'll compare the about:buildconfig's of each build to see if the difference is there.
I did capture a longer load profile of Nightly, but we may need more threads included to find the root cause:
https://perfht.ml/2SG5Vtu
Comment 53•6 years ago
|
||
A next good step could be to use the https://perf-html.io/compare/ feature to try and find the reason for the slow load.
Reporter | ||
Comment 54•6 years ago
|
||
re comment 52, good at least we see the same thing! And hopefully you've got some clues. The payoff from this and the certificate stuff should be worth the effort, GV spends a big chunk of time spent doing ... stuff !
re comment 53 I spent ages trying to get startup profiling to work, see comment 21, I think there are some bugs there and ATM I think it gives highly suspect answers...
Comment 55•6 years ago
|
||
Wow, looking at about:buildconfig from the MOZILLA_OFFICIAL nightly vs my own local --enable-release
there are a lot more differences than I would have thought.
e.g.
MOZ_AUTOMATION=1,
MOZ_ANDROID_POCKET=1,
MOZ_ANDROID_MMA=1 (Mobile Marketing Automation)
From Comment 52 I'm reasonably confident that this bug is only in Nightly, as you observed a while back, Mark.
I think a good task would be to build locally from the config used in Nightly and verify that I can reproduce the delayed big pause.
Reporter | ||
Comment 56•6 years ago
|
||
(In reply to Andrew Creskey from comment #55)
From Comment 52 I'm reasonably confident that this bug is only in Nightly, as you observed a while back, Mark.
If by this you mean "not in Release and Beta" I think I disagree. I see the same behavior on Firefox Release at least. I haven't gone into it exhaustively though & am not running the latest greatest Release on my phone, but not sure that matters.
I think a lot of my earlier conclusions were before I realised that Fennec is fast for a few minutes after first install and then goes slow. So some of my comparisons were of e.g. a freshly installed "clean and fast" Release vs a "stale and slow" Nightly. Not apples-to-apples.
BQ Aquaris X5 (Snapdragon 412), Firefox 65.0 Release, mozilla.org, zerdatime timing
launch load manual reload
a few days since reset 6300 ms 1400 ms
0 mins after reset 3300 ms 2100 ms
1 mins after reset 3400 ms 2300 ms
2 mins after reset 6400 ms 2100 ms <----- switch to slow launch loads
Comment 57•6 years ago
|
||
(In reply to Mark from comment #56)
(In reply to Andrew Creskey from comment #55)
From Comment 52 I'm reasonably confident that this bug is only in Nightly, as you observed a while back, Mark.
If by this you mean "not in Release and Beta" I think I disagree. I see the same behavior on Firefox Release at least. I haven't gone into it exhaustively though & am not running the latest greatest Release on my phone, but not sure that matters.
Sorry, I should have been more clear - I meant that this bug is not in the local release builds that I had shared, i.e. that it's only in official Mozilla builds.
Comment 58•6 years ago
|
||
Hmm, while capturing profiles I noticed that my own script to reproduce this bug is very timing sensitive.
If I wait 10 seconds after the browser is loaded (instead of the 3 in the script) then I don't see the big delay:
https://docs.google.com/spreadsheets/d/1acVc5osNsnanNiHtYAp38kgzCyeOgBRka9G3Lmedc0g/edit#gid=0&range=AI1
So the scenario I've been looking at is a fresh launch case where loadtime is likely affected by a background task.
I've tried profiling this and capturing every thread I could think of but I haven't found a smoking gun yet.
I did add the dependency on Bug 1529076 so we can capture all threads on android and hopefully track this down.
Reporter | ||
Comment 59•6 years ago
|
||
Yeah I've seen the same, I may or may not have mentioned it in all my waffle upthread.
Comment 60•6 years ago
|
||
A profile from a slow load:
https://perfht.ml/2Y4aYrE
Certainly some network resources start very late (e.g. the last ones being youtube iframe_api and it's included script from y.stimg.com
).
Reporter | ||
Comment 61•6 years ago
|
||
I saw similar but what seemed odd to me is not just the lateness but how slow some of those network resources are, eg 1700 ms to load a PNG on the content process vs tens of ms for similar (?) on the parent process. If you do a profile which has the launch load and then a manual reload of the same page those content process PNGs take tens of milliseconds on the second load. I think. So the whole content process is molasses for several seconds at launch? (I am no expert on profiles).
Comment 62•6 years ago
|
||
I confirmed that the delayed network requests in the content process are due to the long layout reflow.
So the content processes's main thread can't respond to the completed network request callbacks.
In a faster loading case, the reflow takes about half the time: https://perfht.ml/2T3TszX
So at the moment I have to agree that the content process is running slow as molasses but I don't know why.
Comment 63•5 years ago
|
||
Andrew, is this bug still relevant? Is it still a priority P1?
Comment 64•5 years ago
|
||
I'm not sure that it's priority P1, but let me profile this again now that we have simpleperf and can capture all the threads,
Comment 65•5 years ago
|
||
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Comment 66•5 years ago
|
||
Unfortunately, now that I have the tools to analyze this, I haven't been able to reproduce it.
I've been using this script and adb logcat | grep -E 'page load stop|page load start'
on a Moto G5 with the latest nightly
My results are here
Previously I would see load times of 3200-4000 milliseconds in the test scenario and now I'm seeing load times of 1300-1900ms.
I've tried a few variations on the settle time but with no luck (3 seconds used to reproduce it for) .
Only 1 load out of the 50 is slow (2869ms).
Mark, can you still reproduce this?
Reporter | ||
Comment 67•5 years ago
|
||
Yeah I think it’s mostly or completely gone.
www.mozilla.org, Snapdragon 412, zerdatime difference between page load stop and page load start
cold start
1400 ms to 2100 ms, average around 1700 ms
manual reload (3 dot menu, reload)
1000 ms to 1400 ms, average around 1100 ms
Difference between a cold start and a warm start is now only around 600 ms down from around 3000 ms a few months ago. Much better! Perhaps you think the 600 ms is OK given the startup housekeeping which is going on.
GeckoView cold start is still slow compared to Chrome, it still take about 5 seconds to get a readable page on GeckoView vs around 3 seconds on Chrom(ium). But it’s progress!
I wonder what changed and how we stop it regressing back again….
Has GV become more multi-threaded? There are now two instances of runGecko:-
08-08 17:17:12.756 8836 8836 I GeckoViewActivity: zerdatime 6354732 - application start
08-08 17:17:12.949 8836 8852 W GeckoThread: zerdatime 6354925 - runGecko
08-08 17:17:15.268 8858 8875 W GeckoThread: zerdatime 6357244 - runGecko
whereas some months ago IIRC there only used to be one????
Did you see my recent comments against bug 1506099? I think you guys need to track this with an automated test.
Reporter | ||
Comment 68•5 years ago
|
||
I think Fennec Nightly still has the ~3 second pause (ie GeckoView is fixed, Fennec is not).
Comment 69•5 years ago
|
||
(In reply to Mark from comment #67)
Yeah I think it’s mostly or completely gone.
Thank you for checking that Mark.
www.mozilla.org, Snapdragon 412, zerdatime difference between page load stop and page load start
cold start
1400 ms to 2100 ms, average around 1700 msmanual reload (3 dot menu, reload)
1000 ms to 1400 ms, average around 1100 msDifference between a cold start and a warm start is now only around 600 ms down from around 3000 ms a few months ago. Much better! Perhaps you think the 600 ms is OK given the startup housekeeping which is going on.
Yes, it's something that we are focusing on. The warm load will have certain advantages (all those resources in the cache). Even though we revalidate them, it will still be faster to fetch from the cache.
GeckoView cold start is still slow compared to Chrome, it still take about 5 seconds to get a readable page on GeckoView vs around 3 seconds on Chrom(ium). But it’s progress!
Yes, we are working on that too!
I wonder what changed and how we stop it regressing back again….
I wish I knew that as well.
As of this spring we now have Android cold page load tests running in automation.
You've probably seen https://health.graphics/android
The cold load tests are new and a big step forward. Although it's not clear to me that they would catch this one.
Has GV become more multi-threaded? There are now two instances of runGecko:-
08-08 17:17:12.756 8836 8836 I GeckoViewActivity: zerdatime 6354732 - application start 08-08 17:17:12.949 8836 8852 W GeckoThread: zerdatime 6354925 - runGecko 08-08 17:17:15.268 8858 8875 W GeckoThread: zerdatime 6357244 - runGecko
whereas some months ago IIRC there only used to be one????
It's been two processes for a while (parent and content), but that started before my time.
So :cpeterson, this is now only an issue on Fennec (I'll rename the bug).
I don't know when Fennec updates its gecko version -- perhaps we can retest it then?
Updated•5 years ago
|
Comment 70•5 years ago
|
||
(In reply to Andrew Creskey from comment #69)
So :cpeterson, this is now only an issue on Fennec (I'll rename the bug).
I don't know when Fennec updates its gecko version -- perhaps we can retest it then?
Fennec has moved to the ESR 68 channel, so it will only get minor releases with fixes for security fixes and critical bugs until mid-2020. There will be no Fennec 69 or later. If this bug now only affects Fennec and we don't have a fix in hand now, we should not spend time investigating this issue. GeckoView and Fenix are the future! :)
Comment 71•5 years ago
|
||
Thanks Chris.
I did take a shot at trying to catch the culprit on a February 2019 build via simpleperf but I only have thread names and no symbols and was not successful.
Updated•5 years ago
|
Comment 73•4 years ago
|
||
Assignee | ||
Updated•4 years ago
|
Description
•