Enable PGO for Android ARMv7
Categories
(Firefox Build System :: General, defect, P3)
Tracking
(firefox-esr60 wontfix, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 fixed)
People
(Reporter: blassey, Assigned: mshal)
References
(Depends on 1 open bug)
Details
(Keywords: mobile, perf, Whiteboard: mobilestartupshrink [geckoview:p2])
Attachments
(14 files, 2 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
Updated•14 years ago
|
Reporter | ||
Comment 1•14 years ago
|
||
Comment 2•14 years ago
|
||
Comment 3•13 years ago
|
||
Updated•13 years ago
|
Comment 4•13 years ago
|
||
Comment 5•13 years ago
|
||
Comment 6•13 years ago
|
||
Updated•13 years ago
|
Comment 7•13 years ago
|
||
Comment 8•13 years ago
|
||
Updated•13 years ago
|
Comment 9•13 years ago
|
||
Updated•13 years ago
|
Updated•13 years ago
|
Comment 10•13 years ago
|
||
Comment 11•13 years ago
|
||
Comment 12•13 years ago
|
||
Comment 13•13 years ago
|
||
Comment 14•13 years ago
|
||
Comment 15•13 years ago
|
||
Updated•13 years ago
|
Comment 16•13 years ago
|
||
Updated•9 years ago
|
Comment 20•8 years ago
|
||
Updated•7 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 22•6 years ago
|
||
Without this flag, Android PGO profile-use builds may fail with
"Function control flow change detected" errors.
Assignee | ||
Comment 23•6 years ago
|
||
When Android shuts down the ndk process, it doesn't call the registered
atexit() handlers, which is normally where the profile data gets written
to file. Since the PGO test suite closes the browser when it is
finished, the nativeRun routine can manually call out to
__llvm_profile_dump() before returning.
This method has a downside that only the profile data from the calling
library gets written out, rather than for the whole process. Since we
are most interested in optimizing libxul, a new hook is added in
Bootstrap to make sure we get the profile data for the right library.
Depends on D22816
Assignee | ||
Comment 24•6 years ago
|
||
The Fennec process needs a few extra environment variables for PGO,
notably LLVM_PROFILE_FILE and MOZ_JAR_LOG_FILE to give the locations for
the profile run outputs. FennecInstance needs to pass the "env"
parameter on down so it can be used by DeviceRunner.
Depends on D22817
Assignee | ||
Comment 25•6 years ago
|
||
Depends on D22818
Assignee | ||
Comment 26•6 years ago
|
||
test-linux.sh defaults to true for NEED_XVFB, while build-linux.sh
defaults to false. If we are using test-linux.sh from mozharness (rather
than mozharness-test), we need to explicitly set NEED_XVFB to false in
order to not use xvfb.
Depends on D22819
Assignee | ||
Comment 27•6 years ago
|
||
In order to call test-linux.sh with the job-script parameter, it needs
to have executable permissions.
Depends on D22820
Assignee | ||
Comment 28•6 years ago
|
||
The mozharness.py transform passes in "options" parameters through the
MOZHARNESS_OPTIONS environment variable. This will allow the Android PGO
run task to pass in the mozharness script name to test-linux.sh
Depends on D22821
Assignee | ||
Comment 29•6 years ago
|
||
The regex should match '-' characters as well as \w to properly trim the
error message if the device string contains a dash.
Depends on D22822
Assignee | ||
Comment 30•6 years ago
|
||
This is the first stage of the Android PGO task pipeline to generate an
instrumented build.
Depends on D22823
Assignee | ||
Comment 31•6 years ago
|
||
This introduces a mozharness script, android_emulator_pgo.py, to run the
profileserver suite with the PGO-instrumented Android build, and collect
the profile data and jarlog.
The mozharness script contains some redundancy with
build/pgo/profileserver.py, but the additional requirements for Android
to use adb and existing mozharness classes to control the emulator made
it difficult to share the desktop profileserver implementation.
Since this runs similar to a test case, but uses a source checkout
rather than a test tarball, a new MozbaseSourceMixin was created to use
mozbase_source_requirements.txt so ensure that we use the correct paths
for the virtualenv.
Depends on D22824
Assignee | ||
Comment 32•6 years ago
|
||
Depends on D22825
Assignee | ||
Comment 33•6 years ago
|
||
Depends on D22826
Assignee | ||
Comment 34•6 years ago
|
||
Per https://bugzilla.mozilla.org/show_bug.cgi?id=1298921#c14 we won't be able to use marionette.quit() to close the browser. I am testing out the alternate solutions that whimboo suggested.
Updated•6 years ago
|
Assignee | ||
Comment 35•6 years ago
|
||
I am hitting some intermittent failures in the run task, sometimes failing on the Marionette() initialization, and sometimes on the driver.navigate call. I had to bump the Marionette startup_timeout which fixed a few other failures, since the startup delay ran between 360-552s, and I had the timeout set to 480:
gbrown, any idea if the other two cases are something that's easy to fix? I tried to add a loop on the driver.navigate call if it hit an ADBTimeoutError (https://treeherder.mozilla.org/#/jobs?repo=try&revision=f492621762b86cfe0c7406f566b18cbf0c617fc1&selectedJob=234436371) but that just resulted in: "InvalidSessionIdException: Please start a session"
Comment 36•6 years ago
|
||
The Android 4.3 emulator intermittently becomes unresponsive. We have spent a lot of time investigating and never fully understood why; instead, we abandon the task and retry when we detect conditions that look like the emulator is unresponsive. Basically, whenever a mozharness test encounters ADBTimeoutError, it makes sure to return TBPL_RETRY from the mozharness script, and taskcluster retries the task. You can see that here:
All of the blue jobs have a log with content like
https://taskcluster-artifacts.net/a89H6Hd0QIGg3lL3sL6xCA/0/public/logs/live_backing.log
[task 2019-03-18T11:59:17.023Z] 11:59:17 CRITICAL - ADBTimeoutError: args: adb-5554 wait-for-device shell rm /sdcard/tests/logs/mochitest.log; echo adb_returncode=$?, exitcode: None, stdout:
...
[task 2019-03-18T12:04:17.094Z] 12:04:17 CRITICAL - # TBPL RETRY #
[task 2019-03-18T12:04:17.094Z] 12:04:17 WARNING - setting return code to 4
I think you should do the same: in android_emulator_pgo.py, catch ADBTimeoutError and when you see it, exit the script with TBPL_RETRY.
Comment 37•6 years ago
|
||
There is some taskcluster plumbing that translates the TBPL_RETRY into a task retry, like:
I think that will just work for you once your script returns TBPL_RETRY, but if it doesn't, let me know.
Assignee | ||
Comment 38•6 years ago
|
||
Adding TBPL_RETRY works great, thanks! The latest patch contains this fix, as well as a the new image type from bug 1535132. We can land this once that bug lands.
Comment 39•6 years ago
|
||
Comment 40•6 years ago
|
||
Backed out 12 changesets (Bug 632954) for causing Android Bpgo(run) pending jobs CLOSED TREE
https://hg.mozilla.org/integration/autoland/rev/d7e6fff52db3179219d39da39b248c8f2571c048
Comment 41•6 years ago
|
||
Comment 42•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3dfc0e4f8e7c
https://hg.mozilla.org/mozilla-central/rev/0615c775a0cf
https://hg.mozilla.org/mozilla-central/rev/142ae187478d
https://hg.mozilla.org/mozilla-central/rev/503bcac73583
https://hg.mozilla.org/mozilla-central/rev/53d3443e55d9
https://hg.mozilla.org/mozilla-central/rev/6f5fc0d644dd
https://hg.mozilla.org/mozilla-central/rev/097f141a499d
https://hg.mozilla.org/mozilla-central/rev/26031d362333
https://hg.mozilla.org/mozilla-central/rev/b96dd954a456
https://hg.mozilla.org/mozilla-central/rev/c151ebf303ca
https://hg.mozilla.org/mozilla-central/rev/de8beacc5eb4
https://hg.mozilla.org/mozilla-central/rev/429c96e4de32
Updated•6 years ago
|
Comment 43•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9954a61f0b3a
https://hg.mozilla.org/mozilla-central/rev/5fce433c8670
https://hg.mozilla.org/mozilla-central/rev/c5e1e425dd5c
https://hg.mozilla.org/mozilla-central/rev/8acddb36a316
https://hg.mozilla.org/mozilla-central/rev/7df96a607be8
https://hg.mozilla.org/mozilla-central/rev/a257e15d267f
https://hg.mozilla.org/mozilla-central/rev/0450380b2693
https://hg.mozilla.org/mozilla-central/rev/578d02d41139
https://hg.mozilla.org/mozilla-central/rev/831b503b1e5c
https://hg.mozilla.org/mozilla-central/rev/407f264cc3ee
https://hg.mozilla.org/mozilla-central/rev/d78d224fbb3f
https://hg.mozilla.org/mozilla-central/rev/d8203a0665d2
Updated•6 years ago
|
Updated•6 years ago
|
Comment 44•6 years ago
|
||
I believe PGO wasn't enabled for Android AARCH64, right? If so, are there plans for that?
Comment 45•6 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #44)
I believe PGO wasn't enabled for Android AARCH64, right? If so, are there plans for that?
That is correct. Android PGO has only been enabled for ARMv7 (32-bit) at this time. AArch64/ARM64 support should be ready in 68 or 69. More than half of Fennec users have ARM64 devices, so it's a high priority. I filed bug 1543212 for ARM64.
Comment 46•6 years ago
|
||
I filed bug 1543016, because the addition of the "-pgo" suffix to the platform name brought some confusion in interpreting the Perf dashboards related to Android 32bit.
Could you explain the need for this platform renaming? Is this a temporary change? For example, are you planning to remove the android-hw-p2-8-0-arm7-api-16-pgo/opt
definition from taskcluster/ci/test/test-platforms.yml
in favor of the old android-hw-g5-7-0-arm7-api-16/opt
?
Updated•6 years ago
|
Comment 47•6 years ago
|
||
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #46)
Could you explain the need for this platform renaming? Is this a temporary change? For example, are you planning to remove the
android-hw-p2-8-0-arm7-api-16-pgo/opt
definition fromtaskcluster/ci/test/test-platforms.yml
in favor of the oldandroid-hw-g5-7-0-arm7-api-16/opt
?
^ mshal or chmanchester?
Assignee | ||
Comment 48•6 years ago
|
||
I think jmaher covered this in bug 1543016, in particular in comment https://bugzilla.mozilla.org/show_bug.cgi?id=1543016#c3. Historically for Windows/Linux, PGO has been a separate platform from opt. When enabling Android PGO for the first time in this bug, I followed that example and created a new platform type. What probably adds to the confusion is that there is a separate effort to reduce the number of test jobs, in particular the redundant coverage that PGO tests and opt tests provide. So when we turned on Android PGO builds and tests, jmaher suggested that at the same time we turn off Android opt tests, otherwise we'd run into capacity limitations for the physical device tests. This ends up looking like the platform name changed unnecessarily, but in reality we added a new one and turned off an old one.
I'm not saying this was necessarily the right thing to do in this case, but that was how we got here. I'm also not sure if there's a way to unify the two platforms (ie: have the tests named "opt" for consistency, but have them run against the PGO build, while also keeping the opt build enabled).
Updated•5 years ago
|
Updated•5 years ago
|
Description
•