Closed Bug 1591725 Opened 5 years ago Closed 5 years ago

Consider optimizing gecko android builds for speed (-O2) rather than size (-Oz)

Categories

(Firefox Build System :: Android Studio and Gradle Integration, enhancement)

ARM
Android
enhancement
Not set
normal

Tracking

(firefox74 disabled, firefox75 disabled, firefox76 fixed)

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- disabled
firefox75 --- disabled
firefox76 --- fixed

People

(Reporter: acreskey, Assigned: acreskey)


Attachments

(3 files)

Currently, the Android builds are heavily optimized for size ("-Oz"):
https://searchfox.org/mozilla-central/rev/7536d7f480a7f18c941a590a2d4c5119d9f52770/old-configure.in#602

I did a quick test where I changed the build flag from "-Oz" to "-O3" and fixed or hacked around the resulting link errors.

This looks to be very beneficial performance-wise:

12% improvement in raptor-speedometer-geckoview on all three pgo builds:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6beb413001270be7f351d4cc0579cb21e882161f&newProject=try&newRevision=874e37247c7823ede2e693945d1492635493cd67&framework=10

Numerous double-digit improvements in raptor-tp6 cold and warm loads
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=86d75e90c1e046b69a800362c8444033904f38f0&newProject=try&newRevision=d457d2e7df09a05b7b7c42a8973877b509d61d51&framework=10

As expected, the resulting binary is now larger.
geckoview_example, aarch64, pgo goes from 50.7MB to 62.6MB
geckoview_example, aarch32, pgo goes from 44.0MB to 54.9MB

James, is this a tradeoff we've looked at before?
To me, this looks well worth the additional binary size.

Flags: needinfo?(snorp)

What are the results with -O2? -O3 usually just brings bloat along for marginal performance benefit.

Yeah I guess I would like to see what -O2 does. That might be a good compromise between size and speed. At one point, though, I think -Os was faster than -O2 because it gave better cache performance. Maybe we could look at -Os again too?

Flags: needinfo?(snorp)

Looks like Chrome may use -O3 unless you specifically request to optimize for size.

https://chromium.googlesource.com/chromium/src/build/config/+/master/compiler/BUILD.gn

Good points, thanks.
I've kicked off -O2 builds.
I did try -Os and it showed gains almost as big on speedometer for the opt-builds. I haven't figured out why yet, but the PGO builds of -Os are not there.

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #4)

> Looks like Chrome may use -O3 unless you specifically request to optimize for size.
>
> https://chromium.googlesource.com/chromium/src/build/config/+/master/compiler/BUILD.gn

That's a good data point.

The speedometer results for O2 look very similar to O3 results and the binaries are indeed a bit smaller:

Baseline, left (-Oz) vs -O2 on speedometer
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6beb413001270be7f351d4cc0579cb21e882161f&newProject=try&newRevision=fd97efb7e1590f1976f51a0edd057f72383d7170&framework=10

-O3 vs -O2 on speedometer
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=874e37247c7823ede2e693945d1492635493cd67&newProject=try&newRevision=fd97efb7e1590f1976f51a0edd057f72383d7170&framework=10

And the geckoview_example sizes:

Optimization  AArch  Size, MB
-OZ           32     44.0
-O3           32     54.9
-O2           32     53.7

-OZ           64     50.7
-O3           64     62.6
-O2           64     60.9
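To put the table above in relative terms, here is a small Python sketch that computes each build's size increase over the -Oz baseline. The dictionary layout and the `growth` helper are illustrative only and are not part of any Mozilla tooling.

```python
# geckoview_example APK sizes in MB (PGO builds), copied from the table above.
SIZES_MB = {
    32: {"-Oz": 44.0, "-O3": 54.9, "-O2": 53.7},
    64: {"-Oz": 50.7, "-O3": 62.6, "-O2": 60.9},
}

def growth(arch: int, opt: str) -> tuple[float, float]:
    """Return (MB added, percent added) relative to the -Oz baseline."""
    base = SIZES_MB[arch]["-Oz"]
    delta = SIZES_MB[arch][opt] - base
    return round(delta, 1), round(100.0 * delta / base, 1)

for arch in (32, 64):
    for opt in ("-O3", "-O2"):
        mb, pct = growth(arch, opt)
        print(f"{opt} AArch{arch}: +{mb} MB (+{pct}%)")
```

Both speed-oriented levels add roughly 20-25% to the APK, and -O2 saves only about 1.2-1.7 MB compared with -O3.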

(In reply to Andrew Creskey from comment #6)

> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #4)
>
> > Looks like Chrome may use -O3 unless you specifically request to optimize for size.
> >
> > https://chromium.googlesource.com/chromium/src/build/config/+/master/compiler/BUILD.gn
>
> That's a good data point.

FYI, it looks like Chromium does optimize for size on Android. By default, Chromium uses default_optimization and optimize_max, and I don't see anything that turns on optimize_speed on Android. Also, optimize_for_size is true on Android (and, weirdly, on macOS too), so the full flags should be -Oz -O2 for Android.

So they are using both flags on Android, -Oz and -O2? It seems to me that some options would get overwritten that way.
I did build it and the binaries are the same size as the O2 build. Performance looks good.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6beb413001270be7f351d4cc0579cb21e882161f&newProject=try&newRevision=d779529c92d9cff9327eb3c0b7cb0b306299a9b8&framework=10
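On the question of whether one flag overrides the other: with GCC and clang, when several -O options appear on the same command line, only the last one takes effect (GCC documents this explicitly). That means -Oz -O2 should behave like plain -O2, which is consistent with the identical binary sizes above. A minimal Python model of that rule follows. The `effective_opt_level` helper is hypothetical and ignores any other flags that can interact with optimization.

```python
# Optimization levels recognised by this simplified model.
OPT_LEVELS = {"-O0", "-O1", "-O2", "-O3", "-Os", "-Oz", "-Og", "-Ofast"}

def effective_opt_level(flags: list[str]) -> str:
    """Return the optimization level that takes effect for a compiler command line.

    The last -O option wins. With no -O option, the compiler default (-O0) applies.
    A bare "-O" is treated as "-O1"; other -O spellings are ignored in this sketch.
    """
    level = "-O0"
    for flag in flags:
        if flag == "-O":
            level = "-O1"
        elif flag in OPT_LEVELS:
            level = flag
    return level

print(effective_opt_level(["-Oz", "-O2"]))  # -O2
print(effective_opt_level(["-O2", "-Oz"]))  # -Oz
```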

I made a 2nd attempt at building -Os, but while the OPT build succeeds, the PGO profiling fails
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb24f139285bc6d9081a54eb033bd278d31feb22&selectedJob=273279539
Error:
INFO - Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None: ADBError install failed for /builds/worker/fetches/geckoview-androidTest.apk. Got: Performing Push Install
The geckoview-androidTest.apk artifact is built and when I build this locally it installs correctly.
Michael - would you have any ideas on this?

Flags: needinfo?(mshal)

Sharing an idea of :Agi's from slack:

> I think the size limit on Android is 100MB, so having GeckoView be 62MB would be a big ask for non-browser apps.
> Maybe we can provide both? (I would expect e.g. Fenix to want more speed.)

It does look like the APK limit is 100MB, increased from 50MB in 2015.
Not an area that I know a lot about, but it looks like if your APK is generated from an App Bundle then the limit is 150MB:
https://android-developers.googleblog.com/2019/03/google-mobile-developer-day-at-game.html

But either way, it's a big footprint increase for non-browser apps.
So while the additional build configuration adds a lot of overhead and maintenance, maybe it's the best choice.

There's also some discussion of this in bug 1507636.

dmajor pointed out this code in Chromium:

https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?rcl=97b30d58566267263a872131f9720f1a841f8681&l=641-655

which might help them control code growth a little better. I can't recall offhand whether our automation builds use lld for Android (I don't think they do), but maybe we could translate those bits into something that would work better?

Thanks for the cc, I was unaware of this bug.

FWIW I've been investigating using the above flag for all platforms, not just Android. On Windows, we can remove over 9MB from xul.dll with no change in Speedometer. (More+broader testing still needed.)

> I was unaware of this bug.

(Perhaps this bug should be under build system in case it might notify others who are interested in this kind of thing?)

(In reply to :dmajor from comment #16)

> > I was unaware of this bug.
>
> (Perhaps this bug should be under build system in case it might notify others who are interested in this kind of thing?)

Makes sense - I moved the bug but feel free to adjust it.

(In reply to :dmajor from comment #15)

> Thanks for the cc, I was unaware of this bug.
>
> FWIW I've been investigating using the above flag for all platforms, not just Android. On Windows, we can remove over 9MB from xul.dll with no change in Speedometer. (More+broader testing still needed.)

That's quite interesting.
If you can help me with a patch for Android I would be very happy to see how it performs and the resulting binary size.

Type: task → enhancement
Component: Performance → Android Studio and Gradle Integration
Product: Core → Firefox Build System

(In reply to Andrew Creskey from comment #10)

> I made a 2nd attempt at building -Os, but while the OPT build succeeds, the PGO profiling fails
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb24f139285bc6d9081a54eb033bd278d31feb22&selectedJob=273279539
> Error:
> INFO - Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None: ADBError install failed for /builds/worker/fetches/geckoview-androidTest.apk. Got: Performing Push Install
> The geckoview-androidTest.apk artifact is built and when I build this locally it installs correctly.
> Michael - would you have any ideas on this?

I haven't seen an error like that before. Does it happen again if you do a fresh push (so a new instr build in addition to a new run task)?

I diffed the geckoview-androidTest.apk from that push with your -O2 push, and the only differences are in the compiled libraries and incidental files (sha manifests and files containing the hg revision). So it doesn't look like the package was built incorrectly.

The line "Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None" looked suspicious at first, but on further investigation the "None" just comes from the fact that we don't set self.device_name in android_emulator_pgo.py. The device_name is only used in error messages, which explains why things work fine without it.

If it happens again on a re-push, maybe check with gbrown to see if he has any ideas? I'm not sure what else to check here.

Flags: needinfo?(mshal)

(In reply to Andrew Creskey from comment #17)

> If you can help me with a patch for Android I would be very happy to see how it performs and the resulting binary size.

I believe this patch ought to do it: https://hg.mozilla.org/try/rev/c37802d5c0ac94a41de9fb3116ce1aa403c27d5d

However, although that patch got impressive wins on Windows and Linux, it only saved a few hundred KB on Android, and only a few hundred KB more when I further lowered the limit to 5. I'm puzzled by why Android behaves so differently.

Depends on: 1592797

(In reply to Michael Shal [:mshal] from comment #18)

> I haven't seen an error like that before. Does it happen again if you do a fresh push (so a new instr build in addition to a new run task)?
>
> I diffed the geckoview-androidTest.apk from that push with your -O2 push, and the only differences are in the compiled libraries and incidental files (sha manifests and files containing the hg revision). So it doesn't look like the package was built incorrectly.
>
> The line "Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None" looked suspicious at first, but on further investigation the "None" just comes from the fact that we don't set self.device_name in android_emulator_pgo.py. The device_name is only used in error messages, which explains why things work fine without it.
>
> If it happens again on a re-push, maybe check with gbrown to see if he has any ideas? I'm not sure what else to check here.

Thank you for looking into that, Michael. I'm still seeing a mysterious failure on a fresh push, so I'll follow up and see what I can find.
-Os is an interesting option.

(In reply to :dmajor from comment #19)

> (In reply to Andrew Creskey from comment #17)
>
> > If you can help me with a patch for Android I would be very happy to see how it performs and the resulting binary size.
>
> I believe this patch ought to do it: https://hg.mozilla.org/try/rev/c37802d5c0ac94a41de9fb3116ce1aa403c27d5d
>
> However, although that patch got impressive wins on Windows and Linux, it only saved a few hundred KB on Android, and only a few hundred KB more when I further lowered the limit to 5. I'm puzzled by why Android behaves so differently.

dmajor, when I add your -import-instr-limit=10 option to the -O2 build, I'm seeing very significant size savings.
Perhaps without the higher level optimizations there weren't that many long functions being imported?
libxul.so for arm32 goes from 84.9MB to 79.1MB
libxul.so for aarch64 goes from 123.1MB to 112.9MB
Your patch with -O2
-O2

So updated APK sizes are:

Optimization      AArch  Size, MB
-Oz                 32     44.0
-O2,instr-limit=10  32     50.9
-O2                 32     53.7
-O3                 32     54.9

-Oz                 64     50.7
-O2,instr-limit=10  64     57.0
-O2                 64     60.9
-O3                 64     62.6

-O2,instr-limit=10 looks to have roughly half the size penalty of -O3, so I'll retitle this bug.

Performance of -O2,instr-limit=10 against mozilla-central still looks great.
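A quick check of the "roughly half" estimate against the updated APK table, as a Python sketch (names are illustrative). The ratio is the -O2,instr-limit=10 size penalty over -Oz divided by the -O3 penalty over -Oz. It comes out at about 0.63 for 32-bit and 0.53 for AArch64.

```python
# geckoview_example APK sizes in MB, copied from the updated table above.
SIZES_MB = {
    32: {"-Oz": 44.0, "-O2,instr-limit=10": 50.9, "-O3": 54.9},
    64: {"-Oz": 50.7, "-O2,instr-limit=10": 57.0, "-O3": 62.6},
}

def penalty_ratio(arch: int) -> float:
    """Size penalty of -O2,instr-limit=10 as a fraction of the -O3 penalty."""
    s = SIZES_MB[arch]
    limited = s["-O2,instr-limit=10"] - s["-Oz"]
    o3 = s["-O3"] - s["-Oz"]
    return limited / o3

for arch in (32, 64):
    print(f"AArch{arch}: {penalty_ratio(arch):.2f}")
```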

Summary: Consider optimizing gecko android builds for speed (-O3) rather than size (-Oz) → Consider optimizing gecko android builds for speed (-O2) rather than size (-Oz)

(In reply to Andrew Creskey from comment #21)

> Perhaps without the higher level optimizations there weren't that many long functions being imported?

Yes, I had just written up a comment speculating that, and we mid-aired. Glad to hear it helped!

I'm trying it with -import-instr-limit=5 now :)

(In reply to Nathan Froyd [:froydnj] from comment #13)

> There's also some discussion of this in bug 1507636.

Not that this will be my decision, but the conclusions from bug 1507636 make sense to me -- Fennec is scoring ~10.5 on Speedometer while Chrome is at ~18.
So why increase the binary size just to score a bit higher, 11.2?

But now that we can measure Android page load performance, I think we are actually very close to Chrome.
From these results, Fenix with strict tracking protection is comparable to Chrome on load event timing, and ~10% slower on most visual metrics.

The raptor pageload tests show -O2 being a big win.
So increasing the binary size may make us faster than competing browsers.

I'm running visual metrics tests now, so we'll get a better idea of the impact on SpeedIndex, etc.

Depends on: 1593104

This is how a very tight -import-instr-limit impacts binary size:

Optimization      AArch  Size, MB (geckoview_example.apk)
-Oz                 32     44.0
-O2,instr-limit=1   32     49.2      
-O2,instr-limit=3   32     49.5
-O2,instr-limit=5   32     50.2
-O2,instr-limit=10  32     50.9
-O2                 32     53.7
-O3                 32     54.9

-Oz                 64     50.7
-O2,instr-limit=1   64     55.5                
-O2,instr-limit=3   64     55.7
-O2,instr-limit=5   64     56.2
-O2,instr-limit=10  64     57.0
-O2                 64     60.9
-O3                 64     62.6

So far the speedometer results for all of these combinations are within noise of the plain -O2 build.
I'll run more pageload tests on the weekend when the device farm is in less demand.

Pardon my ignorance, but wouldn't something like -import-instr-limit=1 effectively disable inlining? Is that what "import" means here? Surely that has to affect performance, right?

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #26)

> Pardon my ignorance, but wouldn't something like -import-instr-limit=1 effectively disable inlining? Is that what "import" means here? Surely that has to affect performance, right?

I believe this is cross-translation-unit inlining, so old-school inlining would still happen. Additionally, PGO puts a 10-100x multiplier on the limit for hot functions.

I agree that very small numbers somehow feel wrong though, I'm not sure we should try to go chasing every single possible byte. 5 seems like a pretty strict limit already.
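To illustrate why a very small limit need not disable inlining of hot code, here is a simplified Python model of an import decision with profile-based multipliers. It is a sketch under stated assumptions, not LLVM's actual function importer, which has additional rules that this model ignores. The 10x and 100x factors are taken from the comment above; the `Hotness` enum, `should_import` helper, and the 40-instruction callee are hypothetical.

```python
from enum import Enum

class Hotness(Enum):
    """Simplified call-site hotness from PGO data; values are limit multipliers."""
    UNKNOWN = 1     # no profile boost
    HOT = 10        # assumption: 10x multiplier for hot call sites
    CRITICAL = 100  # assumption: 100x multiplier for the hottest call sites

def should_import(instr_count: int, limit: int, hotness: Hotness) -> bool:
    """Import a callee into the caller's module if it fits under the scaled limit."""
    return instr_count <= limit * hotness.value

# Hypothetical 40-instruction callee evaluated with -import-instr-limit=5.
for h in Hotness:
    print(h.name, should_import(40, limit=5, hotness=h))
```

In this model, lowering the limit mainly stops importing functions without hot profile data, while hot callees keep a much larger budget, and ordinary inlining within a translation unit is unaffected.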

I was mostly curious about the degree to which the -import-instr-limit could reduce the binary size.
-import-instr-limit=1 is as low as it gets (-import-instr-limit=0 binaries are the same size; I guess not a lot of single-instruction functions are being imported...).
Looking mostly at speedometer because of reproducibility, I think that performance starts to degrade slightly at -import-instr-limit=3, particularly on Pixel 2 pgo.

-O2 (left) compare against -O2, limit=5

-O2 (left) compare against -O2, limit=3

-O2 (left) compare against -O2, limit=1

These are the first visual metric results (Moto G5), cold loads.
They compare the baseline configuration (-Oz) to the (-O2) build.
SpeedIndex and ContentfulSpeedIndex both look to be improved by 8-10%. pageLoadTime is onload event timing, and is also improved, perhaps a bit more.
https://docs.google.com/spreadsheets/d/1g9idJimqLgvwK5QK2HOtOVQro2bshZDD5cYPwxAOKwU/edit#gid=1852601574
These tests were run locally with Browsertime and WebPageReplay recordings.

> Looking mostly at speedometer because of reproducibility, I think that performance starts to degrade slightly at -import-instr-limit=3, particularly on Pixel 2 pgo.

I've had too many bad experiences with surprise perf regressions from suites that I didn't test, or didn't test to high enough confidence. Even if you have try runs, I highly recommend initially committing a value that is greater than what you want, let it settle for several days, make sure the sheriffs have gone through all their alerts, and only then reduce it further.

(In reply to :dmajor from comment #30)

> > Looking mostly at speedometer because of reproducibility, I think that performance starts to degrade slightly at -import-instr-limit=3, particularly on Pixel 2 pgo.
>
> I've had too many bad experiences with surprise perf regressions from suites that I didn't test, or didn't test to high enough confidence. Even if you have try runs, I highly recommend initially committing a value that is greater than what you want, let it settle for several days, make sure the sheriffs have gone through all their alerts, and only then reduce it further.

That makes sense.
In this bug I would like to simply collect the performance characteristics of each optimization option so that folks can compare them.

The one I'm missing is -Os: I logged Bug 1593785 as I'm attempting to track down the problems with its PGO runs.

Size-wise it's quite promising, closer to -Oz, at least in the opt build. The -Os opt performance wasn't quite as good as -O2's (~8-9% speedometer improvement vs. ~10-11%), but it's still interesting.

Optimization      AArch  Size, MB (geckoview_example.apk)
-Oz, opt            32     43.6
-Os, opt            32     46.7

-Oz, opt            64     50.2
-Os, opt            64     52.5

Depends on: 1593785

As an anecdote I'll note that in bug 1592981 I got some regressions even at limit=10 on linux/win with pgo and limit=40 on mac without pgo. All it took was for one important function (nsStringBuffer::Release) to be now considered too big for inlining, and some of the more tight-C++-loops benchmarks noticed the change, even though Speedometer alone didn't turn up anything.

Just a thought: if we do modify the optimizations, we may wish to document this or provide a configuration flag so that external GeckoView consumers also have a choice in how large their APKs will be (I'd guess they'd prefer to optimize for APK size over speed).

NI Snorp for awareness: no action needed.

Flags: needinfo?(snorp)

Since Bug 1592981 has landed, builds of -O2 are now smaller relative to the current -Oz.
I've updated the sizes based on a recent push to try (PGO builds).

Optimization        AArch  Size, MB (geckoview_example.apk)
-Oz                   32     43.8
-O2 (instr-limit=10)  32     50.4 (+6.6)

-Oz                   64     50.0
-O2 (instr-limit=10)  64     56.3 (+6.3)

Performance looks to be in the same ballpark, although I have yet to do a visual metric comparison and there are pending jobs:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c9d0e32140705667e1384d73362216549b65c763&newProject=try&newRevision=dbcf100a4314589debe141c52ec0767abb1fb458&framework=10

I also noticed that 'official' nightlies of geckoview_example are a lot larger than my try pushes.
i.e. these:
https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.mozilla-central.nightly.latest.mobile/android-api-16-opt
https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.mozilla-central.nightly.latest.mobile/android-aarch64-opt

Optimization        AArch  Size, MB (geckoview_example.apk)
-Oz official          32     48.3
-Oz official          64     54.5

Digging into the APKs, it looks like this is due to the localization resources in the omni.ja (assets/omni/chrome, etc.).

We've cleared the hurdles in building -Os PGO (Bug 1593785).

This is an updated view of the binary sizes and speedometer improvements (geckoview_example.apk):
I included a slightly tightened variant on -Os where the import-instr-limit is set to 5 instead of the default 10.

Optimization       AArch  Size,MB  Delta  Speedometer Improvement
-Oz                32     44.1     -      --
-Os instr-limit=5  32     47.9     +3.8   10.9%
-Os                32     48.5     +4.4   11.0%
-O2                32     50.7     +6.6   11.7%

-Oz                64     50.3     -      --
-Os instr-limit=5  64     54.1     +3.8   10.1%
-Os                64     54.7     +4.4   10.8%
-O2                64     56.6     +6.3   12.9%
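One way to read this table is Speedometer gain per megabyte added. The Python sketch below does that arithmetic; the "percent per MB" metric is only an illustrative way to compare rows and was not measured in this bug.

```python
# (size delta in MB, Speedometer improvement in %) per build, from the table above.
RESULTS = {
    32: {"-Os instr-limit=5": (3.8, 10.9), "-Os": (4.4, 11.0), "-O2": (6.6, 11.7)},
    64: {"-Os instr-limit=5": (3.8, 10.1), "-Os": (4.4, 10.8), "-O2": (6.3, 12.9)},
}

def gain_per_mb(arch: int, opt: str) -> float:
    """Speedometer improvement (percentage points) per MB of APK growth."""
    delta_mb, improvement = RESULTS[arch][opt]
    return round(improvement / delta_mb, 2)

for arch, rows in RESULTS.items():
    for opt in rows:
        print(f"AArch{arch} {opt}: {gain_per_mb(arch, opt)} %/MB")
```

By this crude measure the -Os variants buy more speed per megabyte, while -O2 gives the largest absolute gain.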

Raptor pageload comparisons:

Oz compared to Os (instr-limit=5) here
Oz compared to Os here
Oz compared to O2 here

Similar to the speedometer results, -O2 looks to be a bit faster than the -Os variants, both of which are significantly faster than -Oz.

Visual metrics tests are running now.

Visual metric results from cold loads on the Moto G5 are here:
https://docs.google.com/spreadsheets/d/1g9idJimqLgvwK5QK2HOtOVQro2bshZDD5cYPwxAOKwU/edit#gid=592319917

Overall these builds look like a 5-7% improvement (even though the same -O2 build looked like a 8-10% improvement in the last run).

There is a lot of noise in these tests and locally it's not possible for me to get the high repeat counts that I can get on try.

Attached image android_build_options_g5.png (deleted)

I think this plot (speedIndex) gives a good view of the performance on different sites.

Attached image android_build_options_p3.png (deleted)

The Pixel 3 cold page load visual metrics results can be found here:
https://docs.google.com/spreadsheets/d/1g9idJimqLgvwK5QK2HOtOVQro2bshZDD5cYPwxAOKwU/edit#gid=584528033

And I attached a plot of the speedIndex.
The relative noise is quite a bit higher here.

As before, the change looks to be less impactful on Pixel 3 compared to G5.

There are numerous 5-10% improvements on SpeedIndex and also many sites that are not affected within the noise.

Sites like jianshu.com with a ~30% rel std deviation end up significantly lowering or raising the geomean based on how the 25 loads played out.

I've noted the PGO build times for these options. O2 may take a bit longer to build but saves time in the Instrument/Run stages.
(Caveat: I'm not sure of the variance in these.)

Option  AArch  Build  Instrument  Run
Oz      32     37     36          28
Os      32     38     46          28
O2      32     40     31          23

Oz      64     37     -           -
Os      64     38     -           -
O2      64     37     -           -
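Summing the three stages for the 32-bit rows supports that reading, assuming the stages run one after another. A small Python sketch, with times copied from the table (same units as the table):

```python
# 32-bit PGO stage times (Build, Instrument, Run), copied from the table above.
STAGES = {
    "Oz": (37, 36, 28),
    "Os": (38, 46, 28),
    "O2": (40, 31, 23),
}

totals = {opt: sum(times) for opt, times in STAGES.items()}
for opt, total in totals.items():
    print(f"{opt}: {total}")
# Oz: 101, Os: 112, O2: 94
```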

As an experiment, I compared -Oz vs. -O1 (-O1 is ~10-12% faster than -Oz on speedometer, and a few points slower than -O2):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=480dd25f9948aed73ecd9683e536b9f47f543cf1&newProject=try&newRevision=5d24e13545108cc8bad1191dba10c20b27552665&framework=10

For performance improvements to page load and speedometer, optimize at -O2 instead of -Oz.

The previous disabling of the outliner, "-mno-outline", was removed as it is not enabled by default with -O2.
(See Bug 1508547 and https://developer.arm.com/docs/101754/latest/armclang-reference/armclang-command-line-options/-moutline-mno-outline)

This is going to land in m-c so that stability can be assessed in a staged rollout to Fenix nightly, org.mozilla.fenix.nightly.
If successful, further experiments will be run to collect data on user engagement, retention, activation rates, and reported performance.

At that point, stakeholders will determine whether the performance improvements are worth the increased binary size from the -O2 build (~6.5MB increase).

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
Flags: needinfo?(marian.raiciof)
Flags: needinfo?(igoldan)
Flags: needinfo?(aionescu)

Thank you, Andrew! Great news!

Flags: needinfo?(marian.raiciof)

Thanks for the heads up!

Flags: needinfo?(igoldan)
Flags: needinfo?(aionescu)

== Change summary for alert #24721 (as of Tue, 21 Jan 2020 08:40:55 GMT) ==

Improvements:

16% build times android-4-0-armv7-api16 pgo instrumented taskcluster-c5.4xlarge 1,945.87 -> 1,636.80
16% build times android-4-0-armv7-api16 pgo instrumented taskcluster-m5.4xlarge 2,009.47 -> 1,693.59
15% build times android-4-0-armv7-api16 pgo instrumented taskcluster-c5d.4xlarge 1,980.25 -> 1,686.91

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24721

== Change summary for alert #24774 (as of Thu, 23 Jan 2020 06:43:35 GMT) ==

Improvements:

10% raptor-tp6m-allrecipes-geckoview-cold loadtime android-hw-g5-7-0-arm7-api-16 pgo 7,101.17 -> 6,398.00
9% raptor-tp6m-allrecipes-geckoview-cold android-hw-g5-7-0-arm7-api-16 pgo 2,449.88 -> 2,222.98
8% raptor-tp6m-allrecipes-geckoview-cold android-hw-g5-7-0-arm7-api-16 pgo 2,439.74 -> 2,240.65
7% raptor-tp6m-booking-geckoview-cold fcp android-hw-g5-7-0-arm7-api-16 pgo 808.00 -> 754.00
6% raptor-speedometer-geckoview android-hw-g5-7-0-arm7-api-16 pgo 9.45 -> 10.02
6% raptor-speedometer-geckoview android-hw-p2-8-0-android-aarch64 pgo 24.44 -> 25.82
5% raptor-tp6m-wikipedia-geckoview-cold loadtime android-hw-g5-7-0-arm7-api-16 pgo 1,087.38 -> 1,029.50
5% raptor-tp6m-allrecipes-geckoview-cold fcp android-hw-g5-7-0-arm7-api-16 pgo 1,741.96 -> 1,655.12

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24774
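The percentages in these alert summaries can be reproduced from the old and new values. The Python sketch below assumes the summary reports the magnitude of the change relative to the old value, rounded to a whole percent. That is an assumption about how Perfherder presents it, checked only against the numbers listed above.

```python
def pct_change(old: float, new: float) -> int:
    """Magnitude of the change relative to the old value, rounded to a whole percent."""
    return round(abs(new - old) / old * 100)

# Lower is better for load times; higher is better for Speedometer scores.
print(pct_change(7101.17, 6398.00))  # 10  (allrecipes cold loadtime, Moto G5)
print(pct_change(9.45, 10.02))       # 6   (Speedometer, Moto G5)
print(pct_change(24.44, 25.82))      # 6   (Speedometer, Pixel 2)
```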

The Fenix stability and pageload vs. APK size tradeoff experiment, which has just started:
https://github.com/mozilla-mobile/fenix/issues/7795

Backed out from Beta74 to avoid confusion with the Beta migration builds and the experiments running around this change, per Slack discussion. It remains landed on mozilla-central for GV75+.

https://hg.mozilla.org/releases/mozilla-beta/rev/607417212e2592dc01b2dd55b6a88c9c15509450

Beta backout:
== Change summary for alert #25097 (as of Mon, 24 Feb 2020 07:33:38 GMT) ==

Regressions:

20% build times android-4-0-armv7-api16 pgo instrumented taskcluster-c5.4xlarge 1,524.47 -> 1,831.05

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25097

Depends on: 1670948

With the focus on Speedometer 3, I've re-run the "-O2" performance comparison.
This is how the binary size changes, measured in bytes, looking at the geckoview example from a try push:

                  Arm v7          AArch 64
-Oz (current)    79,681,645      86,908,174

-Os              81,498,693      89,618,699

-O2              82,953,919      90,891,453 
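Expressed as growth over the current -Oz build, these byte counts work out to roughly 1.8-2.7 MB for -Os and 3.3-4.0 MB for -O2. This assumes 1 MB = 10^6 bytes, since the comment doesn't say which megabyte it means. A Python sketch of the arithmetic:

```python
# geckoview_example sizes in bytes from the table above: (Arm v7, AArch64).
SIZES = {
    "-Oz": (79_681_645, 86_908_174),
    "-Os": (81_498_693, 89_618_699),
    "-O2": (82_953_919, 90_891_453),
}

def growth_mb(opt: str) -> tuple[float, float]:
    """Growth over -Oz in MB (10**6 bytes) for (Arm v7, AArch64), rounded to 0.01."""
    return tuple(
        round((size - base) / 1e6, 2)
        for size, base in zip(SIZES[opt], SIZES["-Oz"])
    )

print(growth_mb("-Os"))  # (1.82, 2.71)
print(growth_mb("-O2"))  # (3.27, 3.98)
```

The -O2 figures are consistent with the "between 3 and 4 megabytes" estimate given below.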

Independent of these optimizations, from comment 35 it looks like geckoview example has grown by about 40 megs over the last three years.

In terms of performance, we are no longer seeing the large 10-12% improvements in speedometer/speedometer 3.
With "-O2" it looks like only 2-3% (although some subtests may show greater improvements).

If I add in pageload tests, we see some other small 2-3% gains with "-O2":
-O2

From a quick look, "-Os" doesn't seem very promising anymore:
-Os

One idea is to first trim as much as we would add.
So if -O2 adds between 3 and 4 megabytes to the final binary in Fenix, we would first invest in finding those savings elsewhere before landing the change.

An experiment is being done in Bug 1831935

