1591725 - Consider optimizing gecko android builds for speed (-O2) rather than size (-Oz)

Assignee

Description

•

5 years ago

Currently the android builds are heavily optimized for size, "-Oz"
https://searchfox.org/mozilla-central/rev/7536d7f480a7f18c941a590a2d4c5119d9f52770/old-configure.in#602

I did a quick test where I changed the build flag from "-Oz" to "-O3" and fixed and hacked the resulting link errors.

This looks to be very beneficial performance-wise:

12% improvement in raptor-speedometer-geckoview on all three pgo builds:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6beb413001270be7f351d4cc0579cb21e882161f&newProject=try&newRevision=874e37247c7823ede2e693945d1492635493cd67&framework=10

Numerous double-digit improvements in raptor-tp6 cold and warm loads
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=86d75e90c1e046b69a800362c8444033904f38f0&newProject=try&newRevision=d457d2e7df09a05b7b7c42a8973877b509d61d51&framework=10

As expected, the resulting binary is now larger.
geckoview_example, aarch64, pgo goes from 50.7MB to 62.6MB
geckoview_example, aarch32, pgo goes from 44.0MB to 54.9MB
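
For a rough sense of scale, the relative growth can be worked out from the two pairs of sizes above. This is only a minimal Python sketch; the helper name `pct_increase` is made up for illustration.

```python
def pct_increase(before_mb: float, after_mb: float) -> float:
    """Return the growth from before_mb to after_mb as a percentage."""
    return (after_mb - before_mb) / before_mb * 100.0


# geckoview_example pgo APK sizes in MB (-Oz, -O3), copied from the lines above.
sizes = {
    "aarch64": (50.7, 62.6),
    "aarch32": (44.0, 54.9),
}

for arch, (oz_mb, o3_mb) in sizes.items():
    print(f"{arch}: +{o3_mb - oz_mb:.1f} MB ({pct_increase(oz_mb, o3_mb):.1f}% larger)")
```

Both architectures come out a little under 25% larger.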

Andrew Creskey [:acreskey]

Assignee

Comment 1

•

5 years ago

James, is this a tradeoff we've looked at before?
To me, this looks well worth the additional binary size.

Flags: needinfo?(snorp)

Nathan Froyd [:froydnj]

Comment 2

•

5 years ago

What are the results with -O2? -O3 usually just brings bloat along for marginal performance benefit.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 3

•

5 years ago

Yeah I guess I would like to see what -O2 does. That might be a good compromise between size and speed. At one point, though, I think -Os was faster than -O2 because it gave better cache performance. Maybe we could look at -Os again too?

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 4

•

5 years ago

Looks like Chrome may use -O3 unless you specifically request to optimize for size.

https://chromium.googlesource.com/chromium/src/build/config/+/master/compiler/BUILD.gn

Andrew Creskey [:acreskey]

Assignee

Comment 5

•

5 years ago

Good points, thanks.
I've kicked off -O2 builds.
I did try -Os and it showed gains almost as big on speedometer for the opt-builds. I haven't figured out why yet, but the PGO builds of -Os are not there.

Andrew Creskey [:acreskey]

Assignee

Comment 6

•

5 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #4)

Looks like Chrome may use -O3 unless you specifically request to optimize for size.

https://chromium.googlesource.com/chromium/src/build/config/+/master/compiler/BUILD.gn

That's a good data point.

Andrew Creskey [:acreskey]

Assignee

Comment 7

•

5 years ago

The speedometer results for O2 look very similar to O3 results and the binaries are indeed a bit smaller:

Baseline, left (-Oz) vs -O2 on speedometer
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6beb413001270be7f351d4cc0579cb21e882161f&newProject=try&newRevision=fd97efb7e1590f1976f51a0edd057f72383d7170&framework=10

-O3 vs -O2 on speedometer
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=874e37247c7823ede2e693945d1492635493cd67&newProject=try&newRevision=fd97efb7e1590f1976f51a0edd057f72383d7170&framework=10

And the geckoview_example sizes:

Optimization  AArch  Size, MB
-Oz           32     44.0
-O3           32     54.9
-O2           32     53.7

-Oz           64     50.7
-O3           64     62.6
-O2           64     60.9

[ex-Mozilla] Agi Sferro | :agi

Comment 8

•

5 years ago

(In reply to Andrew Creskey from comment #6)

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #4)

Looks like Chrome may use -O3 unless you specifically request to optimize for size.

https://chromium.googlesource.com/chromium/src/build/config/+/master/compiler/BUILD.gn

That's a good data point.

FYI it looks like on Android chromium does optimize for size. By default chromium uses default_optimization and optimize_max and I don't see anything that turns on optimize_speed on Android. Also optimize_for_size is true on Android (and weirdly on MacOS too): so the full flags should be -Oz -O2 for Android.

Andrew Creskey [:acreskey]

Assignee

Comment 9

•

5 years ago

So they are using both flags on Android, -Oz and -O2? It seems to me that some options would get overwritten that way.
I did build it and the binaries are the same size as the O2 build. Performance looks good.
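
On the question of one flag overriding the other: GCC documents that when several -O options are given, the last one is the one that takes effect, and the result above (same size as the -O2 build) is consistent with clang behaving the same way. Below is a toy Python model of that rule, assuming last-wins semantics. `effective_opt_level` is a hypothetical helper, not part of any build tool.

```python
from typing import Optional, Sequence


def effective_opt_level(flags: Sequence[str]) -> Optional[str]:
    """Return the -O flag that takes effect, assuming the last one wins."""
    level = None
    for flag in flags:
        # Treat any flag starting with "-O" as an optimization level.
        if flag.startswith("-O"):
            level = flag
    return level


print(effective_opt_level(["-Oz", "-fno-exceptions", "-O2"]))  # -O2
```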

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6beb413001270be7f351d4cc0579cb21e882161f&newProject=try&newRevision=d779529c92d9cff9327eb3c0b7cb0b306299a9b8&framework=10

Andrew Creskey [:acreskey]

Assignee

Comment 10

•

5 years ago

I made a 2nd attempt at building -Os, but while the OPT build succeeds the PGO profiling fails
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb24f139285bc6d9081a54eb033bd278d31feb22&selectedJob=273279539
Error:
INFO - Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None: ADBError install failed for /builds/worker/fetches/geckoview-androidTest.apk. Got: Performing Push Install
The geckoview-androidTest.apk artifact is built and when I build this locally it installs correctly.
Michael - would you have any ideas on this?

Flags: needinfo?(mshal)

Andrew Creskey [:acreskey]

Assignee

Comment 11

•

5 years ago

On raptor page load tests, this is
baseline, left, vs -O2
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=86d75e90c1e046b69a800362c8444033904f38f0&newProject=try&newRevision=5ee690875fd717c96239d301217804c223c489b0&framework=10
Looks great.

-O3 vs -O2 on pageload
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d457d2e7df09a05b7b7c42a8973877b509d61d51&newProject=try&newRevision=5ee690875fd717c96239d301217804c223c489b0&framework=10
-O2 looks roughly as good -- maybe better in some cases, maybe worse in a couple.
I'll add more jobs in selected cases.

Andrew Creskey [:acreskey]

Assignee

Comment 12

•

5 years ago

Sharing an idea of :Agi's from slack:

I think the size limit on android is 100mb so having GeckoView be 62mb would be a big ask for non-browser apps.
Maybe we can provide both? (I would expect e.g. Fenix to want more speed)

It does look like the APK limit is 100MB, increased from 50MB in 2015.
Not an area that I know a lot about, but it looks like if your APK is generated from an App Bundle then the limit is 150MB:
https://android-developers.googleblog.com/2019/03/google-mobile-developer-day-at-game.html

But either way, it's a big footprint increase for non-browser apps.
So while the additional build configuration adds a lot of overhead and maintenance, maybe it's the best choice.

Nathan Froyd [:froydnj]

Comment 13

•

5 years ago

There's also some discussion of this in bug 1507636.

Nathan Froyd [:froydnj]

Comment 14

•

5 years ago

dmajor pointed out this code in Chromium:

https://cs.chromium.org/chromium/src/build/config/compiler/BUILD.gn?rcl=97b30d58566267263a872131f9720f1a841f8681&l=641-655

which might help them control code growth a little better. I can't recall offhand whether our automation builds use lld for Android (I don't think they do), but maybe we could translate those bits into something that would work better?

(Away)

Comment 15

•

5 years ago

Thanks for the cc, I was unaware of this bug.

FWIW I've been investigating using the above flag for all platforms, not just Android. On Windows, we can remove over 9MB from xul.dll with no change in Speedometer. (More+broader testing still needed.)

(Away)

Comment 16

•

5 years ago

I was unaware of this bug.

(Perhaps this bug should be under build system in case it might notify others who are interested in this kind of thing?)

Andrew Creskey [:acreskey]

Assignee

Comment 17

•

5 years ago

(In reply to :dmajor from comment #16)

I was unaware of this bug.

(Perhaps this bug should be under build system in case it might notify others who are interested in this kind of thing?)

Makes sense - I moved the bug but feel free to adjust it.

(In reply to :dmajor from comment #15)

Thanks for the cc, I was unaware of this bug.

FWIW I've been investigating using the above flag for all platforms, not just Android. On Windows, we can remove over 9MB from xul.dll with no change in Speedometer. (More+broader testing still needed.)

That's quite interesting.
If you can help me with a patch for Android I would be very happy to see how it performs and the resulting binary size.

Type: task → enhancement

Component: Performance → Android Studio and Gradle Integration

Product: Core → Firefox Build System

Michael Shal [:mshal]

Comment 18

•

5 years ago

(In reply to Andrew Creskey from comment #10)

I made a 2nd attempt at building -Os, but while the OPT build succeeds the PGO profiling fails
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb24f139285bc6d9081a54eb033bd278d31feb22&selectedJob=273279539
Error:
INFO - Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None: ADBError install failed for /builds/worker/fetches/geckoview-androidTest.apk. Got: Performing Push Install
The geckoview-androidTest.apk artifact is built and when I build this locally it installs correctly.
Michael - would you have any ideas on this?

I haven't seen an error like that before. Does it happen again if you do a fresh push (so a new instr build in addition to a new run task)?

I diffed the geckoview-androidTest.apk from that push with your -O2 push, and the only differences are in the compiled libraries and incidental files (sha manifests and files containing the hg revision). So it doesn't look like the package was built incorrectly.

The line "Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None" looked suspicious at first, but on further investigation the "None" just comes from the fact that we don't set self.device_name in android_emulator_pgo.py. The device_name is only used in error messages, which explains why things work fine without it.

If it happens again on a re-push, maybe check with gbrown to see if he has any ideas? I'm not sure what else to check here.

Flags: needinfo?(mshal)

(Away)

Comment 19

•

5 years ago

(In reply to Andrew Creskey from comment #17)

If you can help me with a patch for Android I would be very happy to see how it performs and the resulting binary size.

I believe this patch ought to do it: https://hg.mozilla.org/try/rev/c37802d5c0ac94a41de9fb3116ce1aa403c27d5d

However, although that patch got impressive wins on Windows and Linux, it only saved a few hundred KB on Android, and only a few hundred KB more when I further lowered the limit to 5. I'm puzzled by why Android behaves so differently.

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 years ago

Depends on: 1592797

Andrew Creskey [:acreskey]

Assignee

Comment 20

•

5 years ago

(In reply to Michael Shal [:mshal] from comment #18)

I haven't seen an error like that before. Does it happen again if you do a fresh push (so a new instr build in addition to a new run task)?

I diffed the geckoview-androidTest.apk from that push with your -O2 push, and the only differences are in the compiled libraries and incidental files (sha manifests and files containing the hg revision). So it doesn't look like the package was built incorrectly.

The line "Failed to install /builds/worker/fetches/geckoview-androidTest.apk on None" looked suspicious at first, but on further investigation the "None" just comes from the fact that we don't set self.device_name in android_emulator_pgo.py. The device_name is only used in error messages, which explains why things work fine without it.

If it happens again on a re-push, maybe check with gbrown to see if he has any ideas? I'm not sure what else to check here.

Thank you for looking into that Michael - I'm still seeing a mysterious failure on a fresh push so I'll follow up and see what I can find.
-Os is an interesting option.

Andrew Creskey [:acreskey]

Assignee

Comment 21

•

5 years ago

(In reply to :dmajor from comment #19)

(In reply to Andrew Creskey from comment #17)

If you can help me with a patch for Android I would be very happy to see how it performs and the resulting binary size.

I believe this patch ought to do it: https://hg.mozilla.org/try/rev/c37802d5c0ac94a41de9fb3116ce1aa403c27d5d

However, although that patch got impressive wins on Windows and Linux, it only saved a few hundred KB on Android, and only a few hundred KB more when I further lowered the limit to 5. I'm puzzled by why Android behaves so differently.

dmajor, when I add your -import-instr-limit=10 option to the -O2 build I'm seeing very significant size savings.
Perhaps without the higher level optimizations there weren't that many long functions being imported?
libxul.so for arm32 goes from 84.9MB to 79.1MB
libxul.so for aarch64 goes from 123.1MB to 112.9MB
Your patch with -O2
-O2

So updated APK sizes are:

Optimization      AArch  Size, MB
-Oz                 32     44.0
-O2,instr-limit=10  32     50.9
-O2                 32     53.7
-O3                 32     54.9

-Oz                 64     50.7
-O2,instr-limit=10  64     57.0
-O2                 64     60.9
-O3                 64     62.6

-O2,instr-limit=10 looks to have roughly half the size penalty of -O3, so I'll retitle this bug.

Performance of -O2, instr-limit=10 against mozilla-central still looks great.
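
As a quick check on the size penalties relative to -Oz, the table above can be reduced to a ratio per architecture. This is a sketch only; the dictionary layout and the `penalty` helper are illustrative choices.

```python
# geckoview_example APK sizes in MB from the table above, keyed by (flags, bits).
SIZES_MB = {
    ("-Oz", 32): 44.0,
    ("-O2,instr-limit=10", 32): 50.9,
    ("-O2", 32): 53.7,
    ("-O3", 32): 54.9,
    ("-Oz", 64): 50.7,
    ("-O2,instr-limit=10", 64): 57.0,
    ("-O2", 64): 60.9,
    ("-O3", 64): 62.6,
}


def penalty(flags: str, bits: int) -> float:
    """Size increase in MB of the given flags over the -Oz baseline."""
    return SIZES_MB[(flags, bits)] - SIZES_MB[("-Oz", bits)]


for bits in (32, 64):
    ratio = penalty("-O2,instr-limit=10", bits) / penalty("-O3", bits)
    print(f"{bits}-bit: instr-limit=10 penalty is {ratio:.2f}x the -O3 penalty")
```

The ratio lands between about one half and two thirds, depending on the architecture.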

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 years ago

Summary: Consider optimizing gecko android builds for speed (-O3) rather than size (-Oz) → Consider optimizing gecko android builds for speed (-O2) rather than size (-Oz)

(Away)

Comment 22

•

5 years ago

(In reply to Andrew Creskey from comment #21)

Perhaps without the higher level optimizations there weren't that many long functions being imported?

Yes, I had just written up a comment speculating that, and we mid-aired. Glad to hear it helped!

Andrew Creskey [:acreskey]

Assignee

Comment 23

•

5 years ago

I'm trying it with -import-instr-limit=5 now :)

Andrew Creskey [:acreskey]

Assignee

Comment 24

•

5 years ago

(In reply to Nathan Froyd [:froydnj] from comment #13)

There's also some discussion of this in bug 1507636.

Not that this will be my decision, but the conclusions from bug 1507636 make sense to me -- Fennec is scoring ~10.5 on Speedometer while Chrome is at ~18.
So why increase the binary size just to score a bit higher, 11.2?

But now that we can measure Android page load performance, I think we are actually very close to Chrome.
From these results, Fenix with strict tracking protection is comparable to Chrome on load event timing, and ~10% slower on most visual metrics.

The raptor pageload tests show -O2 being a big win.
So increasing the binary size may make us faster than competing browsers.

I'm running visual metrics tests now, so we'll get a better idea of the impact on SpeedIndex, etc.

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 years ago

Depends on: 1593104

Andrew Creskey [:acreskey]

Assignee

Comment 25

•

5 years ago

This is how a very tight -import-instr-limit impacts binary size:

Optimization      AArch  Size, MB (geckoview_example.apk)
-Oz                 32     44.0
-O2,instr-limit=1   32     49.2      
-O2,instr-limit=3   32     49.5
-O2,instr-limit=5   32     50.2
-O2,instr-limit=10  32     50.9
-O2                 32     53.7
-O3                 32     54.9

-Oz                 64     50.7
-O2,instr-limit=1   64     55.5                
-O2,instr-limit=3   64     55.7
-O2,instr-limit=5   64     56.2
-O2,instr-limit=10  64     57.0
-O2                 64     60.9
-O3                 64     62.6

So far the speedometer results for all of these combinations are within noise of the plain -O2 build.
I'll run more pageload tests on the weekend when the device farm is in less demand.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 26

•

5 years ago

Pardon my ignorance, but wouldn't something like -import-instr-limit=1 effectively disable inlining? Is that what "import" means here? Surely that has to affect performance, right?

(Away)

Comment 27

•

5 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) (he/him) from comment #26)

Pardon my ignorance, but wouldn't something like -import-instr-limit=1 effectively disable inlining? Is that what "import" means here? Surely that has to affect performance, right?

I believe this is cross-translation-unit inlining, so old-school inlining would still happen. Additionally, PGO puts a 10-100x multiplier on the limit for hot functions.

I agree that very small numbers somehow feel wrong though, I'm not sure we should try to go chasing every single possible byte. 5 seems like a pretty strict limit already.
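
To make the distinction concrete, here is a toy Python model of the thresholding described above. A function defined in another translation unit is a candidate for import only if its instruction count is at or below the limit, and the limit is scaled up for hot call sites under PGO. The multiplier of 10 is the low end of the 10-100x range mentioned above. The function name and the exact comparison are simplifying assumptions, not LLVM's actual implementation.

```python
def may_import(instr_count: int, limit: int, hot: bool = False,
               hot_multiplier: float = 10.0) -> bool:
    """Toy check: is a cross-module function small enough to be imported?"""
    # Hot call sites get a scaled-up threshold; everything else uses the raw limit.
    threshold = limit * hot_multiplier if hot else limit
    return instr_count <= threshold


# A 30-instruction function with -import-instr-limit=5:
print(may_import(30, limit=5))            # False: too big for a cold call site
print(may_import(30, limit=5, hot=True))  # True: 30 <= 5 * 10
```

Functions called within their own translation unit never go through this check, which is why ordinary inlining still happens even at very low limits.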

Andrew Creskey [:acreskey]

Assignee

Comment 28

•

5 years ago

I was mostly curious about the degree to which the -import-instr-limit could reduce the binary size.
-import-instr-limit=1 is as low as it gets (-import-instr-limit=0 binaries are the same size, I guess not a lot of single-instructions being imported...).
Looking mostly at speedometer because reproducibility, I think that performance starts to degrade slightly at -import-instr-limit=3, particularly on Pixel 2 pgo.

-O2 (left) compare against -O2, limit=5

-O2 (left) compare against -O2, limit=3

-O2 (left) compare against -O2, limit=1

Andrew Creskey [:acreskey]

Assignee

Comment 29

•

5 years ago

These are the first visual metric results (Moto G5), cold loads.
They compare the baseline configuration (-Oz) to the (-O2) build.
SpeedIndex and ContentfulSpeedIndex both look to be improved by 8-10%. pageLoadTime is onload event timing, and is also improved, perhaps a bit more.
https://docs.google.com/spreadsheets/d/1g9idJimqLgvwK5QK2HOtOVQro2bshZDD5cYPwxAOKwU/edit#gid=1852601574
These tests were run locally with Browsertime and WebPageReplay recordings.

(Away)

Comment 30

•

5 years ago

Looking mostly at speedometer because reproducibility, I think that performance starts to degrade slightly at -import-instr-limit=3, particularly on Pixel 2 pgo.

I've had too many bad experiences with surprise perf regressions from suites that I didn't test, or didn't test to high enough confidence. Even if you have try runs, I highly recommend initially committing a value that is greater than what you want, let it settle for several days, make sure the sheriffs have gone through all their alerts, and only then reduce it further.

Andrew Creskey [:acreskey]

Assignee

Comment 31

•

5 years ago

(In reply to :dmajor from comment #30)

Looking mostly at speedometer because reproducibility, I think that performance starts to degrade slightly at -import-instr-limit=3, particularly on Pixel 2 pgo.

I've had too many bad experiences with surprise perf regressions from suites that I didn't test, or didn't test to high enough confidence. Even if you have try runs, I highly recommend initially committing a value that is greater than what you want, let it settle for several days, make sure the sheriffs have gone through all their alerts, and only then reduce it further.

That makes sense.
In this bug I would like to simply collect the performance characteristics of each optimization option so that folks can compare them.

The one I'm missing is -Os: I logged Bug 1593785 as I'm attempting to track down the problems with its PGO runs.

Size-wise it's quite promising, closer to -Oz, at least in the opt build. The -Os opt performance wasn't quite as good as -O2's (~8-9% speedometer improvement vs ~10-11%), but it's still interesting.

Optimization      AArch  Size, MB (geckoview_example.apk)
-Oz, opt            32     43.6
-Os, opt            32     46.7

-Oz, opt            64     50.2
-Os, opt            64     52.5

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 years ago

Depends on: 1593785

(Away)

Comment 32

•

5 years ago

As an anecdote I'll note that in bug 1592981 I got some regressions even at limit=10 on linux/win with pgo and limit=40 on mac without pgo. All it took was for one important function (nsStringBuffer::Release) to be now considered too big for inlining, and some of the more tight-C++-loops benchmarks noticed the change, even though Speedometer alone didn't turn up anything.

Michael Comella (:mcomella) [NI reported issues only; no longer employed by Mozilla]

Comment 33

•

5 years ago

Just a thought: if we do modify the optimizations, we may wish to document this or provide a configuration flag so that external GeckoView consumers also have a choice in how large their APKs will be (I'd guess they'd prefer to optimize for APK size over speed).

NI Snorp for awareness: no action needed.

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

•

5 years ago

Flags: needinfo?(snorp)

Andrew Creskey [:acreskey]

Assignee

Comment 34

•

5 years ago

Since Bug 1592981 has landed, builds of -O2 are now smaller relative to the current -Oz.
I've updated the sizes based on a recent push to try (PGO builds).

Optimization        AArch  Size, MB (geckoview_example.apk)
-Oz                   32     43.8
-O2 (instr-limit=10)  32     50.4 (+6.6)

-Oz                   64     50.0
-O2 (instr-limit=10)  64     56.3 (+6.3)

Performance looks to be in the same ballpark, although I have yet to do a visual metric comparison and there are pending jobs:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=c9d0e32140705667e1384d73362216549b65c763&newProject=try&newRevision=dbcf100a4314589debe141c52ec0767abb1fb458&framework=10

I also noticed that 'official' nightlies of geckoview_example are a lot larger than my try pushes.
i.e. these:
https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.mozilla-central.nightly.latest.mobile/android-api-16-opt
https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.mozilla-central.nightly.latest.mobile/android-aarch64-opt

Optimization        AArch  Size, MB (geckoview_example.apk)
-Oz official          32     48.3
-Oz official          64     54.5

Digging into the apks, it looks like this is due to the localization resources in the omni.ja (assets/omni/chrome, etc.)

Andrew Creskey [:acreskey]

Assignee

Comment 35

•

5 years ago

We've cleared the hurdles in building -Os PGO (Bug 1593785).

This is an updated view of the binary sizes and speedometer improvements (geckoview_example.apk):
I included a slightly tightened variant on -Os where the import-instr-limit is set to 5 instead of the default 10.

Optimization      AArch  Size,MB  Delta   Speedometer Improvement
-Oz                 32     44.1     -        --
-Os instr-limit=5   32     47.9   +3.8     10.9%
-Os                 32     48.5   +4.4     11.0%
-O2                 32     50.7   +6.6     11.7%

-Oz                 64     50.3     -        --
-Os instr-limit=5   64     54.1   +3.8     10.1%
-Os                 64     54.7   +4.4     10.8%
-O2                 64     56.6   +6.3     12.9%
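
One way to read the table is as Speedometer improvement bought per extra megabyte. The Python sketch below computes that from the rows above. The metric itself is an illustrative assumption, not something used elsewhere in this bug, and `gain_per_mb` is a made-up helper.

```python
# (flags, bits): (size delta in MB vs -Oz, Speedometer improvement in %),
# copied from the table above.
RESULTS = {
    ("-Os instr-limit=5", 32): (3.8, 10.9),
    ("-Os", 32): (4.4, 11.0),
    ("-O2", 32): (6.6, 11.7),
    ("-Os instr-limit=5", 64): (3.8, 10.1),
    ("-Os", 64): (4.4, 10.8),
    ("-O2", 64): (6.3, 12.9),
}


def gain_per_mb(delta_mb: float, improvement_pct: float) -> float:
    """Speedometer improvement (percentage points) per added MB of APK size."""
    return improvement_pct / delta_mb


for (flags, bits), (delta_mb, improvement) in RESULTS.items():
    print(f"{flags:<18} {bits}  {gain_per_mb(delta_mb, improvement):.2f} %/MB")
```

By this measure the -Os variants get more improvement per megabyte, while -O2 gets the largest absolute improvement.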

Raptor pageload comparisons:

Oz compared to Os (instr-limit=5) here
Oz compared to Os here
Oz compared to O2 here

Similar to the speedometer results, -O2 looks to be a bit faster than the -Os variants, both of which are significantly faster than -Oz.

Visual metrics tests are running now.

Andrew Creskey [:acreskey]

Assignee

Comment 36

•

5 years ago

Visual metric results from cold loads on the Moto G5 are here:
https://docs.google.com/spreadsheets/d/1g9idJimqLgvwK5QK2HOtOVQro2bshZDD5cYPwxAOKwU/edit#gid=592319917

Overall these builds look like a 5-7% improvement (even though the same -O2 build looked like an 8-10% improvement in the last run).

There is a lot of noise in these tests and locally it's not possible for me to get the high repeat counts that I can get on try.

Andrew Creskey [:acreskey]

Assignee

Comment 37

•

5 years ago

Attached image android_build_options_g5.png (deleted) — Details

I think this plot (speedIndex) gives a good view of the performance on different sites.

Andrew Creskey [:acreskey]

Assignee

Comment 38

•

5 years ago

Attached image android_build_options_p3.png (deleted) — Details

The Pixel 3 cold page load visual metrics results can be found here:
https://docs.google.com/spreadsheets/d/1g9idJimqLgvwK5QK2HOtOVQro2bshZDD5cYPwxAOKwU/edit#gid=584528033

And I attached a plot of the speedIndex.
The relative noise is quite a bit higher here.

As before, the change looks to be less impactful on Pixel 3 compared to G5.

There are numerous 5-10% improvements on SpeedIndex and also many sites that are not affected within the noise.

Sites like jianshu.com with a ~30% rel std deviation end up significantly lowering or raising the geomean based on how the 25 loads played out.
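
The effect of one noisy site on the geomean can be illustrated with made-up numbers. In this Python sketch, nine hypothetical sites each show a steady 7% improvement and a tenth site lands 30% high or 30% low; none of these ratios come from the spreadsheet.

```python
from statistics import geometric_mean

# Hypothetical per-site ratios (new SpeedIndex / baseline); below 1.0 is faster.
stable_sites = [0.93] * 9

# One noisy site that happens to land 30% high or 30% low in a given run.
unlucky = geometric_mean(stable_sites + [1.30])
lucky = geometric_mean(stable_sites + [0.70])

print(f"noisy site high: geomean ratio {unlucky:.3f}")
print(f"noisy site low:  geomean ratio {lucky:.3f}")
```

With these inputs the overall result swings between roughly a 4% and a 10% improvement, even though nine of the ten sites did not change between the two cases.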

Andrew Creskey [:acreskey]

Assignee

Comment 39

•

5 years ago

I've noted the PGO build times for these options. O2 may take a bit longer to build but save time in the Instrument/Run stages.
(Caveat: I'm not sure of the variance in these)


Option  aarch  Build  Instrument  Run
Oz      32     37     36          28
Os      32     38     46          28
O2      32     40     31          23

Oz      64     37     -           -
Os      64     38     -           -
O2      64     37     -           -

As an experiment,
I compared Oz vs O1 (O1 is ~10-12% faster on speedometer than Oz, and a few points slower than O2)
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=480dd25f9948aed73ecd9683e536b9f47f543cf1&newProject=try&newRevision=5d24e13545108cc8bad1191dba10c20b27552665&framework=10

Andrew Creskey [:acreskey]

Assignee

Comment 40

•

5 years ago

Attached file Bug 1591725 - Optimize at -O2 on android, clang r=froydnj,dmajor (deleted) — Details

For performance improvements to page load and speedometer, optimize at -O2 instead of -Oz.

The previous disabling of the outliner, "-mno-outline", was removed because the outliner is not enabled by default with -O2.
(See Bug 1508547 and https://developer.arm.com/docs/101754/latest/armclang-reference/armclang-command-line-options/-moutline-mno-outline)

Andrew Creskey [:acreskey]

Assignee

Comment 41

•

5 years ago

This is going to land in m-c so that stability can be assessed in a staged rollout to Fenix nightly, org.mozilla.fenix.nightly.
If successful, further experiments will be run to collect data on user engagement, retention, activation rates, and reported performance.

At that point, stakeholders will determine if the performance improvements are worth the increased binary size from the -O2 build (~6.5MB increase).

Daniel Varga [:dvarga]

Comment 42

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1997e30d6a5e

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox74: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla74

Andrew Creskey [:acreskey]

Assignee

Comment 43

•

5 years ago

FYI, this change will trigger a series of performance improvement sheriffing alerts on android:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=e599f431034ecca4d749fda872a2de60c3c1d721&newProject=try&newRevision=d9491acbc0d5195d09e80c36c988e740a9a3cb93&framework=10

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 years ago

Flags: needinfo?(marian.raiciof)

Flags: needinfo?(igoldan)

Flags: needinfo?(aionescu)

Marian Raiciof [:marauder]

Comment 44

•

5 years ago

Thank you, Andrew! Great news!

Flags: needinfo?(marian.raiciof)

Ionuț Goldan [:igoldan]

Comment 45

•

5 years ago

Thanks for the heads up!

Flags: needinfo?(igoldan)

Alexandru Ionescu (needinfo me) [:alexandrui]

Updated

•

5 years ago

Flags: needinfo?(aionescu)

Alexandru Ionescu (needinfo me) [:alexandrui]

Comment 46

•

5 years ago

== Change summary for alert #24721 (as of Tue, 21 Jan 2020 08:40:55 GMT) ==

Improvements:

16% build times android-4-0-armv7-api16 pgo instrumented taskcluster-c5.4xlarge 1,945.87 -> 1,636.80
16% build times android-4-0-armv7-api16 pgo instrumented taskcluster-m5.4xlarge 2,009.47 -> 1,693.59
15% build times android-4-0-armv7-api16 pgo instrumented taskcluster-c5d.4xlarge 1,980.25 -> 1,686.91

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24721

Florin Strugariu [:Bebe]

Comment 47

•

5 years ago

== Change summary for alert #24774 (as of Thu, 23 Jan 2020 06:43:35 GMT) ==

Improvements:

10% raptor-tp6m-allrecipes-geckoview-cold loadtime android-hw-g5-7-0-arm7-api-16 pgo 7,101.17 -> 6,398.00
9% raptor-tp6m-allrecipes-geckoview-cold android-hw-g5-7-0-arm7-api-16 pgo 2,449.88 -> 2,222.98
8% raptor-tp6m-allrecipes-geckoview-cold android-hw-g5-7-0-arm7-api-16 pgo 2,439.74 -> 2,240.65
7% raptor-tp6m-booking-geckoview-cold fcp android-hw-g5-7-0-arm7-api-16 pgo 808.00 -> 754.00
6% raptor-speedometer-geckoview android-hw-g5-7-0-arm7-api-16 pgo 9.45 -> 10.02
6% raptor-speedometer-geckoview android-hw-p2-8-0-android-aarch64 pgo 24.44 -> 25.82
5% raptor-tp6m-wikipedia-geckoview-cold loadtime android-hw-g5-7-0-arm7-api-16 pgo 1,087.38 -> 1,029.50
5% raptor-tp6m-allrecipes-geckoview-cold fcp android-hw-g5-7-0-arm7-api-16 pgo 1,741.96 -> 1,655.12

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=24774

Andrew Creskey [:acreskey]

Assignee

Comment 48

•

5 years ago

The Fenix stability and pageload vs apk size tradeoff experiment has just started:
https://github.com/mozilla-mobile/fenix/issues/7795

Ryan VanderMeulen [:RyanVM]

Comment 49

•

5 years ago

Landed on Beta to trigger the GV builds needed for the experiment.
https://hg.mozilla.org/releases/mozilla-beta/rev/c270bc80557a52d64796167aa11798fb0961cfaf

Then backed out after the builds were triggered.
https://hg.mozilla.org/releases/mozilla-beta/rev/adaa66ecae24973f1a75d2d955024b64ba192f7b

Ryan VanderMeulen [:RyanVM]

Comment 50

•

5 years ago

backout

Backed out from Beta74 to avoid confusion with the Beta migration builds and the experiments running around this change, per Slack discussion. It remains landed on mozilla-central for GV75+.

https://hg.mozilla.org/releases/mozilla-beta/rev/607417212e2592dc01b2dd55b6a88c9c15509450

status-firefox74: fixed → disabled

status-firefox75: --- → fixed

Florin Strugariu [:Bebe]

Comment 51

•

5 years ago

Beta backout:
== Change summary for alert #25097 (as of Mon, 24 Feb 2020 07:33:38 GMT) ==

Regressions:

20% build times android-4-0-armv7-api16 pgo instrumented taskcluster-c5.4xlarge 1,524.47 -> 1,831.05

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=25097

Julien Cristau [:jcristau]

Comment 52

•

5 years ago

backout

Backed out from Beta75, same story as comment 50.

https://hg.mozilla.org/releases/mozilla-beta/rev/f87e8a58c39a8964bbb0bf7aa1e87b96c408ca63

status-firefox75: fixed → disabled

status-firefox76: --- → fixed

Julien Cristau [:jcristau]

Updated

•

5 years ago

Blocks: 1626344

Andrew Creskey [:acreskey]

Assignee

Updated

•

4 years ago

Depends on: 1670948

Andrew Creskey [:acreskey]

Assignee

Comment 53

•

2 years ago

With the focus on Speedometer 3, I've re-run the "-O2" performance comparison.
This is how the binary size changes, measured in bytes, looking at the geckoview example from a try push:

                 Arm v7          AArch64
-Oz (current)    79,681,645      86,908,174
-Os              81,498,693      89,618,699
-O2              82,953,919      90,891,453

Independent of these optimizations, from comment 35 it looks like geckoview example has grown by about 40 megs over the last three years.

In terms of performance, we are no longer seeing the large 10-12% improvements in speedometer/speedometer 3.
With "-O2" it looks like only 2-3% (although some subtests may show greater improvements).

If I add in pageload tests, we see some other small 2-3% gains with "-O2":
-O2

From a quick look, "-Os" doesn't seem very promising anymore:
-Os

Andrew Creskey [:acreskey]

Assignee

Comment 54

•

2 years ago

One idea is to first trim what we would want to add.
So if -O2 adds between 3 and 4 megabytes to the final binary in Fenix, we would first invest in finding those savings elsewhere before landing the change.
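
The 3 to 4 megabyte figure can be checked against the byte counts in comment 53. This Python sketch assumes the geckoview example deltas carry over to Fenix and uses 1 MB = 1,000,000 bytes; `delta_mb` is a made-up helper.

```python
# geckoview example sizes in bytes, from the table in comment 53.
SIZES = {
    "armv7": {"-Oz": 79_681_645, "-Os": 81_498_693, "-O2": 82_953_919},
    "aarch64": {"-Oz": 86_908_174, "-Os": 89_618_699, "-O2": 90_891_453},
}


def delta_mb(arch: str, flags: str) -> float:
    """Size increase over the -Oz build, in MB (10**6 bytes)."""
    return (SIZES[arch][flags] - SIZES[arch]["-Oz"]) / 1_000_000


for arch in SIZES:
    for flags in ("-Os", "-O2"):
        print(f"{arch} {flags}: +{delta_mb(arch, flags):.2f} MB")
```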

Andrew Creskey [:acreskey]

Assignee

Comment 55

•

2 years ago

An experiment is being done in Bug 1831935

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1831935
