Closed
Bug 225433
Opened 21 years ago
Closed 17 years ago
investigate -Os for nightly/release builds
Categories
(Firefox Build System :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dwitte, Unassigned)
References
Details
(Keywords: memory-footprint)
So, bryner ran an experiment on the redwood firebird tbox (gcc 3.3.2) a few days
ago, by switching the optimization flag to use -Os instead of -O2. codesize
reduced by 1,545kb out of 14,387kb, a 10.7% reduction. the impact on perf
metrics seems to be neutral overall - Ts remained about the same, Txul improved
by ~1%, and Tp got larger by a barely measurable amount (maybe ~0.5%).
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Phoenix&hours=24&maxdate=1068502523&legend=0
i've done a comparison between my local builds of seamonkey, between -O2 and
-Os. for seamonkey, we save 9.3% in binary codesize. i don't have a -O
comparison on-hand at the moment, but i'm rolling one to compare.
given that perf remains neutral, and we save about 10% in binary size, is this
something we want to do for gcc builds of seamonkey nightlies/releases?
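For reference, the percentages above follow directly from the quoted sizes. A minimal C++ sketch of the arithmetic (the function name `percent_reduction` is made up for illustration and is not from any build tooling):

```cpp
// Percentage of code size saved, given the original size and the amount saved.
// Both arguments use the same unit (kb here), so the result is a plain percent.
double percent_reduction(double original_kb, double saved_kb) {
    return 100.0 * saved_kb / original_kb;
}
```

With the firebird numbers, `percent_reduction(14387, 1545)` comes out to roughly 10.7.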
Comment 1•21 years ago (Reporter)
afaict we use -O2 for gcc release builds. it seems the contributed gtk2/xft
linux builds use -O3... ;)
Comment 2•21 years ago (Reporter)
my -O build finished: binary size reduces by 2.7% relative to -O2. i'm assuming
performance metrics are in between those of -Os and -O2, and hence are also neutral.
so it appears -Os is the sweet spot here...
Comment 3•21 years ago
I'd prefer to see a bit more detailed performance analysis before we do this.
However if things look good then I'm all for it.
The reason why performance doesn't degrade could simply be that there is less
code. So we'd end up swapping code less frequently and we'd hit
instruction-caches more often.
Comment 4•21 years ago (Reporter)
what kind of performance analysis would you suggest? imo, for this kind of
change we'd be interested in fairly broad metrics like the ones we have,
Ts/Tp/Txul. so perhaps switching one of the tinderboxen to -Os (luna?) would be
a good start (ignoring for the moment that it runs a slightly older gcc, 3.2).
that said, i think the data we already have for firebird is perfectly applicable
to seamonkey.
i've run some Ts/Txul tests locally on a p3-550, linux/gtk2, gcc 3.3.2. the Ts
tests are not useful, because the standard deviation is far too high (~10%) for
any changes to be visible:
            -Os      -O2
Ts avg      3518.6   3505.15
Ts stdev    280.6    299.2
however, my Txul tests show a larger improvement than the firebird tests did
(most likely due to the different perf characteristics of the p3-550). these
results have a pretty low standard deviation (< 0.5%), and so are statistically
significant:
            -Os      -O2      improvement (Os relative to O2)
Txul avg    970.4    998.2    2.8%
Txul stdev  27.3     26.0
i'm unable to test Tp since i'm outside the firewall.
Comment 5•21 years ago (Reporter)
er, those standard deviations should read:
Txul stdev 4.5 3.8
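To make the "too noisy" versus "significant" distinction concrete, here is a rough C++ sketch that computes the relative standard deviation and the relative improvement from the comment 4 numbers, using the corrected Txul deviations. The struct and function names are illustrative only, and since the number of runs isn't stated, this is not a proper significance test.

```cpp
// One timing metric: mean and standard deviation, both in milliseconds.
struct Measurement {
    double mean;
    double stdev;
};

// Standard deviation as a percentage of the mean (coefficient of variation).
double relative_stdev_percent(const Measurement& m) {
    return 100.0 * m.stdev / m.mean;
}

// How much faster `candidate` is than `baseline`, in percent of the baseline.
// Positive means the candidate took less time; negative means it was slower.
double improvement_percent(const Measurement& candidate,
                           const Measurement& baseline) {
    return 100.0 * (baseline.mean - candidate.mean) / baseline.mean;
}
```

Plugging in the data: Ts has a relative deviation of about 8% against a difference of well under 1%, so it shows nothing either way, while Txul has a relative deviation under 0.5% against a difference of about 2.8%.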
Comment 6•21 years ago
I'd like to see at least Tp measured as well, so making the switch on one of the
tinderboxen sounds like a good idea. Also if you have any dhtml-tests or
js-tests handy that would be good, but it's no requirement on my part (I know they
exist but I don't know where, sorry).
Comment 7•21 years ago
There are some scattered in various bugs... (search for "dhtml perf").
Comment 8•21 years ago
Another thing that someone might want to investigate is tweaking gcc's
inliner. Cutting the inline limit in half (-finline-limit=300) on
gcc-3.3.2 reduced the code size by another 440K. More is probably
achievable by changing this value or the underlying parameters
(max-inline-*).
Comment 9•21 years ago
There are some large functions that we really do want to inline, since they're
only used once or twice. I'd rather tweak inlining by finding the things that
really shouldn't be inlined (probably in the string code) and making them not
inline.
Comment 10•21 years ago
David: Are you sure these functions are really being inlined? MSVC has a pretty
low limit for what it is willing to inline (for example some of the nsVoidArray
functions aren't always inlined) and gcc too has a limit for what it will inline.
So in general you shouldn't rely on having your functions inlined unless they
are really small.
Comment 11•21 years ago (Reporter)
the only way to positively force inlining is by using the gcc
__attribute__((always_inline)). having said that, i agree with dbaron's view,
especially as applied to strings... the inlining model there is whacky. i'm sure
we could do great things for both codesize and perf by fixing that.
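A minimal sketch of what that looks like with gcc, assuming hypothetical macro names (`EXAMPLE_ALWAYS_INLINE`, `EXAMPLE_NEVER_INLINE`) and toy functions that are not from the tree. `noinline` is gcc's counterpart attribute, which would fit the "make them not inline" approach from comment 9.

```cpp
// Hypothetical wrapper macros; the attributes themselves are gcc extensions,
// so other compilers fall back to plain `inline` / nothing.
#if defined(__GNUC__)
#  define EXAMPLE_ALWAYS_INLINE inline __attribute__((always_inline))
#  define EXAMPLE_NEVER_INLINE __attribute__((noinline))
#else
#  define EXAMPLE_ALWAYS_INLINE inline
#  define EXAMPLE_NEVER_INLINE
#endif

// Small hot helper: ask gcc to inline it even where its size heuristics
// (or -Os) would otherwise decline.
EXAMPLE_ALWAYS_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Bulkier routine called from many places: keep a single out-of-line copy
// so it does not get duplicated into every caller.
EXAMPLE_NEVER_INLINE int sum_clamped(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += clamp_to_byte(values[i]);
    }
    return total;
}
```

Marking individual functions this way leaves the global inline limit alone, so the large used-once functions can still be inlined.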
Comment 12•21 years ago
Note that -Os seems to trigger a bunch of compiler bugs; depending on the target
CPU type you may see "simple" defects like bug 233497 (on x86/IA32, a simple
|if()|/|else| construct will only use the |else| branch, etc.) or totally defunct
binaries (like on SPARC).
Comment 13•21 years ago
Of note is that while overall (compressed) tarball size does in fact drop by
about 10%, the size of some libraries drops by more than that. gklayout and
necko (both stripped) drop by about 20% here (-O2 compared to -Os, gcc 3.2).
xpcom, docshell, and a few others drop by 10%. uconv drops by 2%. So on some
libraries we're actually seeing a huge win from -Os (20% of gklayout is about 900KB).
Frankly, I would be in favor of flipping the switch sometime in an alpha
milestone (like now, say) for tinderbox and the nightlies and seeing what
happens. Once we have nightlies with the change, we can put out a call to
people who do DHTML stuff (most of whom don't build) to compare the new and old
builds....
Comment 14•21 years ago
In other words, we have all these nightlies that are _supposed_ to be for testing
purposes and we have people testing them. We should make use of that.
Comment 15•21 years ago
Compare bug 53486
> if you have any dhtml-tests
<http://www.world-direct.com/mozilla/dhtml/funo/domtestcases/index.htm>
Comment 16•20 years ago
firefox is using -Os, any reason not to switch comet (seamonkey release) or luna
(seamonkey perf tests) over to doing -Os builds at this point, or do we want to
wait for post 1.8?
Assignee: leaf → cmp
Priority: -- → P3
Comment 17•20 years ago
*** Bug 53486 has been marked as a duplicate of this bug. ***
Comment 18•20 years ago
granrose: switching now sounds entirely reasonable to me.
Comment 19•20 years ago
I think we should get dbaron's approval to change the tinderboxen; we generally
prefer the historical comparison in the numbers by using the same build flags
(which is why btek still uses egcs), even if this doesn't produce the most
optimized builds.
Comment 20•20 years ago
FWIW, I'd expect -O2 builds to be faster than -Os, especially with newer gccs,
thanks to basic block reordering. (We've tagged a few hotspots with NS_LIKELY /
NS_UNLIKELY since comment 0 happened, so it could be worth re-measuring.)
I'd rather not change tinderboxes that are generating performance data. I think
we already have some with -O2 and some with -Os.
Comment 21•20 years ago (Reporter)
dbaron: the results in comment 4 (alas, Txul only, no Tp measurements) were done
with 3.3.2... did block reordering come in recently (3.4), or are my results
still representative?
Comment 22•20 years ago
IIRC, NS_LIKELY and NS_UNLIKELY are more recent than comment 4.
From memory:
* gcc 3.3.x does basic block reordering (-freorder-blocks) at -O2 but not -Os
* gcc 3.4 also does ,pt / ,pf annotations on conditional jump instructions
(which solves the branch prediction problem but not the cache miss problem
that's solved by -freorder-blocks), but I'm not sure at what optimization levels.
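For context, branch hints of this kind are typically built on gcc's `__builtin_expect`. A sketch under that assumption; the `EXAMPLE_LIKELY` / `EXAMPLE_UNLIKELY` macros and the `parse_digit` function are hypothetical and are not the actual NS_LIKELY / NS_UNLIKELY definitions:

```cpp
// Hypothetical hint macros. __builtin_expect is a gcc builtin that returns its
// first argument and tells the compiler which value is expected.
#if defined(__GNUC__)
#  define EXAMPLE_LIKELY(x)   (__builtin_expect(!!(x), 1))
#  define EXAMPLE_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define EXAMPLE_LIKELY(x)   (!!(x))
#  define EXAMPLE_UNLIKELY(x) (!!(x))
#endif

// Returns the numeric value of a decimal digit, or -1 for any other character.
// The error branch is marked as the rare case.
int parse_digit(char c) {
    if (EXAMPLE_UNLIKELY(c < '0' || c > '9')) {
        return -1;
    }
    return c - '0';
}
```

The hint only changes the branch probabilities gcc works with; whether the rare path actually gets moved out of the hot instruction stream depends on block reordering being enabled at the chosen optimization level.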
Updated•20 years ago
Product: Browser → Seamonkey
Comment 23•19 years ago
Mass reassign of open bugs for chase@mozilla.org to build@mozilla-org.bugs.
Assignee: chase → build
Comment 24•18 years ago
Mass re-assign of bugs that aren't on the build team radar, so bugs assigned to build@mozilla-org.bugs reflects reality.
If there is a bug you really think we need to be looking at, please *email* build@mozilla.org with a bug number and explanation.
Assignee: build → nobody
Updated•18 years ago
Assignee: nobody → stanshebs
Comment 25•18 years ago
Apparently Linux releases on the 1.8 branch have been built -Os for a while; Chris Cooper added this in November as part of migrating tinderbox bits to the public repository, as seen in http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/tools/tinderbox-configs/firefox/linux/mozconfig&rev=MOZILLA_1_8_BRANCH_release .
Mac is being built -O2 on trunk and branches.
Comment 26•18 years ago
Perf?
Comment 27•17 years ago
Is this still something we're looking into or should it be closed in some way?
Comment 28•17 years ago
I still think this deserves investigation. At the least, we should redo some performance testing with newer gccs.
Comment 29•17 years ago
At the very least we need to do a -Os/-O2 comparison on Macs.
Updated•17 years ago
Assignee: stanshebs → nobody
Product: Mozilla Application Suite → Core
QA Contact: build-config
Comment 30•17 years ago
What's the relation to bug 409803 and possibly other bugs (cc'ing sayrer)? I can guess, but it would be great to have our story for 1.9/fx3 sorted out soon, so nominating blocking.
/be
Flags: blocking1.9?
Comment 31•17 years ago
(In reply to comment #30)
> What's the relation to bug 409803 and possibly other bugs (cc'ing sayrer)? I
> can guess, but it would be great to have our story for 1.9/fx3 sorted out soon,
> so nominating blocking.
To recap:
We build -Os for release builds on linux.
We build -O2 for release builds on mac.
We build -O1 on msvc (it's somewhere between GCC's -Os and -O2, it does inline etc.)
I tried building mac at -Os, and saw a ~5% slowdown on Tdhtml and a 2-3% slowdown on Tp/Tp2. However, the code was quite a bit smaller.
To me, that indicates certain parts of the tree are faster at -O2 and others at -Os. For example, we know spidermonkey is better at -Os.
Comment 32•17 years ago (Reporter)
the 5% slowdown could be due (in full or part) to bug 409803 - any data we can get on mac gcc4.0 regarding that would be gold, and might make it easier to figure out module-specific settings. (speculation here, but the bug mostly affects code that makes heavy use of c++ wrappers, e.g. string libs, which might explain why spidermonkey isn't affected?)
Comment 33•17 years ago
+ing so we figure out one way or another
Flags: blocking1.9? → blocking1.9+
Updated•17 years ago
Status: NEW → RESOLVED
Closed: 17 years ago
Flags: tracking1.9+
Resolution: --- → WORKSFORME
Updated•7 years ago
Product: Core → Firefox Build System