707515 - (ObjShrink) sunspider-string-fasta regression

Assignee

Description

•

13 years ago

It looks like a recent objshrink change made string-fasta really slow. SS on a recent trunk build vs. JM tip: Trunk: ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 248.7ms +/- 2.9% -------------------------------------------- 3d: 41.4ms +/- 1.5% cube: 16.0ms +/- 2.1% morph: 10.6ms +/- 3.5% raytrace: 14.8ms +/- 2.0% access: 23.2ms +/- 4.1% binary-trees: 3.7ms +/- 13.0% fannkuch: 8.4ms +/- 4.4% nbody: 5.4ms +/- 6.8% nsieve: 5.7ms +/- 8.5% bitops: 13.5ms +/- 5.1% 3bit-bits-in-byte: 0.9ms +/- 25.1% bits-in-byte: 5.1ms +/- 4.4% bitwise-and: 3.7ms +/- 9.3% nsieve-bits: 3.8ms +/- 7.9% controlflow: 2.7ms +/- 12.8% recursive: 2.7ms +/- 12.8% crypto: 21.5ms +/- 3.2% aes: 11.4ms +/- 3.2% md5: 6.2ms +/- 4.9% sha1: 3.9ms +/- 13.5% date: 34.9ms +/- 3.9% format-tofte: 19.3ms +/- 4.3% format-xparb: 15.6ms +/- 4.9% math: 20.3ms +/- 4.4% cordic: 3.4ms +/- 10.9% partial-sums: 13.3ms +/- 3.6% spectral-norm: 3.6ms +/- 10.3% regexp: 15.5ms +/- 4.5% dna: 15.5ms +/- 4.5% string: 75.7ms +/- 4.5% base64: 6.1ms +/- 3.7% fasta: 8.1ms +/- 7.7% tagcloud: 23.9ms +/- 3.0% unpack-code: 27.3ms +/- 7.7% validate-input: 10.3ms +/- 3.4% JM: ============================================ RESULTS (means and 95% confidence intervals) -------------------------------------------- Total: 304.3ms +/- 0.9% -------------------------------------------- 3d: 41.4ms +/- 1.2% cube: 16.2ms +/- 1.9% morph: 10.3ms +/- 3.4% raytrace: 14.9ms +/- 1.5% access: 21.0ms +/- 2.3% binary-trees: 3.1ms +/- 7.3% fannkuch: 8.0ms +/- 0.0% nbody: 5.2ms +/- 5.8% nsieve: 4.7ms +/- 7.3% bitops: 13.1ms +/- 6.0% 3bit-bits-in-byte: 0.9ms +/- 25.1% bits-in-byte: 5.0ms +/- 0.0% bitwise-and: 3.5ms +/- 10.8% nsieve-bits: 3.7ms +/- 9.3% controlflow: 2.8ms +/- 10.8% recursive: 2.8ms +/- 10.8% crypto: 21.4ms +/- 2.8% aes: 11.7ms +/- 3.0% md5: 5.8ms +/- 5.2% sha1: 3.9ms +/- 5.8% date: 35.0ms +/- 1.9% format-tofte: 19.6ms +/- 1.9% format-xparb: 15.4ms +/- 2.4% math: 19.8ms +/- 3.7% cordic: 3.6ms +/- 10.3% partial-sums: 12.8ms +/- 3.5% spectral-norm: 3.4ms +/- 10.9% regexp: 15.0ms +/- 3.9% dna: 15.0ms +/- 3.9% string: 134.8ms +/- 1.1% base64: 6.0ms +/- 0.0% fasta: 66.2ms +/- 1.4% tagcloud: 25.9ms +/- 0.9% unpack-code: 26.4ms +/- 1.4% validate-input: 10.3ms +/- 3.4% The 55.6ms total difference is all in the 58.1ms difference on string-fasta. This is recent, I didn't see this when comparing SS scores between the branches a week or two ago.

blechri

Comment 1

•

13 years ago

I'm seeing the same regression happening between the 2011-12-01 and the 12-04 Nightly build.

Brian Hackett [Laid off!]

Assignee

Comment 2

•

13 years ago

Attached patch patch (deleted) — Details — Splinter Review

Because of the teleporting optimization, the entire prototype chain was getting marked with an uncacheable proto, which will end up inhibiting pretty much all iterator caching. Using generateOwnShape() all the time ends up creating too many dictionary objects, so some balance needs to be struck. Using generateOwnShape on objects with singleton types takes care of the regression and seems to avoid a lot of dictionary creation during normal browsing, but this is still pretty unsatisfying and is really just a temporary fix until either bug 707717 gets fixed or the prototype is moved to BaseShape.

Assignee: general → bhackett1024

Attachment #579716 - Flags: review?(luke)

Luke Wagner [:luke]

Comment 3

•

13 years ago

How about just removing the teleporting optimization already? This is like the 5th gross-but-it-solves-this-perf-regression hack.

Brian Hackett [Laid off!]

Assignee

Comment 4

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #3) > How about just removing the teleporting optimization already? This is like > the 5th gross-but-it-solves-this-perf-regression hack. That is definitely a third alternative to the two I listed above. I really want to remove the teleporting optimization but need time to collect data and determine whether doing so is a good idea. In the meantime, this is a fire which needs to be put out.

David Mandelin [:dmandelin]

Comment 5

•

13 years ago

(In reply to Brian Hackett (:bhackett) from comment #4) > (In reply to Luke Wagner [:luke] from comment #3) > > How about just removing the teleporting optimization already? This is like > > the 5th gross-but-it-solves-this-perf-regression hack. > > That is definitely a third alternative to the two I listed above. I really > want to remove the teleporting optimization but need time to collect data > and determine whether doing so is a good idea. In the meantime, this is a > fire which needs to be put out. Yes, we need to be careful. Access time is constant with proto chain length in all browsers. So if that's true even without teleporting, then we of course could just remove it, but that's not likely.

Brendan Eich [:brendan]

Comment 6

•

13 years ago

(In reply to Luke Wagner [:luke] from comment #3) > How about just removing the teleporting optimization already? This is like > the 5th gross-but-it-solves-this-perf-regression hack. Maybe this is the last of five, and someone shoulda read jorendorff's doc [1] first :-P. Really, if an optimization matters, then the code has to preserve its invariants. If that is hard, then welcome to JS! It's not enough to bemoan complexity if there's no simpler way to achieve the same performance. Is there? For the long chains Vitek et al. [2] found by instrumenting JS execution of popular web apps, getting rid of teleportation could regress performance. Or perhaps the long chains are rare outliers, and the quartiles shown in Figure 6 are short enough that checking every shape is tolerable. Hard to know without measuring. Generally, lookups dominate hard cases such as __proto__ assignment, and hard cases make bad law. So in principle, we should not sacrifice performance in the lookup path for simplicity in the abnormal cases. But it's hard to know what is "abnormal". I believe Dart is touted because its Smalltalk-style classes are not subject to shadowing and foreshadowing that can invalidate PICs in the absence of teleportation or checking every shape along the chain. So I bet one can show JS performance in V8 suffering with long chains where Dart does not suffer. It would be interesting to find that microbenchmark and run it. /be [1] https://developer.mozilla.org/En/SpiderMonkey/Internals/Property_cache [2] http://sss.cs.purdue.edu/projects/dynjs/pldi275-richards.pdf

Brian Hackett [Laid off!]

Assignee

Comment 7

•

13 years ago

The hope is that TI has reduced the pressure on teleporting to handle accesses with long prototype chains. With TI we can statically resolve such accesses to a single object or even a specific value, with no shape checks. We just don't know how often this actually holds; across all getprop accesses (on a breadth of websites, measured a couple months back) we statically resolve about 40% of accesses, and in principle this should be higher for accesses going through a prototype.

Luke Wagner [:luke]

Comment 8

•

13 years ago

(In reply to Brendan Eich [:brendan] from comment #6) > Maybe this is the last of five, and someone shoulda read jorendorff's doc > [1] first :-P. These haven't been correctness bugs, > Generally, lookups dominate hard cases such as __proto__ assignment, and > hard cases make bad law. So in principle, we should not sacrifice > performance in the lookup path for simplicity in the abnormal cases. they have been performance bugs because the hard cases aren't as rare. > Really, if an optimization matters, then the code has to preserve its > invariants. Whether it matters is the issue under question. I didn't say "after measuring whether it actually buys us anything" in my comment b/c Brian and I have already talked about this issue and I know that he plans to do such a measurement. My comment was just "how about sooner rather than later" because there is a very real possibility that "later" may become "never" and we will just be left a sprinkling of magic code that only makes sense in the context of the browser+benchmark suite. I'll r+ since in 12 days ObjShrink moves to aurora and it would be nice to use that 12 days for baking, not new-patch-writing.

Luke Wagner [:luke]

Comment 9

•

13 years ago

Comment on attachment 579716 [details] [diff] [review] patch Please provide a comment explaining the tradeoff being made and the browser use case.

Attachment #579716 - Flags: review?(luke) → review+

Brian Hackett [Laid off!]

Assignee

Comment 10

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/592d5034c8a9

Boris Zbarsky [:bzbarsky]

Comment 11

•

13 years ago

There's a good chance that this improved Dromaeo (CSS) by 2-3%, based on tree-management mails.

Ed Morley [:emorley]

Comment 12

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/592d5034c8a9

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla11

Bugzilla

(ObjShrink) sunspider-string-fasta regression

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Attachment

General

Description

File Name

Content Type