Closed
Bug 1054972
Opened 10 years ago
Closed 10 years ago
90% performance regression on The Stanford Javascript Crypto Library/SHA-512
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla35
People
(Reporter: alexander.kjeldaas, Assigned: sunfish)
References
Details
(Keywords: perf, regression)
Attachments
(4 files)
(deleted),
application/javascript
|
Details | |
(deleted),
patch
|
nbp
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
nbp
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
nbp
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715214327
Steps to reproduce:
Run the jsperf tests on a mobile device using the FF and FF Beta versions.
Actual results:
I see a performance degradation of around 70% for Crypto-JS and around 80-90% for SJCL.
Expected results:
The performance should not degrade significantly.
Reporter | ||
Comment 1•10 years ago
|
||
I forgot to point to the test: http://jsperf.com/sha-512/3
Reporter | ||
Updated•10 years ago
|
OS: Linux → Android
Hardware: x86_64 → ARM
Comment 2•10 years ago
|
||
I don't see any regression for Crypto-JS on desktop nor mobile, but I do see a 90% performance regression on SJCL on all platforms, so this isn't mobile specific.
Component: General → JavaScript Engine
OS: Android → All
Product: Firefox for Android → Core
Hardware: ARM → All
Summary: Crypto-JS and SJCL performance regressions on SHA-512 implementations → 90% performance regression on The Stanford Javascript Crypto Library/SHA-512
Version: Firefox 32 → 32 Branch
Updated•10 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•10 years ago
|
Keywords: regression,
regressionwindow-wanted
Comment 3•10 years ago
|
||
My money is on bug 976446, so testing builds immediately before and after that landed will probably speed up regression hunting.
Updated•10 years ago
|
Flags: needinfo?(jdemooij)
Comment 4•10 years ago
|
||
Regression window(m-i)
Good:
https://hg.mozilla.org/integration/mozilla-inbound/rev/46d57b8f48ce
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140505141944
Bad:
https://hg.mozilla.org/integration/mozilla-inbound/rev/695049af7654
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:32.0) Gecko/20100101 Firefox/32.0 ID:20140505145245
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=46d57b8f48ce&tochange=695049af7654
Regressed by:
695049af7654 Dan Gohman — Bug 999580 - IonMonkey: Generalize RangeAnalysis truncation to handle other kinds of paths to integer types. r=nbp
Blocks: 999580
Updated•10 years ago
|
status-firefox31:
--- → unaffected
status-firefox32:
--- → affected
status-firefox33:
--- → affected
status-firefox34:
--- → affected
Updated•10 years ago
|
Keywords: regressionwindow-wanted
Reporter | ||
Comment 6•10 years ago
|
||
OP here, I'm testing Crypto-JS again on Android and I get:
Firefox Mobile (31?): 118 +- 4.30
Firefox Mobile Beta (32?): 63.36 +- 38.78%
For me there is a significant regression also on the Crypto-JS benchmark. Notice that the uncertainty is much larger on the 32 version, so the JIT might be a bit on and off.
Reporter | ||
Comment 7•10 years ago
|
||
Changing my mind. The Crypto-JS results have a long "warmup period" on android, especially when my phone is low on battery. I get anywhere from 30, to 60 in the beginning and then it tops out at 118 on version 31 and 108 on version 32, so there is no major perf regression. Maybe a small one though.
Comment 8•10 years ago
|
||
You probably want to test performance with a fully charged battery and plugged into the power outlet, otherwise whatever power savings stuff Android and your CPU have will totally throw the results off - which is what that large error margin is trying to tell you.
I tested this and there is no performance difference between 31 and 32 on Crypto-JS, desktop nor mobile.
Comment 9•10 years ago
|
||
NI from sunfish based on comment 4.
Flags: needinfo?(jdemooij) → needinfo?(sunfish)
Assignee | ||
Comment 10•10 years ago
|
||
Assignee | ||
Comment 11•10 years ago
|
||
I can reproduce the slowdown at the revision isolated in comment 4 with the standalone script in comment 10.
Assignee: nobody → sunfish
Flags: needinfo?(sunfish)
Assignee | ||
Comment 12•10 years ago
|
||
I see some effects which appear to be related to bailouts.
In the fast version, IONFLAGS=bailouts prints just this:
[Bailouts] Took bailout! Snapshot offset: 919
In the slow version, it prints this:
[Bailouts] Took bailout! Snapshot offset: 919
[Bailouts] Took bailout! Snapshot offset: 1956
[Bailouts] Baseline info failure all-sjcl.js:593, inlined into all-sjcl.js:593
[Bailouts] Took bailout! Snapshot offset: 2033
[Bailouts] Baseline info failure all-sjcl.js:593, inlined into all-sjcl.js:593
Annoyingly, running the slow version with --ion-parallel-compile=off produces the same output same the fast version.
Assignee | ||
Comment 13•10 years ago
|
||
With this diff to the attached testcase,
--- all-sjcl.js.orig 2014-08-22 14:12:32.739372163 -0700
+++ all-sjcl.js 2014-08-22 14:58:23.007290901 -0700
@@ -682,7 +682,7 @@
t1h += chh + ((t1l >>> 0) < (chl >>> 0) ? 1 : 0);
t1l += krl;
t1h += krh + ((t1l >>> 0) < (krl >>> 0) ? 1 : 0);
- t1l += wrl;
+ t1l = t1l + wrl|0;
t1h += wrh + ((t1l >>> 0) < (wrl >>> 0) ? 1 : 0);
// t2 = sigma0 + maj
it goes fast.
It appears that the problem here is that we're speculating more of the types to be int32, which is generally a good thing for code like this, but we're still using bailout checks, and they're needlessly triggering. In the above diff, after the add, t1l temporarily has a value which is beyond the int32 range, but it is truncated at every use (sometimes indirectly), so it shouldn't need a bailout check.
It looks like there are two things in the way here. One is phi nodes. It's probably possible to propagate truncation through phi nodes, though I'll have to think about it. The other is resume points. replaceAllUsesWith is setting UseRemoved on the original instruction's operands, which makes range analysis be conservative about resume points. I think we can do something better there. I have a mock-up patch simulating both of these changes in my tree, and with this patch the unmodified benchmark goes fast, which is encouraging.
Updated•10 years ago
|
Updated•10 years ago
|
Assignee | ||
Comment 14•10 years ago
|
||
Here are the patches.
This first patch implements the truncation analysis for phis. In the SJCL code base, some of the important adds are used by phis that are then truncated. This patch allows the truncation to flow through phis to allow the adds to be truncated.
Attachment #8477921 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 15•10 years ago
|
||
This patch has some minor cleanups split out from the following patch.
Attachment #8477922 -
Flags: review?(nicolas.b.pierron)
Assignee | ||
Comment 16•10 years ago
|
||
This is the patch reducing the usage of the UseRemoved flag.
With these three patches applied, the attached SJCL benchmark goes fast.
While looking at this benchmark, I also noticed that there were still some floating-point operations in what otherwise should have been purely integer code in the hot loop even after this patch series. I filed bug 1057779 to track this.
Attachment #8477923 -
Flags: review?(nicolas.b.pierron)
Comment 17•10 years ago
|
||
Marking this as won't fix for 32 as it's too late to take a fix for a small perf regression (comment 7).
Assignee | ||
Comment 18•10 years ago
|
||
Comment 19•10 years ago
|
||
It's not a small regression (90%!) but we're already at the RC stage and the code hasn't even been in Nightly yet, so unfortunately this got reported a little too late to still be fixable for 32.
Comment 20•10 years ago
|
||
IIUC this regressed because of a correctness fix (bug 998580), and the improvements needed to get the desired performance legitimately (the patches here) are fairly involved. I doubt this would land on beta even if we were early in the cycle.
Comment 21•10 years ago
|
||
Comment on attachment 8477921 [details] [diff] [review]
truncate-phis.patch
Review of attachment 8477921 [details] [diff] [review]:
-----------------------------------------------------------------
Nice :)
::: js/src/jit/RangeAnalysis.cpp
@@ +2579,5 @@
> + if (kind == MDefinition::TruncateAfterBailouts)
> + op = MToInt32::New(alloc, truncated->getOperand(i));
> + else
> + op = MTruncateToInt32::New(alloc, truncated->getOperand(i));
> + if (truncated->isPhi()) {
style-nit: add a new line above.
Attachment #8477921 -
Flags: review?(nicolas.b.pierron) → review+
Comment 22•10 years ago
|
||
Comment on attachment 8477922 [details] [diff] [review]
gvn-more-misc.patch
Review of attachment 8477922 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/ValueNumbering.cpp
@@ +501,5 @@
> + mozilla::DebugOnly<bool> r = deleteDef(def);
> + MOZ_ASSERT(r, "deleteDef shouldn't have tried to add anything to the worklist, "
> + "so it shouldn't have failed");
> + MOZ_ASSERT(deadDefs_.empty(),
> + "deleteDef shouldn't have added anything to the worklist");
I think the following comment makes it easier to understand why these assertions hold:
deleteDef should not add anything to the deadDefs, as the redundant operation should have the same input operands.
Attachment #8477922 -
Flags: review?(nicolas.b.pierron) → review+
Comment 23•10 years ago
|
||
Comment on attachment 8477923 [details] [diff] [review]
gvn-dont-set-use-removed.patch
Review of attachment 8477923 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/MIR.cpp
@@ +467,5 @@
> void
> MDefinition::replaceAllUsesWith(MDefinition *dom)
> {
> + for (size_t i = 0, e = numOperands(); i < e; i++)
> + getOperand(i)->setUseRemovedUnchecked();
The flag setUseRemovedUnchecked is used to prevent optimizing code which might have been captured under a certain form, but due to the removal of one of the use cases, this form is no longer needed and could be optimized.
One of the example of such optimization is Range Analysis, where the removal of a path which does not expect a truncation can be removed, and a truncation can bubble up to an MDefinition which is captured by a ResumePoint.
The only case where we can do this kind of optimization is if the removed definition had no additional constraints compared to the remaining use cases.
::: js/src/jit/MIR.h
@@ +667,5 @@
> }
> void replaceAllUsesWith(MDefinition *dom);
>
> + // Like replaceAllUsesWith, but doesn't set UseRemoved on |this|'s operands.
> + void justReplaceAllUsesWith(MDefinition *dom);
"just" reflects the lack of additional constraints added by |this|. Maybe a better name could be something which express the fact that |this| is redundant or accept anything as operands.
Attachment #8477923 -
Flags: review?(nicolas.b.pierron) → review+
Assignee | ||
Comment 24•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/61f05ae95aa4
https://hg.mozilla.org/integration/mozilla-inbound/rev/afac7b1435bc
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cf223c85b3b
I left the name justReplaceAllUsesWith for now. It's actually not just about |this| being redundant or accepting anything as operands. In GVN's case, GVN is setting UseRemoved manually when it needs to, so it doesn't need or want replaceAllUsesWith to do it. I'm open to renaming it though, if we can think of a better name.
In the long term, I'd like to get replaceAllUsesWith out of the UseRemoved business altogether, since it doesn't have enough information to really get it right. For now, justReplaceAllUsesWith marks those places which are already ok with this.
Comment 25•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/61f05ae95aa4
https://hg.mozilla.org/mozilla-central/rev/afac7b1435bc
https://hg.mozilla.org/mozilla-central/rev/0cf223c85b3b
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Updated•10 years ago
|
Comment 27•10 years ago
|
||
I'm fairly sure the flags change was wrong here.
status-firefox35:
--- → fixed
Comment 28•10 years ago
|
||
Dan, 33 (beta) and 34 (aurora) are affected, if you don't think it is too risky, could you fill an uplift request? Thanks
Flags: needinfo?(sunfish)
Comment 29•10 years ago
|
||
Gian-Carlo, yes, I made a mistake, sorry :)
Assignee | ||
Comment 30•10 years ago
|
||
There was a bug in one of my patches here which is bug 1062612. The fix is simple, and I just checked it into mozilla-inbound. Once that's landed on mozilla-central, I'll nominate the patches for uplift.
Assignee | ||
Comment 31•10 years ago
|
||
Also waiting for resolution on bug 1063653 before proceeding.
Comment 32•10 years ago
|
||
Dan, seems ok now. Could you fill the uplift request this (Monday) morning to make sure the fix arrive in beta4?
thanks
Assignee | ||
Comment 33•10 years ago
|
||
Comment on attachment 8477921 [details] [diff] [review]
truncate-phis.patch
Approval Request Comment
[Feature/regressing bug #]:
bug 998580
[User impact if declined]:
90% slowdown on SJCL SHA-512
[Describe test coverage new/current, TBPL]:
TBPL on mozilla-central
[Risks and why]:
The patch itself is fairly straightforward, but though it did cause bug 1062612, which while trivial to fix, was surprising for having passed through testing.
[String/UUID change made/needed]:
None.
Attachment #8477921 -
Flags: approval-mozilla-beta?
Attachment #8477921 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 34•10 years ago
|
||
Comment on attachment 8477922 [details] [diff] [review]
gvn-more-misc.patch
Approval Request Comment
[Feature/regressing bug #]:
bug 998580
[User impact if declined]:
90% slowdown on SJCL SHA-512
[Describe test coverage new/current, TBPL]:
TBPL on mozilla-central
[Risks and why]:
This patch is just a cleanup in preparation for the next patch. It is low-risk.
[String/UUID change made/needed]:
None.
Attachment #8477922 -
Flags: approval-mozilla-beta?
Attachment #8477922 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 35•10 years ago
|
||
Comment on attachment 8477923 [details] [diff] [review]
gvn-dont-set-use-removed.patch
Approval Request Comment
[Feature/regressing bug #]:
bug 998580
[User impact if declined]:
90% slowdown on SJCL SHA-512
[Describe test coverage new/current, TBPL]:
TBPL on mozilla-central
[Risks and why]:
This patch interacts with some complex code, and did participate in bug 1063653. That said, the patch itself is fairly straightforward.
[String/UUID change made/needed]:
None.
Attachment #8477923 -
Flags: approval-mozilla-beta?
Attachment #8477923 -
Flags: approval-mozilla-aurora?
Flags: needinfo?(sunfish)
Updated•10 years ago
|
Attachment #8477921 -
Flags: approval-mozilla-beta?
Attachment #8477921 -
Flags: approval-mozilla-beta+
Attachment #8477921 -
Flags: approval-mozilla-aurora?
Attachment #8477921 -
Flags: approval-mozilla-aurora+
Updated•10 years ago
|
Attachment #8477922 -
Flags: approval-mozilla-beta?
Attachment #8477922 -
Flags: approval-mozilla-beta+
Attachment #8477922 -
Flags: approval-mozilla-aurora?
Attachment #8477922 -
Flags: approval-mozilla-aurora+
Updated•10 years ago
|
Attachment #8477923 -
Flags: approval-mozilla-beta?
Attachment #8477923 -
Flags: approval-mozilla-beta+
Attachment #8477923 -
Flags: approval-mozilla-aurora?
Attachment #8477923 -
Flags: approval-mozilla-aurora+
Comment 36•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/d7245149a156
https://hg.mozilla.org/releases/mozilla-aurora/rev/2387457e8c0b
https://hg.mozilla.org/releases/mozilla-aurora/rev/22ac2e65664b
https://hg.mozilla.org/releases/mozilla-beta/rev/94dc71a06159
https://hg.mozilla.org/releases/mozilla-beta/rev/c0d46e44a6cb
https://hg.mozilla.org/releases/mozilla-beta/rev/316374007734
Updated•10 years ago
|
Flags: qe-verify+
Comment 37•10 years ago
|
||
Testing performed on Windows 7 64-bit (Intel Core 2 Duo 2.93Ghz) by running the benchmark [1] on:
* Nightly 35.0a1 (2014-09-25) - sjcl: 847, ±4.97%, 62% slower
* Aurora 34.0a2 (2014-09-25) - sjcl: 903, ±5.96%, 59% slower
* Beta 33.0b7 - sjcl: 742, ±4.96%, 66% slower
I believe that the lower values are caused by my hardware/platform, as the benchmark displayed a higher ops/sec for an Ubuntu x86 machine with AMD FX(tm)-8320 Eight-Core Processor: e.g. Nightly 35 - 1,062, ±6.26%, 57% slower.
Marking this fix as verified. Dan, please let me know if you think otherwise regarding this data.
[1] http://jsperf.com/sha-512/3
Status: RESOLVED → VERIFIED
Comment 38•10 years ago
|
||
I am not sure we can consider that a going from 90 => 60 % slower can qualify this bug as closed.
Comment 39•10 years ago
|
||
(In reply to Sylvestre Ledru [:sylvestre] from comment #38)
> I am not sure we can consider that a going from 90 => 60 % slower can
> qualify this bug as closed.
I get numbers that warrant closing as fixed (all numbers with current Nightly versions):
On my 1st-gen 2.7Ghz 15" rMBP, I get an sjcl score of 2006 (with TweetNaCl.js at 4301).
On my Galaxy S4 I get an sjcl score of 161 (with TweetNaCl.js at 201).
These scores are (very roughly) on par with Chrome's on the same machines.
I think the reason for the numbers is that they're done on a different machine. Ideally, a comparison would be made on the same machine between current builds and old builds without the fixes.
You need to log in
before you can comment on or make changes to this bug.
Description
•