Bug 1092171 (Closed): Opened 10 years ago, Closed 9 years ago

4.53% Win7+8 V8 Regression on Inbound (v.36) Oct 30 from push 13ecff800114

Categories

(Core :: XBL, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Here is a graph showing the regression:
http://graphs.mozilla.org/graph.html#tests=%5B%5B230,131,25%5D%5D&sel=none&displayrange=7&datatype=running

I did some retriggers on tbpl:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&startdate=2014-10-30&enddate=2014-10-31&jobname=Windows%207%2032-bit%20mozilla-inbound%20talos%20dromaeojs

The retriggers point to this as the push causing the regression:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=13ecff800114

Here is some information about v8:
https://wiki.mozilla.org/Buildbot/Talos/Tests#V8.2C_version_7

I looked at other platforms and none of them shows a regression, so this is limited to Win7 (our most popular desktop platform).
This is quite odd, since V8 shouldn't be using DOM things at all, except maybe Window.
That said, the numbers are very clear, I agree.

The subtest results look like this on the 4 runs before the change (dropping everything after the decimal point):

Crypto: 17997, 17777, 18339, 17733
DeltaBlue: 25038, 25177, 25138, 25131
EarleyBoyer: 23450, 23591, 23526, 23593
NavierStokes: 22341, 22274, 22408, 22237
RayTrace: 40329, 40289, 40033, 39697
RegExp: 3479, 3444, 3469, 3461
Richards: 21869, 21491, 21714, 21774
Splay: 14815, 14595, 14285, 14912

And like this on the 4 runs after the change:

Crypto: 17797, 17557, 17496, 17759
DeltaBlue: 25111, 24569, 24913, 24820
EarleyBoyer: 22928, 22788, 21573, 22531
NavierStokes: 22215, 22341, 22363, 22363
RayTrace: 40215, 38958, 40067, 39919
RegExp: 3454, 3437, 3466, 3458
Richards: 21491, 21922, 21689, 21710
Splay: 10443, 9932, 10683, 10440

So most obviously, something happened with Splay here.
And note that http://arewefastyet.com/#machine=17&view=single&suite=octane&subtest=Splay does not show a regression either, as far as I can tell.  Though maybe octane-splay and v8-v7-splay are not the same thing...
Should we be updating our tests?
So I tried http://v8.googlecode.com/svn/data/benchmarks/v7/run.html on Mac with and without the changeset in question, and unsurprisingly I see no performance difference there.

I'm going to do some try pushes to see if I can reproduce there, and if so, I guess I get to do some debugging via try...
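For reference, quick arithmetic on the Splay numbers quoted above: the mean score drops from roughly 14652 to roughly 10375, which is about a 29% lower score (or about a 41% slowdown if expressed in time), consistent with the 30-40% figure discussed later in the thread. A minimal JavaScript check:

```js
// Quick arithmetic on the Splay subtest scores quoted above (higher is better).
const before = [14815, 14595, 14285, 14912];
const after = [10443, 9932, 10683, 10440];
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

const scoreDrop = 1 - mean(after) / mean(before);  // ~0.29, i.e. ~29% lower score
const slowdown = mean(before) / mean(after) - 1;   // ~0.41, i.e. ~41% more time
console.log(`score drop ~${(scoreDrop * 100).toFixed(1)}%, slowdown ~${(slowdown * 100).toFixed(1)}%`);
```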
We also have a 2.74% Win8 regression here:
http://graphs.mozilla.org/graph.html#tests=%5B%5B230,131,31%5D%5D&sel=1414604226202,1414777777506.348&displayrange=7&datatype=running
Summary: 4.53% Win7 V8 Regression on Inbound (v.36) Oct 30 from push 13ecff800114 → 4.53% Win7+8 V8 Regression on Inbound (v.36) Oct 30 from push 13ecff800114
> should be we updating our tests?

I have no idea.  What is "our tests" in this case?  ;)

Just to make sure, the regression here is on non-PGO boxes, so I don't have to worry about weird PGO effects, right?

In any case, I've done a bunch of try pushes to try to narrow this down and here's what I see: 

1)  The regression is definitely in Splay only.
2)  The regression is due to changing what the parent (in the JS engine sense) is for
    nsXULElement instances from their parent element to the global (!).  Specifically,
    compare https://tbpl.mozilla.org/?tree=Try&rev=6d9ab47bdf0d and
    https://tbpl.mozilla.org/?tree=Try&rev=0a1a122280de

There shouldn't even be any nsXULElement instances in splay, except _possibly_ for viewport scrollbars.  So my best guess is that changing the parenting here changes the GC timing for the browser window itself somehow and this affects the performance on splay (which iirc is largely an object allocation benchmark).

Terrence, is there something GC-related I could try logging on my try pushes to verify this hypothesis?
Flags: needinfo?(terrence)
Yes, small changes to the GC timing can dramatically change the splay score. In particular, we've had problems in the past with a GC that normally happens in the harness moving forward a bit and happening inside splay's timed region. Steve has the most recent experience here, looking at splay numbers on Flame. Those devices are slow enough that we only do a handful of GCs, so one extra is an enormous relative slowdown. Desktop, on the other hand, should be fast enough that the difference would be relatively small -- like about 5%. Steve might also have some ideas for how to debug this on try -- I certainly don't.
Flags: needinfo?(terrence) → needinfo?(sphink)
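To make the mechanism described above concrete, here is a minimal sketch (not the actual V8 v7 harness code; the pause and work durations are made-up numbers) of why a major GC sliding from the harness into Splay's timed region shows up as a large score drop: V8-style scores are inversely proportional to measured time, so the relative pause length maps directly onto the relative score loss.

```js
// Rough sketch only: illustrates how one extra major GC pause inside the timed
// region depresses a Splay-style score. All durations below are made-up numbers.

function timedRegion(workMs, extraGcPauseMs) {
  // Each iteration does `workMs` of allocation-heavy work; if the engine
  // schedules a major GC inside the region, the measured time grows by the
  // pause even though the "real" work is unchanged.
  return workMs + extraGcPauseMs;
}

const baseline = timedRegion(100, 0);   // GC happened in the harness, outside the timer
const regressed = timedRegion(100, 35); // GC moved forward into the timed region

// Scores are inversely proportional to measured time, so the score drop
// mirrors the relative pause length.
const scoreDrop = 1 - baseline / regressed;
console.log(`score drop ~${(scoreDrop * 100).toFixed(1)}%`); // ~25.9% with these made-up numbers
```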
Terrence, am I right to understand that if GC timing changes and as a result the splay numbers change, then it's not a real regression, but rather incidental?

If that's the case, and you feel that the modified GC timing is not a bad thing, then I don't think we should go out of our way to get the Splay numbers back to their previous values. At least for this change, the change in numbers doesn't represent a user-visible change; it just happens that Splay's use case triggers more GCs than it used to, while the overall behavior (i.e. for more cases than just Splay's) is supposedly better, right?
Perhaps we could change the Talos test harness a bit to ensure that all GCs resulting from the warm-up run have happened before the timed run starts, e.g. by inserting a gcslice() call before starting the timed run (it would probably make sense to move Talos over to Octane v2 at the same time).
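A rough sketch of that harness change, in JavaScript. The hooks and the benchmark object shown here (gcslice(), SpecialPowers.forceGC(), benchmark.warmUp()/run()) are illustrative assumptions; which of them, if any, a Talos page can actually call depends on its privileges.

```js
// Sketch of the idea above: drain warm-up GC debt before the timed run starts.
// gcslice()/gc() exist as shell/testing functions, and a privileged harness
// could use a forceGC-style hook instead; treat the names as illustrative.

function flushPendingGC() {
  if (typeof gcslice === "function") {
    // Finish any in-progress incremental GC rather than letting its slices
    // land inside the timed region.
    gcslice();
  } else if (typeof SpecialPowers !== "undefined" && SpecialPowers.forceGC) {
    SpecialPowers.forceGC();
  }
  // If neither hook is available, we can only hope the engine schedules its
  // GC during the untimed warm-up.
}

function runBenchmark(benchmark) {      // `benchmark` is a hypothetical object
  benchmark.warmUp();                   // allocations here build up GC debt
  flushPendingGC();                     // pay the debt outside the timed region
  const start = performance.now();
  benchmark.run();                      // timed run, ideally with no leftover GC work
  return performance.now() - start;
}
```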
> should be fast enough that the difference would be relatively small -- like about 5%.

We have a 30-40% Splay regression, so whatever our hypothesis is should be able to account for that.

The discussion in comments 10 and 11 seems premature; let's figure out whether GC is in fact the problem before we start worrying about how to solve it.  ;)
(In reply to Boris Zbarsky [:bz] from comment #12)
> > should be fast enough that the difference would be relatively small -- like about 5%.
> 
> We have a 30-40% Splay regression, so whatever our hypothesis is should be
> able to account for that.

Oh, I thought the 4.53% was for splay alone. If it's 30-40% on Windows, GC timing is not going to be an adequate explanation. I guess the first question is if GC is even culpable. Ideally we could get MOZ_GCTIMER logs off talos, but I don't know how feasible that is.
> I guess the first question is if GC is even culpable. 

That was precisely my question in comment 8, yes.  What can I log on try to test this hypothesis?
I talked to sfink.

His approach involves adding shell functions and changing tests to call them; neither is really viable here, since this regression is browser-only and is in tests I can't edit.

I'll see if I can come up with some logging based on his patch, though.
I gave bz a patch at http://people.mozilla.org/~sfink/data/upload-stats that hopefully will make talos upload the MOZ_GCTIMER log. But this strikes me as something generally useful -- seems like talos ought to be logging major GCs to stderr by default, for cases like this. Possibly in a way compatible with the fancy new structured logging stuff, since we can already output as JSON.

The one thing that's going to be difficult is figuring out where splay is in the output. I guess if talos already logs the individual test starts/ends, perhaps this would be possible with interleaved logs. So bz, an alternative to the above patch is to instead hardcode char *env = "stderr". That will spew very basic summary timings to stderr, mixed in with the regular output.
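A possible post-processing sketch for the interleaved-log approach, in JavaScript (Node). The TEST-START/TEST-END markers and the GC summary-line format used below are hypothetical stand-ins; they would need to be adjusted to whatever the Talos log and the MOZ_GCTIMER/stderr output actually emit.

```js
// Sketch only: attribute GC log lines in an interleaved Talos log to the test
// that was running when they were emitted. The marker and GC line formats
// matched below are hypothetical.
const fs = require("fs");

const lines = fs.readFileSync(process.argv[2], "utf8").split("\n");
let currentTest = null;
const gcByTest = new Map();

for (const line of lines) {
  const start = line.match(/TEST-START \| (\S+)/);  // hypothetical test-start marker
  const end = line.match(/TEST-END \| (\S+)/);      // hypothetical test-end marker
  if (start) currentTest = start[1];
  else if (end) currentTest = null;
  else if (/^GC\(/.test(line) && currentTest) {     // hypothetical GC summary line
    if (!gcByTest.has(currentTest)) gcByTest.set(currentTest, []);
    gcByTest.get(currentTest).push(line);
  }
}

for (const [test, gcs] of gcByTest) {
  console.log(`${test}: ${gcs.length} major GC(s)`);
}
```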
Unfortunately, with the logging patch the Talos run fails.  :(  See https://tbpl.mozilla.org/?tree=Try&rev=b41ea8fcaceb
Looking at that Talos run, the browser terminates after the second time it loads http://localhost/page_load_test/dromaeo/cssquery-dojo.html. For reference, this is the first page it loads in the test, so maybe this patch is causing the browser to terminate?
What's interesting to me is that in a run without that patch, I only see cssquery-dojo.html loaded once, unless I'm missing something...
Ah, good point, that is true: what we see here is the stdout from the browser, and then when it fails we dump everything we have again (probably a bug in Talos, although on some platforms like Android this is a benefit). Is there a chance that the browser could be crashing or hanging somehow?
So here's a question.  Is this reproducible on win7/8 outside our test infra?
I am not sure; that is a great question. I don't have access to a hardware Win7 machine, but I could do this on a VM.

Avi, do you have a win7/8 machine?
Flags: needinfo?(avihpit)
I do, though my own builds crash randomly for some reason (unrelated to this bug); Nightly builds don't crash.

So if there is some procedure you want me to run/test on Windows 8.1 with Try builds, let me know what it is.
Flags: needinfo?(avihpit)
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(sphink)
Resolution: --- → WONTFIX