Bug 1283302 - Determine correct nglayout.initialpaint.delay for modern platforms
Opened 8 years ago • Closed 8 years ago
Categories: Core :: Layout, defect
Status: RESOLVED FIXED
Target Milestone: mozilla53
People: Reporter: bugs; Assigned: neerja
References: Depends on 2 open bugs; Blocks 1 open bug
Attachments: 2 files, 1 obsolete file
patch (deleted) [tnikkel: review+]
patch (deleted)
nglayout.initialpaint.delay is currently set to 250ms for all platforms. That preference is how long we'll wait for a page to finish its initial layout before painting to the screen. By setting it to a lower number, the user can potentially see content sooner, though that content may change again (causing FOUT or other visible layout shifts).
I don't know how we got to 250ms as a default, or if that number even makes sense anymore. I also don't know if this should remain a constant or dynamically computed based on system capabilities.
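For context, here is a minimal sketch of the mechanism this pref controls, assuming the rough shape of the pres-shell code (member, callback, and constant names here are illustrative, not the exact Gecko source):

// Sketch only: the pres shell suppresses painting when a document starts
// loading and arms a one-shot timer with the pref value (falling back to the
// built-in PAINTLOCK_EVENT_DELAY constant). When the timer fires, painting is
// unsuppressed even if the initial layout hasn't finished.
#define PAINTLOCK_EVENT_DELAY 250

void PresShell::Initialize()
{
  mPaintingSuppressed = true;
  int32_t delayMs = Preferences::GetInt("nglayout.initialpaint.delay",
                                        PAINTLOCK_EVENT_DELAY);
  mPaintSuppressionTimer->InitWithFuncCallback(sPaintSuppressionCallback, this,
                                               delayMs, nsITimer::TYPE_ONE_SHOT);
}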
Reporter
Comment 1•8 years ago
Attachment #8767332 -
Flags: review?(matt.woodrow)
Comment 2•8 years ago
Since 5ms is substantially smaller than a refresh tick, we might want to follow up by getting rid of the entire delay mechanism.
Comment 3•8 years ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Since 5ms is substantially smaller than a refresh tick, we might want to
> follow up by getting rid of the entire delay mechanism.
Yeah I agree. Once we've had the pref change on release for a bit without fallout, then we should rip the remaining code out.
Updated•8 years ago
Attachment #8767332 -
Flags: review?(matt.woodrow) → review+
Reporter
Comment 4•8 years ago
Try results coming up:
remote: View your changes here:
remote: https://hg.mozilla.org/try/rev/9c41428cc88b
remote: https://hg.mozilla.org/try/rev/ff5c0c16a40e
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff5c0c16a40e
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ff5c0c16a40e
Pushed by jvillegas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/24ec7aba60e7
Sets the default value for nglayout.initialpaint.delay to 5ms (was 250 ms) per user research conclusions. r=mattwoodrow
Reporter
Updated•8 years ago
Assignee: nobody → bugs
Comment 6•8 years ago
Some hg blame history:
2001-04-25 - bug 77002 originally defined PAINTLOCK_EVENT_DELAY as 1200 ms.
2001-05-01 - bug 76495 changed it to 1000 ms.
2001-05-31 - bug 78695 changed it back to 1200 ms.
2002-12-26 - bug 180241 changed it to 250 ms "to match Phoenix and Chimera".
2016-07-06 - bug 1283302 (this bug) changed it to 5 ms.
Comment 7•8 years ago
Sorry, had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=31325455&repo=mozilla-inbound
Flags: needinfo?(bugs)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1052561d1faf
Backed out changeset 24ec7aba60e7 for testfailures in 518172-1b.html
Comment 9•8 years ago
This caused a 4% tp5o regression on Linux64, and it reverted when this was backed out:
https://treeherder.mozilla.org/perf.html#/alerts?id=1723
Please keep Talos in mind when looking to reland this!
Reporter
Comment 10•8 years ago
(In reply to Pulsebot from comment #8)
> Backout by cbook@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/1052561d1faf
> Backed out changeset 24ec7aba60e7 for testfailures in 518172-1b.html
Thanks, filed bug 1285317 for that.
Flags: needinfo?(bugs)
Comment 11•8 years ago
In addition to the tp5o regression, there is a 13% glterrain regression on Win7 with this patch.
Comment 12•8 years ago
(In reply to Joel Maher ( :jmaher - PTO - back on Friday July 15 ) from comment #9)
> this caused a 4% tp5o regression on linux64 and it reverted when this was
> backed out:
> https://treeherder.mozilla.org/perf.html#/alerts?id=1723
I'm somewhat suspicious of the glterrain regression (Windows) and the tp5o_scroll and tscrollx regressions (Linux64). The rest (except probably the RSS one) seem expected.
Comment 13•8 years ago
The backout shows a 10%+ win for glterrain on Win7:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=cc866385dd01&newProject=mozilla-inbound&newRevision=1052561d1faf&framework=1
This follows the pattern on the graph: the regression aligns with the commit, and the improvement comes with the backout. Possibly we can push to try with the --spsProfile flag to learn more about what is going on?
Comment 14•8 years ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #12)
> I'm somewhat suspicious of the glterrain regression (Windows) and the
> tp5o_scroll and tscrollx regressions (Linux64). The rest (except probably
> the RSS one) seem expected.
Pinging this bug: I am wondering how we can move forward here. What would make you comfortable with fixing this and setting the value to 5ms? Reading dbaron's comment, it seems that the regressions are mostly expected.
Comment 15•8 years ago
see my comment above. What do we need to move forward here?
Flags: needinfo?(bugs)
Reporter
Comment 16•8 years ago
(In reply to Dominik Strohmeier from comment #15)
> see my comment above. What do we need to move forward here?
This change will come with an inevitable increase in measured page load time but may come with a decrease in *perceived* time to initial rendering. This was true the last time we changed this value, see:
https://bugzilla.mozilla.org/show_bug.cgi?id=180241#c17
I recommend we try it early in a cycle, see how it looks/feels, and gather feedback. We'll still want to figure out what's up with the GLTerrain test in comment 13 before we do that.
Flags: needinfo?(bugs)
Comment 17•8 years ago
> I recommend we try it early in a cycle, see how it looks/feels, and gather feedback.
Just to clarify the next steps: Given the SHIELD study confirmed improved perceived performance, what other feedback are we looking for that is blocking this?
Flags: needinfo?(bugs)
Reporter
Comment 18•8 years ago
(In reply to Harald Kirschner :digitarald from comment #17)
> > I recommend we try it early in a cycle, see how it looks/feels, and gather feedback.
>
> Just to clarify the next steps: Given the SHIELD study confirmed improved
> perceived performance, what other feedback are we looking for that is
> blocking this?
We're looking for confirmation that the SHIELD study sample set actually represents the larger population with more diverse configurations. We also need the Platform Operations team to approve the measured load-time regression.
Flags: needinfo?(bugs)
Comment 19•8 years ago
ni? jgaunt. Could we check Jet's concern on the study's sample being representative?
Flags: needinfo?(jgaunt)
Reporter
Comment 20•8 years ago
(In reply to Joel Maher ( :jmaher) from comment #13)
> the backout shows a 10%+ win for glterrain for win7:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> inbound&originalRevision=cc866385dd01&newProject=mozilla-
> inbound&newRevision=1052561d1faf&framework=1
I believe you :) ...but I can't reproduce the regression on Win7 with or without this change.
> this follows the pattern on the graph of the regression aligning with the
> commit and the improvement coming with the backout. Possibly we can push to
> try with a --spsProfile flag to learn more of what is going on?
+cc: jgilbert Jeff: this is the bug I mentioned last time we met. You're probably in a better spot to run this Try build and comment. Thx!
Flags: needinfo?(jgilbert)
Comment 21•8 years ago
Jason, can we somehow use Project Presto to see if this regresses Speed Index?
Flags: needinfo?(jduell.mcbugs)
Comment 22•8 years ago
Sadly Presto has been using
https://www.webpagetest.org/
for our tests, and it uses standard builds from mozilla.org, so there's no easy way to get it to use a custom build.
I do think that the webpagetest SpeedIndex is a good metric for page load responsiveness.
I'm asking Valentin (Presto coder) and Nick Hurley (who had a hand-rolled webpagetest setup at one point: maybe he has a recipe for standing one up) if they have ideas here.
Flags: needinfo?(jduell.mcbugs) → needinfo?(valentin.gosu)
Updated•8 years ago
Flags: needinfo?(hurley)
Comment 23•8 years ago
Valentin--how often does webpagetest.org update its nightly build? Could we throw the patch from this bug onto nightly for a few days till it shows up there, and then use your scripts to compare with nightlies from a few days before (w/o patch)?
Comment 24•8 years ago
Note: Valentin is on PTO so it may be a week or so before he responds.
Comment 25•8 years ago
(In reply to Jason Duell [:jduell] (needinfo me) from comment #22)
> I'm asking Valentin (Presto coder) and Nick Hurley (who had a hand-rolled
> webpagetest setup at one point: maybe he has a recipe for standing one up)
> if they have ideas here.
I didn't hand-roll a wpt setup, I hand-rolled a not-quite-wpt-like-thing-that-doesn't-measure-speedindex. Bob Clary was the one who set up wpt for us; redirecting to him.
Comment 26•8 years ago
> it uses standard builds from mozilla.org, so there's no easy way to get it to use a custom build.
We can also reach out to speedcurve to set up one of their instances with a custom build and run benchmarks.
For reference, the BYO-Server solution is described in more detail: https://www.instartlogic.com/blog/configuring-private-instance-webpagetest-browser-extensions
Comment 27•8 years ago
wpt is using the directory
http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/
to get builds, which does appear to be updated regularly. I had thought TaskCluster killed updating the latest dirs, but perhaps they changed their minds.
https://github.com/WPO-Foundation/webpagetest/blob/master/www/installers/browsers/updateFirefoxNightly.php#L8 pulls the 32bit installer and places it on
https://www.webpagetest.org/installers/browsers/
https://www.webpagetest.org/installers/browsers/nightly.dat
http://www.webpagetest.org/installers/browsers/nightly-51-1471627440.exe is current.
Doing an experiment on nightly should work.
Flags: needinfo?(bob)
Comment 28•8 years ago
> Doing an experiment on nightly should work.
Testing this on Nightly as a second way of validating the patch in addition to the results from user research sounds like a great path forward.
I suggest we use the Alexa TOP100-curated list for testing to cover a wide range of popular websites [1]. I think it will also be important to run webpagetest for slower connections.
I found that webpagetest also allows using credentials to log in to pages like FB or Pinterest through scripting, for more valid results beyond landing pages [2][3].
[1] https://github.com/WPO-Foundation/webpagetest/blob/master/bulktest/alexa-curated.txt
[2] https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/scripting
[3] https://www.webpagetest.org/forums/archive/index.php?thread-11504.html
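For reference, a WebPageTest login script is a short command list along these lines (commands and parameters are tab-separated in real scripts; this is a sketch only, and the URLs, field names, and credentials below are placeholders, not taken from this bug or any real site):

logData      0
navigate     https://example.com/login
setValue     name=username    testuser
setValue     name=password    hunter2
submitForm   name=loginform
logData      1
navigate     https://example.com/home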
(answering for jgaunt)
Hello! I designed and implemented the Shield study.
To answer representativeness:
0. All the concerns are valid.
1. It was a self-selected study of EN-* users from Release. These users self-selected to participate, not knowing what performance enhancements they would receive. It is entirely possible that these users had existing bad performance and that lured them into the study.
2. At the time we had no good access to merge these packets with Telemetry.
3. I would be delighted to have some code to compare a bunch of profiles to decide what is representative enough.
4. A decision here is whether this sample is more representative than a NIGHTLY sample.
I personally doubt that users in Nightly are more representative of the full range of user configs. I especially doubt that their modal / median / centroid is the same. But I have no good evidence to back that up.
The evidence we do have:
1. Surveys in the population reported better perceived performance than the other settings.
2. That effect followed a dose-response curve, and got worse when the setting was worse.
3. Setting the timeout to worse than current (1000ms) was the worst group.
4. Participating users uninstalled the add-on in that same pattern.
I want to put a hard question forward: Suppose Nightly shows a trend in the other direction? Or no effect? Is Nightly the product we are actually building? Or Release?
If we want to move to a standard of evidence that is conclusive and acceptable, I would gladly hear thoughts on what that is. Until then, we have the evidence we have.
Flags: needinfo?(jgaunt)
Comment 30•8 years ago
Hey Gregg,
Thanks for these details. I briefly want to chime in here to clarify a misunderstanding that I read in your comment. The discussion about doing tests in Nightly is aimed more at the engineering side of things. We do see some regressions with nglayout.initialpaint.delay set to 5ms, and there's been a question about the validity of those regressions in the wild.
So the idea would be to run webpagetest.org against a Nightly version that has the pref set to 5ms and compare to results for the old value of 250ms. That has nothing to do with the results from the SHIELD research.
Second, I agree with what you described regarding the validity and reliability of the SHIELD results. Thanks for clarifying. Does that make sense?
Flags: needinfo?(glind)
Comment 31•8 years ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Since 5ms is substantially smaller than a refresh tick, we might want to
> follow up by getting rid of the entire delay mechanism.
Please do not. There are negative side effects to too low a setting. I personally keep the delay at 56ms after a bunch of testing. This is usually enough to prevent incorrect rendering of pages. Images usually take around 50ms to start loading, and having the whole page move is annoying.
Similarly, I would definitely suggest testing among people who are not looking to prioritize speed. They may also like the smoother operation better.
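(For anyone who wants to experiment locally before this lands, the value can be overridden per profile via about:config, or with a user.js line like the one below; the 56 here is just the value this commenter mentions, not a recommendation.)

// user.js in the profile directory; read at startup and overrides the built-in default
user_pref("nglayout.initialpaint.delay", 56);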
Comment 32•8 years ago
> We're looking for confirmation that the SHIELD study sample set actually represents the larger population with more diverse configurations.
We got that from Gregg at a high level. The assumption is that SHIELD does random sampling on Release and therefore is diverse enough.
> We also need Platform Operations team to approve the measured Load time regression.
Looks like we had some general agreement in the Friday meeting that the regression isn't a major concern.
Flags: needinfo?(bugs)
Comment 33•8 years ago
A delay of 50ms works for me, 5ms makes the browser nearly unusable.
Comment 34•8 years ago
(In reply to Spencer Selander [greenknight] from comment #33)
> A delay of 50ms works for me, 5ms makes the browser nearly unusable.
I have had the pref at 5ms for nearly 6 weeks now and haven't had any issues. I am curious what "nearly unusable" means for you.
Flags: needinfo?(spencerselander)
Comment 35•8 years ago
(In reply to Dominik Strohmeier from comment #34)
> (In reply to Spencer Selander [greenknight] from comment #33)
> > A delay of 50ms works for me, 5ms makes the browser nearly unusable.
>
> I have had the pref at 5ms for nearly 6 weeks now and haven't had any
> issues. I am curious what "nearly unusable" means for you.
It had been a while since I tried it, so I just gave it another shot - no problem now with 5ms. None of the reflows and glitches I got before, even image-heavy pages render very smoothly. I'm leaving it set that way for now.
Flags: needinfo?(spencerselander)
Reporter
Comment 36•8 years ago
Modified the patch to apply only to desktop platforms. Android will remain at 250 ms.
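Roughly, the desktop-only default looks like the following sketch of a preprocessed prefs-file entry (assuming the default lives in modules/libpref/init/all.js behind an Android guard; the actual patch may express this differently):

// Sketch: platform-conditional default for the paint-suppression delay.
#ifdef MOZ_WIDGET_ANDROID
pref("nglayout.initialpaint.delay", 250); // Android keeps the old delay
#else
pref("nglayout.initialpaint.delay", 5);   // desktop platforms paint much sooner
#endif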
Attachment #8767332 -
Attachment is obsolete: true
Flags: needinfo?(bugs)
Attachment #8784602 -
Flags: review?(tnikkel)
Updated•8 years ago
Attachment #8784602 -
Flags: review?(tnikkel) → review+
Reporter
Comment 37•8 years ago
After conferring with the Platform and Ops teams, we've decided to enable this on mozilla-central for all desktop platforms. For more detailed information on the User Research backing this change, please read this doc:
https://docs.google.com/document/d/1BvCoZzk2_rNZx3u9ESPoFjSADRI0zIPeJRXFLwWXx_4/edit#heading=h.28ki6m8dg30z
Reporter
Updated•8 years ago
Attachment #8784602 -
Attachment description: Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. → Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Comment 38•8 years ago
Pushed by jvillegas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e68b376d1191
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Comment 39•8 years ago
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/ccf3d0b90b0422b169c60d429029cdfdc0e5a5a0 for making a bunch of existing intermittent failures nearly permaorange.
Going to be hard to point out without just a ton of copy-paste, but see in particular the Linux64 reftest-4 and reftest-7 chunks in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&tochange=ccf3d0b90b0422b169c60d429029cdfdc0e5a5a0&fromchange=e59fbcb1da07bb3e330c8c35759e58404981ffa6&filter-searchStr=reftest&group_state=expanded
Comment 40•8 years ago
And some of the same reftests on Win7 (it'll be hours yet before Win8 or WinXP actually run).
Comment 41•8 years ago
Hi Jet,
what's the status here? Any chance that this will still ship in August?
Thanks,
Dominik
Flags: needinfo?(bugs)
Reporter
Comment 42•8 years ago
(In reply to Dominik Strohmeier from comment #41)
> Hi Jet,
> what'S the status here? Any chance that this will still ship in August?
>
> Thanks,
> Dominik
mstange & I are looking through the failing tests. Note that we didn't plan to uplift this change (i.e., it was never going to ship in August).
Flags: needinfo?(bugs)
Comment 43•8 years ago
(In reply to Bob Clary [:bc:] from comment #27)
> Doing an experiment on nightly should work.
I'll do a run in the next couple of days so we get a baseline, then we can compare with the build after we land on nightly.
Flags: needinfo?(valentin.gosu)
Comment 44•8 years ago
Jet, what's the status with this bug?
Bug 1297996 landed and got uplifted to Beta, and we're planning to run some tests on page-to-page transitions. In our pretests, we've seen that there are some more or less pronounced effects on page-to-page transitions for different initial delay settings.
It would be great if this would land and ride the trains. Thanks.
Flags: needinfo?(bugs)
Reporter
Comment 45•8 years ago
(In reply to Dominik Strohmeier from comment #44)
> Jet, what's the status with this bug?
Markus is investigating the test failures this week.
> Bug 1297996 landed and got uplifted to Beta and we're planning to run some
> tests on page2page transitions. In our pretests, we've seen that there are
> some more or less announced effects for the page2page transitions for
> different initial delay settings.
I had a look at the patches in bug 1297996 and the checks for suppressed painting will certainly be affected by landing this one. This bug affects all platforms, whereas your experiments in bug 1297996 are non-e10s-only. I wouldn't assume your results will hold true for e10s.
> It would be great if this would land and ride the trains. Thanks.
Agreed :) though we can't ship with the tests in this current state. I would note that the failing tests are perma-failures on my Linux VM with or without this fix, so that should help debugging at least.
Flags: needinfo?(bugs)
Comment 46•8 years ago
In bug 1311448, we will run a Telemetry Experiment on different page-to-page transitions. In this context, we will also have a test cohort that has nglayout.initialpaint.delay set to 0, and one that combines a transition over blank with an initialpaint.delay of 0, since paint delays and the experience of different page-to-page transitions are very much linked.
What's the state of landing this? Any progress on Markus's investigations?
Does it make sense to wait for some more Telemetry data from the Telemetry Experiment and then aim to get rid of the delay mechanism completely?
Flags: needinfo?(bugs)
Reporter
Comment 47•8 years ago
(In reply to Dominik Strohmeier from comment #46)
> What's the state of landing this? Any progress on Markus's investigations?
Markus diagnosed the test failures. Most of the fixes will add additional fuzz tolerance to the tests to account for paint invalidation that occurs prior to the reftest screen captures.
Neerja will land the test fixes.
> Does it make sense to wait for some more Telemetry data from the Telemetry Experiment and then target to get rid of the delay mechanisms maybe completely?
I think we'll see the same test failures at 0 delay. It would be good to see what the Telemetry Experiment results say about stability, since we do intend to remove the delay mechanism entirely (see comment 2.)
Flags: needinfo?(bugs) → needinfo?(npancholi)
Comment 48•8 years ago
(In reply to Jet Villegas (:jet) from comment #47)
> I think we'll see the same test failures at 0 delay. It would be good to see
> what the Telemetry Experiment results say about stability, since we do
> intend to remove the delay mechanism entirely (see comment 2.)
Comment 2 has one person suggesting that. I have specifically asked that the preference not be removed, and provided my reasoning: I personally prefer the smoother experience of no flashing on simple sites like xkcd.com, where the image is the primary content and thus there is no need to display anything until the image is ready.
What reason is there to remove the mechanism entirely instead of leaving it there for those of us who might want to change it? It does not seem that it would be possible for an add-on to override this change (as there is no delayPaint property in JavaScript or CSS).
I would understand it if you were overhauling the entire painting system, and thus you'd need to reimplement the delay mechanism. Then, if few users are using it, it makes sense to remove it. But, otherwise, why not leave it?
Assignee
Updated•8 years ago
Assignee: bugs → npancholi
Flags: needinfo?(npancholi)
Assignee
Comment 49•8 years ago
Added fuzzy annotations for the intermittent failures corresponding to bugs 1293550, 1260629, 1313772, 1313773, 1294278, and 1295466, and sent a review request to mstange for each of them.
(These bugs were also added to the 'Depends on' list.)
Started new try run with patch for this bug + patches for all bugs listed above:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d70c7ae79678fb56c1ea09e9d640e719f5c8c22
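For reference, the fuzzy annotations mentioned above live in the reftest manifests and look roughly like this (a sketch: the test names and pixel tolerances are illustrative, not the values actually landed):

# reftest.list: fuzzy(maxDifference, numDifferingPixels) tolerates small
# rendering differences; fuzzy-if() restricts the tolerance to one platform.
fuzzy(2,40) == invalidation-test.html invalidation-test-ref.html
fuzzy-if(winWidget,1,120) == other-test.html other-test-ref.html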
Assignee
Comment 50•8 years ago
The failure for test invalidate-1.html on Windows needed to be looked into more closely:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20b632bf5f252b8e21789a5e74e4f2c51a07273c&selectedJob=28813830&exclusion_profile=false
I triggered a Try run with "reftest-wait" and a small setTimeout added to this test; you can see that this does fix the issue (the test doesn't fail anymore in the try run below).
That means we do *eventually* paint the correct thing, but the default reftest snapshot is just happening too quickly.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1aa1e5359ecb84245c545bc9758a58474771fa3c
So the explanation is probably that we've got a bug in one of the following two places:
1. Since delaying the snapshot taken by the test harness by 100ms does make the test pass, there might be a bug in the paint stabilization in the test harness where it doesn’t correctly wait for the second div in the testcase to be painted.
2. Or the paint stabilization is correct, but the -moz-element invalidation is too delayed for it to catch, for some reason. E.g. -moz-element references one frame too late: we paint the first div, then we discover that we need to invalidate the other div, and we trigger another paint which then paints the other div. So maybe we must modify the code for -moz-element so that the invalidation happens in the first paint itself?
mattwoodrow, do you have any thoughts on what the explanation might be? (since you've worked on -moz-element as well as paint stabilization)
(Thanks very much mstange and dholbert for your help with this!)
Flags: needinfo?(matt.woodrow)
Comment 51•8 years ago
(Slight tangent: IMO, we should feel free to paper over the invalidate-1.html failure using a small setTimeout (since that does fix it), in the short-term interests of getting this bug landed. And then we can investigate comment 50 more closely in bug 1178202 or a new bug. A small setTimeout would be basically equivalent to the paint delay that we're already giving this test in current trunk, so in practice it wouldn't be reducing the sensitivity of that test.)
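A sketch of that paper-over approach, using the standard reftest-wait convention (the 100ms value and the markup are illustrative; the real test's content would be unchanged):

<!-- The harness waits while the root element has class="reftest-wait" and only
     snapshots after the class is removed, here via a short timeout. -->
<html class="reftest-wait">
<body onload="setTimeout(finish, 100)">
  ... existing test content ...
  <script>
    function finish() {
      document.documentElement.removeAttribute("class");
    }
  </script>
</body>
</html>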
Comment 52•8 years ago
I did some digging, and I strongly suspect that this is bug 1209751 coming back to bite us.
My guess is that when we go to take the reftest snapshot of the page, the background image hasn't loaded yet.
Reftest snapshots force synchronous image decoding (by not passing DRAWWINDOW_ASYNC_DECODE_IMAGES to drawWindow), so we block (during painting) when we try to draw the first div until the image has loaded and then draw it correctly.
We then try painting the div with -moz-element (which didn't have the intrinsic image size available during layout, as the image was only ready during painting) and then that fails.
Seth's comment suggests that this should only be a problem if the element's size depends on the image's intrinsic size, which shouldn't be the case here (we specify a size ourselves). There could easily be something else depending on the intrinsic size, though, which causes painting to fail.
It looks like this is reproducible, so shouldn't be too hard to confirm.
I'm not sure what the best fix is here. I think we need to detect that the image size has changed and schedule another paint so it can be drawn with the correct layout. The reftest harness should detect this scheduled paint and wait for it.
Flags: needinfo?(matt.woodrow)
Assignee
Updated•8 years ago
Assignee
Comment 53•8 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> I did some digging, and I strongly suspect that this is bug 1209751 coming
> back to bite us.
>
> My guess is that when we go to take the reftest snapshot of the page, the
> background image hasn't loaded yet.
>
> Reftests snapshots force synchronous image decoding (by not passing
> DRAWWINDOW_ASYNC_DECODE_IMAGES to drawWindow), so we block (during painting)
> when we try to draw the first div until the image has loaded and then draw
> it correctly.
>
> We then try painting the div with -moz-element (which didn't have the
> instrinsic image size available during layout, as the image was only ready
> during painting) and then that fails.
>
> Seth's comment suggests that this should only be a problem if the element's
> size depends on the image's intrinsic size, which shouldn't be the case here
> (we specify a size ourselves). There could easily be something else
> depending on the instrinsic size though which causes painting to fail.
>
> It looks like this is reproducible, so shouldn't be too hard to confirm.
>
> I'm not sure what the best fix is here. I think we need to detect that the
> image size has changed and schedule another paint so it can be draw with the
> correct layout. The reftest harness should detect this scheduled paint and
> wait for it.
Thanks for looking into this Matt!
Assignee
Comment 54•8 years ago
List of all bugs that had to be modified in some way to prevent the paint delay change from causing a test failure ->
(Note that some of these bugs will remain open since we've simply papered over the failure with a "fuzzy" annotation -- that's why they're not marked as blocking in this bug)
Bug 1293550, Bug 1260629, Bug 1313772, Bug 1313773, Bug 1294278, Bug 1295466, Bug 1178202, Bug 1315834, Bug 1313253, Bug 1312951, Bug 1315846
Clean try run ->
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0268d8e0c8def2152d1b9adb390c7687e8b4608c
Assignee
Updated•8 years ago
Flags: needinfo?(bugs)
Comment 55•8 years ago
Just spoke with jet -- I'm going to land this on inbound tomorrow, after the test-tweaks (on autoland) have been merged around and have made it to inbound. --> transferring jet's needinfo to myself
(I think bug 1316430 is the last test-tweak that needed landing, so once that's on inbound, we should be OK to land.)
Flags: needinfo?(bugs) → needinfo?(dholbert)
Comment 56•8 years ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff60dd49591d
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Updated•8 years ago
Flags: needinfo?(dholbert)
I had to back this out for frequent intermittent reftest failures like https://treeherder.mozilla.org/logviewer.html#?job_id=39017948&repo=mozilla-inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/6deb8408930c
Flags: needinfo?(dholbert)
Comment 58•8 years ago
Thanks -- I've just landed a fix for that particular issue on bug 1316762.
Did you see any others that seem to have been caused by this push? If so, please mention them here, so we can fix them before the next time we re-land. :) (Per comment 54, we thought we'd caught all the newly-fuzzy tests, but apparently there were a few more hiding.)
I'll hopefully re-land this later tonight or tomorrow morning.
Flags: needinfo?(dholbert) → needinfo?(wkocher)
Updated•8 years ago
Flags: needinfo?(dholbert)
Updated•8 years ago
Flags: needinfo?(jgilbert)
Comment 59•8 years ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2b359340a84
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Comment 60•8 years ago
(Hoping it sticks this time around. If there are any new intermittents that look suspiciously tied to this, please ping me before backing this out again -- we can hopefully just land any remaining targeted tweaks as followups, as-needed.)
Flags: needinfo?(wkocher)
Flags: needinfo?(dholbert)
Comment 61•8 years ago
bugherder
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 64•8 years ago
@Gregg Lind - Half a year has passed (bug #180241 comment 73) and still no results from Shield Study 1 & 2.
Comment 65•8 years ago
I think we need to back this out until we better understand the glterrain regression (and possibly modify that test and tsvgx so this change doesn't regress or add noise to them).
See bug 1317048 comments 8 and 9 for more. But for now, I'll be backing this out.
Comment 66•8 years ago
Not sure if I need aurora approval to land the backout on Aurora, but here goes just in case.
Approval Request Comment
[Feature/regressing bug #]: This bug. (I'd like to back it out for now.)
[User impact if declined]: Talos regressions; see previous comment.
[Describe test coverage new/current, TreeHerder]: N/A
[Risks and why]: Extremely low risk. Just reverting a tweak to a paint-delay timeout value that *just* landed last week. This backout-patch restores us to our previous state.
[String/UUID change made/needed]: None.
Attachment #8811055 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 67•8 years ago
Backout by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c43a08d519f
Backed out changeset b2b359340a84 for causing talos regressions.
Aurora backout: https://hg.mozilla.org/releases/mozilla-aurora/rev/f99c3a999e81a017ca3f6026b1e75fe4ec31a50a
status-firefox53:
--- → affected
Comment on attachment 8811055 [details] [diff] [review]
backout patch
Went ahead and pushed the backout to aurora.
Attachment #8811055 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
status-firefox51:
--- → wontfix
Target Milestone: mozilla52 → ---
Comment 70•8 years ago
I'm planning on re-landing this later this afternoon.
(I'm waiting for bug 1319458 to get a few cycles of talos runs on inbound, so that it's easier to distinguish glterrain differences [if any] caused by bug 1319458's patch vs. by this bug's patch.)
Flags: needinfo?(dholbert)
Comment 71•8 years ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a494dd0ae2b0
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Updated•8 years ago
Flags: needinfo?(dholbert)
Comment 72•8 years ago
Note: This re-landing might initially be implicated as causing a tsvgx regression on inbound and/or central. Bug 1318530 (currently on autoland) should address that regression.
After bug 1318530 is merged to a branch that has this paint-suppression reduction, the tsvgx regression should mostly go away, though there may be a small regression in the new "svg_static" suite (which is the tsvgx pieces that we spun off in bug 1318530). A small regression there is an expected & acceptable consequence of this bug, per comment 16.
Comment 73•8 years ago
bugherder
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment 74•8 years ago
Hi Daniel,
Here is the perf alert for this landed patch: https://treeherder.mozilla.org/perf.html#/alerts?id=4345
Do you think all of these are expected?
Comment 75•8 years ago
Thanks, Alison! I think these are all OK. Specifically:
* The "damp" regressions weren't reported the previous times this was landed, but they don't surprise me. (And they look like they might be sporadic / within the range of noise on the graph too).
* The tart improvements make sense.
* The "fileio" ones are *NOT* expected, and almost certainly unrelated to this change. (Looking at the graph for those, it looks like this "regression" is actually us returning to the same value we had a few days ago. Not sure if that's from a backout or from a bimodal test result or what, but I'm confident it's unrelated to this bug's one-liner non-fileio-related patch.
* The tp5o tsvgx and regressions are expected.
Comment 76•8 years ago
I actually think the fileio regressions are related: they improved when this was backed out, and if you look at the 30-day history, you can see when we first landed this, backed it out, and now recently re-landed it:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Bmozilla-inbound,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D&series=%5Bautoland,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D
These are all non-main startup fileio; I am not sure what this is. I would have to push to try with some hacks to Talos to make it output data or an artifact which I could use to figure out which files are the problem. I did this about a month ago for another problem, and it turned out we were loading different fonts. Let me know if you want me to do that and I can get some try pushes put together.