Bug 1283302 - Determine correct nglayout.initialpaint.delay for modern platforms
Opened 8 years ago • Closed 8 years ago
Categories: Core :: Layout, defect
Status: RESOLVED FIXED
Target Milestone: mozilla53
People: Reporter: bugs; Assigned: neerja
References: Depends on 2 open bugs; Blocks 1 open bug
Attachments: 2 files, 1 obsolete file
patch (deleted) [tnikkel: review+]
patch (deleted)
nglayout.initialpaint.delay is currently set to 250ms for all platforms. That preference is how long we'll wait for a page to finish its initial layout before painting to the screen. By setting it to a lower number, the user can potentially see content sooner, though that content may change again (causing FOUT or other visible layout shifts).
I don't know how we got to 250ms as a default, or if that number even makes sense anymore. I also don't know if this should remain a constant or dynamically computed based on system capabilities.
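For context, here is a minimal sketch of the mechanism this pref controls, assuming the rough shape of the pres-shell code (member, callback, and constant names here are illustrative, not the exact Gecko source):

// Sketch only: the pres shell suppresses painting when a document starts
// loading and arms a one-shot timer with the pref value (falling back to the
// built-in PAINTLOCK_EVENT_DELAY constant). When the timer fires, painting is
// unsuppressed even if the initial layout hasn't finished.
#define PAINTLOCK_EVENT_DELAY 250

void PresShell::Initialize()
{
  mPaintingSuppressed = true;
  int32_t delayMs = Preferences::GetInt("nglayout.initialpaint.delay",
                                        PAINTLOCK_EVENT_DELAY);
  mPaintSuppressionTimer->InitWithFuncCallback(sPaintSuppressionCallback, this,
                                               delayMs, nsITimer::TYPE_ONE_SHOT);
}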
Reporter
Comment 1•8 years ago
Attachment #8767332 -
Flags: review?(matt.woodrow)
Comment 2•8 years ago
Since 5ms is substantially smaller than a refresh tick, we might want to follow up by getting rid of the entire delay mechanism.
Comment 3•8 years ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Since 5ms is substantially smaller than a refresh tick, we might want to
> follow up by getting rid of the entire delay mechanism.
Yeah I agree. Once we've had the pref change on release for a bit without fallout, then we should rip the remaining code out.
Updated•8 years ago
Attachment #8767332 -
Flags: review?(matt.woodrow) → review+
Reporter
Comment 4•8 years ago
Try results coming up:
remote: View your changes here:
remote: https://hg.mozilla.org/try/rev/9c41428cc88b
remote: https://hg.mozilla.org/try/rev/ff5c0c16a40e
remote:
remote: Follow the progress of your build on Treeherder:
remote: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff5c0c16a40e
remote:
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=ff5c0c16a40e
Pushed by jvillegas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/24ec7aba60e7
Sets the default value for nglayout.initialpaint.delay to 5ms (was 250 ms) per user research conclusions. r=mattwoodrow
Reporter
Updated•8 years ago
Assignee: nobody → bugs
Comment 6•8 years ago
Some hg blame history:
2001-04-25 - bug 77002 originally defined PAINTLOCK_EVENT_DELAY as 1200 ms.
2001-05-01 - bug 76495 changed it to 1000 ms.
2001-05-31 - bug 78695 changed it back to 1200 ms.
2002-12-26 - bug 180241 changed it to 250 ms "to match Phoenix and Chimera".
2016-07-06 - bug 1283302 (this bug) changed it to 5 ms.
Comment 7•8 years ago
Sorry, had to back this out for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=31325455&repo=mozilla-inbound
Flags: needinfo?(bugs)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1052561d1faf
Backed out changeset 24ec7aba60e7 for testfailures in 518172-1b.html
Comment 9•8 years ago
This caused a 4% tp5o regression on Linux64, and it reverted when this was backed out:
https://treeherder.mozilla.org/perf.html#/alerts?id=1723
Please keep Talos in mind when looking to reland this!
Reporter
Comment 10•8 years ago
(In reply to Pulsebot from comment #8)
> Backout by cbook@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/1052561d1faf
> Backed out changeset 24ec7aba60e7 for testfailures in 518172-1b.html
Thanks, filed bug 1285317 for that.
Flags: needinfo?(bugs)
Comment 11•8 years ago
In addition to the tp5o regression, there is a 13% glterrain regression on Win7 with this patch.
Comment 12•8 years ago
(In reply to Joel Maher ( :jmaher - PTO - back on Friday July 15 ) from comment #9)
> this caused a 4% tp5o regression on linux64 and it reverted when this was
> backed out:
> https://treeherder.mozilla.org/perf.html#/alerts?id=1723
I'm somewhat suspicious of the glterrain regression (Windows) and the tp5o_scroll and tscrollx regressions (Linux64). The rest (except probably the RSS one) seem expected.
Comment 13•8 years ago
The backout shows a 10%+ win for glterrain on Win7:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=cc866385dd01&newProject=mozilla-inbound&newRevision=1052561d1faf&framework=1
This follows the pattern on the graph: the regression aligns with the commit, and the improvement comes with the backout. Possibly we can push to try with the --spsProfile flag to learn more about what is going on?
Comment 14•8 years ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #12)
> I'm somewhat suspicious of the glterrain regression (Windows) and the
> tp5o_scroll and tscrollx regressions (Linux64). The rest (except probably
> the RSS one) seem expected.
Pinging this bug: I am wondering how we can move forward here. What would make you comfortable with fixing this and setting the value to 5ms? Reading dbaron's comment, it seems that the regressions are mostly expected.
Comment 15•8 years ago
see my comment above. What do we need to move forward here?
Flags: needinfo?(bugs)
Reporter
Comment 16•8 years ago
(In reply to Dominik Strohmeier from comment #15)
> see my comment above. What do we need to move forward here?
This change will come with an inevitable increase in measured page load time but may come with a decrease in *perceived* time to initial rendering. This was true the last time we changed this value, see:
https://bugzilla.mozilla.org/show_bug.cgi?id=180241#c17
I recommend we try it early in a cycle, see how it looks/feels, and gather feedback. We'll still want to figure out what's up with the GLTerrain test in comment 13 before we do that.
Flags: needinfo?(bugs)
Comment 17•8 years ago
> I recommend we try it early in a cycle, see how it looks/feels, and gather feedback.
Just to clarify the next steps: Given the SHIELD study confirmed improved perceived performance, what other feedback are we looking for that is blocking this?
Flags: needinfo?(bugs)
Reporter
Comment 18•8 years ago
(In reply to Harald Kirschner :digitarald from comment #17)
> > I recommend we try it early in a cycle, see how it looks/feels, and gather feedback.
>
> Just to clarify the next steps: Given the SHIELD study confirmed improved
> perceived performance, what other feedback are we looking for that is
> blocking this?
We're looking for confirmation that the SHIELD study sample set actually represents the larger population with more diverse configurations. We also need the Platform Operations team to approve the measured load-time regression.
Flags: needinfo?(bugs)
Comment 19•8 years ago
ni? jgaunt. Could we check Jet's concern on the study's sample being representative?
Flags: needinfo?(jgaunt)
Reporter
Comment 20•8 years ago
(In reply to Joel Maher ( :jmaher) from comment #13)
> the backout shows a 10%+ win for glterrain for win7:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> inbound&originalRevision=cc866385dd01&newProject=mozilla-
> inbound&newRevision=1052561d1faf&framework=1
I believe you :) ...but I can't reproduce the regression on Win7 with or without this change.
> this follows the pattern on the graph of the regression aligning with the
> commit and the improvement coming with the backout. Possibly we can push to
> try with a --spsProfile flag to learn more of what is going on?
+cc: jgilbert Jeff: this is the bug I mentioned last time we met. You're probably in a better spot to run this Try build and comment. Thx!
Flags: needinfo?(jgilbert)
Comment 21•8 years ago
Jason, can we somehow use Project Presto to see if this regresses Speed Index?
Flags: needinfo?(jduell.mcbugs)
Comment 22•8 years ago
Sadly Presto has been using
https://www.webpagetest.org/
for our tests, and it uses standard builds from mozilla.org, so there's no easy way to get it to use a custom build.
I do think that the webpagetest SpeedIndex is a good metric for page load responsiveness.
I'm asking Valentin (Presto coder) and Nick Hurley (who had a hand-rolled webpagetest setup at one point: maybe he has a recipe for standing one up) if they have ideas here.
Flags: needinfo?(jduell.mcbugs) → needinfo?(valentin.gosu)
Updated•8 years ago
Flags: needinfo?(hurley)
Comment 23•8 years ago
Valentin--how often does webpagetest.org update its nightly build? Could we throw the patch from this bug onto nightly for a few days till it shows up there, and then use your scripts to compare with nightlies from a few days before (w/o patch)?
Comment 24•8 years ago
Note: Valentin is on PTO so it may be a week or so before he responds.
Comment 25•8 years ago
(In reply to Jason Duell [:jduell] (needinfo me) from comment #22)
> I'm asking Valentin (Presto coder) and Nick Hurley (who had a hand-rolled
> webpagetest setup at one point: maybe he has a recipe for standing one up)
> if they have ideas here.
I didn't hand-roll a wpt setup, I hand-rolled a not-quite-wpt-like-thing-that-doesn't-measure-speedindex. Bob Clary was the one who set up wpt for us; redirecting to him.
Comment 26•8 years ago
> it uses standard builds from mozilla.org, so there's no easy way to get it to use a custom build.
We can also reach out to speedcurve to set up one of their instances with a custom build and run benchmarks.
For reference, the BYO-Server solution is described in more detail: https://www.instartlogic.com/blog/configuring-private-instance-webpagetest-browser-extensions
Comment 27•8 years ago
wpt is using the directory
http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/
to get builds, which does appear to be updated regularly. I had thought TaskCluster killed updating the latest dirs, but perhaps they changed their minds.
https://github.com/WPO-Foundation/webpagetest/blob/master/www/installers/browsers/updateFirefoxNightly.php#L8 pulls the 32bit installer and places it on
https://www.webpagetest.org/installers/browsers/
https://www.webpagetest.org/installers/browsers/nightly.dat
http://www.webpagetest.org/installers/browsers/nightly-51-1471627440.exe is current.
Doing an experiment on nightly should work.
Flags: needinfo?(bob)
Comment 28•8 years ago
> Doing an experiment on nightly should work.
Testing this on Nightly as a second way of validating the patch in addition to the results from user research sounds like a great path forward.
I suggest we use the Alexa TOP100-curated list for testing to cover a wide range of popular websites [1]. I think it will also be important to run webpagetest for slower connections.
I found that webpagetest also allows using credentials to log in to pages like FB or Pinterest through scripting, for more valid results beyond landing pages [2][3].
[1] https://github.com/WPO-Foundation/webpagetest/blob/master/bulktest/alexa-curated.txt
[2] https://sites.google.com/a/webpagetest.org/docs/using-webpagetest/scripting
[3] https://www.webpagetest.org/forums/archive/index.php?thread-11504.html
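For reference, a WebPageTest login script is a short command list along these lines (commands and parameters are tab-separated in real scripts; this is a sketch only, and the URLs, field names, and credentials below are placeholders, not taken from this bug or any real site):

logData      0
navigate     https://example.com/login
setValue     name=username    testuser
setValue     name=password    hunter2
submitForm   name=loginform
logData      1
navigate     https://example.com/home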
(answering for jgaunt)
Hello! I designed and implemented the Shield study.
To answer representativeness:
0. All the concerns are valid.
1. It was a self-selected study of EN-* users from Release. These users self-selected to participate, not knowing what performance enhancements they would receive. It is entirely possible that these users had existing bad performance and that lured them into the study.
2. At the time we had no good access to merge these packets with Telemetry.
3. I would be delighted to have some code to compare a bunch of profiles to decide what is representative enough.
4. A decision here is whether this sample is more representative than a NIGHTLY sample.
I personally doubt that users in Nightly are more representative of the full range of user configs. I especially doubt that their modal / median / centroid is the same. But I have no good evidence to back that up.
The evidence we do have:
1. Surveys in the population reported better perceived performance than the other settings.
2. That effect followed a dose-response curve, and got worse when the setting was worse.
3. Setting the timeout to worse than current (1000ms) was the worst group.
4. Participating users uninstalled the add-on in that same pattern.
I want to put a hard question forward: Suppose Nightly shows a trend in the other direction? Or no effect? Is Nightly the product we are actually building? Or Release?
If we want to move to a standard of evidence that is conclusive and acceptable, I would gladly hear thoughts on what that is. Until then, we have the evidence we have.
Flags: needinfo?(jgaunt)
Comment 30•8 years ago
Hey Gregg,
Thanks for these details. I briefly want to chime in here to clarify a misunderstanding that I read in your comment. The discussion about doing tests in Nightly is aimed more at the engineering side of things. We do see some regressions with nglayout.initialpaint.delay set to 5ms, and there's been a question about the validity of those regressions in the wild.
So the idea would be to run webpagetest.org against a Nightly version that has the pref set to 5ms and compare to results for the old value of 250ms. That has nothing to do with the results from the SHIELD research.
Second, I agree with what you described regarding the validity and reliability of the SHIELD results. Thanks for clarifying. Does that make sense?
Flags: needinfo?(glind)
Comment 31•8 years ago
(In reply to David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) from comment #2)
> Since 5ms is substantially smaller than a refresh tick, we might want to
> follow up by getting rid of the entire delay mechanism.
Please do not. There are negative side effects to too low a setting. I personally keep the delay at 56ms after a bunch of testing. This is usually enough to prevent incorrect rendering of pages. Images usually take around 50ms to start loading, and having the whole page move is annoying.
Similarly, I would definitely suggest testing among people who are not looking to prioritize speed. They may also like the smoother operation better.
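(For anyone who wants to experiment locally before this lands, the value can be overridden per profile via about:config, or with a user.js line like the one below; the 56 here is just the value this commenter mentions, not a recommendation.)

// user.js in the profile directory; read at startup and overrides the built-in default
user_pref("nglayout.initialpaint.delay", 56);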
Comment 32•8 years ago
> We're looking for confirmation that the SHIELD study sample set actually represents the larger population with more diverse configurations.
We got that from Gregg at a high level. The assumption is that SHIELD does random sampling on Release and therefore is diverse enough.
> We also need Platform Operations team to approve the measured Load time regression.
Looks like we had some general agreement in the Friday meeting that the regression isn't a major concern.
Flags: needinfo?(bugs)
Comment 33•8 years ago
A delay of 50ms works for me, 5ms makes the browser nearly unusable.
Comment 34•8 years ago
(In reply to Spencer Selander [greenknight] from comment #33)
> A delay of 50ms works for me, 5ms makes the browser nearly unusable.
I have had the pref at 5ms for nearly 6 weeks now and haven't had any issues. I am curious what "nearly unusable" means for you.
Flags: needinfo?(spencerselander)
Comment 35•8 years ago
(In reply to Dominik Strohmeier from comment #34)
> (In reply to Spencer Selander [greenknight] from comment #33)
> > A delay of 50ms works for me, 5ms makes the browser nearly unusable.
>
> I have had the pref at 5ms for nearly 6 weeks now and haven't had any
> issues. I am curious what "nearly unusable" means for you.
It had been a while since I tried it, so I just gave it another shot - no problem now with 5ms. None of the reflows and glitches I got before, even image-heavy pages render very smoothly. I'm leaving it set that way for now.
Flags: needinfo?(spencerselander)
Reporter
Comment 36•8 years ago
Modified the patch to apply only to desktop platforms. Android will remain at 250 ms.
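Roughly, the desktop-only default looks like the following sketch of a preprocessed prefs-file entry (assuming the default lives in modules/libpref/init/all.js behind an Android guard; the actual patch may express this differently):

// Sketch: platform-conditional default for the paint-suppression delay.
#ifdef MOZ_WIDGET_ANDROID
pref("nglayout.initialpaint.delay", 250); // Android keeps the old delay
#else
pref("nglayout.initialpaint.delay", 5);   // desktop platforms paint much sooner
#endif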
Attachment #8767332 -
Attachment is obsolete: true
Flags: needinfo?(bugs)
Attachment #8784602 -
Flags: review?(tnikkel)
Updated•8 years ago
Attachment #8784602 -
Flags: review?(tnikkel) → review+
Reporter
Comment 37•8 years ago
After conferring with the Platform and Ops teams, we've decided to enable this on mozilla-central for all desktop platforms. For more detailed information on the User Research backing this change, please read this doc:
https://docs.google.com/document/d/1BvCoZzk2_rNZx3u9ESPoFjSADRI0zIPeJRXFLwWXx_4/edit#heading=h.28ki6m8dg30z
Reporter
Updated•8 years ago
Attachment #8784602 -
Attachment description: Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. → Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Comment 38•8 years ago
Pushed by jvillegas@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e68b376d1191
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Comment 39•8 years ago
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/ccf3d0b90b0422b169c60d429029cdfdc0e5a5a0 for making a bunch of existing intermittent failures nearly permaorange.
Going to be hard to point out without just a ton of copy-paste, but see in particular the Linux64 reftest-4 and reftest-7 chunks in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&tochange=ccf3d0b90b0422b169c60d429029cdfdc0e5a5a0&fromchange=e59fbcb1da07bb3e330c8c35759e58404981ffa6&filter-searchStr=reftest&group_state=expanded
Comment 40•8 years ago
And some of the same reftests on Win7 (it'll be hours yet before Win8 or WinXP actually run).
Comment 41•8 years ago
Hi Jet,
what's the status here? Any chance that this will still ship in August?
Thanks,
Dominik
Flags: needinfo?(bugs)
Reporter
Comment 42•8 years ago
(In reply to Dominik Strohmeier from comment #41)
> Hi Jet,
> what'S the status here? Any chance that this will still ship in August?
>
> Thanks,
> Dominik
mstange & I are looking through the failing tests. Note that we didn't plan to uplift this change (i.e., it was never going to ship in August).
Flags: needinfo?(bugs)
Comment 43•8 years ago
(In reply to Bob Clary [:bc:] from comment #27)
> Doing an experiment on nightly should work.
I'll do a run in the next couple of days so we get a baseline, then we can compare with the build after we land on nightly.
Flags: needinfo?(valentin.gosu)
Comment 44•8 years ago
Jet, what's the status with this bug?
Bug 1297996 landed and got uplifted to Beta, and we're planning to run some tests on page-to-page transitions. In our pretests, we've seen that there are some more or less pronounced effects on page-to-page transitions for different initial delay settings.
It would be great if this would land and ride the trains. Thanks.
Flags: needinfo?(bugs)
Reporter
Comment 45•8 years ago
(In reply to Dominik Strohmeier from comment #44)
> Jet, what's the status with this bug?
Markus is investigating the test failures this week.
> Bug 1297996 landed and got uplifted to Beta and we're planning to run some
> tests on page2page transitions. In our pretests, we've seen that there are
> some more or less announced effects for the page2page transitions for
> different initial delay settings.
I had a look at the patches in bug 1297996 and the checks for suppressed painting will certainly be affected by landing this one. This bug affects all platforms, whereas your experiments in bug 1297996 are non-e10s-only. I wouldn't assume your results will hold true for e10s.
> It would be great if this would land and ride the trains. Thanks.
Agreed :) though we can't ship with the tests in this current state. I would note that the failing tests are perma-failures on my Linux VM with or without this fix, so that should help debugging at least.
Flags: needinfo?(bugs)
Comment 46•8 years ago
In bug 1311448, we will run a Telemetry Experiment on different page-to-page transitions. In this context, we will also have a test cohort that has nglayout.initialpaint.delay set to 0, and one that combines a transition over blank with an initialpaint.delay of 0, since paint delays and the experience of different page-to-page transitions are very much linked.
What's the state of landing this? Any progress on Markus's investigations?
Does it make sense to wait for some more Telemetry data from the Telemetry Experiment and then aim to get rid of the delay mechanism completely?
Flags: needinfo?(bugs)
Reporter
Comment 47•8 years ago
(In reply to Dominik Strohmeier from comment #46)
> What's the state of landing this? Any progress on Markus's investigations?
Markus diagnosed the test failures. Most of the fixes will add additional fuzz tolerance to the tests to account for paint invalidation that occurs prior to the reftest screen captures.
Neerja will land the test fixes.
> Does it make sense to wait for some more Telemetry data from the Telemetry Experiment and then target to get rid of the delay mechanisms maybe completely?
I think we'll see the same test failures at 0 delay. It would be good to see what the Telemetry Experiment results say about stability, since we do intend to remove the delay mechanism entirely (see comment 2.)
Flags: needinfo?(bugs) → needinfo?(npancholi)
Comment 48•8 years ago
(In reply to Jet Villegas (:jet) from comment #47)
> I think we'll see the same test failures at 0 delay. It would be good to see
> what the Telemetry Experiment results say about stability, since we do
> intend to remove the delay mechanism entirely (see comment 2.)
Comment 2 has one person suggesting that. I have specifically asked that the preference not be removed, and provided my reasoning: I personally prefer the smoother experience of no flashing on simple sites like xkcd.com, where the image is the primary content and thus there is no need to display anything until the image is ready.
What reason is there to remove the mechanism entirely instead of leaving it there for those of us who might want to change it? It does not seem that it would be possible for an add-on to override this change (as there is no delayPaint property in JavaScript or CSS).
I would understand it if you were overhauling the entire painting system, and thus you'd need to reimplement the delay mechanism. Then, if few users are using it, it makes sense to remove it. But, otherwise, why not leave it?
Assignee
Updated•8 years ago
Assignee: bugs → npancholi
Flags: needinfo?(npancholi)
Assignee
Comment 49•8 years ago
Added fuzzy annotations for the intermittent failures corresponding to bugs 1293550, 1260629, 1313772, 1313773, 1294278, and 1295466, and sent a review request to mstange for each of them.
(These bugs were also added to the 'Depends on' list.)
Started new try run with patch for this bug + patches for all bugs listed above:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d70c7ae79678fb56c1ea09e9d640e719f5c8c22
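For reference, the fuzzy annotations mentioned above live in the reftest manifests and look roughly like this (a sketch: the test names and pixel tolerances are illustrative, not the values actually landed):

# reftest.list: fuzzy(maxDifference, numDifferingPixels) tolerates small
# rendering differences; fuzzy-if() restricts the tolerance to one platform.
fuzzy(2,40) == invalidation-test.html invalidation-test-ref.html
fuzzy-if(winWidget,1,120) == other-test.html other-test-ref.html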
Assignee
Comment 50•8 years ago
The failure for test invalidate-1.html on Windows needed to be looked into more closely:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20b632bf5f252b8e21789a5e74e4f2c51a07273c&selectedJob=28813830&exclusion_profile=false
I triggered a Try run with "reftest-wait" and a small setTimeout added to this test; you can see that this does fix the issue (the test doesn't fail anymore in the try run below).
That means we do *eventually* paint the correct thing, but the default reftest snapshot is just happening too quickly.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1aa1e5359ecb84245c545bc9758a58474771fa3c
So the explanation is probably that we've got a bug in one of the following two places:
1. Since delaying the snapshot taken by the test harness by 100ms does make the test pass, there might be a bug in the paint stabilization in the test harness where it doesn’t correctly wait for the second div in the testcase to be painted.
2. Or the paint stabilization is correct, but the -moz-element invalidation is too delayed for it to catch, for some reason. E.g. -moz-element references one frame too late: we paint the first div, then we discover that we need to invalidate the other div, and we trigger another paint which then paints the other div. So maybe we must modify the code for -moz-element so that the invalidation happens in the first paint itself?
mattwoodrow, do you have any thoughts on what the explanation might be? (since you've worked on -moz-element as well as paint stabilization)
(Thanks very much mstange and dholbert for your help with this!)
Flags: needinfo?(matt.woodrow)
Comment 51•8 years ago
(Slight tangent: IMO, we should feel free to paper over the invalidate-1.html failure using a small setTimeout (since that does fix it), in the short-term interests of getting this bug landed. And then we can investigate comment 50 more closely in bug 1178202 or a new bug. A small setTimeout would be basically equivalent to the paint delay that we're already giving this test in current trunk, so in practice it wouldn't be reducing the sensitivity of that test.)
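A sketch of that paper-over approach, using the standard reftest-wait convention (the 100ms value and the markup are illustrative; the real test's content would be unchanged):

<!-- The harness waits while the root element has class="reftest-wait" and only
     snapshots after the class is removed, here via a short timeout. -->
<html class="reftest-wait">
<body onload="setTimeout(finish, 100)">
  ... existing test content ...
  <script>
    function finish() {
      document.documentElement.removeAttribute("class");
    }
  </script>
</body>
</html>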
Comment 52•8 years ago
I did some digging, and I strongly suspect that this is bug 1209751 coming back to bite us.
My guess is that when we go to take the reftest snapshot of the page, the background image hasn't loaded yet.
Reftest snapshots force synchronous image decoding (by not passing DRAWWINDOW_ASYNC_DECODE_IMAGES to drawWindow), so we block (during painting) when we try to draw the first div until the image has loaded and then draw it correctly.
We then try painting the div with -moz-element (which didn't have the intrinsic image size available during layout, as the image was only ready during painting) and then that fails.
Seth's comment suggests that this should only be a problem if the element's size depends on the image's intrinsic size, which shouldn't be the case here (we specify a size ourselves). There could easily be something else depending on the intrinsic size, though, which causes painting to fail.
It looks like this is reproducible, so shouldn't be too hard to confirm.
I'm not sure what the best fix is here. I think we need to detect that the image size has changed and schedule another paint so it can be drawn with the correct layout. The reftest harness should detect this scheduled paint and wait for it.
Flags: needinfo?(matt.woodrow)
Assignee
Updated•8 years ago
Assignee
Comment 53•8 years ago
(In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> I did some digging, and I strongly suspect that this is bug 1209751 coming
> back to bite us.
>
> My guess is that when we go to take the reftest snapshot of the page, the
> background image hasn't loaded yet.
>
> Reftests snapshots force synchronous image decoding (by not passing
> DRAWWINDOW_ASYNC_DECODE_IMAGES to drawWindow), so we block (during painting)
> when we try to draw the first div until the image has loaded and then draw
> it correctly.
>
> We then try painting the div with -moz-element (which didn't have the
> instrinsic image size available during layout, as the image was only ready
> during painting) and then that fails.
>
> Seth's comment suggests that this should only be a problem if the element's
> size depends on the image's intrinsic size, which shouldn't be the case here
> (we specify a size ourselves). There could easily be something else
> depending on the instrinsic size though which causes painting to fail.
>
> It looks like this is reproducible, so shouldn't be too hard to confirm.
>
> I'm not sure what the best fix is here. I think we need to detect that the
> image size has changed and schedule another paint so it can be draw with the
> correct layout. The reftest harness should detect this scheduled paint and
> wait for it.
Thanks for looking into this Matt!
Assignee
Comment 54•8 years ago
List of all bugs that had to be modified in some way to prevent the paint delay change from causing a test failure ->
(Note that some of these bugs will remain open since we've simply papered over the failure with a "fuzzy" annotation -- that's why they're not marked as blocking in this bug)
Bug 1293550, Bug 1260629, Bug 1313772, Bug 1313773, Bug 1294278, Bug 1295466, Bug 1178202, Bug 1315834, Bug 1313253, Bug 1312951, Bug 1315846
Clean try run ->
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0268d8e0c8def2152d1b9adb390c7687e8b4608c
Assignee
Updated•8 years ago
Flags: needinfo?(bugs)
Comment 55•8 years ago
Just spoke with jet -- I'm going to land this on inbound tomorrow, after the test-tweaks (on autoland) have been merged around and have made it to inbound. --> transferring jet's needinfo to myself
(I think bug 1316430 is the last test-tweak that needed landing, so once that's on inbound, we should be OK to land.)
Flags: needinfo?(bugs) → needinfo?(dholbert)
Comment 56•8 years ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff60dd49591d
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Updated•8 years ago
Flags: needinfo?(dholbert)
I had to back this out for frequent intermittent reftest failures like https://treeherder.mozilla.org/logviewer.html#?job_id=39017948&repo=mozilla-inbound
https://hg.mozilla.org/integration/mozilla-inbound/rev/6deb8408930c
Flags: needinfo?(dholbert)
Comment 58•8 years ago
Thanks -- I've just landed a fix for that particular issue on bug 1316762.
Did you see any others that seem to have been caused by this push? If so, please mention them here, so we can fix them before the next time we re-land. :) (Per comment 54, we thought we'd caught all the newly-fuzzy tests, but apparently there were a few more hiding.)
I'll hopefully re-land this later tonight or tomorrow morning.
Flags: needinfo?(dholbert) → needinfo?(wkocher)
Updated•8 years ago
Flags: needinfo?(dholbert)
Updated•8 years ago
Flags: needinfo?(jgilbert)
Comment 59•8 years ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2b359340a84
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Comment 60•8 years ago
(Hoping it sticks this time around. If there are any new intermittents that look suspiciously tied to this, please ping me before backing this out again -- we can hopefully just land any remaining targeted tweaks as followups, as-needed.)
Flags: needinfo?(wkocher)
Flags: needinfo?(dholbert)
Comment 61•8 years ago
bugherder
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 64•8 years ago
@Gregg Lind - Half a year has passed (bug #180241 comment 73) and still no results from Shield Study 1 & 2.
Comment 65•8 years ago
I think we need to back this out until we better understand the glterrain regression (and possibly modify that test and tsvgx so this change doesn't regress or add noise to them).
See bug 1317048 comments 8 and 9 for more. But for now, I'll be backing this out.
Comment 66•8 years ago
Not sure if I need aurora approval to land the backout on Aurora, but here goes just in case.
Approval Request Comment
[Feature/regressing bug #]: This bug. (I'd like to back it out for now.)
[User impact if declined]: Talos regressions; see previous comment.
[Describe test coverage new/current, TreeHerder]: N/A
[Risks and why]: Extremely low risk. Just reverting a tweak to a paint-delay timeout value that *just* landed last week. This backout-patch restores us to our previous state.
[String/UUID change made/needed]: None.
Attachment #8811055 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 67•8 years ago
Backout by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1c43a08d519f
Backed out changeset b2b359340a84 for causing talos regressions.
Aurora backout: https://hg.mozilla.org/releases/mozilla-aurora/rev/f99c3a999e81a017ca3f6026b1e75fe4ec31a50a
status-firefox53:
--- → affected
Comment on attachment 8811055 [details] [diff] [review]
backout patch
Went ahead and pushed the backout to aurora.
Attachment #8811055 -
Flags: approval-mozilla-aurora?
Updated•8 years ago
status-firefox51:
--- → wontfix
Target Milestone: mozilla52 → ---
Comment 70•8 years ago
I'm planning on re-landing this later this afternoon.
(I'm waiting for bug 1319458 to get a few cycles of talos runs on inbound, so that it's easier to distinguish glterrain differences [if any] caused by bug 1319458's patch vs. by this bug's patch.)
Flags: needinfo?(dholbert)
Comment 71•8 years ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a494dd0ae2b0
Sets the default value for nglayout.initialpaint.delay to 5ms on desktop (stays 250 ms on Android) per user research conclusions. r=tn
Updated•8 years ago
Flags: needinfo?(dholbert)
Comment 72•8 years ago
Note: This re-landing might initially be implicated as causing a tsvgx regression on inbound and/or central. Bug 1318530 (currently on autoland) should address that regression.
After bug 1318530 is merged to a branch that has this paint-suppression reduction, the tsvgx regression should mostly go away, though there may be a small regression in the new "svg_static" suite (which is the tsvgx pieces that we spun off in bug 1318530). A small regression there is an expected & acceptable consequence of this bug, per comment 16.
Comment 73•8 years ago
bugherder
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment 74•8 years ago
Hi Daniel,
Here is the perf alert for this landed patch: https://treeherder.mozilla.org/perf.html#/alerts?id=4345
Do you think all of these are expected?
Comment 75•8 years ago
Thanks, Alison! I think these are all OK. Specifically:
* The "damp" regressions weren't reported the previous times this was landed, but they don't surprise me. (And they look like they might be sporadic / within the range of noise on the graph too).
* The tart improvements make sense.
* The "fileio" ones are *NOT* expected, and almost certainly unrelated to this change. (Looking at the graph for those, it looks like this "regression" is actually us returning to the same value we had a few days ago. Not sure if that's from a backout or from a bimodal test result or what, but I'm confident it's unrelated to this bug's one-liner non-fileio-related patch.
* The tp5o tsvgx and regressions are expected.
Comment 76•8 years ago
I actually think the fileio regressions are related: they improved when this was backed out, and if you look at the 30-day history, you can see when we first landed this, backed it out, and now recently re-landed it:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=2592000&series=%5Bmozilla-inbound,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D&series=%5Bautoland,2f3af3833d55ff371ecf01c41aeee1939ef3a782,1,1%5D
These are all non-main startup fileio; I am not sure what this is. I would have to push to try with some hacks to Talos to make it output data or an artifact which I could use to figure out which files are the problem. I did this about a month ago for another problem, and it turned out we were loading different fonts. Let me know if you want me to do that and I can get some try pushes put together.