Closed Bug 1257584 Opened 9 years ago Closed 8 years ago

2 - 4% sessionrestore / sessionrestore_no_auto_restore / tp5o_scroll / tpaint / ts_paint windows* regression on push 1e05c8d75716 (Thu Mar 10 2016)

Categories

(Core :: JavaScript: GC, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox48 --- affected

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Talos has detected a Firefox performance regression from push 1e05c8d75716. As author of one of the patches included in that push, we need your help to address this regression. This is a list of all known regressions and improvements related to the push: https://treeherder.mozilla.org/perf.html#/alerts?id=406 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#sessionrestore.2Fsessionrestore_no_auto_restore https://wiki.mozilla.org/Buildbot/Talos/Tests#tpaint https://wiki.mozilla.org/Buildbot/Talos/Tests#ts_paint Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p win64,win32 -u none -t other-e10s,svgr-e10s,g1-e10s,other,g2,svgr --rebuild 5 # add "mozharness: --spsProfile" to generate profile data (we suggest --rebuild 5 to be more confident in the results) To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e [path]/firefox -a sessionrestore_no_auto_restore:tpaint:sessionrestore:ts_paint (add --e10s to run tests in e10s mode) Making a decision: As the patch author we need your feedback to help us handle this regression. *** Please let us know your plans by Monday, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
here is a compare view: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=c138edad150f&newProject=mozilla-inbound&newRevision=676b6db552c8&framework=1 there are some false positives there, but what regressed is: win7 - ts_paint, opt + opt e10s winxp - sessionrestore_no_auto_restore opt e10s winxp - sessionrestore opt + opt e10s :terrence, can you take a look at this?
Flags: needinfo?(terrence)
The only new code added here is an exact duplicate of the code here: http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/md/windows/w95cv.c#314 According to the comment there, every PRLock in the system will leak if we don't call PR_Init before calling PR_NewLock; windows caches library loads at several levels, so I initially thought that this must be a measurement error. Looking closer, however, we never actually call PR_Init or PR_Initialize, so I think we're currently just leaking a ton of memory on Windows. Whoops! These sort of hidden gotchas are exactly the reason why I'm trying to move us away from NSPR. Given that the landed patch has already saved us hundreds of lines of code in subsequent landings, and given that long-running memory leaks are a huge, ongoing problem in windows, I think we should just accept a small regression here.
Flags: needinfo?(terrence)
(In reply to Terrence Cole [:terrence] from comment #3) > we never actually call PR_Init or PR_Initialize, so I think we're > currently just leaking a ton of memory on Windows. PR_NewLock will initialize NSPR if needed: if (!_pr_initialized) _PR_ImplicitInitialization(); (I've been in NSPR land recently for bug 1257522...)
is there something we can try out here then? If there is nothing, I think closing this as wontfix is reasonable.
(In reply to Jan de Mooij [:jandem] from comment #4) > (In reply to Terrence Cole [:terrence] from comment #3) > > we never actually call PR_Init or PR_Initialize, so I think we're > > currently just leaking a ton of memory on Windows. > > PR_NewLock will initialize NSPR if needed: > > if (!_pr_initialized) _PR_ImplicitInitialization(); > > (I've been in NSPR land recently for bug 1257522...) In that case, what is the regression from? I cannot image that we're not creating any locks, given that we have to run JS to restore the session. Maybe we changed the DLL load order and are getting hit by disk caching that has learned a different order? (In reply to Joel Maher (:jmaher) from comment #5) > is there something we can try out here then? If there is nothing, I think > closing this as wontfix is reasonable. Given the size of the slowdown, I think this is almost certainly an artifact of windows disk caching. I'll give some thought to how we might be able to test this.
this is now on mozilla-beta for sessionrestore (winxp, win8): https://treeherder.mozilla.org/perf.html#/alerts?id=1012&filter=session
This bug is simply moving overhead to a different location. Considering that these tests do not interlock at any fixed point, I think that this is not actually a regression.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.