Closed
Bug 1245092
Opened 9 years ago
Closed 9 years ago
Install reftest and specialpowers extensions at runtime via AddonManager.loadTemporaryAddon()
Categories
(Testing :: Reftest, defect)
Tracking
(firefox46 fixed, firefox47 fixed, firefox48 fixed)
RESOLVED
FIXED
mozilla48
People
(Reporter: ahal, Assigned: ahal)
References
Details
Attachments
(2 files)
(deleted), text/x-review-board-request | jgriffin: review+ | Details
(deleted), patch | Details | Diff | Splinter Review
We'll need to install these at runtime as temporary addons so they don't need to get signed. Specialpowers can't be signed, and reftest is modified enough that signing it would be a huge pain.
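As a sketch of the harness side of this, a helper like the following could locate the unpacked, unsigned extension directories before handing them to the browser for temporary installation. This is hypothetical: the paths, the `install.rdf` check, and the function name are illustrative, not the actual reftest harness layout.

```python
import os

def find_extension_dirs(build_dir, names=("reftest", "specialpowers")):
    """Locate unpacked extension directories so they can be passed to the
    browser for temporary installation. Hypothetical helper: the
    extensions/ layout and install.rdf check are illustrative only."""
    found = {}
    for name in names:
        ext_dir = os.path.join(build_dir, "extensions", name)
        # An unpacked extension of this era carries an install.rdf manifest.
        if os.path.isfile(os.path.join(ext_dir, "install.rdf")):
            found[name] = ext_dir
    missing = set(names) - set(found)
    if missing:
        raise RuntimeError("missing extensions: %s" % ", ".join(sorted(missing)))
    return found
```

Failing fast here, before the browser starts, keeps "extension directory missing" distinct from "temporary install failed" in the logs.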
Comment 1•9 years ago
Hi Andrew, Is this blocking for Fx46 going live with signing - or can land after?
Flags: needinfo?(ahalberstadt)
Assignee
Comment 2•9 years ago
It's blocking. Though I'm almost done, I just haven't been keeping the bug up to date. Here's the latest try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=23c5c5d833af
Flags: needinfo?(ahalberstadt)
Assignee
Comment 3•9 years ago
Review commit: https://reviewboard.mozilla.org/r/35635/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35635/
Assignee
Comment 4•9 years ago
This patch also pretty much totally busts b2g reftests, which apparently I don't care about. If no one is going to commit to fixing it, we should turn them off. I'll file a follow-up after this lands.
Assignee
Comment 5•9 years ago
Good news, I fixed some test failures that this patch caused.
Bad news, there are some memory leaks that I didn't notice until now (they were masked by the test failures). There is an intermittent leak on linux debug e10s crashtest, and a permanent leak on win 7 e10s crashtest. This could potentially take me a while to figure out.
Assignee
Comment 6•9 years ago
Jgriffin suggested I try pushing only a specific directory of tests to try, to see if the leaks have something to do with a particular test. He may have been on to something, as running only dom crashtests came back green. Here are some more try runs that run about 5 root directories each:
https://treeherder.mozilla.org/#/jobs?repo=try&author=ahalberstadt@mozilla.com&fromchange=dc63b6583aec&tochange=012a5eddeaaa
Unfortunately, try is taking a long time to start them.
Assignee
Comment 7•9 years ago
Here's the latest bisection range:
https://treeherder.mozilla.org/#/jobs?repo=try&author=ahalberstadt@mozilla.com&fromchange=dc63b6583aec&tochange=52ab2496e8e2
It looks like tests in /widget could be causing this, but looking at the first 3 pushes, either there are more problematic directories, or this is a function of the number of tests. My money would be on the latter. I'll do some more pushes in the meantime though.
Assignee
Comment 8•9 years ago
Hey Andrew, I heard you may know a thing or two about memory leaks and e10s. Basically my patch here is hitting some leaks on shutdown, and I'm not really sure why. The important bits are probably in bootstrap.js. See the above try runs.
Do you see anything I'm obviously doing wrong? Or do you have tips on how I can debug this? I'm quite far outside my area of expertise here. Thanks!
Flags: needinfo?(continuation)
Assignee
Comment 9•9 years ago
Hm, in this try run the windows debug e10s crashtests are actually green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fdcc93b965a9
I wonder if this got fixed by bug 1251692. Pushing a fresh try run.
Comment 10•9 years ago
That new try run is extremely orange.
Generally these sorts of leaks are caused by failing to remove event listeners or observers or some other thing where you register yourself with a central service. It looks like you are leaking whole pages. Another kind of chrome JS leak I've seen before is when a timer fires during shutdown that makes you do something, after the cleanup for that something has already happened, so you end up leaking.
Let me know if the leak persists and you have a test case you can run and I can try to help you figure out what is causing it.
Flags: needinfo?(continuation)
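The failure mode described above (an object registered with a central service and never unregistered) can be modeled in a few lines. This is a toy Python sketch, not Gecko code, with all names invented; it shows the registered object staying alive until it is explicitly removed, just like a page held by a leftover listener:

```python
import gc
import weakref

class ObserverService:
    """Toy analogue of a central service holding strong refs to observers."""
    def __init__(self):
        self._observers = []

    def add_observer(self, obs):
        self._observers.append(obs)

    def remove_observer(self, obs):
        self._observers.remove(obs)

class Page:
    pass

service = ObserverService()

page = Page()
service.add_observer(page)   # page registers itself with the service
ref = weakref.ref(page)
del page
gc.collect()
assert ref() is not None     # "leaked": the service still holds the page

obs = ref()
service.remove_observer(obs)  # the cleanup step the leaking code forgot
del obs
gc.collect()
assert ref() is None          # collected once the registration is removed
```

The shutdown-timer variant mccr8 mentions is the same shape: the `remove_observer` call runs, but a late callback re-registers the object after cleanup has already happened.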
Assignee
Comment 11•9 years ago
Thanks. You can ignore the orangeness of that try run. It was meant to help us figure out what still needs to be fixed when addon signing is enforced. Though it seems like it was just luck that nothing leaked, judging by this even more recent one:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdb168296e90
I wonder if this has something to do with the fact that reftest.js is no longer a command line component. Maybe this wasn't introduced by my patch, but latent in reftest.js and not exposed until now.
Comment 12•9 years ago
mccr8, if you have any time to help here this week, it would really be appreciated. If this isn't addressed by then, it may block the addon signing project.
Flags: needinfo?(continuation)
Comment 13•9 years ago
How do I reproduce this issue locally?
Assignee
Comment 14•9 years ago
I haven't been able to reproduce locally, though I only tried running the full suite a couple times (which is partly why I'm banging my head on the wall). This seems to happen only on linux or windows e10s. Here's the theoretical STR:
1. Apply the patch in this bug
2. Do a debug build
3. ./mach crashtest --e10s
It's even intermittent on try, so it's some kind of timing thing. My machine might be too fast to reproduce it.
Assignee
Comment 15•9 years ago
For the record, those "command timed out after 1800 second" errors in the previous try run are unrelated to this patch, as they time out in mozharness before they even make it to the reftest harness. Though the intermittent leak does still show up, on both linux and windows.
Comment 16•9 years ago
The approach here, if we could reproduce it locally, would be to get a shutdown CC and GC log, and then I could try to analyze that to figure out what is wrong. That is done by setting MOZ_CC_LOG_SHUTDOWN=1 and other variables described at the start of xpcom/base/nsCycleCollector.cpp. In order to get that to work on try, you have to set it up to upload an artifact containing the log files. I'm not sure how to do that.
Flags: needinfo?(continuation)
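Concretely, enabling the shutdown logs for a local run might look like the following. The variable names come from the comments above and the documentation at the top of xpcom/base/nsCycleCollector.cpp; the exact set may vary by revision, and the log directory path here is just an example.

```shell
# Ask the cycle collector to dump CC and GC logs at shutdown
# (documented at the top of xpcom/base/nsCycleCollector.cpp).
export MOZ_CC_LOG_SHUTDOWN=1

# Where the cc-edges.* / gc-edges.* log files should be written
# (example path; pick any writable directory).
export MOZ_CC_LOG_DIRECTORY=/tmp/cc-logs
mkdir -p "$MOZ_CC_LOG_DIRECTORY"

# Reproduce the leaking configuration from the STR in comment 14.
./mach crashtest --e10s
```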
Assignee
Comment 17•9 years ago
I can do that. All we have to do is copy the logs to MOZ_UPLOAD_DIR and mozharness will take care of the rest. Do you know where the log files get saved? I could hack up a patch and push it for you.
Assignee
Comment 18•9 years ago
Ah, MOZ_CC_LOG_DIRECTORY. I'll do a quick push now.
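The "copy the logs to MOZ_UPLOAD_DIR" step could be sketched like this. This is a hypothetical helper, not the actual harness change that landed; it only assumes the two environment variables named in the comments above.

```python
import glob
import os
import shutil

def upload_cc_logs(log_dir=None, upload_dir=None):
    """Copy CC/GC log files into MOZ_UPLOAD_DIR so mozharness publishes
    them as job artifacts. Hypothetical sketch; the real harness may
    hook this up differently."""
    log_dir = log_dir or os.environ.get("MOZ_CC_LOG_DIRECTORY")
    upload_dir = upload_dir or os.environ.get("MOZ_UPLOAD_DIR")
    if not log_dir or not upload_dir:
        return []  # logging or uploading not configured; nothing to do
    copied = []
    # The shutdown logs are written as cc-edges.* / gc-edges.* files.
    for path in glob.glob(os.path.join(log_dir, "*.log*")):
        shutil.copy(path, upload_dir)
        copied.append(os.path.basename(path))
    return sorted(copied)
```

Anything landing in MOZ_UPLOAD_DIR then shows up under "Job Details" on the finished try job, which is exactly how the logs were retrieved a few comments later.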
Assignee
Comment 19•9 years ago
I *think* this should do the trick:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=646cd901bc5d
Once the job is finished you should be able to click it and see a link to the logs in the "Job Details" pane at the bottom.
Comment 20•9 years ago
https://reviewboard.mozilla.org/r/35635/#review34799
::: layout/tools/reftest/bootstrap.js:14
(Diff revision 1)
> + win = win.QueryInterface(Ci.nsIInterfaceRequestor).getInterface(Ci.nsIDOMWindow);
Doesn't this create a global var called 'win' whose lifetime is bound by the life of the component? Which means -- we'll leak everything in some cases?
Assignee
Comment 21•9 years ago
https://reviewboard.mozilla.org/r/35635/#review34799
> Doesn't this create a global var called 'win' whose lifetime is bound by the life of the component? Which means -- we'll leak everything in some cases?
Does it? It's a function argument. At least redeclaring it is an error.
Assignee
Comment 22•9 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #19)
> I *think* this should do the trick:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=646cd901bc5d
Oops, think I forgot quotes around the "1", here's a new one:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d143f8520145
Assignee
Comment 23•9 years ago
Hey Andrew, the previous try run has the CC and GC logs attached. Click one of the orange jobs, then click "Job Details" on the bottom pane and you'll see them. They're gzipped, so run them through gunzip if you download them locally.
Is this what you need? Is there any other information that would be helpful?
Flags: needinfo?(continuation)
Comment 25•9 years ago
In one log I looked at, we're leaking an inner and outer nsGlobalWindow. That window is being held alive by a nsXPCWrappedJS (nsIBrowserDOMWindow) with a missing reference:
1B275D88 [nsXPCWrappedJS (nsIBrowserDOMWindow)]
--[mJSObj]--> 17625CA0 [JS Object (Object)]
--[global]--> 1716A040 [JS Object (Window)]
--[UnwrapDOMObject(obj)]--> 11F5B800 [nsGlobalWindow #6 inner chrome://browser/content/browser.xul]
nsGlobalWindow has a field with that type, but it is reported to the CC. The only other field I see with that type is TabParent::mBrowserDOMWindow. TabParent isn't cycle collected, and we are leaking a TabParent.
In IRC, Olli said we might be able to clear this field in TabParent::Destroy(), but we're probably leaking a TabParent anyways. I don't know how helpful this is to fixing the leak.
I'll try looking at the patch some more.
Comment 26•9 years ago
I noticed that shutdown() in bootstrap.js isn't doing anything on non-Android platforms any more. Is the listener removal etc. being done somewhere else?
Assignee
Comment 27•9 years ago
Prior to this patch non-Android didn't use bootstrap.js to begin with. The listener you're talking about is set up behind a similar if (OS == "android") condition. I *think* all the listeners I've added/modified should be cleaned up properly.
I'm wondering if there's something wrong in reftest.js or reftest-content.js that wasn't modified by this patch, but only starts failing because it is now being loaded differently.
Comment 28•9 years ago
Here are the changes to just bootstrap.js, with the Android and B2G specific code removed and the style fixes reverted.
Comment 29•9 years ago
The leaked window is browser.xul if that helps. Probably not.
Assignee
Comment 30•9 years ago
Actually, it does give me an idea. The original window that's being closed is a browser.xul. And this patch also enables marionette, so maybe marionette is hanging on to a reference of that window.
Here's a try run that simply leaves that initial window open for the duration of the test run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b60245728f82
Comment 31•9 years ago
The networking leak might be different. That one at least makes a little more sense, as the IO service involves networking. I'm doing a try push to see if we are maybe poking at it later in shutdown.
Assignee
Comment 32•9 years ago
That last patch seems to have fixed the linux debug leaks at least, I guess we'll wait for windows. This change means there will be an extra window hanging around for the duration of the test run, but I guess as long as it doesn't impact the test results, then it isn't really a big deal.
Assignee
Comment 33•9 years ago
That last try run seems to have fixed the leak. And while it isn't perfect (as there's now a random window being left open during the test run), it seems to work and doesn't seem to impact the tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4903fb40eb21
So I'll get what I have reviewed and landed ASAP, and file a follow-up bug to look into why there are memory leaks when we close that window and whether or not it has something to do with marionette.
Thanks for helping dig into this Andrew.
Assignee
Comment 34•9 years ago
So after fixing the memory leak, I was about to upload the patch when I realized it was disabling the non-local network requirement (something I had done a long time ago in order to make some quick progress and then forgot about). Of course, re-enabling that turned all the jobs red :).
But I figured out what prefs needed to be set and we seem to be good to go once more:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=48e61b59a9ad
Patch shortly.
Assignee
Comment 35•9 years ago
Comment on attachment 8721320 [details]
MozReview Request: Bug 1245092 - Install reftest and specialpowers extensions at runtime via AddonManager.installTemporaryAddon, r?jgriffin
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/35635/diff/1-2/
Attachment #8721320 -
Attachment description: MozReview Request: Bug 1245092 - Install reftest and specialpowers extensions at runtime via AddonManager.installTemporaryAddon → MozReview Request: Bug 1245092 - Install reftest and specialpowers extensions at runtime via AddonManager.installTemporaryAddon, r?jgriffin
Attachment #8721320 -
Flags: review?(jgriffin)
Comment 36•9 years ago
Comment on attachment 8721320 [details]
MozReview Request: Bug 1245092 - Install reftest and specialpowers extensions at runtime via AddonManager.installTemporaryAddon, r?jgriffin
https://reviewboard.mozilla.org/r/35635/#review36029
::: testing/marionette/driver/marionette_driver/marionette.py:1150
(Diff revision 1)
> + self.wait_for_port(timeout=timeout)
I'm not sure what the implications are of removing this wait_for_port; are we sure it's harmless?
Attachment #8721320 -
Flags: review?(jgriffin) → review+
Assignee
Comment 37•9 years ago
https://reviewboard.mozilla.org/r/35635/#review36029
> I'm not sure what the implications are of removing this wait_for_port; are we sure it's harmless?
I think you're looking at the interdiff rather than the full diff. This change is actually already in-tree, it landed as part of the mochitest patch. Also it was moving it up a line rather than removing it.
Comment 38•9 years ago
Comment 39•9 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #37)
> https://reviewboard.mozilla.org/r/35635/#review36029
>
> > I'm not sure what the implications are of removing this wait_for_port; are we sure it's harmless?
>
> I think you're looking at the interdiff rather than the full diff. This
> change is actually already in-tree, it landed as part of the mochitest
> patch. Also it was moving it up a line rather than removing it.
Yes, you're right, I was; thanks for the clarification.
Comment 40•9 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Assignee
Comment 41•9 years ago
bugherder uplift
status-firefox47:
--- → fixed
Assignee
Comment 42•9 years ago
bugherder uplift
status-firefox46:
--- → fixed
Assignee
Comment 43•9 years ago
There was some non-local network bustage on the beta patch that didn't happen on my beta-based try push. I believe this is the same problem I hit with the mochitest patch, where there was a pref that was only being configured on beta/release, but nowhere else. Here is my attempt to fix it:
https://hg.mozilla.org/releases/mozilla-beta/rev/f902121e4d1b
Note: I'm not sure how to test this, seeing as I can't reproduce locally or on try, so this is a bit of a shot in the dark.
Comment 44•9 years ago
Assignee
Comment 45•9 years ago
I forgot to uplift that non-local network fix to aurora/central. Here is the aurora change:
https://hg.mozilla.org/releases/mozilla-aurora/rev/faa65506b112a63784802378f8fa5610605c96b0
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 47•9 years ago
Thanks, the Try simulations are much happier now :)
Comment 48•9 years ago
bugherder
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Updated•9 years ago