Closed
Bug 1345284
Opened 8 years ago
Closed 2 years ago
Intermittent dom/promise/tests/test_promise_argument.html | application crashed [@ RunWatchdog]
Categories
(Core :: DOM: Core & HTML, defect, P3)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Depends on: RunWatchdogShutdownhang
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 12•7 years ago
|
||
we are working to run all our tests on ubuntu 16.04, and when running on try we found this error 33% of the time:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d191df67e0a14bbde01312a28d827587fa9e8fb&filter-searchStr=linux%20debug%20mochitest-e10s-4
:overholt, could we get some help in debugging this?
Blocks: 1319782
Flags: needinfo?(overholt)
Comment 14•7 years ago
|
||
Wow, I hadn't seen this intermittent before.
Andrew, is there anyone who could help give me a bit more background on the nature of this failure? Looking at the logs, I see test_webassembly_compile.html starting and finishing successfully, and it seems like the failure happens when the subsequent browser shutdown times out. Since many other tests run in the same test run / browser instance, it seems like any of them could be causing to the shutdown hang. So one question is: how does the test harness choose who to blame? I'm guessing test_webassembly_compile.html is being blamed b/c it was the last test to run (due to it going alphabetically last), but maybe someone more familiar with the harness could explain if there is a more sophisticated mechanism?
Flags: needinfo?(luke) → needinfo?(overholt)
Comment 16•7 years ago
|
||
Maybe Joel knows the harness and/or blaming mechanism (I don't, sorry)?
Flags: needinfo?(overholt) → needinfo?(jmaher)
Comment 17•7 years ago
|
||
oh, this is my fault- I overlooked something here and see that this is a crash on shutdown. We can bisect this and figure out which test(s) are responsible for the shutdown crash, I will get back tomorrow with more actionable data.
Flags: needinfo?(jmaher)
Updated•7 years ago
|
Flags: needinfo?(luke)
Comment 18•7 years ago
|
||
coming back to this, I realize that one of the last 2 tests in the mochitest.ini are causing the problem:
https://dxr.mozilla.org/mozilla-central/source/dom/promise/tests/mochitest.ini#19
this leaves:
test_promise_callback_retval.html
test_promise_argument.html
you can see a try push here:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&selectedJob=103704658
I looked at the last two and commenting one out at a time, I see the crash take place- so something in these tests are causing the problem, and it doesn't help that they run at the end of the suite.
Does this help narrow it down? Are there more things you would like help in narrowing this down?
Flags: needinfo?(luke)
Comment 19•7 years ago
|
||
Thanks! Looks like wasm is off the hook, so forwarding ni? back to Andrew.
Flags: needinfo?(luke) → needinfo?(overholt)
Comment hidden (Intermittent Failures Robot) |
Comment 21•7 years ago
|
||
I ran a little test with changing where 'SimpleTest.waitForExplicitFinish()' is called at the top of the script and it seems to have affected the error without changing how often it comes up: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2de2c9438fe0a07fed2d63114ad5eff98033bca2
Comment 22•7 years ago
|
||
I've found a solution or the root cause of this problem, you can see a test run at [1]. It seems to be happening because of the forced garbage collection that this test does. I also don't see why it is necessary, and in my opinion, we can get rid of it.
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cfb03725715d45e789dad1b69f1fca5c6cfc6fd0
Comment 23•7 years ago
|
||
:luke, I see you authored this test, could you weigh in with your thoughts on what :gmierz is proposing, likewise maybe look at the code in the above try push to see if that is reasonable or offer up other suggestions?
Flags: needinfo?(luke)
Comment 24•7 years ago
|
||
If we're still only talking about the tests in comment 18 (test_promise_callback_retval.html and test_promise_argument.html), then hg says bz is the author, so forwarding ni? to him.
Flags: needinfo?(luke) → needinfo?(bzbarsky)
Summary: Intermittent dom/promise/tests/test_webassembly_compile.html | application crashed [@ RunWatchdog] → Intermittent dom/promise/tests/test_promise_argument.html | application crashed [@ RunWatchdog]
Comment 25•7 years ago
|
||
So instead of removing the forced garbage collection, I've made it so that the PrefEnv changes made by the two test mentioned in comment 18 are removed after they finish. So far I haven't seen any failures and I am retrying 10 more times at [1].
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9aa868009850487e772b02c22ac2277d2cff2b18
Comment 26•7 years ago
|
||
:luke, can you explain to me why the call to SpecialPowers.gc() is needed in this test? Thanks!
Updated•7 years ago
|
Flags: needinfo?(luke)
Comment 27•7 years ago
|
||
Comment 23 was not about the promise tests.
As far as the promise tests go... I don't see anything in there offhand that should affect shutdown, except maybe the SpecialPowers usage?
Flags: needinfo?(bzbarsky)
Comment 28•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] (if a patch has no decent message, automatic r-) from comment #27)
Unless I'm missing something, comment 18 points out two promise tests that alone cause this hang.
Flags: needinfo?(luke)
Comment 29•7 years ago
|
||
I can't make sense of comment 18 (and in particular, I don't see how it corresponds to the try run it links to). If there are details, I would love to know what they are.
Comment 22 is about a change solely to the wasm test that makes the problem go away and comment 23 is a question about that change to the wasm tests.
So far it sounds to me like some combination of all these tests (or at least of gc and "something") is causing the problem. How I wish we could just record in rr on infra.... :(
Updated•7 years ago
|
Flags: needinfo?(overholt)
Comment 30•7 years ago
|
||
Ah, sorry for the noise, I hadn't read the patches in the try push; I had assumed they were for the two tests mentioned in comment 18. I think I added the System.gc() just to to attempt to reveal and catch bugs: basically to ensure the rooting I did for wasm compilation promises was done correctly. Moving the waitForExplicitFinish() to the top seems perfectly fine; does that solve the issue or is it necessary to remove System.gc()? Are there any known problems with calling System.gc()?
Comment 31•7 years ago
|
||
Not a problem :)
Sorry for not removing that line, it didn't help with anything but it did change the error: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2de2c9438fe0a07fed2d63114ad5eff98033bca2
Removing the gc() line is all that was needed to fix the problem. Comment 25 goes a step further and fixes the two tests mentioned in comment 18 by reseting the pushPrefEnv change that they make. I'm not sure of any known problems, but it looks like there's one here.
Comment 32•7 years ago
|
||
Or so far so good for the fix in comment 25, I'm going to run another ~20 tests on it to make sure.
Comment 33•7 years ago
|
||
> by reseting the pushPrefEnv change that they make.
The whole point of pushPrefEnv is that it gets undone automatically when your test ends. So what you're adding is more pushes that then should all get unwound.
Specifically, see the SpecialPowers.flushPrefEnv call that happens inside SimpleTest.finish.
Comment 34•7 years ago
|
||
Thanks for the information and clarification :bz. It did fail eventually also. I now think it's because of where the variables are declared, see [1]. I declared the variables that were within the functions, just outside of them, and after 45 test runs there are still no errors. So what I think is happening here is that the garbage collector destroys those variables before they can be used. This also means that there are no problems with gc().
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e0f394490a9ade0931ae5abada29cdf76bcd42b1
Comment hidden (mozreview-request) |
Comment 36•7 years ago
|
||
mozreview-review |
Comment on attachment 8875326 [details]
Bug 1345284 - Move variable declarations in test_promise_argument.html, and test_promise_callback_retval.html.
https://reviewboard.mozilla.org/r/146744/#review150856
thanks for fixing this.
Attachment #8875326 -
Flags: review?(jmaher) → review+
Comment 37•7 years ago
|
||
Greg, I don't understand that patch. What exactly do you think is being destroyed and at what point? globalWrapper is only touched right after the getGlobalForObject call right there; why would it need to be scoped to anything outside that function?
Similarly, why do we care what "func" is scoped to? It's not used after the passThroughCallbackPromise call returns.
Comment 38•7 years ago
|
||
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ab4f6b6cd8dd
Move variable declarations in test_promise_argument.html, and test_promise_callback_retval.html. r=jmaher
Comment 39•7 years ago
|
||
Or put another way, I don't see why that patch should change any behavior at all.
Comment 40•7 years ago
|
||
And please consider that r- from a module peer... :(
Comment 41•7 years ago
|
||
:bz, I was looking through some common things that cause oranges and found this: https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding_intermittent_oranges#Using_objects_without_accounting_for_the_possibility_of_their_death
Which is about the same as what these tests are doing.
In my mind, what happens is that because the gc is called in the other test, the garbage collector gets called again once we get to these tests and unfortunately it happens right between those two lines of code. When we don't force the gc in the other test, then the garbage collection happens elsewhere most often (either before or after those function calls) and that is why it severely reduces the frequency of occurrence. Does this explanation help?
Comment 42•7 years ago
|
||
> and found this: https://developer.mozilla.org/en-US/docs/Mozilla
Right, that's very specific to objects which you expect to call bacak to you later but are not owned by anyone other than yourself. That's not the situation in this case.
> because the gc is called in the other test, the garbage collector gets called again once we get to these
> tests
That's possible, sure, in that it can affect gc scheduling in general...
> and unfortunately it happens right between those two lines of code
Between what two lines of code?
> Does this explanation help?
Sorry, no.
Comment 43•7 years ago
|
||
Sorry, I meant between these two lines: https://dxr.mozilla.org/mozilla-central/source/dom/promise/tests/test_promise_argument.html?q=test_promise_argument.html&redirect_type=direct#19-20
Do you think this is another problem, and if so, what do you think it is?
Comment 44•7 years ago
|
||
> Sorry, I meant between these two lines:
Why would it matter if a gc happens between those two lines? At that point, globalWrapper is clearly alive and referenced...
> Do you think this is another problem, and if so, what do you think it is?
I have no idea what the problem is, but it's not because globalWrapper dies while there's a live variable pointing to it!
Stepping back for a moment, the fundamental problem here is a shutdown hang, right? Which part of shutdown is hanging? Have we figured _that_ out?
Comment 45•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•7 years ago
|
Assignee: nobody → gmierz2
Comment 46•7 years ago
|
||
Sure looks like this bug needs to be open still given the recent comments.
Status: RESOLVED → REOPENED
status-firefox56:
--- → affected
Resolution: FIXED → ---
Target Milestone: mozilla55 → ---
Comment 47•7 years ago
|
||
Thank you for reopening this. I spent some time looking at the logs to see if there is anything that stands out during shutdown and I noticed that when the test fails, it uses another DOM window (with a different PID) just before the test ends, and you can see this at [1]. [2] is a passing test, and in there, there isn't a change in the DOM window's PID right before the end of 'test_webassembly_compile.html'. Furthermore, in [1], the shutdown hangs as it tries to close a DOCSHELL with the same PID as the one that is used for the DOM window, at the end of the test, in [1]. I haven't been able to find another passing test with this difference yet though.
One question I have is where would I find the code that deals with the shutdown phase?
[1]: https://pastebin.mozilla.org/9024477
[2]: https://pastebin.mozilla.org/9024479
Comment 48•7 years ago
|
||
Probably in xpcom/build/XPCOMInit.cpp in general...
.
Updated•7 years ago
|
Priority: -- → P2
Updated•7 years ago
|
Blocks: RunWatchdogShutdownhang
No longer depends on: RunWatchdogShutdownhang
Comment 50•6 years ago
|
||
:overholt, I don't think I'll have a chance to hack on this in the near future - you can assign someone else on this bug if needed. I'm happy to help answer any questions they may have though.
Flags: needinfo?(gmierz2)
Comment 51•6 years ago
|
||
OK, thanks, Greg.
Edgar, can you please take a look when you have a minute? Thanks.
Assignee: gmierz2 → echen
Flags: needinfo?(echen)
Assignee | ||
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
Updated•5 years ago
|
Priority: P2 → P3
Updated•4 years ago
|
Assignee: echen → nobody
Flags: needinfo?(echen)
Updated•2 years ago
|
Severity: normal → S3
Comment 52•2 years ago
|
||
I do not see any instances of this intermittent any more. But I do see that those tests are enabled only in DEBUG, apparently since they landed (so not as a reaction on the intermittent). Filed bug 1799820 to verify, this can be closed then.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 2 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•