Closed Bug 1172468 Opened 9 years ago Closed 9 years ago

Intermittent browser_parsable_script.js | application terminated with exit code 11

Categories

(Core :: General, defect)

Platform: Unspecified
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
mozilla41
Tracking Status
firefox39 --- unaffected
firefox40 --- fixed
firefox41 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: Gijs)

References

(Depends on 1 open bug, )

Details

(Keywords: intermittent-failure)

mozilla-inbound_ubuntu32_vm_test_pgo-mochitest-browser-chrome-1
https://treeherder.mozilla.org/logviewer.html#?job_id=10525639&repo=mozilla-inbound

03:47:35 WARNING - TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_parsable_script.js | application terminated with exit code 11
Depends on: 1172193
No longer depends on: 1172193
Depends on: 1172193
Peachy, bug 1164014 was uplifted to Aurora today, so now this is hitting there as well. Where do things stand with this? Leaving tests basically permafailing is not acceptable.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #196)
> Peachy, bug 1164014 was uplifted to Aurora today, so now this is hitting there as well. Where do things stand with this? Leaving tests basically permafailing is not acceptable.

The test was already intermittent in 2-3 different ways and failed quite frequently. I would argue that it should be turned off and the underlying problems fixed, and then turned on again (like all the other intermittent tests). A randomly failing test that has nothing to do with the work I'm trying to get done should not become a blocker for me. If I could make the call, I would turn it off and file a follow-up on the GC bug. But if folks disagree, then let me know and we can back out my patches from Aurora, and I'll explain to people why Developer Edition must keep being painfully slow for them.
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(ryanvm)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #210)
> The test was already intermittent in 2-3 different ways and failed quite frequently.

Evidence for that? I don't recall seeing this test fail anywhere near as often as it does now before bug 1164014 landed.

> I would argue that it should be turned off and the underlying problems fixed, and then turned on again (like all the other intermittent tests). A randomly failing test that has nothing to do with the work I'm trying to get done should not become a blocker for me. If I could make the call, I would turn it off and file a follow-up on the GC bug.

Decisions like that should be made before uplifting the regressing patch and spreading the failures around even more, and in consultation with the relevant test owners (which, AFAICT, hasn't happened anywhere).

> But if folks disagree, then let me know and we can back out my patches from Aurora, and I'll explain to people why Developer Edition must keep being painfully slow for them.

I don't think guilt-tripping is an overly productive addition to this discussion.
Flags: needinfo?(ryanvm)
I plan to disable the test by end of day, given that it is essentially permafailing here and there has been no productive discussion towards getting it fixed.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(gijskruitbosch+bugs)
So the test loads all the JS we ship and checks that it does not produce parse errors. If we're triggering OOM here, there are several ways to work around it. It seems bug 1172193 comment #0 already identified at least one, to wit:

> If I artificially call scheduleGC(this) after each Reflect.parse call from the test, then everything is good.

That seems like a workable solution, though it might need to be combined with a requestLongerTimeout if the GCs take enough time to push the test's runtime past the limit for our individual tests. I don't have time to write up the patch until either late tonight or potentially tomorrow, and I'd like confirmation from the JS folks that such a workaround works and/or is acceptable, considering they've not fixed bug 1172193 yet.

I'm also confused because the offending patches landed with a supposed test fix that moved the Reflect calls into the same zone, which should be avoiding the GC issues already. Do we know why that fix is not "good enough" to avoid the issue?
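For concreteness, a minimal sketch (assumptions only, not the actual patch) of what that workaround could look like. gShippedScriptURIs and fetchScriptSource() are hypothetical names standing in for however the real test enumerates and reads the shipped scripts, and Cu.schedulePreciseGC() is used here as a stand-in for the scheduleGC(this) testing call quoted above:

// Sketch only; helper names are made up, see the note above.
var Cu = Components.utils;
Cu.import("resource://gre/modules/reflect.jsm"); // provides Reflect.parse

add_task(function* () {
  // Per-script GCs add wall-clock time, so ask the harness for more than
  // the default per-test time limit up front.
  requestLongerTimeout(2);

  for (let uri of gShippedScriptURIs) {
    let source = yield fetchScriptSource(uri); // hypothetical async helper
    try {
      Reflect.parse(source);
      ok(true, "Script parses: " + uri);
    } catch (ex) {
      ok(false, "Parse error in " + uri + ": " + ex);
    }
    // Let the GC reclaim the parse data before the next (potentially very
    // large) script is parsed, instead of letting it pile up until OOM.
    yield new Promise(resolve => Cu.schedulePreciseGC(resolve));
  }
});

(On the "same zone" fix mentioned above: that presumably means doing the parsing in a sandbox created with the sameZoneAs option of Cu.Sandbox, so the parse data lives in the test's own zone; whether the actual patch does exactly that is an assumption here.)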
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #222)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #210)
> > The test was already intermittent in 2-3 different ways and failed quite frequently.
>
> Evidence for that? I don't recall seeing this test fail anywhere near as often as it does now before bug 1164014 landed.

I did quite a few try runs, and based on those it seemed to me that the failure rate went up from roughly 10% to 25%. I have not seen this particular failure in around 50 runs or more, so I have no idea how it became perma-orange. In fact, if I had seen it even once I would not have pushed to m-c at all; I clearly overlooked it. I wonder if other patches made it worse. I thought this was about one of the two other intermittent failures that belong to this test, but having double-checked now, the other frequent intermittent was actually from browser_social_activation.js, and the remaining one was bug 1123438. Because it was not backed out and I had seen progress on the GC bug, I found that acceptable (I was hoping it would get better soon once the related GC bug was fixed, or by turning the test off). I was not aware that it had become almost perma-orange, especially not with this new OOM failure! So I made a second mistake by assuming too much instead of communicating it. Sorry about that.

> > I would argue that it should be turned off and the underlying problems fixed, and then turned on again (like all the other intermittent tests). A randomly failing test that has nothing to do with the work I'm trying to get done should not become a blocker for me. If I could make the call, I would turn it off and file a follow-up on the GC bug.
>
> Decisions like that should be made before uplifting the regressing patch and spreading the failures around even more, and in consultation with the relevant test owners (which, AFAICT, hasn't happened anywhere).
>
> > But if folks disagree, then let me know and we can back out my patches from Aurora, and I'll explain to people why Developer Edition must keep being painfully slow for them.
>
> I don't think guilt-tripping is an overly productive addition to this discussion.

Sorry if that sounded passive-aggressive; that was not intentional at all. It's just a fact that we have to keep in mind. If this crash is more serious than that, backing out can absolutely be the right thing to do; I did not expect anyone to feel guilty about anything here. I do wonder whether there is any change in OOM crash frequency that would be a reason to back out my patch, however painful that would be.

(In reply to :Gijs Kruitbosch from comment #224)
> I'm also confused because the offending patches landed with a supposed test fix that moved the Reflect calls into the same zone, which should be avoiding the GC issues already. Do we know why that fix is not "good enough" to avoid the issue?

No. It worked fine when I pushed it to try and tested quite thoroughly. The other intermittent failures became slightly more frequent, but I could not reproduce this crash. And I'm afraid that just backing out my patch will not fix this issue.
Flags: needinfo?(gkrizsanits)
(In reply to :Gijs Kruitbosch from comment #224)
> > If I artificially call scheduleGC(this) after each Reflect.parse call from the test, then everything is good.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0619e89a62c
Does that run include PGO? That's where the vast majority of the failures are happening. https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #235)
> Does that run include PGO? That's where the vast majority of the failures are happening.
> https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build

Nope, I had not even heard of that, thanks!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf8a44565b91
scheduleGC does not seem to work in the PGO build for some reason; I think it's something e10s-related. I tried doing manual CC and GC calls instead, but that did not help either. The PGO build does indeed seem to be perma-orange, even with the added CC/GC calls:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=88b38eb764b6

The regular build seems to be green. I have no idea what that means, but it might help with fixing the underlying GC issue. I checked crash-stats for any major OOM crashes but could not find anything related, so it seems like only this test is affected, and only in the PGO build. I'm out of ideas here. I feel like we should turn this test off and file a follow-up bug, dependent on bug 1172193, to turn it back on.
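For reference, a minimal sketch of what the "manual CC and GC" attempt could look like in a browser-chrome test; Cu.forceGC() and Cu.forceCC() are the usual chrome APIs for this, but exactly where the calls were placed in the try pushes above is an assumption:

// Hypothetical placement: run a full GC followed by a cycle collection in
// the chrome process after each Reflect.parse call. The real try push may
// have inserted these calls elsewhere.
function collectAfterParse() {
  var Cu = Components.utils;
  Cu.forceGC(); // synchronous full garbage collection
  Cu.forceCC(); // synchronous cycle collection
}

If the allocation pressure were actually in a different process (the e10s guess above), a chrome-process GC like this would not touch the relevant heap, which might explain why it made no difference; that is speculation, though.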
Can we just skip-if this for pgo linux? :-\
(In reply to :Gijs Kruitbosch from comment #246)
> Can we just skip-if this for pgo linux? :-\

Yeah, that's what I meant by turning it off. Actually, I was planning to do it for Linux opt 32-bit, but PGO-only sounds like an even better idea. Do we have a specific flag for that somewhere?
No, linux32 opt is the only option we've got. The below should work:

skip-if = (os == 'linux' && !debug && (bits == 32)) # Bug 1172468
https://hg.mozilla.org/releases/mozilla-aurora/rev/c30eddaafa83

Going to let this resolve under the assumption that fixing & re-enabling will occur in bug 1172193.
Assignee: nobody → gijskruitbosch+bugs
Flags: needinfo?(wmccloskey)
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41