Closed Bug 566049 Opened 15 years ago Closed 12 years ago

Intermittent failure in test_crashing2.html (window.onerror An error occurred - uncaught exception NS_ERROR_FAILURE)

Categories

(Core Graveyard :: Plug-ins, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: mossop, Unassigned)

References

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files, 1 obsolete file)

This started failing today it seems: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273865513.1273865980.11501.gz Linux mozilla-central opt test mochitests-5/5, Started 15:31, finished 15:41, took 10mins http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273870984.1273872286.8663.gz WINNT 5.2 mozilla-central debug test mochitests-5/5, Started 17:03, finished 17:25, took 23mins http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273871540.1273872068.7785.gz WINNT 5.2 mozilla-central opt test mochitests-5/5, Started 17:12, finished 17:23, took 12mins http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273872842.1273873606.14988.gz Linux mozilla-central opt test mochitests-5/5, Started 17:34, finished 17:47, took 14mins http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273872773.1273873900.16213.gz WINNT 5.2 mozilla-central debug test mochitests-5/5, Started 17:32, finished 17:52, took 20mins
Keywords: regression
There were a bunch of these starred as bug 541707 before this was filed. The earliest push I see in which this happened was: http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=f7560e802592 However, I didn't look at any pushes more than one or two prior to that because tinderboxpushlog reports "Parsing tinderbox failed" I think the most likely cause is: http://hg.mozilla.org/mozilla-central/rev/6a9d0521a319 which was two pushes prior to that one.
Josh: can you look at this? Thanks!
This test was last modified on May 15th by bug 561596, Dave Yeo - can you please take a look?
Assignee: nobody → daveryeo
(In reply to comment #11) > This test was last modified on May 15th by bug 561596, Dave Yeo - can you > please take a look? http://hg.mozilla.org/mozilla-central/rev/6f0fe1119d4b doesn't modify any test. This bug was filed on the 14th, but the patch in bug 561596 didn't land until the 15th.
I took my information from: http://hg.mozilla.org/mozilla-central/annotate/6f0fe1119d4b/modules/plugin/test/mochitest/test_crashing2.html apologies if I misinterpreted, but that seems to indicate the bug I mentioned.
(That link is what "HG Blame" links to for the file, fwiw)
(In reply to comment #13) > I took my information from: > > http://hg.mozilla.org/mozilla-central/annotate/6f0fe1119d4b/modules/plugin/test/mochitest/test_crashing2.html > > apologies if I misinterpreted, but that seems to indicate the bug I mentioned. That was just telling you what was the tipmost revision at the time you looked (independent of whether it touched that file). (I'm not saying the UI makes sense, I'm just saying that's what the information you were looking at was trying to tell you.)
The test hasn't changed since January: http://hg.mozilla.org/mozilla-central/log/6f0fe1119d4b/modules/plugin/test/mochitest/test_crashing2.html I think this is most likely related to josh's removal of the default plugin. Josh, could you look into this?
Assignee: daveryeo → joshmoz
I'm looking into it.
I wasn't able to reproduce but this code in the test appears to be the problem: // The plugin will now crash when it is instantiated, but // that happens asynchronously. HACK: this should use an // event instead of nested pending runnables, but I don't // know of any DOM event that's fired when a plugin is // instantiated. SimpleTest.executeSoon(function() { SimpleTest.executeSoon(function() { ok(p.setColor === undefined, "Plugin should have crashed on creation"); window.frameLoaded = reloaded1; iframe.contentWindow.location.replace('crashing_subpage.html'); }) }); There is a race here, and I suspect the test fails when the race results in "p.setColor" throwing an exception rather than simply an undefined value. Prior to my patch both possible race conditions probably resulted in the undefined value. The difference is probably the fact that "nsObjectLoadingContent::GetTypeOfContent" can now return 'eType_Null' for a crashed plugin, instead of eType_Plugin. The "ok(p.setColor === undefined..." test appears to simply be verifying that the plugin crashed. If that is the case, it probably needs to be a better test. We could simply consider an exception to be an acceptable alternative to the undefined value, either one indicating a crashed plugin. It shouldn't be too hard to fix this problem one way or another. If we can't decide on a solution quickly we should just disable it and file a bug.
.setColor should definitely be undefined eventually: the prototype object should be unhooked when the plugin crashes, and no matter whether it's eType_Plugin or eType_Null there's nothing that should throw an exception by the time we're in a stable state. What exactly is racing, in your scenario? I really believe that the nested .executeSoon, although ugly, should be deterministic.
Attached file manual test 1.0 (deleted) —
I think this HTML test case shows what is going wrong on the tinderbox, but I can't be sure because I can't reproduce with the actual test. First it prints whether or not 'setFocus' is equal to 'undefined' for two plugins, the test plugin and an unknown plugin. It is always false for the test plugin, true for the unknown plugin. The it crashes the test plugin and immediately tests whether 'setFocus' is equal to 'undefined' again. This always throws an exception. WARNING: Failed to send message!: file /Volumes/MozillaBuilds/10_6/ff_193_debug_32/dom/plugins/PluginScriptableObjectParent.cpp, line 219 JavaScript error: file:///Users/josh/Desktop/test.html, line 9: Error calling method on NPObject! After that, you can run the test again by clicking the button on the page. When clicking the button there is no exception, the comparison comes out true every time. This difference has to do with timing or scope, I don't know which one. Strangely, if you navigate away from this test HTML after clicking the button a few times, the browser will crash after about 10 seconds, usually with a bad memory access. This seems like a different bug.
Attachment #445748 - Attachment is patch: false
Attachment #445748 - Attachment mime type: text/plain → text/html
My manual test case shows the same disparity (exception vs. undefined) in builds from before the default plugin patch landed.
Attached patch fix v1.0 (obsolete) (deleted) — Splinter Review
My manual test shows that there is a period after a crash during which the method will not be undefined, accessing it will throw an exception. We should allow either case to indicate a crashed plugin.
Attachment #448683 - Flags: review?(benjamin)
But that's not what we're testing. The test wants to test that the value eventually ends up being undefined (after the PluginCrashed event fully propagates). This just wallpapers over the problem instead of actually figuring out why the dual .executeSoon isn't sufficient.
Attached patch (unlikely) possibility v2.0 (deleted) — Splinter Review
I'm not sure this is really a fix, I can't reproduce the problem, but this patch reverts the only change in my default plugin patch that seems like a potential cause for this bug. If this isn't a fix then the double executeSoon is probably not deterministic - that is the only other potential cause I can identify. Benjamin - why exactly are you convinced that the double executeSoon is deterministic?
Attachment #448683 - Flags: review?(benjamin)
Attachment #448683 - Attachment is obsolete: true
Attachment #449793 - Attachment description: fix v2.0 → (unlikely) possibility v2.0
I don't see why a double executeSoon should be deterministic, and if it isn't then it is possible that the check will run before the PluginCrashed event is handled. That seems like the most likely cause for this bug. Most of the time the double executeSoon should delay the check long enough to avoid failure, which is in line with this failure being intermittent. I think we have 3 options here: 1. Find a deterministic way to make sure the PluginCrashed notification fires and that the method value ends up being undefined after processing. I'm not sure we can do this from content. 2. Get rid of the test on the grounds that it is inherently flawed. 3. Add another executeSoon level and make the bad race condition even less likely to occur. Unfortunate that the race would remain, but realistically this would probably work fine.
Assignee: joshmoz → nobody
Hardware: x86 → All
Argh, I've written a reply twice and it hasn't submitted. It definitely should be deterministic: there aren't any other threads in play, really, and the main thread even order should be: * iframe.contentDocument.body.appendChild(p); ** posts plugininstantiate event * .executeSoon #1 * plugininstantiate event runs, calls NPP_New ** plugin crashes ** ActorDestroy is called ** posts PluginCrashed event * .executeSoon #1 runs, posts .executeSoon #2 * PluginCrashed runs ** calls into nsNPAPIPluginInstance::Stop ** calls into nsJSNPRuntime::OnPluginDestroy(NPP npp) ** unhooks the plugin prototype object * .executeSoon #2 runs, prototype should already be unhooked, .setColor should be undefined
Mass marking whiteboard:[orange] bugs WFM (to clean up TBPL bug suggestions) that: * Haven't changed in > 6months * Whose whiteboard contains none of the strings: {disabled,marked,random,fuzzy,todo,fails,failing,annotated,leave open,time-bomb} * Passed a (quick) manual inspection of bug summary/whiteboard to ensure they weren't a false positive. I've also gone through and searched for cases where the whiteboard wasn't labelled correctly after test disabling, by using attachment description & basic comment searches. However if the test for which this bug was about has in fact been disabled/annotated/..., please accept my apologies & reopen/mark the whiteboard appropriately so this doesn't get re-closed in the future (and please ping me via IRC or email so I can try to tweak the saved searches to avoid more edge cases). Sorry for the spam! Filter on: #FFA500
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: