Closed
Bug 1125377
Opened 10 years ago
Closed 10 years ago
Marionette fails during getWindowHandles with e10s turned on when accessing a CPOW
Categories
(Remote Protocol :: Marionette, defect)
Tracking
(e10s+, firefox38 fixed)
RESOLVED
FIXED
mozilla38
People
(Reporter: chmanchester, Assigned: chmanchester)
References
(Blocks 1 open bug)
Details
(Keywords: pi-marionette-server)
Attachments
(3 files, 1 obsolete file)
This happens intermittently on mn-e10s jobs (bug 1124966), but constantly in the test suite :erahm is working on that ports the Are We Slim Yet tests to use marionette. He provided a test case that reproduces this that I'll upload shortly. Basically, it opens 15 tabs, forces a gc, closes all but one tab, and repeats the process. The second time through the test fails when figuring out how many tabs to close based on marionette.window_handles. In marionette-server.js, the failure occurs when attempting to retrieve a window ID from a CPOW (we check whether the CPOW is null, and then QI it. If fails in QI, not the null check). I can catch it in the browser toolbox debugger, but inspecting the window there crashes the tab.
I'm going to look into this some more, but I'm pretty over-subscribed for the next two weeks. We may need help from the e10s team figuring this one out, and we may need to stop keeping track of listener scripts in marionette via window ids (something we might need to do if CPOWs go away anyways).
Assignee | ||
Comment 1•10 years ago
|
||
This is the test. Run with --e10s to reproduce.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → cmanchester
Status: NEW → ASSIGNED
Assignee | ||
Updated•10 years ago
|
Summary: Marionette fails during get_window_handles when accessing a CPOW → Marionette fails during get_window_handles with e10s turned on when accessing a CPOW
Updated•10 years ago
|
Keywords: ateam-marionette-server
Assignee | ||
Comment 2•10 years ago
|
||
From a quick test, translating use of a CPOW to use messages (a new command or small script to load into each browser we need to identify) effectively works around this.
Assignee | ||
Updated•10 years ago
|
Summary: Marionette fails during get_window_handles with e10s turned on when accessing a CPOW → Marionette fails during getWindowHandles with e10s turned on when accessing a CPOW
Comment 3•10 years ago
|
||
I see the same now for my work on handling chrome windows on bug 1123401. The failure might be a bit different, but it also happens given that the latest nightly has e10s turned on, and we do not disable it anymore by default.
https://travis-ci.org/mozilla/firefox-ui-tests/builds/48325279
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/marionette.py", line 1293, in execute_script filename=os.path.basename(frame[0]))
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/decorators.py", line 36, in _ return func(*args, **kwargs)
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/marionette.py", line 634, in _send_message self._handle_error(response)
File "/home/travis/build/mozilla/firefox-ui-tests/client/marionette/marionette.py", line 689, in _handle_error raise errors.JavascriptException(message=message, status=status, stacktrace=stacktrace)JavascriptException: JavascriptException: NS_ERROR_FAILURE: stacktrace: execute_script @windows.py, line 284 inline javascript, line 1 src: " var win = window.open();"
Blocks: 1123401
Comment 4•10 years ago
|
||
The simplest command to execute for reproduction for me is:
def test_failure(self):
self.marionette.execute_script(""" window.open(); """)
Then it fails with the above mentioned error.
Assignee | ||
Comment 5•10 years ago
|
||
Henrik, this doesn't look like the same issue. I think this issue could be bug 1047603.
No longer blocks: 1123401
Assignee | ||
Comment 6•10 years ago
|
||
Hi Mike. Because this works most of the time, and working around it by sending messages instead of using a CPOW works around it, it seems like this could be an issue with CPOWs and of interest to e10s. Alternatively, this code could be making a dangerous assumption about CPOWs that I need to fix.
I don't expect you're familiar with marionette, but the code that hits it is:
https://hg.mozilla.org/mozilla-central/file/6b29617a738f/testing/marionette/marionette-server.js#l1387
When run as a part of the test case attached to this bug (described in comment 0), calling QueryInterface throws undefined. Do you have a sense of whether this sort of thing should be expected to reliably work?
Thanks in advance, and please let me know if there might be someone else on your team I should contact about this.
Flags: needinfo?(mconley)
Assignee | ||
Comment 7•10 years ago
|
||
When I say "throws undefined", I mean that when we are in a catch block starting with "catch (e) {...", "e" within that block is undefined.
Comment 8•10 years ago
|
||
It's possible that the contentWindow on the content side no longer exists, and you're attempting to access a "dead CPOW". Do you see warnings about dead CPOWs in the browser console when you hit this?
Flags: needinfo?(mconley) → needinfo?(cmanchester)
Assignee | ||
Comment 9•10 years ago
|
||
Thanks, I do have: "JavaScript error: , line 0: Error: operation not possible on dead CPOW" just before the exception. Is there a rule of thumb for safe handling of CPOWs that might be dead, or is it a question of individual callers knowing when the wrapped object must be alive (I'm not sure how to do that in this case, but I imagine it's possible)?
Flags: needinfo?(cmanchester) → needinfo?(mconley)
Comment 10•10 years ago
|
||
(In reply to Chris Manchester [:chmanchester] from comment #9)
> Thanks, I do have: "JavaScript error: , line 0: Error: operation not
> possible on dead CPOW" just before the exception. Is there a rule of thumb
> for safe handling of CPOWs that might be dead, or is it a question of
> individual callers knowing when the wrapped object must be alive (I'm not
> sure how to do that in this case, but I imagine it's possible)?
CPOWs are wonderful and dangerous things. :) While they do the job of transparently proxying synchronous calls to and from a remote object, they do not hold a reference to that object. That means if the object is garbage collected on the content side, the CPOW is now "dead" and cannot be operated upon.
In order for the CPOW to stay alive, something needs to be holding a reference to the thing you want to keep alive in the content process.
I believe you can check the dead-ness of a CPOW with Cu.isDeadWrapper, so that might help you avoid the exception. But if you really need those contentWindows to not be dead, you need something on the content side to hold a reference to them.
Flags: needinfo?(mconley)
Assignee | ||
Comment 11•10 years ago
|
||
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #10)
> (In reply to Chris Manchester [:chmanchester] from comment #9)
> > Thanks, I do have: "JavaScript error: , line 0: Error: operation not
> > possible on dead CPOW" just before the exception. Is there a rule of thumb
> > for safe handling of CPOWs that might be dead, or is it a question of
> > individual callers knowing when the wrapped object must be alive (I'm not
> > sure how to do that in this case, but I imagine it's possible)?
>
> CPOWs are wonderful and dangerous things. :) While they do the job of
> transparently proxying synchronous calls to and from a remote object, they
> do not hold a reference to that object. That means if the object is garbage
> collected on the content side, the CPOW is now "dead" and cannot be operated
> upon.
>
> In order for the CPOW to stay alive, something needs to be holding a
> reference to the thing you want to keep alive in the content process.
>
> I believe you can check the dead-ness of a CPOW with Cu.isDeadWrapper, so
> that might help you avoid the exception. But if you really need those
> contentWindows to not be dead, you need something on the content side to
> hold a reference to them.
Guarding with isDeadWrapper does the trick, thanks! In our case we're using the window IDs just to figure out what windows are around, so avoiding dead ones is probably enough.
Updated•10 years ago
|
Blocks: e10s-harness
tracking-e10s:
--- → +
Assignee | ||
Comment 12•10 years ago
|
||
Things aren't quite as easy to fix as I've said in comment 11, but I have a fix for this based on mapping browsers to outerWindowId's rather than always going through a CPOW (waiting on try results now). I noticed this same feature is being added in bug 1077168, so maybe we can piggyback on that when it lands.
Assignee | ||
Comment 13•10 years ago
|
||
Assignee | ||
Comment 14•10 years ago
|
||
/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.
Pull down these commits:
hg pull review -r d7bb50999c3557513af8a6501b65c41abe53c8ba
Assignee | ||
Comment 15•10 years ago
|
||
I've been working on this concurrently with bug 1096488 (it's sort of a flavor of the same problem, assuming we can always access an outerWindowId safely, and assuming the outerWindowId we're using as a listenerId never changes), but I think this should land first and can race with bug 1077168, which will afford a simplification but isn't strictly needed. The second commit above is erahm's test, it takes about 60 seconds to run but might be a worthy regression test.
Assignee | ||
Comment 16•10 years ago
|
||
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester
/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.
Pull down these commits:
hg pull review -r d7bb50999c3557513af8a6501b65c41abe53c8ba
Attachment #8564438 -
Flags: review?(jgriffin)
Assignee | ||
Comment 17•10 years ago
|
||
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester
/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.
Pull down these commits:
hg pull review -r 0e1246544cb80f6b21b4bf79d08a85b19cac4b19
Updated•10 years ago
|
Attachment #8564438 -
Flags: review?(jgriffin)
Comment 18•10 years ago
|
||
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester
https://reviewboard.mozilla.org/r/3875/#review3115
::: testing/marionette/marionette-server.js
(Diff revision 2)
> register: function BO_register(uid, id) {
With this patch, this method no longer uses the 'id' parameter, so we should adjust it and all call sites.
Very nice, I didn't know about browser.permanentKey.
Assignee | ||
Comment 19•10 years ago
|
||
Assignee | ||
Comment 20•10 years ago
|
||
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester
/r/3877 - Bug 1125377 - Store window ids in a weak map keyed by browser objects to prevent attempts to access dead windows in marionette's getWindowHandles.
/r/3879 - Bug 1125377 - Marionette test case reproducing the issue.
Pull down these commits:
hg pull review -r 643f714fe9b114ca20121e25ae3a5eb8060019da
Attachment #8564438 -
Flags: review?(jgriffin)
Comment 21•10 years ago
|
||
Comment on attachment 8564438 [details]
MozReview Request: bz://1125377/chmanchester
https://reviewboard.mozilla.org/r/3875/#review3267
Ship It!
Attachment #8564438 -
Flags: review?(jgriffin) → review+
Assignee | ||
Comment 22•10 years ago
|
||
Thanks for the review!
https://hg.mozilla.org/integration/mozilla-inbound/rev/23c081755b11
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
status-firefox38:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Assignee | ||
Comment 24•10 years ago
|
||
This took care of the issue in AWSY and the test in comment 1 but not the cause of the intermittents. The failure is at the point we go to access the contentWindowAsCPOW of a newly opened window so reopening. Bug 1077168 will allow us to get our window ID without using a CPOW at all, so that should take care of this.
Short of that we could just try/catch around that point, which is unfortunate, but this really seems like a timing quirk that needs to be worked around.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 25•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 26•9 years ago
|
||
Attachment #8564438 -
Attachment is obsolete: true
Attachment #8619216 -
Flags: review+
Attachment #8619217 -
Flags: review+
Assignee | ||
Comment 27•9 years ago
|
||
Assignee | ||
Comment 28•9 years ago
|
||
Updated•2 years ago
|
Product: Testing → Remote Protocol
You need to log in
before you can comment on or make changes to this bug.
Description
•