Closed
Bug 1290186
Opened 8 years ago
Closed 8 years ago
Intermittent test_end for test_windows.py TestBaseWindow.test_open_close logged while not in progress. Logged with data: {"status": "ERROR", "extra": {"class_name": "test_windows.TestBaseWindow", "method_name": "test_open_close"}
Categories
(Testing :: Firefox UI Tests, defect)
Tracking
(firefox50 fixed, firefox51 fixed)
RESOLVED
FIXED
mozilla51
People
(Reporter: intermittent-bug-filer, Assigned: mconley)
References
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
Comment hidden (Intermittent Failures Robot) |
Comment 2•8 years ago
|
||
I ran the test 20 times locally but no luck to reproduce.
I saw WriteMinidump() in the stack and thought about bug 1267329, but it doesn't seem the reason of the crash.
Comment 3•8 years ago
|
||
04:29:24 INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0x2000E,name=PAPZ::Msg_Destroy) Channel error: cannot send/recv
I am not sure, but does this mean somehow the message is sent too late?
Flags: needinfo?(bugmail)
Comment 4•8 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #2)
> I saw WriteMinidump() in the stack and thought about bug 1267329, but it
> doesn't seem the reason of the crash.
See bug 1289243 for the originally filed bug but which now gets hidden on Treeherder.
When I try to run our tests with Ubuntu 16.04 its a permanent failure as it looks like:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d007a1512b5&selectedJob=24777458
Maybe I should grab a one click loaner and check it.
Updated•8 years ago
|
Comment 5•8 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #3)
> 04:29:24 INFO - ###!!! [Parent][MessageChannel] Error:
> (msgtype=0x2000E,name=PAPZ::Msg_Destroy) Channel error: cannot send/recv
>
> I am not sure, but does this mean somehow the message is sent too late?
Yeah, it means the message is sent after the channel is torn down. However this could happen for any number of reasons, like if the child process crashes then the parent process might still be in the process of sending it messages which would fail with this output.
Flags: needinfo?(bugmail)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•8 years ago
|
Blocks: 1261842
Keywords: regression
Comment 11•8 years ago
|
||
Mike, I checked Treeherder for this failure and indeed the first time it appears is for the landing of your patch! See this link:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=Firefox%20UI&fromchange=84b5a1027550&tochange=bda9fd505371&selectedJob=913100
Sadly there were infra issues for this push. So I just retriggered some more tasks to check if we can reproduce.
Comment 12•8 years ago
|
||
I checked this locally and I see dozen of crashes which happen each time a new chrome window gets opened. All of them have an assertion occurring first:
> [Parent 1906] WARNING: NS_ENSURE_TRUE(document) failed: file /home/worker/workspace/build/src/extensions/cookie/nsPermissionManager.cpp, line 2004
> [Parent 1906] WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x80004002: file /home/worker/workspace/build/src/dom/base/nsDocument.cpp, line 5947
> [Parent 1906] ###!!! ASSERTION: Why is there a document channel?: '!chan', file /home/worker/workspace/build/src/embedding/components/windowwatcher/nsWindowWatcher.cpp, line 2187
I think we have to file a bug under cookies to get this fixed, and mark all test failures dependent on it.
Comment 13•8 years ago
|
||
To correct myself the assertion is for the window watcher, and not the cookie file.
Comment hidden (Intermittent Failures Robot) |
Comment 15•8 years ago
|
||
Bug 1290280 might be what we see here for our tests? It says virtualization. Maybe given that our tests are run in AWS there is something related to that? At least because we cannot reproduce on our local machines. Mike, what do you think?
Flags: needinfo?(mconley)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 18•8 years ago
|
||
So I looked into this a bit, and I think I know what's going on.
test_windows.py opens and closes a bunch of windows, while a background window pointed at about:newtab just kinda hangs around.
Because bug 1261842 made initial browsers remote, this means that we're creating and destroying remote browsers pretty rapidly, which means we're spawning content processes, and then sending messages to them to shut down pretty rapidly.
In ContentParent, when we tell the content process to shut down, we also start a timer to kill that content process manually if it doesn't close itself down in time. By default, that timer is 5 seconds long.
So we're opening and closing a bunch of content processes, and we're also scheduling these things to be terminated if they don't shut down in time. In at least one case, we open and close a window so quickly that the shut down timer starts before the content process has done any of its own initialization. The parent is busy during this time, and the content process usually asks the parent for information during initialization (synchronously). This means that the content process takes longer to initialize, which means that it takes longer for it to pop the "shut down" message off the queue, which means that it's far more likely that we'll timeout and the parent will kill the content process manually.
When the parent kills the content process, it creates a paired minidump. A stack is generated for the content (which is mostly useless since the content could have been doing anything at the time of the kill) and one is generated for the parent (which shows us that we're doing a shut down kill of the content process).
Assignee | ||
Comment 19•8 years ago
|
||
The solution that whimboo and I have come up with is to set dom.ipc.tabs.shutdownTimeoutSecs to 0 for the test. This causes us to skip the content process shut down timer logic, which should get us past the orange here.
We might want to find a longer-term solution here, but this should do for now.
Comment 20•8 years ago
|
||
I filed bug 1293270 to get some e10s related improvements into mozcrash.
Comment hidden (mozreview-request) |
Updated•8 years ago
|
Comment 22•8 years ago
|
||
mozreview-review |
Comment on attachment 8778917 [details]
Bug 1290186 - Workaround content process shutdown kills in Firefox UI tests by disabling timeout for some tests.
https://reviewboard.mozilla.org/r/70026/#review67186
<p>The changes are looking great. If try is also happy with a couple of retriggeres included for e10s-local, we can get this patch landed. Thank you Mike!</p>
::: testing/firefox-ui/tests/puppeteer/test_windows.py:29
(Diff revision 1)
> + the content process manually, which generates a crash report,
> + which causes these tests to orange. We side-step this by
> + setting dom.ipc.tabs.shutdownTimeoutSecs to 0, which disables
> + the shutdown timer.
> + """
> + self.prefs.set_pref('dom.ipc.tabs.shutdownTimeoutSecs', 0)
Great explanation! It will ensure that no-one will play with it.
Attachment #8778917 -
Flags: review?(hskupin) → review+
Assignee | ||
Comment 25•8 years ago
|
||
Hrm, my try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc7a20b91544ef850d7a70d95a24491f8d5d0588 shows us still oranging here.
The stack is still the same, which suggests to me that we're still somehow killing the content process off of the 5 second timer.
whimboo - am I somehow setting the pref incorrectly?
Flags: needinfo?(hskupin)
Assignee | ||
Comment 26•8 years ago
|
||
According to https://public-artifacts.taskcluster.net/R19qUIvtQimZeAjpYFxp7A/0/public/logs/live_backing.log (I added some manual logging - search for "mconley") even with this patch, we're still reading dom.ipc.tabs.shutdownTimeoutSecs as 5.
Which suggests to me that this patch is just not properly setting the pref. Any idea what I'm doing wrong?
Comment 27•8 years ago
|
||
Could it be that this preference needs a restart of the application? It would be the only possible idea I currently have.
Flags: needinfo?(hskupin)
Comment hidden (mozreview-request) |
Comment 29•8 years ago
|
||
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f2224b00d74
Workaround content process shutdown kills in Firefox UI tests by disabling timeout for some tests. r=whimboo
Comment 30•8 years ago
|
||
mozreview-review |
Comment on attachment 8778917 [details]
Bug 1290186 - Workaround content process shutdown kills in Firefox UI tests by disabling timeout for some tests.
https://reviewboard.mozilla.org/r/70026/#review67544
::: testing/firefox-ui/tests/puppeteer/test_windows.py:30
(Diff revisions 1 - 2)
> which causes these tests to orange. We side-step this by
> setting dom.ipc.tabs.shutdownTimeoutSecs to 0, which disables
> the shutdown timer.
> """
> - self.prefs.set_pref('dom.ipc.tabs.shutdownTimeoutSecs', 0)
> + FirefoxTestCase.setUp(self)
> + self.marionette.set_pref('dom.ipc.tabs.shutdownTimeoutSecs', 0)
Can you please explain the change here to use `self.marionette.set_pref()`? It has different behavior and the modified pref is not being reset at the end of the test! So we would need a custom tearDown method for BaseWindowTestCase.
Comment 31•8 years ago
|
||
I requested a backout of the patch from autoland due to the above mentioned issue. We decided not to affect other tests than the test_windows one.
Comment 32•8 years ago
|
||
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/643a67968688
Backed out changeset 8f2224b00d74 on request from whimboo
Assignee | ||
Comment 33•8 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #30)
> Can you please explain the change here to use `self.marionette.set_pref()`?
> It has different behavior and the modified pref is not being reset at the
> end of the test! So we would need a custom tearDown method for
> BaseWindowTestCase.
Oh, yikes! I apologize - I shouldn't have been so cavalier. I assumed that self.marionette.set_pref was an alias for self.prefs.set_pref. I didn't realize their behaviour was different.
Out of curiosity, how does self.marionette.set_pref differ from self.marionette.enforce_gecko_prefs?
I'll switch this back to self.prefs.set_pref and re-request review.
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8778917 -
Flags: review+ → review?(hskupin)
Comment 35•8 years ago
|
||
(In reply to Mike Conley (:mconley) - (Needinfo me!) from comment #33)
> Oh, yikes! I apologize - I shouldn't have been so cavalier. I assumed that
> self.marionette.set_pref was an alias for self.prefs.set_pref. I didn't
> realize their behaviour was different.
Yeah, puppeteer has its own prefs methods at the moment. I filed a mentored project to get rid of them. See bug 1293588.
> Out of curiosity, how does self.marionette.set_pref differ from
> self.marionette.enforce_gecko_prefs?
set_pref() will not survive a restart while with enforce_gecko_prefs they do.
Comment 36•8 years ago
|
||
mozreview-review |
Comment on attachment 8778917 [details]
Bug 1290186 - Workaround content process shutdown kills in Firefox UI tests by disabling timeout for some tests.
https://reviewboard.mozilla.org/r/70026/#review67660
::: testing/firefox-ui/tests/puppeteer/test_windows.py:30
(Diff revision 3)
> + which causes these tests to orange. We side-step this by
> + setting dom.ipc.tabs.shutdownTimeoutSecs to 0, which disables
> + the shutdown timer.
> + """
> + FirefoxTestCase.setUp(self)
> + self.prefs.set_pref('dom.ipc.tabs.shutdownTimeoutSecs', 0)
I can remember you mentioned problems that the new pref value is not accounted and we still crash. Were you able to figure it out why it happened? I miss an appropriate comment.
Attachment #8778917 -
Flags: review?(hskupin) → review+
Assignee | ||
Comment 37•8 years ago
|
||
mozreview-review-reply |
Comment on attachment 8778917 [details]
Bug 1290186 - Workaround content process shutdown kills in Firefox UI tests by disabling timeout for some tests.
https://reviewboard.mozilla.org/r/70026/#review67660
> I can remember you mentioned problems that the new pref value is not accounted and we still crash. Were you able to figure it out why it happened? I miss an appropriate comment.
Ah, yeah, the reason was because I had erroneously named the method `setup` instead of `setUp`, so it wasn't being called correctly. I also needed to call the parent class `setUp`, otherwise `self.prefs` is not defined.
Comment 38•8 years ago
|
||
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d4b23db9ec3a
Workaround content process shutdown kills in Firefox UI tests by disabling timeout for some tests. r=whimboo
Comment 39•8 years ago
|
||
Ah, right. I now remember this change! Thanks for the additional information.
Comment 40•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 41•8 years ago
|
||
bugherder uplift |
status-firefox50:
--- → fixed
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•