Closed Bug 1528456 Opened 6 years ago Closed 6 years ago

Intermittent dom/ipc/tests/browser_domainPolicy.js | Test timed out -

Categories

(Core :: DOM: Content Processes, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- unaffected
firefox67 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: bzbarsky)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell disable-recommended])

Attachments

(1 file)

#[markdown(off)]
Filed by: nbeleuzu [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=228803701&repo=autoland

https://queue.taskcluster.net/v1/task/Bbgv8C-PSpuIi5o20KhpGQ/runs/1/artifacts/public/logs/live_backing.log

[task 2019-02-16T07:07:36.276Z] 07:07:36 INFO - TEST-PASS | dom/ipc/tests/browser_domainPolicy.js | Checking script-enabled for root (https://sub1.test1.example.com/browser/dom/ipc/tests/file_disableScript.html) -
[task 2019-02-16T07:07:36.278Z] 07:07:36 INFO - Creating child process first, activating domainPolicy after
[task 2019-02-16T07:07:36.283Z] 07:07:36 INFO - Buffered messages logged at 07:06:54
[task 2019-02-16T07:07:36.285Z] 07:07:36 INFO - Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://mochi.test:8888/browser/dom/ipc/tests/file_domainPolicy_base.html" line: 0}]
[task 2019-02-16T07:07:36.287Z] 07:07:36 INFO - Buffered messages finished
[task 2019-02-16T07:07:36.289Z] 07:07:36 INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/browser_domainPolicy.js | Test timed out -
[task 2019-02-16T07:07:36.291Z] 07:07:36 INFO - GECKO(9225) | MEMORY STAT | vsize 650MB | residentFast 268MB | heapAllocated 77MB
[task 2019-02-16T07:07:36.293Z] 07:07:36 INFO - TEST-OK | dom/ipc/tests/browser_domainPolicy.js | took 45035ms
[task 2019-02-16T07:07:36.295Z] 07:07:36 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-02-16T07:07:36.299Z] 07:07:36 INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/browser_domainPolicy.js | Found a tab after previous test timed out: http://mochi.test:8888/browser/dom/ipc/tests/file_domainPolicy_base.html -

::dao , this fail started from your push on autoland and it`s pretty intermittent.
Could you please take a look?

Flags: needinfo?(dao+bmo)

(In reply to Narcis Beleuzu [:NarcisB] from comment #1)

::dao , this fail started from your push on autoland and it`s pretty intermittent.
Could you please take a look?

It's ntim's patch so I think you'd want to ask him. However, that patch only affects the Library window UI (where users can manage their browsing history and bookmarks), and only on Linux. It doesn't look like it could cause this failure.

Flags: needinfo?(dao+bmo)

Tim can you take a look at this?

It has 29 total failures in ~3 days on linux32, linux64, osx and win.

Flags: needinfo?(ntim.bugs)
Whiteboard: [stockwell needswork:owner]

(In reply to Andreea Pavel [:apavel] from comment #4)

Tim can you take a look at this?

It has 29 total failures in ~3 days on linux32, linux64, osx and win.

Like I said this is very unlikely to be caused by Tim's patch.

As Dão says, my patch only touched the library window, so it seems to be completely unrelated to the failure.

I see that treeherder didn’t run the affected test suite on :bz’s previous push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=367ca1e168537c05f091c766e11b2c8338621cba

So that could have been that patch that caused the failure (and it would seem more likely).

I’ve triggered bc1 on bz’s push.

Flags: needinfo?(ntim.bugs)

Ni? Boris who might know about this.

Flags: needinfo?(bzbarsky)

Is this only happening on 32-bit opt Linux?

Flags: needinfo?(nbeleuzu)
Blocks: 1528146
Flags: needinfo?(bzbarsky)

This test is one of a very small number of places we listen for "DOMDocElementInserted" events, so that seems like a pretty plausible connection...

Flags: needinfo?(bzbarsky)

:bz this is frequently failing on:

  • linux32 opt and debug
  • linux64 pgo, opt and debug
  • osx-10-10 opt and debug

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-02-13&endday=2019-02-20&tree=trunk&bug=1528456

Flags: needinfo?(nbeleuzu)

For what it's worth, I've confirmed on try that not doing nsDocElementCreatedNotificationRunner for non-initial about:blank makes the timeouts go away. Still trying to understand what the test is doing enough to figure out where the race is, and/or reproduce the timeout under rr.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #14)

For what it's worth, I've confirmed on try that not doing nsDocElementCreatedNotificationRunner for non-initial about:blank makes the timeouts go away. Still trying to understand what the test is doing enough to figure out where the race is, and/or reproduce the timeout under rr.

Thanks for looking Boris. Let us know if this needs to be disabled if you find no other solution.
Removing the stockwell tag until then.

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

The old complicated code here was working around a fundamental race in the
test. The test was trying to do the following:

  1. Open a new tab. This sends an IPC message to do that and to start loading
    about:blank in the tab.

  2. Add a "load" listener. This sends an IPC message to add the listener.

  3. Start loading the URL we really want to load. This sends an IPC message to
    start the load.

There are two races here: the message from step 2 can be received before or
after the about:blank load is done, and the message from step 3 can be received
before or after the about:blank load is done. If step 2 wins the race (we add
the listener before the about:blank load is done) but step 3 loses the race
(new load starts only after the about:blank load is done), then we would get a
load event for the about:blank, not for the url we care about.

The test worked around this by having step 2 add a listener for the
"DOMDocElementInserted" event, which about:blank did not fire. Then when that
happened (indicating that the load from step 3 is now in progress), it added a
"load" listener. This fixed the race.

Bug 1528146 fixed about:blank to also fire "DOMDocElementInserted", so the test
became racy again. Now if the "DOMDocElementInserted" listener got added
after the about:blank load had fired that event (which is what usually
happened), then the test passed. But if the IPC message from step 2 to add the
"DOMDocElementInserted" listener won the race against the about:blank load
firing "DOMDocElementInserted", then we added the "load" listener too early and
got the "load" event for the about:blank, not the document we cared about.

The fix is to just use the (now) existing openNewForegroundTab function which
lets us pass in a url, then wait until that URL is loaded, instead of trying to
reinvent that wheel.

There is an additional ride-along change to make openNewForegroundTab always
create a new process in this test, which is the original intent of the test but
wasn't necessarily guaranteed by the old code.

Assignee: nobody → bzbarsky
Flags: needinfo?(bzbarsky)
Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/febefd80b125
Fix browser_domainPolicy.js to not be racy.  r=Gijs
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: