Closed Bug 1669005 Opened 4 years ago Closed 4 years ago

Perma [Fission] /fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html | TypeError: window.__wptrunner_process_next_event is not a function

Categories

(Core :: DOM: Networking, defect, P5)

defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox82 --- unaffected
firefox83 --- wontfix
firefox84 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: whimboo)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [marionette-fission-mvp][simple])

Attachments

(1 file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=317478408&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ALMETU7TSaqEmsKThETp6g/runs/0/artifacts/public/logs/live_backing.log


[task 2020-10-03T12:52:41.998Z] 12:52:41  WARNING - JavascriptException: TypeError: window.__wptrunner_process_next_event is not a function
[task 2020-10-03T12:52:41.999Z] 12:52:41  WARNING - stacktrace:
[task 2020-10-03T12:52:42.000Z] 12:52:41  WARNING - 	@tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:92:8
[task 2020-10-03T12:52:42.001Z] 12:52:41  WARNING - 	@tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:93:8
[task 2020-10-03T12:52:42.003Z] 12:52:41  WARNING - 
[task 2020-10-03T12:52:42.003Z] 12:52:41  WARNING - 
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - TEST-UNEXPECTED-ERROR | /fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html | TypeError: window.__wptrunner_process_next_event is not a function
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - Traceback (most recent call last):
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 744, in run_func
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     self.result = True, self.func(self.protocol, self.url, self.timeout)
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 857, in do_testharness
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     self.script_resume % format_map, asynchronous=True)
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 87, in execute_script
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     return method(script, new_sandbox=False, sandbox=None)
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1653, in execute_async_script
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     rv = self._send_message("WebDriver:ExecuteAsyncScript", body, key="value")
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 26, in _
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     return func(*args, **kwargs)
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 602, in _send_message
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     self._handle_error(err)
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 622, in _handle_error
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO -     raise errors.lookup(error)(message, stacktrace=stacktrace)
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - JavascriptException: TypeError: window.__wptrunner_process_next_event is not a function
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - stacktrace:
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - 	@tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:92:8
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - 	@tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:93:8
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - 
[task 2020-10-03T12:52:42.035Z] 12:52:42     INFO - TEST-INFO expected TIMEOUT | took 226ms
[task 2020-10-03T12:52:42.051Z] 12:52:42     INFO - PID 24684 | 1601729562043	Marionette	INFO	Stopped listening on port 34219
[task 2020-10-03T12:52:42.196Z] 12:52:42     INFO - PID 24684 | ###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2020-10-03T12:52:42.560Z] 12:52:42     INFO - Browser exited with return code 0
[task 2020-10-03T12:52:42.561Z] 12:52:42     INFO - PROCESS LEAKS None
[task 2020-10-03T12:52:42.562Z] 12:52:42     INFO - Closing logging queue
[task 2020-10-03T12:52:42.562Z] 12:52:42     INFO - queue closed
[task 2020-10-03T12:52:42.583Z] 12:52:42     INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpqihwa2
[task 2020-10-03T12:52:42.601Z] 12:52:42     INFO - PID 24939 | console.warn: SearchSettings: "get: No settings file exists, new profile?" (new Error("", "(unknown module)"))
[task 2020-10-03T12:52:42.601Z] 12:52:42     INFO - Starting runner
[task 2020-10-03T12:52:43.264Z] 12:52:43     INFO - PID 24939 | 1601729563258	Marionette	INFO	Listening on port 47672```
Flags: needinfo?(hskupin)

I cannot see what in this merge to mozilla-central could have broken the tests:
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=7d7faf0b6d7a7ee76d6685a61f26b4c20a6e5d29

Also the failure only happened on Saturday but since then all wpt2 jobs on mozilla-central are fine again. Maybe some backout fixed it?

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=wpt2%2Cfis&fromchange=7d7faf0b6d7a7ee76d6685a61f26b4c20a6e5d29

Sheriffs, is there anything left to do here? If not we should get this bug closed.

Flags: needinfo?(hskupin) → needinfo?(sheriffs)
Summary: Perma /fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html | TypeError: window.__wptrunner_process_next_event is not a function → Perma [Fission] /fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html | TypeError: window.__wptrunner_process_next_event is not a function

This actually was caused by bug 1668357 when enabling SHIP for Fission, and I assume the test was marked as expected fail with some follow-up landings?

Regressed by: 1668357
Has Regression Range: --- → yes
Keywords: regression

Actually bug 1668986 made this an expected error for Fission:

https://hg.mozilla.org/mozilla-central/diff/e12083bcf22e97f163aee698e2e22a6b53af73ea/testing/web-platform/meta/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html.ini

So we should leave this bug open for further investigation. I will see what I can do over the next days to have a look at Marionette's navigation code.

Flags: needinfo?(sheriffs)
Keywords: leave-open

So here is the navigation as triggered by Marionette:

0:19.66 pid:58865 1601901354753	Marionette	DEBUG	2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html"}]
 0:19.67 pid:58865 1601901354759	Marionette	TRACE	Received message beforeunload for about:blank
 0:19.69 pid:58865 1601901354773	Marionette	TRACE	Received message pagehide for about:blank
 0:19.70 pid:58865 1601901354792	Marionette	TRACE	Received message DOMContentLoaded for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:19.70 pid:58865 1601901354793	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]

The WebDriver:Navigate command returns after the DOMContentLoaded event has been received. By that time the DOM has been loaded, but would that also include the embedded JavaScript code? AFAICR the document's readyState can be interactive before the scripts have been run, and as such the command would return too early. Sadly I cannot remember right now on which bug that investigation has happened.

Here the file that should be loaded on DOMContentLoaded:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/testharnessreport.js

Olli, do you have an idea? Are there any logs that I can enable to see when JS code has been loaded?

Flags: needinfo?(bugs)

Some of embedded javascript may be executed after DOMContentLoaded, for example if it is async.

Flags: needinfo?(bugs)

async scripts should be executed and loaded before page's load event fires.

Oh, this is good to know. Thanks!

James, what was the reason to set a page load strategy of eager for the tests? Given that all browsers make use of it, I assume it was widely agreed on and will stay that way. Given the problem with the testharness report script being executed after DOMContentLoaded we might have to wait until the expected object is available, and not bail out by the first attempt accessing it.

Flags: needinfo?(james)

I don't think this makes sense. The testharnessreport.js script would indeed execute after DOMContentLoaded if it was async or defer, but it's not; it's a script that executes as soon as it's inserted into the document.

Flags: needinfo?(james)

I added some logging to various parts of the page/harness lifecycle. Numbers on the left are performance.now counters:

Without fission:

1118 testharnessreport.js loaded testharnessreport.js:84:9
1166 before page loaded script-resource-with-json-parser-breaker.tentative.sub.html:82:9
1313 DOMContentLoaded script-resource-with-json-parser-breaker.tentative.sub.html:84:52
1987 webdriver_resume testing/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py:87:9

With fission:

193 testharnessreport.js loaded testharnessreport.js:84:9
208 before page loaded script-resource-with-json-parser-breaker.tentative.sub.html:82:9
222 DOMContentLoaded script-resource-with-json-parser-breaker.tentative.sub.html:84:52

Note that with fission the page lifecycle events are in the same/correct order but we don't see the webdriver_resume log at all. We can also look at the marionette logs after the call to navigate:

Without fission:

 0:31.56 pid:1133356 1602237599514	Marionette	DEBUG	2 -> [0,13,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html"}]
 0:31.60 pid:1133356 1602237599560	Marionette	TRACE	Received message beforeunload for about:blank
 0:32.31 pid:1133356 1602237600266	Marionette	TRACE	Received message pagehide for about:blank
 0:32.62 pid:1133356 1602237600565	Marionette	TRACE	Received message beforeunload for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:32.80 pid:1133356 1602237600746	Marionette	TRACE	Received message DOMContentLoaded for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:33.12 pid:1133356 1602237601071	Marionette	TRACE	Received message pagehide for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:34.20 pid:1133356 1602237602148	Marionette	TRACE	Received message DOMContentLoaded for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:34.20 pid:1133356 1602237602150	Marionette	DEBUG	2 <- [1,13,null,{"value":null}]
 0:34.24 pid:1133356 JavaScript error: http://www1.web-platform.test:8000/fetch/corb/resources/sniffable-resource.py?type=text/html&body=)%5D%7D%27, line 1: SyntaxError: expected expression, got ')'
 0:34.30 pid:1133356 1602237602259	Marionette	DEBUG	2 -> [0,14,"WebDriver:ExecuteAsyncScript",{"scriptTimeout":null,"newSandbox":false,"args":[],"filename":"testing/web-platform/test ... r_callback = arguments[arguments.length - 1];\nwindow.__wptrunner_process_next_event();\n//edited","sandbox":null,"line":84}]

With fission:

0:37.25 pid:1133937 1602237737389	Marionette	DEBUG	2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html"}]
 0:37.27 pid:1133937 1602237737410	Marionette	TRACE	Received message beforeunload for about:blank
 0:37.30 pid:1133937 1602237737439	Marionette	TRACE	Received message pagehide for about:blank
 0:37.37 pid:1133937 1602237737507	Marionette	TRACE	Received message DOMContentLoaded for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:37.37 pid:1133937 1602237737510	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]
 0:37.43 pid:1133937 1602237737570	Marionette	DEBUG	2 -> [0,13,"WebDriver:ExecuteAsyncScript",{"scriptTimeout":null,"newSandbox":false,"args":[],"filename":"testing/web-platform/test ... r_callback = arguments[arguments.length - 1];\nwindow.__wptrunner_process_next_event();\n//edited","sandbox":null,"line":84}]

Note that the logging for the working case looks pretty different; it claims we get beforeunload, DOMContentLoaded, and pagehide events for the test page before getting a second DOMContentLoaded event and then finally returning from the navigate command; the fission case looks more like you'd expect with a single DOMContentLoaded event. So I'm not sure what's really going on; it seems very unlikley that we actually have two DOMContentLoaded events from the test page. I also can't see that the test is doing anything especially unusual. Anyway I think there's a real bug here, possibly in marionette.

Sorry for the delay here. Your answer went through unnoticed until this morning. So when I have a look at the logs when I disable the handling of DOMContenLoaded and force a complete page load strategy the log looks like the following:

0:12.65 pid:85275 1604653221821	Marionette	DEBUG	2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html"}]
 0:12.66 pid:85275 1604653221828	Marionette	TRACE	[6442450945] MarionetteEvents actor created for window id 6442450948
 0:12.66 pid:85275 1604653221829	Marionette	TRACE	Received event beforeunload for about:blank
 0:12.66 pid:85275 1604653221833	Marionette	TRACE	Received event pagehide for about:blank
 0:12.67 pid:85275 1604653221836	Marionette	TRACE	[6442450945] MarionetteEvents actor created for window id 6442450949
 0:12.68 pid:85275 1604653221848	Marionette	TRACE	Received event DOMContentLoaded for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:12.68 pid:85275 1604653221850	Marionette	TRACE	Received event beforeunload for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:12.68 pid:85275 1604653221854	Marionette	TRACE	Received event pagehide for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:12.70 pid:85275 *** RUNNING testharnessreport.js
 0:12.70 pid:85275 *** DONE testharnessreport.js - function() {
 0:12.70 pid:85275   /* This function handles the next testdriver event. The presence of
 0:12.71 pid:85275      window.testdriver_callback is used as a switch; when that function
[..]
 0:12.71 pid:85275 1604653221877	Marionette	TRACE	[6442450945] MarionetteEvents actor created for window id 6442450950
 0:12.71 pid:85275 1604653221878	Marionette	TRACE	Received event DOMContentLoaded for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
[..]
 0:12.77 pid:85275 1604653221944	Marionette	TRACE	Received event pageshow for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:12.77 pid:85275 1604653221945	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]

Note that after the first DOMContentLoaded event the page undergoes another navigation. So whatever triggers that it's causing the formerly defined window.__wptrunner_process_next_event to be non-existent anymore.

Also interesting is that when I add a beforeunload event handler I do not see it called for that particular unload of the page, but only after the test when the next URL gets loaded, or the window closed in case of running only this single test.

Further investigation has shown that in the failing cases all the page related events are getting fired with a document.readyState of complete first, which looks wrong:

 0:09.84 pid:92436 1604657352246	Marionette	DEBUG	2 -> [0,12,"WebDriver:Navigate",{"url":"http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html"}]
 0:09.85 pid:92436 1604657352250	Marionette	TRACE	[6442450945] MarionetteEvents actor created for window id 6442450948
 0:09.85 pid:92436 1604657352250	Marionette	TRACE	** beforeunload - readyState: complete
 0:09.86 pid:92436 1604657352255	Marionette	TRACE	Received event beforeunload (complete) for about:blank
 0:09.86 pid:92436 1604657352263	Marionette	TRACE	** pagehide - readyState: complete
 0:09.86 pid:92436 1604657352264	Marionette	TRACE	Received event pagehide (complete) for about:blank
 0:09.86 pid:92436 1604657352266	Marionette	TRACE	[6442450945] MarionetteEvents actor created for window id 6442450949
 0:09.86 pid:92436 1604657352266	Marionette	TRACE	** DOMContentLoaded - readyState: complete
 0:09.88 pid:92436 1604657352283	Marionette	TRACE	Received event DOMContentLoaded (complete) for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:09.88 pid:92436 1604657352283	Marionette	TRACE	** beforeunload - readyState: complete
 0:09.89 pid:92436 1604657352291	Marionette	TRACE	Received event beforeunload (complete) for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:09.91 pid:92436 1604657352307	Marionette	TRACE	** pagehide - readyState: complete
 0:09.91 pid:92436 1604657352307	Marionette	TRACE	Received event pagehide (complete) for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
[..]
0:09.93 pid:92436 1604657352329	Marionette	TRACE	[6442450945] MarionetteEvents actor created for window id 6442450950
 0:09.93 pid:92436 1604657352329	Marionette	TRACE	** DOMContentLoaded - readyState: interactive
 0:09.93 pid:92436 1604657352330	Marionette	TRACE	Received event DOMContentLoaded (interactive) for http://web-platform.test:8000/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html
 0:09.93 pid:92436 1604657352330	Marionette	TRACE	** pageshow - readyState: complete
 0:09.93 pid:92436 1604657352330	Marionette	DEBUG	2 <- [1,12,null,{"value":null}]

I mean we can easily ignore those combinations in Marionette but I wonder why those strange events are getting sent out. Olli, do you have an idea?

Flags: needinfo?(bugs)
Depends on: 1675728
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Attachment #9186216 - Attachment description: Bug 1669005 - [wpt] Update meta data for /fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html for fission jobs. → Bug 1669005 - [wpt] Update wpt meta data for Marionette page load improvements..

So I chatted with Olli and he requested to minimize the HTML file. While doing so I noticed that the problem only exists as long as the meta tag for UTF-8 is present:

https://searchfox.org/mozilla-central/rev/919f7400e2e12bc963ee2ae8400ef961972c3059/testing/web-platform/tests/fetch/corb/script-resource-with-json-parser-breaker.tentative.sub.html#21

When I remove it the page load looks fine.

The fact that it's 1106 bytes from the top of the document to the meta element makes me suspicious that this is the meta prescan failing and the document being reparsed, or something. If we move the <meta charset> above the comment does it all work out? In any case I think there might be a bug here that we're able to observe the partial state.

Flags: needinfo?(hsivonen)

Yes, moving the meta tag above the comment makes it behave correctly. There is no more a DOMContentLoaded event with a document.readyState of complete.

Flags: needinfo?(bugs)

Please don't put comments above <meta charset>. Doing so results in problems like this.

The partial state is observable, because a late meta causes a reload, so whatever happened before the reload is visible.

Flags: needinfo?(hsivonen)

(In reply to Henri Sivonen (:hsivonen) from comment #18)

Please don't put comments above <meta charset>. Doing so results in problems like this.

The partial state is observable, because a late meta causes a reload, so whatever happened before the reload is visible.

Given that Marionette / geckodriver are widely used by eg users of Selenium, and we do not have the control over how websites are built, shall we simply ignore these DOMContentLoaded events with complete as readyState? Or is there a better way to know that a pre-parsing failed, and the page gets reloaded?

Flags: needinfo?(hsivonen)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #19)

(In reply to Henri Sivonen (:hsivonen) from comment #18)

Please don't put comments above <meta charset>. Doing so results in problems like this.

The partial state is observable, because a late meta causes a reload, so whatever happened before the reload is visible.

Given that Marionette / geckodriver are widely used by eg users of Selenium, and we do not have the control over how websites are built, shall we simply ignore these DOMContentLoaded events with complete as readyState? Or is there a better way to know that a pre-parsing failed, and the page gets reloaded?

This isn't a matter of a pre-parsing failing but a real parse commencing and a renavigation to the same URL getting triggered subsequently. AFAICT, the spec requires an "abort" and the abort is required to set the readyState to complete. I don't have good ideas for how to distinguish an implicit abort from a proper finish. (Notably, the abort event appears to be for user-initiated aborts, i.e. the Stop button, and not for implicit platform-triggered ones.)

Flags: needinfo?(hsivonen)
Depends on: 1676742
No longer depends on: 1676742

Thanks! We will go ahead and just ignore these DOMContentLoaded events then.

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/417047568b4b [wpt] Update wpt meta data for Marionette page load improvements.. r=jgraham,marionette-reviewers,maja_zf
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/26530 for changes under testing/web-platform/tests
Upstream PR merged by moz-wptsync-bot

This has been fixed now.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Keywords: leave-open
Resolution: --- → FIXED
Whiteboard: [marionette-fission-mvp][simple]
Target Milestone: --- → 84 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: