Open Bug 1761634 Opened 3 years ago Updated 1 year ago

WebDriver:ExecuteScript returns null where no null is possible

Categories

(Remote Protocol :: Marionette, defect, P3)

Firefox 91
defect

Tracking

(Not tracked)

People

(Reporter: 4mr.minj, Unassigned)

References

(Depends on 1 open bug, )

Details

(Whiteboard: [webdriver:backlog])

Attachments

(1 file)

Attached file trace.log (deleted) —

Using Firefox: 91.6.0 ESR

Error is intermittent only: unexpected null is returned from WebDriver:ExecuteScript where no null is theoretically possible.

See line 60 (Marionette request on 1646144672743):

return globalThis?.window?.jQuery?.readyWait ?? -1; returns null.

It seems marionette intermittently gives up and just gives me null where no null is possible according to JS spec.

Is this some sort of race condition related to the following?
MarionetteCommands actor created for window id

Is this just Bug 1673478?

NOTE: Error first noticed trying to upgrade from FF v78.15.0 esr to v91.6.0 esr, but seems to be introduced with FF v84.0b1.
I suspect Bug 1669172 or something related may have introduced it.

This is a repost from Geckodriver GitHub as the team there seemed to be positively unphased by my report, Besides I believe this is a more appropriate place for marionette bugs any way.

(In reply to Mindaugas <LA-MJ> from comment #0)

See line 60 (Marionette request on 1646144672743):

return globalThis?.window?.jQuery?.readyWait ?? -1; returns null.

It seems marionette intermittently gives up and just gives me null where no null is possible according to JS spec.

I'm not sure which JS spec you are referring to here but Execute Script can as well return null. Also as I already requested on the github issue the additional log with details about the values for globalThis, globalThis.window etc was not provided. Instead it was claimed Please don't interrogate my code, I know how it works. If you want to see progress on issues that you are filing please provide the information that you have been requested for or at least hand-over a minimized testcase.

Is this some sort of race condition related to the following?
MarionetteCommands actor created for window id

Is this just Bug 1673478?

Probably not by what I see from the log but I wouldn't exclude it for now.

NOTE: Error first noticed trying to upgrade from FF v78.15.0 esr to v91.6.0 esr, but seems to be introduced with FF v84.0b1.
I suspect Bug 1669172 or something related may have introduced it.

So at least it might be a regression from the Fission compatibility work.

By that time we had the marionette.actors.enabled preference which could be used to turn on/off the new code. If you could set it to true within the WebDriver capabilities, and then help us testing older Firefox Nightly releases that would be great. I assume the Firefox 83 release works with this setting? for the checks you can use the tool mozregression. It has a -c argument to run any command that you specify that determines if a build is fine or not. Depending on how often it fails/passes you might want to run your code snippet several times.

This is a repost from Geckodriver GitHub as the team there seemed to be positively unphased by my report, Besides I believe this is a more appropriate place for marionette bugs any way.

Please note that this is the same team here. And we are not unphased but are still waiting for the requested information (see above). Also note that our work is based on priorities and your report is the first one in this vein and there is no chance for us to get it reproduced. So instead of blaming lets work together in getting the issue investigated. This indeed would be more fruitful. Thanks.

Severity: -- → S3
Priority: -- → P3

So finally we come to the root of the communication problem. I guess some of my assumptions are wrong.

  • I assumed that we all know what globalThis and window mean in a browser context.
  • I assumed that ExecuteScript works like a code block wrapped in a IIFE. Why else would one be allowed to use return?
    • Which would all make the meaning of jQuery?.readyWait totally irrelevant to the problem at hand. As according to EcmaScript specification ?? coalesces to the RHS when LHS is null.
    • But in case there is still interest, one can see this readyWait field in jQuery's codebase.

As I've mentioned this is an intermittent and unpredictable problem. It occurs as part of a large testing harness of a proprietary website. Therefore there is no test case to provide.

All of this has lead me to believe all of the relevant information is already provided. Regression testing on nightly builds takes an overwhelming amount of time and effort, especially when you need hours to reproduce the problem, and will have to wait.

What this code is used for is in a routine detecting browser activity (animations, transitions, onready handlers, navigation).

Let me know, what other information is still otherwise missing

Given the script executed, I agree that null can only come from our side here (and so probably from https://searchfox.org/mozilla-central/rev/570f6e5c06b6c8140f53bf104d785a18165212ab/remote/marionette/actors/MarionetteCommandsParent.jsm#346)

This would typically happen if a navigation occurs while the script is executed. Two options:

Can you help us confirm that? Is it possible that a navigation happens early in the test?

Blocks: 1768549
Severity: S3 → --
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 10 → All
Priority: P3 → --
Hardware: x86_64 → All
Severity: -- → S3
Priority: -- → P3
Whiteboard: [webdriver:backlog]

We might want to think about catching an AbortError already in the executeScript method of the MarionetteCommandsParent actor. As such the client will at least get a valid error response instead of a null value.

We will observe intermittent failures during the next days to see what the logger entry from bug 1773939 will tell us.

The test failure over on bug 1761634 shows that we return null instead of raising an exception.

[task 2022-06-17T22:50:51.153Z] 22:50:51     INFO - PID 18459 | 1655506251152	Marionette	DEBUG	0 -> [0,80,"WebDriver:ExecuteScript",{"args":[],"script":"\n        return window.name;\n        "}]
[task 2022-06-17T22:50:51.161Z] 22:50:51     INFO - PID 18459 | 1655506251159	Marionette	TRACE	[43] Querying "executeScript" failed with AbortError, returning "null" as fallback
[task 2022-06-17T22:50:51.162Z] 22:50:51     INFO - PID 18459 | 1655506251161	Marionette	DEBUG	0 <- [1,80,null,{"value":null}]

As such we should maybe start with Execute Script and special-case the handling so that a JavaScriptError gets returned.

Then it would also be good to know why or at least when the actor has been destroyed. Right now we don't seem to log that as well.

No longer blocks: 1768549
Depends on: 1673478
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: