Closed Bug 1394354 Opened 7 years ago Closed 4 years ago

WebDriver:ElementClick sometimes does not trigger a click

Categories

(Remote Protocol :: Marionette, defect, P1)

57 Branch
defect

Tracking

(firefox82 fixed, firefox83 fixed)

RESOLVED FIXED
83 Branch
Tracking Status
firefox82 --- fixed
firefox83 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Blocks 3 open bugs)

Details

(Keywords: testcase-wanted, Whiteboard: [marionette-fission-mvp][simple])

Attachments

(3 files)

As noticed by different tests the click command sometimes succeeds but doesn't trigger an action. As such tests are waiting and timing out.
Blocks: 1378087
Clarification: judging by the logs in Bug 1378087, this is not related to actions or legacy actions.
Summary: click sometimes does not trigger an action → clickElement sometimes does not trigger a click
Blocks: 1394614
Blocks: 1374283
Blocks: 1396265
Priority: -- → P3
Maybe we can wait for bug 1321516 and see if that helps once we make use of the conformant click for everything.
Depends on: 1321516
Blocks: 1399761
Blocks: 1400107
This issue raised here appears to be the same problem https://github.com/mozilla/geckodriver/issues/1017
Do we have a definitive test case for this? Or some metric to tell if it is still occurring?
Blocks: webdriver
Flags: needinfo?(hskupin)
Priority: P3 → P2
Summary: clickElement sometimes does not trigger a click → WebDriver:ElementClick sometimes does not trigger a click
I'm not aware of some metrics yet. Given that we just released version 58 with webdriver click turned on by default, we could ask people again to provide a trace log. This might give us a better understanding.
Flags: needinfo?(hskupin)
Depends on: 1490268
Blocks: 1493112
No longer blocks: 1493112
Blocks: 1405967
Depends on: 1496683
Depends on: 1499360
Depends on: 1552742
Blocks: 1665250

As noticed on bug 1665209 we sometimes may hit the non-spec conforming click code but the one for Selenium:
https://searchfox.org/mozilla-release/rev/7ac70cbbbe2f44d1140535d2909458d04212b32f/testing/marionette/interaction.js#130-133

I would suggest that we add a trace log entry if that code path is used. It would give us an understanding if my assumption is correct.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

I'm only going to land the trace log patch but not going to fix this bug at the moment.

Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Keywords: leave-open
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/05a47a0c800a [marionette] Add trace log output when using non spec-compatible element click. r=marionette-reviewers,maja_zf
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
Blocks: 1665785
No longer blocks: 1665785
No longer blocks: 1665250
Blocks: 1665004
Blocks: 1665141
Blocks: 1665829
Blocks: 1665021

Now that we know if we call the legacy click by accident and that this doesn't actually happen in CI these days, we should make sure that the click definitely gets synthesized on the target element. For that I would propose that we add a new trace logger entry here:

https://searchfox.org/mozilla-release/rev/3856b575c1de56ecf63717c1b2f42740549ec3d5/testing/marionette/interaction.js#399-405

While I was adding that I noticed that when we call the flushEventLoop() method, the element as passed-in is NOT the target element but the container element. As such the click event is not detected if the event doesn't bubble up. As such I think we have to pass-in the real element that actually receives the click.

When synthesizing a click on an element the flushing the event
loop step right after awaits a click event on the element's
container but not the element itself. This could fail if
events are not bubbling up the DOM. As such the click event
of the actual target element needs to be awaited.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

Actually the try build perfectly shows such an example:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=316360560&repo=try&lineNumber=5820-5827

[task 2020-09-22T10:56:25.152Z] 10:56:25     INFO -  1600772185142	Marionette	DEBUG	210 -> [0,6,"WebDriver:FindElement",{"using":"link text","value":"333333"}]
[task 2020-09-22T10:56:25.212Z] 10:56:25     INFO -  1600772185206	Marionette	DEBUG	210 <- [1,6,null,{"value":{"element-6066-11e4-a52e-4f735466cecf":"ec06f7e3-8ee4-4f25-86c0-b7485e06a837"}}]
[task 2020-09-22T10:56:25.212Z] 10:56:25     INFO -  1600772185212	Marionette	DEBUG	210 -> [0,7,"WebDriver:ElementClick",{"id":"ec06f7e3-8ee4-4f25-86c0-b7485e06a837"}]
[task 2020-09-22T10:56:25.435Z] 10:56:25     INFO -  1600772185422	Marionette	TRACE	Canceled page load listener because no navigation has been detected
[task 2020-09-22T10:56:25.435Z] 10:56:25     INFO -  1600772185423	Marionette	DEBUG	210 <- [1,7,null,{"value":null}]
[task 2020-09-22T10:56:25.453Z] 10:56:25     INFO -  1600772185436	Marionette	DEBUG	210 -> [0,8,"WebDriver:GetCurrentURL",{}]
[task 2020-09-22T10:56:25.453Z] 10:56:25     INFO -  1600772185447	Marionette	TRACE	Received DOM event click for http://127.0.0.1:49983/test.html
[task 2020-09-22T10:56:25.455Z] 10:56:25     INFO -  1600772185452	Marionette	DEBUG	210 <- [1,8,null,{"value":"http://127.0.0.1:49983/clicks.html"}]

As it can be seen WebDriver:ElementClick returns way too early, and doesn't await the click event at all. As such it's clear why no navigation gets detected.

The reason for that seems to be that synthesizing the click is actually delayed, and the appropriate event is getting fired about 250ms later! Given that our check for beforeunload only allows a 200ms delay, its Promise gets resolved before the click event, and causes the command to return.

Olli, are those delays something which could happen once in a while? Or unexpected and should be investigated / fixed? Not sure if that might be only related to synthesizing the click via event utils, or if real clicks could also be affected.

Flags: needinfo?(bugs)

How is the click synthesized? In which process and how? If in different process, then nothing of course guarantees messages get through
in a timely manner if OS decides to not run IPC thread or something (if the system is otherwise very busy).

Flags: needinfo?(bugs)

(In reply to Olli Pettay [:smaug] from comment #14)

How is the click synthesized? In which process and how? If in different process, then nothing of course guarantees messages get through
in a timely manner if OS decides to not run IPC thread or something (if the system is otherwise very busy).

Yes, the click is synthesized in the appropriate content process here:
https://searchfox.org/mozilla-central/rev/b58ca45005fe02077c92779483d1b60e9a49687c/testing/marionette/event.js#164

That will make things tricky given that we have to setup a Promise first that checks for a possible navigation. It's part of the parent process and will timeout after 200ms. Maybe we have to allow a longer timeout, and somehow modify its existing timer instance for the click event delay.

So I assume something which calls synthesizeMouseAtPoint is delayed for whatever reason.

I will add some trace log output for timing. So lets see if that is for the IPC between the parent and content process, or indeed for synthesizeMouseAtPoint.

Here a try build with trace logging for synthesizing the click and the event:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=803a5525f9cc62f14e00d14f6a5579a8e163df1a

I will have a look a the results tomorrow.

Here an example of such a delayed behavior:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=316408344&repo=try&lineNumber=7847-7863

** 1600811489009: 195 -> [0,16,"WebDriver:ElementClick",{"id":"19073fbb-393f-4835-87e9-f68ca5dfab31"}]
** 1600811489020: navigation set beforeunload timer
** 1600811489020: driver calling clickElement in listener
** 1600811489034: listener calling clickElement in interaction
** 1600811489291: navigation beforeunload not seen
** 1600811489291:  Canceled page load listener because no navigation has been detected
** 1600811489291: driver after click + navigation?
** 1600811489292: 195 <- [1,16,null,{"value":null}]
** 1600811489336: 195 -> [0,17,"WebDriver:GetCurrentURL",{}]
** 1600811489335: interaction added click event listener
** 1600811489336: interaction calling synthesize mouse
** 1600811489344: interaction click event fired
** 1600811489345:  Received DOM event click for http://127.0.0.1:44898/clicks.html#
** 1600811489345: interaction called synthesize mouse
** 1600811489363: interaction after click event

It shows that the IPC call from driver -> listener is done within 14ms and as such is fine. But the problem is when calling interaction.clickElement() from within listener here:

https://searchfox.org/mozilla-central/rev/35d927df97900a57ecb562ad13909e392440b0fb/testing/marionette/listener.js#753-757
https://searchfox.org/mozilla-central/rev/35d927df97900a57ecb562ad13909e392440b0fb/testing/marionette/interaction.js#116
https://searchfox.org/mozilla-central/rev/35d927df97900a57ecb562ad13909e392440b0fb/testing/marionette/interaction.js#180

It means that it's not the click that is delayed here but the code related to these checks:
https://searchfox.org/mozilla-central/rev/35d927df97900a57ecb562ad13909e392440b0fb/testing/marionette/interaction.js#132-172

I will push another try build with more logger entries there. But anyway, the wait for beforeunload clearly has to wait until clickElement has been returned from the child process.

That's very strange. The delay indeed happens when calling into interaction.js:

Marionette  TRACE   ** 1600845279202: driver calling clickElement in listener
Marionette  TRACE   ** 1600845279203: listener calling clickElement in interaction
Marionette  TRACE   ** 1600845279402: navigation beforeunload not seen
Marionette  TRACE   Canceled page load listener because no navigation has been detected
Marionette  TRACE   ** 1600845279402: driver after click + navigation?
Marionette  DEBUG   114 <- [1,20,null,{"value":null}]
Marionette  TRACE   ** 1600845279403: interaction: file check

Actually, we have the situation that current capabilities have to be retrieved from the parent process, and that actually triggers a sync message to the parent process. And by its implementation it's even done twice for both moz:accessibilityChecks and moz:webdriverClick:

https://searchfox.org/mozilla-central/rev/35d927df97900a57ecb562ad13909e392440b0fb/testing/marionette/listener.js#753-757

I don't think that we should do that, and instead directly pass-over the capabilities as options to the appropriate command.

The new try build with the considered patch applied looks great:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1329cfec785e11a9a119bf8aa40e714770e33dbf

So I'm taking the way in passing the capabilities over to the framescript commands whose need these, and by that also getting rid of the capability getter in the framescript.

Whiteboard: [marionette-fission-mvp][simple]
Blocks: 1666755

For fixing the race condition I filed the follow-up bug 1666755.

To not cause extra synchronous IPC round-trips with the parent process
each method in the framescript that needs capabilities would have to
get those passed along with its call.

This will reduce code complexity and safety because there is only
a single object of truth. Also performance will be greatly improved
for the affected commands.

Depends on D90987

Once landed I will make sure that we get this patch uplifted to beta.

Blocks: 1552742
No longer depends on: 1490268, 1496683, 1552742
Priority: P2 → P1
Attachment #9177099 - Attachment description: Bug 1394354 - [marionette] flushEventLoop has to operate on the target element and not its container. → Bug 1394354 - [marionette] Add trace log entry for click event in WebDriver:ElementClick.
Pushed by mjzffr@gmail.com: https://hg.mozilla.org/integration/autoland/rev/9c3a426fa2ba [marionette] Add trace log entry for click event in WebDriver:ElementClick. r=marionette-reviewers,maja_zf https://hg.mozilla.org/integration/autoland/rev/c6cd31e46c96 [marionette] Don't update capabilities synchronously in the framescript. r=marionette-reviewers,jdescottes,maja_zf
Blocks: 1666135
Blocks: 1665022
Blocks: 1665209
Blocks: 1665297
Blocks: 1666166
Blocks: 1666204
Blocks: 1665549
Blocks: 1665250
Blocks: 1665686
Blocks: 1665032

This bug is fixed now.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

Comment on attachment 9177359 [details]
Bug 1394354 - [marionette] Don't update capabilities synchronously in the framescript.

Beta/Release Uplift Approval Request

  • User impact if declined: The Webdriver command "Element click" can fail for users of Selenium when the machine is under high cpu load. Under such a condition the click might be synthesized with a delay >200ms so that the command returns without actually doing the click.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Extra IPC roundtrips to fetch Webdriver capabilities are removed to not cause unexpected delays in processing the click. Instead capabilities are directly send as arguments to the framescript.
  • String changes made/needed: no
Attachment #9177359 - Flags: approval-mozilla-beta?
Attachment #9175980 - Flags: approval-mozilla-beta?
Attachment #9177099 - Flags: approval-mozilla-beta?
Blocks: 1559992
Blocks: 1560015

This bug is 3 years old, what makes it worth uplifting to 82 vs riding to 83?

Flags: needinfo?(hskupin)

(In reply to Julien Cristau [:jcristau] from comment #31)

This bug is 3 years old, what makes it worth uplifting to 82 vs riding to 83?

Recent work for Fission support in Marionette caused a spike of test failures for various tests, and that also with Fission disabled. Only that work actually allowed us to somewhat reproduce the failure, get it investigated, and fixed. Before that the chance to hit this failure was extremely low.

That said users of Selenium are likely to also see this spike, with the result that their test cases would be broken. As such I highly recommend to uplift this feature to 82 beta.

Flags: needinfo?(hskupin)

Comment on attachment 9175980 [details]
Bug 1394354 - [marionette] Add trace log output when using non spec-compatible element click.

this one's already on beta

Attachment #9175980 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Comment on attachment 9177359 [details]
Bug 1394354 - [marionette] Don't update capabilities synchronously in the framescript.

approved for 82.0b4

Attachment #9177359 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9177099 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Blocks: 1770138
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: