Open Bug 1749109 Opened 3 years ago Updated 1 year ago

Intermittent Assertion failure: result.mScrollId == ScrollableLayerGuid::NULL_SCROLL_ID, at /builds/worker/checkouts/gecko/gfx/layers/apz/src/WRHitTester.cpp:75

Categories

(Core :: Panning and Zooming, defect, P5)

defect

Tracking

()

REOPENED

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: assertion, intermittent-failure)

Attachments

(1 file)

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=363462339&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/FqPOHRmoTNeTFV_cTPmpLg/runs/0/artifacts/public/logs/live_backing.log


[task 2022-01-08T06:42:54.605Z] 06:42:54     INFO - GECKO(6740) | [Child 7332, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x804B0050 (NS_ERROR_INSUFFICIENT_DOMAIN_LEVELS): file /builds/worker/checkouts/gecko/toolkit/components/antitracking/URLQueryStringStripper.cpp:129
[task 2022-01-08T06:42:54.628Z] 06:42:54     INFO - GECKO(6740) | [Parent 8160, Main Thread] WARNING: WebProgress Ignored: BrowsingContext is null or discarded: file /builds/worker/checkouts/gecko/dom/ipc/BrowserParent.cpp:2961
[task 2022-01-08T06:42:54.769Z] 06:42:54     INFO - GECKO(6740) | [Parent 8160, Main Thread] WARNING: 'aOwner->IsDiscarded()', file /builds/worker/workspace/obj-build/dist/include/mozilla/dom/SyncedContextInlines.h:94
[task 2022-01-08T06:42:54.794Z] 06:42:54     INFO - GECKO(6740) | [GPU 7252, Compositor] WARNING: Possibly dropping task posted to updater thread: file /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZUpdater.cpp:362
[task 2022-01-08T06:42:54.796Z] 06:42:54     INFO - GECKO(6740) | Assertion failure: result.mScrollId == ScrollableLayerGuid::NULL_SCROLL_ID, at /builds/worker/checkouts/gecko/gfx/layers/apz/src/WRHitTester.cpp:75
[task 2022-01-08T06:42:54.802Z] 06:42:54     INFO - GECKO(6740) | [Child 4196, Main Thread] WARNING: '!wgc', file /builds/worker/checkouts/gecko/dom/ipc/jsactor/JSWindowActorProtocol.cpp:260
[task 2022-01-08T06:42:54.894Z] 06:42:54     INFO - GECKO(6740) | #01: mozilla::layers::WRHitTester::GetAPZCAtPoint(mozilla::gfx::PointTyped<mozilla::ScreenPixel,float> const&, mozilla::RecursiveMutexAutoLock const&) [gfx/layers/apz/src/WRHitTester.cpp:75]
[task 2022-01-08T06:42:54.895Z] 06:42:54     INFO - GECKO(6740) | #02: mozilla::layers::APZCTreeManager::ProcessUnhandledEvent(mozilla::gfx::IntPointTyped<mozilla::LayoutDevicePixel>*, mozilla::layers::ScrollableLayerGuid*, unsigned long long*, mozilla::layers::LayersId*) [gfx/layers/apz/src/APZCTreeManager.cpp:2301]
[task 2022-01-08T06:42:54.896Z] 06:42:54     INFO - GECKO(6740) | #03: mozilla::layers::APZInputBridgeParent::RecvProcessUnhandledEvent(mozilla::gfx::IntPointTyped<mozilla::LayoutDevicePixel> const&, mozilla::gfx::IntPointTyped<mozilla::LayoutDevicePixel>*, mozilla::layers::ScrollableLayerGuid*, unsigned long long*, mozilla::layers::LayersId*) [gfx/layers/ipc/APZInputBridgeParent.cpp:132]
[task 2022-01-08T06:42:54.897Z] 06:42:54     INFO - GECKO(6740) | #04: mozilla::layers::PAPZInputBridgeParent::OnMessageReceived(IPC::Message const&, IPC::Message*&) [s3:gecko-generated-sources:c0fa2b6b8527089947076b30eefa0962a7a763b7764862078d40e0e3d1a345257b8de100d3a29f9ce56287c64662df4d913333ed4ab0f50fccfcf1527803a60d/ipc/ipdl/PAPZInputBridgeParent.cpp::565]
[task 2022-01-08T06:42:54.898Z] 06:42:54     INFO - GECKO(6740) | #05: mozilla::ipc::MessageChannel::DispatchSyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&, IPC::Message*&) [ipc/glue/MessageChannel.cpp:2006]
[task 2022-01-08T06:42:54.898Z] 06:42:54     INFO - GECKO(6740) | #06: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) [ipc/glue/MessageChannel.cpp:1960]
[task 2022-01-08T06:42:54.899Z] 06:42:54     INFO - GECKO(6740) | #07: mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) [ipc/glue/MessageChannel.cpp:1824]
[task 2022-01-08T06:42:54.899Z] 06:42:54     INFO - GECKO(6740) | #08: mozilla::ipc::MessageChannel::MessageTask::Run() [ipc/glue/MessageChannel.cpp:1853]
[task 2022-01-08T06:42:54.900Z] 06:42:54     INFO - GECKO(6740) | #09: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:469]
[task 2022-01-08T06:42:54.901Z] 06:42:54     INFO - GECKO(6740) | #10: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) [xpcom/threads/TaskController.cpp:771]
[task 2022-01-08T06:42:54.901Z] 06:42:54     INFO - GECKO(6740) | #11: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) [xpcom/threads/TaskController.cpp:607]
[task 2022-01-08T06:42:54.902Z] 06:42:54     INFO - GECKO(6740) | #12: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:391]
[task 2022-01-08T06:42:54.902Z] 06:42:54     INFO - GECKO(6740) | #13: mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:124:7'>::Run() [xpcom/threads/nsThreadUtils.h:532]
[task 2022-01-08T06:42:54.903Z] 06:42:54     INFO - GECKO(6740) | #14: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1199]
[task 2022-01-08T06:42:54.903Z] 06:42:54     INFO - GECKO(6740) | #15: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:467]
[task 2022-01-08T06:42:54.904Z] 06:42:54     INFO - GECKO(6740) | #16: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:85]
[task 2022-01-08T06:42:54.904Z] 06:42:54     INFO - GECKO(6740) | #17: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:325]
[task 2022-01-08T06:42:54.905Z] 06:42:54     INFO - GECKO(6740) | #18: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:307]
[task 2022-01-08T06:42:54.905Z] 06:42:54     INFO - GECKO(6740) | #19: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
[task 2022-01-08T06:42:54.905Z] 06:42:54     INFO - GECKO(6740) | #20: nsAppShell::Run() [widget/windows/nsAppShell.cpp:603]
[task 2022-01-08T06:42:54.906Z] 06:42:54     INFO - GECKO(6740) | #21: XRE_RunAppShell() [toolkit/xre/nsEmbedFunctions.cpp:864]
[task 2022-01-08T06:42:54.906Z] 06:42:54     INFO - GECKO(6740) | #22: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:235]
[task 2022-01-08T06:42:54.906Z] 06:42:54     INFO - GECKO(6740) | #23: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:325]
[task 2022-01-08T06:42:54.907Z] 06:42:54     INFO - GECKO(6740) | #24: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:307]
[task 2022-01-08T06:42:54.907Z] 06:42:54     INFO - GECKO(6740) | #25: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:705]
[task 2022-01-08T06:42:54.907Z] 06:42:54     INFO - GECKO(6740) | #26: NS_internal_main(int, char**, char**) [browser/app/nsBrowserApp.cpp:327]
[task 2022-01-08T06:42:54.908Z] 06:42:54     INFO - GECKO(6740) | #27: wmain(int, wchar_t**) [toolkit/xre/nsWindowsWMain.cpp:147]
[task 2022-01-08T06:42:54.908Z] 06:42:54     INFO - GECKO(6740) | #28: __scrt_common_main_seh() [/builds/worker/workspace/obj-build/browser/app/d:/agent/_work/1/s/src/vctools/crt/vcstartup/src/startup/exe_common.inl:288]
[task 2022-01-08T06:42:54.909Z] 06:42:54     INFO - GECKO(6740) | #29: BaseThreadInitThunk [C:\Windows\System32\KERNEL32.DLL + 0x17034]
[task 2022-01-08T06:42:54.909Z] 06:42:54     INFO - GECKO(6740) | #30: RtlUserThreadStart [C:\Windows\SYSTEM32\ntdll.dll + 0x52651]
[task 2022-01-08T06:42:55.522Z] 06:42:55     INFO - GECKO(6740) | [Parent 8160, IPC I/O Parent] WARNING: [1.1]: Ignoring message 'EVENT_MESSAGE' to unknown peer 766295460326B3CC.4BDEAEF9757F4304: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:341
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 3 years ago2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

I can get this assert to trigger on my debug build pretty reliably:

  1. Go to about:config
  2. Bring up the print dialog
  3. Wait - triggering isn't instant, but it seems to trigger for me within 10 seconds. Otherwise, clicking the area outside of the print dialog seems to be another reliable trigger.

I just hit this locally as well. In my case, I had also visited about:config and did Ctrl+P, and I had an "always on top" terminal-window floating in the foreground, and I think I was trying to mousewheel scroll on the print-preview section of the print dialog.

Not sure if values are useful, but FWIW in my case result.mScrollId was 7, whereas ScrollableLayerGuid::NULL_SCROLL_ID is 0.

There's a long comment before the assertion saying that it's sort-of-expected to fail in non-fission scenarios, and there's a non-fatal version of the assertion for those scenarios (from bug 1634763). I wonder if my & David's situation with about:config (which importantly is rendered in the parent process) is functionally equivalent to the "non-fission" scenario, and we need to adjust the fatal assertion to account for that?

botond, maybe you could take a look here -- looks like you're the last to touch the code in question, according to searchfox.

Flags: needinfo?(botond)

I captured this in rr and submitted it to pernosco, using the STR from comment 21 (with me mousewheel scrolling over the print dialog, and with an always-on-top terminal window floating at the side of my display; again, I'm not entirely sure if those are necessary or not).

In comment 24 and in my rr recording, I was using a debug+opt build, i.e. --enable-debug --enable-optimize=O1. I also tried with a --disable-optimize build, for more-reliable variable visibility, but it turned out I always hit a different fatal-assertion first. :)

(In reply to Daniel Holbert [:dholbert] from comment #24)

I wonder if my & David's situation with about:config (which importantly is rendered in the parent process) is functionally equivalent to the "non-fission" scenario, and we need to adjust the fatal assertion to account for that?

I don't think so -- in order for us to be in the "expected to fail" case, this function would have to return false, but with our default pref values that only happens if FissionAutostart() returns false, and that should not happen even in the parent process (and the Pernosco recording seems to confirm this).

We have a similar bug open, bug 1690819, which may have the same underlying cause.

These bugs tend to be tricky to investigate, because they basically indicate a discrepancy between two different data structures in different places (data structures derived from the WebRender display list in the Rust part of the compositor, and data structures derived from the WebRenderScrollData in C++ APZ code), that are both ultimately based on the Gecko display list on the main thread, but there are various processing stages where there is a possibility of the discrepancy being introduced.

I will seek to prioritize investigation of bug 1690819 + this bug together early next year. The Pernosco recording should be helpful, thanks for that!

Flags: needinfo?(botond)

Quick update on this based on a recent discussion in our APZ planning meeting:

  • Timothy has a theory for what might be causing these assertions, related to there being two subtly different notions of an "active layer" in the Gecko display list.
  • We're planning to do at least enough investigation to confirm this, but if the theory is correct, fixing that underlying problem is likely to be lower priority because it's somewhat of a tricky fix to get right (e.g. likely to require combing through history so we can be confident we're not regressing something else), and any correctness impact from the discrepancy is likely to be limited to scrollbar dragging being affected in infrequent edge cases.

Specifically I remember debugging an instance of and it looked like the cause was this call to IsScrollingActive when adding scrollbar parts

https://searchfox.org/mozilla-central/rev/72d9bf2d4128990cd1f349d4e15003e515277982/layout/generic/nsGfxScrollFrame.cpp#3600

IsScrollingActive is only used in two places in the tree anymore, it would be good to get rid of it. Usually when we say a scrollframe is active we mean that is has a displayport and so can be async scrolled on the compositor.

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Attached patch print_preview_crash_test.patch (deleted) — Splinter Review

If this helps, I've noticed this bug reproduces consistently on Thunderbird's debug builds, when the mouse cursor moves to an edge of the print preview modal (if the email being printed fits within a single page).

I've attached a patch for comm-central with a test that reproduces this crash, and here is an example of it failing on try-comm-central.

Assignee: nobody → brendan
Assignee: brendan → nobody
Comment on attachment 9348472 [details] [diff] [review] print_preview_crash_test.patch [I tried to edit the patch in place, which didn't work but this wasn't an important change anyway, so please ignore it]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: