Closed Bug 1837315 Opened 1 year ago Closed 1 year ago

crash near null in [@ mozilla::a11y::DocAccessibleParent::RecvHideEvent]

Categories

(Core :: Disability Access APIs, defect)

defect

Tracking

()

RESOLVED FIXED
116 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox114 --- unaffected
firefox115 --- unaffected
firefox116 --- verified

People

(Reporter: tsmith, Assigned: nlapre)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, testcase, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(3 files)

Attached file testcase.html (deleted) —

Found while fuzzing m-c 20230607-3a8e0dd854b5 (--enable-address-sanitizer --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -a --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
==26421==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000020 (pc 0x7fa7bf7ff4d3 bp 0x7ffc3f7c8f50 sp 0x7ffc3f7c8f10 T0)
==26421==The signal is caused by a READ memory access.
==26421==Hint: address points to the zero page.
    #0 0x7fa7bf7ff4d3 in Hdr /builds/worker/workspace/obj-build/dist/include/nsTArray.h:575:51
    #1 0x7fa7bf7ff4d3 in Elements /builds/worker/workspace/obj-build/dist/include/nsTArray.h:1185:48
    #2 0x7fa7bf7ff4d3 in unsigned long nsTArray_Impl<mozilla::a11y::RemoteAccessible*, nsTArrayInfallibleAllocator>::IndexOf<mozilla::a11y::RemoteAccessible*, nsDefaultComparator<mozilla::a11y::RemoteAccessible*, mozilla::a11y::RemoteAccessible*>>(mozilla::a11y::RemoteAccessible* const&, unsigned long, nsDefaultComparator<mozilla::a11y::RemoteAccessible*, mozilla::a11y::RemoteAccessible*> const&) const /builds/worker/workspace/obj-build/dist/include/nsTArray.h:1342:30
    #3 0x7fa7bf7d71c6 in RemoveElement<mozilla::a11y::RemoteAccessible *, nsDefaultComparator<mozilla::a11y::RemoteAccessible *, mozilla::a11y::RemoteAccessible *> > /builds/worker/workspace/obj-build/dist/include/nsTArray.h:1953:20
    #4 0x7fa7bf7d71c6 in RemoveElement<mozilla::a11y::RemoteAccessible *> /builds/worker/workspace/obj-build/dist/include/nsTArray.h:1966:12
    #5 0x7fa7bf7d71c6 in mozilla::a11y::RemoteAccessibleBase<mozilla::a11y::RemoteAccessible>::RemoveChild(mozilla::a11y::RemoteAccessible*) /builds/worker/workspace/obj-build/dist/include/mozilla/a11y/RemoteAccessibleBase.h:125:15
    #6 0x7fa7bf7d686a in mozilla::a11y::DocAccessibleParent::RecvHideEvent(unsigned long const&, bool const&) /builds/worker/checkouts/gecko/accessible/ipc/DocAccessibleParent.cpp:309:11
    #7 0x7fa7bf82f0fd in mozilla::a11y::PDocAccessibleParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PDocAccessibleParent.cpp:1226:52
    #8 0x7fa7b997247a in mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PContentParent.cpp:6709:32
    #9 0x7fa7b10be415 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1811:25
    #10 0x7fa7b10b9d9f in mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1736:9
    #11 0x7fa7b10bb1d9 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1536:3
    #12 0x7fa7b10bc753 in mozilla::ipc::MessageChannel::MessageTask::Run() /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1634:14
    #13 0x7fa7af47e34a in mozilla::RunnableTask::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:555:16
    #14 0x7fa7af46f07e in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:879:26
    #15 0x7fa7af46bf97 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:702:15
    #16 0x7fa7af46c87f in mozilla::TaskController::ProcessPendingMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:491:36
    #17 0x7fa7af483931 in operator() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:218:37
    #18 0x7fa7af483931 in mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run() /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.h:548:5
    #19 0x7fa7af4af3bc in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1240:16
    #20 0x7fa7af4bcd94 in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:479:10
    #21 0x7fa7b10c7c4e in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:85:21
    #22 0x7fa7b0ef5afa in RunInternal /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:370:10
    #23 0x7fa7b0ef5afa in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
    #24 0x7fa7b0ef5afa in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
    #25 0x7fa7ba847999 in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:148:27
    #26 0x7fa7c03b689b in nsAppStartup::Run() /builds/worker/checkouts/gecko/toolkit/components/startup/nsAppStartup.cpp:295:30
    #27 0x7fa7c06b6c1f in XREMain::XRE_mainRun() /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:5659:22
    #28 0x7fa7c06b8c5d in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:5859:8
    #29 0x7fa7c06b9db1 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:5915:21
    #30 0x55b1c76ee323 in do_main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:227:22
    #31 0x55b1c76ee323 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:445:16
    #32 0x7fa7d6029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #33 0x7fa7d6029e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #34 0x55b1c7617d18 in _start (/home/user/workspace/browsers/m-c-20230607214358-fuzzing-asan-opt/firefox+0x107d18) (BuildId: 617fa83271e68d58e0b0d2cdaf913a555ab9dd29)
Flags: in-testsuite?
Attached file prefs.js (deleted) —

prefs.js for bugmon

Verified bug as reproducible on mozilla-central 20230608152955-256876c3862b.
The bug appears to have been introduced in the following build range:

Start: 9a56830b39f8d6802cef3772f81535d5f7bd475c (20230606194929)
End: c4cbe98d2837f0a0212af8c3b45bb1495edd42c6 (20230606201406)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=9a56830b39f8d6802cef3772f81535d5f7bd475c&tochange=c4cbe98d2837f0a0212af8c3b45bb1495edd42c6

Keywords: regression
Whiteboard: [bugmon:bisected,confirmed]
Regressed by: 1455416
Assignee: nobody → nlapre

Set release status flags based on info from the regressing bug 1455416

A distilled (to the extent possible) test case:

<script>
document.addEventListener("DOMContentLoaded", () => {
  // necessary in order to reproduce this crash, but I'm not yet sure why
  window.top.open('')

  // remove 'b', which re-parents its aria-owned child 'a' back to the root accessible
  let b = document.getElementById('b')
  b.parentNode.removeChild(b)

  // remove 'f', which has no accessible, which causes us to re-parent the aria-owned children of 'a'.
  // while processing these events in the parent process, we crash because a hide event has a null RemoteParent.
  let f = document.getElementById('f')
  f.parentNode.removeChild(f)
})
</script>
<body>
<div id='b' aria-owns='a'></div>
<div id='d'></div>
<dd id='f'>
  <div id='a' aria-owns='d'></div>
</div>
</body>

A few things of interest going on here:

  1. A DOM element that has no corresponding accessible is being removed from the tree, taking an aria-owner child with it.
  2. We're opening another tab, and removing that line stops the crash.
  3. The crash only happens if I have both aria-owns re-parents happen. It doesn't crash if I remove the first re-parent.

Still investigating, but curious if anything jumps out to you, Jamie? My plan at the moment is to track down the events we send from content once I get an unoptimized build. It feels like our tree state is getting messed up somewhere, causing us to send (not coalesce?) a hide event on an accessible that has no parent.

Alright, so after some investigation, here's the high-level summary of what's going on here:

  1. We have 'b' owns 'a' owns 'd'.
  2. We remove 'b', which puts 'a' and 'd' back. There's a hide event on 'b'.
  3. We remove the DOM element that parents 'a', which puts 'd' back. We queue a hide event on 'd', but don't drop it, despite there being an ancestor ('b') which is the shallowest node in the hidden subtree.
  4. We end up firing a hide event on 'd', but we really only want to fire a show event on it, since we're firing a hide event on 'b' to hide the entire subtree.
  5. In the parent process, we handle the hide event on 'b' and shut down the subtree, except for the moving node 'd'. We then handle the hide event on 'd', and find that there's no parent for it (it hasn't been shown yet). We crash when derefencing the null parent.

I'm unclear on what to do here. I tried avoiding firing the hide event if the accessible is on the moving list, but that breaks the fundamental "reparenting aria-owned siblings" case. I don't know how can preserve that ancestry relationship across multiple JavaScript calls. FWIW, I think the opening of the new window is important because it forces the calls to happen in the same tick, but I'm not sure.

:Jamie, do you have any thoughts on what to do here? I feel like I want to coalesce earlier so that we can drop the bad hide event, but the link to the ancestor hide event target is severed by the time we go to hide its descendants.

Flags: needinfo?(jteh)

Yuck.

Can we avoid queuing a hide event if the target is being moved? The idea is that if it's already been moved, it's already had a hide event queued on either it or an ancestor, so we don't need to queue another. This is very sensitive to when we call TrackMovedAccessible, since we don't want to accidentally drop the first hide event. Looking briefly, it looks like we only call TrackMovedAccessible after we move the Accessible, but that might need some tweaking if there are cases I've missed.

Flags: needinfo?(jteh) → needinfo?(nlapre)
Severity: -- → S3

If we move an accessible via an aria-owns relocation on a grandparent, then move
it again via an aria-owns relocation on a parent, we can end up with multiple
hide events in the queue: one for the shallowest hide (of the grandparent), and
one for the hide of the grandchild accessible. We can't always coalesce (drop)
the hide event of the grandchild since the first move severs the relationship
between its parent and its grandparent. To address this, this revision stops us
from queueing a hide event on an accessible that's already being moved.

Flags: needinfo?(nlapre)
Pushed by nlapre@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3b18400b7e1a
Avoid queueing a hide event on an accessible that's already being moved, r=Jamie
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

Bug appears to be fixed on mozilla-central 20230616040643-6bc2d3f9b1aa but BugMon was unable to find a usable build for 3a8e0dd854b5.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: