Crash in [@ mozilla::ipc::IProtocol::ChannelSend | mozilla::a11y::PDocAccessibleChild::SendShowEvent | IPC_Message_Name=PDocAccessible::Msg_ShowEvent]
Categories
(Core :: Disability Access APIs, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | wontfix |
firefox-esr115 | --- | fix-optional |
firefox116 | --- | wontfix |
firefox117 | --- | wontfix |
firefox118 | --- | fixed |
People
(Reporter: manuel, Assigned: Jamie)
References
(Blocks 1 open bug, Regressed 1 open bug)
Details
(Keywords: crash, topcrash, Whiteboard: [ctw-23h2])
Crash Data
Attachments
(2 files)
Crash report: https://crash-stats.mozilla.org/report/index/7dea1601-1203-4eae-b833-984860220714
MOZ_CRASH Reason: MOZ_CRASH(IPC message size is too large)
Top 10 frames of crashing thread:
0 libxul.so mozilla::ipc::IProtocol::ChannelSend /build/firefox/src/firefox-102.0.1/ipc/glue/ProtocolUtils.cpp:487
1 libxul.so mozilla::a11y::PDocAccessibleChild::SendShowEvent s3:gecko-generated-sources-l1:05e7b0735999cb80520a210a9906635a37fa583e4e1f4e967a53e718cdc2415da987df2e56d2f110dcd48bb9981897dfd5456f07d2ce3b91a11378d9eb21a298/ipc/ipdl/PDocAccessibleChild.cpp::254
2 libxul.so mozilla::a11y::DocAccessibleChildBase::MaybeSendShowEvent /build/firefox/src/firefox-102.0.1/accessible/ipc/DocAccessibleChildBase.h:86
3 libxul.so mozilla::a11y::DocAccessibleChildBase::InsertIntoIpcTree /build/firefox/src/firefox-102.0.1/accessible/ipc/DocAccessibleChildBase.cpp:91
4 libxul.so mozilla::a11y::LocalAccessible::HandleAccEvent /build/firefox/src/firefox-102.0.1/accessible/generic/LocalAccessible.cpp:931
5 libxul.so mozilla::a11y::AccessibleWrap::HandleAccEvent /build/firefox/src/firefox-102.0.1/accessible/atk/AccessibleWrap.cpp:1021
6 libxul.so nsEventShell::FireEvent /build/firefox/src/firefox-102.0.1/accessible/base/nsEventShell.cpp:54
7 libxul.so mozilla::a11y::NotificationController::ProcessMutationEvents /build/firefox/src/firefox-102.0.1/accessible/base/NotificationController.cpp:577
8 libxul.so mozilla::a11y::NotificationController::WillRefresh /build/firefox/src/firefox-102.0.1/accessible/base/NotificationController.cpp:896
9 libxul.so nsRefreshDriver::Tick /build/firefox/src/firefox-102.0.1/layout/base/nsRefreshDriver.cpp:2468
STR:
- Go to https://crash-stats.mozilla.org/report/index/5db7e2f3-187b-4ff3-ae90-ade790220714
- Click on "Raw Data and Minidumps"
Assignee | ||
Comment 1•2 years ago
|
||
Ouch. That's pretty nasty. :(
Given that this is happening without the full cache (just the tree structure), this is even more likely with the full cache. We've discussed ways to paginate the cache. There are a few challenges there:
- How do we paginate? Do we paginate up to a certain number of nodes or something like that?
- When we paginate, how do we handle that in terms of IPDL calls? The current SendShowEvent depends on having a single root, but we might paginate at any position in the tree.
- On the parent side, we might yield to the event loop between handling of incoming IPDL calls, which means an a11y client query can be handled. I don't think we have any way of stopping that. Even if we did, that would make the browser unresponsive for a potentially long period of time.
- Because of 3), a client might see an incomplete view of the tree. This is particularly important on Windows where a screen reader might have cached its own representation of the tree. Thus, we need to fire mutation events for anything we add to the tree in subsequent pages. That's going to be tricky, since pagination could occur at any position in the tree. This problem is similar to 2).
Assignee | ||
Comment 2•2 years ago
|
||
This is nasty, but also extremely rare. Downgrading to s3.
Assignee | ||
Comment 3•2 years ago
|
||
(In reply to James Teh [:Jamie] from comment #1)
- On the parent side, we might yield to the event loop between handling of incoming IPDL calls, which means an a11y client query can be handled. ...
- Because of 3), a client might see an incomplete view of the tree. ... Thus, we need to fire mutation events for anything we add to the tree in subsequent pages.
We could avoid these problems by sending the (sub)tree such that it is detached from its document until the last page is received. Clients would not be able to access the subtree. In the content process, pages could be sent without yielding to the event loop, thus avoiding the tree changing underneath us as we're sending it. The only problem we need to solve then is how to paginate the IPDL calls, but that's a lot simpler when we don't have to worry about events, the tree changing, etc.
Note that this only solves the IPDL message size problem and maybe a bit of parent process jank. It does not make things any faster for a11y clients. That is, a11y clients don't get the first part of the tree any earlier; they still have to wait until the whole tree is received before they can access anything at all.
Assignee | ||
Comment 4•1 years ago
|
||
It looks like the crash signature for this has morphed.
bp-1f84d790-2eb2-4a6e-90ad-40b770230606
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Updated•1 year ago
|
Comment 6•1 year ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 content process crashes on beta
:Jamie, could you consider increasing the severity of this top-crash bug?
For more information, please visit BugBot documentation.
Assignee | ||
Comment 7•1 year ago
|
||
I wonder why this is spiking now? I would have expected this to spike as soon as we shipped CtW, rather than waiting a few releases.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 8•1 year ago
|
||
When we serialize a subtree, we put it into a flat list.
Previously, we included the child count for each Accessible so that we knew how many Accessibels to consume as children when de-serializing.
We also de-serialized recursively.
This made it very difficult to split serialization across IPDL calls, since we would always end up splitting in the middle of some Accessible's children.
Instead, we now no longer include the child count, but we do include the parent id and the index of the child in that parent.
This means that each Accessible can be de-serialized independently and iteratively, making it possible to split wherever we need to.
RemoteAccessible creation has also been separated from attachment of the child to its parent, since we will need this when splitting.
Assignee | ||
Comment 9•1 year ago
|
||
In the content process, we simply split into multiple calls when the number of Accessibles exceeds our maximum.
In the parent process, we defer attaching the root of the new subtree to its parent until the final call.
This is achieved by saving the root during the first call and using that to attach and fire events in the final call.
The maximum is calculated to allow for every Accessible to consume 2 KB in the IPDL message.
Currently, this means we split every 131072 Accessibles.
Of course, we could still exceed this IPDL message size if one or more Accessibles consumed a lot more than this; e.g. many labels longer than 2 KB.
However, this seems unlikely in the real world.
If this turns out to be a problem, we'll need to count the actual size of the serialized data for each Accessible.
For example, we could use AccAttributes::SizeOfExcludingThis, though that isn't exactly the serialized size.
I worry though that such data structure traversal could get expensive at scale.
Assignee | ||
Comment 10•1 year ago
|
||
Unfortunately, the URL in the STR in comment 0 is now dead. However, you can reliably reproduce this crash by loading this page with accessibility active:
https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html
Comment 11•1 year ago
|
||
Comment 12•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/cc0a76b6a875
https://hg.mozilla.org/mozilla-central/rev/91dd0dbb8e73
Updated•1 year ago
|
Updated•1 year ago
|
Comment 13•1 year ago
|
||
Reproduced with the info from comment 10 on Fx 116.0.2 on Windows 10.
I can no longer crash the browser on Fx 117.0b6 and Fx 118.0a1 (2023-08-10), tested on Windows 10.
Updated•1 year ago
|
Comment 14•1 year ago
|
||
Is this something we should uplift to ESR115? It'll need a rebased patch if so.
Assignee | ||
Comment 15•1 year ago
|
||
Potentially, but this change is not what I'd consider low risk. I'd prefer to let it bake on release for a bit before making that call.
Comment 16•1 year ago
|
||
Backed out from Beta for 117.0b9 for causing bug 1848991 and to allow for more bake time. It remains fixed for 118+.
https://hg.mozilla.org/releases/mozilla-beta/rev/4a0c1a095483
Description
•