Crash in [@ RtlpAllocateHeap | RtlpAllocateHeapInternal | AllocMemory]
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
People
(Reporter: ash153311, Unassigned)
References
Details
(Keywords: crash, Whiteboard: [tbird crash] [win:stability])
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/72786a95-85d3-48bd-8298-f98cb0221104
Reason: STATUS_HEAP_CORRUPTION
Top 10 frames of crashing thread:
0 ntdll.dll RtlReportFatalFailure
1 ntdll.dll RtlReportCriticalFailure
2 ntdll.dll RtlpHeapHandleError
3 ntdll.dll RtlpHpHeapHandleError
4 ntdll.dll RtlpLogHeapFailure
5 ntdll.dll RtlpAllocateHeap
6 ntdll.dll RtlpAllocateHeapInternal
7 dbgcore.dll AllocMemory
8 dbgcore.dll GenAllocateProcessObject
9 dbgcore.dll GenGetProcessInfo
Comment 1•2 years ago
The stacks I looked at involved the crash reporter. I'm not sure what the deal is though.
Comment 2•2 years ago
The bug is marked as tracked for firefox108 (nightly). We have limited time to fix this: the soft freeze is today. However, the bug still isn't assigned.
:gcp, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 3•2 years ago
Last Error Value ERROR_INSUFFICIENT_BUFFER
Is this missing error checking in Breakpad? Not sure why this spiked though. It's already with the right triage owner.
Comment 4•2 years ago
The severity field is not set for this bug.
:gsvelto, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 5•2 years ago
This looks like it could be a bug in Microsoft libraries, as the crash is coming from deep within MiniDumpWriteDump(), but it could also be that something changed in Windows 11 and we need to pass information differently into that function. Most of the crashes come from Windows 11 version 10.0.25236, which is a dev-channel build.
Comment 12•2 years ago
(Edited, my previous analysis was wrong)
The crashes we have here are main process crashes generated during out-of-process crash generation for a child process. So the situation is as follows:
- a child process is crashing and requested an out-of-process crash dump;
- the parent process calls into Microsoft code to dump the child process;
- but the parent process' default heap itself is either already corrupt, or gets corrupted by Microsoft's code, and the corruption gets detected;
- so we end up doing an in-process crash dump of the parent process and reporting it.
As [:gsvelto] mentioned, the crashes come almost exclusively from Windows 11 23H2 Insider Preview builds, starting with build 10.0.25236. The beginning of the crash spike matches the release of build 10.0.25236 on November 2nd, 2022. In addition to the possibilities already mentioned by [:gsvelto], I would like to suggest that Microsoft could have added new ways to detect heap corruption, so that corruption which usually went undetected is more likely to be caught starting with build 10.0.25236. In that case, the heap corruption detected here would not necessarily originate from Microsoft code; it could have occurred earlier and only been detected at this point.
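The sequence described in this comment can be modeled with a small Python sketch. This is only a toy model of the control flow; the `Process` class, the `heap_corrupt` flag and the function name are hypothetical and do not correspond to any crash reporter or Windows API.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    heap_corrupt: bool = False  # hypothetical flag standing in for real heap state


def dump_child_out_of_process(parent, child, log):
    """Toy model: return the name of the process whose crash ends up reported."""
    log.append(f"{child.name} crashed and requested an out-of-process dump")
    log.append(f"{parent.name} calls into the dump code for {child.name}")
    if parent.heap_corrupt:
        # An allocation made while dumping the child trips over the corruption.
        log.append(f"heap corruption detected in {parent.name}")
        log.append(f"in-process dump of {parent.name} is generated and reported")
        return parent.name
    log.append(f"dump of {child.name} is generated and reported")
    return child.name


events = []
reported = dump_child_out_of_process(
    Process("parent", heap_corrupt=True), Process("child"), events
)
```

With a corrupt parent heap, `reported` is the parent rather than the child, which is why these reports show up as main process crashes.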
Comment 13•2 years ago
Moving the code that is responsible for dumping other processes (including the main process) to a fully dedicated process should allow us to discriminate between the two possibilities here:
- if this is a real bug within our crash dump code or Microsoft's MiniDumpWriteDump, we would see the same crash reports as currently;
- if this crash is just the consequence of simultaneous corruptions affecting the child and main processes, this specific crash should disappear, and we should manage to generate crash dumps for all corruptions.
When we implement the dedicated crash dumping process, we should have it block injection of all third-party DLLs to limit the risk of simultaneous corruptions affecting all our processes including the crash dumping one. An example scenario would be a third-party application injecting its DLL into all our processes and sending to all its in-process clients a message that results in a heap corruption.
Comment 14•2 years ago
As :gcp noted in comment 3, all the crashes where the last error value is accessible have it set to ERROR_INSUFFICIENT_BUFFER. It might be that we're hitting a limit somewhere inside of dbgcore.dll, which is sending it down an error path that's poorly tested or just hard to recover from. I also wonder if Windows preview builds might work differently from regular ones, for example by having more internal assertions turned on, like we do on nightly/beta. So a harmless error in release becomes a hard error in a preview.
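For readers unfamiliar with this error code, the following Python sketch shows the common grow-and-retry pattern around ERROR_INSUFFICIENT_BUFFER (Win32 value 122), including a size limit that leads to a separate error path. It is an illustration only: `fake_query`, `query_with_retry` and `max_size` are hypothetical names, and nothing here is known to reflect what dbgcore.dll actually does.

```python
ERROR_SUCCESS = 0
ERROR_INSUFFICIENT_BUFFER = 122  # Win32 error code value


def fake_query(buffer, payload):
    """Hypothetical stand-in for a size-querying API; not a real dbgcore call."""
    if len(buffer) < len(payload):
        # Report the failure along with the size that would have been needed.
        return ERROR_INSUFFICIENT_BUFFER, len(payload)
    buffer[:len(payload)] = payload
    return ERROR_SUCCESS, len(payload)


def query_with_retry(payload, initial_size=4, max_size=1 << 20):
    """Grow the buffer until the query succeeds or the (assumed) limit is hit."""
    size = initial_size
    while size <= max_size:
        buffer = bytearray(size)
        status, needed = fake_query(buffer, payload)
        if status == ERROR_SUCCESS:
            return bytes(buffer[:needed])
        if status != ERROR_INSUFFICIENT_BUFFER:
            raise OSError(status)
        size = max(needed, size * 2)
    # This is the kind of rarely exercised error path speculated about above.
    raise MemoryError("required buffer exceeds the configured limit")


result = query_with_retry(b"hello world")
```

If the caller forgets to handle the limit case, the last error value stays at ERROR_INSUFFICIENT_BUFFER while execution continues, which is one way such a value could be left behind.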
Comment 15•2 years ago
Talking about error codes made me realize I didn't check what kind of heap failure gets reported. Looking at a few dumps, it appears to always be heap_failure_entry_corruption, where the full set of possibilities is as follows:
ntdll!_HEAP_FAILURE_TYPE
heap_failure_internal = 0n0
heap_failure_unknown = 0n1
heap_failure_generic = 0n2
heap_failure_entry_corruption = 0n3
heap_failure_multiple_entries_corruption = 0n4
heap_failure_virtual_block_corruption = 0n5
heap_failure_buffer_overrun = 0n6
heap_failure_buffer_underrun = 0n7
heap_failure_block_not_busy = 0n8
heap_failure_invalid_argument = 0n9
heap_failure_invalid_allocation_type = 0n10
heap_failure_usage_after_free = 0n11
heap_failure_cross_heap_operation = 0n12
heap_failure_freelists_corruption = 0n13
heap_failure_listentry_corruption = 0n14
heap_failure_lfh_bitmap_mismatch = 0n15
heap_failure_segment_lfh_bitmap_corruption = 0n16
heap_failure_segment_lfh_double_free = 0n17
heap_failure_vs_subsegment_corruption = 0n18
heap_failure_null_heap = 0n19
heap_failure_allocation_limit = 0n20
heap_failure_commit_limit = 0n21
heap_failure_invalid_va_mgr_query = 0n22
More specifically, RtlpAllocateHeapInternal has paths to report heap_failure_allocation_limit, and RtlpAllocateHeap has paths to report heap_failure_entry_corruption or heap_failure_freelists_corruption. But in our case it seems to always be heap_failure_entry_corruption.
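When triaging dumps it can help to decode the numeric failure type mechanically. The sketch below transcribes the values listed above into a Python enum; the `HeapFailureType` class, the `REPORTABLE_FROM` table and the `describe` helper are illustrative names, and the table only encodes what this comment states about the two functions.

```python
from enum import IntEnum


class HeapFailureType(IntEnum):
    """Values transcribed from the ntdll!_HEAP_FAILURE_TYPE listing above."""
    heap_failure_internal = 0
    heap_failure_unknown = 1
    heap_failure_generic = 2
    heap_failure_entry_corruption = 3
    heap_failure_multiple_entries_corruption = 4
    heap_failure_virtual_block_corruption = 5
    heap_failure_buffer_overrun = 6
    heap_failure_buffer_underrun = 7
    heap_failure_block_not_busy = 8
    heap_failure_invalid_argument = 9
    heap_failure_invalid_allocation_type = 10
    heap_failure_usage_after_free = 11
    heap_failure_cross_heap_operation = 12
    heap_failure_freelists_corruption = 13
    heap_failure_listentry_corruption = 14
    heap_failure_lfh_bitmap_mismatch = 15
    heap_failure_segment_lfh_bitmap_corruption = 16
    heap_failure_segment_lfh_double_free = 17
    heap_failure_vs_subsegment_corruption = 18
    heap_failure_null_heap = 19
    heap_failure_allocation_limit = 20
    heap_failure_commit_limit = 21
    heap_failure_invalid_va_mgr_query = 22


# Failure types each function can report, according to this comment only.
REPORTABLE_FROM = {
    "RtlpAllocateHeapInternal": {HeapFailureType.heap_failure_allocation_limit},
    "RtlpAllocateHeap": {
        HeapFailureType.heap_failure_entry_corruption,
        HeapFailureType.heap_failure_freelists_corruption,
    },
}


def describe(code):
    """Return the symbolic name for a raw failure type value."""
    return HeapFailureType(code).name


observed = describe(3)
```

Here `observed` is the name for value 3, the only failure type seen in the dumps inspected so far.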
Comment 16•2 years ago
The crash volume suggests that this crash has been fixed in Windows 11 Insider Preview build 25309 (announced in March). We have received no reports for this build or any later one; the last reports are from build 25300:
Version      Count        %
10.0.19043       1   0.15 %
10.0.19045       3   0.44 %
10.0.25236       6   0.87 %
10.0.25247       6   0.87 %
10.0.25252      50   7.27 %
10.0.25262      51   7.41 %
10.0.25267     152  22.09 %
10.0.25272      54   7.85 %
10.0.25276      36   5.23 %
10.0.25281      38   5.52 %
10.0.25284      54   7.85 %
10.0.25290      67   9.74 %
10.0.25295      35   5.09 %
10.0.25300     135  19.62 %
This could relate to the following entry: Fixed an underlying issue which was leading to Microsoft Edge crashes for some Insiders in the last few flights.
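The percentages in the table can be re-derived from the counts with a few lines of Python. This is a sketch for checking the numbers; the counts are copied from the table, while the helper names and the choice of 25236 as the first preview build are taken from or assumed on the basis of the earlier comments.

```python
# Crash counts per Windows version, copied from the table in this comment.
counts = {
    "10.0.19043": 1,
    "10.0.19045": 3,
    "10.0.25236": 6,
    "10.0.25247": 6,
    "10.0.25252": 50,
    "10.0.25262": 51,
    "10.0.25267": 152,
    "10.0.25272": 54,
    "10.0.25276": 36,
    "10.0.25281": 38,
    "10.0.25284": 54,
    "10.0.25290": 67,
    "10.0.25295": 35,
    "10.0.25300": 135,
}


def build_number(version):
    """Extract the build number, e.g. 25236 from '10.0.25236'."""
    return int(version.split(".")[2])


def share_table(table):
    """Map each version to its percentage of all reports, rounded to 2 places."""
    total_reports = sum(table.values())
    return {v: round(100 * c / total_reports, 2) for v, c in table.items()}


total = sum(counts.values())
shares = share_table(counts)
# Reports from build 25236 onward, the first build seen in the spike.
preview = sum(c for v, c in counts.items() if build_number(v) >= 25236)
preview_share = round(100 * preview / total, 2)
```

The counts add up to 688 reports, of which 684 (99.42 %) come from build 25236 or later.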