Crash in [@ mozilla::TrackBuffersManager::RemoveFrames], [@ mozilla::TrackBuffersManager::TrackData::GetTrackBuffer]
Categories
(Core :: Audio/Video: Playback, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox-esr115 | --- | unaffected |
firefox116 | --- | unaffected |
firefox117 | --- | unaffected |
firefox118 | blocking | fixed |
People
(Reporter: aryx, Assigned: alwu)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression, topcrash)
Crash Data
Attachments
(4 files)
48 crashes from 23 installations with Firefox 118.0a1 after bug 1845213 had landed.
Crash report: https://crash-stats.mozilla.org/report/index/7a498ea6-f82f-4ddf-9861-43fe30230803
MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(data.IsEmpty())
Top 10 frames of crashing thread:
0 xul.dll mozilla::TrackBuffersManager::RemoveFrames dom/media/mediasource/TrackBuffersManager.cpp:2495
1 xul.dll mozilla::TrackBuffersManager::CodedFrameRemoval dom/media/mediasource/TrackBuffersManager.cpp:690
2 xul.dll mozilla::TrackBuffersManager::ProcessTasks dom/media/mediasource/TrackBuffersManager.cpp:255
3 xul.dll mozilla::TrackBuffersManager::QueueTask dom/media/mediasource/TrackBuffersManager.cpp:176
4 xul.dll mozilla::TrackBuffersManager::CodedFrameRemovalWithPromise dom/media/mediasource/TrackBuffersManager.cpp:627
5 xul.dll mozilla::detail::RunnableMethodArguments<StoreCopyPassByRRef<mozilla::media::Interval<mozilla::media::TimeUnit> > >::apply<mozilla::TrackBuffersManager, RefPtr<mozilla::MozPromise<bool, nsresult, 1> > const xpcom/threads/nsThreadUtils.h:1164
5 xul.dll std::invoke /builds/worker/fetches/vs/VC/Tools/MSVC/14.29.30133/include/type_traits:1534
5 xul.dll std::_Apply_impl /builds/worker/fetches/vs/VC/Tools/MSVC/14.29.30133/include/tuple:974
5 xul.dll std::apply /builds/worker/fetches/vs/VC/Tools/MSVC/14.29.30133/include/tuple:979
5 xul.dll mozilla::detail::RunnableMethodArguments<StoreCopyPassByRRef<mozilla::media::Interval<mozilla::media::TimeUnit> > >::apply xpcom/threads/nsThreadUtils.h:1162
Reporter | ||
Updated•1 year ago
|
Comment 1•1 year ago
|
||
Set release status flags based on info from the regressing bug 1845213
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 2•1 year ago
|
||
Assignee | ||
Comment 3•1 year ago
|
||
Depends on D185346
Comment 4•1 year ago
|
||
Setting as release blocker given the volume on Nightly.
Updated•1 year ago
|
Comment 6•1 year ago
|
||
bugherder |
Comment 7•1 year ago
|
||
:alwu Since the patches landed the volume has dropped in Fenix but there are still crashes. The Desktop crash volume is still high.
Assignee | ||
Comment 10•1 year ago
|
||
In this crash report, I found that when doing remove frame, the interval we supposed to remove was [80,000,000-2,692,131,704]
, but we only removed [80,000,000-167,985,000]
, which left [167,985,000-2,692,131,704]
in the buffered range but there was already no data left in our buffer.
So the question here would be, if the highest timestamp in data was 167,985,000
, why the highest end time in our buffered range was 2,692,131,704
? It seems there would still be mismatching when adding frames. I am going to add another assertion to see if we can capture this problem.
Assignee | ||
Comment 11•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 12•1 year ago
|
||
Comment 13•1 year ago
|
||
Backed out for build bustages on TrackBuffersManager.cpp.
Failure log: https://treeherder.mozilla.org/logviewer?job_id=425458219&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/23f18f8fcd2b176f75b640be4dfea8cb5e17c940
Comment 14•1 year ago
|
||
Comment 15•1 year ago
|
||
bugherder |
Assignee | ||
Comment 16•1 year ago
|
||
For these two crashes [1], when the buffered range was empty, there was still a key frame left in the buffer. That makes me wonder if we have any logic error in this part of code. One guess is that the data in the buffer are stored based on the order of DTS, instead of PTS, would it be possible we break out the loop too early?
Another crash [2] showed that there were still 944 samples left when the buffered range was empty.... I still can't get the reasonable explanation about that. If we can't find the root cause in time, then I will need to downgrade those assertions to MOZ_ASSERT
and keep debugging them until finding the real problem. Paul, if you have any insight in terms of these crashes, please let me know! Thanks!
[1]
https://crash-stats.mozilla.org/report/index/5c5f22a3-2f62-494f-83e5-4bb6a0230810
https://crash-stats.mozilla.org/report/index/c5d35fa7-e88a-4857-a037-42fce0230810
[2]
https://crash-stats.mozilla.org/report/index/cedaef99-ae0e-4769-859a-ee7100230810#tab-details
Assignee | ||
Comment 17•1 year ago
|
||
Comment 18•1 year ago
|
||
Comment 19•1 year ago
|
||
bugherder |
Comment 20•1 year ago
|
||
(In reply to Alastor Wu [:alwu] from comment #16)
For these two crashes [1], when the buffered range was empty, there was still a key frame left in the buffer. That makes me wonder if we have any logic error in this part of code. One guess is that the data in the buffer are stored based on the order of DTS, instead of PTS, would it be possible we break out the loop too early?
https://crash-stats.mozilla.org/report/index/cedaef99-ae0e-4769-859a-ee7100230810#tab-details
Data is always stored in DTS order, not PTS order.
Could it be that there's a sample with a duration of 0 but has a high pts/dts?
Comment 21•1 year ago
|
||
Fx118 merge day is on 2023-08-28.
Wondering if there any additional patches planned under this bug?
There is a leave-open
keyword is on the bug, and there are no crash reports since the latest patch landed.
Assignee | ||
Comment 22•1 year ago
|
||
We can now close this bug, because the crash rate has been significantly reduced. In the meanwhile, I filed bug 1849216 to keep tracking this issue because we didn't find out the real problem yet.
Could it be that there's a sample with a duration of 0 but has a high pts/dts?
I thought about this possibility before, so I added an assertion to ensure that if the duration of sample should be zero if the buffered range is not empty. But that also failed. Crash reports showed that there were still real sample (with non-zero duration) left in the buffer, but the range is already zero. (Also saw opposite situation, zero data left, but still had a buffered range) I still suspect that there might be some incorrect time unit conversion somewhere, but I couldn't find it now.
Updated•1 year ago
|
Updated•1 year ago
|
Description
•