Closed Bug 1745864 Opened 3 years ago Closed 2 years ago

Hit MOZ_CRASH(mozilla::LinkedList<mozilla::dom::ContentParent>::~LinkedList() [T = mozilla::dom::ContentParent] has a buggy user: it should have removed all this list's elements before the list's destruction)

Categories

(Core :: DOM: Content Processes, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox97 --- affected

People

(Reporter: tsmith, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, intermittent-failure, Whiteboard: [stockwell unknown])

Found while fuzzing m-c 20211212-1a4d98507807 (--enable-debug --enable-fuzzing)

Hit MOZ_CRASH(mozilla::LinkedList<mozilla::dom::ContentParent>::~LinkedList() [T = mozilla::dom::ContentParent] has a buggy user: it should have removed all this list's elements before the list's destruction) at /builds/worker/workspace/obj-build/dist/include/mozilla/LinkedList.h:444

#0 0x7ffb424972be in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:261:3
#1 0x7ffb424972be in mozilla::LinkedList<mozilla::dom::ContentParent>::~LinkedList() /builds/worker/workspace/obj-build/dist/include/mozilla/LinkedList.h:440:7
#2 0x7ffb4245bd1e in operator() /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:463:5
#3 0x7ffb4245bd1e in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:305:7
#4 0x7ffb4245bd1e in mozilla::UniquePtr<mozilla::LinkedList<mozilla::dom::ContentParent>, mozilla::DefaultDelete<mozilla::LinkedList<mozilla::dom::ContentParent> > >::~UniquePtr() /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:253:18
#5 0x7ffb530cea26 in __run_exit_handlers /build/glibc-eX1tMB/glibc-2.31/stdlib/exit.c:108:8
#6 0x7ffb530cebdf in exit /build/glibc-eX1tMB/glibc-2.31/stdlib/exit.c:139:3
#7 0x7ffb530ac0b9 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:342:3
#8 0x55e4611e465c in _start (/home/worker/builds/m-c-20211212214735-fuzzing-debug/firefox-bin+0x1565c)

A Pernosco session is available here: https://pernos.co/debug/lrMMgchhCroAiBviWvxFvA/index.html

mccr8: This is some sort of leak right? Is the Pernosco session helpful?

Flags: needinfo?(continuation)

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Content Processes' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: General → DOM: Content Processes

I'm not sure I've seen this specific LinkedList assert before. I think it means that sContentParents wasn't cleared by the time we shut down, which I guess probably does mean that we're leaking a ContentParent.

(In reply to Andrew McCreight [:mccr8] from comment #3)

I'm not sure I've seen this specific LinkedList assert before. I think it means that sContentParents wasn't cleared by the time we shut down, which I guess probably does mean that we're leaking a ContentParent.

I assume clearing the sContentParents variable as such would just have triggered the same assert earlier if the list was non-empty? FWIW, I do not see any attempt to clear it anywhere in our code. So it seems to behave pretty much like a static instance.

In the pernosco session there seems to be no call to any of the LinkedListElement::remove* variants, which is probably not too surprising if we end up like this. We seem to re-use the same process three times. I had a hard time to find the place where we are supposed to remove those elements from the list, though.

Flags: needinfo?(continuation)

Olli, can you please help triage this? Thank you!

Flags: needinfo?(bugs)

This means something is leaking a ContentParent, as is said here already.
Removal happens when the relevant entry is deleted https://searchfox.org/mozilla-central/rev/e74a8e2d1dedebfe47a6afb009ddabdeea12694b/mfbt/LinkedList.h#200

Severity: -- → S3
Flags: needinfo?(bugs)
Priority: -- → P3

Hi,

I am seeing a similar, not exactly the same crash message with M-C/C-C TB.
LinkedList with a different content type triggers the shutdown-time crash.

I created a local DEBUG build of TB and found that it crashed at the shutdown time in a similar manner.
Except that the message points to a list of different type.

Hit MOZ_CRASH(mozilla::LinkedList<T>::~LinkedList() [with T = nsSHistory] has a buggy user: it should have removed all this list's elements before the list's destruction) at /NEW-SSD/moz-obj-dir/objdir-tb3/dist/include/mozilla/LinkedList.h:440
#01: ???[/NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/libxul.so +0x616b5e9]
#02: ???[/lib/x86_64-linux-gnu/libc.so.6 +0x3ef67]
#03: ???[/lib/x86_64-linux-gnu/libc.so.6 +0x3f10a]
#04: __libc_start_main[/lib/x86_64-linux-gnu/libc.so.6 +0x277f4]
#05: _start[/NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird +0x162fa]
#06: ??? (???:???)

The full stacktrace, etc can be found in bug 1333342 comment 30 and preceding comments there.

Has this check been introduced lately? Or has there been any major code update for LinkedList. I am not sure if I have been hit with this assert before. (It IS possible that since I was testing |maildir| folder format in bug 1333342, it may be that I don't see it often since I use |mbox| folder format.)

I found out that in my case, statically allocated gSHistoryList variable (that holds a value of type mentioned in the assert, "with T = nsSHistory")
seems to point to a circular list and I wonder if it is a bad thing in itself.
See Bug 1333342 comment 30
https://bugzilla.mozilla.org/show_bug.cgi?id=1333342#c30

I was about to post a bugzilla, but found this entry and wanted to see if there is a common thread to the crash I saw.
(I wonder if I should file a separate bugzilla entry.)
I was a bit taken aback when I realized nsSHistory is for browser session.
I was testing TB and only copying messages from one folder to the other.
I wonder where this browser session thing showed up in the scenario.
But, come to think of it, the display of a message text may be handled as a browser display of a sort.

Anyway, another shutdown time crash to take care of :-(

As said in bug 1333342 comment 32, the two bugs have most probably different underlying causes.

(In reply to Intermittent Failures Robot from comment #21)

19 failures in 694 pushes (0.027 failures/push) were associated with this bug yesterday.
For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1745864&startday=2022-03-17&endday=2022-03-17&tree=all

So this is weird, as this gives a list of treeherder runs that do not seem to contain the MOZ_CRASH(mozilla::LinkedList<mozilla::dom::ContentParent>::~LinkedList() but other failures.

(In reply to Intermittent Failures Robot from comment #24)

9 failures in 3341 pushes (0.003 failures/push) were associated with this bug in the last 7 days.
For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1745864&startday=2022-03-21&endday=2022-03-27&tree=all

Seems that the mapping now works again as expected and we can ignore comment 22.

One hacky way to "fix" this is the approach I took in my patch in bug 1661862, where I add a wrapper around the linked list that clears the linked list when it is destroyed. ContentParents are leak checked, so we'll still have some error.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

I removed this assertion. It looks like the recent failures being put into this bug are actually bug 1770330.

Whiteboard: [stockwell unknown]

(In reply to Andrew McCreight [:mccr8] from comment #55)

I removed this assertion. It looks like the recent failures being put into this bug are actually bug 1770330.

Sorry, I confused myself. That other bug is for the history issue.

Whiteboard: [stockwell unknown]
Depends on: 1783933

Hi Tyson, would it be possible to have an updated (on latest m-c) pernosco trace? Just to see if it fails the very same way.

Flags: needinfo?(twsmith)

(In reply to Jens Stutte [:jstutte] from comment #59)

Hi Tyson, would it be possible to have an updated (on latest m-c) pernosco trace? Just to see if it fails the very same way.

This was last reported by fuzzers targeting m-c 20220810-615367610bb5. Prior to that build we were seeing a small number of reports every day.

I'm guessing something fixed it or maybe something else is blocking the fuzzers from finding it.

Flags: needinfo?(twsmith)

Yeah bug 1783933 stopped the assertions.

Then I assume there is nothing left to do here.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.