Closed Bug 1624384 Opened 5 years ago Closed 3 years ago

Crash in mozilla::dom::PlacesObservers::NotifyListeners

Categories

(Toolkit :: Places, defect, P1)

74 Branch
defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox-esr91 --- wontfix
firefox74 --- wontfix
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- wontfix
firefox97 --- fixed
firefox98 --- fixed

People

(Reporter: Crashdows, Assigned: daisuke)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0

Steps to reproduce:

first please make sure go to bookmarks and search for an object(you must bookmarks page before) and select all bookmarks with ctrl+a or right click and select all, then try to delete with delete key or etc( at least 100 bookmark)

Actual results:

crash happens and nothing delete.
here is 3 times that my browser crash:
https://crash-stats.mozilla.org/report/index/90f49bf7-c3ad-4ec0-a56b-7f7e80200322

https://crash-stats.mozilla.org/report/index/770fd0a7-bcc7-4ee3-96b6-e5a620200322

https://crash-stats.mozilla.org/report/index/a6ffbdbd-146d-48ef-8331-a403e0200322

Expected results:

bookmarks must be removed.

Crash Signature: [@ mozilla::dom::PlacesObservers::NotifyListeners ]
Component: Untriaged → Places
Keywords: crash
Product: Firefox → Toolkit
Summary: mozilla::dom::PlacesObservers::NotifyListeners → Crash in mozilla::dom::PlacesObservers::NotifyListeners

I wonder if this is because deleting that many bookmarks at once fires a notification that causes some other code to run and also fire a notification, so we end up re-entering notifyObservers. This is enough to trigger that assertion:

PlacesObservers.addListener(["bookmark-added"], events =>
  PlacesObservers.notifyListeners([new PlacesBookmarkAddition({ dateAdded: 0, guid: "bookmarkAAAA", id: -1, index: 0, isTagging: false, itemType: 1, parentGuid: "fakeAAAAAAAA", parentId: -2, source: 0, title: "A", url: "http://example.com/a" })])
);
// PlacesObservers.notifyListeners([]); // Uncomment to crash!

Javad, do you have Sync turned on? That would be my first guess for what's firing notifyObservers...

(In reply to :Lina Cambridge from comment #1)

I wonder if this is because deleting that many bookmarks at once fires a notification that causes some other code to run and also fire a notification, so we end up re-entering notifyObservers. This is enough to trigger that assertion:

PlacesObservers.addListener(["bookmark-added"], events =>
  PlacesObservers.notifyListeners([new PlacesBookmarkAddition({ dateAdded: 0, guid: "bookmarkAAAA", id: -1, index: 0, isTagging: false, itemType: 1, parentGuid: "fakeAAAAAAAA", parentId: -2, source: 0, title: "A", url: "http://example.com/a" })])
);
// PlacesObservers.notifyListeners([]); // Uncomment to crash!

Javad, do you have Sync turned on? That would be my first guess for what's firing notifyObservers...

yes ofc, browser sync turned onf and work well

Do you have a plan of action for this?

Flags: needinfo?(lina)

Doug, what do you think is the best way to make notifyListeners re-entrant? I see we have a queue for addListener and removeListener calls; should we also queue up notifyListeners calls, and pop them off in a loop? (Flushing our addListener and removeListener queues after each turn, so that re-entrant calls to notifyListeners that end up calling {add, remove}Listener...my head hurts a bit thinking about it! 😅) Or is this not worth it or unsafe?

Flags: needinfo?(lina) → needinfo?(dothayer)

I don't have strong opinions here, but I can give my two cents regarding the trade-offs.

I don't have any technical objections for there being a queue of notifyListeners calls - however it can lead to bad assumptions regarding the ordering guarantees. For the most part if you call notifyListeners you can assume that all listeners have run by the time the function returns. This however would now only be true if you're not calling it in a reentrant way.

You could conceivably just allow it to nest, which would keep the ordering guarantees, but you would need to fix the CallListeners function which removes listeners that no longer exist (due to weak references) - since if we were in a nested notifyListeners call, this would mess up the iteration of the parent notifyListeners call. There might be other things that you would need to tweak to get this to be robust - I can't guarantee CallListeners is the end of it.

The third option would just be to require any consumers to defer their notifyListeners call, which would make the ordering at the call site explicit and not require any risky changes inside the PlacesObservers system. But, it would leave the risk around of introducing unexpected crashes like this in the future.

I would be okay with any of these options - I'm not in this code often enough anymore to feel like I have significant stake in it. If you would like me to have a preference though, I would say option 2 (keeping the ordering guarantees and allowing nested notifyListeners calls) sounds the most appealing. It also has the benefit that it will crash if we find ourselves in an infinite loop by running out of stack size, rather than just happily eating resources in the background like the other options.

Flags: needinfo?(dothayer)

The priority flag is not set for this bug.
:mak, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mak)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mak)
Priority: -- → P1

Marco, you have this as P1 for 2 years. We have a regression in bug 1734566, can you advise?

Flags: needinfo?(mak)

P1 and P2 are just used internally to the "team" to track things, what matters for the project is the severity field.

I must refresh my memory about this issue, it's a protective assertion to avoid re-entrant calls, and we must pick a strategy for that, as explained in comment 5.
In practice there are 3 options:

  1. enqueue new notifications. In this case we manage a FIFO queue of notifications, whoever is notifying will check if there's more to send. The nested order would be messed up, but could be easier to do.
  2. nest notifications. In this case new ones are just fired, CallListeners must change to check if it's a nested call, and remove dead listeners only at the end if it's not nested. Nested order is respected, but it has some risks of mismanaging in CallListeners.
  3. just annotate in the webidl that it's not possible to nest listeners, and fix any consumers doing it to enqueue. It's easy, but doesn't remove the risk of hitting the assertion. The order would be pretty much similar to (1), so this is likely not our best option.

I agree with Doug that a first try should be made to implement (2), if that becomes too hairy, we can fallback to (1).
Daisuke, would you have time to investigate this? Feel free to push back if it's too much or you're overloaded and we'll find a different solution.

Flags: needinfo?(mak) → needinfo?(daisuke)

Yes, let me do this.
I got the reason for this crash. So, I have questions about implementation.

(In reply to Marco Bonardo [:mak] from comment #8)

  1. enqueue new notifications. In this case we manage a FIFO queue of notifications, whoever is notifying will check if there's more to send. The nested order would be messed up, but could be easier to do.

Does this mean, we keep the events as like below?

void PlacesObservers::NotifyListeners(aEvents) {
  mEventsList.AppendElement(aEvents);
  .... notify
}
  1. nest notifications. In this case new ones are just fired, CallListeners must change to check if it's a nested call, and remove dead listeners only at the end if it's not nested. Nested order is respected, but it has some risks of mismanaging in CallListeners.

And, I'm so sorry, I don’t understand clearly yet what kind of implementation you are thinking of for these nested notifications. If possible, could you explain a bit more?

Flags: needinfo?(daisuke) → needinfo?(mak)

Yeah, the implementation bits are the tricky part to figure out here and may require playing with the code a bit.
I assume in (1) we'd detect that we are already notifying, thus we'd add those notifications to some sort of a queue, the original code that is currently notifying, when done with its notifications would check if there's anything in queue and serve it.
For (2) instead I think you'd basically just have to remove the assert and let the current code continue, but don't remove null listeners (also see Doug's comment 5) in nested calls, to avoid changing the listeners array position for the outside loop. Thus each call should track whether its nested or not.
This is on a first stance, as I said it's possible there are non visible issues that will arise when we start touching the code.
Doing this in a test-driver manner is probably suggested, it should be easy to write some nested notifications cases.

Flags: needinfo?(mak)

Thank you very much!
Perhaps, I could understand :)

Attached file Bug 1624384: Allow nested notifications. (obsolete) (deleted) —
Assignee: nobody → daisuke
Status: NEW → ASSIGNED
Has Regression Range: --- → yes

Set release status flags based on info from the regressing bug 1729423

I just got a crash after deleting just a single bookmark: https://crash-stats.mozilla.org/report/index/0fd9ead8-20a9-4229-b783-c9e590211113

Workaround for the crashes starting in 94 (Bug 1734566) is to set widget.wayland.async-clipboard.enabled = false in about:config.

No need to track for 95 given the volume on beta.

Attachment #9246084 - Attachment is obsolete: true

(In reply to Kestrel from comment #19)

Workaround for the crashes starting in 94 (Bug 1734566) is to set widget.wayland.async-clipboard.enabled = false in about:config.

The workaround here will be removed as it causes more clipboard issues (failed paste, non-working clipboard in dialogs).
Is there any way how to fix that?

I also experience a crash when deleting just one bookmark.

I don't know whether it's this bug or another one.

Here is the log: https://crash-stats.mozilla.org/report/index/94a7c5ad-7920-415e-b3e9-9497d0211208

How can I work around it?

Will there be a fix?

Distribution ID: mozilla-MSIX | User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0 | OS: Windows_NT 10.0 19044. No crash.

Pushed by dakatsuka.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7736689d07b7 Allow nested notifications by notifing events sequentially. r=mak

(In reply to Tizio Caio from comment #32)

Still crashing if I delete a one single bookmark

Please be patient, this bug is not yet marked as fixed, and even when it is fixed, it'll probably be 10th Jan before it is included in a release.

Flags: needinfo?(daisuke)
Crash Signature: [@ mozilla::dom::PlacesObservers::NotifyListeners ] → [@ mozilla::dom::PlacesObservers::NotifyListeners ] [@ libc.so.6@0xf6b5d | libglib-2.0.so.0@0x594dc]
Pushed by dakatsuka.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/56b41210698b Allow nested notifications by notifing events sequentially. r=mak
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

Can this be backported to Firefox 96? It is rather inconvinient to not be able to delete a bookmark...

This patch is not a good backport candidate for 96 due to the large size and complexity. It needs time to bake before we ship it to the release population at large. We are hoping to backport to Beta for Fx97 due to ship in a couple weeks, however.

Crash Signature: [@ mozilla::dom::PlacesObservers::NotifyListeners ] [@ libc.so.6@0xf6b5d | libglib-2.0.so.0@0x594dc] → [@ mozilla::dom::PlacesObservers::NotifyListeners ] [@ libc.so.6@0xf6b5d | libglib-2.0.so.0@0x594dc]

Comment on attachment 9248579 [details]
Bug 1624384: Allow nested notifications by notifing events sequentially.

Beta/Release Uplift Approval Request

  • User impact if declined: Might be crash when deleting bookmark on linux.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): The patch changes the way to notify places events, and is not so small.
  • String changes made/needed: none
Attachment #9248579 - Flags: approval-mozilla-beta?

Comment on attachment 9248579 [details]
Bug 1624384: Allow nested notifications by notifing events sequentially.

Moderately-risky patch needed to address a pretty frequent Linux crash spike. Approved for 97.0b6.

Attachment #9248579 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Crash Signature: [@ mozilla::dom::PlacesObservers::NotifyListeners ] [@ libc.so.6@0xf6b5d | libglib-2.0.so.0@0x594dc] → [@ mozilla::dom::PlacesObservers::NotifyListeners ] [@ libc.so.6@0xf6b5d | libglib-2.0.so.0@0x594dc] [@ <name omitted> | <name omitted> | <name omitted> | <name omitted> | <name omitted> | <name omitted> | js::Call]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: