Closed Bug 1463512 Opened 6 years ago Closed 6 years ago

Crash in shutdownhang | RtlAcquireSRWLockExclusive | mozilla::places::History::Observe

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla62

Tracking Flags:

Tracking

Status

firefox-esr52

---

unaffected

firefox-esr60

---

unaffected

firefox60

---

unaffected

firefox61

---

wontfix

firefox62

---

fixed

People

(Reporter: marcia, Assigned: mak)

References

Details

(Keywords: crash, regression, Whiteboard: [fxsearch])

Crash Data

Marcia Knous [:marcia]

Reporter

Description

•

6 years ago

This bug was filed from the Socorro interface and is report bp-d6b2a216-bb7c-45e6-8a03-316ee0180518. ============================================================= Seen while looking at nightly crash stats - crashes started using 20180515100038: https://bit.ly/2keWCBO. 35 crashes/27 installations. Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=45ec8fd380dd2c308e79dbb396ca87f2ce9b3f9c&tochange=380cf87c1ee3966dd94499942b73085754dc4824 ni on :mak in case he has some insight. There is one lone crash in 61 in beta 6, but there is a definite noticeable spike on nightly. Top 10 frames of crashing thread: 0 ntdll.dll NtWaitForAlertByThreadId 1 ntdll.dll RtlAcquireSRWLockExclusive 2 xul.dll mozilla::places::History::Observe toolkit/components/places/History.cpp:3321 3 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:112 4 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:297 5 xul.dll mozilla::places::Database::Observe toolkit/components/places/Database.cpp:2715 6 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:112 7 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:297 8 xul.dll nsXREDirProvider::DoShutdown toolkit/xre/nsXREDirProvider.cpp:1070 9 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1372 =============================================================

Flags: needinfo?(mak77)

Marco Bonardo [:mak]

Assignee

Comment 1

•

6 years ago

The range includes new autofill, I don't see other large changes there. I was hoping to see this disappear after the fix for bug 1461753. Though, it's also the day after we published https://blog.nightly.mozilla.org/2018/05/14/deep-dive-new-bookmark-sync-in-nightly/ and the new sync involves exclusive locks. I suppose we have no way to correlate crashes to prefs... I have no further ideas for now, but let's track this.

Priority: -- → P1

Whiteboard: [fxsearch]

Marco Bonardo [:mak]

Assignee

Comment 2

•

6 years ago

We're pretty much waiting on shutdown to acquire an exclusive slim R/W lock, likely to close the readonly connection to places.sqlite kept open by History. Sync seems to be more likely, but I'm really guessing here. Kit, could it be that the mirror doesn't give up if shutdown happens in the meanwhile? I think shutdown hang happens after 1 minute and 3 seconds, so it would be an operation longer than this time...

Flags: needinfo?(mak77) → needinfo?(kit)

Lina Butler [:lina]

Comment 3

•

6 years ago

Hmm, the mirror does have checks (via `Async.jankYielder`, in https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/services/common/async.js#105,108) to abort merging if it sees `quit-application`...but we don't (can't, without changing Sqlite.jsm?) abort the triggers. I guess we'll want to cancel the statement once shutdown begins? The mirror writes from a cloned connection, so it seems like it's safe to interrupt its statements without affecting the main Places connection. I also noticed Sqlite.jsm adds a shutdown blocker in `executeTransaction` (https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/toolkit/modules/Sqlite.jsm#662-664), which we probably don't want, either. If the transaction doesn't finish before shutdown, we'll just start the process over again on the next sync. OOC, why are we hanging trying to shut down the *read-only* connection? The mirror uses an IMMEDIATE transaction, and I thought, in any case, readers won't wait on writers with WAL? The other use of the shutdown mutex that I found is https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/toolkit/components/places/History.cpp#1166 and https://searchfox.org/mozilla-central/rev/eb51d734a8c06ac5973291be9bcc74ed4e999bca/toolkit/components/places/History.cpp#1880. Could it be that the mirror starts merging and runs its trigger, that takes a long time, History.cpp takes `mShutdownMutex` and starts its transaction, that transaction ends up waiting on the mirror, the user shuts down, history hasn't started its transaction and is still holding `mShutdownMutex`, and we hang?

Flags: needinfo?(kit)

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox60: --- → unaffected

status-firefox61: affected → fix-optional

status-firefox-esr52: --- → unaffected

status-firefox-esr60: --- → unaffected

Lina Butler [:lina]

Comment 4

•

6 years ago

Side note: Slow SQL telemetry (bug 1367023) would be very helpful here. Mine has two hits for `DELETE FROM itemsToMerge`, average 803ms.

Marco Bonardo [:mak]

Assignee

Comment 5

•

6 years ago

(In reply to Kit Cambridge (they/them) [:kitcambridge] from comment #4) > Side note: Slow SQL telemetry (bug 1367023) would be very helpful here. Mine > has two hits for `DELETE FROM itemsToMerge`, average 803ms. we are discussing it and will likely be ready soon.

Marco Bonardo [:mak]

Assignee

Comment 6

•

6 years ago

I have another possible theory, that this was caused by bug 1463017. Clearly that means this crash will disappear/reduce from the 20180530 nightly build. We'll be able to tell in a couple days. And this may also have other implications...

Marco Bonardo [:mak]

Assignee

Comment 7

•

6 years ago

Moving to P2 for now, because we don't have sufficient data to act on it, hopefully the new Slow SQL dashboard will give us that info, plus we are investigating some reports of slow SQL that may be related. Anyway doesn't look like this is directly actionable yet.

Priority: P1 → P2

Marco Bonardo [:mak]

Assignee

Updated

•

6 years ago

Depends on: 1467537, 1463017

Marco Bonardo [:mak]

Assignee

Comment 8

•

6 years ago

There are no crashes after the 9th of June build, so I assume bug 1467537 fixed this. IT can make sense, that bug may have caused longstanding statements and large wal journal, that are usually merged on shutdown. It also caused multi-seconds hangs on ssd, let alone mechanical disks.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla62

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

Assignee: nobody → mak77

status-firefox61: fix-optional → wontfix

status-firefox62: affected → fixed

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Crash in shutdownhang | RtlAcquireSRWLockExclusive | mozilla::places::History::Observe

Categories

(Toolkit :: Places, defect, P2)

Tracking

()

People

(Reporter: marcia, Assigned: mak)

References

Details

(Keywords: crash, regression, Whiteboard: [fxsearch])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Updated