Crash in AsyncShutdownTimeout | places.sqlite#0: waiting for clients | places.sqlite#0: AboutSync: promiseSql (18),places.sqlite#0: Bookmarks.jsm: fetchBookmark (17),places.sqlite#0: GuidHelper.getItemGuid (16),places.sqlite#0: History.jsm: insert (13)...
Categories
(Toolkit :: Storage, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | --- | wontfix |
firefox61 | --- | fixed |
People
(Reporter: kkumari, Assigned: lina)
References
Details
(Keywords: crash)
Crash Data
Attachments
(3 files, 1 obsolete file)
Updated•7 years ago
|
Reporter | ||
Comment 2•7 years ago
|
||
Assignee | ||
Comment 3•7 years ago
|
||
Assignee | ||
Comment 4•7 years ago
|
||
Assignee | ||
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
Assignee | ||
Comment 8•7 years ago
|
||
Comment 9•7 years ago
|
||
Comment 10•7 years ago
|
||
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 14•7 years ago
|
||
Assignee | ||
Comment 15•7 years ago
|
||
Assignee | ||
Comment 16•7 years ago
|
||
Assignee | ||
Comment 17•7 years ago
|
||
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Comment 22•7 years ago
|
||
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 28•7 years ago
|
||
mozreview-review |
Assignee | ||
Comment 29•7 years ago
|
||
mozreview-review-reply |
Comment 30•7 years ago
|
||
mozreview-review |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 34•7 years ago
|
||
Assignee | ||
Updated•7 years ago
|
Comment 36•7 years ago
|
||
mozreview-review |
Comment 37•7 years ago
|
||
mozreview-review |
Comment 38•7 years ago
|
||
mozreview-review |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 42•7 years ago
|
||
mozreview-review |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 46•7 years ago
|
||
Comment 47•7 years ago
|
||
bugherder |
Comment 48•7 years ago
|
||
Comment 49•7 years ago
|
||
Assignee | ||
Comment 50•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Comment 51•4 years ago
|
||
@Lina, on bug 1567018 we are hitting another possible SQLITE_BUSY infinite loop. On bug 1567018 comment 26, the user reported that removing the favicons.sqlite
file cleared the problem. Can you say if the fix landed here would apply to the favicons.sqlite
access as well? I'm wondering if a similar fix is also needed for favicons or if bug 1567018 is just a different problem.
Assignee | ||
Comment 52•4 years ago
|
||
Ooof, good question—I don't know! Even after this fix landed, we were still seeing stepStatement
return SQLITE_BUSY
forever (just a lot less frequently) when using a second read-write connection. I never could come up with a standalone test case, but it was easy to repro just by opening the Library and quickly dragging bookmarks around with Sync turned on. Eventually, the main connection would wedge forever trying to run a statement, and the second Sync-only connection would keep working...but the awesomebar and library would be broken, and closing Firefox would trigger a shutdown hang, just like bug 1567018 describes. In the end, we fixed this particular problem by removing the second connection entirely (bug 1567238, comment 4), and having the main Places connection do all the writing.
But favicons.sqlite
is an attached database (even though it's on the same, single read-write connections), and I'm not sure how that factors into things. It might be a different problem, or it might be the same one, with some combination of attached databases, multiple connections—even though we only have one writer, we still have multiple readers—transactions, and locks not getting released when they should.
Mak, do you have any more insights?
Comment 53•4 years ago
|
||
(In reply to :Lina Cambridge from comment #52)
But
favicons.sqlite
is an attached database (even though it's on the same, single read-write connections), and I'm not sure how that factors into things. It might be a different problem, or it might be the same one, with some combination of attached databases, multiple connections—even though we only have one writer, we still have multiple readers—transactions, and locks not getting released when they should.
Unfortunately no, I'd really like to know what's up. Ideally an attached database shouldn't change much since we still have just a single connection writing. And using WAL should ideally ensure readers never get a BUSY state. I've seen other reports on the Web about SQLITE_BUSY with attached databases, so I can't exclude there are unknowns here.
One interesting question would be to figure out whether that favicons.sqlite was corrupt or sane... But since it has been removed in the meanwhile, we may never know. It could also be a case where a query is taking a really huge amount of time (maybe due to some specific characteristic of that db), now the perf profiler should report the running SQL query in a marker, or at least a part of it (bug 1635570), so we should always ask for profiles when possible.
Of course, having some way to reproduce these behavior would go a long way to help us understanding the problem.
Comment 54•4 years ago
|
||
Thanks for looking into it. A couple of questions which might be better moved to bug 1567018.
I wonder if we could add some diagnostic asserts to Nightly to trigger a crash if we ever get SQLITE_BUSY repeatedly assuming that would help debug this further. And should we land a fix to make it impossible to infinite loop here?
I noticed that in AsyncExecuteStatements::Run()
, we check mCancelRequested
. If mCancelRequested
could be set between the time the runnable is dispatched and when it is executed, presumably it could be set after we have checked in ::Run. Should we be checking this while in the SQLITE_BUSY loop or perhaps checking for any other state indicating a shutdown is happening?
Comment 55•4 years ago
|
||
(In reply to Haik Aftandilian [:haik] from comment #54)
I wonder if we could add some diagnostic asserts to Nightly to trigger a crash if we ever get SQLITE_BUSY repeatedly assuming that would help debug this further. And should we land a fix to make it impossible to infinite loop here?
Note getting SQLITE_BUSY in certain cases is absolutely normal, because we have a few connections using multiple threads, we use PRAGMA busy_timeout to cope with that in Places, we should likely do the same in any component that uses both the sync and async API on the same connection.
By code inspection, that's likely just Places, Cookies and tests. Afaict, Places is the only one that has to do real mixed usage, Cookies only do opening/closing on MT, so maybe it's not strictly necessary. Effectively it looks like fixing Cookies and tests, would leave Places as the only mixed use consumer that is a bit more complex to attack.
I noticed that in
AsyncExecuteStatements::Run()
, we checkmCancelRequested
. IfmCancelRequested
could be set between the time the runnable is dispatched and when it is executed, presumably it could be set after we have checked in ::Run. Should we be checking this while in the SQLITE_BUSY loop or perhaps checking for any other state indicating a shutdown is happening?
Potentially yes, my only doubt is that accessing it requires a lock on mMutex, that may still be ok considered we don't set it that often.
I'm not completely sure why it's not just an atomic, but there's a long discussion in bug 506805, that may be outdated considered it was about PR_AtomicSet and very old Gcc. That discussion ended up replacing the old PR_ATOMIC with the mutex (https://hg.mozilla.org/mozilla-central/rev/097171b5d15f).
I wonder if today we could use std::atomic<bool>, Andrew was part of that original discussion and maybe has an insight.
Otherwise, I think we can just use the mutex.
Comment 57•4 years ago
|
||
(In reply to Marco Bonardo [:mak] from comment #55)
I noticed that in
AsyncExecuteStatements::Run()
, we checkmCancelRequested
. IfmCancelRequested
could be set between the time the runnable is dispatched and when it is executed, presumably it could be set after we have checked in ::Run. Should we be checking this while in the SQLITE_BUSY loop or perhaps checking for any other state indicating a shutdown is happening?Potentially yes, my only doubt is that accessing it requires a lock on mMutex, that may still be ok considered we don't set it that often.
Acquiring the lock should be OK. It's only held for a few instructions each time it is used and it appears we never hold both mMutex and mDBMutex meaning we don't have to worry about deadlock if we follow that convention.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 58•4 years ago
|
||
Sorry about the delay, I was actively looking into this when a big storm knocked out my power for a few days.
(In reply to Marco Bonardo [:mak] from comment #55)
I wonder if today we could use std::atomic<bool>, Andrew was part of that original discussion and maybe has an insight.
Otherwise, I think we can just use the mutex.
We should just use the mutex. If anything, the fact that we now have Link Time Optimization (LTO) means it's much more dangerous to try and get clever with use of atomics. The cost of the mutex is negligible given the inefficiency of SQLITE_BUSY itself.
Description
•