abort() pending read transactions in remote settings' Database.jsm when we shut down, to enable faster shutdown
Categories: (Firefox :: Remote Settings Client, defect)
People: (Reporter: Gijs, Assigned: Gijs)
References: (Blocks 2 open bugs)
Details: (Whiteboard: [fxperf:p1])
Attachments: (2 files)
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #6)
> (In reply to :Gijs (he/him) from comment #3)
> > - It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of profile-before-change on those IDB requests completing for now?
>
> If you're not going to wait for the reads, it's appropriate to invoke abort() on any IDBTransaction instances that have already been created and then call close().
It seems like we could do this fairly easily in Database.jsm by making executeIDB take an option to indicate the transaction can be aborted, and then adding that option to callers that do read-only transactions. executeIDB can use that option to keep track of pending read transactions, and when the async shutdown promise getter trips we can call abort() on all those transactions. Per the spec, it seems this should cause an abort event which executeIDB can listen for, and it can reject its promise with an error in that case.

After the changes in bug 1598924, doing so should automatically throw and stop processing whatever we were doing, and then call .close(). I think this should always be fine for reads on the RS databases, though we should perhaps audit the callsites of the read-only methods from the previous paragraph to ensure there aren't problematic consumer patterns where they do multiple writes interspersed with reads that aren't in the same transaction - of course, that would already be bad, because both the writes and reads can already be aborted/throw for other reasons.
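In Database.jsm terms I'd imagine something like this (a rough sketch only; the option name and the bookkeeping are illustrative, not actual patch code):

    // Hypothetical sketch of the approach described above; not the landed patch.
    const gPendingReadTransactions = new Set();

    async function executeIDB(storeName, callback, { mode = "readonly", canAbort = false } = {}) {
      const db = await openIDB(); // assumed helper returning the cached IDBDatabase
      const transaction = db.transaction([storeName], mode);
      if (canAbort) {
        gPendingReadTransactions.add(transaction);
      }
      return new Promise((resolve, reject) => {
        let result;
        transaction.oncomplete = () => {
          gPendingReadTransactions.delete(transaction);
          resolve(result);
        };
        // Fires both for our explicit abort() and for IDB-initiated aborts:
        transaction.onabort = () => {
          gPendingReadTransactions.delete(transaction);
          reject(new Error("IndexedDB transaction aborted"));
        };
        result = callback(transaction.objectStore(storeName));
      });
    }

    // From the async shutdown blocker:
    function abortPendingReads() {
      for (const transaction of gPendingReadTransactions) {
        transaction.abort(); // rejects the corresponding executeIDB() promise via onabort
      }
    }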
Comment 1 • 5 years ago (Assignee)
Comment 4 • 5 years ago
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=15a52d0bf11640e0fbbbd30c09d7a7e822e0524a&selectedJob=295107068
Failure logs:
https://treeherder.mozilla.org/logviewer.html#?job_id=295107068&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=295096429&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=295096602&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/c012372b0db2
Comment 5 • 5 years ago (Assignee)
Note: there were also xpcshell failures:
TEST-UNEXPECTED-FAIL | xpcshell-legacyconfig.ini:toolkit/components/search/tests/xpcshell/test_selectedEngine.js | test_fallback_kept_after_restart - [test_fallback_kept_after_restart : 265] A promise chain failed to handle a rejection: A mutation operation was attempted on a database that did not allow mutations. - stack: ensureShutdownBlocker/<@resource://services-settings/Database.jsm:534:21
Comment 6 • 5 years ago (Assignee)
The issues with browser_RecipeRunner.js are easy to fix - there we can just await the calls to db.close() (which we currently don't), and then we stop trying to reuse a db we're in the middle of closing.

However, fundamentally all the methods on Database (now InternalDatabase) share their access to _idb, and all of the ones that want that object call await this.open(). That's great. The problem is that a bunch of the operations they then proceed to do are asynchronous, and so if there are multiple consumers using the DB via remote settings, one of them can end up calling this.db.close() while one of the others (using the same db!) is mid-operation. This does not go well.

Mathieu, how do you want this part of remote settings to work? Should we somehow keep track of pending operations and make Database.close a no-op as long as at least one operation is still ongoing? Something like the sketch below, perhaps.
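To be concrete, I mean something of this shape (hypothetical sketch; method names like _reallyClose are made up):

    // Hypothetical: defer the actual close until the last pending operation finishes.
    class InternalDatabase {
      constructor() {
        this._pendingOperations = 0;
        this._closeRequested = false;
      }

      async _trackOperation(promise) {
        this._pendingOperations++;
        try {
          return await promise;
        } finally {
          this._pendingOperations--;
          if (this._closeRequested && this._pendingOperations == 0) {
            await this._reallyClose(); // made-up name for the current close logic
          }
        }
      }

      async close() {
        if (this._pendingOperations > 0) {
          this._closeRequested = true; // no-op for now; last operation closes for real
          return;
        }
        await this._reallyClose();
      }
    }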
Comment 9 • 5 years ago
As a general FYI, the IndexedDB backend will automatically close the underlying database connections if they're idle for a while and reopen it on demand. It would probably be fine to keep the database around once the consumer triggers the open, and then hold onto it until shutdown as long as there's an expectation that there will be future database accesses.
Also, re: database closing, note that IDB's close() doesn't actually take effect "until all outstanding transactions have completed" so as long as things are happening within a transaction, things should generally keep working.
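In other words (an illustrative snippet of the spec behavior, assuming db is an open IDBDatabase with a "records" store; not code from this patch):

    const transaction = db.transaction("records", "readonly");
    const request = transaction.objectStore("records").getAll();
    db.close(); // only takes effect once `transaction` completes
    request.onsuccess = () => {
      // Still fires: close() waits for outstanding transactions.
    };
    // But starting a *new* transaction now throws InvalidStateError:
    // db.transaction("records", "readonly");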
Comment 10 • 5 years ago (Assignee)
Ah, I think I've figured it out. I think this is a data race in my patch. This happens:

1. call A calls close on the DB (which async-awaits the cached _idb ref)
2. call B asks for a DB (which also async-awaits the cached _idb ref)
(microtask checkpoint runs, and A+B get their db)
3. call A closes the DB
4. call B tries to start a transaction on the closed DB

If calling close clears the this._idb property sync, instead of only doing so after await'ing it, this problem should go away, because then in step 2, B gets a fresh ref (sketch below).

If the microtask queue goes the "other way", so B runs before A, this already can't happen:

1. call B asks for a DB (awaits this._idb)
2. call A calls close, clearing _idb and await'ing it
(microtask checkpoint runs, and A+B get their db)
3. call B starts a transaction on the DB
4. call A calls close, which per Andrew in comment #9 will wait for the transaction to finish.
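That is, for the first scenario the fix would be something like (sketch, not the exact patch):

    async close() {
      const idbPromise = this._idb;
      // Clear the cached ref synchronously, before any await, so a concurrent
      // open() caller gets a fresh connection rather than the one we're closing.
      this._idb = null;
      if (idbPromise) {
        const db = await idbPromise;
        db.close();
      }
    }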
Comment 11 • 5 years ago (Assignee)
So apparently comment #10, while sounding plausible, is still wrong - it's possible for the close to happen between when we call open() (getting a cached promise) and when the promise resolves and we actually try to use said DB...
Comment 12 • 5 years ago (Assignee)
(In reply to :Gijs (he/him) from comment #11)
> So apparently comment #10, while sounding plausible, is still wrong - it's possible for the close to happen between when we call open() (getting a cached promise) and when the promise resolves and we actually try to use said DB...
OK, this took me a surprising amount of time to work out, but it's because open is declared async. See also this (to me, surprising) minimal example: https://jsbin.com/ceyiquxeqe/edit?html,js,output

Declaring open as a sync function (that just returns a promise) makes this issue go away. The fact that we take an extra turn through the event loop / extra microtask queue to get the DB if open is declared async is... surprising, I think, and feels fragile.
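Something like the following illustrates it (my reconstruction of the gist of the jsbin example, in case the link rots):

    const cached = Promise.resolve("db");

    // Async wrapper: the returned promise *adopts* `cached`, which costs
    // extra microtask ticks before callers see the value.
    async function openAsync() {
      return cached;
    }

    // Sync wrapper: callers await the very same cached promise.
    function openSync() {
      return cached;
    }

    const order = [];
    openSync().then(() => order.push("sync open resolved"));
    openAsync().then(() => order.push("async open resolved"));
    cached.then(() => order.push("close() could sneak in here"));
    setTimeout(() => console.log(order), 0);
    // Logs: ["sync open resolved", "close() could sneak in here", "async open resolved"]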
I'm not really sure what to do about that though, short of re-architecting Database.jsm and remote settings to somehow never call close, or keeping track of pending transactions and delaying close until all pending transactions have finished...
(In reply to Andrew Sutherland [:asuth] (he/him) from comment #9)
> As a general FYI, the IndexedDB backend will automatically close the underlying database connections if they're idle for a while and reopen it on demand. It would probably be fine to keep the database around once the consumer triggers the open, and then hold onto it until shutdown as long as there's an expectation that there will be future database accesses.

I would do this (i.e. just not call close) if it wasn't for the other issues that we've had in this space, where IDB seems to take 30 seconds, even during shutdown, to decide the connection is idle and to close it...

That said, not all the operations in RS currently call close when done, so this generally feels like it's already messy.
Comment 13 • 5 years ago
(In reply to :Gijs (he/him) from comment #12)
> I would do this (i.e. just not call close) if it wasn't for the other issues that we've had in this space, where IDB seems to take 30 seconds, even during shutdown, to decide the connection is idle and to close it...

To clarify, I meant calling close (and abort) at your component's shutdown, not never calling close.
Comment 15 • 5 years ago (Assignee)
If I can't manage to get this into 76 before we branch I'll probably be looking to uplift.
Comment 16 • 5 years ago (Assignee)
Andrew, Mathieu: just to check, this should now be back in your queues...
Comment 18 • 5 years ago (Assignee)
I'm now running into this assertion failure:
GECKO(1226) | Assertion failure: data.mRecursionDepth > mBaseRecursionDepth, at /builds/worker/checkouts/gecko/xpcom/base/CycleCollectedJSContext.cpp:546
and I'm not 100% sure what to do about that...
Comment 19 • 5 years ago
Comment 21 • 5 years ago (Assignee)
Comment 22 • 5 years ago
Comment 23 • 5 years ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/7312dfafef58
https://hg.mozilla.org/mozilla-central/rev/825f035018c2
Comment 24 • 5 years ago
This is not fixed as it continued appearing after the last follow-up:
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&revision=09fc2941dd6844a6404834ec3bd76ed4f2e44102&selectedJob=296019366
https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&tochange=7f069649d6cf0d7d8cba14cfd987e835509b22fa&fromchange=825f035018c2a7c4526ad68909d96ddd585b7a71
Gijs, could you please take a look at this?
Comment 25 • 5 years ago (Assignee)
(In reply to Noemi Erli[:noemi_erli] from comment #24)
> This is not fixed as it continued appearing after the last follow-up:
> https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&revision=09fc2941dd6844a6404834ec3bd76ed4f2e44102&selectedJob=296019366
> https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=xpc&tochange=7f069649d6cf0d7d8cba14cfd987e835509b22fa&fromchange=825f035018c2a7c4526ad68909d96ddd585b7a71
>
> Gijs, could you please take a look at this?

Thanks for the heads-up, I filed bug 1627179 for these failures.