Closed Bug 1047811 Opened 10 years ago Closed 10 years ago

Move some of COMMIT TRANSACTION off the main-thread

Categories

(Toolkit :: Places, defect)

defect
Not set
normal
Points:
8

Tracking

()

RESOLVED FIXED
mozilla35
Iteration:
35.1

People

(Reporter: mak, Assigned: mak)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 6 obsolete files)

COMMIT TRANSACTION is one of the top Slow SQL offenders, for good reasons (fsyncs).

We might try to move the fsyncs off the main thread. Some ideas:
1. we could just manage the transaction in the batches, starting a batch immediately BEGIN TRANSACTION, while ending it starts an async COMMIT
2. we could stop using the mozStorageTransaction RAII and make special one (either in Storage or just for Places) that commits off the main thread

In any case we must pay attention to nested transactions:
BEGIN 1 BEGIN 2 COMMIT 2 COMMIT 1 will commit at COMMIT 2
we might instead use SAVEPOINTs, RELEASE only commits when the transactions stack is empty. Might be interesting to create a mozStorageAsyncSavepoint that will release off the main thread if the async thread is active, on the main thread otherwise.
Flags: firefox-backlog+
Iteration: --- → 34.2
Points: --- → 8
Iteration: 34.2 → ---
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Iteration: --- → 34.2
QA Whiteboard: [qa-]
Attached patch patch v1 (obsolete) (deleted) — Splinter Review
I must still write a test, similar to test_transaction_helper.cpp, but I think it's better to get early feedback on the approach before spending time on that.

I had mixed feelings whether this should be in Places or Storage, but I think doesn't hurt to have it available globally in case we might need for other conversions.
This is clearly not a final goal, more like an interim solution during the sync to async conversion.

I also had some thoughts about taking busy_timeout as a default for any connection, but in the end I preferred staying safe and see how it will behave in Places.
Attachment #8469252 - Flags: feedback?(bugmail)
Comment on attachment 8469252 [details] [diff] [review]
patch v1

I definitely agree with putting this logic in storage over Places so that people trying to understand what storage can get up to can see.

I think mConnection probably wants to be ref-counted (like mozStorageTransaction), especially since you're adding async things into the picture.

I think the doc block you is great, but it might be good to add some comment along the lines of "Hey, this is being added for Places which is trying to move away from doing main-thread I/O.  If you're C++ code and you want to do this off the main thread, that's fine, but you probably shouldn't be writing new code for this on the main thread.  Also, seriously consider just writing your code in JS using sqlite.jsm or IndexedDB."
Attachment #8469252 - Flags: feedback?(bugmail) → feedback+
Attached patch patch v2 (obsolete) (deleted) — Splinter Review
Addressed comments.
There's a problem with Places tests though, some are failing. I still have to debug it but my guess is that concurrent connections on WAL don't see the changes done in a transaction/savepoint until it's committed. Some of our testcases are likely expecting those changes to be instant.

What I should do is:
1. figure if my theory is what really causes the failure
2. add further comment on the helper about this
3. figure if this is an issue only for tests or might become a problem in real-life cases (async PlacesTransactions, tagging and other stuff using Sqlite transactions in Places)
4. if it's just a tests issue, fix the tests
5. otherwise... no idea.
Attachment #8469252 - Attachment is obsolete: true
The failures
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\components\places\tests\bookmarks\test_1017502-bookmarks_foreign_count.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | C:/slave/test/build/tests/xpcshell/tests/toolkit/components/places/tests/bookmarks/test_1017502-bookmarks_foreign_count.js | 0 == 1 - See following stack:
TEST-UNEXPECTED-FAIL | C:\\slave\\test\\build\\tests\\xpcshell\\head.js | Unexpected exception 2147500036
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\components\places\tests\unit\test_async_transactions.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | C:/slave/test/build/tests/xpcshell/tests/toolkit/components/places/tests/unit/test_async_transactions.js | 1 == 2 - See following stack:
TEST-UNEXPECTED-FAIL | C:\\slave\\test\\build\\tests\\xpcshell\\head.js | Unexpected exception 2147500036
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\components\places\tests\unit\test_PlacesUtils_asyncGetBookmarkIds.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | C:/slave/test/build/tests/xpcshell/tests/toolkit/components/places/tests/unit/test_PlacesUtils_asyncGetBookmarkIds.js | 1 == 2 - See following stack:
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\components\places\tests\unit\test_promiseBookmarksTree.js | Test timed out
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\components\places\tests\unit\test_promiseBookmarksTree.js | test failed (with xpcshell return code: 15), see following log:
Ok, so for example see what we do here:
toolkit/components/places/tests/unit/test_PlacesUtils_asyncGetBookmarkIds.js

53     itemIds.push(PlacesUtils.bookmarks.insertBookmark(
54       PlacesUtils.unfiledBookmarksFolderId, uri, "test",
55       PlacesUtils.bookmarks.DEFAULT_INDEX
56     ));

a savepoint is created, not yet committed

57     itemIds.push(PlacesUtils.bookmarks.insertBookmark(
58       PlacesUtils.unfiledBookmarksFolderId, uri, "test",
59       PlacesUtils.bookmarks.DEFAULT_INDEX
60     ));

another sevepoint is created not yet committed

61     PlacesUtils.asyncGetBookmarkIds(uri, (aItemIds, aURI) => {
62       do_check_eq(aItemIds.length, 2);

here we don't see both bookmarks cause the commit is asynchronous

Basically this might be an issue for any consumer of PlacesUtils.promiseDBConnection, since this connection is concurrent it might read at any time.

These consumers are:

* test_1017502-bookmarks_foreign_count.js - not a big deal, just make it wait.

* nsLivemarkService.js - it's using it to build the initial cache, not a big deal since it's unlikely to cause issues in real-life (one should add a livemark just before the cache is built)

* PlacesUtils.promiseItemGUID and PlacesUtils.promiseItemId - these might be a problem since they are used by async transactions. Though the GUIDHelper listens to notifications, and those still come at the right time. It might not yet be listening at the first request, but again, must be done just after an addition.

* asyncGetBookmarkIds - it's used by the star, I don't think it's a big deal since it also listens to bookmark notifications. Once bookmarks will be async this method will regardless go out of sync for the same reason (concurrent connection), so we'd just be anticipating an issue we will have regardless.

* promiseBookmarksTree - so far only used by backups, it's unlikely a problem, the worst thing that might happen is we'd lose the last second of changes. It's like the snapshot tree is anticipated by some ms.

The alternative is to not use async commit on bookmarks, use it only for batches or history changes. It would still be a win, even if a smaller one.

What about annotations? Well, it might be a problem only if we set and immediately read an anno on another connection, I think it's quite unlikely.

Mano, what are your thoughts about this?
Flags: needinfo?(mano)
Attached patch Storage part v1 (obsolete) (deleted) — Splinter Review
Since I'm about to leave for a brief vacation, let's try to bring this to a point where it can proceed.

I'm splitting the patch into a Storage part, that is basically ready for review, and a Places part that still requires a little bit brainstorming.

I added a simple test resembling the transactions one, and I'm going to push to Try this part to ensure it's sane.
Attachment #8471582 - Attachment is obsolete: true
Attachment #8472940 - Flags: review?(bugmail)
note for future reference: to reproduce the failures it's possible to delay async queries, for example by adding PR_Sleep(PR_SecondsToInterval(1)) to AsyncExecuteStatements::executeAndProcessStatement.
So, after fighting with tests a little bit, I figured we are not ready for this change in bookmarks and annotations. Especially async transactions need a reliable API that can tell when it's done, with this change it should wait for something we can't track.

I'm reverting the local changes I had for bookmarks and annotations, and apply the async behavior only to batches and history removals. It's still something good and savepoints are nicer than transactions, so it should still be a win, even if much smaller than I was hoping.

Unfortunately there's no way out of async bookmarks to fix those fsyncs...
Summary: Move most of COMMIT TRANSACTION off the main-thread → Move some of COMMIT TRANSACTION off the main-thread
Attached patch Places changes v1 (obsolete) (deleted) — Splinter Review
This is how far I was able to go.

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=f799201f2045
Attachment #8473019 - Flags: review?(mano)
Comment on attachment 8472940 [details] [diff] [review]
Storage part v1

Nice tests!
Attachment #8472940 - Flags: review?(bugmail) → review+
too many failures on the Places side, this will still require work unfortunately.
Iteration: 34.2 → 34.3
QA Whiteboard: [qa-]
Flags: qe-verify-
Iteration: 34.3 → 35.1
Attachment #8473019 - Flags: review?(mano)
so, I figured why most of the tests were failing.
The problem is the mixup of savepoints and transactions, basically this sequence:

BEGIN TRANSACTION
SAVEPOINT one
COMMIT
RELEASE one

causes RELEASE to error, because COMMIT empties the transactions stack, and as stated on sqlite docs (http://www.sqlite.org/lang_savepoint.html) "If the savepoint-name in a RELEASE command does not match any savepoint currently in the transaction stack, then no savepoints are released, the database is unchanged, and the RELEASE command returns an error"

Now, the patch converts most transactions to savepoints in Places, but Storage has an internal COMMIT in async execution (http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageAsyncStatementExecution.cpp#449) that means when a connection uses both synchronous and asynchronous APIs and savepoints, RELEASE can fail and savepoints might finish sooner than expected.

We also use immediate transactions in async history and async favicons code. So it's unfeasible to avoid any COMMIT without rewriting a lot of code. We could try to use sqlite3_get_autocommit, if it would also detect an open SAVEPOINT, as suggested in bug 860357, to avoid opening a transaction when a savepoint is active, but wouldn't fix the above case when a savepoint is opened after a transaction BEGIN (unless we also add an API for savepoints, but that's still lots of changes).
We could use savepoint in async execution, but there we want an immediate transaction to avoid sync/async concurrency issues, while savepoints are equivalent to deferred transactions, so it's not possible.

What I did on Try to verify this, was just to ignore RELEASE errors, and indeed that solves 99% of the failures. But it's not a very nice solution.

My reasoning to use savepoints was mostly:
- avoid nested transactions issues
- automatic merge of savepoints, so that only the most external RELEASE causes a commit (thus an ideal reduction of the number of commits)

Since nested issues happen regardless and the nested cases should disappear by converting more code to async, I might as well just use old transactions helper, by adding to it an option to make commit asynchronous.

Due to the existence of the async COMMIT, mixed APIs consumers are already facing the possibility that a BEGIN fails due to existence of another transaction, and also the transactions helper takes BEGIN errors into account.
The only problem might be related to transactions tracking in connection. It might misreport whether we have a transaction, but on the other side it's already totally broken since doesn't take into account externally created transactions and provided this is just a temporary solution for Places it's unlikely we care...
Flags: needinfo?(mano)
I had always found our auto-commit batching on the async execution thread frustrating in Thunderbird because we always managed our own transactions.  Unfortunately, the difference in semantics between savepoints and commits is serious before we get into the issues you mention, and it's probably not a good idea to arbitrarily change the defaults without an opt-in.

I think it could make sense to be able to set a connection in a 'manual transactions' mode.  I think the original rationale for the auto-batching for async execution was that it would be absurdly foolish in pre-WAL land to issue N commands in rapid succession that would each trigger a full COMMIT.  And given Places holding multiple connections and the locking structure in place, it all logically made sense.  I avoided the heuristic for Thunderbird by never issuing statements in a batch (and possibly us quietly realizing that a failure to start a commit meant things were okay and we just didn't need an automatic transaction.)

Presumably modern Places could benefit from just being in charge of its own transactions.  And really so could any complicated code where you don't want to second-guess automatic batching logic.
(In reply to Andrew Sutherland [:asuth] from comment #15)
> I think the original rationale for the auto-batching
> for async execution was that it would be absurdly foolish in pre-WAL land to
> issue N commands in rapid succession that would each trigger a full COMMIT.

For whatever reason, the last time I tried it did still matter, even in WAL mode. Not having a transaction makes operations much slower, I would have expected a smaller impact by knowing what wal does.
Clearly this requires updated measurements, than just a weak recall. I just know when we tried to limit number of transactions in Places (by removing what we call "batches"), write performance got quite terrible. But we are also fighting a sucky API there.

> Presumably modern Places could benefit from just being in charge of its own
> transactions.  And really so could any complicated code where you don't want
> to second-guess automatic batching logic.

I agree. Though that doesn't solve my current problem :)
I need to land a solution for this in a week, so I guess the easiest path forward, for now, is to make transaction helper support a very special (made scary by huge warnings) async commit mode, similar to what I tried to do with savepoints.

I agree doesn't make much sense to manage transactions in Storage itself, so I'll file a bug to remove the transaction handling from async execution. I agree it's the consumer that should manage them when needed. That would also solve possible future issues with mixing up savepoint/transaction without the consumer knowing.
Andrew, sorry for the double review work here.  Unfortunately the first approach didn't work and I had to go back to the scratchboard.

This does the following:
1. introduce an async commit option on mozStorageTransaction that can only be used on main-thread (this is enforced by ExecuteSimpleSQLAsync).
2. cleaned up the docs a little bit and put a scary documentation around async commit
3. removed HasTransaction and SetDefaultAction that had no uses out of testing 
4. added a test for async transaction commit
5. used sqlite3_getAutocommit in the test to check for transactions, instead of the old HasTransaction
6. made cloning copy PRAGMA busy_timeout to the cloned connections (with test)

There are a couple behavioral changes to be aware of:
1. rollback was throwing in case of !mHasTransaction. Considered we can't tell if the initial BEGIN will work, since there might be another transaction or savepoint ongoing, this check was nonsense. So I made it act as commit, just return NS_OK if there's no tracked transaction.  We could make errors work more properly after fixing bug 860357.
2. I don't bother setting mCompleted to true if !mHasTransaction, since we always check both. This allows to shrink the code a little bit.

This was already pushed to Try along with the Places changes and was quite green
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c7f81709653e
Attachment #8472940 - Attachment is obsolete: true
Attachment #8472971 - Attachment is obsolete: true
Attachment #8473019 - Attachment is obsolete: true
Attachment #8485795 - Flags: review?(bugmail)
hm, I also remove the template code cause it's currently unused (it was originally introduced to be able to use this API on a raw connection, due to the introduction of mozIStorageAsyncConnection).
This is what we can do in Places so far, it's changing batches and history removals to use async commit.
I left alone annotations and bookmarks cause the risk there is too high, PlacesTransaction needs to work on reliable APIs. For those we will need the async bookmarking API.
Attachment #8485803 - Flags: review?(mano)
Comment on attachment 8485795 [details] [diff] [review]
Part 1 - Allow to commit transaction off the main-thread

Review of attachment 8485795 [details] [diff] [review]:
-----------------------------------------------------------------

Great cleanup!  r=asuth with the likely orange-prone test issue addressed.

::: storage/test/test_transaction_helper.cpp
@@ +131,5 @@
> +    "CREATE TABLE test (id INTEGER PRIMARY KEY)"
> +    ));
> +    (void)transaction.Commit();
> +  }
> +  do_check_true(has_transaction(db));

This is inherently going to race.  You either need to use the ThreadWedger mechanism from test_true_async.cpp (or similar) or be holding the SQLite mutex until after you've done your check.
Attachment #8485795 - Flags: review?(bugmail) → review+
Fixed the test using ThreadWedger. Thanks!
Attachment #8485795 - Attachment is obsolete: true
Attachment #8485803 - Flags: review?(mano) → review+
uh looks like I moved do_check_ok in the wrong place (there is an ifdef linux some rows above and I didn't figure I was into its else)
https://hg.mozilla.org/mozilla-central/rev/2801865552cf
https://hg.mozilla.org/mozilla-central/rev/91bd8c173563
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Blocks: 1065633
Depends on: 1066707
Depends on: 1067054
Depends on: 1113071
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: