Closed Bug 1091446 Opened 10 years ago Closed 7 years ago

If an async generator function passed to PlacesTransaction.transact yields a pending promise returned by another call to PlacesTransaction.transact, the queue is stuck until shutdown

Categories

(Toolkit :: Places, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: asaf, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Read and test in your browser-environment scratchpad: https://pastebin.mozilla.org/6999856 Do read that first. It's the kind of bug that JS explains better than English. ******* While the summary refers to |PlacesTransactions.transact|, I suggest focusing on this simplified testcase linked above, because the implementation and semantics of |PlacesTransactions.transact| (which is to become |batch|) are going to change much soon (but this fundamental issue remains, of course). English description follows nonetheless. ******* |PlacesTransactions.transact|, when called with a generator function must guarantee the following: 1. It shouldn't be blocked if another transaction is already in progress, but it must be serialized. That is, the asynchronous function should run only once all previous calls to |transact| are done. Since callers of |transact| are independent of each other, we cannot rely on them to do the serialization on their side (i.e. |transact(...).then(...transact)|). 2. Once the asynchronous functions is called, no other call to |transact| can get in the middle. 3. It should return a promise that is to be resolved when it's done, or rejected if the asynchronous function threw. This is *not* necessary for serializing calls to |transact|, but it helps callers which want to do something unrelated to transactions once transact is done. The problem is that without having some control over what happens *during each call to generator.next from |Task.spawn|, but not in between*, enqueuing calls to |transact| also means that a function passed to |transact| may call |transact| on its own. That's not something one should do (and I don't mind disallowing that if need be), but it becomes a true problem if the promised returned by the "inner" call to transact is yielded (see 3.1 vs 4.1 in the testcase). i.e. PlacesTransaction.transact(function* () { yield PlacesTransactions.transact(...); // We will never get here, or any other transaction until the next shutdown }); If we could somehow set a flag whenever |Task.spawn| enters the generator function (i.e. before it's called and before each call to next()) and unset it when the asynchronous function is "on hold" (i.e. when waiting either for a promise to resolve or just for the next tick), then we would simply make transact throw in this case. Without having such control, we can only make transact throw as long as the call to |Task.spawn| is in process. This means we loose (1) above. There are only three other options I see: (a) Giving up on (3) - i.e. |transact| won't return a promise. Callers which need such a promise will have to write their own and wrap the call to |transact|. (b) Re-implementing |Task.spawn|[1]* in |PlacesTransactions|, so we have full control over the Task processing. (c) Somehow allow |Task.spawn| callers to have control over what happens, e.g. by passing in an optional listener object (with methods like |entered|, |pending|, |done|). Alternatively, we can try to live with this issue and only get back to it if it proves to cause bugs. One problem is that if we do so, (a) isn't likely to be an option. __________ [1] Crazy as it sounds, the _current_ implementation of PlacesTransaction.transact does that (for other reasons), but my plan was to be civilized and just use Task.spawn in |PlacesTransactions.batch|
It's worth mentioning that the inner call doesn't have to be a |batch| call. It could even be a call to |undo| (which is also serialized with batch, of course). And here it's even more clear that while an "outer" call to |undo| is always legit and should be enqueued (it's valid for the user to Undo while a transaction is still being processed), an inner call to |undo| should throw. We must find a way to distinguish these callers.
The current idea is to make the promise returned by Task a little bit "special", in the sense adding to it an inProgress property that is true when the generator is being yielded. Then in our queue we can check if the last item in queue has it set and bail out in case.
Thanks for the really clear explanation of the issue! Hope my answer will be understandable as well :-) (In reply to Mano (::mano, needinfo? for any questions; not reading general bugmail) from comment #0) > (b) Re-implementing |Task.spawn|[1]* in |PlacesTransactions|, so we have > full control over the Task processing. > (c) Somehow allow |Task.spawn| callers to have control over what happens, > e.g. by passing in an optional listener object (with methods like |entered|, > |pending|, |done|). The core issue is that, regardless of whether PlacesTransaction.transact is executed within the context of another task started by a previous call to PlacesTransaction.transact or not, you can't know whether that call is to be interpreted as a sub-transaction that must be merged with the currently executing one or a new one that must be serialized. There is no general solution to this problem. Here is a simple academic example that would make any possible Task instrumentation useless: PlacesTransaction.transact(function* () { yield new Promise(resolve => setTimeout(() => PlacesTransaction.transact(function* () { // Is this a sub-transaction or a separate one? }).then(resolve), 100)); }); Real-life scenarios may hide this subtly in sub-calls, and be less obvious. > (a) Giving up on (3) - i.e. |transact| won't return a promise. Callers which > need such a promise will have to write their own and wrap the call to > |transact|. This looks sensible as a first step. > Alternatively, we can try to live with this issue This too is a valid approach. For the record, you can also consider having callers specify a transaction merge context object: PlacesTransaction.transact(cx, function* () { ... }); At this stage and without knowing the code, this intuitively looks to me like over-architecting, but you can evaluate that better than me.
Paolo, please see bug 1102186 for a possible simple solution that could satisfy our case (we just need to disallow a dangerous possible coding mistake in this case). Btw, our problem is not to decide whether to serialize or enqueue the sub call, out issue is that the existence of a queue itself makes so the internal transaction would yield on the latest queue entry (that is the external transaction), but since the external transaction is already yielding on the internal transaction, we end up in a deadlock. We need to ensure consumers of the API won't deadlock this way by nesting.
(In reply to Marco Bonardo [::mak] (needinfo? me) from comment #4) > Paolo, please see bug 1102186 for a possible simple solution that could > satisfy our case (we just need to disallow a dangerous possible coding > mistake in this case). Maybe I'm misunderstanding the scenario you described, but my point in comment 3 is that no Task instrumentation can prevent this coding mistake to happen. Mano's solution (a) would be much more effective. > Btw, our problem is not to decide whether to serialize or enqueue the sub > call, out issue is that the existence of a queue itself makes so the > internal transaction would yield on the latest queue entry (that is the > external transaction), but since the external transaction is already > yielding on the internal transaction, we end up in a deadlock. > We need to ensure consumers of the API won't deadlock this way by nesting. You can't detect nesting.
(In reply to :Paolo Amadini from comment #5) > (In reply to Marco Bonardo [::mak] (needinfo? me) from comment #4) > > Paolo, please see bug 1102186 for a possible simple solution that could > > satisfy our case (we just need to disallow a dangerous possible coding > > mistake in this case). > > Maybe I'm misunderstanding the scenario you described, but my point in > comment 3 is that no Task instrumentation can prevent this coding mistake to > happen. > > Mano's solution (a) would be much more effective. We can't just do a), cause Sqlite.jsm has a longstanding API that has lots of consumers, and we can't just change it to not return a promise anymore. We will shortly have 3 or 4 components with this bug. > You can't detect nesting. bug 1102186 would have allowed that, as well solution c) (listeners). We clearly need a solution cause otherwise any consumer can break the whole component with just a wrong API call. Can we implement a QueuedTasks module with a different Task implementation that disallow nesting?
Flags: needinfo?(paolo.mozmail)
I had a brief meeting with Paolo, and whatever solution we take won't be able to catch all possible deadlock cases, plus we could end up introducing unclear APIs on top of Task or Promise that could be misused. This, we will first try a timeout approach, where we give a promise some time to complete, and then we reject it, this should allow to unblock the deadlock, even if it won't be an immediate solution.
Flags: needinfo?(paolo.mozmail)
Attached file possible async transactions solution (obsolete) (deleted) —
This is a possible solution for async transactions, even if I put a timeout here, it's never hit afaict. The queue avoids a deadlock by making only the first added consumer deplete it. from brief testing it works fine. In case of nesting instead of deadlocking it reorder things a little bit, that is surely better than killing the component. Unfortunately I cannot do the same in Sqlite.jsm cause BEGIN TRANSACTION cannot be nested, so the reordering there would not do. This needs refinements too, the timeout part could not be working as expected yet. But it's still an improvement over what we have now.
Attached file possible async transactions solution (deleted) —
this seems to enqueue better (the only unordered call is the nested one) and the timeout works
Attachment #8526675 - Attachment is obsolete: true
Attached file enqueuer with timeout (deleted) —
This is a possible solution for Sqlite.jsm case that Paolo helped me building, involving a timeout.
Priority: -- → P3
Like we did with Sqlite.jsm, for now we solved this through a timeout rejection
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: