Closed Bug 1148818 Opened 10 years ago Closed 10 years ago

Intermittent failure in Cache.put on Android because the returned promise never gets resolved or rejected

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox40 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

References

Details

This is hitting the test for bug 1126820 (dom/workers/test/serviceworkers/test_request_context.html ) that I am landing there. The test fails most of the time on Android 2.3 and that is the only place that I've seen this so far after having run this many times on the try server. See this try server run <https://treeherder.mozilla.org/#/jobs?repo=try&revision=69918983f4fe> that contains my patches on top of <http://hg.mozilla.org/mozilla-central/rev/44e454b5e93b>, and this try server run <https://treeherder.mozilla.org/#/jobs?repo=try&revision=652b2367943a> that contains a debugging patch that printfs the steps of the Cache API invocation in the service worker: <https://hg.mozilla.org/try/rev/86465a0f923b>. On the second try run, click on one of the failing tests, open the emulator logcat by looking for "artifact uploaded in the bottom left of the TreeHerder UI, and look for lines containing "XXXehsan". You'll see that the last thing that the test does is calling Cache.put, but the returned promise never gets resolved or rejected. Ben, can you think of ways that this could be happening? For now I'm planning to disable that test on Android and land bug 1126820, but I think this failure can happen in real apps, so needs to block the release.
Flags: needinfo?(bkelly)
(Note: the test landed as disabled on Android for now: https://hg.mozilla.org/integration/mozilla-inbound/rev/e7e532d69293#l3.145)
Here are the most likely causes: 1) A bug in CachePutAllAction causing it not to resolve for some code path. This is the only async action, so it would be a problem unique to putting operations. 2) The worker thread is trying to shutdown, calls Notify() on the WorkerFeature, which then destroys the actors. If something is keeping the worker thread alive, though, the script may still continue to run. In this case it would appear the Promise is just never resolved. 3) CycleCollection can trigger (2) as well, although it seems less likely the script would still be running blocked on the Promise. Can we reproduce? Instrumentation would be the best way to investigate (1). We could try to inspect the test for something keeping the worker alive for (2). I guess a promise without an error handler registers its own WorkerFeature.
Flags: needinfo?(bkelly) → needinfo?(ehsan)
This is almost reproducible 100% of the time on try if you remove this if block: <https://hg.mozilla.org/integration/mozilla-inbound/rev/e7e532d69293#l3.145> About #2, what is the thing that _should_ keep the worker alive? The worker in question is a service worker that is responding to a ton of requests coming from the controlled document. I would expect the promises that we return from the API to keep the worker alive until they're resolved/rejected, is that not the case? If it is, it should rule out theory #2. Also, since we hold a strong reference to the promises that we hand off to script before they're resolved/rejected, I don't think #3 is likely. But of course only if my assumption above about unresolved promises keeping the worker alive is correct.
Flags: needinfo?(ehsan)
<digression> ServiceWorker lifetime is poorly defined, unfortunately. So we're in a bit of a grey area as to what "should" happen here. Normally the FetchEvent keeps the ServiceWorker ref'd until respondWith(). IMO, it should also keep it ref'd until the stream returned to respondWith() is complete, but I don't think it does that. </digression> Its possible the WorkerFeature in Cache is cleaning up too early. For example, the PromiseWorkerProxy in dom/promise only cleans up with status >= Canceling. Cache stops its actors for status > Running. You could try modifying this: https://dxr.mozilla.org/mozilla-central/source/dom/cache/Feature.cpp#76 To: if (aStatus < Canceling || mNotified) { return true; }
(In reply to Ben Kelly [:bkelly] from comment #4) > <digression> > ServiceWorker lifetime is poorly defined, unfortunately. So we're in a bit > of a grey area as to what "should" happen here. > > Normally the FetchEvent keeps the ServiceWorker ref'd until respondWith(). > IMO, it should also keep it ref'd until the stream returned to respondWith() > is complete, but I don't think it does that. > </digression> Is there a bug on file for that? > Its possible the WorkerFeature in Cache is cleaning up too early. For > example, the PromiseWorkerProxy in dom/promise only cleans up with status >= > Canceling. Cache stops its actors for status > Running. > > You could try modifying this: > > https://dxr.mozilla.org/mozilla-central/source/dom/cache/Feature.cpp#76 > > To: > > if (aStatus < Canceling || mNotified) { > return true; > } Thanks! https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f8ac9f03c83
(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #5) > Is there a bug on file for that? I don't think so.
The change in comment 4 doesn't seem to have helped.
This bug seems to have gone away somehow: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6763c2b193d>. I'll enable this part of the test on Android.
Assignee: nobody → ehsan
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.