Closed
Bug 1148818
Opened 10 years ago
Closed 10 years ago
Intermittent failure in Cache.put on Android because the returned promise never gets resolved or rejected
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
Tracking | Status | |
---|---|---|
firefox40 | --- | fixed |
People
(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)
References
Details
This is hitting the test for bug 1126820 (dom/workers/test/serviceworkers/test_request_context.html ) that I am landing there. The test fails most of the time on Android 2.3 and that is the only place that I've seen this so far after having run this many times on the try server.
See this try server run <https://treeherder.mozilla.org/#/jobs?repo=try&revision=69918983f4fe> that contains my patches on top of <http://hg.mozilla.org/mozilla-central/rev/44e454b5e93b>, and this try server run <https://treeherder.mozilla.org/#/jobs?repo=try&revision=652b2367943a> that contains a debugging patch that printfs the steps of the Cache API invocation in the service worker: <https://hg.mozilla.org/try/rev/86465a0f923b>.
On the second try run, click on one of the failing tests, open the emulator logcat by looking for "artifact uploaded in the bottom left of the TreeHerder UI, and look for lines containing "XXXehsan". You'll see that the last thing that the test does is calling Cache.put, but the returned promise never gets resolved or rejected.
Ben, can you think of ways that this could be happening?
For now I'm planning to disable that test on Android and land bug 1126820, but I think this failure can happen in real apps, so needs to block the release.
Flags: needinfo?(bkelly)
Assignee | ||
Comment 1•10 years ago
|
||
(Note: the test landed as disabled on Android for now: https://hg.mozilla.org/integration/mozilla-inbound/rev/e7e532d69293#l3.145)
Comment 2•10 years ago
|
||
Here are the most likely causes:
1) A bug in CachePutAllAction causing it not to resolve for some code path. This is the only async action, so it would be a problem unique to putting operations.
2) The worker thread is trying to shutdown, calls Notify() on the WorkerFeature, which then destroys the actors. If something is keeping the worker thread alive, though, the script may still continue to run. In this case it would appear the Promise is just never resolved.
3) CycleCollection can trigger (2) as well, although it seems less likely the script would still be running blocked on the Promise.
Can we reproduce? Instrumentation would be the best way to investigate (1). We could try to inspect the test for something keeping the worker alive for (2). I guess a promise without an error handler registers its own WorkerFeature.
Flags: needinfo?(bkelly) → needinfo?(ehsan)
Assignee | ||
Comment 3•10 years ago
|
||
This is almost reproducible 100% of the time on try if you remove this if block: <https://hg.mozilla.org/integration/mozilla-inbound/rev/e7e532d69293#l3.145>
About #2, what is the thing that _should_ keep the worker alive? The worker in question is a service worker that is responding to a ton of requests coming from the controlled document. I would expect the promises that we return from the API to keep the worker alive until they're resolved/rejected, is that not the case? If it is, it should rule out theory #2.
Also, since we hold a strong reference to the promises that we hand off to script before they're resolved/rejected, I don't think #3 is likely. But of course only if my assumption above about unresolved promises keeping the worker alive is correct.
Flags: needinfo?(ehsan)
Comment 4•10 years ago
|
||
<digression>
ServiceWorker lifetime is poorly defined, unfortunately. So we're in a bit of a grey area as to what "should" happen here.
Normally the FetchEvent keeps the ServiceWorker ref'd until respondWith(). IMO, it should also keep it ref'd until the stream returned to respondWith() is complete, but I don't think it does that.
</digression>
Its possible the WorkerFeature in Cache is cleaning up too early. For example, the PromiseWorkerProxy in dom/promise only cleans up with status >= Canceling. Cache stops its actors for status > Running.
You could try modifying this:
https://dxr.mozilla.org/mozilla-central/source/dom/cache/Feature.cpp#76
To:
if (aStatus < Canceling || mNotified) {
return true;
}
Assignee | ||
Comment 5•10 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #4)
> <digression>
> ServiceWorker lifetime is poorly defined, unfortunately. So we're in a bit
> of a grey area as to what "should" happen here.
>
> Normally the FetchEvent keeps the ServiceWorker ref'd until respondWith().
> IMO, it should also keep it ref'd until the stream returned to respondWith()
> is complete, but I don't think it does that.
> </digression>
Is there a bug on file for that?
> Its possible the WorkerFeature in Cache is cleaning up too early. For
> example, the PromiseWorkerProxy in dom/promise only cleans up with status >=
> Canceling. Cache stops its actors for status > Running.
>
> You could try modifying this:
>
> https://dxr.mozilla.org/mozilla-central/source/dom/cache/Feature.cpp#76
>
> To:
>
> if (aStatus < Canceling || mNotified) {
> return true;
> }
Thanks! https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f8ac9f03c83
Assignee | ||
Comment 6•10 years ago
|
||
Comment 7•10 years ago
|
||
(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #5)
> Is there a bug on file for that?
I don't think so.
Assignee | ||
Comment 9•10 years ago
|
||
This bug seems to have gone away somehow: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6763c2b193d>. I'll enable this part of the test on Android.
Assignee: nobody → ehsan
Comment 10•10 years ago
|
||
Comment 11•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox40:
--- → fixed
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•