Closed
Bug 1152026
Opened 10 years ago
Closed 9 years ago
IndexedDB cycle-collection crash, probably
Categories
(Core :: Storage: IndexedDB, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox40 | --- | wontfix |
firefox41 | + | fixed |
firefox42 | + | fixed |
firefox43 | --- | fixed |
firefox-esr31 | --- | unaffected |
firefox-esr38 | 41+ | fixed |
b2g-v2.0 | --- | unaffected |
b2g-v2.0M | --- | unaffected |
b2g-v2.1 | --- | unaffected |
b2g-v2.1S | --- | unaffected |
b2g-v2.2 | --- | fixed |
b2g-v2.2r | --- | fixed |
b2g-master | --- | fixed |
People
(Reporter: asuth, Assigned: khuey)
References
Details
(Keywords: crash, sec-moderate, Whiteboard: [fixed in 43 by bug 1179909][post-critsmash-triage][adv-main41+][adv-esr38.3+])
Crash Data
Attachments
(5 files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
application/x-shellscript
|
Details | |
(deleted),
patch
|
baku
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
ritu
:
approval-mozilla-esr38+
ritu
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
I'm using IndexedDB on workers, which rocks. But my (stock, mozilla.org-built linux x86_64) nightly builds are crashing a lot in the cycle collector it seems.
Here's one: bp-d90ca7bb-15bc-4ab7-8ef3-a26d12150407
And all of these but the windows one seem to be me: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=ChildFinder%3A%3ANoteJSObject%28JSObject*%29#tab-reports
Here's the summary of sorts from the stack provided by the crash reporter infrastructure:
ChildFinder::NoteJSObject(JSObject*)
TraceCallbackFunc::Trace(JS::Heap<JSObject*>*, char const*, void*) const
mozilla::DOMEventTargetHelper::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
mozilla::dom::indexedDB::IDBWrapperCache::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
mozilla::dom::indexedDB::IDBTransaction::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
void nsPurpleBuffer::Block::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer&, RemoveSkippableVisitor&)
nsPurpleBuffer::RemoveSkippable(nsCycleCollector*, bool, bool, void (*)())
nsCycleCollector_forgetSkippable(bool, bool)
FireForgetSkippable
CCTimerFired
Note that I did initially suspect the devtools indexeddb inspector was at play, and commented to that effect, but I've stopped using that.
I'm going to spin a local debug build and switch to that and see if there are any helpful assertion explosions that show up there.
Hm, that crash is on the main thread...
Reporter | ||
Comment 2•10 years ago
|
||
So, things seem to be a bit more stable now for me (still using the same nightly production build that likes to crash) now that my code isn't throwing as many random errors.
=== Mental state dump for my benefit, probably don't read this:
I believe that while I was seeing these crashes, I was triggering an exception in a request "onsuccess" handler of an on-worker function that would be caught and do a console.error('blah', ex, ex.stack), possibly following an app schema upgrade. It was also possible for code to throw an exception and not be caught. The arrow-function closure where these calls happen definitely closed over the IDBRequest whose onsuccess they were handling and may have incidentally closed-over the associated IDBTransaction by virtue of being declared in the same scope.
The transaction seems like something scary could be happening with it because even now I'm seeing:
"JavaScript error: http://glodastrophe/deps/gelam/js/maildb.js, line 336: InvalidStateError: An attempt was made to use an object that is not, or is no longer, usable"
The code in question is:
this._db.transaction([TBL_CONFIG, TBL_FOLDER_INFO], 'readonly')
And this._db is only populated in the "onsuccess" handler of an IndexedDB.open() call which also has an appropriate "onupgradeneeded" handler. Which seems surprising, but it wouldn't be surprising if I'm being insufficiently diligent about IndexedDB database lifecycle stuff.
Reporter | ||
Comment 3•10 years ago
|
||
Not sure how related this is to my crashes (I'll try and upgrade variant too), but this appears to be a case of my code creating a new transaction inside an onsuccess event and this causing RunBeforeNextEvent to assert. My investigation got cut short because when I did "call DumpJSStack()" my X session locked up until I killed gdb. (I was expecting something bad to happen, just not that.)
Context: this is from "./mach run --debug -P aprofile", which I think means that gdb should generally be happy, configuration wise. And the path is in my auto-load-safe-path.
Gonna try again now forcing an upgrade and maybe try and use devtools if I manage to avoid the asserts.
Reporter | ||
Updated•10 years ago
|
Summary: IndexedDB-on-workers cycle-collection crash, probably → IndexedDB cycle-collection crash, probably
Reporter | ||
Comment 4•10 years ago
|
||
I deferred the transaction's creation to a future turn of the event loop/micro-task queue with Promise.resolve().then(...) but the assertion still happens. Just with a shorter stack. And call DumpJSStack() was fine this time.
Reporter | ||
Comment 5•10 years ago
|
||
My optimization was turned up too high so ptarray didn't work and I did it manually. The pre-empted runnable is an outstanding, non-overlapping read-only transaction. (Pre-empted one is readonly over ['config', folderInfo], exploding one is readonly over ['tasks'].)
Reporter | ||
Comment 6•10 years ago
|
||
I think that pre-empted transaction may be doing something weird and suspicious, because if I defer the creation of my new transaction with with setTimeout(blah, 0) or event 5000ms, things still explode. I'm going to stop poking at things now, but let me know if there's something useful I can do if this isn't a known problem.
Ugh, promises are broken. Bug 1132436.
Updated•9 years ago
|
Severity: normal → critical
Crash Signature: ChildFinder::NoteJSObject(JSObject*) → [@ ChildFinder::NoteJSObject(JSObject*)]
Keywords: crash
Comment 8•9 years ago
|
||
[Blocking Requested - why for this release]:
We can't move calendar's database interactions into the worker if this doesn't get fixed in v3.
blocking-b2g: --- → 3.0?
Comment 9•9 years ago
|
||
I wrote a little script to run a test-agent (gaia unit) test that reproduces the issue here. You do the following after downloading the script to $GAIA/reproduce_bug_1152026.sh:
```bash
cd $GAIA
git remote add gaye https://github.com/gaye/bug-1169790
git fetch gaye
git checkout gaye/bug-1169790
FIREFOX=/path/to/nightly/firefox ./reproduce_bug_1152026.sh
```
The script will
(1) start the server that receives test results from the browser
(2) build gaia
(3) start firefox with the gaia profile and navigate to the test app
(4) run the test that reproduces the crash
Let me know if this helps! It'd be a bit of a chore for me to produce a standalone webpage that reproduces, but I could do that if you all needed.
Attachment #8622776 -
Flags: feedback?(bent.mozilla)
Comment 10•9 years ago
|
||
Just FYI you need to wait for a tiny bit (~10-20s on my machine) for gecko to crash after the test runs.
Updated•9 years ago
|
Blocks: ServiceWorkers-B2G
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Gareth Aye [:gaye] from comment #9)
> Created attachment 8622776 [details]
> Repro script
>
> I wrote a little script to run a test-agent (gaia unit) test that reproduces
> the issue here. You do the following after downloading the script to
> $GAIA/reproduce_bug_1152026.sh:
>
> ```bash
> cd $GAIA
> git remote add gaye https://github.com/gaye/bug-1169790
> git fetch gaye
> git checkout gaye/bug-1169790
> FIREFOX=/path/to/nightly/firefox ./reproduce_bug_1152026.sh
> ```
>
> The script will
>
> (1) start the server that receives test results from the browser
> (2) build gaia
> (3) start firefox with the gaia profile and navigate to the test app
> (4) run the test that reproduces the crash
>
> Let me know if this helps! It'd be a bit of a chore for me to produce a
> standalone webpage that reproduces, but I could do that if you all needed.
That repo doesn't exist :/
Assignee | ||
Comment 13•9 years ago
|
||
The site in bug 1176529 reproduces though, so I can debug that.
Comment 14•9 years ago
|
||
Ah crap. I was trying to point you to a branch of my fork. You should really `git remote add gaye https://github.com/gaye/gaia` and then the rest should work out... see https://github.com/gaye/gaia/tree/bug-1169790 ?
Updated•9 years ago
|
Group: core-security
Comment 16•9 years ago
|
||
Hiding because one of the dupes indicates a thread safety issue. Is this a regression from bug 1137245 as indicated in bug 1176529? This bug was filed two months before that.
Updated•9 years ago
|
Flags: needinfo?(bent.mozilla)
No, I don't think bug 1137245 has anything to do with this.
Flags: needinfo?(bent.mozilla)
Assignee | ||
Comment 18•9 years ago
|
||
The patch in bug 1179909 fixes the site in bug 1176529. I suspect it will fix the Gaia problems too.
Updated•9 years ago
|
Keywords: sec-high → sec-moderate
Comment on attachment 8622776 [details]
Repro script
khuey is on this (thanks!)
Attachment #8622776 -
Flags: feedback?(bent.mozilla)
Reporter | ||
Comment 20•9 years ago
|
||
I've been running with a local build using the fix on bug 1179909 (try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f94afe72869 for builds too), and it indeed seems to have fixed the problem for me. Woo!
Comment 22•9 years ago
|
||
This crash (or one very similar to it) shot up dramatically in the 20150728030209 build, possibly due to 1184574. What's the status of the investigation here?
Flags: needinfo?(nsm.nikhil)
Flags: needinfo?(khuey)
Assignee | ||
Comment 23•9 years ago
|
||
The status is that we need to finish the patch from bug 1179909.
Flags: needinfo?(nsm.nikhil)
Flags: needinfo?(khuey)
It is quite possible that Nightly users are trying out push notifications since Bug 1184574 and operating on IDB leading to crashes.
Assignee | ||
Comment 25•9 years ago
|
||
It shouldn't lead to crashes unless there are promises involved.
Comment 26•9 years ago
|
||
Apparently nightly and aurora both spiked on 0728. I guess bug 1184574 is off the hook, but this is still worrying.
[Tracking Requested - why for this release]: Existing low-volume crash spiking in 41 and 42.
status-firefox41:
--- → affected
status-firefox42:
--- → affected
tracking-firefox41:
--- → ?
tracking-firefox42:
--- → ?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #25)
> It shouldn't lead to crashes unless there are promises involved.
There are Promises involved in all the APIs that ServiceWorkers expose. In addition, if users are using some library like localforage, that would involve Promises too.
Tracked.
Reporter | ||
Comment 29•9 years ago
|
||
With the landing of bug 1179909 which fixes this, I think we can call this fixed. Or we can call it something else (worksforme?, dupe to bug 1179909?). Many thanks to :khuey and :bzbarsky and :smaug and all involved.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
(In reply to Andrew Sutherland [:asuth] from comment #29)
> With the landing of bug 1179909 which fixes this, I think we can call this
> fixed. Or we can call it something else (worksforme?, dupe to bug
> 1179909?). Many thanks to :khuey and :bzbarsky and :smaug and all involved.
Given that the fix for bug 1179909 works, can we request an uplift to Aurora as it affects 42? The patch associated with bug 1179909 is pretty huge, can somebody comment on whether it is safe to uplift to Beta?
Flags: needinfo?(bugmail)
Reporter | ||
Comment 32•9 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #31)
> Given that the fix for bug 1179909 works, can we request an uplift to Aurora
> as it affects 42? The patch associated with bug 1179909 is pretty huge, can
> somebody comment on whether it is safe to uplift to Beta?
The authoritative answer would seem to be "no", from bug 1179909:
(Quoting Nikhil Marathe [:nsm] (please needinfo?) from bug 1179909 comment 18)
> Should this be uplifted at least to aurora? There are some push+IDB+Promises
> combinations that crash currently.
(Quoting Benjamin Smedberg [:bsmedberg] from bug 1179909 comment 20)
> I oppose trying to uplift anything non-critical related to event loops.
> Event loops are notoriously regression-prone and we need the full release
> cycle to find bugs.
I personally do think we should disable IndexedDB-on-workers on all branches where bug 1179909 did not land because the crashing is not fun. (And anyone cool enough to use IndexedDB on workers is probably also cool enough to use Promises on workers.)
Flags: needinfo?(bugmail)
Assignee | ||
Comment 33•9 years ago
|
||
IndexedDB on workers has been shipping since 37, so disabling it is not an option IMO.
Assignee | ||
Comment 35•9 years ago
|
||
Given that people are discovering it, we should probably do something though.
Updated•9 years ago
|
status-firefox43:
--- → fixed
status-firefox-esr31:
--- → unaffected
status-firefox-esr38:
--- → affected
Whiteboard: [fixed in 43 by bug 1179909]
Reporter | ||
Comment 36•9 years ago
|
||
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #35)
> Given that people are discovering it, we should probably do something though.
I don't suppose there's any chance the pre-empted runnable logic was the entire cause of the cycle collector problem and can be harmlessly disabled?
Assignee | ||
Comment 37•9 years ago
|
||
For branches, we can at least explicitly clear the array on the right thread. This should prevent any security issues, although I don't think IndexedDB will actually work with Promises with this patch.
Attachment #8651883 -
Flags: review?(amarchesini)
Reporter | ||
Comment 38•9 years ago
|
||
Unless there's some side-effect to when the mPreemptingRunnableInfos gets cleared out, the only correctness problem I saw with IndexedDB-with-promises prior to the bug 1179909 fix was bug 1161690 stalling no-op transactions forever.
Updated•9 years ago
|
Attachment #8651883 -
Flags: review?(amarchesini) → review+
Assignee | ||
Comment 39•9 years ago
|
||
(In reply to Andrew Sutherland [:asuth] from comment #38)
> Unless there's some side-effect to when the mPreemptingRunnableInfos gets
> cleared out, the only correctness problem I saw with IndexedDB-with-promises
> prior to the bug 1179909 fix was bug 1161690 stalling no-op transactions
> forever.
Right. I guess having requests running "primes the pump" so to speak. I didn't account for that behavior, so correctness fallout should be minimal.
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 40•9 years ago
|
||
Comment on attachment 8651883 [details] [diff] [review]
Branch patch
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: sec-moderate that affects a new feature introduced in ESR 38
User impact if declined: Possible security issues, crashes when IndexedDB is used in certain ways.
Fix Landed on Version: 43
Risk to taking this patch (and alternatives if risky): Low risk, alternative is disabling a feature which has been in release for at least 3 cycles.
String or UUID changes made by this patch: N/A
Attachment #8651883 -
Flags: approval-mozilla-esr38?
Attachment #8651883 -
Flags: approval-mozilla-beta?
Attachment #8651883 -
Flags: approval-mozilla-aurora?
Updated•9 years ago
|
status-b2g-v2.0:
--- → unaffected
status-b2g-v2.0M:
--- → unaffected
status-b2g-v2.1:
--- → unaffected
status-b2g-v2.1S:
--- → unaffected
status-b2g-v2.2:
--- → affected
status-b2g-v2.2r:
--- → affected
status-b2g-master:
--- → fixed
Target Milestone: --- → mozilla43
Updated•9 years ago
|
Attachment #8651883 -
Flags: approval-mozilla-b2g37?
Comment on attachment 8651883 [details] [diff] [review]
Branch patch
Seems safe to uplift to all nom'd branches.
Attachment #8651883 -
Flags: approval-mozilla-esr38?
Attachment #8651883 -
Flags: approval-mozilla-esr38+
Attachment #8651883 -
Flags: approval-mozilla-beta?
Attachment #8651883 -
Flags: approval-mozilla-beta+
Attachment #8651883 -
Flags: approval-mozilla-b2g37?
Attachment #8651883 -
Flags: approval-mozilla-b2g37+
Attachment #8651883 -
Flags: approval-mozilla-aurora?
Attachment #8651883 -
Flags: approval-mozilla-aurora+
Comment 42•9 years ago
|
||
Flags: in-testsuite?
Comment 43•9 years ago
|
||
Comment 44•9 years ago
|
||
Comment 45•9 years ago
|
||
Updated•9 years ago
|
Whiteboard: [fixed in 43 by bug 1179909] → [fixed in 43 by bug 1179909][post-critsmash-triage]
Updated•9 years ago
|
Group: core-security → core-security-release
Updated•9 years ago
|
tracking-firefox-esr38:
--- → 41+
Whiteboard: [fixed in 43 by bug 1179909][post-critsmash-triage] → [fixed in 43 by bug 1179909][post-critsmash-triage][adv-main41+][adv-esr38.3+]
Updated•8 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•