Closed Bug 798366 Opened 12 years ago Closed 11 years ago

Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html, browser_aboutHome.js [@ libpthread-2.11.so@0xe6ad | unixSync | xSync]

Categories: Toolkit :: Telemetry, defect
Severity: critical
Status: RESOLVED WORKSFORME
People: Reporter: emorley; Assignee: Unassigned
References: Blocks 1 open bug
Keywords: crash, intermittent-failure
Attachments: 2 files, 1 obsolete file

Rev3 Fedora 12x64 mozilla-inbound debug test mochitest-other on 2012-10-04 02:10:07 PDT for push 739f8aac68a6
slave: talos-r3-fed64-044
https://tbpl.mozilla.org/php/getParsedLog.php?id=15807160&tree=Mozilla-Inbound

{
1312 INFO TEST-START | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul
++DOMWINDOW == 28 (0x18c81d0) [serial = 41] [outer = 0x3832210]
1313 INFO TEST-INFO | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul | must wait for load
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul | application timed out after 330 seconds with no output
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
INFO | automation.py | Application ran for: 0:13:12.861257
INFO | automation.py | Reading PID log: /tmp/tmpUDj2Nspidlog
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-debug/1349339357/firefox-18.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
PROCESS-CRASH | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul | application crashed (minidump found)
Crash dump filename: /tmp/tmp0QaMNZ/minidumps/501cd257-f3c8-a21a-1f6214b2-6b38d7f7.dmp
Operating system: Linux 0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64 family 0 model 0 stepping 0, 1 CPU
Crash reason: SIGABRT
Crash address: 0x1f400000cbb
Thread 0 (crashed)
 0  libpthread-2.11.so + 0xe6ad
    rbx = 0x0000000002b589b0  r12 = 0x00000000000000e8  r13 = 0x0000000000000000  r14 = 0x0000000000000002
    r15 = 0x0000000000000000  rip = 0x00000034d360e6ad  rsp = 0x00007fffdb279570  rbp = 0x00007fffdb2795b0
    Found by: given as instruction pointer in context
 1  libmozsqlite3.so!unixSync [sqlite3.c : 26073 + 0x7]
    rip = 0x00007f328a91459a  rsp = 0x00007fffdb279580  rbp = 0x00007fffdb2795b0
    Found by: stack scanning
 2  libxul.so!xSync [TelemetryVFS.cpp : 158 + 0x5]
    rbx = 0x0000000002b589a0  r12 = 0x00000000000000e8  r13 = 0x0000000000000000
    rip = 0x00007f328813ef37  rsp = 0x00007fffdb2795c0  rbp = 0x00007fffdb2795f0
    Found by: call frame info
 3  libmozsqlite3.so!sqlite3PagerCommitPhaseOne [sqlite3.c : 42892 + 0x7]
    rbx = 0x0000000002b58848  r12 = 0x0000000003c4ff18  r13 = 0x0000000000000000
    rip = 0x00007f328a93e77b  rsp = 0x00007fffdb279600  rbp = 0x00007fffdb279660
    Found by: call frame info
 4  libmozsqlite3.so!sqlite3BtreeCommitPhaseOne [sqlite3.c : 51065 + 0xd]
    rbx = 0x00000000021f6908  r12 = 0x0000000003c4ff18  r13 = 0x0000000000000000  r14 = 0x0000000004816f48
    r15 = 0x0000000000000000  rip = 0x00007f328a93ea02  rsp = 0x00007fffdb279670  rbp = 0x00007fffdb2796c0
    Found by: call frame info
 5  libmozsqlite3.so!sqlite3VdbeHalt [sqlite3.c : 59920 + 0xd]
    rbx = 0x00000000019ecf58  r12 = 0x0000000003c4ff18  r13 = 0x0000000000000000  r14 = 0x0000000000000000
    r15 = 0x0000000000000000  rip = 0x00007f328a93f618  rsp = 0x00007fffdb2796d0  rbp = 0x00007fffdb279780
    Found by: call frame info
 6  libmozsqlite3.so!sqlite3VdbeExec [sqlite3.c : 64314 + 0x7]
    rbx = 0x0000000003d9c188  r12 = 0x0000000000000000  r13 = 0x0000000003c4ff18  r14 = 0x00000000019ecf58
    r15 = 0x0000000003ae53e0  rip = 0x00007f328a960944  rsp = 0x00007fffdb279790  rbp = 0x00007fffdb279c30
    Found by: call frame info
 7  libmozsqlite3.so!sqlite3_step [sqlite3.c : 61763 + 0x7]
}

Followed later in mochitest-ipcplugins by:

{
805 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/plugins/test/test_pluginstream_newstream.html | Test timed out.
TEST-UNEXPECTED-FAIL | /tests/dom/plugins/test/test_pluginstream_newstream.html | application timed out after 330 seconds with no output
INFO | automation.py | Application ran for: 0:15:57.428228
INFO | automation.py | Reading PID log: /tmp/tmpc0VS3lpidlog
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-debug/1349339357/firefox-18.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
PROCESS-CRASH | /tests/dom/plugins/test/test_pluginstream_newstream.html | application crashed (minidump found)
Crash dump filename: /tmp/tmpIF3fip/minidumps/7cdf0735-50e9-fd24-50d516fb-5048778f.dmp
Operating system: Linux 0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64 family 0 model 0 stepping 0, 1 CPU
Crash reason: SIGABRT
Crash address: 0x1f400000fa7
Thread 0 (crashed)
 0  libpthread-2.11.so + 0xe6ad
    rbx = 0x00000000064c8110  r12 = 0x00000000000000e8  r13 = 0x0000000000000000  r14 = 0x0000000000000002
    r15 = 0x0000000000000000  rip = 0x00000034d360e6ad  rsp = 0x00007fff5e91c3c0  rbp = 0x00007fff5e91c400
    Found by: given as instruction pointer in context
 1  libmozsqlite3.so!unixSync [sqlite3.c : 26073 + 0x7]
    rip = 0x00007fab2a23159a  rsp = 0x00007fff5e91c3d0  rbp = 0x00007fff5e91c400
    Found by: stack scanning
 2  libxul.so!xSync [TelemetryVFS.cpp : 158 + 0x5]
    rbx = 0x00000000064c8100  r12 = 0x00000000000000e8  r13 = 0x0000000000000000
    rip = 0x00007fab27a5bf37  rsp = 0x00007fff5e91c410  rbp = 0x00007fff5e91c440
    Found by: call frame info
 3  libmozsqlite3.so!sqlite3PagerCommitPhaseOne [sqlite3.c : 42892 + 0x7]
    rbx = 0x00000000064c7fa8  r12 = 0x0000000005b39708  r13 = 0x0000000000000000
    rip = 0x00007fab2a25b77b  rsp = 0x00007fff5e91c450  rbp = 0x00007fff5e91c4b0
    Found by: call frame info
 4  libmozsqlite3.so!sqlite3BtreeCommitPhaseOne [sqlite3.c : 51065 + 0xd]
    rbx = 0x000000000415fcf8  r12 = 0x0000000005b39708  r13 = 0x0000000000000000  r14 = 0x00000000055eb0b8
    r15 = 0x0000000000000000  rip = 0x00007fab2a25ba02  rsp = 0x00007fff5e91c4c0  rbp = 0x00007fff5e91c510
    Found by: call frame info
 5  libmozsqlite3.so!sqlite3VdbeHalt [sqlite3.c : 59920 + 0xd]
    rbx = 0x00000000021f0e38  r12 = 0x0000000005b39708  r13 = 0x0000000000000000  r14 = 0x0000000000000000
    r15 = 0x0000000000000000  rip = 0x00007fab2a25c618  rsp = 0x00007fff5e91c520  rbp = 0x00007fff5e91c5d0
    Found by: call frame info
 6  libmozsqlite3.so!sqlite3VdbeExec [sqlite3.c : 64314 + 0x7]
    rbx = 0x0000000005db9328  r12 = 0x0000000000000000  r13 = 0x0000000005b39708  r14 = 0x00000000021f0e38
    r15 = 0x0000000004d413a0  rip = 0x00007fab2a27d944  rsp = 0x00007fff5e91c5e0  rbp = 0x00007fff5e91ca80
    Found by: call frame info
 7  libmozsqlite3.so!sqlite3_step [sqlite3.c : 61763 + 0x7]
}
Attached patch part 1: delete TelemetryVFS (obsolete) (deleted) — Splinter Review
Attachment #671558 - Attachment description: Bug 798366 - part 1: delete TelemetryVFS; r=mak → part 1: delete TelemetryVFS
Attachment #671558 - Flags: review?(mak77)
Attachment #671560 - Attachment description: Bug 798366 - part 2: delete MOZ_SQLITE histogram definitions; r=taras → part 2: delete MOZ_SQLITE histogram definitions
Attachment #671560 - Flags: review?(taras.mozilla)
Attachment #671560 - Flags: review?(taras.mozilla) → review+
Comment on attachment 671558 [details] [diff] [review]
part 1: delete TelemetryVFS

my only concern is about the weird nfs bits that were added at some point. Can you reach out to whoever made those to let them know they'll have to find another avenue for their hack?
(In reply to Taras Glek (:taras) from comment #3)
> Comment on attachment 671558 [details] [diff] [review]
> part 1: delete TelemetryVFS
>
> my only concern is about the weird nfs bits that were added at some point.
> Can you reach out to whoever made those to let them know they'll have to
> find another avenue for their hack?

Oooh, good point. I can move those bits over into the main storage code... I think. CC'ing Martin to see if he has any better ideas, since he added this for bug 433129.
Well, if you remove the TelemetryVFS, I'll rewrite the NFS patch in bug 433129.
May I have some more information about what is happening and why we are doing this? Here I just see a stack and a removal patch. Do we think the VFS is causing so many issues that it is simpler to kill it than to investigate them? Or is this part of a wider plan?
Comment on attachment 671558 [details] [diff] [review]
part 1: delete TelemetryVFS

Review of attachment 671558 [details] [diff] [review]:
-----------------------------------------------------------------

So the nfs thing is in Aurora, right? So this patch is basically a regression. Imo either we back it out from Aurora too, avoiding the regression, or we wait for a new version of that patch that doesn't use the vfs. Apart from that, a couple of comments to be fixed, but the patch looks ok.

::: storage/src/mozStorageService.cpp
@@ +437,2 @@
>   if (rc != SQLITE_OK)
>     NS_WARNING("sqlite3 did not shutdown cleanly.");

While here, could you please fix this (and the next ones in this destructor) to use NS_WARN_IF_FALSE? And define rc as DebugOnly<int> rc; otherwise it would warn at compile time.

@@ +575,5 @@
>   rc = ::sqlite3_initialize();
>   if (rc != SQLITE_OK)
>     return convertResultCode(rc);
>
> +  rc = ::sqlite3_quota_initialize(nullptr, 0);

nullptr is C++, but the sqlite APIs are C; shouldn't we use NULL then?
Attachment #671558 - Flags: review?(mak77) → review+
Well, the r+ is clearly dependent on answering comment 6 and figuring out a way of proceeding for the nfs issue. Just a code-wise review.
I don't know that the VFS is causing a lot of issues; we've had one intermittent orange that's seemingly attributable to the VFS, but the VFS has been in the tree for quite some time.

(In reply to Marco Bonardo [:mak] from comment #7)
> So the nfs thing is in Aurora, right? So this patch is basically a
> regression.
> Imo either we backout it from Aurora too, avoiding the regression, or we
> wait for a new version of that patch that doesn't use the vfs.
> Apart that, a couple comments to be fixed but the patch looks ok.

I think just this part:

-  rc = ::sqlite3_quota_initialize("telemetry-vfs", 0);
+  rc = ::sqlite3_quota_initialize(nullptr, 0);
   if (rc != SQLITE_OK)

would have to be fixed to use the appropriate VFS name and preference that TelemetryVFS currently uses.

> ::: storage/src/mozStorageService.cpp
> @@ +437,2 @@
> >   if (rc != SQLITE_OK)
> >     NS_WARNING("sqlite3 did not shutdown cleanly.");
>
> While here could you please fix this (and next ones in this destructor) to
> use NS_WARN_IF_FALSE? And define rc as DebugOnly<int> rc; or it would warn
> on compile.

Sure.

> @@ +575,5 @@
> >   rc = ::sqlite3_initialize();
> >   if (rc != SQLITE_OK)
> >     return convertResultCode(rc);
> >
> > +  rc = ::sqlite3_quota_initialize(nullptr, 0);
>
> nullptr is C++, but sqlite APIs are C, shouldn't we use NULL then?

I don't think so.
Rev3 Fedora 12x64 mozilla-central opt test mochitests-5/5 on 2012-10-19 05:15:06 PDT for push 987d2f22f83f slave: talos-r3-fed64-044 https://tbpl.mozilla.org/php/getParsedLog.php?id=16264522&tree=Firefox
Summary: Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html [@ libpthread-2.11.so@0xe6ad | unixSync | xSync] → Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html [@ libpthread-2.11.so@0xe6ad | unixSync | xSync]
(In reply to Nathan Froyd (:froydnj) from comment #9)
> > nullptr is C++, but sqlite APIs are C, shouldn't we use NULL then?
>
> I don't think so.

Well, at least in Storage code we usually use NULL when dealing with SQLite APIs. It's possible nullptr is safe (likely the compiler takes care of that), but per convention I'd keep using NULL.
Whiteboard: [orange]
FWIW, I was unable to find a single instance of this crash signature in the last 4 weeks on crash-stats.
Summary: Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html [@ libpthread-2.11.so@0xe6ad | unixSync | xSync] → Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html, browser_aboutHome.js [@ libpthread-2.11.so@0xe6ad | unixSync | xSync]
Attached patch part 1 - delete TelemetryVFS (deleted) — Splinter Review
Updated patch that applies, addresses the NFS VFS issue (I think), and fixes the potential non-main thread accesses to prefs.
Attachment #671558 - Attachment is obsolete: true
Attachment #707666 - Flags: review?(mak77)
Blocks: 832677
Comment on attachment 707666 [details] [diff] [review]
part 1 - delete TelemetryVFS

Review of attachment 707666 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageService.cpp
@@ +43,5 @@
> + * This preference is a workaround to allow users/sysadmins to identify
> + * that the profile exists on an NFS share whose implementation
> + * is incompatible with SQLite's default locking implementation.
> + * Bug 433129 attempted to automatically identify such file-systems,
> + * but a reliable way was not found and it was determined that the fallback

some trailing spaces on these lines

@@ +55,5 @@
> +#else
> +#define STANDARD_VFS "unix"
> +#define NFS_VFS "unix-excl"
> +#endif
> +

trailing spaces
Attachment #707666 - Flags: review?(mak77) → review+
...and backed out due to mochitest-2 orange:

https://tbpl.mozilla.org/php/getParsedLog.php?id=19252208&tree=Mozilla-Inbound

http://hg.mozilla.org/integration/mozilla-inbound/rev/46c47be9d44e

I thought the quota handling for IndexedDB happened somewhere above sqlite nowadays, but apparently not...
(In reply to Nathan Froyd (:froydnj) from comment #17)
> ...and backed out due to mochitest-2 orange:
>
> https://tbpl.mozilla.org/php/getParsedLog.php?id=19252208&tree=Mozilla-Inbound
>
> http://hg.mozilla.org/integration/mozilla-inbound/rev/46c47be9d44e
>
> I thought the quota handling for IndexedDB happened someplace about sqlite
> nowadays, but apparently not...

Apparently we ditched sqlite's quota handling at some point and baked the quota handling into the Telemetry VFS. Thus, ditching the Telemetry VFS also ditched the quota handling. I love (non-)composable systems...
So, it's easy enough to solve this by ripping the Telemetry bits out of TelemetryVFS and moving the pref reading into ServiceMainThreadInitializer. However, I think we have a race:

1. Non-main thread asks for the service.
2. Service initializes, enqueues ServiceMainThreadInitializer.
3. Non-main thread now has a handle on the service, but things are not fully initialized.
4. A database connection is established before ServiceMainThreadInitializer runs.

What happens now depends on where we construct the VFS that we use to supply quota handling:

a) We construct the VFS as soon as the service initializes and read the (cached) value of the pref every time we open a file. In this scenario, any databases opened before ServiceMainThreadInitializer runs may have NFS problems, but quota handling will be OK.

b) We construct the VFS inside of ServiceMainThreadInitializer. In this scenario, both quota handling and NFS issues crop up for databases opened before initialization completes.

To be fair, we have approximately the same sort of issues now, except that we're dealing with prefs read off the main thread, which may or may not work...

Mak, WDYT? Is it worth trying to address this, or should we just go with solution (a)? I almost think that dispatching ServiceMainThreadInitializer to the main thread DISPATCH_SYNC ought to work, though we'd have to tweak Service::getSingleton to make sure that we're not returning a partially-initialized service on some *other* thread while we're waiting for ServiceMainThreadInitializer to complete...
Flags: needinfo?(mak77)
(In reply to Nathan Froyd (:froydnj) from comment #19)
> So, it's easy enough to solve this by ripping the Telemetry bits out of
> TelemetryVFS and moving the pref reading into ServiceMainThreadInitializer.
> However, I think we have a race:

I actually noticed this some days ago, and it looks like it has always "worked" like this; I didn't have the time to investigate it, though.

> What happens now depends on where we construct the VFS that we use to supply
> quota handling:

Afaik, we don't use a vfs for quota handling anymore; that is managed directly by the quota manager, which is then used by IndexedDB and DOMStorage.

> a) We construct the VFS as soon as the service initializes and read the
> (cached) value of the pref every time we open a file. In this scenario, any
> databases opened before ServiceMainThreadInitializer runs may have NFS
> problems, but quota handling will be OK.

That is unfortunately problematic: the NFS case is disruptive, and just guessing that the first database may not be that important looks like a poor choice.

> Mak, WDYT? Is it worth trying to address this

Yes, I think we should try to address this instead of working around it. Can we hack up something with mutexes so the async thread waits for the main thread? I don't think it's so bad if another thread waits on the main thread for a little while.
Flags: needinfo?(mak77)
And honestly, I didn't actually verify whether by chance we already mutex properly here...
(In reply to Marco Bonardo [:mak] from comment #20)
> > What happens now depends on where we construct the VFS that we use to supply
> > quota handling:
>
> Afaik, we don't use anymore a vfs for quota handling but that is managed
> directly by the quota mananger that is then used by indexedDB and DOMStorage.

That's what I thought, too! But then all those quota-related tests wouldn't have failed when I landed my stuff yesterday. And then I discovered that the quota manager is actually called by the TelemetryVFS stuff:

http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#94
http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#140
http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#322

> > Mak, WDYT? Is it worth trying to address this
>
> Yes I think we should try to address this, instead of workaround it.
> Can we hack up something with mutexes so the async thread waits for
> mainthread? I don't think it's so bad if another thread waits the mainthread
> for a little while.

I think sync dispatching and some careful programming in getSingleton() will work; I will attempt to do that.
Depends on: 836493
(In reply to Nathan Froyd (:froydnj) from comment #22)
> That's what I thought, too! But then all those quota-related tests wouldn't
> have failed when I landed my stuff yesterday. And then I discovered that
> the quota manager is actually called by the TelemetryVFS stuff:
>
> http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#94
> http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#140
> http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#322

Ugh, this complicates things a bit, also with respect to this patch. I was actually told the quota manager didn't need the vfs anymore at all, but I didn't verify that directly :(
Resolving WFM keyword:intermittent-failure bugs last modified >3 months ago whose whiteboard contains none of: {random, disabled, marked, fuzzy, todo, fails, failing, annotated, time-bomb, leave open}. There will inevitably be some false positives; for that (and the bugspam) I apologise. Filter on orangewfm.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME