Closed
Bug 798366
Opened 12 years ago
Closed 11 years ago
Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html, browser_aboutHome.js [@ libpthread-2.11.so@0xe6ad | unixSync | xSync]
Categories
(Toolkit :: Telemetry, defect)
Toolkit
Telemetry
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: emorley, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, intermittent-failure)
Crash Data
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mak
:
review+
|
Details | Diff | Splinter Review |
Rev3 Fedora 12x64 mozilla-inbound debug test mochitest-other on 2012-10-04 02:10:07 PDT for push 739f8aac68a6
slave: talos-r3-fed64-044
https://tbpl.mozilla.org/php/getParsedLog.php?id=15807160&tree=Mozilla-Inbound
{
1312 INFO TEST-START | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul
++DOMWINDOW == 28 (0x18c81d0) [serial = 41] [outer = 0x3832210]
1313 INFO TEST-INFO | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul | must wait for load
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul | application timed out after 330 seconds with no output
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
INFO | automation.py | Application ran for: 0:13:12.861257
INFO | automation.py | Reading PID log: /tmp/tmpUDj2Nspidlog
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-debug/1349339357/firefox-18.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
PROCESS-CRASH | chrome://mochitests/content/a11y/accessible/attributes/test_obj_group.xul | application crashed (minidump found)
Crash dump filename: /tmp/tmp0QaMNZ/minidumps/501cd257-f3c8-a21a-1f6214b2-6b38d7f7.dmp
Operating system: Linux
0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
family 0 model 0 stepping 0
1 CPU
Crash reason: SIGABRT
Crash address: 0x1f400000cbb
Thread 0 (crashed)
0 libpthread-2.11.so + 0xe6ad
rbx = 0x0000000002b589b0 r12 = 0x00000000000000e8
r13 = 0x0000000000000000 r14 = 0x0000000000000002
r15 = 0x0000000000000000 rip = 0x00000034d360e6ad
rsp = 0x00007fffdb279570 rbp = 0x00007fffdb2795b0
Found by: given as instruction pointer in context
1 libmozsqlite3.so!unixSync [sqlite3.c : 26073 + 0x7]
rip = 0x00007f328a91459a rsp = 0x00007fffdb279580
rbp = 0x00007fffdb2795b0
Found by: stack scanning
2 libxul.so!xSync [TelemetryVFS.cpp : 158 + 0x5]
rbx = 0x0000000002b589a0 r12 = 0x00000000000000e8
r13 = 0x0000000000000000 rip = 0x00007f328813ef37
rsp = 0x00007fffdb2795c0 rbp = 0x00007fffdb2795f0
Found by: call frame info
3 libmozsqlite3.so!sqlite3PagerCommitPhaseOne [sqlite3.c : 42892 + 0x7]
rbx = 0x0000000002b58848 r12 = 0x0000000003c4ff18
r13 = 0x0000000000000000 rip = 0x00007f328a93e77b
rsp = 0x00007fffdb279600 rbp = 0x00007fffdb279660
Found by: call frame info
4 libmozsqlite3.so!sqlite3BtreeCommitPhaseOne [sqlite3.c : 51065 + 0xd]
rbx = 0x00000000021f6908 r12 = 0x0000000003c4ff18
r13 = 0x0000000000000000 r14 = 0x0000000004816f48
r15 = 0x0000000000000000 rip = 0x00007f328a93ea02
rsp = 0x00007fffdb279670 rbp = 0x00007fffdb2796c0
Found by: call frame info
5 libmozsqlite3.so!sqlite3VdbeHalt [sqlite3.c : 59920 + 0xd]
rbx = 0x00000000019ecf58 r12 = 0x0000000003c4ff18
r13 = 0x0000000000000000 r14 = 0x0000000000000000
r15 = 0x0000000000000000 rip = 0x00007f328a93f618
rsp = 0x00007fffdb2796d0 rbp = 0x00007fffdb279780
Found by: call frame info
6 libmozsqlite3.so!sqlite3VdbeExec [sqlite3.c : 64314 + 0x7]
rbx = 0x0000000003d9c188 r12 = 0x0000000000000000
r13 = 0x0000000003c4ff18 r14 = 0x00000000019ecf58
r15 = 0x0000000003ae53e0 rip = 0x00007f328a960944
rsp = 0x00007fffdb279790 rbp = 0x00007fffdb279c30
Found by: call frame info
7 libmozsqlite3.so!sqlite3_step [sqlite3.c : 61763 + 0x7]
}
Followed later in mochitest-ipcplugins by:
{
805 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/plugins/test/test_pluginstream_newstream.html | Test timed out.
TEST-UNEXPECTED-FAIL | /tests/dom/plugins/test/test_pluginstream_newstream.html | application timed out after 330 seconds with no output
INFO | automation.py | Application ran for: 0:15:57.428228
INFO | automation.py | Reading PID log: /tmp/tmpc0VS3lpidlog
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-debug/1349339357/firefox-18.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
PROCESS-CRASH | /tests/dom/plugins/test/test_pluginstream_newstream.html | application crashed (minidump found)
Crash dump filename: /tmp/tmpIF3fip/minidumps/7cdf0735-50e9-fd24-50d516fb-5048778f.dmp
Operating system: Linux
0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
family 0 model 0 stepping 0
1 CPU
Crash reason: SIGABRT
Crash address: 0x1f400000fa7
Thread 0 (crashed)
0 libpthread-2.11.so + 0xe6ad
rbx = 0x00000000064c8110 r12 = 0x00000000000000e8
r13 = 0x0000000000000000 r14 = 0x0000000000000002
r15 = 0x0000000000000000 rip = 0x00000034d360e6ad
rsp = 0x00007fff5e91c3c0 rbp = 0x00007fff5e91c400
Found by: given as instruction pointer in context
1 libmozsqlite3.so!unixSync [sqlite3.c : 26073 + 0x7]
rip = 0x00007fab2a23159a rsp = 0x00007fff5e91c3d0
rbp = 0x00007fff5e91c400
Found by: stack scanning
2 libxul.so!xSync [TelemetryVFS.cpp : 158 + 0x5]
rbx = 0x00000000064c8100 r12 = 0x00000000000000e8
r13 = 0x0000000000000000 rip = 0x00007fab27a5bf37
rsp = 0x00007fff5e91c410 rbp = 0x00007fff5e91c440
Found by: call frame info
3 libmozsqlite3.so!sqlite3PagerCommitPhaseOne [sqlite3.c : 42892 + 0x7]
rbx = 0x00000000064c7fa8 r12 = 0x0000000005b39708
r13 = 0x0000000000000000 rip = 0x00007fab2a25b77b
rsp = 0x00007fff5e91c450 rbp = 0x00007fff5e91c4b0
Found by: call frame info
4 libmozsqlite3.so!sqlite3BtreeCommitPhaseOne [sqlite3.c : 51065 + 0xd]
rbx = 0x000000000415fcf8 r12 = 0x0000000005b39708
r13 = 0x0000000000000000 r14 = 0x00000000055eb0b8
r15 = 0x0000000000000000 rip = 0x00007fab2a25ba02
rsp = 0x00007fff5e91c4c0 rbp = 0x00007fff5e91c510
Found by: call frame info
5 libmozsqlite3.so!sqlite3VdbeHalt [sqlite3.c : 59920 + 0xd]
rbx = 0x00000000021f0e38 r12 = 0x0000000005b39708
r13 = 0x0000000000000000 r14 = 0x0000000000000000
r15 = 0x0000000000000000 rip = 0x00007fab2a25c618
rsp = 0x00007fff5e91c520 rbp = 0x00007fff5e91c5d0
Found by: call frame info
6 libmozsqlite3.so!sqlite3VdbeExec [sqlite3.c : 64314 + 0x7]
rbx = 0x0000000005db9328 r12 = 0x0000000000000000
r13 = 0x0000000005b39708 r14 = 0x00000000021f0e38
r15 = 0x0000000004d413a0 rip = 0x00007fab2a27d944
rsp = 0x00007fff5e91c5e0 rbp = 0x00007fff5e91ca80
Found by: call frame info
7 libmozsqlite3.so!sqlite3_step [sqlite3.c : 61763 + 0x7]
}
Comment 1•12 years ago
|
||
Comment 2•12 years ago
|
||
Updated•12 years ago
|
Attachment #671558 -
Attachment description: Bug 798366 - part 1: delete TelemetryVFS; r=mak → part 1: delete TelemetryVFS
Attachment #671558 -
Flags: review?(mak77)
Updated•12 years ago
|
Attachment #671560 -
Attachment description: Bug 798366 - part 2: delete MOZ_SQLITE histogram definitions; r=taras → part 2: delete MOZ_SQLITE histogram definitions
Attachment #671560 -
Flags: review?(taras.mozilla)
Updated•12 years ago
|
Attachment #671560 -
Flags: review?(taras.mozilla) → review+
Comment 3•12 years ago
|
||
Comment on attachment 671558 [details] [diff] [review]
part 1: delete TelemetryVFS
my only concern is about the weird nfs bits that were added at some point. Can you reach out to whoever made those to let them know they'll have to find another avenue for their hack?
Comment 4•12 years ago
|
||
(In reply to Taras Glek (:taras) from comment #3)
> Comment on attachment 671558 [details] [diff] [review]
> part 1: delete TelemetryVFS
>
> my only concern is about the weird nfs bits that were added at some point.
> Can you reach out to whoever made those to let them know they'll have to
> find another avenue for their hack?
Oooh, good point. I can move those bits over into the main storage code...I think. CC'ing Martin to see if he has any better ideas, since he added this for bug 433129.
Comment 5•12 years ago
|
||
Well, if you remove the telemetryVFS I'll rewrite the nfs patch in Bug 433129.
Comment 6•12 years ago
|
||
May I have some more information about what happens and why we are doing this, here I just see a stack and a removal patch.
Do we think the vfs is causing too many issues that is simpler to kill it than investigate them? Or is this part of a wider plan?
Comment 7•12 years ago
|
||
Comment on attachment 671558 [details] [diff] [review]
part 1: delete TelemetryVFS
Review of attachment 671558 [details] [diff] [review]:
-----------------------------------------------------------------
So the nfs thing is in Aurora, right? So this patch is basically a regression.
Imo either we backout it from Aurora too, avoiding the regression, or we wait for a new version of that patch that doesn't use the vfs.
Apart that, a couple comments to be fixed but the patch looks ok.
::: storage/src/mozStorageService.cpp
@@ +437,2 @@
> if (rc != SQLITE_OK)
> NS_WARNING("sqlite3 did not shutdown cleanly.");
While here could you please fix this (and next ones in this destructor) to use NS_WARN_IF_FALSE? And define rc as DebugOnly<int> rc; or it would warn on compile.
@@ +575,5 @@
> rc = ::sqlite3_initialize();
> if (rc != SQLITE_OK)
> return convertResultCode(rc);
>
> + rc = ::sqlite3_quota_initialize(nullptr, 0);
nullptr is C++, but sqlite APIs are C, shouldn't we use NULL then?
Attachment #671558 -
Flags: review?(mak77) → review+
Comment 8•12 years ago
|
||
Well, the r+ is clearly dependent on answering comment 6 and figuring out a way of proceeding for the nfs issue. Just a code-wise review.
Comment 9•12 years ago
|
||
I don't know that the VFS is causing a lot of issues; we've have one intermittent orange that's seemingly attributable to the VFS, but the VFS has been in tree for quite some time.
(In reply to Marco Bonardo [:mak] from comment #7)
> So the nfs thing is in Aurora, right? So this patch is basically a
> regression.
> Imo either we backout it from Aurora too, avoiding the regression, or we
> wait for a new version of that patch that doesn't use the vfs.
> Apart that, a couple comments to be fixed but the patch looks ok.
I think just this part:
- rc = ::sqlite3_quota_initialize("telemetry-vfs", 0);
+ rc = ::sqlite3_quota_initialize(nullptr, 0);
if (rc != SQLITE_OK)
would have to fixed to use the appropriate VFS name and preference that TelemetryVFS currently uses.
> ::: storage/src/mozStorageService.cpp
> @@ +437,2 @@
> > if (rc != SQLITE_OK)
> > NS_WARNING("sqlite3 did not shutdown cleanly.");
>
> While here could you please fix this (and next ones in this destructor) to
> use NS_WARN_IF_FALSE? And define rc as DebugOnly<int> rc; or it would warn
> on compile.
Sure.
> @@ +575,5 @@
> > rc = ::sqlite3_initialize();
> > if (rc != SQLITE_OK)
> > return convertResultCode(rc);
> >
> > + rc = ::sqlite3_quota_initialize(nullptr, 0);
>
> nullptr is C++, but sqlite APIs are C, shouldn't we use NULL then?
I don't think so.
Reporter | ||
Comment 10•12 years ago
|
||
Rev3 Fedora 12x64 mozilla-central opt test mochitests-5/5 on 2012-10-19 05:15:06 PDT for push 987d2f22f83f
slave: talos-r3-fed64-044
https://tbpl.mozilla.org/php/getParsedLog.php?id=16264522&tree=Firefox
Summary: Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html [@ libpthread-2.11.so@0xe6ad | unixSync | xSync] → Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html [@ libpthread-2.11.so@0xe6ad | unixSync | xSync]
Comment 11•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #9)
> > nullptr is C++, but sqlite APIs are C, shouldn't we use NULL then?
>
> I don't think so.
Well at least in Storage code we usually use NULL when dealing with SQLite APIs, it's possible nullptr is safe (likely the compiler takes care of that), but per convention I'd keep using NULL.
Assignee | ||
Updated•12 years ago
|
Keywords: intermittent-failure
Assignee | ||
Updated•12 years ago
|
Whiteboard: [orange]
Comment 12•12 years ago
|
||
FWIW, I was unable to find a single instance of this crash signature in the last 4 weeks on crash-stats.
Comment 13•12 years ago
|
||
Summary: Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html [@ libpthread-2.11.so@0xe6ad | unixSync | xSync] → Intermittent crash in test_obj_group.xul, test_pluginstream_newstream.html, test_html5_tree_construction_part2.html, browser_aboutHome.js [@ libpthread-2.11.so@0xe6ad | unixSync | xSync]
Comment 14•12 years ago
|
||
Updated patch that applies, addresses the NFS VFS issue (I think), and
fixes the potential non-main thread accesses to prefs.
Attachment #671558 -
Attachment is obsolete: true
Attachment #707666 -
Flags: review?(mak77)
Comment 15•12 years ago
|
||
Comment on attachment 707666 [details] [diff] [review]
part 1 - delete TelemetryVFS
Review of attachment 707666 [details] [diff] [review]:
-----------------------------------------------------------------
::: storage/src/mozStorageService.cpp
@@ +43,5 @@
> + * This preference is a workaround to allow users/sysadmins to identify
> + * that the profile exists on an NFS share whose implementation
> + * is incompatible with SQLite's default locking implementation.
> + * Bug 433129 attempted to automatically identify such file-systems,
> + * but a reliable way was not found and it was determined that the fallback
some trailing spaces on these lines
@@ +55,5 @@
> +#else
> +#define STANDARD_VFS "unix"
> +#define NFS_VFS "unix-excl"
> +#endif
> +
trailing spaces
Attachment #707666 -
Flags: review?(mak77) → review+
Comment 16•12 years ago
|
||
Comment 17•12 years ago
|
||
...and backed out due to mochitest-2 orange:
https://tbpl.mozilla.org/php/getParsedLog.php?id=19252208&tree=Mozilla-Inbound
http://hg.mozilla.org/integration/mozilla-inbound/rev/46c47be9d44e
I thought the quota handling for IndexedDB happened someplace about sqlite nowadays, but apparently not...
Comment 18•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #17)
> ...and backed out due to mochitest-2 orange:
>
> https://tbpl.mozilla.org/php/getParsedLog.php?id=19252208&tree=Mozilla-
> Inbound
>
> http://hg.mozilla.org/integration/mozilla-inbound/rev/46c47be9d44e
>
> I thought the quota handling for IndexedDB happened someplace about sqlite
> nowadays, but apparently not...
Apparently we ditched sqlite's quota handling at some point and baked the quota handling into the Telemetry VFS. Thus, ditching the Telemetry VFS also ditched the quota handling. I love (non-)composable systems...
Comment 19•12 years ago
|
||
So, it's easy enough to solve this by ripping the Telemetry bits out of TelemetryVFS and moving the pref reading into ServiceMainThreadInitializer. However, I think we have a race:
1. Non-main thread asks for the service.
2. Service initializes, enqueues ServiceMainThreadInitializer.
3. Non-main thread now has a handle on the service, but things are not fully initialized.
4. A database connection is established before ServiceMainThreadInitializer runs.
What happens now depends on where we construct the VFS that we use to supply quota handling:
a) We construct the VFS as soon as the service initializes and read the (cached) value of the pref every time we open a file. In this scenario, any databases opened before ServiceMainThreadInitializer runs may have NFS problems, but quota handling will be OK.
b) We construct the VFS inside of ServiceMainThreadInitializer. In this scenario, both quota handling and NFS issues crop up for databases opened before initialization completes.
To be fair, we have approximately the same sort of issues now, except that we're dealing with prefs read off the main thread, which may or may not work...
Mak, WDYT? Is it worth trying to address this, or should we just go with solution (a)? I almost think that dispatching ServiceMainThreadInitializer to the main thread DISPATCH_SYNC ought to work, though we'd have to tweak Service::getSingleton to make sure that we're not returning a partially-initialized service on some *other* thread while we're waiting for ServiceMainThreadInitializer to complete...
Flags: needinfo?(mak77)
Comment 20•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #19)
> So, it's easy enough to solve this by ripping the Telemetry bits out of
> TelemetryVFS and moving the pref reading into ServiceMainThreadInitializer.
> However, I think we have a race:
I actually noticed this some days ago, and looks like it always "worked" like this, didn't have the time to investigate that though.
> What happens now depends on where we construct the VFS that we use to supply
> quota handling:
Afaik, we don't use anymore a vfs for quota handling but that is managed directly by the quota mananger that is then used by indexedDB and DOMStorage.
> a) We construct the VFS as soon as the service initializes and read the
> (cached) value of the pref every time we open a file. In this scenario, any
> databases opened before ServiceMainThreadInitializer runs may have NFS
> problems, but quota handling will be OK.
That is unfortunately problematic, the nfs case is disruptive and just guessing the first database may not be that important looks like a poor choice.
> Mak, WDYT? Is it worth trying to address this
Yes I think we should try to address this, instead of workaround it.
Can we hack up something with mutexes so the async thread waits for mainthread? I don't think it's so bad if another thread waits the mainthread for a little while.
Flags: needinfo?(mak77)
Comment 21•12 years ago
|
||
And honestly I actually didn't verify if by chance we already mutex properly here...
Comment 22•12 years ago
|
||
(In reply to Marco Bonardo [:mak] from comment #20)
> > What happens now depends on where we construct the VFS that we use to supply
> > quota handling:
>
> Afaik, we don't use anymore a vfs for quota handling but that is managed
> directly by the quota mananger that is then used by indexedDB and DOMStorage.
That's what I thought, too! But then all those quota-related tests wouldn't have failed when I landed my stuff yesterday. And then I discovered that the quota manager is actually called by the TelemetryVFS stuff:
http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#94
http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#140
http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#322
> > Mak, WDYT? Is it worth trying to address this
>
> Yes I think we should try to address this, instead of workaround it.
> Can we hack up something with mutexes so the async thread waits for
> mainthread? I don't think it's so bad if another thread waits the mainthread
> for a little while.
I think sync dispatching and some careful programming in getSingleton() will work; I will attempt to do that.
Comment 23•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #22)
> That's what I thought, too! But then all those quota-related tests wouldn't
> have failed when I landed my stuff yesterday. And then I discovered that
> the quota manager is actually called by the TelemetryVFS stuff:
>
> http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.cpp#94
> http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.
> cpp#140
> http://mxr.mozilla.org/mozilla-central/source/storage/src/TelemetryVFS.
> cpp#322
ugh, this complicates things a bit, also related to this patch. I was actually told the quota manager didn't need the vfs anymore at all, but didn't verify that directly :(
Reporter | ||
Comment 24•11 years ago
|
||
Resolving WFM keyword:intermittent-failure bugs last modified >3 months ago, whose whiteboard contains none of:
{random,disabled,marked,fuzzy,todo,fails,failing,annotated,time-bomb,leave open}
There will inevitably be some false positives; for that (and the bugspam) I apologise. Filter on orangewfm.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•