Closed
Bug 692605
Opened 13 years ago
Closed 13 years ago
Perma-orange Aurora (and Beta) Linux opt (non-PGO) browser_webconsole_bug_595934_message_categories.js | Test timed out and test_errorPropagation.html | Test timed out followed by shutdown timeout
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla10
Tracking | Status | |
---|---|---|
firefox7 | --- | unaffected |
firefox8 | --- | fixed |
firefox9 | --- | fixed |
firefox10 | --- | fixed |
People
(Reporter: mbrubeck, Assigned: bent.mozilla)
References
Details
(Keywords: intermittent-failure, Whiteboard: [qa-] [inbound][see comment #111])
Attachments
(1 file)
(deleted),
patch
|
bent.mozilla
:
review+
christian
:
approval-mozilla-aurora+
christian
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Ever since PGO was turned off for opt builds (bug 658313), these failures are occurring in every non-PGO Linux 32-bit opt build on Aurora (Firefox 9).
We're pretty sure that non-PGO is to blame. The failures do not occur in PGO builds. They do occur in non-PGO builds even on pushes that were green before the config change, and the failures kept occurring after backing out the changeset where they first occurred.
https://tbpl.mozilla.org/php/getParsedLog.php?id=6710381&tree=Mozilla-Aurora
Rev3 Fedora 12 mozilla-aurora opt test mochitests-3/5 on 2011-10-06 13:40:03 PDT for push 4c15202ca083
5937 INFO TEST-PASS | /tests/dom/workers/test/test_errorPropagation.html | Correct message event.filename - http://mochi.test:8888/tests/dom/workers/test/errorPropagation_worker.js should equal http://mochi.test:8888/tests/dom/workers/test/errorPropagation_worker.js
5938 INFO TEST-PASS | /tests/dom/workers/test/test_errorPropagation.html | Correct message event.lineno - 48 should equal 48
5939 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/test_errorPropagation.html | Test timed out.
XScreenSaver state: Disabled
User input has been idle for 743 seconds
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
...
16466 INFO SimpleTest FINISHED
TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output
INFO | automation.py | Application ran for: 0:14:38.951252
INFO | automation.py | Reading PID log: /tmp/tmp6qt-3tpidlog
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Crash dump filename: /tmp/tmpCIjJRu/minidumps/68928f9b-b21b-2ce0-74dd07a2-79f2fec2.dmp
Operating system: Linux
0.0.0 Linux 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686
CPU: x86
GenuineIntel family 6 model 23 stepping 10
2 CPUs
Crash reason: SIGABRT
Crash address: 0x847
Thread 0 (crashed)
0 libpthread-2.11.so + 0x8edd
eip = 0x00c6eedd esp = 0xbf9a0074 ebp = 0xbf9a0078 ebx = 0x08051a80
esi = 0xb7673040 edi = 0x93dd1000 eax = 0x00000003 ecx = 0x00000008
edx = 0x00000000 efl = 0x00200246
Found by: given as instruction pointer in context
1 firefox-bin!malloc_mutex_unlock [jemalloc.c : 1476 + 0x5]
eip = 0x0804b1ae esp = 0xbf9a0080 ebp = 0xb7673040
Found by: previous frame's frame pointer
2 firefox-bin!arena_malloc [jemalloc.c : 3818 + 0xa]
eip = 0x0804d78d esp = 0xbf9a00a0 ebp = 0xb7673040 ebx = 0x08051a80
Found by: call frame info
3 firefox-bin!imalloc [jemalloc.c : 3892 + 0x10]
eip = 0x0804da5c esp = 0xbf9a00d0 ebp = 0xbf9a018c ebx = 0x08051a80
esi = 0x0000000a edi = 0x00000000
Found by: call frame info
4 firefox-bin!malloc [jemalloc.c : 5986 + 0x6]
eip = 0x0804da9a esp = 0xbf9a00f0 ebp = 0xbf9a018c ebx = 0x08051a80
esi = 0x0000000a edi = 0x00000000
Found by: call frame info
5 libmozalloc.so!moz_malloc [mozalloc.cpp : 113 + 0x8]
eip = 0x00143b41 esp = 0xbf9a0100 ebp = 0xbf9a018c ebx = 0x00144118
esi = 0x00000002 edi = 0x00000000
Found by: call frame info
6 libxul.so!nsStringBuffer::Alloc [nsSubstring.cpp : 209 + 0x8]
eip = 0x01c29a4d esp = 0xbf9a0120 ebp = 0xbf9a018c ebx = 0x02323174
esi = 0x00000002 edi = 0x00000000
Found by: call frame info
7 libxul.so!nsAString_internal::MutatePrep [nsTSubstring.cpp : 162 + 0x8]
eip = 0x01c29c1d esp = 0xbf9a0140 ebp = 0xbf9a018c ebx = 0x02323174
esi = 0xa5e02e38 edi = 0x00000000
Found by: call frame info
8 libxul.so!nsAString_internal::ReplacePrepInternal [nsTSubstring.cpp : 198 + 0x13]
eip = 0x01c29c48 esp = 0xbf9a0170 ebp = 0x00000000 ebx = 0x02323174
esi = 0xa5e02e38 edi = 0x00000000
Found by: call frame info
9 libxul.so!nsAString_internal::ReplacePrep [nsTSubstring.h : 685 + 0x10]
eip = 0x01c29d56 esp = 0xbf9a01a0 ebp = 0x00000000 ebx = 0x02323174
esi = 0xa5e02e38 edi = 0x00000000
Found by: call frame info
https://tbpl.mozilla.org/php/getParsedLog.php?id=6710557&tree=Mozilla-Aurora
Rev3 Fedora 12 mozilla-aurora opt test mochitest-other on 2011-10-06 13:40:06 PDT for push 4c15202ca083
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | test #4: message found 'multipart/form-data'
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | test #4: error category 'HTML'
TEST-INFO | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | Console message: [JavaScript Warning: "Form contains enctype=multipart/form-data, but does not contain method=post. Submitting normally with method=GET and no enctype instead." {file: "http://example.com/browser/browser/devtools/webconsole/test//browser/test-bug-595934-html.html" line: 0}]
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | test #5: message found 'no element found'
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | test #5: error category 'malformed-xml'
TEST-INFO | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | Console message: [JavaScript Error: "no element found" {file: "http://example.com/browser/browser/devtools/webconsole/test//browser/test-bug-595934-malformedxml.xhtml" line: 11}]
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | test #6: message found 'fooBarSVG'
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | test #6: error category 'SVG'
TEST-INFO | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | Console message: [JavaScript Warning: "Unexpected value fooBarSVG parsing height attribute." {file: "http://example.com/browser/browser/devtools/webconsole/test//browser/test-bug-595934-svg.xhtml" line: 0}]
TEST-INFO | unknown test url | [SimpleTest/SimpleTest.js, window.onerror] An error occurred: fooBarWorker is not defined at http://example.com/browser/browser/devtools/webconsole/test//browser/test-bug-595934-workers.js:7
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | Test timed out
XScreenSaver state: Disabled
User input has been idle for 1338 seconds
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
INFO TEST-END | chrome://mochitests/content/browser/browser/devtools/webconsole/test/browser/browser_webconsole_bug_595934_message_categories.js | finished in 30043ms
Comment 1•13 years ago
|
||
Note that we've also had other issues on Linux on the -O3 non-pgo builds... glandium might recall details.
Comment 2•13 years ago
|
||
We've had float rounding issues, but nothing crashy that i remember.
Comment 3•13 years ago
|
||
BTW, non-PGO should build -Os on aurora, by now.
Comment 4•13 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #3)
> BTW, non-PGO should build -Os on aurora, by now.
Confirmed, that was bug 655003.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•13 years ago
|
Summary: Perma-orange Aurora Linux opt (non-PGO) browser_webconsole_bug_595934_message_categories.js | Test timed out and test_errorPropagation.html | Test timed out followed by shutdown timeout → Perma-orange Aurora (and Beta) Linux opt (non-PGO) browser_webconsole_bug_595934_message_categories.js | Test timed out and test_errorPropagation.html | Test timed out followed by shutdown timeout
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 14•13 years ago
|
||
Not quite literally permaorange on m-b: there have been three builds, two of which say in the scrape that they were clobbers, which hit this, and one which does not say it was, which did not hit this.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 82•13 years ago
|
||
So, Luke pushed https://hg.mozilla.org/integration/mozilla-inbound/rev/4c0e6b3f7791, removing a totally empty header from jseng, and browser_webconsole_bug_595934_message_categories.js went permaorange on 10.6 opt.
The only thing I've seen doing that same sort of senseless and impossible in a reasonable world thing is when we've got code accessing memory that it totally shouldn't be touching, and a nothing change to jseng suddenly affects a test that couldn't possibly be affected. A previous case with a11y only happened with some PGO'd builds, when PGO happened to move things around to make the badness happen; apparently the Aurora Linux thing is the opposite, and PGO will cover our tracks but non-PGO leaves us busted.
Given that the Aurora case is both devtools and workers, does that mean that both of them are being bad, or is the devtools test being bitten by bad workers? Will running these tests under Valgrind lead to it shrieking in horror at the things we're doing?
Blocks: 689362
Severity: normal → blocker
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 85•13 years ago
|
||
The devtools hang is the only devtools test that uses a worker, so I would not be too suspicious of the devtools code.
The way these two failing tests work is that they register an error console listener and then use a worker to generate an error. The tests wait until they see the error appear in the console before continuing. Sometimes they randomly time out without anything else seeming to fail (at least, I don't see anything else in the logs).
I'm not ready to blame the worker code yet, though. I've looked over the worker error generation code a bunch now and can't see anything amiss. On a whim I decided to take a look at the code in the error console listener implementation and I ran across this (nsConsoleService.cpp#161):
* Iterate through any registered listeners and tell them about
* the message. We use the mListening flag to guard against
* recursive message logs. This could sometimes result in
* listeners being skipped because of activity on other threads,
* when we only care about the recursive case.
That seems pretty bad.. If we're unlucky enough to have an error reported on another thread immediately before the worker code generates the error that the tests are waiting on we'll miss the report and the test will never finish.
I'll see what we can do to fix this behavior (or the tests as a bandaid - maybe by looking at the console message list every second or so?). Basically this problem boils down to a bad test mechanism, not a bug in workers.
However, I don't think we can simply wave this off due to the crashes, and I'm pretty sure that the crashes are entirely due to the fact that the console manager is using proxies of JS objects to make its listeners get called back on the correct thread. Proxies are no longer safe to be used by JS on other threads, so this is a big red flag.
In short, I blame the console service for all my woes ;)
CC'ing luke and bsmedberg since they have a plan to nuke proxies and may have a patch to fix that part.
Anyone think my hunches are bonkers?
Component: General → XPCOM
QA Contact: general → xpcom
Comment 86•13 years ago
|
||
Inconveniently, backing out luke did not fix inbound's permaorange. His 10.6 opt build was a clobber, the two before that were not (and the four before that were burning), so either of them or Waldo's backout not really backing everything out are possibilities if they didn't manage to take effect until after a clobber, as is some releng change. Not sure what, if any, part of that fits with blaming the console service.
Comment 87•13 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 90•13 years ago
|
||
(In reply to ben turner [:bent] from comment #85)>
> Anyone think my hunches are bonkers?
Well, I don't get it, but that's not an unusual thing for me.
How does the theory that maybe sometimes an error gets reported on another thread right before your error work with the evidence, that in Linux32 PGO builds on mozilla-aurora, it never ever happens, but in Linux32 non-PGO builds on aurora it happens every single time for both tests, and in Linux32 non-PGO builds on mozilla-beta it happens every single time only for the devtools test, not for the workers test? And what part of sfink's pushes caused an error to be reported on another thread right before yours, every single time, only on 10.6 opt?
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 96•13 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #90)
Dunno, race conditions like this are always pretty strange, subject to weird tiny variations in timing and whatever OS scheduler you're dealing with.
I'm reasonably certain this will go away once bug 675221 removes the proxies and makes the listener logic more sane.
In the meantime I believe that we're just going to disable this test.
Depends on: 675221
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 102•13 years ago
|
||
This is rs=sicking.
Assignee | ||
Comment 103•13 years ago
|
||
Whiteboard: [orange] → [orange] [inbound]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 110•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Assignee | ||
Comment 111•13 years ago
|
||
Comment on attachment 566979 [details] [diff] [review]
Patch to disable console listener tests
This just disables small parts of two tests that are problematic. The real fix will land on trunk soon-ish, but the patch isn't really suitable for branches. Disabling these test pieces in the interim seems the smartest path forward.
Attachment #566979 -
Flags: approval-mozilla-beta?
Attachment #566979 -
Flags: approval-mozilla-aurora?
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 124•13 years ago
|
||
can we turn this off on aurora or is this a real bug?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [orange] [inbound] → [orange] [inbound][see comment #111]
Comment 125•13 years ago
|
||
Ok, as per c#111, we can create a patch to turn off the test on aurora. We'll do the work in this bug until it lands.
Comment 126•13 years ago
|
||
and, since I'm talking to myself in this bug now, can we get approval to land BenT's patch? :)
Assignee | ||
Comment 127•13 years ago
|
||
I think we leave this FIXED since it has landed on trunk then wait for them to approve the patch for aurora.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Comment 128•13 years ago
|
||
Maybe so, I just wanted the aurora-approver-gods to see that there was still something to be done here. Hopefully my incessant prattling will convince them of that! :)
Comment 129•13 years ago
|
||
Comment on attachment 566979 [details] [diff] [review]
Patch to disable console listener tests
Make it so!
Attachment #566979 -
Flags: approval-mozilla-beta?
Attachment #566979 -
Flags: approval-mozilla-beta+
Attachment #566979 -
Flags: approval-mozilla-aurora?
Attachment #566979 -
Flags: approval-mozilla-aurora+
Assignee | ||
Comment 130•13 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/4754469691db
https://hg.mozilla.org/releases/mozilla-beta/rev/52c9c801be77
status-firefox10:
--- → fixed
status-firefox7:
--- → unaffected
status-firefox8:
--- → fixed
status-firefox9:
--- → fixed
Updated•13 years ago
|
Whiteboard: [orange] [inbound][see comment #111] → [qa-][orange] [inbound][see comment #111]
Comment 131•12 years ago
|
||
Heads up: having revised the infrastructure for console-monitoring tests, I turned these tests back on in bug 663291 (specifically, https://hg.mozilla.org/integration/mozilla-inbound/rev/b36eaac9ecf8 ) I may not have gotten all the race conditions.
Comment 132•12 years ago
|
||
... and they were backed out again because yep, still race conditions. Filed new bug 812714 to get them fixed properly.
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [qa-][orange] [inbound][see comment #111] → [qa-] [inbound][see comment #111]
You need to log in
before you can comment on or make changes to this bug.
Description
•