Closed
Bug 669819
Opened 13 years ago
Closed 13 years ago
[TI] Permaorange on WinXP debug, frequent on 10.5 debug and Win7 debug: test_websocket_basic.html | Test timed out. and test_websocket_hello.html | Connection closed cleanly
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
Attachments
(1 obsolete file)
Came in on a TM -> JM merge, needs to go out before we can merge TM -> m-c.
The assertion stacks in the Windows logs are clearly crap; if the Mac ones are, it's not so obvious.
http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1310004908.1310008532.27157.gz
39252 INFO TEST-PASS | /tests/content/base/test/test_websocket_basic.html | Wrong message - "test message100" should equal "test message100"
39253 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket_basic.html | Test timed out.
39254 INFO TEST-PASS | /tests/content/base/test/test_websocket_basic.html | Didn't receive all the messages! - 100 should equal 100
39255 INFO TEST-PASS | /tests/content/base/test/test_websocket_basic.html | Connection closed cleanly
39256 INFO TEST-END | /tests/content/base/test/test_websocket_basic.html | finished in 300987ms
39257 INFO TEST-START | /tests/content/base/test/test_websocket_hello.html
39258 INFO TEST-PASS | /tests/content/base/test/test_websocket_hello.html | 'this' should point to the WebSocket. (2) - [object WebSocket @ 0x27acca60 (native @ 0x29f67110)] should equal [object WebSocket @ 0x27acca60 (native @ 0x29f67110)]
39259 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket_hello.html | Connection closed cleanly
WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/tm-osx-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1889
before 991232, after 987136, break 07b34000
++DOMWINDOW == 297 (0x257040c8) [serial = 827] [outer = 0x1f6176b0]
39260 INFO TEST-PASS | /tests/content/base/test/test_websocket_hello.html | Wrong data - "Hello world!" should equal "Hello world!"
###!!! ASSERTION: nsLoadGroup not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /builds/slave/tm-osx-dbg/build/netwerk/base/src/nsLoadGroup.cpp, line 213
nsLoadGroup::Internal::Release [netwerk/base/src/nsLoadGroup.cpp:213]
nsLoadGroup::Release [netwerk/base/src/nsLoadGroup.cpp:213]
nsCOMPtr<nsILoadGroup>::~nsCOMPtr [nsCOMPtr.h:533]
mozilla::net::BaseWebSocketChannel::~BaseWebSocketChannel [netwerk/protocol/websocket/BaseWebSocketChannel.h:57]
mozilla::net::WebSocketChannel::~WebSocketChannel [netwerk/protocol/websocket/WebSocketChannel.cpp:571]
mozilla::net::WebSocketChannel::Release [netwerk/protocol/websocket/WebSocketChannel.cpp:81]
nsCOMPtr<nsIInputStreamCallback>::assign_assuming_AddRef [nsCOMPtr.h:518]
nsCOMPtr<nsIInputStreamCallback>::assign_with_AddRef [nsCOMPtr.h:1204]
nsCOMPtr<nsIInputStreamCallback>::operator= [nsCOMPtr.h:664]
nsInputStreamReadyEvent::Run [xpcom/io/nsStreamUtils.cpp:117]
nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:617]
NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:245]
nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp:639]
nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:617]
NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:245]
nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:271]
_pt_root [nsprpub/pr/src/pthreads/ptthread.c:190]
libSystem.B.dylib + 0x32155
libSystem.B.dylib + 0x32012
39261 INFO TEST-END | /tests/content/base/test/test_websocket_hello.html | finished in 2135ms
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Updated•13 years ago
|
Blocks: LandTIOnMC
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 6•13 years ago
|
||
it appears WebSocketChannel::mLoadGroup need to be proxy released on the main thread.
I'll want to checkout tracemonkey and run the patch through (at least os x) try though.. so it will take some time.
Comment 7•13 years ago
|
||
the real cause is a regression from 537787, mLoadGroup was supposed to be moved to a base class but was instead duplicated in both the base and derived classes.
It should also be proxy released as I mentioned in comment 6, but that's apparently only relevant on a corner case that isn't being tested.
I'll attach a patch in a minute.
Depends on: 537787
Keywords: regression
Comment 8•13 years ago
|
||
Attachment #544505 -
Flags: review?(josh)
Updated•13 years ago
|
Assignee: general → nobody
Component: JavaScript Engine → Networking: WebSockets
QA Contact: general → networking.websockets
Updated•13 years ago
|
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 10•13 years ago
|
||
Comment on attachment 544505 [details] [diff] [review]
patch v1
Review of attachment 544505 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #544505 -
Flags: review?(josh) → review+
Comment 11•13 years ago
|
||
Comment on attachment 544505 [details] [diff] [review]
patch v1
The e10s patch landed in aurora, and this fixes a relatively important yet simple bug in it.
Attachment #544505 -
Flags: approval-mozilla-aurora?
Comment 12•13 years ago
|
||
crud - this patch didn't fix the orange with try. (it did clear up all those assertions, which are legit problems so I'll push the patch.)
Comment 13•13 years ago
|
||
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-1761e7290d98/try-macosx-debug/try_leopard-o-debug_test-mochitests-1-build414.txt.gz
JS team - the portion of the test that is failing deals with a very large (225KB) message. Are there any diffs between TM and M-C that might be relevant to that?
Assignee: nobody → general
Component: Networking: WebSockets → JavaScript Engine
QA Contact: networking.websockets → general
Comment 14•13 years ago
|
||
Can you attach the message or link to it? TM recently pulled the TypeInference project (bug 608741) from the JM repo, which has a large effect on how JS is processed.
Comment 15•13 years ago
|
||
I was thinking it was the large message in test part 4 (with the huge message, which is both sent and recvd through ws) of this test:
http://mxr.mozilla.org/mozilla-central/source/content/base/test
/test_websocket_basic.html?force=1#22
But it actually looks like part 3 of the same test didn't quite complete.. it moved all 100 of its messages but doesn't move onto the close until a timeout kicks in. Note all those calls to forcegc(). [the test is just to an echo server, so each time a message is sent one is recvd].
hth.
Updated•13 years ago
|
Attachment #544505 -
Attachment is obsolete: true
Attachment #544505 -
Flags: approval-mozilla-aurora?
Comment 16•13 years ago
|
||
I've moved the ASSERTION(loadgroup not threadsafe) patch to a different bug 669975 as it is just a distraction from the TM issue.
status-firefox6:
unaffected → ---
status-firefox7:
affected → ---
status-firefox8:
affected → ---
Keywords: regression
Comment 17•13 years ago
|
||
I've tried to repro this on both OSX debug and Win7 debug and they both work fine.
Comment 18•13 years ago
|
||
Have to wonder if this test is doing several hundred GCs and just hitting a timeout before it finishes. Will reduce the testCount and tryserver that.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 24•13 years ago
|
||
Reduced the testCount to 50 on tryserver and this test was green on OSX, Win7 and WinXP. I don't think it gets stuck, it just takes longer and usually hits the timeout.
Comment 25•13 years ago
|
||
Well, if it hits the timeout when testCount isn't modified, there is probably
some significant performance regression which should be fixed.
Comment 26•13 years ago
|
||
How do we determine that? It seems like this test has been the cause of random timeout orange in the past, and it may have been close to the timeout threshold already. There are GC performance regressions we're working on (bug 669815), but if fixing those does not fix this then I'd say the problem is with the test and/or timeout value.
Comment 27•13 years ago
|
||
(In reply to comment #26)
> How do we determine that? It seems like this test has been the cause of
> random timeout orange in the past, and it may have been close to the timeout
> threshold already.
I'm not a fan of the tests, which certainly rely too much on timeouts and assumptions about processing speeds. but afaik it has not had an appreciable level of timeout flakiness in the last few months at all.
perhaps a measurement of time to completion before and after the merge would be a useful data point.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 34•13 years ago
|
||
This hasn't happened on any platform in the JM branch since the patch for bug 670185 went in (which was, in large part, cutting out work done during GC). How should this get marked as resolved?
Reporter | ||
Comment 35•13 years ago
|
||
JM's the only place it was broken, so if it's fixed there it's FIXED.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange]
You need to log in
before you can comment on or make changes to this bug.
Description
•