Closed Bug 275124 Opened 20 years ago Closed 13 years ago

Chatzilla/BeOS can't communicate after a while (UI remains responsive)

Categories

(Core :: XPCOM, defect)

Other
BeOS
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: bugzillamozilla, Unassigned)

Attachments

(1 obsolete file)

After an uncertain amount of time, cz dies. Typed text is only echoed locally; nothing is actually sent. Similarly, nothing in the channel is displayed.

Log file of such a session: http://beos.prognathous.mail-central.com/irc_terminal_log3.txt
Screenshot: http://beos.prognathous.mail-central.com/cz_bug.png

In the above screenshot, "test 2:56" was typed in cz but wasn't received in the channel (I verified this with another IRC client).

A few configuration notes:
Chatzilla v0.9.66e (Mozilla 20041216)
/pref debugMode t
browser.dom.window.dump.enabled=true
javascript.options.showInConsole=true

After adding javascript.options.strict=true, the following two warnings were registered in the JS console during the next session (see irc_terminal_log4.txt for the terminal output of this session):

Warning: function my_reclaimname does not always return a value
Source File: chrome://chatzilla/content/handlers.js
Line: 1730, Column: 14
Source Code: return;

Warning: function my_reclaimname does not always return a value
Source File: chrome://chatzilla/content/handlers.js
Line: 1737, Column: 15
Source Code: return true;

CZ window of this broken session:

#bezilla
[INFO] Channel view for “#bezilla” opened.
[MODE] User mode for Prog_0_9_66e20041216 is now +i
[JOIN] YOU have joined #bezilla
PrognathousMonitor test 3:30
Prog_0_9_66e20041216 test 3:31
PrognathousMonitor test 3:35
PrognathousMonitor test 3:44
PrognathousMonitor test 3:53
PrognathousMonitor test 4:01
PrognathousMonitor test 4:04
Prog_0_9_66e20041216 test 4:11 <-- NOT RECEIVED IN CHANNEL

Additional tests:

/eval client.eventPump.queue.length - client.eventPump.queuePointer
[EVAL-IN] client.eventPump.queue.length - client.eventPump.queuePointer
[EVAL-OUT] 4

/eval dumpObject(client.eventPump.queue[0])
[EVAL-IN] dumpObject(client.eventPump.queue[0])
[EVAL-OUT]
set = server
type = rawdata
destObject = [object Object]
destMethod = onRawData
hooks =
data = PING :irc2.mozilla.org
queuedAt = Sat Dec 18 2004 04:07:58 GMT+0000 (GMT)

Thanks for looking into this bug, Prog.
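The backlog check in the report (queue length minus queue pointer) can be illustrated with a minimal sketch. Only the field names `queue` and `queuePointer` come from the report; the class `SketchEventPump`, its methods, and the second sample event are hypothetical stand-ins and are not ChatZilla's actual implementation.

```javascript
// Hypothetical stand-in for an event pump. Only the field names `queue`
// and `queuePointer` appear in the report; everything else is assumed.
class SketchEventPump {
  constructor() {
    this.queue = [];        // events in arrival order
    this.queuePointer = 0;  // index of the next event to dispatch
  }

  addEvent(ev) {
    this.queue.push(ev);
  }

  // Same arithmetic as the /eval in the report: events queued but not yet
  // dispatched.
  pendingCount() {
    return this.queue.length - this.queuePointer;
  }

  // Dispatch everything that is waiting. If nothing ever calls this
  // (for example because a timer stopped firing), the backlog only grows.
  stepEvents(handler) {
    while (this.queuePointer < this.queue.length) {
      handler(this.queue[this.queuePointer]);
      this.queuePointer++;
    }
  }
}

const pump = new SketchEventPump();
pump.addEvent({ type: "rawdata", data: "PING :irc2.mozilla.org" });
pump.addEvent({ type: "rawdata", data: "hypothetical second line" });

const before = pump.pendingCount(); // backlog while nothing is stepping
const seen = [];
pump.stepEvents((ev) => seen.push(ev.data));
const after = pump.pendingCount();  // backlog once the pump has run

console.log("pending before:", before, "pending after:", after);
```

Under this model, a non-zero backlog whose oldest entry is an unanswered PING suggests that data is still arriving but the dispatch step is no longer being run.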
Is this the ChatZilla extension for Firefox?
This is a long-standing bug that affects both the Chatzilla extension for Firefox and the one built into Seamonkey. It happens with the latest version, 0.9.66e, as well as much older ones. Silver suspects that the problem may be related to Necko. Prog.
"After an uncertain amount of time, cz dies." - this is unclear statement form the problem POV. Does it mean - "After an uncertain amount of IDLE time" ? If so, it is common problem for some servers, like freenode.org, for example. Such servers require either activity from user or implementation of "ping/pong" protocol in clients. For example, in BeOS Baxter also dies in same way for some servers, while Vision stays alive - as last is forcing ping-pong exchange with server.
(In reply to comment #3)
> Does it mean - "After an uncertain amount of IDLE time" ?
Not necessarily. Sometimes it happens after less than a minute, sometimes it doesn't happen after an hour.
> If so, it is common problem for some servers, like freenode.org, for example.
Then why doesn't Chatzilla/Windows suffer from this problem?
Prog.
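For reference, the ping/pong exchange discussed in the two comments above works like this: the server sends a PING line with a parameter, and the client must answer with a PONG carrying the same parameter or be dropped with a ping timeout. A minimal sketch follows; the helper name `pongFor` is hypothetical, and it assumes the raw line has no leading server prefix.

```javascript
// Hypothetical helper: given one raw IRC line, return the PONG reply the
// client should send, or null if the line is not a PING.
// Assumption: the line has no leading ":prefix" before the command.
function pongFor(line) {
  const m = /^PING\s+(.+)$/.exec(line.trim());
  return m ? "PONG " + m[1] : null;
}

// The PING text below is the one found stuck in the queue in the report.
const reply = pongFor("PING :irc2.mozilla.org");
const noReply = pongFor("PRIVMSG #bezilla :test 4:11");

console.log(reply, noReply);
```

If the client's event processing stalls, no such reply is ever sent, which matches a server-side "Ping timeout" disconnect.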
My initial stab at it being Necko was based on what I'd heard about the problem prior to any logs. It now looks more like a Mozilla timer issue. Here are some more debug things to try... When you start ChatZilla on BeOS, do the following two /evals:

/eval client.STEP_TIMEOUT = 10000
/eval mainStep = function() { client.displayHere("mainStep: BEGIN " + new Date()); client.eventPump.stepEvents(); setTimeout("mainStep()", client.STEP_TIMEOUT); client.displayHere("mainStep: END " + new Date()); }

The first will slow down CZ so it only processes events once every 10 seconds, and the second will display a message to *client* both before and after each processing pass, including the time for reference. What would be interesting to know is a) do the pairs of messages still keep appearing after it stops communicating, and b) if not, was the last one a BEGIN or an END message?
From those logs, it looks distinctly like a timer issue in Mozilla's code. client_log.txt is the key - notice that every |mainStep| BEGIN has a matching END, so no unexpected JS exceptions occurred, and more importantly, it /did/ run the setTimeout line that calls itself each time. It appears that the setTimeout set up just prior to "mainStep: END Sun Dec 19 2004 02:45:36 GMT+0000 (GMT)" simply didn't fire. I guess the next level of debugging would be NSPR logging of the timer code... unfortunately, it looks like you need to build with PR_LOGGING defined before it actually logs anything, though a normal debug build appears to have it defined. Anyway, use the env var NSPR_LOG_MODULES=nsTimerImpl:5 to log timer stuff to the console, but beware - there is a /lot/ of spew. :)
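The BEGIN/END pairing check described above can be done mechanically. The following is a rough sketch only: the function name `summarizeMainStepLog` is hypothetical, and the sample lines are made up for illustration except for the final END timestamp, which is the one quoted in the comment. It is not the content of client_log.txt.

```javascript
// Hypothetical helper: count BEGIN messages that never got an END and
// report the last mainStep message seen.
function summarizeMainStepLog(lines) {
  let openBegin = false;   // true while a BEGIN is waiting for its END
  let unmatchedBegins = 0;
  let last = null;

  for (const line of lines) {
    const m = /mainStep: (BEGIN|END) (.+)$/.exec(line);
    if (m === null) continue; // not a mainStep message

    if (m[1] === "BEGIN") {
      if (openBegin) unmatchedBegins++; // previous BEGIN never saw an END
      openBegin = true;
    } else {
      openBegin = false;
    }
    last = { kind: m[1], stamp: m[2] };
  }

  if (openBegin) unmatchedBegins++; // log ended in the middle of a step
  return { unmatchedBegins, last };
}

// Illustrative input. All lines are invented except the last timestamp.
const sampleLog = [
  "mainStep: BEGIN Sun Dec 19 2004 02:45:26 GMT+0000 (GMT)",
  "mainStep: END Sun Dec 19 2004 02:45:26 GMT+0000 (GMT)",
  "mainStep: BEGIN Sun Dec 19 2004 02:45:36 GMT+0000 (GMT)",
  "mainStep: END Sun Dec 19 2004 02:45:36 GMT+0000 (GMT)",
];

const summary = summarizeMainStepLog(sampleLog);
console.log(summary);
```

Zero unmatched BEGINs with a final END means the script itself completed every pass, including the setTimeout call, so a stall after that point implicates the timer rather than the script.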
Blocks: 266252
I am punting this over to Core: XPCOM because I am pretty sure this is not a CZ bug (see previous comment). However, I don't actually know whose bug it is... it could be a SpiderMonkey setTimeout bug (unlikely IMHO, since this is platform-specific), but could be a core timer bug in Mozilla (probably specific to BeOS). Can anyone other than Prognathous reproduce this on BeOS?
Assignee: rginda → dougt
Component: ChatZilla → XPCOM
Product: Other Applications → Core
QA Contact: samuel → xpcom
Version: unspecified → Trunk
Can't verify this before bug 299058 gets fixed.
Depends on: 299058
Start suite with the -chat commandline parameter? That way you don't get the main window (just the chatzilla one), which should help.
Saw from the comments that there was a Firefox extension. I will see what I can do, this looks like a good bug in the summer heat :)
No longer depends on: 299058
Ok, confirmed (BeOS R5.03 BONE). Using the evals, it got stuck after a few minutes. I also got a disconnect on my other IRC client: *** TQH_test (chatzilla@moz-149A0DF3...) has quit IRC (Ping timeout). I will turn on timer logging and see what I can find.
QA Contact: xpcom → prognathous
I remember that in one place in Mozilla, long ago, microseconds and milliseconds were confused in setting and getting time intervals. Maybe this needs an additional consistency check wherever we use the Be API for time settings.
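To show why such a unit mix-up would matter, here is a small sketch of the arithmetic. BeOS time calls work in microseconds, while setTimeout intervals are in milliseconds. The helper names are hypothetical, and nothing here claims that this confusion is the actual cause of the bug.

```javascript
// Hypothetical conversion helpers for illustration only.
const US_PER_MS = 1000;

function msToUs(ms) {
  return ms * US_PER_MS;
}

function usToMs(us) {
  return us / US_PER_MS;
}

// The 10-second step interval used in the debugging /evals, in ms.
const stepTimeoutMs = 10000;

// Correct conversion for an API that expects microseconds.
const stepTimeoutUs = msToUs(stepTimeoutMs);

// Mistake 1: the millisecond count is handed unconverted to a
// microsecond API, so the real wait is a thousand times too short.
const tooShortMs = usToMs(stepTimeoutMs);

// Mistake 2: the microsecond count is handed unconverted to a
// millisecond API, so the real wait is a thousand times too long.
const tooLongMs = stepTimeoutUs;

console.log({ stepTimeoutUs, tooShortMs, tooLongMs });
```

The second kind of mistake would turn a 10-second timer into one that appears never to fire, which is why a consistency check of units is a plausible thing to look at.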
Well, BeOS NSPR has needed an overhaul for a long time, I got sidetracked from debugging and started on that instead (I even rewrote atomic ops in asm for x86). The most confusing file is beos.c which now only consists of one function instead of lots of unused functions.
Got some assertions while I left Chatzilla running which might be interesting:

-2147265008[80035610]: ###!!! ASSERTION: forget-word-frame: '(void*)aFrame == mWordFrames->PeekFront()', file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
###!!! ASSERTION: forget-word-frame: '(void*)aFrame == mWordFrames->PeekFront()', file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
-2147265008[80035610]: ###!!! Break: at file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
Break: at file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
-2147265008[80035610]: ###!!! ASSERTION: forget-word-frame: '(void*)aFrame == mWordFrames->PeekFront()', file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
###!!! ASSERTION: forget-word-frame: '(void*)aFrame == mWordFrames->PeekFront()', file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
-2147265008[80035610]: ###!!! Break: at file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
Break: at file /mozdev/mozilla/layout/generic/nsLineLayout.cpp, line 3027
JavaScript error: chrome://chatzilla/content/chatzilla.xul, line 1: contentAreaDNDObserver is not defined
The 'forget-word-frame' assertions occur on Windows as well, and are all layout issues, which I don't believe have any relation to the bug here.
No longer blocks: 266252
Blocks: 311032
QA Contact: prognathous → xpcom
ChatZilla isn't the only thing that dies here over time. After a while, GIF animation stops until you reload the page. The second suspicious thing: content stops updating until you resize the window. When BeZilla has just started, all is OK, but at some unpredictable moment, newly loaded pages don't show new content, or mails in the mailnews main window don't show their content when you switch between mails in the list. Looks like some timer has stopped or gone backward :)
Sergei in comment #17:
> Looks like some timer is stopped or gone backward:)
Fixed on trunk?
Is there any reason to believe it's fixed? Btw trunk has been completely broken for BeOS ever since they forced Cairo on us. I don't think it will ever get fixed...
mass reassigning to nobody.
Assignee: dougt → nobody
This is definitely not fixed on trunk or branch. I'm beginning to suspect some kind of deep-rooted timer bug in BeOS-specific code. In addition to Chatzilla simply dying, Thunderbird ceases to automatically check for new mail when that setting is configured in the Account/server preferences. We also have ongoing issues with some sites (usually SSL) hanging in Firefox but not SeaMonkey. If anyone has ideas where to look, I'd appreciate a push in the right direction.
tqh suggests this problem may be solved by changing BeOS to use pipes instead of TCP socket pairs. This is how Unix does it. Working now to test.
Doug and I suspect it's because of polling using socketpairs, which isn't really implemented in BeOS and I guess the default implementation might time out. Doug is testing with the pipes-code for UNIX. See: http://lxr.mozilla.org/mozilla1.8.0/source/nsprpub/pr/src/io/prpolevt.c#409
Status: NEW → ASSIGNED
initial testing is positive, at least for Chatzilla. Does not seem to make a difference in Thunderbird mail retrieval.
This implementation seems to take care of hangs in Chatzilla. Ran for nearly 12 hours without hanging here.
Attachment #304281 - Flags: review?(thesuckiestemail)
Comment on attachment 304281 [details] [diff] [review] Change BeOS to use Pipes instead of TCP Sockets, as Unix does r=thesuckiestemail@yahoo.se
Attachment #304281 - Flags: review?(thesuckiestemail) → review+
Attachment #304281 - Flags: review+ → review?(wtc)
Comment on attachment 304281 [details] [diff] [review] Change BeOS to use Pipes instead of TCP Sockets, as Unix does
>+#if !defined(XP_UNIX) || !defined(XP_BEOS)
This should be && instead of ||. Did you test this patch? With this patch, USE_TCP_SOCKETPAIR would still be defined on BeOS.
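The logic error in the review comment can be checked with a small truth table. This sketch models the two `defined(...)` tests as booleans; it assumes that a given build defines at most one of XP_UNIX and XP_BEOS, and the platform list and function names are hypothetical.

```javascript
// Each platform is modeled by whether XP_UNIX / XP_BEOS would be defined.
// Assumption: no build defines both at once.
const platforms = [
  { name: "Unix", unix: true, beos: false },
  { name: "BeOS", unix: false, beos: true },
  { name: "other", unix: false, beos: false },
];

// Condition as submitted in the patch: !defined(XP_UNIX) || !defined(XP_BEOS)
function socketPairAsSubmitted(p) {
  return !p.unix || !p.beos;
}

// Condition as requested in review: !defined(XP_UNIX) && !defined(XP_BEOS)
function socketPairAsReviewed(p) {
  return !p.unix && !p.beos;
}

const table = platforms.map((p) => ({
  name: p.name,
  submitted: socketPairAsSubmitted(p), // true on every platform
  reviewed: socketPairAsReviewed(p),   // true only on "other"
}));

console.log(table);
```

With `||`, the condition is true whenever at least one macro is missing, which is always the case under the stated assumption, so the socket-pair path stays enabled everywhere. With `&&`, it is enabled only where neither macro is defined.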
Attachment #304281 - Flags: review?(wtc) → review-
Actually, in BeOS R5 at least, pipes are quite flaky if not buggy. That's the reason, for example, why we use the pipefs replacement by mmu_man instead of the default pipe implementation when working on big projects. At least I had numerous problems building Mozilla 4 years ago, until I started to use that replacement.
OK. Feeling stupid here - I tested a patch which does nothing, apparently. But why, then, is the bug not occurring? More work needed (obviously).
Sergei, afaik it's the problem when the pipe gets full, which we dealt with in nsAppShell. Are there any other problems?
(In reply to comment #30)
> Sergei, afaik it's the problem when the pipe gets full, which we dealt with in
> nsAppShell. Are there any other problems?
In AppShell we use ports, not pipes. Ports are reliable; they are the basis of all BeOS functionality (BMessages use them internally). IIRC pipes have several problems; one of them was the origin of big troubles in the Apache port, and there was another too. I tried to search the internet, but as the Be bug database is gone it was not very successful. Maybe a better idea is to ask on the mailing lists. I think the problem is fixed in Haiku, though.
Ah, then we probably should write a better one for BeOS.
Comment on attachment 304281 [details] [diff] [review] Change BeOS to use Pipes instead of TCP Sockets, as Unix does After applying the change correctly, BeOS Firefox and chatzilla hang MORE frequently. The trouble may still be somewhere in BeOS NSPR, but not here.
Attachment #304281 - Attachment is obsolete: true
Blocks: 397256
This is a mass change. Every comment has "assigned-to-new" in it. I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
(In reply to comment #33) > The trouble may still be somewhere in BeOS NSPR, but not here. Doug, have you nailed what that might be?
I am no longer able to support the BeOS/Haiku Mozilla port and so cannot answer this question. Sorry I cannot be of more help.
BeOS is gone.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX