Closed
Bug 476960
Opened 16 years ago
Closed 16 years ago
IMAP Hang involving UI thread
Categories
(MailNews Core :: Networking: IMAP, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
Thunderbird 3.0b2
People
(Reporter: gkw, Assigned: Bienvenu)
References
Details
(Keywords: hang)
Attachments
(7 files, 2 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
neil
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
standard8
:
review+
Bienvenu
:
superreview+
|
Details | Diff | Splinter Review |
I've finally trapped down this hard-to-diagnose IMAP hang. When I came out of standby, sometimes TB will have a spinning bob for about 30s before I can use TB. This time, I trapped it using gdb and the attached info is there.
Flags: blocking-thunderbird3?
Assignee | ||
Comment 1•16 years ago
|
||
taking - we shouldn't be issuing imap protocol and parsing responses on the UI thread - it's a recipe for disaster. And when the connection has timed out, we shouldn't be doing all that stuff anyway.
Assignee: nobody → bienvenu
Summary: Hang involving UI thread → IMAP Hang involving UI thread
Reporter | ||
Comment 2•16 years ago
|
||
when i pressed continue, there were another 3 traps by gdb. Here's the thread apply all backtrace of the first one of these 3.
Reporter | ||
Comment 3•16 years ago
|
||
Related to bug 434642?
Reporter | ||
Comment 4•16 years ago
|
||
bienvenu, do you need the IMAP logs?
One of them ends like this (during the hang):
-1333592064[16e62bd0]: ReadNextLine [stream=16e630cc nb=10 needmore=0]
-1333592064[16e62bd0]: 160bb000:imap.gmail.com:S-INBOX:CreateNewLineFromSocket: + idling
-1603684576[70a8d0]: WARNING: GetGeckoKeyCodeFromChar has failed.: file /Users/skywalker/comm-central/mozilla/widget/src/cocoa/nsChildView.mm, line 4320
-1603684576[70a8d0]: WARNING: Preventing load of VerifiedDownloadPlugin.plugin (see bug 436575): file /Users/skywalker/comm-central/mozilla/modules/plugin/base/src/nsPluginsDirDarwin.cpp, line 163
-1603684576[70a8d0]: 160bb000:imap.gmail.com:S-INBOX:SendData: DONE
-1603684576[70a8d0]: ReadNextLine [stream=16e630cc nb=0 needmore=1]
-1603684576[70a8d0]: 160bb000:imap.gmail.com:S-INBOX:CreateNewLineFromSocket: clearing IMAP_CONNECTION_IS_OPEN - rv = 804b0014
Assignee | ||
Comment 5•16 years ago
|
||
Gary, here's a patch to try - basically, this makes it so the UI thread no longer blocks when trying to shutdown imap connections. This may cause us not to gracefully log out of all cached imap connections at shutdown, depending on how fast the rest of shutdown goes - I need to think about that a little...but on the plus side, bonking the go offline button will no longer block the UI for several seconds while we close imap connections. And it should help with your problem.
Comment 6•16 years ago
|
||
OS=ALL, and not a regression, correct?
can this generalized as a potential cause of crashes which have TellThreadToDie in the stack, such as nsProxyEventObject::CallMethod == bug 470808? (topcrash for 3.0b1, but on trunk, only 2 in last 8 days)
Assignee | ||
Comment 7•16 years ago
|
||
I'd say OS=ALL, right. Probably not a regression, though it might be from 1.0 or something...it might be the cause of a crash, though I can't say for sure.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
OS: Mac OS X → All
Hardware: x86 → All
Target Milestone: --- → Thunderbird 3.0b3
Reporter | ||
Comment 8•16 years ago
|
||
(In reply to comment #5)
> Created an attachment (id=361385) [details]
> proposed fix
>
> Gary, here's a patch to try - basically, this makes it so the UI thread no
> longer blocks when trying to shutdown imap connections. This may cause us not
> to gracefully log out of all cached imap connections at shutdown, depending on
> how fast the rest of shutdown goes - I need to think about that a little...but
> on the plus side, bonking the go offline button will no longer block the UI for
> several seconds while we close imap connections. And it should help with your
> problem.
This patch looks good -- I no longer get spinning bob hangs at school, now I get a bunch of "cannot connect to server" dialogs immediately when I just resume from standby, but that's to be expected since my wifi is not yet connected. Once everything connects, they're good to go.
Request review or something for checkin of patch?
Assignee | ||
Updated•16 years ago
|
Attachment #361385 -
Flags: superreview?(neil)
Attachment #361385 -
Flags: review?(bugzilla)
Reporter | ||
Comment 9•16 years ago
|
||
bienvenu, I don't know if this hang is related, but it occurs with the patch.
STR:
- Send an email with 2 large attachments through Gmail SMTP.
- While it is sending, i.e. the dialog screen, go back to TB main window and press Get All Messages, i.e. cmd-shift-t. Shredder hangs, but gdb doesn't show anything so I took a sample through Activity Monitor.
I seem to recall a not-so-fast connection or one that is being throttled by the school during the sending of the email. Would this be related?
Comment 10•16 years ago
|
||
Comment on attachment 361385 [details] [diff] [review]
proposed fix
>+ /**
>+ * Tell thread to die - only call from IMAP thread
>+ *
>+ * @param aIsSafeToClose false if we're dropping a timed out connection.
>+ */
>+ void tellThreadToDie(in boolean aIsSafeToClose);
Are there supposed to be any public callers of this any more? I noticed a caller in nsImapIncomingServer::CloseConnectionForFolder which is invoked when a folder is renamed which looks as if it should switched. Additionally it might be better to rename the internal TellThreadToDie instead. This would mean that you don't have to change the interface ;-)
>+// this is to be called from the UI thread.
Worth an assertion?
> NS_IMETHODIMP
> nsImapProtocol::TellThreadToDie(PRBool isSafeToClose)
> {
> nsresult rv = NS_OK;
> // ** This routine is called from the ui thread and the imap protocol thread.
Fix the comment? (Also worth an assertion?) Also, I notice that all the callers pass PR_FALSE, except the new one that passes m_safeToCloseConnection; if you initialise that to 0 in the constructor then you can use it directly instead of passing it in explicitly.
Comment 11•16 years ago
|
||
Spot the ugly static cast :-(
Assignee | ||
Comment 12•16 years ago
|
||
I'll add assertions and fix the comments, and fix CloseConnectionForFolder...
Flags: blocking-thunderbird3+ → blocking-thunderbird3?
OS: All → Mac OS X
Hardware: All → x86
Target Milestone: Thunderbird 3.0b3 → ---
Reporter | ||
Comment 13•16 years ago
|
||
[01:51] bienvenu> nth10sd - the ui thread was doing things like this:
[01:51] bienvenu> #14 0x13d847dc in nsImapServerResponseParser::ParseIMAPServerResponse (this=0x160bb160, aCurrentCommand=0xfb87ca8 "DONE\r\n", aIgnoreBadAndNOResponses=0, aGreetingWithCapability=0x0) at /Users/skywalker/comm-central/mailnews/imap/src/nsImapServerResponseParser.cpp:233
[01:51] bienvenu> #15 0x13d5d609 in nsImapProtocol::ParseIMAPandCheckForNewMail (this=0x160bb000, commandString=0x0, aIgnoreBadAndNOResponses=0) at /Users/skywalker/comm-central/mailnews/imap/src/nsImapProtocol.cpp:1758
[01:51] bienvenu> the ui thread isn't supposed to read or write to the network
[01:51] bienvenu> and that can cause deadlock
[01:51] * nth10sd reads
[01:53] nth10sd> bienvenu: http://pastebin.mozilla.org/620853 has:
[01:53] nth10sd> #19 0x1495940d in MimeMessage_parse_eof (obj=0x1020a3b0, abort_p=0) at /Users/skywalker/comm-central/mailnews/mime/src/mimemsg.cpp:584
[01:53] * nth10sd wonders if there's a similarity in the "parse"s
[01:53] bienvenu> no, mime is only supposed to be called on the ui thread, and it is, in this case.
[01:54] nth10sd> oh
The key causes of this IMAP hang are the ParseIMAPServerResponse and ParseIMAPandCheckForNewMail functions. (Thanks bienvenu for patiently guiding me on IRC)
Assignee | ||
Comment 14•16 years ago
|
||
Attachment #361385 -
Attachment is obsolete: true
Attachment #361551 -
Flags: superreview?(neil)
Attachment #361551 -
Flags: review?(bugzilla)
Attachment #361385 -
Flags: superreview?(neil)
Attachment #361385 -
Flags: review?(bugzilla)
Updated•16 years ago
|
Attachment #361551 -
Flags: superreview?(neil) → superreview+
Comment 15•16 years ago
|
||
The code move looks safe, because calling Cancel on the mock channel will cause DeathSignalReceived to return true eventually anyway, but I'm not sure...
Attachment #361496 -
Attachment is obsolete: true
Assignee | ||
Comment 16•16 years ago
|
||
Comment on attachment 361741 [details] [diff] [review]
Castless version
Gary, could you try running with this version of the patch instead to see if it works as well? I like this version of the patch well enough, but I'm a little worried about removing the check for safeToCloseConnection after the sleep in ImapThreadMainLoop()...
Updated•16 years ago
|
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Reporter | ||
Comment 17•16 years ago
|
||
(In reply to comment #14)
> Created an attachment (id=361551) [details]
> fix comments, add assertions, fix folder rename case
(In reply to comment #15)
> Created an attachment (id=361741) [details]
> Castless version
Bienvenu, so I'm confused, I've been testing your patch id 361551 for quite a few days and it seems to work properly. I haven't tested Neil's patch 361741.
Should I test them together, or are they mutually exclusive such that I should test Neil's patch stand-alone only?
Assignee | ||
Comment 18•16 years ago
|
||
They are alternatives, mutually exclusive, so you'd need to back out my patch and apply Neil's instead. I'm just trying to choose between them.
Reporter | ||
Comment 19•16 years ago
|
||
(In reply to comment #18)
> They are alternatives, mutually exclusive, so you'd need to back out my patch
> and apply Neil's instead. I'm just trying to choose between them.
Sure, I have to test over the next few days.
/me goes to prepare a new debug build..
Reporter | ||
Comment 20•16 years ago
|
||
(In reply to comment #15)
> Created an attachment (id=361741) [details]
> Castless version
>
> The code move looks safe, because calling Cancel on the mock channel will cause
> DeathSignalReceived to return true eventually anyway, but I'm not sure...
Neil, I have a compile error with your patch on the Mac:
chmod +x libimport.dylib
/Users/skywalker/objdir-tb/mozilla/config/nsinstall -L /Users/skywalker/objdir-tb/mailnews/import/build -m 755 libimport.dylib ../../../mozilla/dist/bin/components
: ../../../mozilla/dist/bin/components/libimport.dylib
make[3]: *** [libs_tier_app] Error 2
make[2]: *** [tier_app] Error 2
make[1]: *** [default] Error 2
make: *** [build] Error 2
real 48m19.073s
user 57m32.440s
sys 10m13.814s
(or is it just me?)
Reporter | ||
Comment 21•16 years ago
|
||
(In reply to comment #20)
> Neil, I have a compile error with your patch on the Mac:
I'm quite sure this is the case -- I did a reverse of Neil's patch and applied bienvenu's patch, this time the compile succeeded.
Assignee | ||
Comment 22•16 years ago
|
||
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case
switching review to Neil in the hopes of landing this today...but if you don't want to review it, feel free to switch it back...
Attachment #361551 -
Flags: review?(bugzilla) → review?(neil)
Assignee | ||
Updated•16 years ago
|
Attachment #361551 -
Flags: review?(neil) → review?(bugzilla)
Assignee | ||
Comment 23•16 years ago
|
||
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case
oops, standard8 thinks he might be able to look at this today.
Comment 24•16 years ago
|
||
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case
>+ /////////////////////////////////////////////////////////////////////////
>+ // IsBusy returns true if the connection is currently processing a url
>+ // and false otherwise.
>+ /////////////////////////////////////////////////////////////////////////
nit: I know you're just re-indenting, but can you change this to /**\n * IsBusy...\n */ format so that it is picked up by doxygen properly (same to the rest of the comments you've re-indented). Note: I'm not fussed about full documentation at this stage, just properly formatting what's there.
>+ /**
>+ * Tell thread to die - only call from IMAP thread
>+ *
>+ * @param aIsSafeToClose false if we're dropping a timed out connection.
>+ */
>+ void tellThreadToDie(in boolean aIsSafeToClose);
This seems unnecessary as an extra function to the UI thread - do we really need it?
>+
>+ /**
>+ * signalThreadToDie - called from the UI thread
>+ *
>+ * @param aIsSafeToClose false if we're dropping a timed out connection.
>+ */
>+ void signalThreadToDie(in boolean aIsSafeToClose);
>diff --git a/mailnews/imap/src/nsImapProtocol.cpp b/mailnews/imap/src/nsImapProtocol.cpp
>+nsImapProtocol::SignalThreadToDie(PRBool aIsSafeToClose)
>+{
>+ NS_WARN_IF_FALSE(NS_IsMainThread(),
>+ "TellThreadToDie() should only be called from UI thread");
>+ nsAutoCMonitor mon(this);
>+
>+ nsCOMPtr<nsIMsgIncomingServer> me_server = do_QueryReferent(m_server);
>+ if (me_server)
>+ {
>+ nsresult rv;
>+ nsCOMPtr<nsIImapIncomingServer>
>+ aImapServer(do_QueryInterface(me_server, &rv));
>+ if (NS_SUCCEEDED(rv))
>+ aImapServer->RemoveConnection(this);
>+ m_server = nsnull;
>+ me_server = nsnull;
nit: indentation needs fixing.
Assignee | ||
Comment 25•16 years ago
|
||
this is a combination of Neil and my fixes - I left the static cast in because I'm not convinced we don't need it yet, and I'd like to put this in b2.
I changed the uuid because I changed TellThreadToDie to tellThreadToDie...
Attachment #362768 -
Flags: review?(bugzilla)
Comment 26•16 years ago
|
||
Comment on attachment 362768 [details] [diff] [review]
combo-fix
>+ /**
>+ * Tell thread to die - only call from IMAP thread
>+ *
>+ * @param aIsSafeToClose false if we're dropping a timed out connection.
>+ */
>+ void tellThreadToDie(in boolean aIsSafeToClose);
The documentation is meant to say UI thread now...
>+ NS_WARN_IF_FALSE(NS_IsMainThread(),
>+ "TellThreadToDie(aIsSafeToClose) should only be called from UI thread");
nit: only one space before UI (two places)
r=me with them fixed. Lets give this a go and see how it goes (I've been seeing it as well btw).
Attachment #362768 -
Flags: review?(bugzilla) → review+
Assignee | ||
Comment 27•16 years ago
|
||
Comment on attachment 362768 [details] [diff] [review]
combo-fix
carrying forward Neil's sr
Attachment #362768 -
Flags: superreview+
Assignee | ||
Comment 28•16 years ago
|
||
OK, Gary, I hope you can go back to release builds with b2 :-)
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.0b2
Comment 29•16 years ago
|
||
Since this patch landed, both SeaMonkey and Thunderbird3.0 fail mozilla/_tests/xpcshell-simple/test_imap/unit/test_bug460636.js with that message:
*** TEST-UNEXPECTED-FAIL | e:/buildbot/win32-comm-1.9.1-check/build/mozilla/testing/xpcshell/head.js | SaveMessageToDisk did not complete within 10 seconds (incorrect messageURI?). ABORTING.
I don't think Thunderbird can release a b2 or SeaMonkey an a3 with unit tests failing on Windows...
Assignee | ||
Comment 30•16 years ago
|
||
I'm looking into the test failure - I can tweak the test so that it doesn't fail - the thing the test is checking actually succeeds - it's the removal of the saved file that fails, which I'm trying to figure out.
Comment 31•16 years ago
|
||
Comment on attachment 361551 [details] [diff] [review]
fix comments, add assertions, fix folder rename case
I believe this patch is now obsolete.
Attachment #361551 -
Flags: review?(bugzilla)
Reporter | ||
Comment 33•16 years ago
|
||
Fixed as far as I can tell in my <24h testing of b2 in Mac.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•