Closed Bug 390036 Opened 17 years ago Closed 14 years ago

https webdav based ICS or CalDAV calendar and secure IMAP/SMTP (SSL) email accounts leads to TB hanging with 50% cpu usage


(Core :: Security: PSM, defect)

1.8 Branch
Not set





(Reporter: chabrie, Assigned: KaiE)



(Keywords: hang, qawanted, relnote)


(2 files)

User-Agent:       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04506; InfoPath.2; FDM)
Build Identifier: TB Version (20070716) - Lightning 0.5


4 imap based mail accounts with SSL connection to the mailserver and two webdav based ics calendar files (around 500KB) with https connection to CMS server (Internet with dsl 16M/1M connection). 


Reproducible timing bug in Lightning leads to endless CPU usage of 50% (dual core) if a calendar action (PUT/GET) is started at the same time as a mail transfer. No more calendar actions are usable until the TB process is killed.

Reproducible: Always

Steps to Reproduce:
1. Insert an event to the webdav calendar with an invitation via email
2. If the mail window is closed too early (to send the invitation), the Thunderbird process uses 50% CPU until it is killed. No calendar actions are usable until Thunderbird is killed and reopened.
3. Deactivating all security features of mail transfer (no SSL IMAP/SMTP), the problem does not occur.
Actual Results:  
Same results with TB portable and the Lightning 0.5 add-on. Sunbird does not have the problem, because of separate threads and missing email invitation functionality. Sometimes the webdav based calendar file is corrupt after this bug.
Do you see any error messages in the Error Console?
Version: unspecified → Lightning 0.5
I see the following logs in the error console:

gCacheStyleSheet is not defined
in line 111


standard has no properties
in line 167

If anybody needs an IMAP account and a webdav account for testing, I see no problem with that. We could reproduce the error on several systems.
Frank, does the issue still exist using Lightning 0.7 Release Candidate 1?

The last error could be related to Bug 396580 or Bug 396873.
Today I got the latest Lightning 0.8 RC2 (have been using 0.7 before) and have a very similar problem as Frank has:

Using an IMAP SSL account together with an https CalDAV calendar makes Thunderbird get stuck with 50% (dual core) CPU load. If I disable SSL in the IMAP account everything works fine. Unlike Frank, I can close Thunderbird normally. It just does not retrieve any email or calendar information. The error console does not report anything. Also, the problem only shows up after you restart Thunderbird (if you changed the settings). For instance, if you add a new https calendar to Lightning, this new calendar will work until you restart Thunderbird.

This problem does not occur in Lightning 0.7, only with the newer 0.8 RC2.
I see a similar problem on a friend's computer, but in this case she's using an https connection to the webdav server and a secure SSL connection to the POP3 server (not IMAP).  As Frank said, it seems to happen when she changes the calendar during a timed mail check.  The timed mail checks happen every five minutes so she sees this problem at least once per day.  In this case it doesn't matter if it's a meeting invite (she doesn't send or receive them).

The CPU was at 100% and I had to kill Thunderbird.  Before I killed it I checked the current internet connections using Process Explorer.  Thunderbird had open connections to both the POP3 server and the webdav server.  The connections were stuck in the "Close_Waiting" state.  Both servers are at FastMail.

This is on a single core 1.0 GHz CPU (WinXP) with a high-speed internet connection, and the ICS file is over 100 KB.  I forgot to check the Error Console, and it's too late now: she wasn't happy, so I switched her calendar to a local ICS file.  She's using the 0.7 release version of Lightning and Thunderbird.
Multiple users have reported this issue at Oracle. See:

Using Process Explorer we were able to identify the thread that was taking all
the CPU (50% on a Core2 Duo) in thunderbird.exe and look at the thread stack.
From what we've seen the thread was looping in
nsPrintSettings::GetStartPageRange. I don't know if that makes any sense or
Confirming per duplicates.
Ever confirmed: true
Keywords: hang
Summary: https webdav based ICS calendar and secure IMAP/SMTP (SSL) email accounts leads to TB hanging with 50% cpu usage → https calendar and secure IMAP/SMTP (SSL) email accounts leads to TB hanging with 50% cpu usage
Version: Lightning 0.5 → unspecified
This seems like a pretty serious failure mode; requesting blocking.
Flags: blocking-calendar0.9?
Summary: https calendar and secure IMAP/SMTP (SSL) email accounts leads to TB hanging with 50% cpu usage → https webdav based ICS or CalDAV calendar and secure IMAP/SMTP (SSL) email accounts leads to TB hanging with 50% cpu usage
Version: unspecified → Lightning 0.5
Flags: blocking-calendar0.9? → blocking-calendar0.9+
OS: Windows Vista → All
Hardware: PC → All
Version: Lightning 0.5 → unspecified
I tried to reproduce this, but from reading the previous comments it seems this issue only happens under special circumstances, so it's not surprising that I couldn't. I tried the following:

* gmail imap server
* my webdav https server

- accept an event from the gmail server into the webdav calendar
--> Fails, but without error. Might be something else, no hang

- read mail, add/modify events
--> No error, no hang

Is this maybe OS dependent? Are there any sure-fire steps to reproduce this?
Maybe it also depends on the IMAP server itself. For me it happens as soon as I have SSL enabled for both calendar and IMAP. It does not matter in which order I enable SSL (first IMAP or first calendar). It hangs immediately after restarting TB while connecting to the IMAP/calendar server. Can be reproduced on Win XP 32-bit and Ubuntu 64-bit.
What type of IMAP server do you use? What sort of webdav server do you use? Is the IMAP server on the same host as the SSL webdav server? What

I used Gmail as IMAP and Apache mod_dav for webdav.
(In reply to comment #13)
In my case
> What type of IMAP server do you use?
> What sort of webdav server do you use?
> Is the IMAP server on the same host as the SSL webdav server?
(In reply to comment #13)
> What type of IMAP server do you use?
Don't know. Host is for example

>What sort of webdav server do you use?
DaviCal Caldav server

>Is the IMAP server on the same host as the SSL webdav server?
See bug 422618 comment 25 for steps on how to reproduce.
the 50% busy sounds like a dual core computer with one cpu busy at 100% ?

although highly unlikely, just to be sure, can you trace the IMAP code and check whether it is doing additional i/o in the ssl test case?

can you use a packet sniffer and check whether there is constantly data being transferred, or whether the connection is idle?

if there is constantly data being transferred, one might use a tool like ssltap to snoop the ssl traffic to see what's going on.

is this specific to windows or happening on all platforms?

I know the SSL thread is doing a busy wait on SSL I/O under certain circumstances, if NSPR is unable to create a loopback socket for implementing the nspr pollable event
(In reply to comment #13)
> What type of IMAP server do you use?
Exchange server imaps

>What sort of webdav server do you use?
DaviCal Caldav server

>Is the IMAP server on the same host as the SSL webdav server?
(In reply to comment #17)
> the 50% busy sounds like a dual core computer with one cpu busy at 100% ?

Judging from similar reports in Bugzilla, it seems so.
> I know the SSL thread is doing a busy wait on SSL I/O under certain
> circumstances, if NSPR is unable to create a loopback socket for implementing
> the nspr pollable event

Shouldn't the busy wait be something like 15 seconds max as a server-setting? I see comment 18, comment 14 and comment 15 (the three confirmed setups) all use Davical. Judging ssl-encryption doesn't work for Davical?
(In reply to comment #19)

> Judging ssl-encryption doesn't work
> for Davical?

Davical uses Apache's mod_ssl to support SSL encryption (Davical is written in PHP and runs under Apache)
(In reply to comment #17)
> the 50% busy sounds like a dual core computer with one cpu busy at 100% ?
I confirm that on my computer the 50% is one core at 100%.
> Davical use the mod-SSL of Apache to support ssl-encryption (Davical is in PHP
> and use Apache)

I thought it had to be something like this :-) I see Maxime uses DaviCal 0.9.1; I wonder whether the version of the DaviCal server makes a difference. Bernard, which server does Oracle use?  Jaap and Christoph, which versions do you use? As bug 416239 should be solved for caldav since 25-07, do you still see this with a recent nightly?
I use Lightning build 2008073119, Apple's CalendarServer SVN from 2008-01-11 on a Debian box, and Courier as IMAP, with both IMAPS and CalDav's https - and I see the same problem, in about 80% of all starts of thunderbird. AFAICS CalendarServer does not use Apache internally.
(In reply to comment #21)
> Bernard, which server does Oracle use?

We're using both the CalDAV and IMAP servers part of Oracle Beehive.
>Jaap and Christoph, which versions do you use?
>As bug 416239 should be solved for caldav since 25-07, do you still see this
>with a recent nightly?

ii  rscds              DAViCal CalDAV Server

No errors on error console

100% CPU on single core box 50% on dual core  

Very unresponsive but not totally dead; 99% CPU sometimes

Oh sorry, Lightning version: nightly build 0.9.pre  2008073119
For CalDAV users, with recent nightlies there are two preferences that you could set, calendar.debug.log and calendar.debug.log.verbose, that will significantly increase the amount of data being logged; that might help getting to the bottom of this. Also, as Kai suggested in comment #17, it would be useful if someone could wireshark a machine with this problem to see if there is network traffic associated with the CPU load. His question as to platform is also a good one - does this happen on Windows only?
I created the two preference settings as integers in the option's config editor, with their value set to 10000 (the more the better? :-) Where should I see the output? There is not a single line in the error console, even though it's set to display "All".

I noticed that the same issue shows up without network connection, i.e. with all network interfaces turned off. Windows (and yes, I have only tested on windows so far) does not show any network traffic on its interface.

Would you know of a Thunderbird binary with debug symbols? It should allow me to attach a debugger and tell you what keeps my CPU busy. Without one I can basically only tell you what DLLs the 17 threads are in.
Sorry not to have been more explicit: those prefs want to be booleans set to 'true'. The .verbose one is not in 0.8; you'll want to use a fairly new nightly to take advantage of it.
FWIW I've not been able to reproduce this on Linux - and I've tried.
I don't know where one could find a debug binary short of building one.
Updated to the August 4th nightly.

I've been running with .verbose for a few hours now.

For some reason I can't reproduce the error anymore, though Thunderbird eats CPU cycles every time I touch anything calendar-like. It gets sluggish, but no more 100% usage.

Nothing out of the ordinary in the logging. Even when it eats 40% CPU there is no extra logging.

OK, got it (well, the debug output :-) working, thanks. The last two messages I see are

CalDAV: recv: <?xml version='1.0' encoding='UTF-8'?>
<multistatus xmlns='DAV:'>
      <status>HTTP/1.1 200 OK</status>
      <status>HTTP/1.1 200 OK</status>
      <status>HTTP/1.1 200 OK</status>
CalDAV: send: <?xml version="1.0" encoding="UTF-8"?>
<calendar-multiget xmlns:D="DAV:" xmlns="urn:ietf:params:xml:ns:caldav">
  <href xmlns="DAV:">/calendars/users/axel/calendar/ba61862e-6ec9-4165-94ca-2fffe5a0a11f.ics</href>
  <href xmlns="DAV:">/calendars/users/axel/calendar/eb72ddf8-a764-42f9-aaab-ebecafe2f19c.ics</href>
  <href xmlns="DAV:">/calendars/users/axel/calendar/b650acf2-4e2d-4a9a-8c5f-732f0e15d1ba.ics</href>
  <href xmlns="DAV:">/calendars/users/axel/calendar/f65900c6-4e1a-4b92-9129-6e196a660775.ics</href>

After that it hangs with 100% CPU (on one core). So that probably doesn't help.

I now have a x86_64 debug build for linux - which doesn't show the problem :-( So to me it looks like it's windows only. I'll post my findings once I have the debug build for windows.
Axel, if it's really Windows only, that might support my theory.
Do you have some Security Firewall enabled on your Windows system, that does prevent Firefox from opening a server socket on the loopback device?
No firewall but the Windows one. Disabling it has no effect; thunderbird still hangs. Process Explorer tells me that thunderbird has opened four local sockets (say port 2283-2286). 2283 connects to 2284, 2285 to 2286. All connections are in the "established" state. That's the case both when thunderbird and lightning work and when they hang.
I had not realized (until now) that we're talking about a real hang, I had assumed we just waste the cpu cycles.

So we've got a real deadlock, and that's not likely to be related to the busy wait I have mentioned.

Ideally someone who is able to reproduce this would attach a debugger and get stack traces of all threads.
I don't think this issue is Windows-only, see bug #428522. It happens on my Linux system.

IMAP servers are Courier and SurgeMail, CalDAV server is Bedework.
Thread 2892 eats up all the CPU time.
Comment on attachment 333109 [details]
backtrace of all threads when TB hangs

Sorry for the long wait. Attachment #333109 [details] is the stack trace for all threads; for those without symbols I only quoted one stack frame (without symbols :-). The thread eating up the CPU time is thread 2892.

I don't see nsSocketTransportService::Run()'s variable "active" ever becoming false, so the while loop starting at line nsSocketTransportService2.cpp:532 is never left and instead runs continuously. But maybe that's the plan and instead it's some WaitFor123manyObject that fails - I didn't really look at the code yet.

Please let me know what variables you care about.
Whiteboard: [needs patch]
I believe mvl told me he'd be looking at this, so I'm taking the liberty of reassigning to him.  If I misunderstood, please let me know.
Assignee: nobody → mvl
I can't reproduce this, so it's hard for me to fix. Besides, I won't have time to work on this for at least a few days.
Assignee: mvl → nobody
Axel, thanks for the stacks. I guess you are using the latest 1.8.1 code (MOZILLA_1_8_BRANCH) ?
Yes; both thunderbird and the calendar are MOZILLA_1_8_BRANCH.
Axel, yes I think it is intended that the while(active) loop in nsSocketTransportService2 is never left.

You say that loop is consuming all CPU while you are experiencing the deadlock.
cc'ing biesi who is our most experienced active developer of that code.

Maybe we can find a way to use logging that will tell us why that socket code is constantly active, rather than waiting for data to dispatch.
Christian, please see my comment 41. We experience a deadlock in socket transport.
Does the issue also exist on Trunk, i.e. when testing with Shredder Alpha 2 and a Lightning 0.6a1 nightly build?
I did not test the trunk yet.

I poked around the code a bit, and this is what I believe happens: lightning's (SSL) socket is polled on read | write. I see that in nsSSLThread.cpp requestPoll(), the socket currently serviced by the thread is the IMAPS socket, so the switch on si->mThreadData->mSSLState gets evaluated. We hit ssl_idle. The problem is that si->mThreadData->mOneBytePendingFromEarlierWrite is true, so the poll returns immediately claiming that the lightning socket has something to work on, and resetting all other sockets' poll status.

The poll result "write" is now handled by trying to write to the Lightning SSL socket. That fails, though, because the IMAPS socket is blocking the SSL thread. Here is the relevant backtrace of the failure:

>	pipnss.dll!nsSSLThread::requestWrite(nsNSSSocketInfo * si=0x04a66e68, const void * buf=0x04e7c9b5, int amount=323)  Line 734	C++
 	pipnss.dll!nsSSLIOLayerWrite(PRFileDesc * fd=0x03d6fa90, const void * buf=0x04e7c9b5, int amount=323)  Line 1351 + 0x11 bytes	C++
 	nspr4.dll!PR_Write(PRFileDesc * fd=0x03d6fa90, const void * buf=0x04e7c9b5, int amount=323)  Line 146 + 0x14 bytes	C
 	necko.dll!nsSocketOutputStream::Write(const char * buf=0x04e7c9b5, unsigned int count=323, unsigned int * countWritten=0x022dfe0c)  Line 550 + 0x12 bytes	C++
 	necko.dll!nsHttpConnection::OnReadSegment(const char * buf=0x04e7c9b5, unsigned int count=323, unsigned int * countRead=0x022dfe0c)  Line 524 + 0x26 bytes	C++
 	necko.dll!nsHttpTransaction::ReadRequestSegment(nsIInputStream * stream=0x046e0b80, void * closure=0x049cee60, const char * buf=0x04e7c9b5, unsigned int offset=0, unsigned int count=323, unsigned int * countRead=0x022dfe0c)  Line 405 + 0x1c bytes	C++
 	xpcom_core.dll!nsMultiplexInputStream::ReadSegCb(nsIInputStream * aIn=0x04c554c8, void * aClosure=0x022dfe10, const char * aFromRawSegment=0x04e7c9b5, unsigned int aToOffset=0, unsigned int aCount=323, unsigned int * aWriteCount=0x022dfe0c)  Line 288 + 0x29 bytes	C++
 	xpcom_core.dll!nsStringInputStream::ReadSegments(unsigned int (nsIInputStream *, void *, const char *, unsigned int, unsigned int, unsigned int *)* writer=0x002ff4f0, void * closure=0x022dfe10, unsigned int aCount=323, unsigned int * result=0x022dfe0c)  Line 240 + 0x22 bytes	C++
 	xpcom_core.dll!nsMultiplexInputStream::ReadSegments(unsigned int (nsIInputStream *, void *, const char *, unsigned int, unsigned int, unsigned int *)* aWriter=0x02023ee0, void * aClosure=0x049cee60, unsigned int aCount=4096, unsigned int * _retval=0x022dfea4)  Line 245 + 0x28 bytes	C++
 	necko.dll!nsHttpTransaction::ReadSegments(nsAHttpSegmentReader * reader=0x049f1670, unsigned int count=4096, unsigned int * countRead=0x022dfea4)  Line 430 + 0x2b bytes	C++
 	necko.dll!nsHttpConnection::OnSocketWritable()  Line 559 + 0x1e bytes	C++
 	necko.dll!nsHttpConnection::OnOutputStreamReady(nsIAsyncOutputStream * out=0x049f1900)  Line 770 + 0xb bytes	C++
 	necko.dll!nsSocketOutputStream::OnSocketReady(unsigned int condition=0)  Line 490	C++
 	necko.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd=0x03d6fa90, short outFlags=2)  Line 1474	C++

Writing returns with an error code due to this piece of code (the line numbers in the above backtrace are off by a few because of my debug output, so I'm posting it here):

        if (some_other_socket_is_busy)
          return -1;

That keeps my CPU pretty busy: mOneBytePendingFromEarlierWrite is constantly true, and so the Lightning socket constantly tries to write and constantly fails.
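
The loop described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the actual nsSSLThread or necko code: `SocketModel`, `SslThreadModel`, `modelPoll`, `modelWrite` and `countFailedRounds` are hypothetical names, `oneBytePending` stands in for mOneBytePendingFromEarlierWrite, and the round cap exists only so the sketch terminates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified model. None of these names are real PSM or necko API.
struct SocketModel {
    int id;
    bool oneBytePending;   // stands in for mOneBytePendingFromEarlierWrite
};

struct SslThreadModel {
    int busySocketId;      // socket the single SSL worker is serving, or -1 for none
};

const short kPollWrite = 2;   // assumed "writable" flag, matching outFlags=2 in the backtrace

// Poll: while a byte is pending from an earlier write, always claim "writable".
short modelPoll(const SocketModel& s) {
    return s.oneBytePending ? kPollWrite : 0;
}

// Write: refused with -1 whenever the worker is busy with a different socket.
int modelWrite(const SslThreadModel& t, const SocketModel& s, int amount) {
    if (t.busySocketId != -1 && t.busySocketId != s.id)
        return -1;
    return amount;
}

// Count the poll/write rounds that fail back to back. The loop in the report
// has no cap; maxRounds only keeps this sketch finite.
std::size_t countFailedRounds(const SslThreadModel& t, const SocketModel& s,
                              std::size_t maxRounds) {
    assert(maxRounds > 0);
    std::size_t failed = 0;
    while (failed < maxRounds && (modelPoll(s) & kPollWrite) != 0) {
        if (modelWrite(t, s, 323) >= 0)
            break;         // progress was made, so the loop could go back to waiting
        ++failed;          // nothing changed, so the next poll fires immediately
    }
    return failed;
}
```

With the worker held by another socket, every round fails and the count runs up to the cap, which in this model corresponds to one core being pinned. With an idle worker the first write goes through and the count stays at zero.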

I did not manage to identify the IMAPS socket in the list of sockets to poll. Do you have a hint where mOneBytePendingFromEarlierWrite should get reset, and why it might stay true without the IMAPS socket owner caring?

I see that two threads are communicating with the IMAP server, both are in nsPipeInputStream::Wait(). One wants to know the quota of the INBOX, the other wants to know the capabilities. The stack traces I posted earlier only show the latter (see ParseIMAPandCheckForNewMail() which issues "1 capability"), so the other connection is probably not significant.

[Sorry for the lengthy post, but it's a pretty complex context and I don't know the code well enough to be able to judge what's obvious and what isn't.]
I cannot answer Stefan's question on Shredder a2. Installing the lightning build from 2008-08-17 with a newly installed Shredder alpha 2 build fails with "Lightning 0.6a1 could not be installed because it is not compatible with Shredder 3.0a2." Thunderbird and Shredder are in different install locations and they use different, separate profiles.

Isn't this supposed to be working or am I doing something obviously wrong? Should I be using a Thunderbird nightly trunk build instead of alpha2 because of bug 448753?
Removing this bug from the blocking list, since there's no solution on the horizon and it's still not reproducible for most developers. We really feel sorry about that, but please understand that we need to move on.
Moreover, I don't yet see anything we could fix in calendar-land; it looks like a necko/platform bug to me, so a fix would presumably require a Thunderbird update.
Flags: blocking-calendar0.9+ → wanted-calendar0.9+
Whiteboard: [needs patch]
We should add this bug to the release notes then.
Keywords: relnote
Axel, thanks a lot for your very helpful analysis.

Let me describe the scenario, combined with SSL/PSM state

- application code talks to the PSM I/O layer, 
  which talks to the NSS libSSL I/O layer

- the described bug happens after libSSL has signalled a short write,
  some bytes not yet flushed out to the socket

- libSSL expects that we call "write" again, 
  giving it a chance to flush

- when we arrive in this state, 
  PSM reports to the application level "-1 bytes written, would block"

- when the application level calls (write) again,
  PSM calls into libSSL, trying to flush

  I don't know what happens if libSSL is still unable to flush,
  as it appears to be in this bug scenario.

  I suspect it will tell us "would block".

- when the application level polls, while we are in this
  "short write, need flush" mode,
  the PSM layer will always signal "writeable" to the application level.

  It's done this way, because apparently when I wrote the code,
  I didn't know of a way to ensure we'll wake up, once the socket becomes
  writable again.

- Axel tells us, the application code constantly polls and attempts to write
  never succeeding, resulting in a deadlock.
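
The "short write, need flush" mode from the list above can be illustrated with a toy model. This is a sketch under assumptions, not PSM or libSSL code: `FlushModel`, `unflushed` and `transportRoom` are hypothetical names, and the real buffering in libSSL is more involved. The sketch assumes the caller retries a failed write with the same amount.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the "short write, need flush" state. Not PSM or libSSL code.
struct FlushModel {
    std::size_t unflushed;       // bytes accepted earlier but not yet pushed to the socket
    std::size_t transportRoom;   // bytes the socket can take right now

    // Poll: while a flush is outstanding, always report "writable",
    // even if the socket itself currently has no room.
    bool pollWritable() const {
        if (unflushed > 0)
            return true;
        return transportRoom > 0;
    }

    // Write: returns the amount on success, or -1 for "would block".
    // Assumes a retry passes the same amount as the write that fell short.
    long write(std::size_t amount) {
        if (unflushed == 0)
            unflushed = amount;                   // fresh write: take the data
        std::size_t sent = unflushed < transportRoom ? unflushed : transportRoom;
        unflushed -= sent;                        // give the socket a chance to flush
        transportRoom -= sent;
        if (unflushed > 0)
            return -1;                            // still not flushed: caller must retry
        return static_cast<long>(amount);
    }
};
```

In this model a write of 323 bytes into a socket with room for 100 reports "would block" and leaves 223 bytes unflushed. Polling then reports "writable" although the socket has no room, and retries keep failing until room appears.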
I needed to write out the scenario above in order to refresh my memory about how the SSL interaction works.

Now I've seen Axel's statement, which is describing the cause for the deadlock:

- the IMAPS socket blocks the SSL thread
  (I don't know yet how this can happen)

- the calendar code tries to write SSL data to the calendar server,
  but fails, because our SSL thread currently only has a single worker.
  If the SSL thread is blocked on a read/write call, then other application
  requests for reading/writing SSL get rejected (postponed) with "would block"

The next step is, we must understand why the SSL thread is blocked by the IMAPS thread.
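
The single-worker restriction described above can be sketched as a small gate. Again this is only an illustration with hypothetical names (`SingleWorkerGate`, `request`, `complete`), not the real nsSSLThread interface. It shows why a worker that never finishes with one socket starves every other SSL socket.

```cpp
#include <cassert>

// Hypothetical model of a single SSL worker. Names are illustrative only.
const int kNoSocket = -1;
const int kWouldBlock = -1;

class SingleWorkerGate {
public:
    SingleWorkerGate() : busySocket_(kNoSocket) {}

    // Hand `amount` bytes for `socket` to the worker. Returns amount on success,
    // or kWouldBlock while the worker is serving a different socket.
    int request(int socket, int amount) {
        assert(socket != kNoSocket);
        if (busySocket_ != kNoSocket && busySocket_ != socket)
            return kWouldBlock;
        busySocket_ = socket;
        return amount;
    }

    // Called when the worker's blocking call for `socket` returns.
    // If this never happens, every other socket keeps getting kWouldBlock.
    void complete(int socket) {
        if (busySocket_ == socket)
            busySocket_ = kNoSocket;
    }

private:
    int busySocket_;   // the one socket the worker is currently serving
};
```

In this model, once socket 1 (say, IMAPS) holds the worker, requests for socket 2 (say, the calendar) are rejected until `complete(1)` is called.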
I wish that PSM didn't limit itself to a single thread for SSL.  
libSSL certainly doesn't impose that limitation.
The SSL worker thread is designed in a way that would allow for additional worker threads; someone just has to find the time to write the additional code.

The decoupling into the SSL worker thread had been necessary, in order to allow us to callback into necko, while we are blocked in libssl, waiting for ocsp results.

When I implemented that decoupling I had decided to not increase the complexity of that development project further, and decided to postpone the introduction of multiple worker threads until necessary.

This is the first bug I've seen that really requires us to have more threads. Well, assuming that the analysis is correct.

David Bienvenu: On IRC you said, the IMAP code might do blocking I/O. You pointed me to function nsImapProtocol::CreateNewLineFromSocket(), which I indeed can see in the attached list of stacks.

I followed that code to nsPipeInputStream::Wait which says it is waiting for a pipe.

What kind of pipe is that? Is it the input socket/fd, or is it some helper pipe?
It's just a pipe on the input stream, see nsImapProtocol::SetupWithUrl. We create a transport on the io socket:
        rv = socketService->CreateTransport(&connectionType, connectionType != nsnull,
                                            *socketHost, socketPort, proxyInfo,

and then we open an input stream on that transport:
          rv = m_transport->OpenInputStream(nsITransport::OPEN_BLOCKING, 0, 0, getter_AddRefs(m_inputStream));
I'm not familiar with TB or Lightning's internals, but I think I might have a repeatable test case for this situation. See below. Let me know if I can help with testing a fix.

Test case:
When I fire up Thunderbird with Lightning enabled, it always eats 50% of my dual core directly after starting up. This situation sometimes goes away after about 10-20 seconds, but sometimes not. It seems to be a race condition in Lightning's initialization.

In the first case, TB is blocked for the first 10-20 seconds. Mouse clicks are hardly accepted, and the only thing to do seems to be just wait until the 50% CPU use blows over. After that TB/Lightning are usable as one would expect.

In the latter case, TB is blocked completely. No mouse clicks are handled anymore, but I can close TB and the 50% CPU use goes down again.

Both cases seem to occur about 50% of the time.

TB (from Ubuntu repository)
Lightning 0.9pre nightly (27 Aug build)
Plain IMAP over port 143, with "Use TLS if available" checked
8 CALDAV calendars via HTTPS, using a Davical backend
After some playing with my settings, I found that using plain IMAP (ie. switching off "Use TLS if available" and selecting "Never") makes the lock-ups disappear. Lightning still blocks Thunderbird's thread though in the first 10-20 seconds with 50% CPU use, but after this period all CalDav data is loaded and Thunderbird becomes responsive and usable.
I am seeing this problem too. I see 100% CPU usage; I presume that the OP reports 50% CPU because they have a dual processor of some sort.

I just installed lightning 0.9 rc2 on Ubuntu Hardy and this problem still exists. I have two IMAP servers, one is secure (SSL) and the other is insecure. The calendar is secure (SSL) webDAV to a server in the same domain as the secure IMAP server (

Would have been great if this could be fixed for 0.9...
I have also been experiencing the same problem (100% CPU usage) since Lightning ver. 0.8 and still having it with ver. 0.9.
When I start Thunderbird (TB) in a PC, TB hangs for a while (10-15 minutes) and the problem occurs periodically.  When TB hangs, TB-Lightning is communicating with Google site (it was reported by VirusBuster).
I tested it in the safe mode, but the problem continued appearing.  So it seems that other programmes (anti-virus, etc.) are irrelevant to the problem.
I tested in other two computers with the same set of calendars; the problem did not come up.  All three computers' operating systems are Windows XP SP2.  The main difference among them concerning TB is the fact that the profile of the first computer is much larger and more complicated with many multi-layered folders.
Therefore, after creating another profile in the first PC, I tested with the same set of calendars and found no problem.  But when I copied the messages to the new profile in the first PC, the problem began occurring.
Another test was to deactivate some calendars.  The problem discontinued when I deactivated the largest calendar.  I can use safely other smaller calendars.

Judging from these, I suspect the following:
- This problem happens only with a certain type of profile and a certain type of calendar.
- Such profiles must have many messages and multi-layered folder structure, or other conditions.
- Such calendars must have many items.

Hopefully this information is useful.  I will be happy to provide further details if needed.
Last week 4 users received Lightning-0.9-win.
Also 99% CPU usage single core / 50% CPU usage dual core.
Thunderbird version and on Win XP SP2.
Removing Lightning solved the CPU-load problem completely.
Reinstalling initially appeared to run fine, but the problem
came back at times we could not pin down.
We use Kerio Mail Server for IMAPS and Calendar.
No indications found in the logs there.
As a precaution, we asked our users not to accept the 0.9 version for now.

I am happy to provide more details if productive.
Flags: blocking-calendar0.9+
(In reply to comment #43)
> Does the issue also exists on Trunk, i.e. testing with Shredder Alpha 2
> and a Lightning 0.6a1 nightly build?

Did someone try to retest using Trunk builds as requested? Matching test builds can be found at <> and <>.
(In reply to comment #59)

Just installed both of them and removed all other add-ons.
I will report as soon as anything strange occurs. Nico
(In reply to comment #60)

Attempt to open Junk E-mail gave the following error:
Unable to open the summary file for Junk E-mail. Perhaps there was an error on disk, or the full path is too long.

Later several other folders had the same problem, exit & restart did not solve this.

Therefore I went back to Thunderbird- and Lightning-0.9 for now. Nico
(In reply to comment #61)

After going back to Thunderbird and Lightning 0.9, I again got 99% CPU. Stopping and restarting TB seemed to solve it initially, but the hourglass appeared while the cursor was over the folder browser or message list, though not over the message itself. Switching back and forth between Calendar and Email made the hourglass disappear.

I have no insight into the code, but I hope my contributions are helpful anyway.
Flags: wanted-calendar1.0+
Flags: wanted-calendar0.9+
Flags: blocking-calendar1.0?
Flags: blocking-calendar0.9+
Does anyone know if the new 'CACHE (experimental)' feature allows one to side-step this bug?

This bug is a really major issue for my use of Lightning -- basically any day that I have a calendar reminder coming up when I first launch my email, I get the 100% CPU problem and I have to kill Thunderbird. Then I have to relaunch it, and wait a few minutes before I dismiss or snooze the reminder (I guess so that all other SSL connections can do their stuff, since only one SSL connection at a time has been implemented, IIUIC).
Caching triggers an initial sync (with network load) on startup. Thus it's unlikely to be a cure.
How about some way of delaying the initial sync by, say, a couple of minutes? Is there perhaps some hidden configuration option for that?
At Stefan:
I tried the Thunderbird and Lightning trunk builds on 3 machines: my home workstation, work laptop and work workstation. All 3 perform significantly better. After startup there is a small blockage of around 10 seconds, but after the calendars are loaded, everything works smoothly. And much more snappy I must say.

The only annoyance of the trunk build is the fact that the CALDAV authentication dialogs keep popping up (passwords are stored correctly, so clicking OK works). Since this is a minor issue compared to the 50% CPU usage, I will stick to Shredder for the coming period.
I am experiencing this bug, too. (Thunderbird 2 and Lightning 0.8+)

Today I tried the trunk builds, and they worked for me. Pretty fast load times and no problems with hanging or cpu load. I tried this with Google Calendar and the new Google CalDAV implementation, both work fine.

CalDAV authentication dialogues keep popping up for me too, "native" Google Calendar does not prompt for passwords.
Is this bug fixed on trunk?
I don't know. In trunk I see similar behaviour, it is just not as disturbing. Although I do not know the code, what I think happens is the following:

1. TB starts up
2. Lightning starts downloading my CalDav calendars in parallel (I have 7 of them)
3. Every time a calendar has been downloaded, it loops through the list of appointments and starts processing them
4. TB operates as normal

For larger calendars Step 3 can take up to 10-15 seconds, even on a Core Duo. During that time TB blocks completely (Compiz turns it black and white). I therefore assume that Step 3 is executed in the main thread, blocking the main window handling. After the processing, TB regains its colors and works as normal. This process only happens at TB startup.

I hope this makes sense for anyone familiar with the code.
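
If the guess above is right (that Step 3 runs on the main thread), the effect can be illustrated with a toy model. This is only a sketch and not Lightning code: the function name `longest_stall`, the chunk size and the idea of a "UI tick" are assumptions made up for illustration. It counts how many appointments get processed between two points at which the UI could handle events.

```python
def longest_stall(item_count, chunk_size):
    """Toy model: return the most items processed between two UI ticks.

    Hypothetical illustration only; not how Lightning is implemented.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    worst = 0
    for start in range(0, item_count, chunk_size):
        # Process one chunk of appointments without yielding to the UI.
        processed = min(chunk_size, item_count - start)
        worst = max(worst, processed)
        # A real event loop would handle pending UI events at this point.
    return worst


all_at_once = longest_stall(5000, 5000)  # whole calendar in one go
in_chunks = longest_stall(5000, 50)      # yield to the UI every 50 items
```

Processing the whole calendar in one go means the window cannot repaint until every appointment is done, which would match the 10-15 second freeze described above. Smaller chunks, or a background thread, would bound the stall.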
Thanks for your comment.
This bug seems to be fixed on the trunk,
but what you observe, and what I observe as well, makes me think that Lightning does not perform well enough to be used in my company.
I've reproduced this with a debug build. It seems that the IMAP operation acquires the ssl_thread_singleton->mBusySocket blocker (see security/manager/ssl/src/nsSSLThread.cpp) and never releases it.
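
To make the suspected mechanism easier to picture, here is a minimal sketch of a single "busy socket" slot that is taken and never given back. It is a toy model under assumptions, not the PSM code: the class `SingleSlotSslWorker`, its methods and the polling loop are hypothetical, and only the idea of one busy slot comes from the comment above.

```python
import threading


class SingleSlotSslWorker:
    """Toy model of one SSL worker with a single busy slot (hypothetical)."""

    def __init__(self):
        self._busy_socket = None
        self._lock = threading.Lock()

    def try_start(self, socket_name):
        # Only one socket may own the worker at a time.
        with self._lock:
            if self._busy_socket is None:
                self._busy_socket = socket_name
                return True
            return False

    def finish(self, socket_name):
        # Release the slot, but only if this socket actually owns it.
        with self._lock:
            if self._busy_socket == socket_name:
                self._busy_socket = None


def demo(max_polls=1000):
    worker = SingleSlotSslWorker()
    imap_started = worker.try_start("imap")  # IMAP takes the slot...
    # ...and never calls finish(), so WebDAV can only poll in vain.
    polls = 0
    while polls < max_polls and not worker.try_start("webdav"):
        polls += 1  # in a real program this spin would burn one CPU core
    worker.finish("imap")  # once the slot is released, WebDAV gets through
    webdav_started = worker.try_start("webdav")
    return imap_started, polls, webdav_started


result = demo()
```

In this model the second connection never makes progress while the first one holds the slot, and a caller that keeps retrying would keep one core busy, which is at least consistent with the reported 50% CPU on a dual-core machine.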
Paul, was that on trunk or branch?

Bernard/Frank, are you guys still seeing this with trunk builds?
Assignee: nobody → kaie
Component: Lightning Only → Security: PSM
Flags: wanted-calendar1.0+
Flags: blocking-calendar1.0?
Product: Calendar → Core
QA Contact: lightning → psm
Whiteboard: [tb3needs]
Version: unspecified → Trunk
Well, actually, it was reproduced for Thunderbird sources.
Paul, can you try a trunk build (either a nightly, or from source) and see if you can reproduce it there?
I experienced this problem also. I tried out Thunderbird 3 Beta 1
"Mozilla/5.0 (Windows; U; Windows NT 6.0; de; rv:1.9.1b3pre) Gecko/20081204 Lightning/1.0pre Thunderbird/3.0b1"
and a corresponding Lightning build (probably from the same day) and had no problems whatsoever.
I apologize, I do not have enough time to try the trunk sources. Here are the exact steps to reproduce the bug:

0. Ensure that your imap & webdav accounts are using secure connections.
1. Start continuous imap operation, say synchronize for offline.
2. Try to upload a huge file to webdav server while imap operation is in progress.
3. Here it is. Enjoy :)
I have XPSP3. When I start thunderbird (connecting to IMAP with SSL) with lightning (connecting to google cal via HTTPS) the program often hangs with 100% CPU utilization. I can restart thunderbird in safe mode, and it will successfully download my email. I then close and restart in normal mode, and it works OK. On a different computer, also with XPSP3 but much newer with more memory etc, I connect to the same email and google calendars, and I haven't encountered any problems so far.
I think I had proposed that one solution is to change the IMAP/mail code implementation to no longer use any blocking I/O. I guess such a change has not happened, and it is probably unrealistic to expect it.

The other solution is to change the PSM code to use multiple SSL threads, instead of just one.
yes, it's unlikely that we're going to rewrite the imap code to use non-blocking i/o.
Given that NSS itself does not impose any single-threaded limitations on 
users of SSL, and allows many threads to simultaneously do SSL, the fact
that some Mozilla code (which I gather is PSM) imposes a single-thread 
limitation on the use of SSL is rather disappointing.  I am willing to 
work with Kai or anyone to remove that limitation.
(In reply to comment #80)
> Given that NSS itself does not impose any single-threaded limitations on 
> users of SSL, and allows many threads to simultaneously do SSL, the fact
> that some Mozilla code (which I gather is PSM) imposes a single-thread 
> limitation on the use of SSL is rather disappointing.  I am willing to 
> work with Kai or anyone to remove that limitation.

Long story, caused by Mozilla's networking code being single-threaded, and the need to allow a callback into http, while blocked on ssl (for ocsp).

When I implemented the fix to allow proxied ocsp requests (by allowing a callback into mozilla network code, and at the same time decoupling from the network layer), I had implemented a quite complicated patch.

At that time, in order to avoid additional complexity, I went with a single SSL worker thread.

Now the time has come to extend that to a pool of threads. During the last 2-3 days I worked on a patch, I'm mostly done, but I need to review my own code and identify a bug.

Problem is, this patch will change the core of PSM. It's not a mail patch, but a core patch. So you'd have to use a version of core gecko that contains this patch...
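
The difference between a single SSL worker thread and a pool of threads can be sketched as follows. This is a generic illustration using Python's standard thread pool, not the PSM patch: the function `short_request_completes`, the timeout and the task names are assumptions chosen for the example. One task blocks indefinitely (standing in for a long blocking IMAP operation) while a second, short task (standing in for a calendar request) waits for a free worker.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def short_request_completes(worker_count, timeout=0.5):
    """Return True if a short task finishes while another task is blocked."""
    release = threading.Event()
    pool = ThreadPoolExecutor(max_workers=worker_count)
    try:
        # Stand-in for a blocking operation that holds a worker.
        pool.submit(release.wait)
        # Stand-in for a short request that needs a worker as well.
        short = pool.submit(lambda: "calendar done")
        try:
            short.result(timeout=timeout)
            return True
        except FutureTimeout:
            return False
    finally:
        # Unblock the first task so the pool can shut down cleanly.
        release.set()
        pool.shutdown(wait=True)


single_worker_ok = short_request_completes(1)
pooled_workers_ok = short_request_completes(2)
```

With one worker the short request is stuck behind the blocked one; with two workers it completes. The real patch also has to deal with callbacks into the networking code, which this sketch leaves out entirely.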
Are you still able to reproduce this with Thunderbird 3 and Lightning?

If I understand correctly, this bug is triggered when using both MAIL/SSL and CALENDAR/SSL.

While I saw this problem with TB 2, I can currently not reproduce with TB 3.
Can you?
How can I check it? Have no Lightning working at all:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090211 Shredder/3.0b2pre (as of yesterday)

- Lightning 1.0pre (build 20081104031354) 
- Provider for Google Calendar 0.6pre (Requires additional items, can live w/o this)
- Quicktext

Not compatible with Shredder 3.0b2pre
No updates found.
(In reply to comment #83)
> - Lightning 1.0pre (build 20081104031354) 

Try a current nightly build instead of sticking with an old one:
Thanks Stefan!

Silly me.
Just installed nightly build.
Works so far.
Are you saying the problem is gone?
Attached patch: Patch v1 (deleted)
This is my first attempt to get the multiple worker threads implemented.

It seems to work for me, but unfortunately I see a crash when having the flash plugin installed, so I suspect this patch needs some more reviewing to find the bug.

But before we try to get this in, we must have reliable steps to reproduce the original deadlock/hang problem with Thunderbird 3 nightlies.
(In reply to comment #86)
> Are you saying the problem is gone?
Not yet - have to setup IMAP folder first.
Those problems described in bug 444537 and bug 458690 are not visible anymore
for me, but definitely were connected to SSL issues.
I would appreciate testing and feedback from someone who is able to reproduce this bug with TB 2.

I have been running Thunderbird 3 test versions, I have 3 IMAP/SSL accounts configured, I have 3 remote https calendar configured.

I configured all mail accounts to check for new mail every 1 minute, and to reload all remote calendars every one minute.

What I can see are slowdowns when using Thunderbird. It appears to stall for 1-2 seconds occasionally, probably as calendar data is being processed.

But I don't get any deadlocks, have been running this configuration for over 2 days. Linux.
Simon, is this something you can reproduce on trunk or 1.9.1?
One IMAPS account and 2 CalDAV accounts with HTTPS URLs. All connecting to the same host through a slow VPN connection (it's harder to reproduce if the network is fast). The CPU is a quad core, so I get 25% CPU usage when the bug happens.

With TB2:
1) Send a 500k mail to yourself, quit TB main window while it's sending (so TB quits right after the mail is sent)
2) Start TB again, click on the new mail header, quickly go to calendar pane and do a "reload calendars", go back to mail headers and click on the new mail again.

With my setup I can reproduce it about 50% of the time in the first 5 seconds of starting it up (After that it usually never locks up). 

The same test with TB3 does not seem to cause any issue.
Since no one has yet been able to reproduce this bug on the trunk, removing [tbneeds].  If someone does manage to reproduce it, please re-add that keyword!

I wonder if the nsIThreadManager changes that happened after 1.8 are working in our favor.
Whiteboard: [tb3needs]
I appear to be having this same issue in TB version (20090105) and Lightning 0.9.

I am using one IMAPS server and one CalDAVS calendar.  On start-up TB goes to 100% CPU and has no network capabilities.  If I remove the CalDAVS calendar and restart TB seems to work again.  

So, from my perspective, it appears that Lightning doesn't work with CalDAV, as it renders TB useless.  I do not have non-SSL calendar sources available, so I cannot tell whether Lightning works with non-SSL CalDAV.
I noticed the SSL issue while reading calendars from a Zimbra server.

Is there any way I can try the latest trunk using Thunderbird 2.0?
(In reply to comment #94)
> I noticed the SSL issue while reading calendars from a Zimbra server.
> Is there anyway I can try the latest trunk using Thunderbird 2.0?

Which latest trunk do you refer to?
As you say "latest trunk" with TB2, I guess you are referring to "Lightning trunk".
This combination won't help you, as the bug is in the core TB code.
(In reply to comment #92)
> Since noone has yet been able to reproduce this bug on the trunk, removing
> [tbneeds].  If someone does manage to reproduce it, please re-add that keyword!
> I wonder if the nsIThreadManager changes that happened after 1.8 are working in
> our favor.

I found one more difference. We never added the enhancement from bug 363455 to the 1.8 branch, so TB 2 does not have it. That patch is meant to improve the handling of blocking sockets. It would be interesting to know if it helps for this bug. I backported the patch; you can find it in bug 363455 attachment 365683.
Would one of you who is still using TB 2 be able to try that patch?
Versions have moved since some of the past comments were written.

The case that's really most critical at this point is ensuring that the nightly versions built from the mozilla-central trunk of Lightning <> (next Lightning will ship from 1.9.1, but we don't have builds for that yet) and Thunderbird <> (Thunderbird 3.0 will ship from here) work well together and don't have this problem.

Note that as of this writing, comm-central hasn't yet branched, though it will in the not-too-distant future.
I've filed bug 481685 to track getting mozilla-1.9.1-based builds of Lightning.
It appears that I was confused, and we do already have 1.9.1 builds of Lightning at <>.  Bug 481685 has more details for those who wish to keep up.
(In reply to comment #96)
> (In reply to comment #92)
> > Since noone has yet been able to reproduce this bug on the trunk, removing
> > [tbneeds].  If someone does manage to reproduce it, please re-add that keyword!
> > 
> > I wonder if the nsIThreadManager changes that happened after 1.8 are working in
> > our favor.
> I found one more difference. We never added the enhancement from bug 363455 to
> 1.8 branch, so TB 2 does not have it. That patch is meant to improve handling
> of blocking sockets. It would be interesting to know if it helps for this bug.
> I backported the patch, you find it in bug 363455 attachment 365683.
> Would someone of you who is still using TB 2 be able to try that patch?

Used this patch for TB 2.
The Cal+mail performance is much better now.
I still see CPU spikes now and then, but it does not hang my TB.
Thanks, Huzaifa, that's very helpful to know.  Setting the version field appropriately, since there's no longer reason to believe that this bug applies to the trunk.
Version: Trunk → 1.8 Branch
You know, I'm honestly wondering if it's the same bug, as I just turned mail checking back on and it still works. Maybe whatever the offending event in my calendar was has simply passed...
Perhaps someone could deliver 0.9.1 versions of lightning with the backported fix and the fix from bug 363455 comment 16 ?
(In reply to comment #26 and #17)

> does this happen on Windows only?

No, this is also a problem on TB / Lightning 0.9 on Mac OS X 10.5.7

I've found that once it happens the only solution is to delete the https caldav calendar, quit TB then restart it, add the calendar back in.

I'm connecting to a Google Apps calendar.
You are lucky. I can't get it to work even if I reimport Google Calendar. A fix would be very appreciated.

The bug is present on TB on Linux Ubuntu 9.04.

I recently suffered serious data loss as a result of this bug. PLEASE PLEASE PLEASE could resources be allocated to applying this patch to current Thunderbird versions?
Given bug 363455, I think we can soon mark this bug as FIXED (by that bug). Leaving a couple of weeks grace period, please report if you can reproduce this on Lightning 1.0pre ONLY. We are aware that this is an issue for 0.9, but the only way to fix it would be to drive forward bug 363455's branch approval.
Depends on: 363455
(In reply to comment #111)
> Given bug 363455, I think we can soon mark this bug as FIXED (by that bug).
> Leaving a couple of weeks grace period, please report if you can reproduce this
> on Lightning 1.0pre ONLY. We are aware that this is an issue for 0.9, but the
> only way to fix it would be to drive forward bug 363455's branch approval.

I do not really agree with the above, especially since this bug has been marked as a 1.8-only bug. Even when Thunderbird 3 is released, people using Thunderbird 2 will still be stuck with it because it is ignored by its maintainers.
I have been able to reproduce this problem with TB 3.0b4 and Lightning 1.0pre (2009-10-27) nightly. OS - WinXP. I see the exact same symptoms - CPU usage goes to 50% (on a dual core machine). It takes 10-15 mins for TB to start. Even after this it is very sluggish. CPU usage keeps fluctuating between 20% and 50%.

If I disable Lightning, then TB starts up just fine. I have tried this over and over (back and forth) and am quite positive that Lightning is causing this hang.

I only have one IMAP account configured (without SSL). When the hang happens, there is no data traffic, as the connection to the IMAP server is not available.

No WebDAV/CalDAV account has been configured. No google account has been configured.
(In reply to comment #113)
I don't think you are seeing this bug because you are not using remote calendars and no secured mail server. Lightning 1.0pre test builds have a known issue that causes the calendar database to grow uncontrolled. The big database causes a similar slowdown. See Bug 521408 -> Bug 494140.
I convinced the drivers to approve the backported patch from bug 363455 to the thunderbird 2 branch. I'll check it in soon and nightly builds will contain the fix.

I'd like to ask everyone experiencing this problem for a favor. Please get the nightly build and test it. The patch manipulates some core communication code, and the decision makers were a bit scared to include this patch on an old stable branch.

So, we need to make sure this change really works. I'm looking forward to your understanding and testing.

I'll make another comment, once the builds are ready, with the link to the test builds.

Could you please test one of the builds named 
  thunderbird-* (nightly prerelease builds)

and let us know how it works for you?

Is this bug solved for you?
Do you see any functional regressions (new bugs) when using POP3/SSL, IMAP/SSL, SMTP/SSL (or TLS)?

The problem is not corrected for me using the build referenced in Comment #116. Installed and started without trouble. Turned on "Check for new messages at startup" on three email accounts. Restarted and Thunderbird hung up using 100% of one processor. Not all calendars had loaded.
Same here: the bug is not fixed by thunderbird- on win32. Enabling lightning 0.9 will make thunderbird 100% busy during startup, disabling it will revert to the expected startup behavior. Please let us know whether we should check for regressions nevertheless, or whether the patch will be reverted.
This has been the bane of my life recently, because my calendar has grown to a substantial size and takes a while to upload any new appointments.
If I edit an appointment too soon after editing another appointment I get 100% CPU, and worse, a truncated calendar is stored because the DAV upload is aborted!

Interrupting this DAV upload process with any other kind of SSL traffic aborts the transfer and causes the deadlock. Worse, terminating the TB process then empties the remote ICS file.
TB 2.0.23
L 0.9
I have been using Thunderbird 2.x with Lightning for at least a year with no problems.  I have several mail server accounts - all IMAP.  My main/production mail server definition is set to use TLS on port 143 rather than SSL.  My other mail accounts are mostly test and do not use TLS or SSL.  I recently added a new production IMAP account which requires SSL (not TLS).  And a new calendar which also requires SSL.

Last night I was prompted to allow TB to install  After restarting, TB now always goes to 100% CPU almost instantly - I have a few seconds to twiddle the UI before it hangs.  If I am quick, I can get in and disable the lightning plugin.  It is completely unusable unless I disable lightning.
So I now have 3 things that want to use SSL at the same time: my new IMAP account, my old calendar account and my new calendar account.

Lightning used to be very annoying about calendar passwords.  It was constantly popping up to ask for my calendar password again and again.  I don't know if that was because my old calendar server was horribly unreliable and therefore lightning was often having to reconnect or if this is just a design defect.  But to avoid that hassle, I finally broke down and allowed it to remember my calendar password in the password manager.  I did the same with my new calendar password.  But not with any of my mail accounts.

I just removed all those passwords and enabled lightning again and am able to use mail.  But if I give it the password instead of clicking cancel on both calendar login prompts, it hangs again.
(In reply to comment #120)

This bug is fixed in Thunderbird 3.x
(In reply to comment #122)
> (In reply to comment #120)
> This bug is fixed in Thunderbird 3.x

Any chance of having this backported to 2.x ?
We won't be backporting this bug. Closing this bug for now, the original issue seems fixed. Note there may be other hangs with a different core problem, so please only reopen this bug if the issues mentioned here persist.
Closed: 14 years ago
Resolution: --- → WORKSFORME


