Closed
Bug 955947
Opened 11 years ago
Closed 9 years ago
crash in nsPop3Protocol::GetMsg() where m_pop3ConData->uidlinfo is null. need threadsafe nsNSSSocketInfo
Categories
(MailNews Core :: Networking: POP, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: wsmwk, Unassigned)
References
()
Details
(4 keywords, Whiteboard: [gs][regression:TB24][antivirus?])
Crash Data
Attachments
(1 file)
(deleted),
patch
|
Details | Diff | Splinter Review |
#43 crash for TB24.2.0. Not high enough for tracking.
Signature exists in TB17.0.x, but is rare. Therefore, this is regression starting in TB24.0.
Regression of m_kato's patch in bug 784912?
(currently I am not able to determine rough regression range because of difficulties with socorro)
bp-913c63c5-663d-4ff2-b64c-b9f5f2131204
=============================================================
0 xul.dll nsPop3Protocol::GetMsg() mailnews/local/src/nsPop3Protocol.cpp
1 xul.dll nsPop3Protocol::ProcessProtocolState(nsIURI *,nsIInputStream *,unsigned __int64,unsigned int) mailnews/local/src/nsPop3Protocol.cpp
2 xul.dll nsMsgProtocol::OnDataAvailable(nsIRequest *,nsISupports *,nsIInputStream *,unsigned __int64,unsigned int) mailnews/base/util/nsMsgProtocol.cpp
3 xul.dll nsInputStreamPump::OnStateTransfer() netwerk/base/src/nsInputStreamPump.cpp
4 xul.dll nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream *) netwerk/base/src/nsInputStreamPump.cpp
5 xul.dll nsInputStreamReadyEvent::Run() xpcom/io/nsStreamUtils.cpp
6 xul.dll nsThread::ProcessNextEvent(bool,bool *) xpcom/threads/nsThread.cpp
7 xul.dll NS_ProcessNextEvent(nsIThread *,bool) objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp
8 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) ipc/glue/MessagePump.cpp
9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc
10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc
hg@0 2878 char c = 0;
hg@0 2879 popstateTimestamp = TimeInSecondsFromPRTime(PR_Now());
hg@0 2880 if (m_pop3ConData->msg_info[i].uidl)
hg@0 2881 {
hg@0 2882 Pop3UidlEntry *uidlEntry = (Pop3UidlEntry *) PL_HashTableLookup(m_pop3ConData->uidlinfo->hash,
hg@0 2883 m_pop3ConData->msg_info[i].uidl);
hg@0 355 // Whenever data arrives from the connection, core netlib notifices the protocol by calling
hg@0 356 // OnDataAvailable. We then read and process the incoming data from the input stream.
m_kato@13623 357 NS_IMETHODIMP nsMsgProtocol::OnDataAvailable(nsIRequest *request, nsISupports *ctxt, nsIInputStream *inStr, uint64_t sourceOffset, uint32_t count)
hg@0 358 {
hg@0 359 // right now, this really just means turn around and churn through the state machine
hg@0 360 nsCOMPtr<nsIURI> uri = do_QueryInterface(ctxt);
hg@0 361 return ProcessProtocolState(uri, inStr, sourceOffset, count);
Not sure if this specific crash ID is representative of all crashes.
Also, the user of this specific crash ID has *several* crash signatures as reported at https://getsatisfaction.com/mozilla_messaging/topics/repeatedly_crashes -- I'm also not sure whether multiple signatures is typical for users of this crash ..
https://crash-stats.mozilla.com/report/index/67c8b22f-dcb9-4a1a-a5bd-7de182131207
[@ PR_PopIOLayer | nsSSLIOLayerAddToSocket(int, char const*, int, char const*, int, PRFileDesc*, nsISupports**, bool, unsigned int)] - Thunderbird 24.1.1 Crash Report - Report ID: 67c8b22f-dcb9-4a1a-a5bd-7de182131207
https://crash-stats.mozilla.com/report/index/91e0852e-bc35-40a8-aea7-851192131205
[@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsRelativeFilePrefConstructor] - Thunderbird 24.1.1 Crash Report - Report ID: 91e0852e-bc35-40a8-aea7-851192131205
https://crash-stats.mozilla.com/report/index/198b70ae-6bf9-44de-904d-6cbe42131205
[@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | NS_strdup(char const*) | PREF_CopyCharPref] - Thunderbird 24.1.1 Crash Report - Report ID: 198b70ae-6bf9-44de-904d-6cbe42131205
https://crash-stats.mozilla.com/report/index/ed1326db-95d9-4c0b-a101-863432131201
[@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | NS_strdup(char const*) | PREF_CopyCharPref] - Thunderbird 24.1.1 Crash Report - Report ID: ed1326db-95d9-4c0b-a101-863432131201
https://crash-stats.mozilla.com/report/index/620d269a-5d89-4fb5-b270-551e52131126
[@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsSocketTransport::PostEvent(unsigned int, tag_nsresult, nsISupports*)] - Thunderbird 24.1.1 Crash Report - Report ID: 620d269a-5d89-4fb5-b270-551e52131126
https://crash-stats.mozilla.com/report/index/35e79b23-2816-426f-87fa-6fba32131124
[@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | XPCWrappedNativeProto::GetNewOrUsed(XPCWrappedNativeScope*, nsIClassInfo*, XPCNativeScriptableCreateInfo const*, bool)] - Thunderbird 24.1.1 Crash Report - Report ID: 35e79b23-2816-426f-87fa-6fba32131124
Reporter | ||
Comment 1•11 years ago
|
||
OTOH, I would not rule out AV software involvement, especially given the multiple signatures.
earliest strong uptick seems to be at TB24.0.1, not TB24.0
24.0 https://crash-stats.mozilla.com/report/list?signature=nsPop3Protocol%3A%3AGetMsg%28%29&product=Thunderbird&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2013-10-01+00%3A00%3A00&range_value=4#reports
24.0.1 https://crash-stats.mozilla.com/report/list?signature=nsPop3Protocol%3A%3AGetMsg%28%29&product=Thunderbird&query_type=contains&range_unit=weeks&process_type=any&hang_type=any&date=2013-11-01+00%3A00%3A00&range_value=4#reports
24.1.x/24.2.0 https://crash-stats.mozilla.com/report/list?signature=nsPop3Protocol%3A%3AGetMsg%28%29&product=Thunderbird&query_type=is_exactly&range_unit=weeks&process_type=any&hang_type=any&date=2014-01-02+02%3A00%3A00&range_value=4#reports
Whiteboard: [gs][regression:TB?][antivirus?]
Comment 2•11 years ago
|
||
Wayne, do you investigate what 3rd party module cause this crash? From Gecko 2, 3rd party developer must rebuild binary XPCOM module per version.
Reporter | ||
Comment 3•11 years ago
|
||
(In reply to Makoto Kato (:m_kato) from comment #2)
> Wayne, do you investigate what 3rd party module cause this crash? From
> Gecko 2, 3rd party developer must rebuild binary XPCOM module per version.
Yes I am asking users which software they use.
Do you correlate this crash to a specific dll?
Flags: needinfo?(m_kato)
Comment 4•11 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #3)
> (In reply to Makoto Kato (:m_kato) from comment #2)
> > Wayne, do you investigate what 3rd party module cause this crash? From
> > Gecko 2, 3rd party developer must rebuild binary XPCOM module per version.
>
> Yes I am asking users which software they use.
>
> Do you correlate this crash to a specific dll?
cannot. bp-97dac378-9145-4c0e-9d54-94a8f2140106 has no 3rd party extensions. So this may not depends on 3rd party module ...
But, I think that m_pop3ConData->msg_info[i] may be null... We should re-write modern C++ code. (use new or TArray, not PR_Malloc)
Flags: needinfo?(m_kato)
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Makoto Kato (:m_kato) from comment #4)
> ...
> I think that m_pop3ConData->msg_info[i] may be null... We should
> re-write modern C++ code. (use new or TArray, not PR_Malloc)
will this be difficult?
Best I can tell, this is a regression in TB24.
Perhaps significant:
1. (as previously noted) some users have multiple crash signatures (but perhaps these are edge cases)
2. 85% of crashes are win7 - why?
Mac crash PL_HashTableLookup | nsPop3Protocol::GetMsg()
bp-ba557bc7-2531-451e-9d86-953772140123
0 libplds4.dylib PL_HashTableLookup nsprpub/lib/ds/plhash.c
1 XUL nsPop3Protocol::GetMsg() mailnews/local/src/nsPop3Protocol.cpp
2 XUL nsPop3Protocol::ProcessProtocolState(nsIURI*, nsIInputStream*, unsigned long long, unsigned int) mailnews/local/src/nsPop3Protocol.cpp
3 XUL nsMsgProtocol::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long long, unsigned int) mailnews/base/util/nsMsgProtocol.cpp
4 XUL nsInputStreamPump::OnStateTransfer() netwerk/base/src/nsInputStreamPump.cpp
Flags: needinfo?(m_kato)
Whiteboard: [gs][regression:TB?][antivirus?] → [gs][regression:TB24][antivirus?]
Reporter | ||
Comment 6•11 years ago
|
||
most users with this crash have multiple crash sigs, at least 3-4 other signatures [1] which occur at least as frequently as nsPop3Protocol::GetMsg. Adding these to the count makes this a topcrash
[1] including Don at http://getsatisfaction.com/mozilla_messaging/topics/repeatedly_crashes who replaced AVG with MSE. And still crashes (with less frequency, according to him) but still multiple crash sigs
Keywords: topcrash-thunderbird
Comment 7•10 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #5)
> (In reply to Makoto Kato (:m_kato) from comment #4)
> > ...
> > I think that m_pop3ConData->msg_info[i] may be null... We should
> > re-write modern C++ code. (use new or TArray, not PR_Malloc)
>
> will this be difficult?
sorry for delay.
Ah, I misunderstand. All? crash data is accessing to address 0x8. So this root cause it hat msg_info is null. (offset of uidl is 0x8). I cannot find why msg_info is still null, but we already use msg_info check at several places.
Flags: needinfo?(m_kato)
Comment 8•10 years ago
|
||
Add null check for msg_info. All crash data is during accessing address 0x8. If msg_info is null, msg_info[0].uidl becomes 0x8.
Attachment #8428968 -
Flags: review?(kent)
Comment 9•10 years ago
|
||
The code we are crashing on is the last code line of this snippet:
if (m_pop3ConData->msg_info[i].uidl)
{
Pop3UidlEntry *uidlEntry = (Pop3UidlEntry *) PL_HashTableLookup(m_pop3ConData->uidlinfo->hash,
m_pop3ConData->msg_info[i].uidl);
We just checked for non-null m_pop3ConData->msg_info[i].uidl successfully. Isn't it therefore more likely that the crash is caused by a null m_pop3ConData->uidlinfo?
Comment 10•10 years ago
|
||
Comment on attachment 8428968 [details] [diff] [review]
Add null check for msg_info
I'm clearing the review request since I did not hear back from m_kato on the issue of the root cause of this bug. It seems to me that null m_pop3ConData->uidlinfo is a more likely root cause of this bug than an error accessing m_pop3ConData->msg_info[i].uidl that we just accessed successfully in the previous statement.
Attachment #8428968 -
Flags: review?(kent)
Flags: needinfo?(m_kato)
Comment 11•10 years ago
|
||
I am also suspecting m_pop3ConData->uidlinfo is null. But just null checking does not fix the root cause anyway.
If uidlinfo is null there, nsPop3Protocol::Cleanup() was already invoked before the crash, so something went wrong before GetMsg().
Reporter | ||
Comment 12•10 years ago
|
||
So far, the majority of users I correspond with about this crash, no longer crash in version 31. Does that mean that the crash cause is merely masked in version 31??
Crash Signature: [@ nsPop3Protocol::GetMsg()] → [@ nsPop3Protocol::GetMsg() ]
Reporter | ||
Comment 13•10 years ago
|
||
This crash signature is gone in version 31, with no signficant changes in the areas of the suggested patch. But much has chnaged since TB24 in other areas of http://hg.mozilla.org/comm-central/log/d9842ddc938b/mailnews/local/src/nsPop3Protocol.cpp
fixed by bug 239455 ? (landed for TB26)
Or, is the problem now only hidden?
Comment 14•10 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #13)
> fixed by bug 239455 ? (landed for TB26)
> Or, is the problem now only hidden?
Hard to know.
Reporter | ||
Comment 15•10 years ago
|
||
For the record, a few 31.x users continued to crash with NS_ProxyRelease aka bug 902158 which is now fixed in version 31.3.0. And fewer still also crashed in 24.x and 31.x with @ OOM | small.
Hiro, do you think we still need a patch?
Flags: needinfo?(hiikezoe)
Comment 16•10 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #15)
> For the record, a few 31.x users continued to crash with NS_ProxyRelease aka
> bug 902158 which is now fixed in version 31.3.0. And fewer still also
> crashed in 24.x and 31.x with @ OOM | small.
>
> Hiro, do you think we still need a patch?
Yes, but the patch will be against codes in mozilla-central because firefox also crashes at NS_ProxyRelease.
https://crash-stats.mozilla.com/report/index/020f6bbd-3194-44a9-8d82-1d30e2141227
I am still convinced nsNSSSocketInfo should make thread-safe.
Flags: needinfo?(hiikezoe)
Reporter | ||
Comment 17•10 years ago
|
||
(In reply to Hiroyuki Ikezoe (:hiro) from comment #16)
> (In reply to Wayne Mery (:wsmwk) from comment #15)
> > For the record, a few [TB]31.x users continued to crash with NS_ProxyRelease aka
> > bug 902158 which is now fixed in version [TB]31.3.0. And fewer still also
> > crashed in 24.x and 31.x with @ OOM | small.
> >
> > Hiro, do you think we still need a patch?
>
> Yes, but the patch will be against codes in mozilla-central because firefox
> also crashes at NS_ProxyRelease.
> https://crash-stats.mozilla.com/report/index/020f6bbd-3194-44a9-8d82-
> 1d30e2141227
You mean a bug report is needed for firefox crashes of NS_ProxyRelease?
There are *very* few. https://crash-stats.mozilla.com/query/?product=Firefox&version=ALL%3AALL&date=&range_value=4&range_unit=weeks&query_search=signature&query_type=is_exactly&build_id=&process_type=any&do_query=1&query=NS_ProxyRelease%28nsIEventTarget*%2C%20nsISupports*%2C%20bool%29
> I am still convinced nsNSSSocketInfo should make thread-safe.
you mean via this bug/this patch? Or via new bug
Flags: needinfo?(hiikezoe)
Comment 19•9 years ago
|
||
Wayne, does this still occur on Tb38?
I cannot find crash after Tb31.
https://crash-stats.mozilla.com/signature/?signature=nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1
Flags: needinfo?(m_kato)
Reporter | ||
Comment 20•9 years ago
|
||
(In reply to Makoto Kato (:m_kato) from comment #19)
> Wayne, does this still occur on Tb38?
>
> I cannot find crash after Tb31.
> https://crash-stats.mozilla.com/signature/
> ?signature=nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_c
> olumns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=a
> ddress&page=1
It is too early to tell for TB38. But, along the lines of comment 15, it does exist for tb31 with a different signature failing at the same source line, thought at much reduced rate - perhaps because of bug 902158
https://crash-stats.mozilla.com/signature/?date=%3E2015-06-01&product=Thunderbird&signature=PL_HashTableRawLookup+|+PL_HashTableLookup+|+nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1
https://crash-stats.mozilla.com/report/index/4dabe7cc-c7ff-4b1a-9b75-9650b2150617
hg@0 2874 char c = 0;
hg@0 2875 popstateTimestamp = TimeInSecondsFromPRTime(PR_Now());
hg@0 2876 if (m_pop3ConData->msg_info[i].uidl)
hg@0 2877 {
hg@0 2878 Pop3UidlEntry *uidlEntry = (Pop3UidlEntry *) PL_HashTableLookup(m_pop3ConData->uidlinfo->hash,
hg@0 2879 m_pop3ConData->msg_info[i].uidl);
Crash Signature: [@ nsPop3Protocol::GetMsg() ] → [@ nsPop3Protocol::GetMsg() ]
[@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg() ]
Updated•9 years ago
|
Crash Signature: [@ nsPop3Protocol::GetMsg() ]
[@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg() ] → [@ nsPop3Protocol::GetMsg() ]
[@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg() ]
[@ nsPop3Protocol::GetMsg ]
[@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg ]
Reporter | ||
Comment 21•9 years ago
|
||
What's (In reply to Makoto Kato [:m_kato] from comment #19)
> Wayne, does this still occur on Tb38?
>
> I cannot find crash after Tb31.
> https://crash-stats.mozilla.com/signature/
> ?signature=nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_c
> olumns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=a
> ddress&page=1
Getting back to this ... No, it is gone - all crashes with nsPop3Protocol:GetMsg in signature are now at a different location - http://hg.mozilla.org/releases/comm-esr38/file/tip/mailnews/local/src/nsPop3Protocol.cpp#l2959 - like bp-2eb2d2db-6c29-48f1-b1b9-a75a92160104 which are almost all startup and extremely rare.
This is quite the tangled web, so to recap (without double checking all the time lines) best guess is fixed or helped by bug 239455. Perhaps further bullet proofing comes via hiro's race condition patch via bug 902158 comment 57 2014-11-25 https://hg.mozilla.org/releases/comm-esr31/rev/798e338f82e0. FTR https://www.mozilla.org/en-US/thunderbird/31.3.0/releasenotes/ does not mention the fix
I'm going to close this because the crash sig is gone (alleluia!), and hiro states in comment 18 that what we really want is bug 427948. (gosh I miss hiro)
If someone thinks we need to bullet proof by addressing m_pop3ConData->uidlinfo is null, now is a good time to speak up.
You need to log in
before you can comment on or make changes to this bug.
Description
•