Closed Bug 955947 Opened 11 years ago Closed 9 years ago

crash in nsPop3Protocol::GetMsg() where m_pop3ConData->uidlinfo is null. need threadsafe nsNSSSocketInfo

Categories

(MailNews Core :: Networking: POP, defect)

x86
Windows NT
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: wsmwk, Unassigned)

References

()

Details

(4 keywords, Whiteboard: [gs][regression:TB24][antivirus?])

Crash Data

Attachments

(1 file)

#43 crash for TB24.2.0. Not high enough for tracking. Signature exists in TB17.0.x, but is rare. Therefore, this is regression starting in TB24.0. Regression of m_kato's patch in bug 784912? (currently I am not able to determine rough regression range because of difficulties with socorro) bp-913c63c5-663d-4ff2-b64c-b9f5f2131204 ============================================================= 0 xul.dll nsPop3Protocol::GetMsg() mailnews/local/src/nsPop3Protocol.cpp 1 xul.dll nsPop3Protocol::ProcessProtocolState(nsIURI *,nsIInputStream *,unsigned __int64,unsigned int) mailnews/local/src/nsPop3Protocol.cpp 2 xul.dll nsMsgProtocol::OnDataAvailable(nsIRequest *,nsISupports *,nsIInputStream *,unsigned __int64,unsigned int) mailnews/base/util/nsMsgProtocol.cpp 3 xul.dll nsInputStreamPump::OnStateTransfer() netwerk/base/src/nsInputStreamPump.cpp 4 xul.dll nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream *) netwerk/base/src/nsInputStreamPump.cpp 5 xul.dll nsInputStreamReadyEvent::Run() xpcom/io/nsStreamUtils.cpp 6 xul.dll nsThread::ProcessNextEvent(bool,bool *) xpcom/threads/nsThread.cpp 7 xul.dll NS_ProcessNextEvent(nsIThread *,bool) objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp 8 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) ipc/glue/MessagePump.cpp 9 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc 10 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc hg@0 2878 char c = 0; hg@0 2879 popstateTimestamp = TimeInSecondsFromPRTime(PR_Now()); hg@0 2880 if (m_pop3ConData->msg_info[i].uidl) hg@0 2881 { hg@0 2882 Pop3UidlEntry *uidlEntry = (Pop3UidlEntry *) PL_HashTableLookup(m_pop3ConData->uidlinfo->hash, hg@0 2883 m_pop3ConData->msg_info[i].uidl); hg@0 355 // Whenever data arrives from the connection, core netlib notifices the protocol by calling hg@0 356 // OnDataAvailable. We then read and process the incoming data from the input stream. m_kato@13623 357 NS_IMETHODIMP nsMsgProtocol::OnDataAvailable(nsIRequest *request, nsISupports *ctxt, nsIInputStream *inStr, uint64_t sourceOffset, uint32_t count) hg@0 358 { hg@0 359 // right now, this really just means turn around and churn through the state machine hg@0 360 nsCOMPtr<nsIURI> uri = do_QueryInterface(ctxt); hg@0 361 return ProcessProtocolState(uri, inStr, sourceOffset, count); Not sure if this specific crash ID is representative of all crashes. Also, the user of this specific crash ID has *several* crash signatures as reported at https://getsatisfaction.com/mozilla_messaging/topics/repeatedly_crashes -- I'm also not sure whether multiple signatures is typical for users of this crash .. https://crash-stats.mozilla.com/report/index/67c8b22f-dcb9-4a1a-a5bd-7de182131207 [@ PR_PopIOLayer | nsSSLIOLayerAddToSocket(int, char const*, int, char const*, int, PRFileDesc*, nsISupports**, bool, unsigned int)] - Thunderbird 24.1.1 Crash Report - Report ID: 67c8b22f-dcb9-4a1a-a5bd-7de182131207 https://crash-stats.mozilla.com/report/index/91e0852e-bc35-40a8-aea7-851192131205 [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsRelativeFilePrefConstructor] - Thunderbird 24.1.1 Crash Report - Report ID: 91e0852e-bc35-40a8-aea7-851192131205 https://crash-stats.mozilla.com/report/index/198b70ae-6bf9-44de-904d-6cbe42131205 [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | NS_strdup(char const*) | PREF_CopyCharPref] - Thunderbird 24.1.1 Crash Report - Report ID: 198b70ae-6bf9-44de-904d-6cbe42131205 https://crash-stats.mozilla.com/report/index/ed1326db-95d9-4c0b-a101-863432131201 [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | NS_strdup(char const*) | PREF_CopyCharPref] - Thunderbird 24.1.1 Crash Report - Report ID: ed1326db-95d9-4c0b-a101-863432131201 https://crash-stats.mozilla.com/report/index/620d269a-5d89-4fb5-b270-551e52131126 [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsSocketTransport::PostEvent(unsigned int, tag_nsresult, nsISupports*)] - Thunderbird 24.1.1 Crash Report - Report ID: 620d269a-5d89-4fb5-b270-551e52131126 https://crash-stats.mozilla.com/report/index/35e79b23-2816-426f-87fa-6fba32131124 [@ mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | XPCWrappedNativeProto::GetNewOrUsed(XPCWrappedNativeScope*, nsIClassInfo*, XPCNativeScriptableCreateInfo const*, bool)] - Thunderbird 24.1.1 Crash Report - Report ID: 35e79b23-2816-426f-87fa-6fba32131124
Wayne, do you investigate what 3rd party module cause this crash? From Gecko 2, 3rd party developer must rebuild binary XPCOM module per version.
(In reply to Makoto Kato (:m_kato) from comment #2) > Wayne, do you investigate what 3rd party module cause this crash? From > Gecko 2, 3rd party developer must rebuild binary XPCOM module per version. Yes I am asking users which software they use. Do you correlate this crash to a specific dll?
Flags: needinfo?(m_kato)
(In reply to Wayne Mery (:wsmwk) from comment #3) > (In reply to Makoto Kato (:m_kato) from comment #2) > > Wayne, do you investigate what 3rd party module cause this crash? From > > Gecko 2, 3rd party developer must rebuild binary XPCOM module per version. > > Yes I am asking users which software they use. > > Do you correlate this crash to a specific dll? cannot. bp-97dac378-9145-4c0e-9d54-94a8f2140106 has no 3rd party extensions. So this may not depends on 3rd party module ... But, I think that m_pop3ConData->msg_info[i] may be null... We should re-write modern C++ code. (use new or TArray, not PR_Malloc)
Flags: needinfo?(m_kato)
(In reply to Makoto Kato (:m_kato) from comment #4) > ... > I think that m_pop3ConData->msg_info[i] may be null... We should > re-write modern C++ code. (use new or TArray, not PR_Malloc) will this be difficult? Best I can tell, this is a regression in TB24. Perhaps significant: 1. (as previously noted) some users have multiple crash signatures (but perhaps these are edge cases) 2. 85% of crashes are win7 - why? Mac crash PL_HashTableLookup | nsPop3Protocol::GetMsg() bp-ba557bc7-2531-451e-9d86-953772140123 0 libplds4.dylib PL_HashTableLookup nsprpub/lib/ds/plhash.c 1 XUL nsPop3Protocol::GetMsg() mailnews/local/src/nsPop3Protocol.cpp 2 XUL nsPop3Protocol::ProcessProtocolState(nsIURI*, nsIInputStream*, unsigned long long, unsigned int) mailnews/local/src/nsPop3Protocol.cpp 3 XUL nsMsgProtocol::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long long, unsigned int) mailnews/base/util/nsMsgProtocol.cpp 4 XUL nsInputStreamPump::OnStateTransfer() netwerk/base/src/nsInputStreamPump.cpp
Flags: needinfo?(m_kato)
Whiteboard: [gs][regression:TB?][antivirus?] → [gs][regression:TB24][antivirus?]
most users with this crash have multiple crash sigs, at least 3-4 other signatures [1] which occur at least as frequently as nsPop3Protocol::GetMsg. Adding these to the count makes this a topcrash [1] including Don at http://getsatisfaction.com/mozilla_messaging/topics/repeatedly_crashes who replaced AVG with MSE. And still crashes (with less frequency, according to him) but still multiple crash sigs
(In reply to Wayne Mery (:wsmwk) from comment #5) > (In reply to Makoto Kato (:m_kato) from comment #4) > > ... > > I think that m_pop3ConData->msg_info[i] may be null... We should > > re-write modern C++ code. (use new or TArray, not PR_Malloc) > > will this be difficult? sorry for delay. Ah, I misunderstand. All? crash data is accessing to address 0x8. So this root cause it hat msg_info is null. (offset of uidl is 0x8). I cannot find why msg_info is still null, but we already use msg_info check at several places.
Flags: needinfo?(m_kato)
Attached patch Add null check for msg_info (deleted) — Splinter Review
Add null check for msg_info. All crash data is during accessing address 0x8. If msg_info is null, msg_info[0].uidl becomes 0x8.
Attachment #8428968 - Flags: review?(kent)
The code we are crashing on is the last code line of this snippet: if (m_pop3ConData->msg_info[i].uidl) { Pop3UidlEntry *uidlEntry = (Pop3UidlEntry *) PL_HashTableLookup(m_pop3ConData->uidlinfo->hash, m_pop3ConData->msg_info[i].uidl); We just checked for non-null m_pop3ConData->msg_info[i].uidl successfully. Isn't it therefore more likely that the crash is caused by a null m_pop3ConData->uidlinfo?
Comment on attachment 8428968 [details] [diff] [review] Add null check for msg_info I'm clearing the review request since I did not hear back from m_kato on the issue of the root cause of this bug. It seems to me that null m_pop3ConData->uidlinfo is a more likely root cause of this bug than an error accessing m_pop3ConData->msg_info[i].uidl that we just accessed successfully in the previous statement.
Attachment #8428968 - Flags: review?(kent)
Flags: needinfo?(m_kato)
I am also suspecting m_pop3ConData->uidlinfo is null. But just null checking does not fix the root cause anyway. If uidlinfo is null there, nsPop3Protocol::Cleanup() was already invoked before the crash, so something went wrong before GetMsg().
So far, the majority of users I correspond with about this crash, no longer crash in version 31. Does that mean that the crash cause is merely masked in version 31??
Crash Signature: [@ nsPop3Protocol::GetMsg()] → [@ nsPop3Protocol::GetMsg() ]
This crash signature is gone in version 31, with no signficant changes in the areas of the suggested patch. But much has chnaged since TB24 in other areas of http://hg.mozilla.org/comm-central/log/d9842ddc938b/mailnews/local/src/nsPop3Protocol.cpp fixed by bug 239455 ? (landed for TB26) Or, is the problem now only hidden?
(In reply to Wayne Mery (:wsmwk) from comment #13) > fixed by bug 239455 ? (landed for TB26) > Or, is the problem now only hidden? Hard to know.
For the record, a few 31.x users continued to crash with NS_ProxyRelease aka bug 902158 which is now fixed in version 31.3.0. And fewer still also crashed in 24.x and 31.x with @ OOM | small. Hiro, do you think we still need a patch?
Flags: needinfo?(hiikezoe)
(In reply to Wayne Mery (:wsmwk) from comment #15) > For the record, a few 31.x users continued to crash with NS_ProxyRelease aka > bug 902158 which is now fixed in version 31.3.0. And fewer still also > crashed in 24.x and 31.x with @ OOM | small. > > Hiro, do you think we still need a patch? Yes, but the patch will be against codes in mozilla-central because firefox also crashes at NS_ProxyRelease. https://crash-stats.mozilla.com/report/index/020f6bbd-3194-44a9-8d82-1d30e2141227 I am still convinced nsNSSSocketInfo should make thread-safe.
Flags: needinfo?(hiikezoe)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #16) > (In reply to Wayne Mery (:wsmwk) from comment #15) > > For the record, a few [TB]31.x users continued to crash with NS_ProxyRelease aka > > bug 902158 which is now fixed in version [TB]31.3.0. And fewer still also > > crashed in 24.x and 31.x with @ OOM | small. > > > > Hiro, do you think we still need a patch? > > Yes, but the patch will be against codes in mozilla-central because firefox > also crashes at NS_ProxyRelease. > https://crash-stats.mozilla.com/report/index/020f6bbd-3194-44a9-8d82- > 1d30e2141227 You mean a bug report is needed for firefox crashes of NS_ProxyRelease? There are *very* few. https://crash-stats.mozilla.com/query/?product=Firefox&version=ALL%3AALL&date=&range_value=4&range_unit=weeks&query_search=signature&query_type=is_exactly&build_id=&process_type=any&do_query=1&query=NS_ProxyRelease%28nsIEventTarget*%2C%20nsISupports*%2C%20bool%29 > I am still convinced nsNSSSocketInfo should make thread-safe. you mean via this bug/this patch? Or via new bug
Flags: needinfo?(hiikezoe)
Already filed as bug 427948.
Flags: needinfo?(hiikezoe)
(In reply to Makoto Kato (:m_kato) from comment #19) > Wayne, does this still occur on Tb38? > > I cannot find crash after Tb31. > https://crash-stats.mozilla.com/signature/ > ?signature=nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_c > olumns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=a > ddress&page=1 It is too early to tell for TB38. But, along the lines of comment 15, it does exist for tb31 with a different signature failing at the same source line, thought at much reduced rate - perhaps because of bug 902158 https://crash-stats.mozilla.com/signature/?date=%3E2015-06-01&product=Thunderbird&signature=PL_HashTableRawLookup+|+PL_HashTableLookup+|+nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1 https://crash-stats.mozilla.com/report/index/4dabe7cc-c7ff-4b1a-9b75-9650b2150617 hg@0 2874 char c = 0; hg@0 2875 popstateTimestamp = TimeInSecondsFromPRTime(PR_Now()); hg@0 2876 if (m_pop3ConData->msg_info[i].uidl) hg@0 2877 { hg@0 2878 Pop3UidlEntry *uidlEntry = (Pop3UidlEntry *) PL_HashTableLookup(m_pop3ConData->uidlinfo->hash, hg@0 2879 m_pop3ConData->msg_info[i].uidl);
Crash Signature: [@ nsPop3Protocol::GetMsg() ] → [@ nsPop3Protocol::GetMsg() ] [@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg() ]
Crash Signature: [@ nsPop3Protocol::GetMsg() ] [@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg() ] → [@ nsPop3Protocol::GetMsg() ] [@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg() ] [@ nsPop3Protocol::GetMsg ] [@ PL_HashTableRawLookup | PL_HashTableLookup | nsPop3Protocol::GetMsg ]
What's (In reply to Makoto Kato [:m_kato] from comment #19) > Wayne, does this still occur on Tb38? > > I cannot find crash after Tb31. > https://crash-stats.mozilla.com/signature/ > ?signature=nsPop3Protocol%3A%3AGetMsg%28%29&_columns=date&_columns=product&_c > olumns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=a > ddress&page=1 Getting back to this ... No, it is gone - all crashes with nsPop3Protocol:GetMsg in signature are now at a different location - http://hg.mozilla.org/releases/comm-esr38/file/tip/mailnews/local/src/nsPop3Protocol.cpp#l2959 - like bp-2eb2d2db-6c29-48f1-b1b9-a75a92160104 which are almost all startup and extremely rare. This is quite the tangled web, so to recap (without double checking all the time lines) best guess is fixed or helped by bug 239455. Perhaps further bullet proofing comes via hiro's race condition patch via bug 902158 comment 57 2014-11-25 https://hg.mozilla.org/releases/comm-esr31/rev/798e338f82e0. FTR https://www.mozilla.org/en-US/thunderbird/31.3.0/releasenotes/ does not mention the fix I'm going to close this because the crash sig is gone (alleluia!), and hiro states in comment 18 that what we really want is bug 427948. (gosh I miss hiro) If someone thinks we need to bullet proof by addressing m_pop3ConData->uidlinfo is null, now is a good time to speak up.
Status: NEW → RESOLVED
Closed: 9 years ago
Depends on: 902158, 427948, 239455
Resolution: --- → WORKSFORME
Summary: crash in nsPop3Protocol::GetMsg() → crash in nsPop3Protocol::GetMsg() where m_pop3ConData->uidlinfo is null. need threadsafe nsNSSSocketInfo
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: