Closed Bug 1299076 Opened 8 years ago Closed 8 years ago

Massive Xpcshell failure in debug on all platforms: application crashed [@ nsWeakReference::QueryReferent(nsID const &,void * *)]

Categories

(Thunderbird :: General, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 51.0

People

(Reporter: jorgk-bmo, Assigned: rkent)

References

Details

(Keywords: intermittent-failure)

First seen: Tue Aug 30, 2016, 5:05:18: https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=8c7ddccf95c4168995fa6e57182133f22d21be03 Massive Xpcshell failure in debug on all platforms. Some examples: TEST-UNEXPECTED-FAIL | mail/base/test/unit/test_emptyTrash_dbViewWrapper.js | xpcshell return code: 1 PROCESS-CRASH | mail/base/test/unit/test_emptyTrash_dbViewWrapper.js | application crashed [@ nsWeakReference::QueryReferent(nsID const &,void * *)] TEST-UNEXPECTED-FAIL | mailnews/db/gloda/test/unit/test_index_junk_imap_offline.js | xpcshell return code: 1 PROCESS-CRASH | mailnews/db/gloda/test/unit/test_index_junk_imap_offline.js | application crashed [@ nsWeakReference::QueryReferent(nsID const &,void * *)] Looks like some M-C landing added a MOZ_ASSERT somewhere. Bingo: PROCESS | 312 | Hit MOZ_CRASH(nsWeakReference not thread-safe) at c:/builds/moz2_slave/tb-c-cen-w32-d-000000000000000/build/mozilla/xpcom/glue/nsWeakReference.cpp:144 Bug 956338: https://hg.mozilla.org/mozilla-central/rev/e2fda83015d8#l4.18 Honza, this has caused mayhem to many/most/all of our Xpcshell tests in debug mode. Please advise.
Flags: needinfo?(honzab.moz)
Definitely result of added assertions in bug 956338, yes. The patch was pushed several times to m-c based try, catching number of bugs (now all fixed), but I never attempted to push to comm-try (my bad). First of all, the added assertions detected a critical bug! What you see is a misuse of weak reference: the problem the added assertions in bug 956338 try to catch is an attempt to access weak ptr on one thread while the real object is just being destroyed on another thread. Hence the simple restriction to query weak ptrs only on a single thread. Dereference of a weak ptr to the same object on more then one thread (say A and B) is _LIKELY_ bad, at least it's a very bad programing architecture. If you expect the real object go away on either thread A or B or even some thread X, then you cannot avoid very dangerous race conditions when referrers, still alive, keep the weak ref and try to query it. So, we need to find the offending code and fix it ASAP. One I can see is query for a weak reference from nsImapProtocol::GetPassword, exactly here: https://dxr.mozilla.org/comm-central/rev/8c7ddccf95c4168995fa6e57182133f22d21be03/mailnews/imap/src/nsImapProtocol.cpp#8237 I didn't look further if there are more, but I expect them (there were 4+ bugs to fix in firefox/gecko). I can definitely help with these bugs.
Blocks: 956338
Flags: needinfo?(honzab.moz)
It looks like the fixup bugs on m-c are marked security-only - at least they are hidden. Should this be the case for the corresponding c-c bugs as well?
Should tihs be considered critical or blocker? Or only the bug fixes makred to block this? /me wonders if this will hit some of the imap threading issues
Severity: normal → major
Not confined to TB. Seamonkey is also affected. Happens here: >> nsCOMPtr<nsIMsgIncomingServer> server = do_QueryReferent(m_server, &rv)
Thanks for looking into this. (In reply to Honza Bambas (:mayhemer) from comment #1) > So, we need to find the offending code and fix it ASAP. > One I can see is query for a weak reference from > nsImapProtocol::GetPassword, exactly here: Looking though https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32-debug/1472526380/comm-central_win7_ix-debug_test-xpcshell-bm111-tests1-windows-build3.txt.gz this string nsImapProtocol::GetPassword(nsCString &,bool) [C:/builds/moz2_slave/tb-c-cen-w32-d-000000000000000/build/mailnews/imap/src/nsImapProtocol.cpp:8237 features a lot. I see 108x "application crashed" in the log and 108x "#03: nsImapProtocol::GetPassword". So with some luck, that is the only call site that needs to be fixed. BTW, what is the replacement for nsCOMPtr<nsIMsgIncomingServer> server = do_QueryReferent(m_server, &rv); ?
Flags: needinfo?(honzab.moz)
(In reply to aleth [:aleth] from comment #2) > It looks like the fixup bugs on m-c are marked security-only - at least they > are hidden. Should this be the case for the corresponding c-c bugs as well? (In reply to Wayne Mery (:wsmwk, NI for questions) from comment #3) > Should tihs be considered critical or blocker? Or only the bug fixes makred > to block this? Let's file individual bugs as critical + security sensitive, blocking bug 956338. (In reply to Jorg K (GMT+2, PTO during summer) from comment #5) > So with some luck, that is the only call site that needs to be fixed. From my experience, fixing one bug uncovers another one. But we'll see :) > > BTW, what is the replacement for > nsCOMPtr<nsIMsgIncomingServer> server = do_QueryReferent(m_server, &rv); > ? That's not that simple. Let's file a bug for this piece and talk there.
Flags: needinfo?(honzab.moz)
Blocks: 1299113
(In reply to Honza Bambas (:mayhemer) from comment #6) > That's not that simple. Let's file a bug for this piece and talk there. Done, bug 1299113.
Fixed by bug 1299116, thanks, Kent!
Assignee: nobody → rkent
No longer blocks: 1299113
Status: NEW → RESOLVED
Closed: 8 years ago
Depends on: 1299116
OS: Unspecified → All
Hardware: Unspecified → All
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 51.0
Version: unspecified → Trunk
Depends on: 1299920
You need to log in before you can comment on or make changes to this bug.