Closed Bug 853881 Opened 12 years ago Closed 8 years ago

Eliminate PLDHashtable in nsMsgDatabase

Categories

(MailNews Core :: Database, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 55.0

People

(Reporter: jcranmer, Assigned: jcranmer)

References

Details

Attachments

(5 files, 4 obsolete files)

Old, outdated, hard-to-read.
Attached patch Kill it dead (deleted) — Splinter Review
Hey look, I also killed a few static_cast<nsMsgHdr*>! :-) This requires the memory reporter to apply properly. I also did some code cleanup regarding the ownership of nsIMdbRow by nsMsgHdr and eliminated a few return values to reflect operation infallibility.
Attachment #728244 - Flags: review?(kent)
"This requires the memory reporter to apply properly" I'm not sure what you mean by that. As you say, it does not apply as given. What do I have to do to get it to apply? I've occasionally whined about big bugs with little motivation listed, so I'll try to preclude my fellow whiners by giving a motivation for this, which I only understood after reading the patch. Please correct me if I am incorrect: PLDHashTable is a low-level implementation that is not the preferred hash table for common uses where standard types are stored. This bug changes the PLDHashTables uses in nsMsgDatabase to type-specific variants, resulting in cleaner, easier to read code.
(In reply to Kent James (:rkent) from comment #2) > "This requires the memory reporter to apply properly" > > I'm not sure what you mean by that. As you say, it does not apply as given. > What do I have to do to get it to apply? Bug 480843. There's no real dependency, just the fact that two patches in my queue touch the same code, and I only decided to change this after getting sick of manipulating PLDHashTables in that bug. > PLDHashTable is a low-level implementation that is not the preferred hash > table for common uses where standard types are stored. This bug changes the > PLDHashTables uses in nsMsgDatabase to type-specific variants, resulting in > cleaner, easier to read code. I also take the time to fix up some of the ownership: nsMsgHdr now explicitly uses an nsCOMPtr for its nsIMdbRow--it always held a strong reference to it, now we use nsCOMPtr to explicitly manage it.
Comment on attachment 728244 [details] [diff] [review] Kill it dead Review of attachment 728244 [details] [diff] [review]: ----------------------------------------------------------------- Overall the patch is fine, but has bit rotted a bit since my review was delayed. I'm going to r- just so we discuss my few questions, but I probably won't give the next version much scrutiny before r+ I am suggesting that we cleanup at least one of memory management oddities, namely doing an addref in "new nsMsgHdr" ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp @@ +462,5 @@ > + hdr->GetMessageKey(&key); > + // TODO: Implement LRU replacement. > + if (m_cachedHeaders.Count() > m_cacheSize) > + ClearHdrCache(); > + nsIMsgDBHdr *bar = hdr; Why do you need this temporary instead of just using hdr directly? @@ +563,5 @@ > } > > +void nsMsgDatabase::ClearHdrCache() > +{ > + if (m_cachedHeaders.IsInitialized()) Is there any reason why we can't just initialize m_cachedHeaders to the default value in the initializer (like you do for the inuse cache) and get rid of all of these .IsInitialized() checks? @@ +597,5 @@ > + m_headersInUse.Put(key, hdr); > + > + // the hash table won't add ref, we'll do it ourselves > + // stand for the addref that CreateMsgHdr normally does. > + NS_ADDREF(hdr); This is really odd, non-standard behavior. Why do this here instead of the more conventional way of doing the addref in CreateMsgHdr? Also note that the comment in CreateMsgHdr now says: // don't need to addref here; GetHdrFromUseCache addrefs. which is wrong, if the odd behavior is kept, the addref is in AddHdrToUseCache. @@ +627,5 @@ > nsMsgHdr *msgHdr = new nsMsgHdr(this, hdrRow); > if(!msgHdr) > return NS_ERROR_OUT_OF_MEMORY; > msgHdr->SetMessageKey(key); > // don't need to addref here; GetHdrFromUseCache addrefs. As I said earlier, this comment is incorrect, and I would prefer if the addref was done here, in the conventional place. I don't expect *msgHdr = new ... to return an addrefed value. @@ +4029,3 @@ > { > nsresult rv = InitRefHash(); > if (NS_FAILED(rv)) Unlike the normal NS_ERROR_FAILURE from this method, a failure at this point is something that the devs should know about. Could we at least make this NS_ENSURE_SUCCESS(rv, rv) so that we see a warning if this is failing? Otherwise this is a completely silent failure. @@ +4037,5 @@ > +} > + > +void nsMsgDatabase::AddRefToHash(nsCString &reference, nsMsgKey threadId) > +{ > + if (m_msgReferences.IsInitialized()) Is there any good reason why we are not initializing here if needed? Previously this was a complex pain, but now it is relatively easy to add. It was not clear to me from looking a couple of levels deep that we could guarantee initialization. @@ +4052,5 @@ > > for (int32_t i = 0; i < numReferences; i++) > { > nsAutoCString reference; > Inserted line of just spaces? Please trim. @@ +4105,2 @@ > > // Populate table with references of existing messages Is there some reason that you are removing the null check here? In my own code, I would use NS_ENSURE_STATE(enumerator) here, as that is the shortest available version. That has not caught on in mailnews, but I am happy for you to start a trend :) @@ +4487,5 @@ > return NS_OK; // it's not an error not to find a msg hdr. > } > > nsIMsgDBHdr *nsMsgDatabase::GetMsgHdrForSubject(nsCString &subject) > { This method AFAICT has no users. Perhaps it makes more sense to just remove it instead of fixing it.
Attachment #728244 - Flags: review?(kent) → review-
(In reply to Kent James (:rkent) from comment #4) > I am suggesting that we cleanup at least one of memory management oddities, > namely doing an addref in "new nsMsgHdr" I originally avoided doing that to minimize regressions and avoid making the patch even larger, but I'm not against doing that in this patch. > > ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp > @@ +462,5 @@ > > + hdr->GetMessageKey(&key); > > + // TODO: Implement LRU replacement. > > + if (m_cachedHeaders.Count() > m_cacheSize) > > + ClearHdrCache(); > > + nsIMsgDBHdr *bar = hdr; > > Why do you need this temporary instead of just using hdr directly? I honestly have no idea what I was doing here. > @@ +563,5 @@ > > } > > > > +void nsMsgDatabase::ClearHdrCache() > > +{ > > + if (m_cachedHeaders.IsInitialized()) > > Is there any reason why we can't just initialize m_cachedHeaders to the > default value in the initializer (like you do for the inuse cache) and get > rid of all of these .IsInitialized() checks? The original reason, IIRC, was to honor the cache size if a non-custom value is set later. Looking in more detail in the code, this isn't an issue. > @@ +4037,5 @@ > > +} > > + > > +void nsMsgDatabase::AddRefToHash(nsCString &reference, nsMsgKey threadId) > > +{ > > + if (m_msgReferences.IsInitialized()) > > Is there any good reason why we are not initializing here if needed? > Previously this was a complex pain, but now it is relatively easy to add. It > was not clear to me from looking a couple of levels deep that we could > guarantee initialization. InitRefHash is a fairly expensive method. Looking at the code, this may be a performance enhancement: we only fill the table if we need to grab a reference to it. If we're trying to add an entry and it doesn't exist, then it would be filled properly when we lazily create the table at use. > @@ +4105,2 @@ > > > > // Populate table with references of existing messages > > Is there some reason that you are removing the null check here? > In my own code, I would use NS_ENSURE_STATE(enumerator) here, as that is the > shortest available version. That has not caught on in mailnews, but I am > happy for you to start a trend :) It is my understanding that ::operator new is now infallible in mozilla code, so it would abort instead of return NULL.
Joshua, I know you're quite busy, so just touching base before this possibly bitrots horribly ... Is there any reason we might want this for TB31? (This probably should bake well and ride the train if we go out for TB31)
Flags: needinfo?(Pidgeot18)
(In reply to Wayne Mery (:wsmwk) from comment #6) > Joshua, I know you're quite busy, so just touching base before this possibly > bitrots horribly ... > > Is there any reason we might want this for TB31? > > (This probably should bake well and ride the train if we go out for TB31) The main benefit of this is cleaning up some old code, so it's not at all a priority.
Flags: needinfo?(Pidgeot18)
Jorg, would these pointer games be something for you? There was a review done, the patch just needs polishing.
Flags: needinfo?(jorgk)
OS: Windows 7 → All
Hardware: x86_64 → All
Hey, Aceman, you went into clean-up mode, but I'm still trying to fix bugs our users will benefit from.
Flags: needinfo?(jorgk)
If this reduces failure points or helps the memory reporters be more accurate (which they currently do not) - and I don't know that it will - then it will be a win for users, just not user facing.
Status: ASSIGNED → NEW
Whiteboard: [patchlove]
Refreshed to latest trunk. Sadly there are a few things that didn't compile here. Most importantly classes nsDataHashtable and nsInterfaceHashtable don't have methods 'Init' and 'IsInitialised' any more. Looks like the need to be initialised in the constructor now. The worst error is in nsMsgDatabase::CreateMsgHdr(): nsMsgDatabase.cpp(685): error C2664: 'nsMsgHdr::nsMsgHdr(const nsMsgHdr &)': cannot convert argument 2 from 'nsIMdbRow *' to 'already_AddRefed<nsIMdbRow>' One option to fix this was to revert these changes: - nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow); + nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow); - nsIMdbRow *m_mdbRow; + nsCOMPtr<nsIMdbRow> m_mdbRow; -nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow) +nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow) The second option was to use a static cast. No idea whether that's correct or not. Surprisingly TB even runs with this patch ;-) (In reply to :aceman from comment #8) > Jorg, would these pointer games be something for you? There was a review > done, the patch just needs polishing. That's extreme wishful thinking. Apart from being a pain to refresh, the underlying technology has changed and some stuff doesn't compile any more, most likely since M-C tightened some screws. Aceman, would you like to lend a hand here before we ask Kent for help?
Flags: needinfo?(acelists)
Yes, I can look at the new version.
Attached patch 853881-kill-pldhash.patch (v2). (obsolete) (deleted) — Splinter Review
Hi Kent, I've refreshed the patch and solved all the compile problems. However, I inserted a static cast as marked with XXX. I have addressed some of the issues from your comment #4 (removed 'bar' and 'IsInitialised' had to be removed anyway since the base class doesn't have such a method (any more?)). There are a few comments left re. refcounting and error handling. Since this abandoned code fell into my lap and I spent a few hours getting it into shape, could I ask you for your collaboration here to make the adjustments you require yourself. Sorry for the unusual and/or outrageous request, but the idea here is to save Joshua's work, and instead of going through a few loops your direct input would be much appreciated. Aceman, I solved the compile problems myself, and I'm asking Kent to finish it off.
Attachment #8839599 - Attachment is obsolete: true
Flags: needinfo?(acelists)
Attachment #8839731 - Flags: review?(rkent)
OK, then. But I played with the patch and saw some compile warnigs: 1. reorder variables in the constructor (according to the order in nsMsgDatabase.h): In file included from tbird-bin/dist/include/nsMailDatabase.h:10:0, from mailnews/db/msgdb/src/nsMsgDatabase.cpp:10: tbird-bin/dist/include/nsMsgDatabase.h: In constructor ‘nsMsgDatabase::nsMsgDatabase()’: tbird-bin/dist/include/nsMsgDatabase.h:438:48: warning: ‘nsMsgDatabase::m_headersInUse’ will be initialized after [-Wreorder] nsDataHashtable<nsMsgKeyHashKey, nsMsgHdr *> m_headersInUse; ^ tbird-bin/dist/include/nsMsgDatabase.h:434:12: warning: ‘uint32_t nsMsgDatabase::m_cacheSize’ [-Wreorder] uint32_t m_cacheSize; ^ mailnews/db/msgdb/src/nsMsgDatabase.cpp:924:1: warning: when initialized here [-Wreorder] nsMsgDatabase::nsMsgDatabase() ^ 2. not sure yet what to do with this one. Do we have a problem if hdr is not added to cache? Can we just ignore the return value? mailnews/db/msgdb/src/nsMsgDatabase.cpp: In member function ‘void nsMsgDatabase::AddHdrToCache(nsIMsgDBHdr*, nsMsgKey)’: mailnews/db/msgdb/src/nsMsgDatabase.cpp:505:55: warning: ignoring return value of ‘bool nsBaseHashtable<KeyClass, DataType, UserDataType>::Put(nsBaseHashtable<KeyClass, DataType, UserDataType>::KeyType, const UserDataType&, const fallible_t&) [with KeyClass = mozilla::mailnews::MsgKeyHashKey; DataType = nsCOMPtr<nsIMsgDBHdr>; UserDataType = nsIMsgDBHdr*; nsBaseHashtable<KeyClass, DataType, UserDataType>::KeyType = const unsigned int&; nsBaseHashtable<KeyClass, DataType, UserDataType>::fallible_t = mozilla::fallible_t]’, declared with attribute warn_unused_result [-Wunused-result] m_cachedHeaders.Put(key, hdr, mozilla::fallible_t()); ^ 3. looks like this function just needs removing: mailnews/db/msgdb/src/nsMsgDatabase.cpp: At global scope: mailnews/db/msgdb/src/nsMsgDatabase.cpp:3950:1: warning: ‘void msg_DHashFreeStringKey(PLDHashTable*, PLDHashEntryHdr*)’ defined but not used [-Wunused-function] msg_DHashFreeStringKey(PLDHashTable* aTable, PLDHashEntryHdr* aEntry)
Aceman, can you be a charm and fix those warnings for me and then r?rkent again. Re. item 2: We have two options: Either put a MOZ_ASSEERT(???, "Logic error: Entry already in cache"); or explicitly discard the return value. This is the code that got removed: -nsresult nsMsgDatabase::AddHdrToCache(nsIMsgDBHdr *hdr, nsMsgKey key) // do we want key? We could get it from hdr -{ - if (m_bCacheHeaders) - { - if (!m_cachedHeaders) - m_cachedHeaders = PL_NewDHashTable(&gMsgDBHashTableOps, (void *) nullptr, sizeof(struct MsgHdrHashElement), m_cacheSize ); - if (m_cachedHeaders) - { - if (key == nsMsgKey_None) - hdr->GetMessageKey(&key); - if (m_cachedHeaders->entryCount > m_cacheSize) - ClearHdrCache(true); - PLDHashEntryHdr *entry = PL_DHashTableOperate(m_cachedHeaders, (void *) key, PL_DHASH_ADD); - if (!entry) - return NS_ERROR_OUT_OF_MEMORY; // XXX out of memory - - MsgHdrHashElement* element = reinterpret_cast<MsgHdrHashElement*>(entry); - element->mHdr = hdr; - element->mKey = key; - NS_ADDREF(hdr); // make the cache hold onto the header - return NS_OK; - } - } - return NS_ERROR_FAILURE; So as far as I can see, the entry is just added, duplicate or not, hmm. What do you read here?
Whiteboard: [patchlove]
If it is a duplicate and it can't be added, we could silently ignore it. I rather asked if it can't be added to cache due to out of memory or something. The original code returns with NS_ERROR_OUT_OF_MEMORY. But it seems we freely empty the whole cache if it is full so it seems it's not critical if an item isn't in the cache (can't be added). I would just ignore the return value of the m_cachedHeaders.Put(). In the new code AddHdrToCache is void so we never check if it succeeded. It probably doesn't matter.
Attached patch 853881-kill-pldhash.patch (v2.1) (obsolete) (deleted) — Splinter Review
Solved the 3 compile warnings.
Attachment #8839731 - Attachment is obsolete: true
Attachment #8839731 - Flags: review?(rkent)
Attachment #8839830 - Flags: review?(rkent)
Comment on attachment 8839830 [details] [diff] [review] 853881-kill-pldhash.patch (v2.1) Review of attachment 8839830 [details] [diff] [review]: ----------------------------------------------------------------- It still applies and compiles, and I ran it and nothing seemed to blow up. Also looked over the code and did not see any obvious issues. Many of us have looked at this, I think this is enough, so let's just land it. r+=me
Thanks for looking over it again, but in comment #4 you had some issues which are *not* resolved, let me repeat them here: ==== Quote from comment #4 ==== I am suggesting that we cleanup at least one of memory management oddities, namely doing an addref in "new nsMsgHdr". @@ +597,5 @@ > + m_headersInUse.Put(key, hdr); > + > + // the hash table won't add ref, we'll do it ourselves > + // stand for the addref that CreateMsgHdr normally does. > + NS_ADDREF(hdr); This is really odd, non-standard behavior. Why do this here instead of the more conventional way of doing the addref in CreateMsgHdr? Also note that the comment in CreateMsgHdr now says: // don't need to addref here; GetHdrFromUseCache addrefs. which is wrong, if the odd behavior is kept, the addref is in AddHdrToUseCache. @@ +627,5 @@ > nsMsgHdr *msgHdr = new nsMsgHdr(this, hdrRow); > if(!msgHdr) > return NS_ERROR_OUT_OF_MEMORY; > msgHdr->SetMessageKey(key); > // don't need to addref here; GetHdrFromUseCache addrefs. As I said earlier, this comment is incorrect, and I would prefer if the addref was done here, in the conventional place. I don't expect *msgHdr = new ... to return an addrefed value. ==== End Quote from comment #4 ==== There is this additional issue where I had to add a static cast: + // XXX Compile problem here: Added static_cast. + nsMsgHdr *msgHdr = new nsMsgHdr(this, static_cast<already_AddRefed<nsIMdbRow>>(hdrRow)); I noticed that I also haven't addressed these: ==== Quote from comment #4 ==== @@ +4029,3 @@ > { > nsresult rv = InitRefHash(); > if (NS_FAILED(rv)) Unlike the normal NS_ERROR_FAILURE from this method, a failure at this point is something that the devs should know about. Could we at least make this NS_ENSURE_SUCCESS(rv, rv) so that we see a warning if this is failing? Otherwise this is a completely silent failure. @@ +4487,5 @@ > nsIMsgDBHdr *nsMsgDatabase::GetMsgHdrForSubject(nsCString &subject) > { This method AFAICT has no users. Perhaps it makes more sense to just remove it instead of fixing it. ==== End Quote from comment #4 ==== While I can address the last two issues, I think I need your input on the ones above.
Flags: needinfo?(rkent)
Actually, what is the benefit of - nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow); + nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow); - nsIMdbRow *m_mdbRow; + nsCOMPtr<nsIMdbRow> m_mdbRow; -nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, nsIMdbRow *dbRow) +nsMsgHdr::nsMsgHdr(nsMsgDatabase *db, already_AddRefed<nsIMdbRow> dbRow) which makes this + // XXX Compile problem here: Added static_cast. + nsMsgHdr *msgHdr = new nsMsgHdr(this, static_cast<already_AddRefed<nsIMdbRow>>(hdrRow)); necessary? Looking at the caller of nsMsgDatabase::CreateMsgHdr() where the static cast was added, the get the rows from: m_mdbStore->GetRow() mRowCursor->NextRow(), mRowCursor->PrevRow() m_mdbStore->NewRow() m_mdbStore->FindRow() so I guess those rows are already AddRef'ed, so the static cast doesn't do any damage. So with this conclusion there are still the issues I repeated in comment #19 from comment #4, that is, the odd behaviour that "new nsMsgHdr" returns an AddRef'ed object where the AddRef happens in AddHdrToUseCache. OK, writing these comments, I had to look into the issues further, let me come up with another patch. So far I've only refreshed it and made it compile, so now I have to understand the issues :-(
Flags: needinfo?(rkent)
(In reply to Jorg K (GMT+1) from comment #20) > Looking at the caller of nsMsgDatabase::CreateMsgHdr() where the static cast > was added, they get the rows from: > m_mdbStore->GetRow() > mRowCursor->NextRow(), mRowCursor->PrevRow() > m_mdbStore->NewRow() and m_mdbStore->NewRowWithOid() > m_mdbStore->FindRow() > so I guess those rows are already AddRef'ed, so the static cast doesn't do > any damage. OK, just for a bit of homework I looked at all those Mork functions: morkStore::GetRow() calls morkRow::AcquireRowObject()/morkRow::AcquireRowObject and that does an AddRef() on the row it returns. morkTableRowCursor::PrevRow() and morkTableRowCursor::NextRow() also run the acquire, same for morkStore::NewRowWithOid() and morkStore::NewRow(). And also morkStore::FindRow().
I spent a lot of time trying to understand the issues here: 1) Why is it safe to do that static cast, answer in comment #21. I added a comment to that effect. 2) Added NS_ENSURE_SUCCESS(rv, rv); after InitRefHash(); as Kent had requested. 3) Remove unused function nsMsgDatabase::GetMsgHdrForSubject() Also: 4) Shifted the NS_ADDREF(hdr); from nsMsgDatabase::AddHdrToUseCache() to a NS_ADDREF(*result = msgHdr) in nsMsgDatabase::CreateMsgHdr(). In the end I tested with the patch and at first it crashed at shutdown, upon further testing it crashed after visiting a few folders, local folders, news folders, IMAP folders. The crash is always at: morkRowObject.cpp#572 - MORK_ASSERT(row->mRow_Object == this); with an access violation. Row is not null, but row->mRow_Object can't be accessed. Stack: > xul.dll!morkRowObject::CloseRowObject(morkEnv * ev) Line 572 C++ xul.dll!morkRowObject::~morkRowObject() Line 61 C++ [External Code] xul.dll!morkObject::Release() Line 35 C++ xul.dll!morkRowObject::Release() Line 88 C++ xul.dll!nsMsgHdr::~nsMsgHdr() Line 108 C++ [External Code] xul.dll!nsMsgHdr::Release() Line 18 C++ I've also undone 4) and it still blows up big time. Form comment #8: > the patch just needs polishing Haha, that was just wishful thinking. And debugging it now that the original author is not available, won't be much fun at all. Aceman, can you apply version 2.1 locally and bash some folders and see whether it crashes for you.
Flags: needinfo?(acelists)
Yes, I get the same crash: #04: morkRowObject::CloseRowObject(morkEnv*) (TB-hg/db/mork/src/morkRowObject.cpp:572) #05: morkRowObject::~morkRowObject() (TB-hg/db/mork/src/morkNode.h:254) #06: morkRowObject::~morkRowObject() (TB-hg/db/mork/src/morkRowObject.cpp:63) #07: morkObject::Release() (TB-hg/db/mork/src/morkObject.cpp:35 (discriminator 5)) #08: morkRowObject::Release() (TB-hg/db/mork/src/morkRowObject.cpp:88) #09: nsMsgHdr::~nsMsgHdr() (TB-hg/tbird-bin/dist/include/nsTArray.h:397) #10: nsMsgHdr::~nsMsgHdr() (TB-hg/mailnews/db/msgdb/src/nsMsgHdr.cpp:108) #11: nsMsgHdr::Release() (TB-hg/mailnews/db/msgdb/src/nsMsgHdr.cpp:18 (discriminator 5))
Flags: needinfo?(acelists)
Comment on attachment 8839830 [details] [diff] [review] 853881-kill-pldhash.patch (v2.1) This is an obsolete r? from an earlier patch. Looking at the crash issues associated with the row, really the attempted improvements in row refcount handling is not really related to the core of this patch, which is to remove the pldhash stuff. We should really just return that row handling to its earlier state and try to land the majority of the patch. On other issues with the patch, I review the older patch and saw the same issues I commented on, but they are not all that important compared to just getting this landed.
Attachment #8839830 - Flags: review?(rkent)
Same as (2.1) but refreshed on Windows.
Attachment #8839830 - Attachment is obsolete: true
Attached patch 853881-kill-pldhash.patch (v3). (deleted) — Splinter Review
OK, this doesn't crash any more. As Kent suggested, I reverted some of the ref-counting business. Also addressed here: - removed nsMsgDatabase::GetMsgHdrForSubject() - Added NS_ENSURE_SUCCESS(rv, rv) after InitRefHash(). - Shifted Addref from AddHdrToUseCache() to GetHdrFromUseCache() for more standard behaviour as requested by Kent. I hope the interdiff will show the changes. I guess this is good to go now after a try run.
Yep, interdiff is our friend here and shows exactly what I mentioned: https://bugzilla.mozilla.org/attachment.cgi?oldid=8860312&action=interdiff&newid=8860334&headers=1 Let's give this a good workout: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=a835ed3abc23d3ea8a2304dd09531bb5d12e1b6b Oops, Mac didn't compile due to bug 1358444, so here's another one for Mac: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=c99ae0b96aaf675ce033c52819fdf28fb16f874a BTW, landing that on try gave me the opportunity to look over the changes in a different format than "Splinter review", so I think they look good now.
Comment on attachment 8860382 [details] [diff] [review] 1358444-comma2.patch (v4). Oops, attached to wrong bug, the try is still relevant.
Attachment #8860382 - Attachment is obsolete: true
Attachment #8860334 - Attachment is obsolete: false
I'm going to land (v4) in the next comment. For the record: (v4) has the original code: nsMsgHdr::~nsMsgHdr() { if (m_mdbRow) { if (m_mdb) { NS_RELEASE(m_mdbRow); m_mdb->RemoveHdrFromUseCache(this, m_messageKey); } } NS_IF_RELEASE(m_mdb); } (v3) had: nsMsgHdr::~nsMsgHdr() { if (m_mdb) m_mdb->RemoveHdrFromUseCache(this, m_messageKey); NS_IF_RELEASE(m_mdb); } (v3) and (v4) do not crash :-), but it appears that (v3) does less free'ing. The following (v3-crash) crashes in exactly the same way (v2.1) crashes, see comment #22 and comment #23. 'row' was basically free'ed already when accessing 'row->mRow_Object': https://dxr.mozilla.org/comm-central/rev/5e4e889f13eb1fd5091f60921a1566c661f2c630/db/mork/src/morkRowObject.cpp#572 So a case of double-free: nsMsgHdr::~nsMsgHdr() { NS_IF_RELEASE(m_mdbRow); <==== added if (m_mdb) m_mdb->RemoveHdrFromUseCache(this, m_messageKey); NS_IF_RELEASE(m_mdb); } Aha!! Let's look again at (2.1). That has a tricky ref-counting scheme where - nsIMdbRow *m_mdbRow; + nsCOMPtr<nsIMdbRow> m_mdbRow; m_mdbRow is ref counted, this is apparently the same as (v3-crash) since m_mdbRow will be released when ref-counted down in nsMsgHdr::~nsMsgHdr(), only that it's already been free'ed. So Kent, with this research, do you understand the situation? Can you make sense of the code in nsMsgHdr::~nsMsgHdr()? I went for the non-crashing original version. In fact, like this, nsMsgHdr.cpp/nsMsgHdr.h don't change at all in this patch, apart from some type adjustments.
Flags: needinfo?(rkent)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 55.0
Attachment #8860569 - Attachment description: 853881-kill-pldhash.patch (v4). → 853881-kill-pldhash.patch (v4) [landed in comment #34]
Hmm, after this push I see Mozmill failures on Windows, but not Linux or Mac (Mac has other failures now, bug 1358711). So another try with this backed out: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=91e16cded787765b69b00208cb161400db74f669 Looks like the crash happens in: https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32-debug/1492843819/comm-central_win7_ix-debug_test-mozmill-bm112-tests1-windows-build19.txt.gz TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-selection.js | test_selection_extension I'll run it locally.
Test passes locally, so let's see what the next Daily does.
Flags: needinfo?(acelists)
I only see a failure in mozmill\folder-display\test-mail-views.js | test_verify_saved_mail_view . The test-selection.js isn't even in the log. Is TB crashing before that test?
Flags: needinfo?(acelists)
Hmm, are we looking at the same log? https://archive.mozilla.org/pub/thunderbird/nightly/2017/04/2017-04-22-03-02-07-comm-central/comm-central_win7_ix_test-mozmill-bm126-tests1-windows-build18.txt.gz 04:27:07 INFO - TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-mail-views.js | test_verify_saved_mail_view 04:27:07 INFO - Timeout: bridge.execFunction("1d7ea19e-274e-11e7-9660-002590c5f2bf", bridge.registry["{c5fef04f-61af-4bb1-8e21-092fdea3ac65}"]["runTestDirectory"], ["C:\\slave\\test\\build\\tests\\mozmill\\folder-display"]) 04:27:07 WARNING - TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed Well, I was looking in https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32-debug/1492843819/comm-central_win7_ix-debug_test-mozmill-bm112-tests1-windows-build19.txt.gz 01:44:07 INFO - TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-selection.js | test_selection_extension (snip) 01:46:08 INFO - mozcrash INFO | Copy/paste: folder-display C:\slave\test\build\tests\mozmill\mozmillprofile\minidumps\116ffb50-fac0-4846-9afe-d3ddfb5a5cc9.dmp C:\slave\test\build\symbols 01:46:08 INFO - Timeout: bridge.execFunction("e0b8bc8f-2736-11e7-951f-002590c5411b", bridge.registry["{aea58b0f-faba-4ec6-b59d-f144899822e9}"]["runTestDirectory"], ["C:\\slave\\test\\build\\tests\\mozmill\\folder-display"]) 01:46:08 WARNING - TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed But there is also: https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32/1492843819/comm-central_win7_ix_test-mozmill-bm112-tests1-windows-build16.txt.gz 01:29:56 INFO - TEST-START | C:\slave\test\build\tests\mozmill\folder-display\test-message-commands-on-msgstore.js | test_mark_messages_replied 01:29:56 INFO - Timeout: bridge.execFunction("5f7802e1-2735-11e7-af6e-002590c669e3", bridge.registry["{90d5f1a7-2a4b-406e-b1b0-e914e62050ff}"]["runTestDirectory"], ["C:\\slave\\test\\build\\tests\\mozmill\\folder-display"]) 01:29:56 WARNING - TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed Great, so the crash is moving around. Does the code in comment #33 make any sense to you? You've worked on aspects of the database.
Flags: needinfo?(acelists)
I can't see any of those failures locally, even running whole folder-display folder passes fine (on Linux). So it's not sure this patch caused the problem. I'm not familiar with the code in question. I'd think jcranmer would know best as the author of the rework.
Flags: needinfo?(acelists)
I ran the whole folder with mozmake SOLO_TEST=folder-display mozmill-one and I do see the crash: [5224] ###!!! ASSERTION: row->mRow_Object == this: 'Error', file c:/mozilla-source/comm-central/db/mork/src/morkConfig.cpp, line 24 #01: morkRowObject::CloseRowObject (c:\mozilla-source\comm-central\db\mork\src\morkrowobject.cpp:573) #02: morkRowObject::~morkRowObject (c:\mozilla-source\comm-central\db\mork\src\morkrowobject.cpp:61) #03: morkRowObject::`scalar deleting destructor'[c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\dist\bin\xul.dll +0x77cd24] #04: morkObject::Release (c:\mozilla-source\comm-central\db\mork\src\morkobject.cpp:35) #05: morkRowObject::Release (c:\mozilla-source\comm-central\db\mork\src\morkrowobject.cpp:88) #06: nsMsgHdr::~nsMsgHdr (c:\mozilla-source\comm-central\mailnews\db\msgdb\src\nsmsghdr.cpp:111) #07: nsMsgHdr::`scalar deleting destructor'[c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\dist\bin\xul.dll +0xa0f6ec] #08: nsMsgHdr::Release (c:\mozilla-source\comm-central\mailnews\db\msgdb\src\nsmsghdr.cpp:18) #09: ReleaseObjects (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nscomarray.cpp:272) #10: nsCOMArray_base::Clear (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nscomarray.cpp:282) #11: nsArrayBase::~nsArrayBase (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nsarray.cpp:41) #12: nsArrayCC::DeleteCycleCollectable (c:\mozilla-source\comm-central\mozilla\xpcom\ds\nsarray.cpp:50) #13: SnowWhiteKiller::~SnowWhiteKiller (c:\mozilla-source\comm-central\mozilla\xpcom\base\nscyclecollector.cpp:2656) #14: nsCycleCollector::FreeSnowWhite (c:\mozilla-source\comm-central\mozilla\xpcom\base\nscyclecollector.cpp:2840) #15: AsyncFreeSnowWhite::Run (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcjsruntime.cpp:147) #16: nsThread::ProcessNextEvent (c:\mozilla-source\comm-central\mozilla\xpcom\threads\nsthread.cpp:1271) #17: XPTC__InvokebyIndex (c:\mozilla-source\comm-central\mozilla\xpcom\reflect\xptcall\md\win32\xptcinvoke_asm_x86_64.asm:99) #18: CallMethodHelper::Call (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcwrappednative.cpp:1331) #19: XPCWrappedNative::CallMethod (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcwrappednative.cpp:1296) #20: XPC_WN_CallMethod (c:\mozilla-source\comm-central\mozilla\js\xpconnect\src\xpcwrappednativejsops.cpp:983) It's the crash we already know. At least now I can try some options locally.
https://hg.mozilla.org/comm-central/rev/62bfaab2731fc47ee7ed26aefa001e63c4f2c371 (comment #34) https://hg.mozilla.org/comm-central/rev/5c3191edbbd4fe552450b582d520ba9be4cd431b The follow-up takes us to (v3) or (v2.1) as far as the DTOR of nsMsgHdr is concerned. That should avoid crashes due to double-freeing the same DB row twice. Gotta love Mork :-(
"Can you make sense of the code in nsMsgHdr::~nsMsgHdr()?" I've tried to understand the mork reference counting before with little success. It was all converted from very early Mozilla code, and (as we say here) trying to "fix" things usually causes more problems than it solves.
Flags: needinfo?(rkent)
Depends on: 1360873
Depends on: 1368786
Depends on: 1415723
Comment on attachment 8860569 [details] [diff] [review] 853881-kill-pldhash.patch (v4) [landed in comment #34] Review of attachment 8860569 [details] [diff] [review]: ----------------------------------------------------------------- Okay I took a look at this (I didn't dive in too deep, just made some notes for clarification). I think the general problem is you're addrefing the wrong stuff and doing a ton of churn. AFAICT previously: 1) m_CachedHeaders is supposed to be some sort of MRU cache, but it isn't really used correctly / at all 2) m_HeadersInUse is supposed to be some sort of mapping of all the headers that have been created, it holds refs (kind of...) Now you've made m_CachedHeaders be the thing that holds refs, but it has a lot of churn due to the MRU "algorithm". My guess is you're also leaking. ::: mailnews/db/msgdb/public/nsMsgDatabase.h @@ +139,5 @@ > + KeyType GetKey() const { return mValue; } > + bool KeyEquals(KeyTypePointer aKey) const { return *aKey == mValue; } > + > + static KeyTypePointer KeyToPointer(KeyType aKey) { return &aKey; } > + static PLDHashNumber HashKey(KeyTypePointer aKey) { return *aKey; } Note: this hash function is different than the previous: > return PLDHashNumber(NS_PTR_TO_INT32(aKey)); It's subtle but this is using an unsigned type directly, the NS_PTR_TO_INT32 is casting to a signed type. It probably doesn't matter, but it's different. Overall this could have just been: using MsgKeyHashKey = nsUint32HashKey; @@ -350,5 @@ > mdb_token m_threadNewestMsgDateColumnToken; > mdb_token m_offlineMsgOffsetColumnToken; > mdb_token m_offlineMessageSizeColumnToken; > > - // header caching stuff - MRU headers, keeps them around in memory I don't think this is ever actually used...like we add stuff to it and remove stuff from it, but never look up anything in it. TBH you could just delete this and file a bug to add it back. @@ +377,5 @@ > mdb_token m_offlineMsgOffsetColumnToken; > mdb_token m_offlineMessageSizeColumnToken; > > + // The following methods use m_cachedHeaders to keep strong references to > + // recent header objects around. You've inverted this. This is supposed to be the temporary most recently used cache (the implementation is awful but whatever that's not your fault) and shouldn't hold refs. @@ -354,5 @@ > - // header caching stuff - MRU headers, keeps them around in memory > - nsresult AddHdrToCache(nsIMsgDBHdr *hdr, nsMsgKey key); > - nsresult ClearHdrCache(bool reInit); > - nsresult RemoveHdrFromCache(nsIMsgDBHdr *hdr, nsMsgKey key); > - // all headers currently instantiated, doesn't hold refs So this is a lie, it definitely used to hold refs. @@ +382,5 @@ > + void AddHdrToCache(nsIMsgDBHdr *hdr, nsMsgKey key); > + void ClearHdrCache(); > + void RemoveHdrFromCache(nsIMsgDBHdr *hdr, nsMsgKey key); > + > + // These methods maintain a cache of all headers for the same message key. This isn't correct. They're supposed to hold all the existing headers mapped by key. @@ +430,5 @@ > return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf); > } > private: > uint32_t m_cacheSize; > + nsInterfaceHashtable<nsMsgKeyHashKey, nsIMsgDBHdr> m_cachedHeaders; Again this is inverted, this should be the nsDataHashtable and m_headersInUse should be the nsInterfaceHashtable. ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp @@ +497,5 @@ > +{ > + if (key == nsMsgKey_None) > + hdr->GetMessageKey(&key); > + // TODO: Implement LRU replacement. > + if (m_cachedHeaders.Count() > m_cacheSize) And here is why it matters, you have a *ton* of addref churn since this is a interface table holding refs. @@ -510,5 @@ > - > - MsgHdrHashElement* element = static_cast<MsgHdrHashElement*>(entry); > - element->mHdr = hdr; > - element->mKey = key; > - NS_ADDREF(hdr); // make the cache hold onto the header But wait, this is addref'ing too, so ugh. Both tables were holding refs. @@ -834,2 @@ > msgHdr->SetMessageKey(key); > - // don't need to addref here; GetHdrFromUseCache addrefs. Except we didn't successfully retrieve it from GetHdrFromUseCache...nsMsHdr does call AddHeaderToUseCache with *does* addref it. So what's weird here is if we find it in the use cache we do an *additional* addref, but if we don't find it in use cache we don't addref it. Maybe that makes sense maybe it doesn't. That's kind of sort of fixed in your new version, but since you've inverted the meanings I think it's still probably messed up. @@ -837,2 @@ > > AddHdrToCache(msgHdr, key); But I guess AddHdrToCache used to addref too.
Thanks for the great analysis Eric. Jcranmer, can you follow from that? This really needs fixing and you may know best what you wanted to implement here. The patch should then go into bug 1415723. Thanks.
Flags: needinfo?(Pidgeot18)
I've backed this out in bug 1415723. We can try again in bug 1417018.
Flags: needinfo?(Pidgeot18)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: