Closed Bug 189832 Opened 22 years ago Closed 22 years ago

another case where fastload file can get corrupted (post bug 188744)

Categories

(Core :: XUL, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jrgmorrison, Assigned: brendan)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed1.3)

Attachments

(2 files, 3 obsolete files)

This is a comment carried over from bug 188744. I need to look into this more. --- Additional Comment #5 From John Morrison 2003-01-16 22:35 ------- The corruption I'm seeing is reproducible as follows: 1) start nav; quit 2) start nav; quit 3) start nav; start msgcompo; quit 4) start nav; start messenger; triggers commonDialog; \ start messengercompose.xul; send a message (get \ progress dialog, and cert dialog [clientauthask.xul] \ quit 5) start messenger (-mail and avoid profile manager [-P]); \ get commonDialog login; start AccountManager.xul; \ ASSERT!!! and bork mObjectMap What I'm seeing in debug: I come in at nsFastLoadFileUpdater::Open, where it populates the object map. That all goes fine and I have 196 objects with first mOID being 8. Next, I pick it up on first call to nsFastLoadFileWriter::WriteObjectCommon after the PL_DHASH_ADD and inside of the !entry->mObject test, so we've added another entry. But entryCount is 196. Huh? When I inspect raw memory for mObjectMap, the entry for mOID of 8 is gone (?). Now, the entry count of 196 means we now get a duplicate mOID of 1576 in the map and that leads to trouble down the road. But I can't see where something would kill that first entry. [By the way, I have reproduced this in todays' mozilla win32 build so this corruption is independent of this fix, it would seem]. (I'm also referring to the untagged values for mOID [(index+1)<<3]).
I need to go back and confirm that I really did add an entry to the hash as described above. And need to trace through where the first entry is "going".
Status: NEW → ASSIGNED
Actually, I was somewhat confused earlier (what's new) > That all goes fine and I have 196 objects with first mOID being 8. Actually, that's not true. In nsFastLoadFileUpdater::Open, aReader->mFooter.mNumSharpObjects is equal to 197. I start populating the mObjectMap hash with the vector from the disk file. At index 80 of the vector, I get back an 'obj' (mReadEntry) with the same value that I had for the entry at index 0 of the vector. So, the PL_DHASH_ADD winds up overwriting that hash entry. At the end of the loop, 197 disk entries have resulted in 196 hash entries. (And the last entry has mOID of 1576 (index 196)). When it goes to write this mObjectMap out to disk, it can't find an entry with index 0 because it was overwritten above. It may also do an ABW, because it allocates a vector of 196 elements and then tries to copy into 'objvec[196]' (although I need to go check this).
[Just dumping some notes here for my own memory as I flail about in this code. Still pondering if this is an edge case that doesn't have bad affects or if it's something more...]. So, that first object (objvec[0]) is (no surprise) an nsSystemPrincipal (nsIPrincipal) with the initial serializing call to nsFastLoadFileWriter::WriteObjectCommon coming from nsTranscodeJSPrincipals. objvec[80] (i.e., mOID 648) gets added to object map when I open messengercompose.xul from navigator.xul, after a session where I had previously serialized all of navigator.xul into the fastload file. (Note below that objvec[0] is read in with obj of 0x9. (Didn't I earlier serialize an obj ref into that slot?)). *x*: StartMuxDoc 035CB8C8, R, 016C9F38, 00000000, chrome://messenger/content/messengercompose/messengercompose.xul $$$: nsFastLoadFileUpdater::Open, i = 0, obj = 0x00000009, oid = 8, off = 709 (2c5), str = 807, wk = 0 $$$: nsFastLoadFileUpdater::Open, i = 1, obj = 0x00000011, oid = 16, off = 304195 (4a443), str = 1, wk = 0 $$$: nsFastLoadFileUpdater::Open, i = 2, obj = 0x00000019, oid = 24, off = 305865 (4aac9), str = 1, wk = 0 ...
I've found that I can get a number of related assertions when I just open with '-browser -mail -edit' and then clicking around in the pref tree to switch panels. I also got one hang and one crash in a current trunk opt. build, so we still need to fix this bug. It all revolves around the serialization / deserialization of the nsSystemPrincipal object in a situation where there are several "not-live" instances in the fastload file and one (or more) instance that are live in memory and then triggering nsFastLoadFileUpdater::Open for a dialog that is not already in the fastload file. However, I don't really understand the problem fully enough to directly fix it for 1.3. I do know (or think I know) how I can add a band-aid to ::Open so that it will defend against some of the badness, and I'm going to work on that. Simpler testcase: 1) set up mailnews to not login to the server on startup (so we don't have to deal with commonDialog.xul). 2) delete the fastload file. 3) start with "./mozilla -P myProf -browser" and quit. 4) start with "./mozilla -P myProf -browser", open message compose and quit. 5) start with "./mozilla -P myProf -mail", and quit. 6) start with "./mozilla -P myProf -mail", and open any dialog (which given the above will not be in the fastload file). 7) mObjectMap will be corrupted (actually, add an assertion to the top of nsFastLoadFileWriter::WriteSharpObjectInfo that says the following to know that you are doomed) NS_ASSERTION(aInfo.mCIDOffset != 0 && aInfo.mCIDOffset < 5000000, "about to hork aInfo.mCIDOffset!!"); [where 5000000 is an arbitrary upper bound on the size of the fastload file; it should never be this big in practice.] The (sucky) defense is that in nsFastLoadFileUpdater::Open, the mCIDOffset values must monotonically increase by definition. If they don't, then the fastload file is broken and we should (unfortunately, no other choice right now) abort the fastload. I just have to make sure that the higher level xul prototype and js code will recover gracefully from an abort at this point.
Flags: blocking1.3b+
(this is the real fastload blocker, methinks).
Flags: blocking1.3b+ → blocking1.3b?
I need to test this further and think about whether this is what I want to do for 1.3b. But this seems like a reasonable way to guard against this bug (albeit by occasionally blowing away the fastload file :-/ ).
Flags: blocking1.3b? → blocking1.3b+
Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map actually, I don't have any changes to make. This is the patch. jst: could you r or sr? I went over this with jag for about a half hour already. If you have any questions, let me know.
Attachment #112946 - Attachment description: draft patch to guard against bogus entries in the object map → patch to guard against bogus entries in the object map
Attachment #112946 - Flags: superreview?(jst)
Attachment #112946 - Flags: review?(jaggernaut)
I should note that at '@@ -1624,6 +1631,11 @@' where I don't bail out, it's okay because we will pick the bogus zero offset out at the beginning of the next fastload session, in either of the two other places where I bail.
Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map + //XXXjrgm bug #189832 [temp. hack]; if an offset is zero, we've hit + // the bug and most abort fastload now. + NS_ASSERTION(aInfo->mCIDOffset != 0, + "fastload reader: Offset into file cannot be zero!"); + if (aInfo->mCIDOffset == 0) + return NS_ERROR_UNEXPECTED; Change assert(!foo); if (foo) { ... } to if (!foo) { NS_ERROR(); ... }, and replace "most" with "must" in the comment. Same thing in the last hunk of this patch. sr=jst with that change.
Attachment #112946 - Flags: superreview?(jst) → superreview+
Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map r=ben@netscape.com
Attachment #112946 - Flags: review?(jaggernaut) → review+
Comment on attachment 112946 [details] [diff] [review] patch to guard against bogus entries in the object map a=asa (on behalf of drivers) for checkin to 1.3beta
Attachment #112946 - Flags: approval1.3b+
Checked in to trunk. (Discussed on irc with jst, and leaving this as is; when we get a real fix, the assertions should stay, but the memset becomes ifdef debug, and the runtime checks will be removed.) I realized that on bailing out from the updater, we don't arrange to nuke the fastload file, although the check in the reader must always happen before the check in the updater so this is still an ok fix and a file with a bogus entry won't be used and will be destroyed. Leaving open to look into a better fix [and removing blocking1.3b+] (Also, I can still get "ASSERTION: redundant multiplexed document?: 'docMapEntry->mString == nsnull'" when I start with all three of navigator, composer and mailnews concurrently, but doesn't seem to harmful at first look. I can also get that assertion while rapidly clicking through pref panels that are not yet in the fastload file. [And, I also seem to get a crash, shortly after that assertion for pref panels, in PresShell::sPaintSuppressionCallback where the presshell is trashed. But I don't know how that would relate to the mux assertion].)
Flags: blocking1.3b+ → blocking1.3b-
Depends on: 192341
Maybe my comment is useful, since it is correlated with fastload: Since today, without reason, mozilla has started to freeze EACH time I select Bookmarks/ManageBookmarks, while all the other features work normally. top prints that mozilla uses 99% of CPU. I started mozilla several times, but the problem persisted. I replaced bookmarks.html from my profile with a fresh one, but the problem persisted. I read on Internet that starting with version 1.2 the fastload was introduced and, if mozilla freezes, remove XUL.mfasl. I removed it, and the Bookmarks/ManageBookmarks works again... I can send you my XUL.mfasl if you need it (4MB), maybe you want to study this corrupted file. I use Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9
Eugen, you are using a Mozilla build from last December. As this bug is summarized "...post bug 188744" and bug 188744 was checked in in mid-January, I don't think John is interested in your fastload file. Additionally, the comment right before your one says that a fix was checked in at January 30th. So I think only fastload files from after January 17th/30th are interesting now. John also states this in bug 169777 comment #172. Thanks for trying to help anyway.
Attached patch proposed fix (obsolete) (deleted) — Splinter Review
Testing help craved. /be
Attached patch remove bogus assertion (obsolete) (deleted) — Splinter Review
Thanks to jrgm for the testing help. /be
Attachment #114270 - Attachment is obsolete: true
We should fix this for 1.3. /be
Assignee: jrgm → brendan
Status: ASSIGNED → NEW
Flags: blocking1.3?
Flags: blocking1.3? → blocking1.3+
This should do it. /be
Attachment #114286 - Attachment is obsolete: true
Attachment #114376 - Flags: superreview?(ben)
Attachment #114376 - Flags: review?(jrgm)
Ok, I'm done. /be
Attachment #114376 - Attachment is obsolete: true
Attachment #114376 - Flags: superreview?(ben)
Attachment #114376 - Flags: review?(jrgm)
Attachment #114469 - Flags: superreview?(ben)
Attachment #114469 - Flags: review?(jrgm)
Comment on attachment 114469 [details] [diff] [review] optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks sr=ben@netscape.com
Attachment #114469 - Flags: superreview?(ben) → superreview+
I've thrown this build down the stairs in about as many weird ways as I could imagine, and it all works correctly. The fastload file looks sane, and there are no obvious problems. It definitely solves the specific case noted at the beginning of this bug. I also checked perf on linux and I don't see any significant change in startup time. Patch looks good! r=jrgm, and thanks! [Note: I can get some "redundant multiplexed document?" assertions, either, in some cases, by starting multiple master documents concurrently (e.g., mail, edit, browser), or by rapidly changing panels in the pref dialog. But that is a separate issue from the hangs since it appears to defend correctly. I filed bug 193530 for investigation].
Comment on attachment 114469 [details] [diff] [review] optimize nsFastLoadFileUpdater::Open to minimize Tells and Seeks r=jrgm
Attachment #114469 - Flags: review?(jrgm)
Attachment #114469 - Flags: review+
Attachment #114469 - Flags: approval1.3?
Attachment #114469 - Flags: approval1.3? → approval1.3+
Fix checked in for 1.3 final. Many thanks to jrgm -- onward to bug 193530! /be
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: fixed1.3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: