Closed Bug 59102 Opened 24 years ago Closed 24 years ago

Last install dir skin is clobbered by last profile dir skin

Categories

(Core Graveyard :: RDF, defect, P3)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugs, Assigned: waterson)

References

()

Details

(Whiteboard: [rtm++])

Attachments

(2 files)

OK, this is an interesting bug ;) Two ways to reproduce:

Method 1: Use the installer and run a Recommended install. Launch the app and go to View->Apply Theme; you'll see the two supplied themes, Modern and Classic. Go to http://gooey/client/themes/ and install two of the skins there (selecting them is fine but not important; only installation is required). Alternately, http://jrgm/bugs/58515 has variations on Classic to install. Quit and restart the app. The Classic skin (from the install directory) is no longer in the menu or in the list in preferences.

Method 2: Use the installer, choose Custom install, and deselect "Classic Skin" from the list. Launch the app; you should see just Modern in both lists. Now install any other skin and restart as above, and you'll find that Modern is no longer listed.

It seems that when the number of skins in the install directory (one RDF Seq) matches the number in the profile directory (another RDF Seq), the last skin in the install directory is clobbered by the last skin in the profile directory. "Clobbered" here means in terms of what is reflected in the UI; all the files and datasources on disk are correct. This could be a XUL template builder bug, so I'm filing it against RDF. Installing further skins, however, causes Classic/Modern to appear again.

Needless to say, this is pretty nasty, and could leave a user in an unrecoverable state (especially with method 2) if the installed skin is switched to and turns out to be malicious, since the user would then be unable to download further themes (see the previous paragraph). Short of knowing the right files to remove, or creating a new profile and pitching the old one, there is no recovery.
nominating for RTM. This seems serious.
Keywords: rtm
More comments: hyatt says this could be an issue with how the clash in numbering between the install dir skin Seq and the profile dir skin Seq is resolved. (To display the list of available skins, he overlays the user profile graph on top of the install graph; both Seqs have identical numbering starting at _1, _2, etc., and apparently code in rdf/ resolves this conflict. That could be the source of this bug.)
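To make the numbering clash concrete, here is a rough sketch (illustrative only, not the actual datasource contents) of the two skin Seqs that get overlaid:

  install-dir skin Seq:   _1 -> Modern           _2 -> Classic
  profile-dir skin Seq:   _1 -> downloaded #1    _2 -> downloaded #2

When the two graphs are composed, each ordinal arc (_1, _2, ...) ends up carrying one target from each Seq, and the conflict-resolution code in rdf/ has to hand back all of them.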
Severity: normal → critical
OS: Windows 2000 → All
Hardware: PC → All
Attached patch Fix the container enumerator (deleted) — Splinter Review
waterson, the above patch fixes the problem. I'm not completely sure about it though. Take a look and let me know. The basic idea is that when there are multiple targets for each ordinal (_1, _2, etc.), the enumerator only walks one of the final set of targets. I view this as being a stop-ship bug, since the user will quite easily hose their entire setup when they start installing skins. Upgrading to rtm need info.
Whiteboard: [rtm need info]
PDT is very concerned about this one. Can folks comment on this code path, and the impact it has on testing? If we took this, we'd have to focus extra testing on it, and we'd like to know how broad the impact (and regression potential) is.
Adding some folks to the cc list. This is a critical issue for Theme installation.
Wanted to add that this is a crucial bug: I can't open up the theme park and make themes available from there if this can't be fixed. This of course cuts into the page views/impressions/revenue to Netcenter, currently the only way the themes feature is making any money.
hyatt's fix is right. RDF containers are one-indexed; so we need to iterate until index <= count.
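Since the patch attachment itself is marked deleted, here is a hedged sketch of the shape of the change being discussed, reconstructed from this comment and the later diff in this bug rather than copied from the attachment (the names mNextIndex and count come from that diff; the loop body is elided):

  // before: stop as soon as the index reaches the computed count
  while (mNextIndex < count) { ... }

  // hyatt's fix: containers are one-indexed, so let the index reach count
  while (mNextIndex <= count) { ... }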
waterson: Does that count as an sr=? Who can r=?
cc'ing rjc and alecf, who are familiar with this stuff. Could you two guys take a look at hyatt's patch above?
If this could get checked into the trunk (pending r=) then I (and others) could test this in the commercial build for the 8pm spin tonight. [Although I see there was no 8pm win32 trunk spin last night (?)]. waterson: any particular things I should be looking for in other applications of this code, outside of whether we retain all the profiles?
The other places that use RDF sequences are bookmarks, search, and the localstore. However, I can't think of any other place that's using them the way that the chrome registry is...
I checked in hyatt's fix on the trunk.
great - jrgm, can you test on the trunk and post your results?
Okay, so with this and bug 59130 (a crash that could happen on skin-switching, among other things) in the trunk, I should be good to go when the builds come out. I'm starting with my own win2k debug build as soon as it completes.
r=rjc on hyatt's fix
I ran the smoketests on win32 and linux, and found no new problems, especially no new problems that could be related to RDF (and on Windows, when I found something that might be suspect, I found I could get the same behaviour with or without the patch). I also worked with search and bookmarks, but found no (new) problems there. I tested the skin install procedures, and I did not lose Modern or Classic as I added more skins (but I will test more tomorrow). I plan to also look some more at search/localstore/bookmarks tomorrow, but as it stands now, I don't see any problems with the patch (although this code is used in a lot of places, so greater depth of testing, e.g., more people, would be good).
I installed respins with this fix and the problem was solved. General use of the browser yielded no RDF-related problems.
Seeing all the reviews and approvals, the patch sure seems correct. My confusion is: how could we have an off-by-one error on the upper limit in this code and not impact other sections of the system? Is it really the case that this method was not used by other parts of our system? ...or was count always zero (along other code paths), and hence we never use the upper limit in *ANY* other place in the code?

Has anyone put in a breakpoint to try to verify that this code is not used elsewhere? Chris mentioned that bookmarks, search, and localstore are the likely candidates for possibly exercising this code... so running with a breakpoint set during some testing would at least be interesting. Setting a conditional breakpoint for "count > 0" on the changed line would be the thing to try. In any case where the breakpoint triggered, I'd be interested in understanding how the calling code worked at all :-/ (for the full range of items).

To be clear: I'm really trying to understand the risk involved with this change. Identifying ANY additional users of this code is pivotal to that eval. Thanks, Jim
It's more complex than that. The bug only occurs if you have multiple elements occupying the same slot. The chrome registry is (as far as I know) the only RDF data source that uses this oddity of RDF: basically you can have several elements at position 1, several at 2, several at 3, etc. The bug was that when we got to the last index, we only returned one of the multiple elements. So all other data sources still worked. Now we properly hang around without bailing until we're sure that the entire enumerator (for the last index) has been depleted. Before, we were always bailing without depleting the enumerator, but we did hang around just long enough to return one element (which made everybody else work).
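To make the failure mode concrete, here is a small self-contained toy model (plain C++, not the real nsContainerEnumerator code; every name in it is invented for illustration). One ordinal position holds several targets, and the toy contrasts a loop that stops purely on the index bound with one that also keeps going while the current target list still has undelivered elements. Which of the targets at the final ordinal survives under the broken condition depends on arbitrary ordering; in the real bug it happened to be the install-dir skin that was lost.

  #include <cstdio>
  #include <map>
  #include <string>
  #include <vector>

  // Toy stand-in for an aggregated RDF Seq: ordinal index -> all targets at that ordinal.
  typedef std::map<int, std::vector<std::string> > Seq;

  // Enumerate every target. 'fixed' selects the corrected loop condition.
  static std::vector<std::string> Enumerate(const Seq& seq, int count, bool fixed)
  {
      std::vector<std::string> out;
      std::vector<std::string> current;   // plays the role of mCurrent
      size_t pos = 0;
      int nextIndex = 1;                  // RDF containers are one-indexed

      // Broken: bail as soon as nextIndex reaches "count".
      // Fixed: also keep going while 'current' still has undelivered targets.
      while (fixed ? (pos < current.size() || nextIndex < count)
                   : (nextIndex < count)) {
          if (pos >= current.size()) {    // current list depleted: load the next ordinal
              Seq::const_iterator it = seq.find(nextIndex++);
              current = (it != seq.end()) ? it->second : std::vector<std::string>();
              pos = 0;
              if (current.empty())
                  continue;               // nothing at this ordinal, try the next one
          }
          // Deliver one target per trip through the loop, roughly what one
          // HasMoreElements()/GetNextElement() round trip does per call.
          out.push_back(current[pos++]);
      }
      return out;
  }

  int main()
  {
      // Two ordinals, each carrying one install-dir and one profile-dir target.
      Seq seq;
      seq[1].push_back("install:modern");  seq[1].push_back("profile:skin1");
      seq[2].push_back("install:classic"); seq[2].push_back("profile:skin2");
      int count = 3;  // "next available index", as explained further down in this bug

      std::printf("broken condition delivers %u of 4 targets\n",
                  (unsigned) Enumerate(seq, count, false).size());
      std::printf("fixed condition delivers %u of 4 targets\n",
                  (unsigned) Enumerate(seq, count, true).size());
      return 0;       // prints "3 of 4", then "4 of 4"
  }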
hmmm... given that explanation, why didn't you just say while (mCurrent || mNextIndex < count) and leave the count limit as it was? Why do you want to go into the loop construct "an extra time"? If you were always willing to "go an extra time," it seems like you could almost have done while (1) and counted on the code to bail (return) in the first if (!mCurrent) block. Isn't that almost equivalent to what has been done, now that mNextIndex will never get bumped any further (re: mCurrent won't be loaded)? To be really specific, the current patch is *probably* identical to if (count > 0) while (1) ... since mNextIndex will never exceed count due to the loop. Doing it with the "go one extra time" approach would cause the effort to create an mCurrent to *always* happen even when there was no work to be done. What am I missing now? Thanks, Jim (junior code reviewer, and paranoid PDT rep) Roskind
RDF containers are "one-indexed"; that is, their indices run from 1..n. So, in this case, we really do want to run the index up until it's == count.
Ok, I missed jar's point on this. It turns out that jar is right: we want to check if mCurrent is still valid *or* we still have indices we could try. Although hyatt's fix works, it works mainly because the code recovers properly from "running off the end".
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I've checked jar's version of the fix into the branch and trunk, after verifying that it fixed this bug:

diff -u -r1.12 nsContainerEnumerator.cpp
--- nsContainerEnumerator.cpp  2000/07/24 22:42:32  1.12
+++ nsContainerEnumerator.cpp  2000/11/06 04:53:48
@@ -180,7 +180,7 @@
     }

     // Now iterate through each index.
-    while (mNextIndex < count) {
+    while (mCurrent || mNextIndex < count) {
         if (! mCurrent) {
             NS_WITH_SERVICE(nsIRDFContainerUtils, rdfc, kRDFContainerUtilsCID, &rv);
             if (NS_FAILED(rv)) return rv;

Marking bug "fixed"...
Hyatt's patch looks correct to me. Jar's patch seems to break the singleton case (count == 1), because on first call to hasMoreElements, mNextIndex == 1 and mCurrent is null, so the while loop will not iterate. What am I missing? /be
rjc had to set me straight on this. The local variable "count" *isn't* really the count of elements in the container. It's the largest value of "next available index" in the container. Which is "1" if the container is empty, "2" if the container has one element, and so on. (I picked a shitty name for that local variable.)
See, for example, http://lxr.mozilla.org/mozilla/source/rdf/base/src/nsRDFContainer.cpp#207 I'm going to change the variable name from "count" to "max" on the trunk...
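For a quick sanity check of that semantic, here is how the numbers line up (following the description above; the middle column is the value the loop's "count" variable ends up with):

  elements in the container    "count" seen by the loop    valid ordinals
  0                            1                           (none)
  1                            2                           _1
  2                            3                           _1, _2

So for brendan's singleton worry above: a one-element container gives the loop count == 2, and mNextIndex == 1 < 2 still enters the loop on the first call.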
Oh, and you couldn't use mContainer->GetCount because it does not handle multiple targets (http://lxr.mozilla.org/mozilla/source/rdf/base/src/nsRDFContainer.cpp#168). Blech. Trunk cleanup? (After mid-air collision:) Don't use max, it connotes maximum value. How about limit? /be
Or fencepost, if limit sounds like asymptotic value. But why not fix GetCount so it works on mContainer in this case? /be
If you fixed GetCount or used a "maximum index value not to exceed" as the loop control, then I claim hyatt's fix is minimal and avoids extra tests. Right now, with the limit or fencepost as the loop control, we jump from 0 to 2, 3, ... for the empty, single, double, etc. cases, and therefore waste an extra iteration with hyatt's patch, to terminate the loop. If the loop control used 0, 1, 2, ... for empty, single, double, etc. case counts, and used mNextIndex <= count, then no extra mCurrent test would be needed, and maybe the next person to maintain this code will not be confused. And you could still call the loop control variable 'count'. /be
The old loop test was wrong. The loop is not traversed in proportion to the size of "count." The loop counter assures that we actually generate distinct values of mCurrent in proportion to the value of "count" (with the right off-by-one adjustment). The loop must be traversed many more times: it must be traversed until each mCurrent is depleted, and until all mCurrents have been created. That is why the new limit needed to be put in place.
The other confusing thing about this loop is that unlike a traditional loop that merely iterates, this method constantly returns (with values), and then must be called again and fight (via conditionals) to get back into the same state (same level of loop nesting, etc.). The extra condition check assures that when we re-visit this method, we'll be able to quickly get back to "where we left off." In this case (the bug), we were working on the last mCurrent, there are no more mCurrents to be created, but we have more than one element left to take from mCurrent.

The alternate fix by Hyatt simply asserted that there was an extra mCurrent that needed to be created. Although the code somewhat gracefully handled this case, the performance impact of the extra mCurrent creation would need to be examined. I was not familiar enough with the code to estimate the cost of *always* creating an extra mCurrent, even though it ends up having no elements and is quickly discarded. I don't know if there are lingering effects (is the construction lazy, but does it leave remnants now that construction has taken place? What is the cost or impact of those remnants, if any?). In the end, Waterson, Churchill and I felt more confident that the extra test avoided impact on other calls to this method that did not have a multitude of elements in the *last* mCurrent.
Whiteboard: [rtm need info] → [rtm++]
"What jar said." The bug here was *not* off-by-one indexing. The bug was that we prematurely terminated the loop in the case that we had >1 RDF Sequence aggregated with exactly the same number of elements. Let's say we have two RDF containers aggregated together with exactly one element each. Our first call to HasMoreElements() will succeed: - We'll compute "count" to be "2" (that is, both containers will report the next available index to be two). - We'll do mCurrent gets GetTargets(container, "1"). - We'll increment "mNextIndex" so that it's value is now "2". - We'll pull the first element out of mCurrent, and cache it as mResult. We'll then later call GetNextElement(), to pull the cached value out. Now we're on our *second* call to HasMoreElements(): - We'll again compute the "count" to be "2". - We'll now *fail* the loop condition, because mIndex == 2, even though mCurrent still hadn't been depleted! The fix is to alter the loop condition so that we *also* check mCurrent to see if it has been depleted (we set it to null when it's been depleted at line 204). So, hyatt's fix "worked" because it masked the bug, but was not optimal. With hyatt's change, we took an extra trip through the loop and *avoided* the case where mCurrent had not been depleted, but mNextIndex == count. jar's fix is optimal.
Whiteboard: [rtm++] → [rtm need info]
Yeah, I knew that it wasn't. I was trying to just pick a minimal patch that fixed the bug. :)
Verified fixed on mac/linux/win32 2000110616 builds MN6: installed win32 as both modern-only and modern+classic, installed mac as modern+classic, and linux as modern+classic (also forced it to be modern-only). Added skins, shut down, restarted, added another skin, etc. Modern and/or Classic were always available and could be switched to at any time. Also took win32 up to a total of 40 installed skins (hey, it works, but it's a long menu :-]). Marking vtrunk (i.e., verified on branch).
Keywords: vtrunk
Whiteboard: [rtm need info] → [rtm++]
tever is not RDF QA anymore
QA Contact: tever → nobody
Product: Core → Core Graveyard