Closed Bug 3690 Opened 26 years ago Closed 24 years ago

File path strings should be stored in registry as UTF-8

Categories

(Core Graveyard :: Tracking, defect, P1)

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: saari, Assigned: rayw)

References

Details

(Whiteboard: investigating)

Mac apprunner starts up with blank window. The RDF_DLL is failing to load.
Summary: Mac apprunner starts up with blank window. → [BLOCKER] Mac apprunner starts up with blank window.
GetSharedLibrary() in prlink.c called from PR_LoadLibrary fails to find RDF_DLL
Assignee: don → mcmullen
Summary: [BLOCKER] Mac apprunner starts up with blank window. → [BLOCK] Mac apprunner starts up with blank window.
Target Milestone: M3
Re-assigned to mcmullen@netscape.com, changed target milestone to M3, and changed summary line slightly. John, we need some Mac expertise here to help Chris Saari ...
John, Pink says this is happening for Joe Francis too.
There is a workaround: remove all non-7-bit ascii characters and slashes from all superdirectories of your apprunner file. The problem seems to be in the DLL registration mechanism (use of full paths). Joe Francis had a bullet character in one of his superdirectory names, and Chris Saari had a slash in his hard drive name. Reassigning to dp. Removing the [BLOCK] from the summary line. This is a bad bug (we can't ship with this) but we have the time to fix it properly.
Clarification: the error is that PR_LoadLibrary fails for components in the Components subdirectory if non-unix characters (or a unix separator) are in the full path. PR_LoadLibrary only works for (1) full paths or (2) a DLL name for files directly in the launch directory. In the bad case, for some reason, a call finally gets made to PR_LoadLibrary with the string 'RDF_DLL'. This call will never succeed, because the loading code does not search recursively in subdirectories. We do not understand why having interesting chars in the path causes this failure, though.
Priority: P3 → P2
Additional info: when all is working well, nsFactoryEntries for factories coming from DLLs in the Components folder contain UNIX-style full paths to those DLLs. When there are 8-bit chars in any of the parent folders, these factory entries contain the DLL name (fragment name) in the m_fullpath member.
OK, I found the real bug. Strings in the registry should be stored as UTF8, and the registry throws back an error from NR_RegSetEntryString() when you attempt to store a string that is invalid UTF8 (like Mac paths containing 8-bit chars). But, of course, no-one is checking return values of ANY of the registry calls in the component manager code, so we missed the error. The call that failed is on line 494 of nsComponentManager.cpp.
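For readers following along, the failure mode described above can be sketched in a few lines of standalone C++. This is not the libreg or component manager code: `IsValidUtf8` and `StoreString` are hypothetical names used only for illustration, and the only assumption taken from the thread is that the store call rejects invalid UTF-8 and reports it through its return value. The MacRoman bullet is the single byte 0xA5, which is not a valid UTF-8 sequence on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8. Rejects stray
// continuation bytes, truncated sequences, overlong forms, surrogates,
// and code points above U+10FFFF.
bool IsValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;               // stray continuation or bad lead byte
        if (i + len > n) return false;   // sequence runs past the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 2 && cp < 0x80) return false;      // overlong
        if (len == 3 && cp < 0x800) return false;     // overlong
        if (len == 4 && cp < 0x10000) return false;   // overlong
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += len;
    }
    return true;
}

// Hypothetical stand-in for a registry write that refuses invalid UTF-8.
// Returns 0 on success and nonzero on error; the caller has to check it.
int StoreString(const std::string& value, std::string* slot) {
    if (!IsValidUtf8(value)) return 1;
    *slot = value;
    return 0;
}
```

With these definitions, `StoreString(std::string("Disk \xA5"), &slot)` returns nonzero and leaves `slot` untouched, which is the kind of error that goes unnoticed when the return value is ignored.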
Status: NEW → ASSIGNED
Yes it is a registry problem. Robert and I found that in our session of debugging too. A bug has been filed on dan veditz. If we don't get it fixed, we need to release note this.
Um, shouldn't you be passing UTF8 strings into the registry, instead of just assuming that it's a registry bug?
OH I understand what you mean now. You are right. So how do I convert to UTF8 and back... I will try that. frank frank where are you...
dp: think about the Japanese user, whose directory names on Mac may be full of Japanese characters (including 2-byte characters). Now be scared about doing anything with raw file paths :-)
Severity: normal → major
Priority: P2 → P1
Setting to P1. Will check with Mar15 build when it comes out.
We are going to release note this one. Keeping open for that reason.
I mean release note for dogfood...
Why not store the URL? It's another 7-bit encoding, and supported with nsFileURL. Is that good enough for Japanese?
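To make the "URL as a 7-bit encoding" idea concrete, here is a minimal sketch of percent-encoding raw path bytes. It does not use nsFileURL; `PercentEncode` is a hypothetical helper, and the choice to escape everything outside the unreserved set (including the slash) is an assumption made for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: percent-encode every byte outside the unreserved set
// [A-Za-z0-9-._~]. Input is raw bytes in any charset; output is 7-bit ASCII.
std::string PercentEncode(const std::string& bytes) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];     // high nibble
            out += kHex[c & 0x0F];   // low nibble
        }
    }
    return out;
}
```

The result is always 7-bit, so it would pass a UTF-8 check, but it only preserves the bytes. It does not record which charset they were in, which is the part that matters for Japanese directory names.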
Target Milestone: M3 → M4
Add the release note that you want to bug http://bugzilla.mozilla.org/show_bug.cgi?id=3646. Moving this off the M3 list. Move it back if we think we have a fix.
Target Milestone: M4 → M5
I think this can be release noted for M4 as well.
Summary: Mac apprunner starts up with blank window. → [PP]Mac apprunner starts up with blank window.
What's up with this bug? Is it going to fade into obscurity?
dp has been in india visiting his family.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Well, now Mac apprunner starts up with an http error (April 25 build). On 2nd launch, no longer a blank window. So, I would say this particular bug is Fixed now. saari, will leave to you to mark Verified. ;-)
Status: RESOLVED → REOPENED
This bug is not fixed. I've adjusted the summary to reflect the real issue.
Resolution: FIXED → ---
Summary: [PP]Mac apprunner starts up with blank window. → File path strings should be stored in registry as UTF-8
Changed summary to "File path strings should be stored in registry as UTF-8". We should not be storing full paths in the registry at all; rather, we should use the nsPersistentFileDescriptor that mcmullen has written. And if nsPersistentFileDescriptor outputs anything that is not always 7-bit ascii, then its output needs to be stored as UTF-8. QA, to verify this bug (when fixed), you need to put 8-bit ascii chars, and slashes in your directory names before calling it fixed.
nsPersistentFileDescriptor outputs only 7-bit ascii (using base64 encoding). I include the source of two functions from libpref that show the correct encoding and decoding of filespecs for persistent storage.
//------------------------------------------------------------------------------
NS_IMETHODIMP nsPref::GetFilePref(const char *pref_name, nsFileSpec* value)
//------------------------------------------------------------------------------
{
    if (!value)
        return NS_ERROR_NULL_POINTER;
    char *encodedString = nsnull;
    PrefResult result = PREF_CopyCharPref(pref_name, &encodedString);
    if (result != PREF_NOERROR)
        return _convertRes(result);
    nsInputStringStream stream(encodedString);
    nsPersistentFileDescriptor descriptor;
    stream >> descriptor;
    PR_Free(encodedString); // Allocated by PREF_CopyCharPref
    *value = descriptor;
    return NS_OK;
}
//------------------------------------------------------------------------------
NS_IMETHODIMP nsPref::SetFilePref(const char *pref_name, const nsFileSpec* value, PRBool set_default)
//------------------------------------------------------------------------------
{
    if (!value)
        return NS_ERROR_NULL_POINTER;
    nsresult rv = NS_OK;
    if (!value->Exists())
    {
        // nsPersistentFileDescriptor requires an existing
        // object. Make it first.
        nsFileSpec tmp(*value);
        tmp.CreateDir();
    }
    nsPersistentFileDescriptor descriptor(*value);
    char* encodedString = nsnull;
    nsOutputStringStream stream(encodedString);
    stream << descriptor;
    if (encodedString && *encodedString)
    {
        if (set_default)
            rv = PREF_SetDefaultCharPref(pref_name, encodedString);
        else
            rv = PREF_SetCharPref(pref_name, encodedString);
    }
    delete [] encodedString; // Allocated by nsOutputStringStream
    return rv;
}
I just noticed that last line should be return _convertRes(rv); Nobody's perfect.
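As a rough illustration of why base64 output is safe to store as a string, here is a standalone encoder. It is only a sketch of the encoding step: `Base64Encode` is a hypothetical helper, and nothing here reproduces the actual byte layout that nsPersistentFileDescriptor writes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: standard base64 with '=' padding. Any input bytes,
// including 8-bit ones, come out as characters from a 64-symbol ASCII set.
std::string Base64Encode(const std::string& bytes) {
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    const std::size_t n = bytes.size();
    for (std::size_t i = 0; i < n; i += 3) {
        // Pack up to three input bytes into one 24-bit group.
        const unsigned long b0 = static_cast<unsigned char>(bytes[i]);
        const unsigned long b1 =
            (i + 1 < n) ? static_cast<unsigned char>(bytes[i + 1]) : 0;
        const unsigned long b2 =
            (i + 2 < n) ? static_cast<unsigned char>(bytes[i + 2]) : 0;
        const unsigned long group = (b0 << 16) | (b1 << 8) | b2;
        // Emit four 6-bit symbols, padding when input bytes are missing.
        out += kTable[(group >> 18) & 0x3F];
        out += kTable[(group >> 12) & 0x3F];
        out += (i + 1 < n) ? kTable[(group >> 6) & 0x3F] : '=';
        out += (i + 2 < n) ? kTable[group & 0x3F] : '=';
    }
    return out;
}
```

Because every output character is below 0x80, the encoded descriptor is also valid UTF-8, so a registry that insists on UTF-8 strings has nothing to reject.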
Status: REOPENED → ASSIGNED
Using filespec or any of its derivatives is not an option because of dependency problems. Now, I do have a simple half-ass solution. If I store filenames not as strings but as bytes and retrieve them back, then I think all will be ok except for '/' and ':' being problems in filenames. Other 8 bit characters would work as the storage and retrieval will be in native encoding and NSPR can handle that. I am going to test this theory. Stop me if you know it won't work.
You CANNOT store full paths, in any form, in the registry, otherwise you will severely break Mac in a number of ways. For example, if the user renames their hard disk, or renames a folder in the path to Mozilla, things will break. Mac users routinely rename things and move stuff around on their hard drives. We MUST be able to deal with these things.
Alternatively, you can wait a few days till I fix bug #5784, (making nsFileSpec into a com interface) and then the dependency problem will be solved.
There are a lot of problems with doing any of these on the mac. I do understand the right thing to do is use the nsFileSpec stuff. Trust me I use it a lot elsewhere. Simon, apart from all the problems you mentioned, there is one other problem: special characters like a bullet etc cannot be in any filename that XPCOM uses. Storing the filename as a byte sequence won't fix all the problems you mentioned. But it will fix the problem of not being able to put apprunner in a directory whose full name has 8 bit characters. So I thought doing that would be an improvement to the situation we are in. No ?
But dp, if I can COMMify nsFileSpec and friends, can't you then use this (specifically, nsPersistentFileDescriptor)?
Even if nsFileSpec and co are COMified, there is going to be trouble making xpcom.dll depend on code in base.dll. So I trust the only solution is to move autoregistration and anything that deals with files out of xpcom. Moving autoreg out is easy. But moving the dll loading part (which uses the filename) out of xpcom is going to be hard. I am stuck!
dp, why can't XPCOM use characters like bullets in file paths (if you were to use file paths anywhere)? We _have_ to get this right. It would not be appropriate for a build problem (all of which are fixable) to hinder us using the correct solution to this problem.
>Even if nsFileSpec and co are COMified, there is going to be trouble making >xpcom.dll depend on code in base.dll dp, I don't understand why. As long as base.dll isn't autoregistered, what is the problem? If you call through a com interface, you don't have link dependencies...I know I may be being rather stupid here, as usual...
have we figured out what to do on this one for M5?
See the comments in bug 4965 about a possible solution of storing only relative paths in the registry. But now I think about it, why do we have to store paths in the registry at all? After all, we know where the components directory is, and that we can find components in that directory. Is the plan that at some point, Netscape DLLs will be scattered throughout the user's system?
Whiteboard: investigating
Target Milestone: M5 → M6
Ok I checked. The filename becomes a key. So targeting the full solution for M6.
Sorry, I don't understand what you mean by the "full solution". Do you mean full paths, or a complete (alternative) solution?
I mean a complete solution for all the nitty gritty problems. nsIFileSpec is one of the options... but...
If you plan on storing filenames in libreg using REGTYPE_ENTRY_BYTES please consider using REGTYPE_ENTRY_FILE -- On windows and Unix these are equivalent, but on the Mac the _FILE type stores a binary alias.
There was a reason why I didn't use that. Let me think...it was crashing on the mac for some reason. Maybe robert would remember why we switched it from FILE to STRING.
Ah... I don't recall exactly, but I think one of the issues was that (on Mac) XPCom internally always uses Unix pathnames, which hit some bug as the registry really wants Mac style paths (with colons). Switching XPCom over to using nsFileSpec et al. might help with this. :^)
QA Contact: 3853 → 1308
Target Milestone: M6 → M7
*** Bug 7029 has been marked as a duplicate of this bug. ***
This bug has lost us several hours of engineer and release team time, trying to figure out why builds don't run, only to find a / somewhere in the file path. It is imperative that this bug is fixed ASAP.
XPCOM 2.0 landing was step 0 for this. I am targeting this fix for M7. Really!
Depends on: 3081
Target Milestone: M7 → M8
dp ready to land this first part of m8
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
This should be fixed with the XPCOM using nsIFileSpec changes
QA, to verify this bug (when fixed), you need to put 8-bit ascii chars, and slashes in your directory names before calling it fixed. You should also test the case of having a slash / in the name of one of the folders containing Mozilla. If this does not work still, please open a separate bug.
This should be classified as a cross-platform bug, I think. 8-bit characters in path names should fail for Unix and Windows without this fix. (If you agree, please change the platform.) The "/" would be Mac-specific, the bullet is not. I'll try other MacRoman specific characters also. Is there any character in the Windows-1252 (Latin 1 for Windows) and MacRoman tables which cannot be dealt with by this fix? -- just so that we know what the limitations are going in.
Moving all Apprunner bugs past and present to Other component temporarily whilst don and I set correct component. Apprunner component will be deleted/retired shortly.
Status: RESOLVED → REOPENED
OS: Mac System 8.5 → All
Hardware: Macintosh → All
** Checked with 6/25/99 Win32 and Mac M8 builds **
On Mac:
1. Starting up is OK if superdirectory names contain bullet or other 8-bit characters used in Latin 1.
2. Start-up succeeds with JPN super-directory names.
3. But the slash makes the start-up fail.
4. Mozilla Preference does not create a pref directory if the name put by the user contains 8-bit characters or slash. Thus even though you created a pref directory, you are asked to create a new one again on the next start-up. If you have an existing pref directory with 8-bit characters in its name, Mozilla does not recognize the directory -- my pre-set pref settings were not honored.
On NT4-Japanese:
1. Start-up fails if super directory names contain Japanese characters.
----
I asked this question before, but we need a cross-platform fix for this. Therefore I changed the Platform and OS to all, and copied bobj on this. I have a few questions about file systems.
A. Does the Mac OS 8.5 or above file system use Unicode/UCS-2 for storing string data (e.g. path & file names)? Is what used to be called "Unicode Imaging" service being used?
B. Does NT4 use Unicode/UCS-2 for the same? Essentially UCS-2?
C. Does Win95 use Unicode/UCS-2 for the same? Locale specific charset?
D. Does Unix use Unicode/UCS-2 for the same? Locale specific charset except in Unicode locales?
Re-opening this bug because it is not fixed ...
Resolution: FIXED → ---
Simon, I didn't understand your comment about opening another bug about the slash. You want to create another bug for the slash bug because it's more of a file system specific bug? Would it be better to create new bugs for Mac items 3 and 4 above? I haven't done this yet but I suspect that Japanese profile directory names will probably fail on Windows also.
One more question: F. Does Japanese MacOS 8.5/later use the same file system charset code as the US OS 8.5/later?
Status: REOPENED → ASSIGNED
Target Milestone: M8 → M9
Target Milestone: M9 → M10
Assignee: dp → hyatt
Status: ASSIGNED → NEW
I tried special characters in filenames on unix and I got navigator to show up. Some of the icons for the chrome didn't show up. Otherwise things are fine.
nsNativeComponentLoader: autoregistering /home/dp/tmp/50®special/bin/components
nsNativeComponentLoader: autoregistering succeeded
That's a start. So maybe the xul icon thing is not 8bit clean.
I know that string data and key names in the registry MUST be UTF8. Maybe you're finding those .DLL's through autoreg each time, but they're not getting stored in the registry with 8-bit chars in the name. Or the prog-id for that matter... I have another bug open on ftang to implement a "ToUTF8()" in nsString, which will then make it easy to do the right thing.
Adding dp back to the CC field, I think this is/will be a component Manager problem.
Sounds like a dup of 10373 now -- URLs don't deal with non-ASCII.
Nope, URLs don't get stored in the registry. Different problem.
dveditz: I was responding to dp's comments: "I tried special characters in filenames on unix and I got navigator to show up. Some of the icons for the chrome didn't show up. Otherwise things are fine. That's a start. So maybe the xul icon thing is not 8bit clean."
Blocks: 13276
Assignee: hyatt → dp
reassigning to dp, not clear why this was given to hyatt/xptoolkit
Status: NEW → ASSIGNED
Target Milestone: M10 → M15
We are storing relative pathnames in the registry. Hence this won't be an issue until people start storing full pathnames that aren't in our distribution.
It's still an issue if people are silly enough to put 8-bit characters in their component name. But I guess that's a self-limiting problem because it won't register, so they won't be able to test, so they'll change the name before they ship it.
They probably won't put down component name as 8-bit char, but don't forget the case that they may install SeaMonkey under c:/MyCòmpùtêr/Nétsãpë/
That problem is avoided by the relative pathname support dp mentioned above.
XPCOM stores relative pathnames for our components. So this will hit us as an issue only if full pathnames are registered from outside components. PSM is going to be the first. The right solution is:
1. Get persistentDescriptor from nsIFile (avoids mac renaming harddrive issue)
2. Store the resulting filename as UTF8 strings in the registry.
(1) already happens (need to check). (2) needs a converter.
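The "needs a converter" step can be sketched for the simplest possible case. The example below assumes the native filesystem charset is ISO-8859-1 (Latin 1), purely because that mapping is a direct byte-to-code-point copy; a real converter would need per-platform tables (MacRoman, Shift_JIS, and so on). `Latin1ToUtf8` and `Utf8ToLatin1` are hypothetical helpers, not existing XPCOM or libreg functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical converter: Latin 1 bytes to UTF-8. Bytes below 0x80 are copied;
// bytes 0x80..0xFF become a two-byte sequence for U+0080..U+00FF.
std::string Latin1ToUtf8(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical inverse: returns false if the input is malformed or contains
// a character that has no Latin 1 representation (anything above U+00FF).
bool Utf8ToLatin1(const std::string& in, std::string* out) {
    out->clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            *out += static_cast<char>(c);
            continue;
        }
        // Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
        if ((c != 0xC2 && c != 0xC3) || i + 1 >= in.size()) return false;
        const unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
        if ((c2 & 0xC0) != 0x80) return false;
        *out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
        ++i;  // consume the continuation byte
    }
    return true;
}
```

The round trip matters as much as the forward conversion: the string read back from the registry has to be turned into exactly the bytes the filesystem handed out, or the library load will fail again.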
Target Milestone: M15 → M16
Ray, wanna have a crack at this.
Assignee: dp → rayw
Status: ASSIGNED → NEW
I have been trying to isolate the separate issues in this one that look enough like the same problem that they were included in the same report. Please correct me where I am in error.
1. Mozilla code frequently assumes unix-style paths, such that a slash or other certain characters in the directory (or probably module) names will cause the code to screw up. The offending code could be in a single place, or scattered throughout mozilla. I will try to track it down. But it seems to me that there is nothing that storing UTF-8 or native characters solves, because in most systems "/" is a simple seven-bit character whether using UTF8, ASCII, or some other 8-bit or otherwise-encoded character set.
2. Mozilla needs to store file path strings in either UTF-8 or in a native format. It is not clear to me why it needs to be UTF-8, because if you are just returning the bits to the system that it gave you, there should be no problem (except the separate problem identified as 1.). If there is, or it is anticipated that there will be in the future some GUI or other manipulation of the filename for which UTF-8 is required (or if the registry itself is better-equipped to handle UTF8), then two conversions probably need to be added: one where the data is written to the registry, and one when it is gotten from the registry. Or, if the code, as it currently exists, requires UTF-8 in some places (such as for filename manipulation), then the conversion needs to be done. But I doubt it will fix the slash problem in the reported situations.
There is apparently a method to convert characters from the filesystem charset to unicode, but there is no method to convert back. There are also apparently no existing methods to save or restore byte streams that are not UTF-8. Neither of these problems is hard to solve, but I regret having to add new interfaces, whichever way I solve the problem -- storing native paths or storing UTF-8.
libreg stores data as UTF-8. ASCII just happens to be a subset of UTF-8 so it works, but if there were a component with a non-ASCII character in the name (including the directory name for a non-relative component) then it had better be encoded as UTF-8. nsRegistry now supports a Unicode API which will automatically convert to UTF-8 storage for convenience. Note that RDF has the same limitation on key names -- they must be UTF-8 encoded. But frankly the problem is that files are being stored as strings in the first place. Since the component.reg uses the key name as the data you have little choice about that, but the component names could also be stored as data values of a key instead. This would give you the flexibility to use different types which might be more appropriate, such as raw bytes, or some nsIFile persistent format which would give you some hope of handling non-relative components correctly on the Mac.
> Note that RDF has the same limitation on key names -- they must be UTF-8 encoded. Actually, that is incorrect. :^)
First, let's make it clear that what really needs to be solved is local non-URL filenames in general. Not only are these file specifications being stored as strings, but the Mozilla code is manipulating them as strings. It is not clear to me how many other places use them. If we declare that these file specifications are not strings (because we don't know the characters), then they cannot be manipulated as strings, and every place that Mozilla currently manipulates them as strings must be replaced by a call to a native method that knows how to achieve the desired result on the specified platform. And there is no need for the "/" <--> "\" substitutions, etc., because normal code will never touch the path or search for specific characters, and the native code that does knows the right characters. There then needs to be a new registry method allowing native filenames to be stored in the registry. There appears to be low-level support for this still in the registry, but the high-level interface methods are missing, and apparently this caused some kind of problem in the past. Even with this approach, there may occasionally be a need to display the string or allow the user to manipulate it, so conversion to/from unicode / UTF8 is needed. The alternative is that file interfaces could be modified such that native code performs character conversions, so that every file specification passed in or out of the interfaces is UTF-8. The problem of differing filename syntaxes still persists in this case. The current code likes to translate filenames to unix-style before manipulating them, and this is a technique that would still work with UTF8. A platform for which slash is valid in a filename could escape it, making these transformations bullet-proof enough, IMO. Whenever the filename is displayed to the user, it probably should be displayed in the non-unix form, so the conversion of UTF8 between native syntax and unix syntax still needs to be requestable.
The default syntax should be whatever it is now. So, the choices, as I see them, are: Filename specifications should be exposed as:
1. Native non-strings with explicit platform-specific manipulation methods.
2. UTF8, with explicit methods to go between unix syntax and native syntax.
I like solution 2, because it does not deny the common assumption that file specifications are strings. It is also easiest, because it allows us to keep most code the way it is, manipulating paths as strings rather than calling native utilities for all path / file specification manipulation, and storing them as currently-supported UTF8 strings in the registry. But platform-specific manipulation methods could have advantages in certain cases, where it is not easy to transform a native filename into a unix filename that exhibits all the desired behaviors as it is manipulated as a string.
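The "escape the slash" idea from the comment above can be sketched as a pair of conversions between a colon-separated Mac path and a unix-style path. Everything here is an assumption for illustration: the function names are hypothetical, the escape scheme (`%2F` for a slash, `%25` for a literal percent sign) is one possible choice, and the input is assumed to be a full Mac path beginning with the volume name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: "Hard/Disk:Mozilla:bin" -> "/Hard%2FDisk/Mozilla/bin".
// ':' is the Mac separator and cannot appear inside a Mac file name, so only
// '/' (legal in Mac names) and the escape character '%' need escaping.
std::string MacToUnixStyle(const std::string& macPath) {
    std::string out = "/";
    for (std::size_t i = 0; i < macPath.size(); ++i) {
        const char c = macPath[i];
        if (c == ':')      out += '/';
        else if (c == '/') out += "%2F";
        else if (c == '%') out += "%25";
        else               out += c;
    }
    return out;
}

// Hypothetical inverse of MacToUnixStyle.
std::string UnixStyleToMac(const std::string& unixPath) {
    std::string out;
    // Skip the leading '/' that MacToUnixStyle added.
    std::size_t i = (!unixPath.empty() && unixPath[0] == '/') ? 1 : 0;
    for (; i < unixPath.size(); ++i) {
        const char c = unixPath[i];
        if (c == '/') {
            out += ':';
        } else if (c == '%' && unixPath.compare(i, 3, "%2F") == 0) {
            out += '/';
            i += 2;  // skip the two hex digits
        } else if (c == '%' && unixPath.compare(i, 3, "%25") == 0) {
            out += '%';
            i += 2;
        } else {
            out += c;
        }
    }
    return out;
}
```

With this scheme, code that splits the unix-style form on "/" never sees the slash that was part of a volume or folder name, and the native form can be recovered for display.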
We already have a way to manipulate filenames without touching them as a path string: nsIFile and nsILocalFile do all the platform-specific magic. Unfortunately for this particular problem nsILocalFile does not support a "persistent string" encoding as its predecessor nsFileSpec did (Doug, is there a reason or just ran out of time?), but if it did that could be stored in the registry. We could also now fairly easily expose storing filetypes as registry data values in nsIRegistry because JS could now pass nsILocalFile objects (we could implement a nsILocalFileMac to get at the Mac alias stuff we need). To take advantage of that, however, you'd have to change the structure of how the component manager stores components -- they'd have to be data values and not keys. Come talk to me if you're interested in going that route.
I have now tried storing file path strings as UTF-8, and it cannot be reasonably done at this time, since the components necessary for mapping between the many code pages of file systems and Unicode have not been loaded yet (which is the reason we need the settings). My next attempt is to create the registry methods for registration of binary stuff.
Status: NEW → ASSIGNED
I added a comment, which must not have saved properly, designating this as mostly-fixed, with the exception of the JS Component loader, which I do not know how to test yet.
Now I put back the js component fix, too, so I am marking this fixed.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → FIXED
Belatedly I'm going to verify this fix. If you look in registry.dat or mozregistry.dat when you have profiles in Japanese, you see familiar UTF-8 3-byte sequences for most of these Japanese characters. This is good enough for me to verify this fix.
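For anyone repeating this check, the "3-byte sequences" follow directly from the UTF-8 encoding rules: code points from U+0800 to U+FFFF, which covers kana and most kanji, take three bytes. A minimal sketch, with `EncodeUtf8` as a hypothetical helper name (it does not reject surrogate code points, to keep it short):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF) as UTF-8.
std::string EncodeUtf8(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three-byte form: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, U+65E5 (日) encodes as the bytes E6 97 A5 and U+3042 (あ) as E3 81 82, so a run of bytes in the E3 to E9 range each followed by two bytes in the 80 to BF range is what Japanese text looks like in a hex dump of the registry file.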
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard