Closed Bug 94323 Opened 23 years ago Closed 23 years ago

Reorganize nsIFile for API Freeze

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: pete, Assigned: pete)

References

(Blocks 1 open bug)

Details

(Keywords: embed, topembed-)

Attachments

(23 files)

(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
Summary says it all. --pete
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.5
Blocks: 92569
Blocks: 57995
Blocks: 92329
Blocks: 53596
isSymlink() method will not be implemented using the followLinks bool check. Doesn't make sense for it to. --pete
I also implemented getLeafName to follow links and it works as you would expect. My question is should we? This is going to introduce a lot of bugs and a layer of confusion to callers, since by default followLinks is set to true. That means that if we want the leafName of our current initialized file, we would get the targets leafName not the files leafName. I can hear code breaking now. --pete
> I also implemented getLeafName to follow links and it works as you would expect. As long as it obeys the followLinks attribute. > My question is should we? We should. If somebody really wants the file to refer to the link and not the target (rare I would think), they can call SetFollowLinks(PR_FALSE). Of course, when this is done, the caller may have to get the current state of that attribute and restore it when done - depending on the scope and ownership of that file.
I'd like to discuss moveTo for a minute. In the jslib version i re-init the class so it reflects the newly moved file. I'd like to do the same here and set mPath to the new location. Anyone disagreew/ this idea? --pete
Update: I will be getting back to this this week. I haven't had much time at all to work on this. Arg! followLinks is implemented for the most part across all methods and is tested fully on unix. --pete
I am ready to rip out the unicode specific methods aka initWithUnicodePath, unicodeLeafName, unicodePath etc . . . The problem is nothing has been resolved as to how to deal w/ strings in the local file path world. What type of args should we be using for char *'s? Should i just have these methods take a PRUnichar * for now which will add a dependency to xpcom standalone? Any suggestions? CC'ing some more people for feedback. --pete
I think they should take an AString or a wstring. Point is, it should be a wide string. On Win32 (NT or 2000) or Mac (System 8.6 and an HFS+ format disk), the OS file system can use unicode strings. This is extremely preferable to doing the conversion ourselves. Whenever we convert from unicode to what we think is the FS charset, there is possible data loss in that conversion. In odd cases, like booting the system in one language, making a file, then switching languages and trying to specify that file in the now current FS charset, we can go wrong. For file systems in which the internal format is unicode, we should always use unicode and never convert. No conversion, no data loss, and we're not dependent on our i18n code - it's not needed. On systems which cannnot use unicode to specify files and we're forced to do conversion, we can do the conversion using OS facilities. glibc has this (not sure since which version) Also in this case, we're not dependent on our i18n code. CC'ing ftang - we were talking about this today.
I need to be able to still pass c string constants. Do not break this.
Why do you need to pass c-string constants? Why not just pass NS_LITERAL_STRING("foo").get()?
Or, if you take nsAString& instead of PRUnichar*, just drop the .get().
an nsAString parameter would be okay.
If you need any help with that or have questions, don't hesitate to ask.
sorry for the spam, changing summary from "Reorganize nsIFiles for API Freeze" because it's bugging me.
Summary: Reorganize nsIFiles for API Freeze → Reorganize nsIFile for API Freeze
Ok, jag, if i use nsAString& as my param, then i'm using DOMString as my idl param and including domstubs.idl. Is this correct? Or is there another idl param i should be using in this case? Thanks --pete
pete: use AString in .idl files. /be
http://lxr.mozilla.org/seamonkey/source/xpcom/base/nsrootidl.idl#75 Basically that means you can use AString for the type and don't specifically need to include any idl.
Ok, i just want to clarify this using a simple example. Below I stat a file using nsAString as my param. NS_IMETHODIMP nsLocalFile::Foo(const nsAString& aPath) { struct stat file; if (stat(NS_ConvertUCS2toUTF8(aPath).get(), &file) < 0) printf("stat failed\n"); else printf("file.st_size = %d\n", (int)file.st_size); return NS_OK; } This example works exactly as it should. Is the useage correct in this sample? If so, then this is how i will proceed. Thanks --pete
Yup. That looks good, assuming stat supports UTF8.
This is right assuming UTF-8 is the file system char set. For Unix, that's fine. For other platforms, it's a bit more work. For Mac & Windows, file system routines which take strings take Unicode *but* not all versions of the OS. You need to check at runtime and see if it's supported. For Mac, I have bug 95481 so I'll deal with that platform when you have a patch for nsIFile/nsILocalFile.idl.
Yea, Conrad, unfortunately i can't do the mac and win implementations for I only have a unix box at my disposal right now. I do plan on setting up a mac and win box as soon as i have income again. ;-) --pete
Jag, another question. I'm converting the method Append over to nsAString. There are many accessors using this method. ->Append(my->foo); nsCAutoString myString; ->Append((char *)myString); so i've been changing the calls over to ->Append(NS_LITERAL_STRING(my->foo)); ->Append(NS_LITERAL_STRING(myString)); Is this the correct way to do it? Thanks --pete
When you have a literal string, e.g. "foopy", you want to use NS_LITERAL_STRING("foopy"); If you have a nsCString, you'll need to use NS_ConvertUTF8toUCS2. I'd look some more into the code though to see if perhaps you'd end up converting back and forth between UCS2, and see if you can short-circuit that.
But, char *'s or static const char foo[] = "foopy.dat" should all use NS_LITERAL_STRING correct? Thanks --pete
> i can't do the mac and win implementations for I only have a unix box at my > disposal right now. Don't worry about the Mac - I'm on it. Before getting this far with the implementation, can you post the patches to the API so that people doing the implementation work on other platforms can code to them?
Anything that used to end up calling Append(const char* aPath) can, if you don't want to spend too much time, be wrapped in an NS_ConvertUTF8toUCS2 and you're done. However, for literal strings it's cheaper on some platforms if you use NS_LITERAL_STRING. So: file->Append("blah") becomes file->Append(NS_LITERAL_STRING("blah")) and everything else, e.g.: // nsCString somePath; // const char* somePath; // const char somePath[] = "foopy"; file->Append(somePath) becomes file->Append(NS_ConvertUTF8toUCS2(somePath)) Though you might want to replace somePath[] = "foopy" with an NS_LITERAL_STRING, or a #define somePath NS_LITERAL_STRING("foopy") if it's used in many places. Until I fix bug 73292 (soon, I hope), you'll need to add a .get() for the nsCString cases. Want to continue this conversation in private mail to keep bug spam down? We can c&p "highlights" afterwards :-)
Attached file nsIFile.idl (deleted) —
Attached file nsILocalFile.idl (deleted) —
Conrad, these would be the theoretical idl files. I am working method by method and am changing the interface one method at a time, fixing what breaks and then moving on to the next one. --pete
attribute AString persistentDescriptor; This return type on this needs to remain "string" This is because this data is not nesc a path (it is on some platforms, not on others). It's stored in prefs and registry entries so what that accessor takes and returns needs to binary compatible with what it's done in the past.
Attached patch current snap (deleted) — — Splinter Review
patch #47051 is for my own reference. --pete
- nsAutoString fileURLUnicode; fileURLUnicode.AssignWithConversion(docURLSpec); - res = webShell->SetURL(fileURLUnicode.get()); + res = webShell->SetURL(docURLSpec.ToNewUnicode()); FYI, this (kind of) change will leak the newly created unicode buffer. That's the reason they're doing this whole thing with nsAutoString and AssignWithConversion. What you could do is something like: + res = webShell->SetURL(NS_ConvertASCIItoUCS2(docURLSpec).get()); This will use the exact same conversion, though I think they should be using NS_ConvertUTF8toUCS2 here, but that's a different bug. Also, the member functions ToNewUnicode, ToNewCString and ToNewUTF8String are deprecated. Instead you should be using the global functions in nsReadableUtils.h. Thus: *aResult = myString.ToNewUnicode(); becomes (don't forget to #include nsReadableUtils.h): *aResult = ToNewUnicode(myString); I have a patch which switches all existing uses over from the one to the other form. I hope to check that in soonish (need to get r= and sr= first of course).
Jag, thanks for catching those. I'll fix them now. Yes, i only used ToNewUnicode() in one place. Is the ultimate plan to get rid of char*'s altogether and move everything over to nsAString? Thanks --pete
> Is the ultimate plan to get rid of char*'s altogether and move everything over > to nsAString? Yes. With the exception of persistentDescriptor. See my comment from 08/22. A persistentDescriptor shouldn't be mistaken for or used like a path. It may look like one on Unix & Windows but on the Mac, it's a base64 encoded file alias. It's supposed to be treated as opaque data which is pointed to by a char*. Also, there are many of these written into existing registries and prefs so this can't be changed. Not to nag about this again but since I pointed it out last, it's still a wide string in the latest patch.
Conrad, i was actually referring to the entire mozilla codebase, not just nsIFile in particular. It seems a lot of the surrounding implementations i see would be better suited using nsAutoString. Reguards to persistentDescriptor are you saying it needs to be a wide string and not a regular string like it is now? I haven't changed the interface for persistentDescriptor. http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsILocalFile.idl#107 --pete
Conrad: I think the question was asked in a broader context. Pete: In which case the answer would be no. But we will be moving a lot of char* usage over to nsACString (note the C) or one of the specialized classes that inherits from it. Just FYI (though I think you've realized this already), ToNewCString() and ToNewUTF8String() also create new buffers, which also need to be freed manually, or avoided where possible. Regarding persistentDescriptor, I think what Conrad meant is that it's really a byte*, but we're playing a few tricks here to make it "work" with javascript.
Pete: Sorry, I completely mis-read. This is right: + attribute string persistentDescriptor; I must be seeing imaginary things, looking at the wrong patch, etc.
Blocks: 77833
Attached patch currnet snap (deleted) — — Splinter Review
The good new is i have almost all of the param changes finished. It compiles cleanly. The bad news is a can only get xpcshell to run. I am having a problem. I beleive somewhere there is a call to free or alloc that is passing the wrong pointer ala . . . in free(): warning: junk pointer, too high to make sense So i have pretty much been in regression mode. If anyone can spot an error in the huge patch i attached that would be great. I have been up and down it and anywhere i removed a call to free, the char* was replaced w/ a nsAutoString. So, anyway progress is halted while i track this bug down. Thanks --pete
Index: editor/base/nsEditorShell.cpp + nsAutoString fileURL; + fileURL.Assign(inFileURL); Code like that you can short circuit to: nsAutoString fileURL(inFileURL); But, since your only use of the nsAutoString there is to be a string container around the const PRUnichar* passed in, you really want nsDependentString. nsXPIDLString leafName; - docFile->GetUnicodeLeafName(getter_Copies(leafName)); + docFile->GetLeafName(leafName); So, this might compile (shame on us) but currently won't work (it will at some point in the future). /me goes to stand in a corner.
Heres on that will issue the error. myXPIDLCString.Adopt((char*)NS_ConvertUCS2toUTF8(myAuto).get()); needs to be: myXPIDLCString.Adopt(ToNewCString(myAuto)); --pete
Yep, that would do the trick :-) Nice catch. Whenever you need to cast with strings, ask yourself what's wrong. Could be the interface you're using forces you to use a |char*| even though they really meant a |const char*|, could be you're using strings differently than you should.
Ok, i got her running again. Yea! I am still getting a butt load of: mozilla-bin in free(): warning: junk pointer, too high to make sense. Error messages but she runs . . . I need to diligently go through all these calls again and catch more mistakes. --pete
This comment from nsRegistry.cpp is especially true now (to the point where it needs to be fixed) + registryLocation->GetPath(regFile); // dougt fix... // dveditz needs to fix his registry so that I can pass an // nsIFile interface and not hack Now, we get back a Unicode path. Down a few lines, it gets converted like this: + err = NR_RegOpen((char*)NS_ConvertUCS2toUTF8(regFile).get(), &mReg ); Say the path has non-ACSII chars in it - is NR_RegOpen going to do the right thing with that utf-8 string it's getting passed? It looks like it just gets passed through to fopen which I doubt is going to do the right thing on all platforms. The registry should take an nsILocalFile on which it could use OpenANSIFileDesc. Then we wouldn't have to do this conversion. One little problem - the registry is written in C.
Conrad you raise a good point here. Another issue, there is still a *lot* of code using nsIFileSpec. This all needs to be accomplished in tiers. So i think what needs to be done is get the nsIFile interface solid, let it bake in the oven for a little while and then check it in. Fix all accessor code that is implemented poorly. Your example is a good one. Let it bake and check it in. Then we need a serious performance and efficiency audit. There are over 3,000 calls to InitWithPath on startup. Tighten the interface up and get is lean and mean. Let it bake in the oven and check it in. So i think it would be best if we break out the tasks at hand into logical divisions by order of importance. After the new interface changes are solid and bug free and checked in, i'm real interested on an audit level focus on performance and efficiency. "A Lean, mean, I/O machine" It all needs to get done. --pete
Here we go. Making an assignment like this is bad: leafName = (char*)NS_ConvertUCS2toUTF8(autoleaf).get(); I change to this: leafName = ToNewCString(autoLeafName); We are good. No more of free junk pointers errors from malloc. I have a bunch of these i need to fix. --pete
Doh! I should've pointed you at our current string guide/FAQ, it lists most if not all of these. http://lxr.mozilla.org/seamonkey/source/string/doc/string-guide.html
Jag, i knew about the guide. I just need to read it. ;-) --pete
> Fix all accessor code that is implemented poorly. Your example is a good one. > Let it bake and check it in. Most of these cases need to be fixed before this gets checked in. If we fail to open the registry file on a system where the path contains a non-ASCII char, or lose people's mail, that's a showstopper. Many occurances of NS_ConvertUCS2toUTF8 in the patch are likely to cause this. This is a huge amount of work and you've already done a lot. I don't mean to heap more on you in order to get this done. I'd say file bugs against others and make them block this one. One stopper is a conversion like this: + nsAutoString pathBuf; + aFile->GetPath(pathBuf); res = NS_NewFileSpec(getter_AddRefs(tempSpec)); if (NS_FAILED(res)) return res; - res = tempSpec->SetNativePath(pathBuf); + res = tempSpec->SetNativePath(NS_ConvertUCS2toUTF8(pathBuf).get()); Ideally, nsFileSpec would die and we wouldn't have this problem. There is a bug on that and it could block this one. In the short term, there is a workaround: nsFileSpec has an assignment operator which takes an nsString&. If pathBuf was just assigned to an nsFileSpec, it goes through Unicode conversion to the FS charset.
Here is another bad one i just fixed a bunch of: localFile->Append(NS_LITERAL_STRING(PREFS_FILE_50_NAME)); --pete (BTW: nice attachment table w/ the new version of bugzilla)
Is that the fix, or broken? If it's about having a macro inside another macro, I believe that in the NS_LITERAL_STRING case it all just work because of the NS_L macro used inside that. (yes, the string-guide needs updating ;).
Attached patch current snap posted for backup (deleted) — — Splinter Review
It's getting there. I have some more bad assignments I'm zooming in on. jag is the conversion to nsAutoString here going to increase the amount of memory mozilla uses? How efficient are these string classes? Do they clean themselves up after being used? --pete
===== On nsAutoString: nsAutoString is an object which in typical usage gets allocated on the stack, usually as a short-lived helper object (read: only extra bloat, if any, for the duration of the function). Internally it has a 64 unit (char or PRUnichar) buffer (also allocated on the stack) which it will use unless/until it needs more, in which case it will allocate. If we do allocate, we now have a 64 unit buffer (so either 64 bytes or 128 bytes) of "wasted" stack memory for the duration of the function. When the nsAutoString is destroyed, i.e. when you leave the function/code block it was constructed in, it will deallocate the buffer it allocated. The advantage of nsAutoString really comes when your strings are typically up to 64 units, or when you build up strings (through Append or += ) and know normally not to exceed that limit. Since it already has a buffer (and on the stack!) it won't need to do expensive (re)allocations. If it did allocate a buffer, it will keep using it until you tell it to deallocate it (SetCapacity), or it needs a larger buffer. The direct benefit of this is that you can reuse the nsAutoString and assign a smaller string into it without a penalty (see below for more on this). If you skip the bit about having a 64 unit buffer on the stack, you have your nsString :-) If you know your string usage to normally exceed the 64 unit limit and you're going to be reusing the string object, or appending into it, you can use something like this: nsString foopy; foopy.SetCapacity(256); // allocate once (255 + \0) This will give you a string with an internal buffer of 256 units (though we don't promise it, it's more a hint to the string implementation of what you're about to do with it, see nsAString.h). You can now for example pass it to a function which you know to give you a string less than 256 units (if it's larger, that's fine, it will just need to reallocate), and then append to that string without having to reallocate as long as you stay within those 256 units. Another thing you can do is iterate over a list of strings with variable length and only have to allocate once (if the longest string fits within the capacity you set). That's a lot of theory up there. Please ask questions if something's not clear). To summarize, I don't think memory usage will increase much (after you've gotten rid of a bunch of unnecessary conversions back and forth) and may even decrease. The string classes allow efficient use, but it's up to the programmer to know what to use when. And yes, they do clean up after themselves and don't make it easy for you to run into trouble with that (though it's certainly possible). ===== On unnecessary string code/copies: + nsAutoString fileurl; + file->GetURL(fileurl); + newValue = fileurl; newValue here is an nsAutoString, so you can just as easily pass in newValue. I guess this falls under your rule of "try to do 1:1 conversion first, then go clean up redundant string copies / string class usage" ===== On charPtr != 0 vs. !IsEmpty(): char* foo; // do something with foo if (foo) { nsCAutoString foopy(foo); CallSomethingWith(foopy); } If you change this to: nsXPIDLCString foo; // do something with foo; if (!foo.IsEmpty()) { nsCAutoString foopy(foo); CallSomethingWith(foopy); } You've actually created different code. |!foo.IsEmpty()| for flat strings is equal to |foo != 0 && *foo != 0| (so you have a pointer to a buffer, and the buffer is not an empty string). The original code above allowed for the buffer to be an empty string, which the converted code doesn't. Sometimes this is exactly what you want (and was poorly expressed, aka a bug, in the original code), sometimes it doesn't matter (because CallSomethingWith subsequently does a test for empty string, or when it's a no-op like str.Append("")), and sometimes you just plain want to allow for empty strings. With nsXPIDLString, which covers about 90%+ of the cases, you're fine, since a .get() call on it will tell you whether a buffer exists or not. Maybe for specific code there's a cleaner way, but .get() will get you the 1:1 translation. If you're dealing with a nsString or nsAutoString however, that will always have a buffer, so you'll need to give such conversion a little more thought. ===== On FooWithConversion: + nsAutoString essenFiles; essenFiles.AssignWithConversion(ESSENTIAL_FILES); + pathToCleanupUtility->Append(essenFiles); Try to avoid FooWithConversion, they're going away (since they're hiding for the UCS2 -> ASCII case that you're throwing away bits) In this case, you can either use: pathToCleanupUtility->Append(NS_LITERAL_STRING(ESSENTIAL_FILES)); Or #define ESSENTIAL_FILES NS_LITERAL_STRING("Essential Files") pathToCleanupUtility->Append(ESSENTIAL_FILES); More later, when I hope to have more time.
jag, thanks for the great info. It would serve well to perhaps post it into the the implementors guide so hackers like myself can reference it. Thats good to know about "AssignWithConversion" i'll go fix those now. Thanks --pete
Conrad, can you explain this code in nsProfile.cpp is doesn't make any sense. http://lxr.mozilla.org/seamonkey/source/profile/src/nsProfile.cpp#811 First there is the creation and assignment of a nsAutoString to null. nsAutoString currProfileDirString; currProfileDirString.AssignWithConversion(strtok(NULL, " ")); w/ this assignment it will always be null. Then right after there is a test to see if the AutoString is not empty. !currProfileDirString.IsEmpty() all so a null PRUnichar nsILocalUnicodeFile can be created? Can you clarify this code for me? Thanks --pete
Attached patch current patch (deleted) — — Splinter Review
jag, in this latest patch, i have about 19 errors from malloc. mozilla-bin in free(): warning: junk pointer, too high to make sense. Can you give it a once over and see if you can catch any. I'm still lookin. I also got rid of any calls i made to AssignWithConversion. Thanks --pete
these mods are far reaching enough that they warrant a branch IMO. once you're ready, people can pull that branch on various platforms and beat on it. can you head toward a branch?
Yea, branching is a good idea, however i'm not ready quite yet. Once i branch, i'm not going to have the benefits of merges when i update. Lets shoot for a branch sometime this week. Thanks --pete
man strtok: NAME strtok, strtok_r - extract tokens from strings SYNOPSIS #include <string.h> char *strtok(char *s, const char *delim); char *strtok_r(char *s, const char *delim, char **ptrptr); DESCRIPTION A `token' is a nonempty string of characters not occurring in the string delim, followed by \0 or by a character occurring in delim. The strtok() function can be used to parse the string s into tokens. The first call to strtok() should have s as its first argument. Subsequent calls should have the first argument set to NULL. Each call returns a pointer to the next token, or NULL when no more tokens are found. ----- So, the strtok(NULL, " ") will continue where the last strtok (one line higher) left us. The equivalent of this code would be something like: nsACString::const_iterator start, end, iter; cmdResult.BeginReading(start); iter = start; cmdResult.EndReading(end); FindCharInReadable(' ', iter, end); NS_ConvertASCIItoUCS2 currProfileName(Substring(start, iter)); while (iter != end && *iter != ' ') ++iter; start = iter; FindCharInReadable(' ', iter, end); NS_ConvertASCIItoUCS2 currProfileDir(Substring(start, iter)); Except I need to check in the nsACString& constructor for NS_ConvertASCIItoUCS2 first.
just an update. I have converted most of the nsLocalFile internal implementations to use autostrings. In doing so, a lot of the implementations now work and are a lot smarter and just "do the right thing". For example SetLeafName on unix is flawed. Currently you can do this: js> f.leafName='/foo/bar/baz/'; /foo/bar/baz/ js> f.path; /tmp//foo/bar/baz/ this is bad because you are now using this setter like appendRelativePath. Now we do the right thing: js> f.leafName='/foo/bar/baz/'; /foo/bar/baz/ js> f.path; tmp/foo js> These implmentations should translate nicely over to windows and mac because of the autostring mechanics used. --pete
Attached patch beginnings of a js nsIFile test suite (deleted) — — Splinter Review
It's all starting to come together now. One weird side affect using nsAString as a param. In the methods copyTo and moveTo if you want to use the original leafname of the current file, you now have to pass an empty '' instead of null. f.copyTo(d, ''); *not* f.copyTo(d, null); This will no doubt lead to confusion for implementors. I guess clear documentation will solve this problem. --pete
Attached patch current patch - in case i die or something ;-) (deleted) — — Splinter Review
Attached file more additions to the test script (deleted) —
Status: I have been testing the shit out of the individual nsIFile implementations and fixing a lot of tough problems mostly related to symlinks and are long standing lingering issues. It is shaping up. I wish to continue testing and then early next week perhaps we can cut a branch for this work. --pete
Attached patch curent snap (deleted) — — Splinter Review
Attached patch current (deleted) — — Splinter Review
Jud and company. I am going to be tied up for the next three weeks solid. I landed some contract work porting windows code to linux. I doubt i will be able to touch this until after then. Thanks --pete
Attached file nsLocalFileUnixDefs (deleted) —
Blocks: 100676
pushing back
Target Milestone: mozilla0.9.5 → mozilla0.9.7
Conrad, have we decided what args we are going to use here? Keep nsAString? or use PRUnichar? I have a little time now and want to get this patch going again. --pete
> Keep nsAString? or use PRUnichar? I like nsAString.
Attached patch compiles and runs on the trunk (deleted) — — Splinter Review
Ok, this last patch posted looks good on unix. I have a few calls to free i need to track down but it runs and is solid. Next, i plan on taking out the entire io dependency on uconv. At this point using this patch it is not needed. I would tackle the win32 stuff, but i don't have a win machine. ;-) After things are tight, i paln on finishing a brutal test suite. Then i will start to clearly document the API's. I obviously can't do this in succession due to time restraints, but it will get done and it will be rock solid when finished. --pete
NS_NewLocalFile(const char* path, PRBool followLinks, nsILocalFile* *result); openANSIFileDesc(in string mode); NS_GetSpecialDirectory(const char* specialDirName, nsIFile* *result) attribute string persistentDescriptor; These are the methods i have left to convert. persistentDescriptor remains as string. What about the others? ok to convert over to AString? Thanks --pete
> openANSIFileDesc(in string mode); I'd keep that one as a c-string. It's just a string like "w", "r", etc and having it as a Unicode string would just mean the impl would have to convert it before passing it to fopen(). > NS_GetSpecialDirectory(const char* specialDirName, nsIFile* *result) This has to remain as a c-string. The specialDirName is an nsIProperties property name (which are c-strings), not a file name. You could rename it to "dirPropName" though.
pete - I filed a bug for the uconv dependency in nsI*File a few weeks ago - bug 100676.. nobody has done any work there though. do you want to just roll that bug into this one or work on it over there? Glad to hear you're tackling this, hopefully I can help on windows..
Yea, Alec roll that into this one. Thanks --pete
Conrad, i think what we are shooting for in this bug is landing it in stages. 1- API change 2- removing FileSpec 3- removing uconv from io Sound good? --pete
I was thinking it would go in the reverse order from that. If we change all methods on nsIFile to be Unicode-only, we are forcing all consumers to use methods which go through Unicode -> FS charset conversion which has problems. Getting rid of that conversion is, to me, the reason for making this API change, but that requires change to the back-end impls of nsLocalFile.
Conrad, since unix supports UTF-8 i should be right on track here. The first thing i did was make the changes to the implementations. You said you have the mac stuff working, so we just need windows implemented to land to API change first. Are you saying that there are problems using NS_ConvertUCS2toUTF8 on unix platforms as well? --pete
I have to agree with conrad - I don't see the rush in switching the API if the code behind the correct APIs are not doing the right thing. As conrad explained to me: 1) fix the implementation behind the unicode side to call the appropriate OS-specific unicode-based system calls. This requires platform experts. 2) fix the consumers in a way that makes sense within the context of each consumer 3) when consumers are completely migrated, then change the APIs. Why? (this is mostly from conrad, but I went to see for myself, and I am with him on this) your patch has a lot of NS_ConverUCS2toUTF8, which means that we're assuming that a) the filesystem can handle UTF8 and isn't just going to barf on the high-bit characters b) that the filesystem is in fact utf8. Imagine if the filesystem's charset is not utf8.. then we're writing out this odd character string using the |const char*|-based system APIs. Other programs in the system may try to read that filename as if it were the in the system charset, and get a garbage filename. There are pleanty of other strange scenarios that I can think of as well.
> 1) fix the implementation behind the unicode side to call the appropriate > OS-specific unicode-based system calls. This requires platform experts. Most unix systems pre glib 2.2 do not have unicode-based system calls or UTF-8 environment locales. At best there may be some partially implemented wchar.h libs. All that most current systems have is the `wcstombs' family of methods to do wide char conversions. So the only logical solution is hand the methods UTF-8 encoded characters since they are directly based on the ASCII encoding. In most cases this will be the basic latin character set anyway so there is no problem. If the machine is using say a Japanese character set for the system locale and is writing these characters, again no problem, we are just handing off the UTF-8 encoded characters in wide bytes, the kernel doesn't know the difference. The problem arises when say a system that can only handle basic latin is handed the UTF-8 high bit encodings of Japanese characters. They are now UTF-8 encoded chars that can't be mapped to the systems locale. In this case, it is a system level problem. There is just no way to interpret an extended char set if the system doesn't have the locale support for it. 2) fix the consumers in a way that makes sense within the context of each consumer What do you mean by consumers, accessors of the nsIFile interface? > b) that the filesystem is in fact utf8. Imagine if the filesystem's charset is > not utf8.. then we're writing out this odd character string using the |const > char*|-based system APIs. Other programs in the system may try to read that > filename as if it were the in the system charset, and get a garbage filename. > There are pleanty of other strange scenarios that I can think of as well. No we are writing out bytes that are going to be either relivant or not on the particular system. For example: 0x03b6 It's just a number to the filesystem. My term can't display it but it is a valid character that is unsupported by my local system. Whether the system can interpret that number to a character is another story. If it can't then the chars didn't come from that system. The way i see it is it is broken now, and will remain broken no matter what we do if the system can't support and extended character base right? If my system can't support a hebrew charset and i try to write a hebrew file name, it just isn't going to happen. We need a unicode *and* system expert on this matter. ;-) --pete
Ok, with the current implementation i have on my local box, i just saved a file w/ this name: (See if bugzilla takes to this, escape for example) /tmp/&#1040;&#1041;&#1042;&#1043;&#1044;&#1072;&#1073;&#1074;&#1075;&#1076;.txt It writes to my file system but my term can't properly display the filename because there are UTF-8 chars it can't map to. It comes up as: '?????01234.txt' when ls the dir. Currently mozilla won't let you even save this filename. So i think this is a step in the right direction here. --pete
Trying text from clipboard: /tmp/??????????.txt See if this works. --pete
Heh, looks like mysql can't deal w/ it either. Oh well. Anyway the point i'm trying to make is we can write this out to the file system now, it's up to the systems locale to interpret it. For unix using UTF-8 should the reasonable solution in this case. --pete
ack, that just seems lame. Let's say I do a File->Save Page As... and save it as my name, but my name is in Japanese. You're proposing we ignore the system locale (right?) and save it in UTF8. Now I go to open it with emacs, and look for my file. Instead of seeing my name, I see some garbled string. that's not cool.
Here is the problem as i see it. If a local system doesn't support unicode conversions natively. This is the case on many unix systems pre glib 2.2. We have a mozilla string that has been converted to UCS2. Our strings original locale charset mapping is now gone in place of unicode. We are now ready to convert this string back and pass it off to a libc call. We have no native converters. If i use wcstombs it is expecting the first conversion was made based on the systems locale where mozilla converted using ISO 10646 (i beleive). The resulting conversion back to mutlibyte string will be incongruous. The systems that support a native unicode conversion should be fine. The problem is that a large number of systems out there don't. At least w/ UTF-8 there is a shot that if the system is ISO_8859-1, the chars will map correctly. If your locale is Japanese, you have the glyphs and are running in UTF-8 you'll see your file. I just see a great deal of platform specific work to accomodate the few unix systems that have good unicode support implemented. Comments from any unix hackers out there? --pete
adding shaver for unix filesystem advice
Gack. Gimme a day to digest this? There's a lot of content here, and I'm ear-deep in kernel MM internals and the linker.
> At least w/ UTF-8 there is a shot that if the system is ISO_8859-1, the chars > will map correctly. Is there a way to determine this at runtime and create the right implementation of nsLocalFile (internally, completely hidden to consumers) which takes advantage of this and another one which reverts to using uconv to convert from UCS2 to the FS charset only if we're running on a lame machine? That's what I'm planning on doing on the Mac. It's possible to tell at runtime if UCS2 file system APIs are present (they are on all but the most decrepit Macs) and, if so, they'll be used.
Yea, you can check for glib 2.2 or see if __STDC_ISO_10646__ is defined. UCS2 is no good on unix. UTF-8 is preferred. Can someone tell me the big win here using double byte chars by default? It seems more logical to implement a chage like this when there is wider native support for uinicode. Since most of these implementations are wrapping libc file system calls that want single byte chars anyway. It seem to me at this point it would be better to focus on making making the existing nsIFile implementations rock solid. Do we really want to be adding changes like this even if incremental, at this stage of mozilla pre 1.0? Or is default nsIFile unicode support a great nsIFile2 task? --pete
Looking over the original goals why don't we revise the objective - NS_FILE_CONTRACTID should go away (IMPLEMENTED) - methods w/ [const] can remove those usages as "in" parameterspecification implies "const" (IMPLEMENTED) - remove all foo*FollowLinks and use boolean flag followLinks (IMPLEMENTED) - fully document semantics of the methods. make assumptions clear (TODO) - spawn should go away. (IMPLEMENTED) - nsILocalFile derrives from nsIFile (IMPLEMENTED) - Write intensive test suite, debug fix and stablize (STARTED) I have most of this implemented on unix already. I will obviously have to redo everything to revert back to char * params. But these ojectives make more sense to me. --pete
> Can someone tell me the big win here using double byte chars by default? It avoids bad conversions between Unicode and the current system char set on systems where the file system *is* Unicode. When we want to store the path to a file in, say, the profile registry, we use Unicode paths. When the Unicode path is later read from the registry and then passed to InitWithUnicodePath, the Unicode path is converted into a char* path by code in nsLocalFileCommon. If the system char set when InitWithUnicodePath is called is different from when the path was retrieved by GetUnicodePath, that conversion will be wrong. If the API is Unicode-only and the implementation uses Unicode FS routines, we never have to do any conversions. If that's not possible on Unix like it is on Windows (NT, 2K) and Mac (OS9, OSX) those implementations can be changed and weaned off of the conversion code in nsLocalFileCommon.
Agree, that makes sense. On unix we can do it for glib 2.2 systems as well. But that implies we are still keeping the unicode API's. Then we are just getting rid of the uconv dependencies where possible. Ok, sorry for the misunderstanding. --pete
How about a solution like this for unix systems. // Unicode interface Wrapper NS_IMETHODIMP nsLocalFile::InitWithUnicodePath(const PRUnichar *filePath) { wchar_t wpath[BUFSIZ]; char path[BUFSIZ]; size_t len = 0; while (filePath[len] != L'\0') wpath[len] = filePath[len++]; wpath[len] = L'\0'; wcstombs(path, wpath, BUFSIZ); return InitWithPath(path); } Can anyone see any problems w/ this? It is just converting the PRUnichar to the system wchar. Then wcstombs can take care of the systems current locale. Thanks --pete
Turning it into a macro we can ifdef the hooks for each system in nsLocalFileCommon or should we just implement them in each platform nsLocalFile*.cpp file? #define SET_UNIX_WS( func , arg ) \ { \ wchar_t wpath[BUFSIZ]; \ char path[BUFSIZ]; \ size_t len = 0; \ while (arg[len] != L'\0') \ wpath[len] = arg[len++]; \ wpath[len] = L'\0'; \ wcstombs(path, wpath, BUFSIZ); \ return (func)(path); \ }
pete: don't do len++ in the right-hand-side of an assignment to foo[len] if you want the index to be the pre-incremented value. (Also, nit: 3-space indent, then 2?!). Can we tune for the case where sizeof(wchar_t) == sizeof(PRUnichar) and just cast? /be
> or should we just implement them in each platform nsLocalFile*.cpp file? Yes. On other platforms, the file system takes unicode directly so we don't need to do this kind of conversion. Separating out the unicode wrapper routines is bug 103384.
> if you want the index to be the pre-incremented value. In this case Brendan we don't want the pre increment, we are starting at 0. > (Also, nit: 3-space indent, then 2?!). I thought "when in Rome" . . . > Can we tune for the case where sizeof(wchar_t) == sizeof(PRUnichar) > and just cast? A more complete macro: #define SET_WS( func , arg ) \ { \ char path[BUFSIZ]; \ if (sizeof(wchar_t) == sizeof(PRUnichar)) { \ if (wcstombs(path, (wchar_t *)arg, BUFSIZ) < 0) \ return NS_ERROR_FAILURE; \ } else { \ wchar_t wpath[BUFSIZ]; \ size_t len = 0; \ while (arg[len] != L'\0') \ wpath[len] = arg[len++]; \ wpath[len] = L'\0'; \ if (wcstombs(path, wpath, BUFSIZ) < 0 ) \ return NS_ERROR_FAILURE; \ } \ return (func)(path); \ }
This won't work in the case where we force wchar_t to be short (-fshort-wchar, see http://lxr.mozilla.org/seamonkey/source/configure.in#1968).
Pete, what value of "Rome" uses 3, then 2 space indentation? Not the nsLocalFile impls I've hacked on. My point about len++ is order of evaluation, not pre- vs. post-increment. You are assuming that the ++ takes place after the = operator. Standard C has no "sequence point" between the left and right hand sides of =, leaving it up to the impl. to decide order of effects (left-hand-side-of-assignment-address-calc can come before incremented-value-store, or vice versa -- it's not specified). /be
> Pete, what value of "Rome" uses 3, then 2 space indentation? Not the > nsLocalFile impls I've hacked on. nsLocalFileCommon.cpp has `3' space indentation. Why on earth would i use three space indentation? But don't worry, i will be moving the implementations to nsLocalFileUnix.cpp where it will have righteous indentation. The world will be a better place. Ok, i see. I will increment after the assignment. while (arg[len] != L'\0') { wpath[len] = arg[len]; ++len; } Is the explicit increment i want here. Thanks for the tip. Jag if wchar_t is forced to be a short, then the check that Brendan suggested accommodates this scenario. Because we know PRUnichar is a double byte char. if (sizeof(wchar_t) == sizeof(PRUnichar)) wcstombs(path, (wchar_t *)arg, BUFSIZ) --pete
So the problem I'm seeing is that if we force sizeof(wchar_t) to be 2 where it's normally 4, we'll be calling wcstombs with a buffer of 16-bit chars when it's expecting a buffer of 32-bit chars.
Not sure about this one, my version of gcc doesn't support that flag. Worst case, i can always typedef it back to: unsigned long wchar_t --pete
You would have to detect in which cases we forced wchar_t to be 2 byte where normally it would have been 4 byte. This is actually doable, though we'll need to add a define to configure.in in order for us to know when we're forcing short wchar. The harder part is that you can't use typedef to change a standard type's definition. So we'll have to use a type that'll be binary compatible with wchar_t (both in size and in signedness, meaning we'd probably have to have two additional defines, or infer the above define from the existence of these two). It can all be done, it's just a matter of elegance...
Or, if the only reason we are using the two byte -fshort-wchar flag (where available) is for conveniently setting HAVE_CPP_2BYTE_WCHAR_T so we can typedef it directly to PRUnichar, then maybe it would be wiser to just leave wchar_t in it's default state of 4 bytes. It seems that by forcing the two byte wchar_t is making an assumption that we are not going to call any native wide string functions that want the default 4 byte wide buffer. Any reason not to pull out this check and leave unix systems using PRUint16 for PRUnichar? --pete
Using 2-byte wchar_t allows Linux to use the L"foo" version of NS_LITERAL_STRING, which I don't think we want to throw away. (I know I sure don't).
Actually i think you *can* override the typedef. This seems to work fine: typedef PRUint32 __wchar_t; This explicitly sets wchar_t to 4 bytes within the scope of the caller. I just tested it on linux and BSD and it works fine. wchar_t is just a typedef anyway and not a "standard type", so this looks like the way to go here. I'm going to proceed using the above unitil someone persuades me not to. --pete
My unicode wrapper macros will now look something like this: #define SET_WS( func , arg ) \ { \ char str[BUFSIZ]; \ typedef PRUint32 __wchar_t; \ typedef __wchar_t wchar_t; \ wchar_t wpath[BUFSIZ]; \ size_t len = 0; \ while (arg[len] != L'\0') { \ wpath[len] = arg[len]; \ ++len; \ } \ wpath[len] = L'\0'; \ if (wcstombs(str, wpath, BUFSIZ) < 0 ) \ return NS_ERROR_FAILURE; \ return (func)(str); \ } #define GET_WS( func , arg) \ { \ nsXPIDLCString str; \ nsresult res; \ typedef PRUint32 __wchar_t; \ typedef __wchar_t wchar_t; \ if (NS_SUCCEEDED(res = (func)(getter_Copies(str)))) { \ wchar_t wpath[BUFSIZ]; \ if (mbstowcs(wpath, str.get(), BUFSIZ) < 0) \ return NS_ERROR_FAILURE; \ size_t len = 0; \ PRUnichar wbuf[BUFSIZ]; \ while (wpath[len] != L'\0') { \ wbuf[len] = wpath[len]; \ ++len; \ } \ wbuf[len] = L'\0'; \ *arg = nsCRT::strdup(wbuf); \ } \ return res; \ }
wchar_t is defined in ISO 10646. Why do you need the double typedef, rather than just a cast when calling wcstombs/mbstowcs? (If you prefer a typedef rather than a cast, for whatever reason, why do we need two, instead of just typedef PRUint32 wchar_t; ?
From "Accelerated C++", on built-in types: ``There is also a ''wide character'' type, called wchar_t, which must contain at least 16 bits, and is intended to be used for representing characters in languages such as Japanese, which have more characters than the Latin alphabet provides. The wchar_t type is required to behave the same way as one of the other integral types. The particular other type involved depends on the implementation and is normally chosen to yield the most efficient representation.'' I don't think you can safely assume that all unix compilers have wchar_t behave the same way as an unsigned 32-bits integral type. Secondly I don't think your typedef will work on all compilers, even though it happens to work with the compilers you used on linux and BSD (it surprised me, since wchar_t should be treated the same way as char, I would think). I still think our best bet is to add an autoconf test that stores the wchar_t's "system" size and signedness so we can typedef SYSTEM_WCHAR_T system_wchar_t and use that in your macros. SYSTEM_WCHAR_T could simply be wchar_t if we don't have to -fshort-wchar (instead of embedding that logic in your macros).
If i don't typedef the two, the compilers warns. nsLocalFileUnix.cpp:238: warning: passing `unsigned int *' as argument 2 of `wcstombs(char *, const __wchar_t *, unsigned int)' changes signedness As far as just casting, thats how i had it originally, but wcstombs should take a 4 byte buffer, handing it a two byte buffer is asking for trouble i think. I'll go w/ Jag's suggestion here. --pete
Attached patch this would be the correct autoconf test . . . (deleted) — — Splinter Review
Attached file new file (deleted) —
Ok, i pulled unicode conversions out of the unix builds. I will do windows next. Conrad has the mac stuff. --pete
Attached patch proposed implementation for windows (deleted) — — Splinter Review
Intentional error returns are: NS_ERROR_FILE_INVALID_PATH NS_ERROR_FILE_ALREADY_EXISTS If you see any other errors in the test output, then there is a problem. --pete
Hey conrad - check out bug 110371 - I found with nsFileSpec that there are actually very few consumers of the two routines that required conversion, and they were all in mail. The same may be true of the nsFileSpec routines in nsLocalFileUnicode. once I land bug 110371, I'll take a look at these too.
Blocks: 99160
pushing back
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Is this really going to land for .99. Or can we continue to use and support the current nsIFile?
Comment on attachment 54447 [details] [diff] [review] proposed implementation for windows So in general I like what's going on here. Why don't we #ifdef stuff out and get this landed on a platform-by-platform basis? sr=alecf on the windows stuff
Attachment #54447 - Flags: superreview+
Comment on attachment 54446 [details] [diff] [review] added some logic to test for null pointers, got rid of damn three spaces this part looks good but where is nsLocalFileUnixDefs.h? (or is it in a patch above? I didn't feel like checking them all)
Attachment #54446 - Flags: superreview+
We can't land this stuff at this point. This is to big a change IMHO way to risky. At the time i posted the patch back in October, it was humming nicely and pretty well tested on unix. I have since landed nsIFile stuff that will break these patches. I only did the unix stuff. The windows and mac need *lots* of love. I don't have access to a win or mac to develop which sucks. ;-( Not sure how the wide char conversions will fly on all flavors of unix either. If i had full time to devote to this, doing it wouldn't even be an issue. But w/ my current rate of one small patch per week, there is no way i can make it fly. If you guys want to go for it, i can handle the unix end of things. I will need to get a new patch in order as this one will certainly fail. --pete
ok, I'll see what I can do to at least get windows landed.. then we can procede with linux and mac.
I'm putting together the Mac equivalent of attachment 54447 [details] [diff] [review].
The good news is that I've landed most of the real important bug fixes for nsLocalFIleUnix in separate bugs. So what i still need to do are: - check in a fix for followlinks to work corectly (bug 57995) - conver the API's over to use AStrings - remove the specific links methods like permissionsOfLink, copyToFollowingLinks etc I have been breaking this patch into pieces and landing fixes in seperate bugs. Thanks for helping out w/ windows. Conrad is on the mac stuff. If someone wants to mail me a cheap box I can borrow /w windows on it, then i'm up for it. ;-) --pete
Shouldn't attachment 54446 [details] [diff] [review] and 54447 be on bug 100676? Both of those patches and the one I now have for Mac don't do anything WRT the nsIFile API but, instead, use native conversion instead of uconv.
Yes, you are right. When i break apart this patch i will post the uconv code to bug 100676. --pete
The reason it was in this bug, is because using AStrings in nsIFile methods is an API change. ;-) --pete
this is true. I guess I was using that bug as part tracking bug, part minor-cleanup-patch-container :)
pete how much of this is left to go? Does this bug cover removal of the UNICODE function in favor of a soon-to-be-here utf8 string (which is the only change that I want to see)?
We still need to land bug 100676 on unix. I am waiting for testing and review. Since everything in this bug is being done incrementally in seperate bugs, perhaps we should break out removing the unicode methods and using AStrings as params into it's own bug. --pete
Here's what I'd like to see happen: 1. Combine all the char*/PRUnichar* pairs of methods to one method which takes an nsAString (or the soon-to-be utf8 string doug mentioned) 2. Drop the ...FollowingLinks methods and make sure the implementations actually observe the followLinks attribute. 3. On nsILocalFile, move reveal() and launch() to some service. (Maybe) 4. On the directoryEntries attribute, be specify whether the entries that are returned by the enumerator have symlinks resolved. This is independent of the followLinks attribute of the parent. The items returned should not just inherit this attribute from their parent. 5. Add a relativeDescriptor attribute to nsILocalFile. This is needed to get XP, relative paths in prefs and such instead of the absolute native paths being used now. Anything else? I'd like to get nsIFile.idl and nsILocalFile.idl nailed down soon as a roadmap for other bugs.
okay. we have 5 weeks. at that time, either we ask that mozilla 1.0 be delayed or we freeze what we got. who is signing up for what?
The only thing i'm *real* concerned with are regressions. - Pulling out the followLinks calls - replacing char* params w/ unicode - making followLinks work correctly - conrads other numbered items nsIFile is reasonably stable at this point in time. So i would suggest that only the changes that are *absolutely necessary* be made and we lean towards a conservative effort to freeze the interfaces and tighten/debug their implementations. Each change we make will cause a new wake of related bugs. --pete
I feel your pain. However there are some thing that should be done for API correctness. I can remove the UNICODE methods in favor for the utf8 string stuff. Pete, can you write up a bug on this? I am sure that Alec would like to see less unicode stuff in xpcom where possible. > 2. Drop the ...FollowingLinks methods and make sure the implementations actually observe the followLinks attribute. I agree. Promote the attribute |followLinks| to the nsIFile from nsILocalFile. If you ever have to "not follow links", you can clone and set this flag to false. > 3. On nsILocalFile, move reveal() and launch() to some service. (Maybe) I agree. this should *not* be part of the API freeze. Especially Launch. Too confusing with nsIProcess. > 4. On the directoryEntries attribute, be specify whether the entries that are returned by the enumerator have symlinks resolved. This is independent of the followLinks attribute of the parent. The items returned should not just inherit this attribute from their parent. I disagree. Why not have the enumerator inherit the attribute from the parent? > 5. Add a relativeDescriptor attribute to nsILocalFile. This is needed to get XP, relative paths in prefs and such instead of the absolute native paths being used now. I disagree. This, if anything, should be a service of some kind. Takes a nsIFile and return a relative persistant descriptor. This does not have to be part of the nsILocalFile api freeze. On items (2) and (3), are their any voluneers? Conrad, your suggestions, do you got any time for some good lovin'?
nah, I love unicode - I just don't like to see unicode semantics (i.e. case conversion/comparison) in xpcom :) I'm sort of up in the air on the utf8 stuff. I kind of think that more code, when written 'correctly' is going to be dealing with unicode strings for filename parts, not utf8... which means we're going to incur an overhead when we do the utf8 conversion.. however it's probably not that bad, and little of this stuff is probably used in code that must be performant... but the fact that it's not likely performance related means we don't necessarily need the overhead of a utf8 class, the standard wstring might do just fine. I don't know.. in any case I think the ASCII/Unicode method pairing has got to go and I'm more than willing to help with tree-wide reviews, no matter which way it goes.
> I disagree. Why not have the enumerator inherit the attribute from the parent? If you have a file which points to a directory through a symlink and you want to enumerate it but don't want the children to be resolved (making a directory listing). You have to resolve the parent or you can't enumerate it. In order to do this now you would have to: 1. see if the file is a symlink 2. if so, get its target - can't just set followLinks(false) on the file or it points to the wrong thing. 3. set followLinks(false) on the target 4. then enumerate That's a pain. Why not just make it easy and obvious and add a boolean param? > I disagree. This, if anything, should be a service of some kind. Takes a nsIFile and return a relative persistant descriptor. Well, we have the persistentDescriptor attribute, so a relativeDescriptor attribute should be alongside it. The reason I don't think it should be done by a service is that the descriptor pertains only to that file, so the file object should implement it. I'm in favor of moving things to services where something else (say, a process) takes some action on a file but not things like this which are identity attributes. > On items (2) and (3), are their any voluneers? On (2) I made the Mac impl observe the followLinks attribute. Pete, didn't you do the same for Unix? If somebody can make the Windows impl observe, I'll remove the methods from the API. I'll do (3) if I can come up with a better (existing) place to put reveal() and launch()
>> I disagree. Why not have the enumerator inherit the attribute from the parent? I take my objection back. >> I disagree. This, if anything, should be a service of some kind. Takes a nsIFile and return a relative persistant descriptor. >Well, we have the persistentDescriptor attribute, so a relativeDescriptor attribute should be alongside it. The reason I don't think it should be done by a service is that the descriptor pertains only to that file, so the file object should implement it. I'm in favor of moving things to services where something else (say, a process) takes some action on a file but not things like this which are identity attributes. There is a big difference between a persistent descriptor and a relative decscriptor. The former is always known by the object. The relative descriptor has a key question that needs to be answered before any meaning can be applied: Relative to what? The nsLocalFile object can never know all of the relative locations that can be created. So, I think that we should abstract this and make an interface which takes a nsIFile and returns a string which represents some relative location. There should also be a method that does the inverse. Examples would be a "res" implementation that would have relative results from the resourse directory. There could be a XPCOM Component implementation which hands back a string relative to the component direction. With all of that said, I do not think that this is required for mozilla 1.0. Or at least none of the requirement that I have read require this. > I'll do (3) if I can come up with a better (existing) place to put reveal() and launch() Great - make it happen. Do you need a bug #?
Doug, one more pitch... The location from which it's relative is known because it's an input param. Here's what I was thinking: =================================================================== RCS file: /cvsroot/mozilla/xpcom/io/nsILocalFile.idl,v retrieving revision 1.15 diff -w -u -2 -r1.15 nsILocalFile.idl --- io/nsILocalFile.idl 31 Jul 2001 19:05:28 -0000 1.15 +++ io/nsILocalFile.idl 21 Feb 2002 20:24:37 -0000 @@ -107,4 +107,28 @@ attribute string persistentDescriptor; + + /** + * getRelativeDescriptor + * + * Returns a utf-8 encoded relative file path in an XP format. + * It is therefore not a native path. + * + * @param fromFile + * the file from which the descriptor is relative + */ + string getRelativeDescriptor(in nsILocalFile fromFile); + + + /** + * setRelativeDescriptor + * + * Initializes the file to the location relative to fromFile using + * a string returned by getRelativeDescriptor. + * + * @param fromFile + * the file to which the descriptor is relative + */ + void setRelativeDescriptor(in string relativeDesc, in nsILocalFile fromFile); + I had originally thought the relative location would, instead of a file, be a directory service key. Shaver pointed out on porkjockeys that this would require N getService calls to resolve N descriptors which were relative to the same location.
In the best Cuba Gooding Jr. impression: "show me the code! show me the code! show me the code!" okay. I like it. :-)
string != UTF-8. See bug 84186 for more than you really want to know.
Oh yeah: we do something like "getRelativeDescriptor" at http://lxr.mozilla.org/seamonkey/source/xpcom/components/nsComponentManager.cpp#2373 or so. Just for reference: it's cheap and easy, and it works everywhere (I recall asking smfr about Mac-friendliness, I think). Which makes me wonder whether we need an API call for something that amounts to a tiny bit of string math. If so, do we need setRelativeDescriptor when we have append? What's the difference?
Because the component mgr is doing that with native paths. The point of what I'm after is that it's XP, i.e. a prefs file written on Unix is readable on a Mac.
I can work on (2), unix/windows If we aren't going set followLinks when we initWithPath, then we need to make it a setter. Right now it is readonly. FollowLinks has been broken for a very long time, i wonder what kind of surprises we'll find when it actually works. eg: when followLinks is fixed and is set to true, when you call GetPath and the file is a symlink, you will get the links ultimate target which is *not* the behavior we have now. As shaver would say, "I'll eat my hat" if this doesn't cause some regressions somewhere. Just a heads up. --pete
Doug, what are the new params going to be? Originally it was AString. Has that changed? It will be better to do this first, since follow links fix will touch many of the nsLocalFileX methods, it would be best to do it only once using the new param type. I agree w/ Shaver on local file path params. Since the *vast* majority are using char* 's, why do away w/ the double byte methods? We are soon to be completely free of uconv in xpcom. Why not leave them there for wide char consumers only. The rest of the world can use char *'s. Less conversions to make, single byte chars are less memory consumption. If we make a universal param change it limits consumer options by forcing the use of a string class or double byte chars when single byte will do just fine. --pete
Target Milestone: mozilla0.9.9 → mozilla1.0
Keywords: topembed
Keywords: mozilla1.0+
i *think* that the less we touch this interface the better. Changing nsIFile to use the new nsAUT8String *may* be a lot of work. If it is, I do not see how I can go to drivers and ask for permission for a zero sum gain. Thoughts?
It's not zero gain, it's large. Here is the problem with having nsIFile methods which are not Unicode. (1) Boot your machine in JA locale and create some files with multi-byte chars. (2) Boot again in US locale. (3) Try and use the files. It will fail because, if the names of the files created in (1) are not stored as Unicode, they're now meaningless in the the current locale. If the names were stored as Unicode, mapping them to the native charset will be meaningless as well. The way to avoid charset conversion is to change the implementations of nsLocalFile to use Unicode OS file APIs. To enforce this, to ensure that charset conversion never happens with filenames, the nsIFile API has to have only Unicode methods. I admit, we probably can't do that with the implementations by 1.0. In the meanwhile, we can at least change to API to Unicode-only, do the charset conversion that we're currently doing, and improve the implementations later. If we can't get rid of the char*/PRUnichar* duality mess that we currently have on nsIFile, the API is not worthy of being frozen.
I dont by that arguement. The problem with is is that we do offer a persistant description which can be used and that should preserve any and all "unicode" state and do the right stuff if in your above example. We will **not** provide a unicode only interface. Most users just want char*'s. If we have to make any change, we will remove the unicode cruft and make these API pass UTF8 encoded string. >If we can't get rid of the char*/PRUnichar* duality mess that we currently have on nsIFile, the API is not worthy of being frozen. Worthyness has nothing to do with it. Looks at the windows API where there are wide and narrow APIs. Dualities exist. My thoughts are that we either do nothing or we removed UNICODE and convert to UTF8. I am leaning toward the former.
> My thoughts are that we either do nothing Doing nothing means that we take what we have now, fix it, stress test it then bake it for 5 weeks at 400 degrees. > (1) Boot your machine in JA locale and create some files with multi-byte chars. > (2) Boot again in US locale. > (3) Try and use the files. Agree this is certainly a problem, but in reality, how many people will be actually doing this? I think it needs to be clearly spelled out as a known 1.0 local file issue. I personally beleive that we will gain so much more by taking what we have and use this time to make it rock solid. --pete
If we do end up with non-wide methods, I think we ought to change the names, such that "Unicode" does not make a method special.. what I mean is that we have things like SetLeafName and SetLeafNameUnicode - they should either be the same, similar to windows' A vs. W, something like SetLeafNameNative() and SetLeafNameUnicode(), or we should drop the "Unicode" and add "Native", like SetLeafNameNative() and SetLeafName() That said, we also must document the charset that the non-wide APIs take, and I think our two choices are: 1) native charset 2) UTF8 I'm leaning towards the latter, just so that the API is the same across platforms AND locales, (the locale issue being more important) but unfortunately the existing code is (as far as I know) expecting the native charset, so we'd have to fix up all the callers as well. The ugly ones will be in mail, where they use nsFileSpec, which uses the native charset natively. finally, when we write out to disk, we must store it using the system's unicode APIs (or in the case of unix, convert using wcstombs or something similar) and let the system do the 'right' thing...
I think "SetLeafName" and "SetLeafNameUnicode" are a logical choice. Heres why. There is only one consumer of SetUnicodeLeafName (in cpp) and approximately 75 or so consumers for SetLeafName. Making the API change for the Unicode methods makes more sense since they are less used. Less regressions, less files touched. Alec's other points sound reasonable to me. --pete
I disagree - just because most of the tree is using SetLeafUnicode doesn't mean they are right - we should be encouraging the use of unicode (or at least utf8) and make the 'standard' APIs be i18n-friendly. I don't think that SetLeafName is i18n friendly because it expects the system charset, which isn't easy to come by.. basically I think we want to indicate to people that "unicode is the norm, but if you need to use the system charset, then use this differently-named API"
Conrad, what is the bug number for the relative path stuff? I want different bug for different tasks. What specific task does this bug talk about? Can we agree on what has to be done?
Doug, the relative descriptor stuff is bug 12911 (and it's nearly done) The other bug I made from this, to remove reveal() and launch(), is bug 127098
Created a bug 129279 which will track the unicode/utf8/ascii task. Lets move that dicussion there. Please look at 99160. Are there any other task or discussions which need to be addressed?
are there any tasked in this bug that are not coverted by 99160? I think not.
Pete I do not think that anything in this bug will be fixed for mozilla 1.0. 99160 will track any remaining fall out. Please marked either WONTFIX, or future this bug.
Target Milestone: mozilla1.0 → Future
mozilla1.0- in favor of 99160
Keywords: mozilla1.0+mozilla1.0-
Keywords: topembedembed, topembed-
This aint going to happen prior to mozilla 1.0. Marking WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: