Closed Bug 166735 Opened 22 years ago Closed 22 years ago

Unicode file/io in Necko-nsIOService

Categories

(Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

RESOLVED INVALID
mozilla1.2beta

People

(Reporter: tetsuroy, Assigned: tetsuroy)

References

Details

(Keywords: intl)

Attachments

(1 file, 1 obsolete file)

Need to update nsIOService to handle UCS2 file names in nsIOService::GetURLSpecFromFile() and InitFileFromURLSpec(). Patch to follow.
Status: NEW → ASSIGNED
QA Contact: ruixu → ftang
Target Milestone: --- → mozilla1.2alpha
dougt: can you review?
Keywords: intl
please see bug 166792. why do you want to do this only for XP_WIN? also, are you sure this is the right thing to do? won't it break copy/paste for file:// URL strings? won't you also have compatibility problems in file formats, etc.? i originally tried to do this when i worked on the nsIFile API changes, but there just seem to be way too many problems with using UTF-8 instead of the native charset for file:// URLs.
Depends on: 166792
Darin: thanks for your comment. This bug is one of the changes we need to make Mozilla a Unicode application on the Windows platform. (A Unicode app means we call RegisterClassW(), DefWindowProcW(), GetOpenFileNameW(), CallWindowProcW(), DispatchMessageW(), etc.) You can find related bugs here: 58866, 9449, 104305, 162361, 162362. Those bugs are due to the fact that we are selectively using the locale-based Windows system APIs. There is no way to fix these bugs except to register moz as a Unicode app and start calling the W APIs.

>why do you want to do this only for XP_WIN?
- So that we don't break everything at once; only Windows for now.

>there just seem to be way too many problems with using UTF-8
- Yes, I will be modifying NSPR, XPCOM/IO, Widget and Necko (and more??). However, with the above changes, we _now_ can open/save docs with Unicode filenames, and the window title shows correctly.

>won't it break copy/paste for file:// URL strings?
Not sure. I haven't tested this case yet (thus MOZ_UNICODE :) ). As long as we don't call xxNativeFoo() functions, we should be ok.

Fran, DougT, Wan-Teh and I had a couple of discussions about my approach. Please advise us of any considerations.
Blocks: 9449, 58866, 104305
Depends on: 162361, 162362
as for the cut/copy/paste issue... if you copy a file:// URL encoded using UTF-8 into an application like Netscape 6.2, you'll be unable to load the file:// URL. so, you either have to make the cut/copy/paste code do the conversion, or you have to live with this deficiency. what operating system APIs require UTF-8 file:// URLs?
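To make the incompatibility above concrete, here is a minimal, self-contained sketch (not Mozilla code) of how the same file name escapes differently depending on the charset chosen before percent-escaping. The helper names EncodeUtf8, EncodeLatin1 and EscapePath are hypothetical, and ISO-8859-1 is only an assumed stand-in for a Windows native code page.

```cpp
#include <string>

// Encode UTF-16 code units as UTF-8 bytes (BMP only, for brevity).
std::string EncodeUtf8(const std::u16string& in) {
    std::string out;
    for (char16_t c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Assumed stand-in for a single-byte native charset (ISO-8859-1).
// Characters outside the charset are replaced with '?'.
std::string EncodeLatin1(const std::u16string& in) {
    std::string out;
    for (char16_t c : in) {
        out += (c < 0x100) ? static_cast<char>(c) : '?';
    }
    return out;
}

// Percent-escape every byte that is not in a small safe set.
std::string EscapePath(const std::string& bytes) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        bool safe = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
                    (b >= '0' && b <= '9') || b == '/' || b == '.' ||
                    b == '-' || b == '_' || b == ':';
        if (safe) {
            out += static_cast<char>(b);
        } else {
            out += '%';
            out += kHex[b >> 4];
            out += kHex[b & 0x0F];
        }
    }
    return out;
}
```

For the path /C:/café.txt, the UTF-8 route gives /C:/caf%C3%A9.txt while the Latin-1 route gives /C:/caf%E9.txt. A consumer that unescapes the first form and interprets the bytes in the native charset would see a different file name, which is the copy/paste problem raised here.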
BTW: if you simply modify the native charset to be UTF-8 when encoding file paths as narrow strings, then you'd get UTF-8 file:// URLs automatically.
>native charset to be UTF-8
We thought of the same initially; but
- we want to distinguish the differences between the native charset and UTF-8 in Windows OS. Some modules may _really_ want the native charset. Necko may be a special case where it requires to store URI as UTF-8 (i could be wrong though).

However, we want to keep the existing interfaces of XPCOM/IO either in UCS2 or in the native charset; but not three.

+---------+-----------+---------+
|  Necko  |  Widget   | Others  |
|         |           |         |
+---------+-----------+---------+
     ^                     ^
     |                     |
   (UCS2)            (Native char)
     |                     |
     V                     V
+-------------------------------+
|           XPCOM/IO            |
|     stores Paths as UTF8      |
|   (may be changed to UCS2)    |
+-------------------------------+
     ^                     ^
     |                     |
   (UCS2)               (UCS2)
     |                     |
     V                     |
+---------+                |
|  NSPR   |                |
|         |                V
+-------------------------------+
|           OS System           |
|                               |
+-------------------------------+

In the near future, we want to change the XPCOM/IO::mPaths to be in UCS2.
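The layering in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: PathHolder and its methods are hypothetical names (not the nsIFile interface), only BMP characters are handled, and ISO-8859-1 stands in for the native charset.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: callers pass UCS2 (UTF-16 code units), storage is
// UTF-8, and a lossy native getter serves modules that want native bytes.
class PathHolder {
public:
    void SetPath(const std::u16string& path) { mUtf8 = Encode(path); }
    std::u16string GetPath() const { return Decode(mUtf8); }

    // Assumed native charset: ISO-8859-1; unmappable characters become '?'.
    std::string GetNativePath() const {
        std::string out;
        for (char16_t c : Decode(mUtf8)) {
            out += (c < 0x100) ? static_cast<char>(c) : '?';
        }
        return out;
    }

    const std::string& Utf8() const { return mUtf8; }

private:
    // UTF-16 (BMP only) to UTF-8.
    static std::string Encode(const std::u16string& in) {
        std::string out;
        for (char16_t c : in) {
            if (c < 0x80) {
                out += static_cast<char>(c);
            } else if (c < 0x800) {
                out += static_cast<char>(0xC0 | (c >> 6));
                out += static_cast<char>(0x80 | (c & 0x3F));
            } else {
                out += static_cast<char>(0xE0 | (c >> 12));
                out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return out;
    }

    // UTF-8 to UTF-16. Only meant for well-formed input produced by Encode.
    static std::u16string Decode(const std::string& in) {
        std::u16string out;
        std::size_t i = 0;
        while (i < in.size()) {
            unsigned char b0 = static_cast<unsigned char>(in[i]);
            if (b0 < 0x80) {
                out += static_cast<char16_t>(b0);
                i += 1;
            } else if (b0 < 0xE0) {
                out += static_cast<char16_t>(((b0 & 0x1F) << 6) |
                                             (in[i + 1] & 0x3F));
                i += 2;
            } else {
                out += static_cast<char16_t>(((b0 & 0x0F) << 12) |
                                             ((in[i + 1] & 0x3F) << 6) |
                                             (in[i + 2] & 0x3F));
                i += 3;
            }
        }
        return out;
    }

    std::string mUtf8;  // internal storage, as in the diagram
};
```

The point of the design is that only two interfaces are exposed (UCS2 and native), while the UTF-8 storage stays an internal detail that could later be switched to UCS2 without touching callers.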
so, if file paths are not going to use UTF-8, then why should file:// URLs be any different? they are supposed to have the same format. i just know you are going to hit a lot of regressions if you try to change this, and i don't see the reason for doing so. what WIN32 wide-API expects an UTF-8 file:// URL? in other words, why does the format of the file:// URL need to change?
>why does the format of the file:// URL need to change?
We are not going to change the format of the file:// URL. The URL can be in UTF-8. The problem here is that we are calling nsIFile::GetNativePath() in nsIOService, which corrupts the path name if you open a non-ASCII filename on Win-En. We are trying to eliminate the calls to xxNativeFoo().

>what WIN32 wide-API expects an UTF-8 file:// URL?
None. We use the wide API with UCS2.
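The corruption mentioned here is what happens whenever a wide path is narrowed to a charset that cannot represent it. A minimal sketch, assuming ISO-8859-1 as a stand-in for the Win-En code page and '?' as the replacement character; the function names are hypothetical and are not the nsIFile API.

```cpp
#include <string>

// Narrow a UTF-16 path to the assumed native charset (ISO-8859-1).
// Anything outside the charset is replaced with '?', so information is lost.
std::string ToNativePath(const std::u16string& path) {
    std::string out;
    for (char16_t c : path) {
        out += (c < 0x100) ? static_cast<char>(c) : '?';
    }
    return out;
}

// Widen native bytes back to UTF-16 (each byte maps to one code unit).
std::u16string FromNativePath(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out += static_cast<char16_t>(b);
    }
    return out;
}

// True if the path is unchanged after a trip through the native charset.
bool SurvivesNativeRoundTrip(const std::u16string& path) {
    return FromNativePath(ToNativePath(path)) == path;
}
```

An ASCII path survives the round trip, but a Japanese file name comes back as question marks and no longer names the original file. Staying in UCS2 end to end avoids the narrowing step entirely.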
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Here is the list of bugs related to unicode file i/o: 58866 104305 151162 172337 173151 107941 154629 159773 169712 101573 104305 112031 112741 I haven't verified if all the above bugs get addressed by my attached patch; but, it will be a good start. :) We need to move away from FS file URLs.....
Comment on attachment 97877 Call Unicode APIs instead of calling NativePath()

>Index: base/src/nsIOServiceWin.cpp
>+ nsCAutoString ePath;
>+ ePath.Assign(NS_ConvertUCS2toUTF8(ucsPath).get());

efficiency nit:

NS_ConvertUCS2toUTF8 ePath(ucsPath);

remember, this patch will make mozilla use a file:// URL format that is incompatible with older applications. we need to weigh this fact against the benefit of using unicode encoding. an alternative would be to hide file:// URLs from the user, which is what IE appears to do.
Attachment #97877 - Flags: needs-work+
Attached patch incorporating nit (deleted)
darin: thanks. Would you review the patch?

>an alternative would be to hide file:// URLs
One more question: would you know where I can find the code where I can strip the 'file://' from the URL before displaying it to the user?
Attachment #97877 - Attachment is obsolete: true
strip 'file://' and unescape the URL before displaying to the user, of course :)
i'm not sure where that is handled. it may not be centralized.
Comment on attachment 103918 incorporating nit

>Index: base/src/Makefile.in
>+# For Unicode mozilla
>+ifdef MOZ_UNICODE
>+DEFINES += -DMOZ_UNICODE
>+endif

shouldn't you really be utilizing the file mozilla-config.h that is generated after running configure or something like that instead of tweaking individual makefiles like this?

also, i know i asked this question before, but why don't you just convert NSPR narrow API to use UTF-8? then wouldn't all of this be unnecessary? you could then make nsIFile::GetNativePath (and friends) return UTF-8. what am i missing? why wouldn't this work? e.g., necko just uses whatever nsIFile thinks the native charset is. it doesn't make any assumptions about the charset.
right, Darin. We wouldn't need to change Necko. Sorry about this. I can make this work with changes in nsIFile only. Marking invalid.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID