Open Bug 267980 Opened 20 years ago Updated 2 years ago

URL with german characters not parsed correctly when protocol file:/ missing

Categories

(Core :: Internationalization, defect)

defect

Tracking

()

People

(Reporter: christophe_cornu, Unassigned)

References

Details

(Keywords: intl)

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031128 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031128 Under a Linux Box (I am using RH9, german language set - LANG is de_DE.UTF-8) Create a folder with german characters äö as in below. Add any html file into it. /chrisx/äö/test.html Open Mozilla. Paste the following URL (that's an absolute path) file:///chrisx/äö/test.html It works. Paste the following URL: /chrisx/äö/test.html Mozilla brings up a dialog: The file /chrisx/%E4%F6/test.html cannot be found. Please check the location and try again. Verified with Mozilla 1.4 and Mozilla 1.7.2 It looks like when going through the code that 'guesses' the protocol that part of that code is not handling UTF8 correctly? Reproducible: Always Steps to Reproduce: See above Actual Results: Expected Results: Should be the same with or without file:/
> /chrisx/%E4%F6/test.html 0xE4 and 0xF6 are the representation of "äö" in ISO-8859-1. I tried the following cases under ko_KR.UTF-8 locale: /home/jshin/가각 (U+AC00 U+AC01) /home/jshin/가각/abc /home/jshin/ġĢ (U+0121 U+0122) /home/jshin/αβ (U+03B1 U+03B2) /home/jshin/äö /home/jshin/äö/abc The first four cases (Korean, Latin letters above U+0100, Greek letters) work fine while the last two (letters in [U+00A0, U+00FF]) don't work as reported here. This is very interesting because I expected all six cases to fail or to succeed together. It seems like somewhere we use 'if (uniChar < 0x100)' where we should use 'if (uniChar < 0x80)'. We have to test on Mac OS X and Windows to see if it's platform-specific or XP.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
per code inspection, looks like it's win/os2/unix only (this includes mac) http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDefaultURIFixup.cpp#461 521 if (PossiblyByteExpandedFileName(in)) { 522 // removes high byte 523 rv = NS_NewNativeLocalFile(NS_LossyConvertUCS2toASCII(in), PR_FALSE, getter_AddRefs(filePath)); 524 } 525 else { 526 // input is unicode 527 rv = NS_NewLocalFile(in, PR_FALSE, getter_AddRefs(filePath)); 528 } PossiblyByteExpandedFilename returns true for any non-ascii latin1 character... a better check would be to always try to interpret it as valid UTF-16 and check if the file exists. if it does, use that. otherwise, fall back to the LossyConvertUCS2 codepath.
Attached patch patch (deleted) — Splinter Review
I made a patch following cbie's suggestion. I'm wondering if we still need that fallback, though.
Assignee: smontagu → jshin
Status: NEW → ASSIGNED
Comment on attachment 164851 [details] [diff] [review] patch asking for r/sr
Attachment #164851 - Flags: superreview?(darin)
Attachment #164851 - Flags: review?(bzbarsky)
1) We have existing bugs on removing that docshell code, as I recall. All this mess is a workaround for the fact that our command line code is buggy. My questions as to whether this is still a problem in the relevant bugs went rather unanswered... jshin, could you look that stuff up, please? 2) If there is still an issue, I'd MUCH rather fix the command line code than hack this code yet more.
On Linux, I tested under ll_RR.UTF-8, ll_RR.ISO-8859-1 and ko_KR.EUC-KR. WIth the fallback code removed, it just worked, which is a bit unexpected (see the second paragraph from this) in a sense. When I added a couple of debug statements, it became a bit clearer as to why. When I invoked mozilla with '/full/path' on the command line, |ConvertFileToStringURI| is not invoked at all. I haven't checked where it's 'fixed up'. Unless there's a platform-dependency (other than '\' and vs '/'), which I don't think is the case , that long comment and fallback code can be removed. As for nsICmdLine* implementations, I don't see anything wrong (after a cursory look) [1]. Well, I don't like very much the fact that nsICmdLineService APIs return 'string' in the native encoding (and are still declared as 'scriptable'). However, that may be inevitable to avoid the round-trip between the native encoding and UTF-16. Perhaps, documenting clearly that those APIs return values in the native encoding would be necessary. What's suspicious is their consumers (especially nsAppRunner [2]) that apply '*WithConversion' and friends to what's returned by nsICmdLine as if they're ASCII-only. [1] I read comments on bug 58866 and bug 86948, but it seems to me that the problem in nsICmdLineService implementation that made necessary the hack in the current code has gone. [2] http://lxr.mozilla.org/seamonkey/source/xpfe/bootstrap/nsAppRunner.cpp#656
Attachment #164851 - Flags: superreview?(darin)
Attachment #164851 - Flags: review?(bzbarsky)
This code does the right thing (converting from the native encoding to Unicode): http://lxr.mozilla.org/seamonkey/source/xpfe/components/startup/src/nsAppStartup.cpp#850 This code can be simplified using NS_CopyNativeToUnicode as is done for nsProfile.cpp in bug 268266. An alternative is to change nsICmdLineService::URLtoLoad as talked about in bug 87127. Anyway, the fallback code can be safely removed.
(In reply to comment #6) > What's suspicious is their consumers (especially nsAppRunner [2]) that apply > '*WithConversion' and friends to what's returned by nsICmdLine as if they're > ASCII-only. Right. This code used to be called by such a codepath, hence the stuff about "byte-inflated" and so forth... > I read comments on bug 58866 and bug 86948, but it seems to me that the > problem in nsICmdLineService implementation that made necessary the hack in > the current code has gone. That's what it seems like to me to, but as I said in comment 58866 someone with a relevant Windows system would have to test removal of the fallback code.... In particular testing would need to be done on both initial startup and DDE invokations. Similar testing should be done for xremote. (Don't misunderstand me, as the comments in bug 87127 and bug 58866 indicate I would love to remove this code... if you're sure it's ok to remove, I'll gladly review the patch to do so.)
Blocks: 255344
Thanks for the explanation. I was also looking at XRemote code (as you suggested). I can test this on Windows, but I'm not familiar with DDE (and don't know how to test it. I'll google it :-)). Overall, I guess we have to put off removing the fallback code until removing it is demonstrated to be safe on all three tier-1 platforms. In the meantime, do you think it's worth fixing this bug only on branches using attachment 164851 [details] [diff] [review]? This is rather minor so that we can just release-note it if necessary.
(In reply to comment #9) > Overall, I guess we have to put off removing the fallback code until removing > it is demonstrated to be safe on all three tier-1 platforms. We should just aim to do that, and remove the code. CCing Javier in the hope that he can help out on Mac. > In the meantime, do you think it's worth fixing this bug only on branches > using attachment 164851 [details] [diff] [review]? I think pretty much any changes to this code are dangerous... if you're very sure the patch is safe, I guess we can try to put it on branches, but it won't make Firefox 1.0 anyway, so I'm not sure whether we care.
Tested the patch on the Mac, and it fixes the problem.
Thanks for testing. However, the patch still has the fallback code and I'm sure that it's safe. What we're talking about is removing the following + PRBool exist = PR_FALSE; + if (NS_SUCCEEDED(rv)) { + filePath->Exists(&exist); + } + if (NS_FAILED(rv) || !exist) { + // try deflating assuming it's byte-expanded + rv = NS_NewNativeLocalFile(NS_LossyConvertUTF16toASCII(in), + PR_FALSE, getter_AddRefs(filePath)); + }
Yes, still works with that bit of code removed. What's odd is that when I first open the file with non-ASCII characters, it gets escaped in the URL bar. If I go to the URL bar and remove "file://" from the escaped file name, it will fail to load, complaining that it cannot find the file. However, Mozilla opens the file just fine if I manually enter the file name (non-escaped).
(In reply to comment #13) > Yes, still works with that bit of code removed. Thanks again for testing. My Mac is back up again (after a mysterious post-10.3.6-upgrade syndrom :-)), I can test it, too. Can you tell me what's Mac OS X equivalent of Xremote (X11) and DDE (Windows)? That is, how can I pass a url to a running instance of mozilla? '-remote' option doesn't work on OS X. > What's odd is that when I first open the file with non-ASCII characters, it gets > escaped in the URL bar. If I go to the URL bar and remove "file://" from the > escaped file name, it will fail to load, complaining that it cannot find the > file. However, Mozilla opens the file just fine if I manually enter the file > name (non-escaped). That's just normal. Without 'file://', '/some/path/%41%42' is interpreted literally rather than 'file:///some/path/AB'. Unless you have a file '%41%42' in '/some/path', it should fail.
OS: Linux → All
Hardware: PC → All
(In reply to comment #14) > Can you tell me what's Mac > OS X equivalent of Xremote (X11) and DDE (Windows)? That is, how can I pass a > url to a running instance of mozilla? You know, I'm not quite sure. I think it just uses LaunchServices API to handle this. So if you set your browser as the default, and click on an HTML link in another app (Mail for example), then it will open it up in the browser. Ooh, actually, you can use the 'open' command from the command line. Just do "open -a .../MozillaDebug.app test.html".
Thanks, Javier. I'll play with it. Adding Masayuki to Cc for help with testing (see comment #12) DDE on Windows. (I can test it on Windows, but I rarely boot up Windows)
On DDE, it doesn't work. I try to test with 'file://D:/wwwtest/テスト/test.html'. But Mozilla read 'file://D:/wwwtest/eXg/test.html'.
Oops... I forget important comment. I tested on 'Current' build and 'Patched build', but both build failed my test.
Thanks for testing. 'テスト' in Shift_JIS is 0x83 0x65 0x83 0x58 0x83 0x67. Apparently, that was inflated to UTF-16 with AssignWithConversion or equivalent. How about 'mozilla \\D:\wwwtest\テスト\test.html' in Start | Run with an instance of Mozilla already running?
No, I use HTML Editor. The editor convert the path to file://. If I use explorer with DDE(WWW_OpenURL), Mozilla returns error message. The message is "'d' protocol is not supported". Should I test "mozilla %1"? Does not use DDE?
I tested "mozilla %1". If the code is removed(comment 12), it does _not_ work fine. But if the code exists, it works fine. On the removed case, mozilla return message following. 'd:/wwwtest/_e_X_g/test.html' is not found.
(In reply to comment #5) > 2) If there is still an issue, I'd MUCH rather fix the command line code than > hack this code yet more. Fixing the command line code is kinda 'put on hold' by Benjamin's plan for overhauling XUL(runner). See bug 98952. Should we go ahead with the patch with the fallback for 1.8 branch?
The nsICommandLine code was bug 276588 and has replaced all of this (in toolkit apps). If this is still an issue in FF1.5 it's caused by a different codepath.
QA Contact: amyy → i18n

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: jshin1987 → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: