Closed Bug 244754 Opened 21 years ago Closed 20 years ago

URL is not shown in the status bar when I point at a link on a page encoded as 8-bit Unicode

Categories

(Core :: Networking, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: berndt.soderstrom, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(3 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7) Gecko/20040514
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7) Gecko/20040514

When you point at a link in a page that is encoded as ISO-8859-1, the URL of the file that the link refers to is shown in the status bar, as it should be. However, when you point at a link in a page that is encoded as UTF-8, the URL of the file that the link refers to does not appear in the status bar.

Reproducible: Always

Steps to Reproduce:
1. Point at a link within a page that is encoded as UTF-8.
2. Move the mouse pointer to another link within the same page.

Actual Results:
The URLs of the files that the links refer to did not appear in the status bar.

Expected Results:
The URLs of the files that the links refer to should have appeared in the status bar.
Sample page showing this problem? Testcase showing this problem?
Berndt, do not send email to me with details of the problem. Please make comments and attach files directly to the bug. Thanks.
Attached file Test file encoded as UTF-8 (deleted) —
Attached file Test file encoded as ISO-8859-1 (deleted) —
I've found out that the problem depends on whether the URL of the link contains a non-ASCII character. Within ISO-8859-1 documents, the URL of the link that you point at is shown in the status bar regardless of what characters the URL contains. Within UTF-8 documents, the URL of the link that you point at is shown in the status bar only if it contains no non-ASCII characters.

I have attached two test files, one encoded as UTF-8 and the other encoded as ISO-8859-1. Download both files to the same directory; the name of the directory must contain at least one non-ASCII character (e.g. Ã or è) in order for you to see the bug.
So it's the directory name that has to have a non-ascii character? Putting a non-ascii character in the href itself (in the document) doesn't show the bug?
(In reply to comment #6)
> So it's the directory name that has to have a non-ascii character? Putting a
> non-ascii character in the href itself (in the document) doesn't show the bug?

Yes.
Can you reproduce this with a non-ascii path on an HTTP server? Or only with a local file?
I reproduced this both with local files and on an HTTP server. The actual directory name got strangely corrupted when I created it over FTP, but that is a separate issue.

http://smontagu.org/testcases/%88%91%88/test1.html - the UTF-8 file
http://smontagu.org/testcases/%88%91%88/test2.html - the ISO-8859-1 file
Status: UNCONFIRMED → NEW
Ever confirmed: true
Simon, thanks for the testcase! I assume that directory name is in ISO-8859-1?

Darin, it sounds like a relative URI resolution issue (we fail to do it right, so we end up with either no URI or a bogus URI that can't be decoded into Unicode). The URI objects in question are created with nsContentUtils::NewURIWithDocumentCharset. Could it be a problem if the base URI has one charset set but the relative URI is getting a different charset?
Assignee: general → darin
Component: Browser-General → Networking
OS: Windows ME → All
QA Contact: general → benc
Hardware: PC → All
(In reply to comment #10)
> Simon, thanks for the testcase! I assume that directory name is in ISO-8859-1?

If anything it's in cp862, but I have no idea why. Maybe the Windows FTP client translates automatically from ISO-8859-8 to cp862? When we do display it in the status bar, we seem to display it as ISO-8859-1.
It's not just an issue with relative URIs. If you change the encoding of this page to UTF-8 and hover over the links in comment 9, nothing appears in the status bar.
That's because the links in comment 9 get converted into URI objects based on the page encoding (which means that we unescape and then treat the resulting bytes as being in the page encoding).
*** Bug 257481 has been marked as a duplicate of this bug. ***
Note that comment 13 is wrong. The real problem is described in bug 257481 comment 1... The fix suggested there is pretty trivial; some feedback on the suggestion would be much appreciated.
(In reply to bug 257481 comment #1)
> Should we just use the escaped URI in the status bar in cases when the
> conversion fails, perhaps? That may have security implications, but so does
> showing nothing...

I'd agree it's better to show the escaped URI than to show nothing or to show some garbage (as MS IE does). There might be security implications, but in a sense we'd 'fully disclose' the URI that way (instead of 'hiding' it), although we may 'obscure' it.

> Note: the relevant code is nsWebShell::OnOverLink

Thanks for the pointer. If we take the suggested path, I guess it's better to deal with it at call sites (if appropriate/necessary) than to tweak the API.
Keywords: intl
Hmm... you mean rather than change the UnEscapeURIForUI API?

I'd suggest checking its callers. Chances are they all want to do things that way, and we do indeed want to roll this change into the unescaping code...
related to Bug 229546?
Blocks: 229546
*** Bug 276516 has been marked as a duplicate of this bug. ***
(In reply to comment #17)
> Hmm... you mean than change the UnEscapeURIForUI api?
>
> I'd suggest checking its callers. Chances are they all want to do things that
> way and we do indeed want to roll this change into the unescaping code...

Like this? Perhaps we have to indicate that we fall back to the escaped URI via the return value.

  // in case of failure, return escaped URI
  if (NS_FAILED(convertURItoUnicode(PromiseFlatCString(aCharset),
                                    unescapedSpec, PR_TRUE, _retval)))
    // use UTF-8 for IDN in auth part
    CopyUTF8toUTF16(aURIFragment, _retval);
  return NS_OK;
Yes, something like that. If you want to have a return code to indicate this, that's ok, though not really necessary... in that case it should be a success code, though.
I went through all the callers and some of them do their own error-processing. Should I get rid of them?

http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#2796

  2796   rv = textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);
  2797   if (NS_FAILED(rv)) {
  2798     CopyASCIItoUCS2(spec, formatStrs[0]);
  2799     rv = NS_OK;
  2800   }

http://lxr.mozilla.org/seamonkey/source/content/html/document/src/nsMediaDocument.cpp#324

  324       if (NS_SUCCEEDED(rv))
  325         rv = textToSubURI->UnEscapeURIForUI(docCharset, fileName, fileStr);
  326     }
  327     if (fileStr.IsEmpty())
  328       CopyUTF8toUTF16(fileName, fileStr);
  329   }

http://lxr.mozilla.org/seamonkey/source/dom/src/base/nsLocation.cpp#357

  357     rv = textToSubURI->UnEscapeURIForUI(charset, ref, unicodeRef);
  358   }
  359
  360   if (NS_FAILED(rv)) {
  361     // Oh, well.  No intl here!
  362     NS_UnescapeURL(ref);
  363     CopyASCIItoUTF16(ref, unicodeRef);
  364     rv = NS_OK;
  365   }
  366 }
Yes. Doing the error-processing in a central place is exactly the point.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #173138 - Flags: superreview?(bzbarsky)
Attachment #173138 - Flags: review?(darin)
Comment on attachment 173138 [details] [diff] [review]
patch

>Index: intl/uconv/idl/nsITextToSubURI.idl

>+   *  <li> In case of the conversion error, the URI fragment (escaped) is

"a conversion error"

>+   *  <li> Always succeeeds (callers don't need to do the error checking)

"do error checking"

>Index: docshell/base/nsDocShell.cpp

>-            rv = textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);
>-            if (NS_FAILED(rv)) {
>-                CopyASCIItoUCS2(spec, formatStrs[0]);
>-                rv = NS_OK;
>-            }
>+            // UnEscapeURIForUI always succeeds
>+            textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);

You still need to set rv = NS_OK.

What about the other callers? I see at least a few still effectively doing their own fallback. In particular, nsMediaDocument and nsExternalHelperAppService.cpp (callers of UnescapeFragment).
In case of nsMediaDocument, it's a bit different (that was in my first patch, but was not included in the patch uploaded). Although very unlikely, do_GetService may fail. The same is true of UnescapeFragment (there are other potential causes, so callers still need to handle them).

  if (!fileName.IsEmpty()) {
    nsresult rv;
    nsCOMPtr<nsITextToSubURI> textToSubURI =
      do_GetService(NS_ITEXTTOSUBURI_CONTRACTID, &rv);
    if (NS_SUCCEEDED(rv))
      rv = textToSubURI->UnEscapeURIForUI(docCharset, fileName, fileStr);
  }
  if (fileStr.IsEmpty())
    CopyUTF8toUTF16(fileName, fileStr);
But then the same argument applies to docshell....
Attached patch update (deleted) — Splinter Review
How about this?
Attachment #173138 - Attachment is obsolete: true
Attachment #173161 - Flags: superreview?(bzbarsky)
Attachment #173161 - Flags: review?(darin)
Attachment #173138 - Flags: superreview?(bzbarsky)
Attachment #173138 - Flags: review?(darin)
Comment on attachment 173161 [details] [diff] [review] update sr=bzbarsky
Attachment #173161 - Flags: superreview?(bzbarsky) → superreview+
Attachment #173161 - Flags: review?(darin) → review+
Is there a nightly with this bug fixed?
Flags: blocking1.8b2?
Flags: blocking-aviary1.1?
Not yet. To jshin to get this landed.
Assignee: darin → jshin1987
Oops, sorry. I checked this in on Feb 22nd but forgot to mark it as fixed. I've just verified that it's fixed in my trunk build.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Flags: blocking1.8b2?
Flags: blocking-aviary1.1?