Closed Bug 244754 Opened 21 years ago Closed 20 years ago

URL is not shown in the status bar when I point at a link on a page encoded as 8-bit Unicode

Categories

(Core :: Networking, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: berndt.soderstrom, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(3 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7) Gecko/20040514
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7) Gecko/20040514

When you point at a link in a page that is encoded as ISO-8859-1, the URL of the file that the link refers to is shown in the status bar, as it should be. However, when you point at a link in a page that is encoded as UTF-8, the URL of the file that the link refers to does not appear in the status bar.

Reproducible: Always

Steps to Reproduce:
1. Point at a link within a page that is encoded as UTF-8.
2. Move the mouse pointer to another link within the same page.

Actual Results:
The URLs of the files that the links refer to did not appear in the status bar.

Expected Results:
The URLs of the files that the links refer to should have appeared in the status bar.
Sample page showing this problem? Testcase showing this problem?
Berndt, do not send email to me with details of the problem. Please make comments and attach files directly to the bug. Thanks.
Attached file Test file encoded as UTF-8 (deleted) —
Attached file Test file encoded as ISO-8859-1 (deleted) —
I've found out that the problem depends on whether the URL of the link contains a non-ASCII character. Within ISO-8859-1 documents, the URL of the link that you point at is shown in the status bar regardless of what characters the URL contains. Within UTF-8 documents, the URL of the link that you point at is shown in the status bar only if it contains no non-ASCII characters.

I have attached two test files, one encoded as UTF-8 and the other encoded as ISO-8859-1. Download both files to the same directory; the name of the directory must contain at least one non-ASCII character (e.g. Ã or è) in order for you to see the bug.
So it's the directory name that has to have a non-ascii character? Putting a non-ascii character in the href itself (in the document) doesn't show the bug?
(In reply to comment #6)
> So it's the directory name that has to have a non-ascii character? Putting a
> non-ascii character in the href itself (in the document) doesn't show the bug?

Yes.
Can you reproduce this with a non-ascii path on an HTTP server? Or only with a local file?
I reproduced this both with local files and on an HTTP server. The actual directory name got strangely corrupted when I created it over FTP, but that is a separate issue.

http://smontagu.org/testcases/%88%91%88/test1.html - the UTF-8 file
http://smontagu.org/testcases/%88%91%88/test2.html - the ISO-8859-1 file
Status: UNCONFIRMED → NEW
Ever confirmed: true
Simon, thanks for the testcase! I assume that directory name is in ISO-8859-1?

Darin, it sounds like a relative URI resolution issue (we fail to do it right, so we end up with either no URI or a bogus URI that can't be decoded into Unicode). The URI objects in question are created with nsContentUtils::NewURIWithDocumentCharset. Could it be a problem if the base URI has one charset set but the relative URI is getting a different charset?
Assignee: general → darin
Component: Browser-General → Networking
OS: Windows ME → All
QA Contact: general → benc
Hardware: PC → All
(In reply to comment #10)
> Simon, thanks for the testcase! I assume that directory name is in ISO-8859-1?

If anything it's in cp862, but I have no idea why. Maybe the Windows FTP client translates automatically from ISO-8859-8 to cp862? When we do display it in the status bar, we seem to display it as ISO-8859-1.
It's not just an issue with relative URIs. If you change the encoding of this page to UTF-8 and hover over the links in comment 9, nothing appears in the status bar.
That's because the links in comment 9 get converted into URI objects based on the page encoding (which means that we unescape and then treat the resulting bytes as being in the page encoding).
*** Bug 257481 has been marked as a duplicate of this bug. ***
Note that comment 13 is wrong. The real problem is described in bug 257481 comment 1... The fix suggested there is pretty trivial; some feedback on the suggestion would be much appreciated.
(In reply to bug 257481 comment #1)
> Should we just use the escaped URI in the status bar in cases when the
> conversion fails, perhaps? That may have security implications, but so does
> showing nothing...

I'd agree it's better to show the escaped URI than to show nothing or to show some garbage (as MS IE does). There might be security implications, but in a sense we'd 'fully disclose' the URI that way (instead of 'hiding' it), although we may 'obscure' it.

> Note: the relevant code is nsWebShell::OnOverLink

Thanks for the pointer. If we take the suggested path, I guess it's better to deal with it at call sites (if appropriate/necessary) than to tweak the API.
Keywords: intl
Hmm... you mean rather than change the UnEscapeURIForUI API?

I'd suggest checking its callers. Chances are they all want to do things that way, and we do indeed want to roll this change into the unescaping code...
related to Bug 229546?
Blocks: 229546
*** Bug 276516 has been marked as a duplicate of this bug. ***
(In reply to comment #17)
> Hmm... you mean than change the UnEscapeURIForUI api?
>
> I'd suggest checking its callers. Chances are they all want to do things that
> way and we do indeed want to roll this change into the unescaping code...

Like this? Perhaps we have to indicate that we fall back to the escaped URI via the return value.

  // in case of failure, return escaped URI
  if (NS_FAILED(convertURItoUnicode(PromiseFlatCString(aCharset),
                                    unescapedSpec, PR_TRUE, _retval)))
    // use UTF-8 for IDN in auth part
    CopyUTF8toUTF16(aURIFragment, _retval);
  return NS_OK;
Yes, something like that. If you want to have a return code to indicate this, that's ok, though not really necessary... in that case it should be a success code, though.
I went through all the callers and some of them do their own error-processing. Should I get rid of them?

http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#2796

  2796   rv = textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);
  2797   if (NS_FAILED(rv)) {
  2798     CopyASCIItoUCS2(spec, formatStrs[0]);
  2799     rv = NS_OK;
  2800   }

http://lxr.mozilla.org/seamonkey/source/content/html/document/src/nsMediaDocument.cpp#324

  324       if (NS_SUCCEEDED(rv))
  325         rv = textToSubURI->UnEscapeURIForUI(docCharset, fileName, fileStr);
  326     }
  327     if (fileStr.IsEmpty())
  328       CopyUTF8toUTF16(fileName, fileStr);
  329   }

http://lxr.mozilla.org/seamonkey/source/dom/src/base/nsLocation.cpp#357

  357     rv = textToSubURI->UnEscapeURIForUI(charset, ref, unicodeRef);
  358   }
  359
  360   if (NS_FAILED(rv)) {
  361     // Oh, well.  No intl here!
  362     NS_UnescapeURL(ref);
  363     CopyASCIItoUTF16(ref, unicodeRef);
  364     rv = NS_OK;
  365   }
  366 }
Yes. Doing the error-processing in a central place is exactly the point.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #173138 - Flags: superreview?(bzbarsky)
Attachment #173138 - Flags: review?(darin)
Comment on attachment 173138 [details] [diff] [review]
patch

>Index: intl/uconv/idl/nsITextToSubURI.idl

>+   *  <li> In case of the conversion error, the URI fragment (escaped) is

"a conversion error"

>+   *  <li> Always succeeeds (callers don't need to do the error checking)

"do error checking"

>Index: docshell/base/nsDocShell.cpp

>-            rv = textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);
>-            if (NS_FAILED(rv)) {
>-                CopyASCIItoUCS2(spec, formatStrs[0]);
>-                rv = NS_OK;
>-            }
>+            // UnEscapeURIForUI always succeeds
>+            textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);

You still need to set rv = NS_OK.

What about the other callers? I see at least a few still effectively doing their own fallback. In particular, nsMediaDocument and nsExternalHelperAppService.cpp (callers of UnescapeFragment).
In case of nsMediaDocument, it's a bit different (that was in my first patch, but was not included in the patch uploaded). Although very unlikely, do_GetService may fail. The same is true of UnescapeFragment (there are other potential causes, so callers still need to handle them).

  if (!fileName.IsEmpty()) {
    nsresult rv;
    nsCOMPtr<nsITextToSubURI> textToSubURI =
      do_GetService(NS_ITEXTTOSUBURI_CONTRACTID, &rv);
    if (NS_SUCCEEDED(rv))
      rv = textToSubURI->UnEscapeURIForUI(docCharset, fileName, fileStr);
  }
  if (fileStr.IsEmpty())
    CopyUTF8toUTF16(fileName, fileStr);
But then the same argument applies to docshell....
Attached patch update (deleted) — Splinter Review
How about this?
Attachment #173138 - Attachment is obsolete: true
Attachment #173161 - Flags: superreview?(bzbarsky)
Attachment #173161 - Flags: review?(darin)
Attachment #173138 - Flags: superreview?(bzbarsky)
Attachment #173138 - Flags: review?(darin)
Comment on attachment 173161 [details] [diff] [review] update sr=bzbarsky
Attachment #173161 - Flags: superreview?(bzbarsky) → superreview+
Attachment #173161 - Flags: review?(darin) → review+
Is there a nightly with this bug fixed?
Flags: blocking1.8b2?
Flags: blocking-aviary1.1?
Not yet. To jshin to get this landed.
Assignee: darin → jshin1987
Oops, sorry. I checked this in on Feb 22nd but forgot to mark it as fixed. I've just verified that it's fixed in my trunk build.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Flags: blocking1.8b2?
Flags: blocking-aviary1.1?