Closed Bug 102656 Opened 23 years ago Closed 23 years ago

IDN support for HREF IDN-links

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.9

People

(Reporter: wil, Assigned: nhottanscp)

References

(
URL
)

Details

(Keywords: intl)

Attachments

(2 files, 12 obsolete files)

This patch makes a "mixed" URL where the hostname is in UTF8. It is just a proof-of-concept, please suggest optimizations. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
Moved it inside else block of IsAscii check. More cleanup. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
FIxed aResult leak, thanks jst. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
fixed using nsIIOService::ExtractUrlPart as suggested by Darin. Made into a separate function. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
fixed nsCOMPtr going out of scope, and checking for nsresult return for NS_NewURI instead. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
Here's a version of ConvertHostnameToUTF8 that takes an nsXPIDLCString. 23 years ago Wil Tan (deleted), patch	darin.moz : superreview+	Details \| Diff \| Splinter Review
targetSpec => char ** 23 years ago Wil Tan (deleted), patch	jst : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
Incorporated many of the comments give yesterday. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
removed the call to ExtractScheme(), thanks Darin. 23 years ago Wil Tan (deleted), patch	darin.moz : review+	Details \| Diff \| Splinter Review
cleanup as suggested by Johnny. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
removed fake error code return when hostname is ascii. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
*targetSpec = spec // when hostname IsAscii, caller no longer uses nsXPIDLCString 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review
*targetSpec = nsnull when hostname is ascii, as suggested by Darin. 23 years ago Wil Tan (deleted), patch	jst : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
2 changes suggested by Darin and Johnny, i.e. use newSpec.IsEmpty() and moved *targetSpec = nsnull to top. 23 years ago Wil Tan (deleted), patch		Details \| Diff \| Splinter Review

Wil Tan

Reporter

Description

•

23 years ago

ToLowerCase (http://lxr.mozilla.org/mozilla/source/netwerk/base/src/nsURLHelper.cpp#176) is being called to downcase the hostnames for the internal representation of "host" in nsIURI. This creates a problem for some internationalized domain names in their native encoding (like Shift-JIS, Big5). Is there a reason why it is done this way? This is also true in nsStdURLParser and nsNoAuthURLParser.

Boris Zbarsky [:bzbarsky]

Updated

•

23 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Wil Tan

Reporter

Updated

•

23 years ago

Summary: nsAuthURLParser downcases hostnames → IDN support for HREF IDN-links

nhottanscp

Assignee

Comment 1

•

23 years ago

Another approach is to prevent the native encoding (like Shift-JIS, Big5) host name and use UTF-8 instead. I think keeping the native encoding in nsIURI is not a good idea, parsing is difficult, also the host eventualy needs to be unicode in order to create ACE (ASCII Compatibe Encodeing which is needed for IDN). So, we want to convert to unicode earlier instead of changing the parser to handle the native encodings. This can be done by changing NS_MakeAbsoluteURIWithCharset()in nsHTMLUtils.cpp where URI for HREF is processed. In the function, the URI is in UCS2, then it is converted to a doucment charset. Instead, we can keep the host name as unicode (UTF-8). William, could you attach your patch for nsHTMLUtils.cpp?

Wil Tan

Reporter

Comment 2

•

23 years ago

Attached patch This patch makes a "mixed" URL where the hostname is in UTF8. It is just a proof-of-concept, please suggest optimizations. (obsolete) (deleted) — Details — Splinter Review

nhottanscp

Assignee

Comment 3

•

23 years ago

In the function, there is ASCII check for URI spec. Also, some cases already use UTF-8 URI. So, we only need to set UTF-8 host name when the URI is converted to a doucment charset. http://lxr.mozilla.org/seamonkey/source/content/shared/src/nsHTMLUtils.cpp#149 149 if (encoder) { 150 // Got the encoder: let's party.

Wil Tan

Reporter

Comment 4

•

23 years ago

I will update the patch within a day or two. I agree with Naoki that nsIURI should be given a UTF8 hostname in the first place. Is there any other entry-point for creating a URL other than NS_MAkeAbsoluteURIWithCharset? * CC'ing Darin.

Darin Fisher

Comment 5

•

23 years ago

so, let me make sure i understand things correctly... if the standard url implementation simply avoids lowercasing the hostname, then we should be OK, right?

Darin Fisher

Comment 6

•

23 years ago

oh.. wait.. there is one more problem. namely that of IPv6 address literals which can contain ':' chars. we currently strchr for ':' and then wrap the hostname in []'s. this is defined in RFC 2732. won't this be a problem if hostname is UTF-8? or, perhaps i just need to use some UTF-8 aware strchr?? or, use nsCRT::IsAscii first??

Wil Tan

Reporter

Comment 7

•

23 years ago

Attached patch Moved it inside else block of IsAscii check. More cleanup. (obsolete) (deleted) — Details — Splinter Review

Wil Tan

Reporter

Comment 8

•

23 years ago

Darin: I'm not sure I understand your concerns. If we avoid lowercasing the hostname, it would work but it's not an ideal solution since the hostname would be in local encoding. Many interfaces are still using C strings which means that the hostnames are passed in its raw local charset. IPv6 hostnames are be in ASCII, so encoding it into UTF8 should have no effect. There is also no problem with ':' clashing since if ':' appears in UTF8 then it is the bona fide ':'.

Darin Fisher

Comment 9

•

23 years ago

ok, thanks for explaining why we don't need to worry about embedded colons. but, what i still don't understand is why removing the code which lowercase's the hostname is not sufficient. doesn't a UTF-8 encoded string carry all the information needed to perform the IDN conversion? and isn't that all we really need at the networking layer?

Wil Tan

Reporter

Comment 10

•

23 years ago

If the hostname in URI was UTF8 encoded, ToLowerCase wouldn't have any effect on it, and we would be fine. But it is not, it is converted to the document charset in NS_MakeAbsoluteURI. And since it is represented as a C string it is just as good sequence of raw bytes. It works with i-DNS.net's software because we use guesswork to "detect" the encoding, but would not be elegant. But that is really just legacy support for non unicode-aware applications.

Darin Fisher

Comment 11

•

23 years ago

ok, so it sounds to me as though we can/should leave necko alone, and concentrate on making sure that all clients of nsIURI pass in UTF-8 hostnames. right?

Wil Tan

Reporter

Comment 12

•

23 years ago

Are there many such places that instantiates nsIURI's? I wouldn't exclude the possibility of having nsStdURL::SetSpec doing IDN conversion so that other places (like the current HTTP and DNS modules) do not need to worry about it. So, nsIURI::GetHost gives the ACE hostname, nsIURI::GetSpec gives the URL string with UTF8 hostname, and a separate interface for getting the UTF8 hostname for display (such as the status bar). Is that reasonable, or is that silly?

Darin Fisher

Comment 13

•

23 years ago

well, i'm not sure if GetSpec should return a UTF-8 hostname or an ACE encoded hostname. i say this because when speaking to a HTTP proxy, we need to send the spec with an ACE encoded hostname. perhaps nsIURI is not the right stopgap for IDN conversion then. i'm really not sure.

Wil Tan

Reporter

Comment 14

•

23 years ago

Oh yes, forgot about proxies. So let's forget about doing IDN in nsIURI, at least for now.

Darin Fisher

Comment 15

•

23 years ago

Wil Tan

Reporter

Updated

•

23 years ago

Attachment #54705 - Attachment is obsolete: true

Wil Tan

Reporter

Comment 16

•

23 years ago

Attached patch FIxed aResult leak, thanks jst. (obsolete) (deleted) — Details — Splinter Review

Darin Fisher

Comment 17

•

23 years ago

you can also use nsIIOService::extractUrlPart to parse out the hostname of the UTF8 spec. this would eliminate the need to create a second uri object.

Wil Tan

Reporter

Comment 18

•

23 years ago

Attached patch fixed using nsIIOService::ExtractUrlPart as suggested by Darin. Made into a separate function. (obsolete) (deleted) — Details — Splinter Review

Darin Fisher

Comment 19

•

23 years ago

Comment on attachment 56549 [details] [diff] [review] fixed using nsIIOService::ExtractUrlPart as suggested by Darin. Made into a separate function. >Index: content/shared/src/nsHTMLUtils.cpp > >+/* >+ * Extracts the hostname part of the given targetSpec, >+ * encodes it to UTF8, and replace it in-place. >+ */ >+nsresult >+ConvertHostnameToUTF8(nsCAutoString& targetSpec, >+ const nsString& uSpec, >+ nsIIOService *aIOService) >+{ >+ nsresult rv; >+ >+ if (!aIOService) { >+ nsCOMPtr<nsIIOService> serv; >+ serv = do_GetIOService(&rv); >+ if (NS_FAILED(rv)) return rv; >+ aIOService = serv.get(); >+ } this block of code won't work... the destructor for |serv| would have already been called by the time you exit this block of code. you need to give the nsCOMPtr declaration function scope. >+ // Try creating a URI from the original targetSpec >+ nsCOMPtr<nsIURI> uri; >+ NS_NewURI(getter_AddRefs(uri), targetSpec); >+ // This will fail for non-absolutely URIs, in that case we just ignore it. >+ if (!uri) return NS_OK; capturing rv is better (in the XPCOM sense) than checking for a non-null uri. >+ >+ // convert the original spec into UTF8 >+ nsCAutoString specUTF8 = NS_ConvertUCS2toUTF8(uSpec); >+ // Extract the UTF8 hostname >+ nsXPIDLCString hostUTF8; >+ rv = aIOService->ExtractUrlPart(specUTF8.get(), nsIIOService::url_Host, NULL, NULL, getter_Copies(hostUTF8)); >+ NS_ASSERTION(NS_SUCCEEDED(rv), "Unable to extract Hostname part from URL"); >+ if (NS_FAILED(rv)) return rv; >+ // replace the original hostname with hostUTF8 >+ uri->SetHost(hostUTF8.get()); >+ >+ nsXPIDLCString tmp; >+ uri->GetSpec(getter_Copies(tmp)); >+ // replace the targetSpec >+ targetSpec = tmp; >+ return NS_OK; >+} > > nsresult > NS_MakeAbsoluteURIWithCharset(char* *aResult, >@@ -165,6 +204,9 @@ > encoder->Finish(p, &len); > p[len] = 0; > spec += p; what if |aSpec| contains only ascii characters? adding an IsAscii check would probably be an optimization here, since NS_NewURI is expensive. >+ >+ // iDNS support: encode the hostname in the URL to UTF8 >+ ConvertHostnameToUTF8(spec, aSpec, aIOService); > > if (p != buf) > delete[] p; >@@ -174,8 +216,8 @@ > // by default. > spec = NS_ConvertUCS2toUTF8(aSpec.get()); > } >- > } >+ > // Now we need to URL-escape the string. > // XXX andreas.otte has warned that using the nsIIOService::Escape > // method in this way may be too conservative (e.g., it won't

Attachment #56549 - Flags: needs-work+

Wil Tan

Reporter

Comment 20

•

23 years ago

Attached patch fixed nsCOMPtr going out of scope, and checking for nsresult return for NS_NewURI instead. (obsolete) (deleted) — Details — Splinter Review

Wil Tan

Reporter

Comment 21

•

23 years ago

Thanks Darin! > what if |aSpec| contains only ascii characters? adding an IsAscii check > would probably be an optimization here, since NS_NewURI is expensive. There is an IsAscii check at the beginning of NS_MakeAbsoluteURIWithCharset() so this function will only be called when (!IsAscii(aSpec.get()).

Wil Tan

Reporter

Updated

•

23 years ago

Attachment #55232 - Attachment is obsolete: true

Wil Tan

Reporter

Updated

•

23 years ago

Attachment #55882 - Attachment is obsolete: true

Wil Tan

Reporter

Updated

•

23 years ago

Attachment #56549 - Attachment is obsolete: true

Darin Fisher

Comment 22

•

23 years ago

does |spec| need to be a nsCAutoString? why not just declare it |char *spec| to avoid an extra strcpy/strdup in ConvertHostnameToUTF8?

Wil Tan

Reporter

Comment 23

•

23 years ago

It's already an nsCAutoString in NS_MakeAbsoluteURIWithCharset, so just pass it to ConvertHostnameToUTF8 so that it can be updated. If ConvertHostnameToUTF8 uses |char *| wouldn't it be messy as well? I don't get it, please elaborate.

Darin Fisher

Comment 24

•

23 years ago

well, when you call GetSpec, you end up malloc'ing a copy of the url spec, but then you copy this string into |nsCAutoString &targetSpec| passed in by the user. does the caller need this to be a nsCAutoString, or could they use nsXPIDLCString? if the caller could be modified to use nsXPIDLCString then you could avoid making the additional string copy. make sense?

Wil Tan

Reporter

Comment 25

•

23 years ago

Ok, now I see where the extra strdup is. But still don't think that the caller can be (easily) modified to use nsXPIDLCString because it uses many methods in nsCString (eg. AssignWithConversion, FindChar) and uses NS_ConvertUCS2toUTF8, which is derived from nsCAutoString. Naoki, any idea? I'm still trying to understand the entire hierarchy of string classes, wished the inheritance tree was simpler.

nhottanscp

Assignee

Comment 26

•

23 years ago

I am not sure, changing the caller seems to involve a lot of changes. How about using nsXPIDLCString as an argument and remove the string copy out of ConvertHostnameToUTF8? Later, the caller may change to use nsXPIDLCString. BTW, is ConvertHostnameToUTF8 a local function?

Darin Fisher

Comment 27

•

23 years ago

yeah, i agree with nhotta: if you can't easily change the caller, then at least make the caller responsible for the additional strdup.

Wil Tan

Reporter

Comment 28

•

23 years ago

Yes it's a local function, at least for now.

Wil Tan

Reporter

Comment 29

•

23 years ago

Attached patch Here's a version of ConvertHostnameToUTF8 that takes an nsXPIDLCString. (obsolete) (deleted) — Details — Splinter Review

Darin Fisher

Comment 30

•

23 years ago

Comment on attachment 56810 [details] [diff] [review] Here's a version of ConvertHostnameToUTF8 that takes an nsXPIDLCString. >Index: content/shared/src/nsHTMLUtils.cpp >+nsresult >+ConvertHostnameToUTF8(nsXPIDLCString& targetSpec, this is fine, but you could also pass |char **targetSpec|, and then use getter_Copies at the callsite of ConvertHostnameToUTF8. either way, sr=darin

Attachment #56810 - Flags: superreview+

Wil Tan

Reporter

Comment 31

•

23 years ago

Yep I think that's a good idea, I'll upload another soon. hang on, if sr=darin, then r= who? Johnny did say that he would sr= if I get an r=, please advise.

Wil Tan

Reporter

Comment 32

•

23 years ago

Attached patch targetSpec => char ** (obsolete) (deleted) — Details — Splinter Review

nhottanscp

Assignee

Comment 33

•

23 years ago

I think having two sr= is just fine, no r= needed for that case. William, could you reassign the bug to yourself since you are actually working on? I can help check in the file.

Wil Tan

Reporter

Comment 34

•

23 years ago

Reassigning to myself.

Assignee: neeti → william.tan

Darin Fisher

Comment 35

•

23 years ago

Comment on attachment 56867 [details] [diff] [review] targetSpec => char ** sr=darin (you can use this as an r= if you prefer)

Attachment #56867 - Flags: superreview+

Wil Tan

Reporter

Updated

•

23 years ago

Attachment #56618 - Attachment is obsolete: true

Wil Tan

Reporter

Updated

•

23 years ago

Attachment #56810 - Attachment is obsolete: true

Wil Tan

Reporter

Comment 36

•

23 years ago

Is 0.9.6 a realistic target for this?

nhottanscp

Assignee

Comment 37

•

23 years ago

If you have reviews and ready for check in, please get an approval and check in, otherwise move to 0.9.7.

Johnny Stenback (:jst)

Comment 38

•

23 years ago

Comment on attachment 56867 [details] [diff] [review] targetSpec => char ** r/sr=jst

Attachment #56867 - Flags: review+

Asa Dotzler [:asa]

Updated

•

23 years ago

Keywords: mozilla0.9.6-

nhottanscp

Assignee

Comment 39

•

23 years ago

move to 0.9.7

Target Milestone: --- → mozilla0.9.7

David Baron :dbaron:

Comment 40

•

23 years ago

Have you run performance tests on this patch? IIRC, NS_MakeAbsoluteURIWithCharset is on one of the most performance-critical paths in page loading since it is used to canonicalize URIs for lookup in global history to determine whether links should be colored as visited or unvisited.

David Baron :dbaron:

Comment 41

•

23 years ago

If that's the case, the string usage here looks unacceptably slow. You could improve it a bit by using the NS_ConvertUCS2toUTF8 inline rather than copying a second time by assigning the result into an nsCAutoString.

Darin Fisher

Comment 42

•

23 years ago

william: also, what about checking if the hostname is ASCII? if the hostname were actually ASCII (padded with zero bytes) wouldn't we be able to skip this entire function call?

Chris Waterson

Comment 43

•

23 years ago

What dbaron says. You ought to understand the performance impact of this, especially on link-riddled pages (go try some LXR docs, e.g.).

nhottanscp

Assignee

Comment 44

•

23 years ago

The code is only executed when the URL contains non ASCII.

David Baron :dbaron:

Comment 45

•

23 years ago

OK, but it would still be good not to make it too slow (e.g., by fixing the extra copy I mentioned above).

nhottanscp

Assignee

Comment 46

•

23 years ago

I agree. Willam, please make the change, thanks.

Darin Fisher

Comment 47

•

23 years ago

my point was that while the URL may contain non-ascii characters, the hostname may only contain ascii characters. it might be a performance win to just scan the hostname before going through the trouble of instantiating a new nsStdURL. so, it might be worth it to move the call to ExtractUrlPart above the call to NS_NewURI. you could even capture just the hostname offset and length (ie. don't capture the hostname as a nsXPIDLCString), then if this section of the UTF-8 spec contains non-ascii characters, you could continue on by calling NS_NewURI, etc. but, moreover it seems really unfortunate to have to make an extra copy of the spec just to support iDNS. isn't there a better way? seems like we really need a way of parsing a unicode URL. then we could locate the unicode hostname and check if it is ascii. if it is ascii, then there is no need to do any more work. unfortunately, we don't have a unicode URL parser. i feel like i should have looked more closely at the patch in context, because it really is adding substantial work to NS_MakeAbsoluteWithCharset just to support an edge case. let's try to come up with something that isn't so costly for non-IDN links.

Darin Fisher

Comment 48

•

23 years ago

what's the format of |spec| when ConvertHostnameToUTF8 is called? can we use it to determine if the hostname is really just ascii?

Johnny Stenback (:jst)

Comment 49

•

23 years ago

This will only be triggerd if the href contains non-ASCII data tho, so the normal case will not be any different from what's on the trunk today. With that said, if there's a faster way to do this I'm all for doing that...

Johnny Stenback (:jst)

Comment 50

•

23 years ago

While we're at it, we should remove the IsAscii() method from nsHTMLUtils.cpp and simply call nsCRT::IsAscii().

Darin Fisher

Comment 51

•

23 years ago

aren't non-ascii url paths more common than non-ascii hostnames?

Wil Tan

Reporter

Comment 52

•

23 years ago

Thanks to all who commented. A patch is underway. > You could improve it a bit by using the NS_ConvertUCS2toUTF8 inline rather > than copying a second time by assigning the result into an nsCAutoString. Oh I missed that. Will do. > my point was that while the URL may contain non-ascii characters, the > hostname may only contain ascii characters. it might be a performance win > to just scan the hostname before going through the trouble of instantiating > a new nsStdURL. Agreed. I'll address this one too. > so, it might be worth it to move the call to ExtractUrlPart above the call > to NS_NewURI. you could even capture just the hostname offset and > length (ie. don't capture the hostname as a nsXPIDLCString), then if this > section of the UTF-8 spec contains non-ascii characters, you could continue > on by calling NS_NewURI, etc. Ok. > but, moreover it seems really unfortunate to have to make an extra copy of > the spec just to support iDNS. isn't there a better way? seems like we > really need a way of parsing a unicode URL. then we could locate the > unicode hostname and check if it is ascii. if it is ascii, then there is > no need to do any more work. unfortunately, we don't have a unicode URL > parser. > i feel like i should have looked more closely at the patch in context, > because it really is adding substantial work to NS_MakeAbsoluteWithCharset > just to support an edge case. let's try to come up with something that > isn't so costly for non-IDN links. Yeah there should be a better way of doing it. I'm especially concerned about the other dozens of calls to NS_NewURI. When nsIURI uses unicode internally then there'd be less troubles. But I think there would be a lot to change. > what's the format of |spec| when ConvertHostnameToUTF8 is called? can we use it to determine if the hostname is really just ascii? It is basically whatever in the HREF attribute. So it could be relative or absolute. Are you suggesting an alternative to using ExtractUrlPart? > While we're at it, we should remove the IsAscii() method from nsHTMLUtils.cpp and simply call nsCRT::IsAscii() Agreed. Shall I do it in the same patch?

Status: NEW → ASSIGNED

Wil Tan

Reporter

Comment 53

•

23 years ago

Attached patch Incorporated many of the comments give yesterday. (obsolete) (deleted) — Details — Splinter Review

* construct NS_ConvertUCS2toUTF8 directly. * additional check for ascii hostname (even though url may be non-ascii) before calling NS_NewURI * removed IsAscii() in favor of nsCRT::IsAscii() (the context diff is screwed up, so it looks like ConvertHostnameToUTF8 replaced IsAscii. sorry) We could have used ExtractUrlPart on the |char *spec| to save the UTF8 conversion if the hostname is ascii. But i figured ExtractUrlPart is fairly expensive as well, (for parsing anything beyond the url_Scheme). Please advise.

Attachment #56867 - Attachment is obsolete: true

Darin Fisher

Comment 54

•

23 years ago

Comment on attachment 57310 [details] [diff] [review] Incorporated many of the comments give yesterday. don't you still need to UTF8'ize the hostname if it is absolute?

Darin Fisher

Comment 55

•

23 years ago

s/it/the spec/

Wil Tan

Reporter

Comment 56

•

23 years ago

> don't you still need to UTF8'ize the hostname if it is absolute? The rationale is that there may be lots of non-absolute links, so we can save on the constructing an NS_ConvertUCS2toUTF8. Does it make sense, or is it unnecessary?

Darin Fisher

Comment 57

•

23 years ago

you can have relative URLs of the form //hostname/foo/bar.html

Wil Tan

Reporter

Comment 58

•

23 years ago

Sorry, I wasn't aware of that. In that case it that step would be pointless, I can snip it.

Wil Tan

Reporter

Comment 59

•

23 years ago

Darin, could you please clarify which "it" do you mean in "s/it/the spec/"? Thanks.

Wil Tan

Reporter

Comment 60

•

23 years ago

Attached patch removed the call to ExtractScheme(), thanks Darin. (obsolete) (deleted) — Details — Splinter Review

Nevermind, I get it now. You're referring to your previous comment :)

Attachment #57310 - Attachment is obsolete: true

Darin Fisher

Comment 61

•

23 years ago

Comment on attachment 57363 [details] [diff] [review] removed the call to ExtractScheme(), thanks Darin. r/sr=darin

Attachment #57363 - Flags: review+

Wil Tan

Reporter

Comment 62

•

23 years ago

Thanks everyone. Darin: could you please check it in for me? Should I email Asa for approval?

Darin Fisher

Comment 63

•

23 years ago

william: you still need another r= or sr= on your latest patch. then if you want this for 0.9.6, you'll need to send mail to drivers@mozilla.org for approval... though i doubt they'll take it.

Wil Tan

Reporter

Comment 64

•

23 years ago

0.9.6 would be too tight, i can wait. Johnny, would you mind taking a look at the latest patch for me? TIA.

Johnny Stenback (:jst)

Comment 65

•

23 years ago

- In ConvertHostnameToUTF8(), |if| and its statement on different lines, i.e.: + if (nsCRT::IsAscii(hostUTF8.get())) return NS_ERROR_ABORT; on one line is a no no, fix all those. Also spread the code out a bit to make it more readable. - In NS_MakeAbsoluteURIWithCharset(): + if (NS_SUCCEEDED(rv)) + spec = newSpec.get(); Loose the .get(), all that does is slow things down. With that, r/sr=jst

Wil Tan

Reporter

Comment 66

•

23 years ago

Attached patch cleanup as suggested by Johnny. (obsolete) (deleted) — Details — Splinter Review

Thanks Johnny. Does this look better now?

Attachment #57363 - Attachment is obsolete: true

Johnny Stenback (:jst)

Comment 67

•

23 years ago

Yeah, looks way better, but I saw one more thing when looking over it again... Returning an error from ConvertHostnameToUTF8() just because the host name is ASCII is just wrong, if it's ASCII, return the host name as is and return NS_OK. Make the code that calls this method return the error code to the caller in case ConvertHostnameToUTF8() really fails. Don't attempt to encode non-errors in error codes... sr=jst with that fixed, sorry for not catching this the first time.

Wil Tan

Reporter

Comment 68

•

23 years ago

Attached patch removed fake error code return when hostname is ascii. (obsolete) (deleted) — Details — Splinter Review

If hostname is ascii, 2 extra strdup are needed though. Moved the "delete[] p" portion up so that it doesn't leak in case ConvertHostnameToUTF8 fails.

Attachment #58090 - Attachment is obsolete: true

Johnny Stenback (:jst)

Comment 69

•

23 years ago

If you think the double strdup is a problem then make the code not do that, but don't piggyback on the error code :-)

Wil Tan

Reporter

Comment 70

•

23 years ago

Attached patch *targetSpec = spec // when hostname IsAscii, caller no longer uses nsXPIDLCString (obsolete) (deleted) — Details — Splinter Review

I agree that piggybacking on the error code is wrong. This alternative piggybacks on the (spec==targetSpec) property, please suggest any alternative idea you have. But patch 58093 is still the cleanest way IMHO, can we live with the extra strdups, Darin?

Darin Fisher

Comment 71

•

23 years ago

this looks pretty yuck to me... i mean, now targetSpec is sometimes freshly malloc'd, and other times it's just a weak reference to an existing string. how about returning NS_OK and *targetSpec = nsnull?

Wil Tan

Reporter

Comment 72

•

23 years ago

Attached patch *targetSpec = nsnull when hostname is ascii, as suggested by Darin. (deleted) — Details — Splinter Review

Agreed. Why didn't I think of that...

Attachment #58187 - Attachment is obsolete: true

Darin Fisher

Comment 73

•

23 years ago

Comment on attachment 58528 [details] [diff] [review] *targetSpec = nsnull when hostname is ascii, as suggested by Darin. r/sr=darin, with one change: if (newSpec.get()) should be if (!newSpec.IsEmpty())

Attachment #58528 - Flags: superreview+

Wil Tan

Reporter

Comment 74

•

23 years ago

Do I submit another patch for the change, and then bug you for it again? I'm just confused about how things should go...

Darin Fisher

Comment 75

•

23 years ago

william: no need to submit another patch... just get another reviewer to look at your code, and then make sure to make the change i suggested before checking in. thx!

Johnny Stenback (:jst)

Comment 76

•

23 years ago

What Darin said, and move the: + *targetSpec = nsnull; from inside the IsAscii() check to the top of the method to make sure you never return w/o initializing the out parameter. r=jst

Johnny Stenback (:jst)

Updated

•

23 years ago

Attachment #58528 - Flags: review+

Wil Tan

Reporter

Comment 77

•

23 years ago

Attached patch 2 changes suggested by Darin and Johnny, i.e. use newSpec.IsEmpty() and moved *targetSpec = nsnull to top. (deleted) — Details — Splinter Review

Darin, could you checkin for me please? I haven't an account. I'd just upload this anyway to make your job easier (hopefully it doesn't cause even more trouble :) Thanks!

Attachment #58093 - Attachment is obsolete: true

Darin Fisher

Comment 78

•

23 years ago

sure, i'll check this in once the tree opens.

Roy Yokoyama

Updated

•

23 years ago

Blocks: 102701

Darin Fisher

Comment 79

•

23 years ago

fixed-on-trunk

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

David Baron :dbaron:

Comment 80

•

23 years ago

This patch caused regression bug 118995.

nhottanscp

Assignee

Comment 81

•

23 years ago

Typing IDN in the URL bar is currently broken (see bug 42898). Is this bug (IDN links in a document) also broken?

nhottanscp

Assignee

Comment 82

•

23 years ago

It does not work after I applied the fix for bug 42898, so I reopen this. With the fix of bug 42898, IDN from the URL bar works but from HREF it is escaped. Even with the other patch in bug 42898 which remove the escape (#64946), the link still not resolved.

Status: RESOLVED → REOPENED

Keywords: intl

Resolution: FIXED → ---

nhottanscp

Assignee

Comment 83

•

23 years ago

In the examples in the page, http://playground.i-dns.net/~wil/moztest/ I see at least one link is ACE encoded but the link is not found. I am not sure the links in the page are updated. Willam, could you applied the patches in bug 42898 and check if this bug really exists or it's the test URL problem?

Teruko Kobayashi

Updated

•

23 years ago

Keywords: nsbeta1

Wil Tan

Reporter

Comment 84

•

23 years ago

Darin: Thanks for the fix to bug 42898. will try to update my sources and verify the patch.

Wil Tan

Reporter

Comment 85

•

23 years ago

With darin's patch to remove the nsStdEscape call in nsHTMLUtils: http://bugzilla.mozilla.org/attachment.cgi?id=64616&action=view It works.

Darin Fisher

Comment 86

•

23 years ago

thx wil... -> me (to get the patch landed)

Assignee: william.tan → darin

Status: REOPENED → NEW

Wil Tan

Reporter

Comment 87

•

23 years ago

thanks darin, you'll be landing the patch in 42898 right? me -> work on display issue.

Darin Fisher

Comment 88

•

23 years ago

are you referring to attachment 64616 [details] [diff] [review]? if so, then yes.

Darin Fisher

Updated

•

23 years ago

Status: NEW → ASSIGNED

Priority: -- → P2

Target Milestone: mozilla0.9.7 → mozilla0.9.9

Darin Fisher

Comment 89

•

23 years ago

attachment 64616 [details] [diff] [review] has landed.

Darin Fisher

Comment 90

•

23 years ago

this bug does not belong to the networking module. i believe that necko is doing everything correctly now in terms of supporting IDN. this bug needs to be fixed outside of necko. -> nhotta

Assignee: darin → nhotta

Status: ASSIGNED → NEW

Component: Networking → Internationalization

nhottanscp

Assignee

Comment 91

•

23 years ago

William, could you try the latest build? I think this is fixed by darin's check in (attachment 64616 [details] [diff] [review]).

Status: NEW → ASSIGNED

Darin Fisher

Comment 92

•

23 years ago

no, if you just hover over a IDN link, you'll get garbage in the status bar. however, that may be considered a new bug. it's up to you :-)

nhottanscp

Assignee

Comment 93

•

23 years ago

The mouse over problem is bug 81024, mark this as fixed.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago → 23 years ago

Resolution: --- → FIXED

Wil Tan

Reporter

Comment 94

•

23 years ago

Verified fixed on Linux.

benc

Comment 95

•

23 years ago

qa to william...

QA Contact: benc → william.tan

You need to log in before you can comment on or make changes to this bug.