Closed Bug 237818 Opened 21 years ago Closed 21 years ago

nsStandardURL should normalize UTF-8 hostnames

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.8alpha1

People

(Reporter: darin.moz, Assigned: dbaron)

References

Details

(Keywords: intl, Whiteboard: [IDN][patch])

Attachments

(2 files, 1 obsolete file)

nsStandardURL should normalize UTF-8 hostnames. BuildNormalizedSpec, SetHost, and any other methods that modify the host field of nsStandardURL should normalize the given value if it is UTF-8. We might want to add a NormalizeUTF8 method on nsIIDNService.
Status: NEW → ASSIGNED
Whiteboard: [IDN]
Target Milestone: --- → mozilla1.8alpha
Blocks: IDN
Attached file testcases (deleted) —
> We might want to add a NormalizeUTF8 method on nsIIDNService Can we just use nsIUnicodeNormalizer? There may be some issues to think about, though.
Keywords: intl
> Can we just use nsIUnicodeNormalizer? There may be some issues to think about, > though. We could do that, but personally I think we should use the same normalization algorithm used by nsIIDNService. (Maybe nsIUnicodeNormalizer is what nsIIDNService already uses?? -- I thought IDN required special normalization rules.) Whatever works! :-)
Yeah, I vaguely remembered that IDN has its own (slightly different?) normalization rules so that I wrote an 'escape clause' ("there may be issues to think about, though") :-) We have to read the IDN standard document :-)
We currently just use the normalizer, it seems: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/netwerk/dns/src/nsIDNService.cpp&rev=1.17&mark=393#351 Since we probably don't want to rip huge pieces of a URL out if the user enters it and it's invalid according to IDN (so they can fix their typo, say?), perhaps just normalization is best at this stage -- since it seems to be the only thing we do other than check for invalid things.
Er, except that the "Map" bit is probably doing something. :-)
Attached patch possible patch (obsolete) (deleted) — Splinter Review
Comment on attachment 144199 [details] [diff] [review] possible patch first glance looks right on to me... i'll take a closer look tomorrow. thx dbaron!
Attachment #144199 - Flags: review?(darin)
Comment on attachment 144199 [details] [diff] [review] possible patch >Index: base/src/nsStandardURL.cpp >+ // However, perform Unicode normalization on it, as IDN does. >+ mHostEncoding = eEncoding_ASCII; >+ if (mHost.mLen > 0) { >+ const nsACString& tempHost = >+ Substring(spec + mHost.mPos, spec + mHost.mPos + mHost.mLen); >+ if (IsASCII(tempHost)) { >+ approxLen += mHost.mLen; >+ } else { >+ mHostEncoding = eEncoding_UTF8; >+ if (gIDNService && >+ NS_SUCCEEDED(gIDNService->Normalize(tempHost, encHost))) { >+ approxLen += encHost.Length(); >+ } else { >+ encHost.Truncate(); >+ approxLen += mHost.mLen; >+ } >+ } >+ } only some nits here... (1) this file avoids brackets when not needed (2) should not be necessary to Truncate encHost in the failure case. an XPCOM method is not supposed to set an out-param value when it fails. plus, it looks like nsIDNService::Normalize isn't going to cause any trouble in this regard. > nsStandardURL::GetAsciiHost(nsACString &result) > { >- if (mHostEncoding == eEncoding_Unknown) { >- if (IsASCII(Host())) >- mHostEncoding = eEncoding_ASCII; >- else >- mHostEncoding = eEncoding_UTF8; >- } >+ NS_ASSERTION(mHostEncoding != eEncoding_Unknown, "encoding should be set"); my only concern here is that someone could construct a nsStandardURL via CreateInstance. they could decide not to initialize the nsStandardURL. should we really assert in that case? perhaps we should initialize mHostEncoding to eEncoding_ASCII in the nsStandardURL ctor. it should be okay to regard an empty hostname as ASCII. then again, i'm not sure that the rest of nsStandardURL works that well when not initialized :-/ sr=darin
Attachment #144199 - Flags: superreview+
Attachment #144199 - Flags: review?(jshin)
Attachment #144199 - Flags: review?(darin)
(In reply to comment #9) > my only concern here is that someone could construct a nsStandardURL via > CreateInstance. they could decide not to initialize the nsStandardURL. personally, I think they deserve whatever they get :)
Attached patch patch (deleted) — Splinter Review
This fixes the two numbered comments and changes where mHostEncoding is set (so it's never Unknown).
Attachment #144199 - Attachment is obsolete: true
Attachment #144426 - Flags: superreview?(darin)
Attachment #144426 - Flags: review?(jshin)
Attachment #144199 - Flags: review?(jshin)
Comment on attachment 144426 [details] [diff] [review] patch Here we assert. >+ if (gIDNService && >+ NS_SUCCEEDED(gIDNService->Normalize(tempHost, encHost))) >+ approxLen += encHost.Length(); >+ else { >+ NS_ASSERTION(encHost.IsEmpty(), >+ "Normalize failed by modified encHost"); BTW, it's not documented in the idl file that the out parameter is empty when normalize fails (although the only implementation does that)... >+ approxLen += mHost.mLen; However, we don't here. >+ if (!IsASCII(flat)) { >+ mHostEncoding = eEncoding_UTF8; >+ if (gIDNService && >+ NS_SUCCEEDED(gIDNService->Normalize(flat, escapedHost))) { >+ host = escapedHost.get(); >+ len = escapedHost.Length(); >+NS_IMETHODIMP nsIDNService::Normalize(const nsACString & input, nsACString & output) >+{ >+ nsAutoString outUTF16; >+ nsresult rv = stringPrep(NS_ConvertUTF8toUTF16(input), outUTF16); >+ if (NS_SUCCEEDED(rv)) >+ CopyUTF16toUTF8(outUTF16, output); >+ return rv; How about moving the assertion here? Isn't stringPrep always supposed to succeeed? Alternatively, we can assert at both places. r=jshin
Attachment #144426 - Flags: review?(jshin) → review+
Comment on attachment 144426 [details] [diff] [review] patch >+ "Normalize failed by modified encHost"); s/by/but/ sr=darin
Attachment #144426 - Flags: superreview?(darin) → superreview+
Assignee: darin → dbaron
Status: ASSIGNED → NEW
Priority: -- → P1
Whiteboard: [IDN] → [IDN][patch]
(In reply to comment #12) > BTW, it's not documented in the idl file that the out parameter is empty when > normalize fails (although the only implementation does that)... darin said in comment 9 that that's implied, although I'd never heard that rule before. > >+ approxLen += mHost.mLen; > > However, we don't here. Don't what? > How about moving the assertion here? Isn't stringPrep always supposed to > succeeed? Alternatively, we can assert at both places. No, stringPrep can fail for some cases of input text (I think there are some rules on bidi text and an "isprohibited" function) -- and I'd rather leave the URL unmodified in such cases so the user can correct it. Presumably the failure is handled by the DNS lookup code in a way that causes the hostname lookup to fail. I should probably write a testcase for this, though...
Comment on attachment 144426 [details] [diff] [review] patch Requesting approval. This should be pretty safe, and it fixes what could be perceived as a minor security problem (strange things could appear in the URL bar for well-known sites).
Attachment #144426 - Flags: approval1.7?
Comment on attachment 144426 [details] [diff] [review] patch a=chofmann for 1.7
Attachment #144426 - Flags: approval1.7? → approval1.7+
Fix checked in to trunk, 2004-04-02 23:32 -0800.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
This caused bug 307259, fwiw.
Verified with windows 1.0.7, 1.7.12, Mac 1.0.7, 1.7.12
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: