722299 - Implement new IDN Unicode display algorithm

Reporter

Description

•

13 years ago

This bug tracks the implementation of the new IDN Unicode display algorithm (which decides when we show Unicode and when we show Punycode). https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm I have attempted to specify what is to be done in sufficient detail on that wiki page, but whoever implements this should feel free to seek clarification. Given the impending launch of a number of additional top-level domains, and the increased use of IDN domain names on the web, we should implement this with all speed. Gerv

Gervase Markham [:gerv]

Reporter

Comment 1

•

13 years ago

smontagu: are you able to put this bug into your or your group's planning processes? Gerv

Gervase Markham [:gerv]

Reporter

Comment 2

•

13 years ago

Jet: can you and Simon work on prioritizing this? I'd argue that extending the conditions under which we display IDN domains as Unicode is important for our acceptance in non-Latin locales. Gerv

Jet Villegas (inactive)

Comment 3

•

13 years ago

smontagu will be in the USA from 3/10 to 4/1. We'll add this to the IDN work list while he's here.

Simon Montagu :smontagu

Assignee

Comment 4

•

13 years ago

In https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm, it's not clear to me whether displaying Unicode or Punycode is an all or nothing thing for all the labels in a given domain name (as it is currently with the whitelist), or whether its going to be possible for a single domain name to be displayed partly in Unicode and partly in Punycode according to whether each label passes or fails the checks in the algorithm.

Gervase Markham [:gerv]

Reporter

Comment 5

•

13 years ago

Hi Simon. The answer is that the algorithm is applied on a per-label basis. Labels which pass are displayed as Unicode, labels which do not pass are displayed as Punycode. Does that present problems, or will that work fine? Feel free to update the document in the place you think this information is best clarified. Gerv

Simon Montagu :smontagu

Assignee

Comment 6

•

13 years ago

(In reply to Gervase Markham [:gerv] from comment #5) > Hi Simon. The answer is that the algorithm is applied on a per-label basis. > Labels which pass are displayed as Unicode, labels which do not pass are > displayed as Punycode. > > Does that present problems, or will that work fine? I don't think it's a problem in principle, but it makes implementation a bit trickier, because the current code assumes in a number of places that domain names are either non-ASCII or Punycode, but never both.

Component: Internationalization → ImageLib

QA Contact: i18n → imagelib

Simon Montagu :smontagu

Assignee

Updated

•

13 years ago

Component: ImageLib → Internationalization

QA Contact: imagelib → i18n

Simon Montagu :smontagu

Assignee

Updated

•

13 years ago

Assignee: nobody → smontagu

Simon Montagu :smontagu

Assignee

Comment 7

•

13 years ago

>Display as Punycode labels which use more than one numbering system (we would need >a list of numbering systems in Unicode) I'm not sure what this means. What is the definition of "numbering system" for the purposes of this restriction? There are Unicode classes for "Decimal_Number", "Letter_Number" and "Other_Number", and some Unicode characters are used as both letters and as part of numbering systems (think Roman numerals). However, almost all Unicode characters in the Number classes are defined as belonging to particular scripts, except for ASCII digits which are Common, so I'm not sure what cases this is supposed to cover which are not already covered anyway.

Simon Montagu :smontagu

Assignee

Comment 8

•

13 years ago

After writing the above I saw the text in http://www.unicode.org/reports/tr39/ which gives examples: >Check for mixing numbers from different systems, such as U+0660 ( ٠ ) ARABIC-INDIC > DIGIT ZERO with U+06F0 ( ۰ ) EXTENDED ARABIC-INDIC DIGIT ZERO, or U+09EA ( ৪ ) > BENGALI DIGIT FOUR with U+0038 ( 8 ) DIGIT EIGHT. I'm still not happy with this formulation. These could be used for phishing without any mixing of number systems, e.g. http://super৪.com. That particular example could be excluded by changing the last in the list of permitted combinations to "Common + Inherited + Latin + any single other script except Cyrillic, Greek, or Cherokee, excluding characters with the Decimal_Number property", but that would still leave open the reverse case (i.e. substituting ASCII EIGHT for BENGALI FOUR), and we can't just forbid ASCII digits with a non-Latin script since there are some scripts that don't have their own digits and use only ASCII digits.

Gervase Markham [:gerv]

Reporter

Comment 9

•

13 years ago

smontagu: the "any single script + Latin" thing can be used for phishing in the way you describe if any character in that script looks like a Latin character - whether it's a number or not. We exclude Russian, Greek and Cherokee because they are the ones with the most such characters, leading to a combinatorial explosion of phishing possibilities, but the door is not slammed shut by any means. We are resigned to the fact that any heuristic of this sort will have edge cases, and we rely on the registries to help us deal with them. I hope that any registry which allows BENGALI FOUR will have rules about making sure it can't be used to phish DIGIT EIGHT... Your proposed fix basically disallows anyone's digits and Latin together, which seems like a too-strong restriction to me. Gerv

Simon Montagu :smontagu

Assignee

Comment 10

•

13 years ago

So is there a particular reason why you want to allow the "any single script + Latin" combination; i.e. to use UTR 36's "Moderately Restrictive" profile instead of "Highly Restrictive"? In terms of the whole domain name, "Highly Restrictive" would certainly be too restrictive, but if we're thinking in terms of individual labels, it seems to be not unreasonable.

Assignee: smontagu → ashuk

Component: Internationalization → Java APIs for DOM

QA Contact: i18n → dom-apis

Simon Montagu :smontagu

Assignee

Updated

•

13 years ago

Assignee: ashuk → smontagu

Component: Java APIs for DOM → Internationalization

QA Contact: dom-apis → i18n

Gervase Markham [:gerv]

Reporter

Comment 11

•

13 years ago

Simon: my understanding is that there are plenty of real-world uses where people combine their local language with ASCII (or even Latin) - in company names and brand names. This is particularly true because historically, the DNS has been effectively LDH-only. One example I seem to remember being given was something like http://lg-한국.com/ (lg-korea.com) - LG is "LG" everywhere, even in places where Latin is not used. Although that domain name doesn't resolve. If we went to Highly Restrictive, people would still be able to combine their local language with Arabic numerals (0-9), right? Gerv

Simon Montagu :smontagu

Assignee

Comment 12

•

13 years ago

(In reply to Gervase Markham [:gerv] from comment #11) > If we went to Highly Restrictive, people would still be able to combine > their local language with Arabic numerals (0-9), right? Yes. All of the ASCII range except for A-Z and a-z are defined as Common script. > One example I seem to remember being given was something like > http://lg-한국.com/ (lg-korea.com) - LG is "LG" everywhere, even in places > where Latin is not used. Although that domain name doesn't resolve. As it happens, this particular example would be fine with Highly Restrictive anyway, since it's Latin + Hangul which is a permitted combination. Conversely, some other variants on the same theme are forbidden anyway for other reasons, like http://lg-Ελλάς.com/ (Latin + Greek) or http://lg-مصر.com (mixed direction label). The question is whether in the remaining cases where it's not already forbidden or already permitted, such as http://lg-ไทย.com (Thai), people can live with separating the Latin script and local script into separate labels.

Gervase Markham [:gerv]

Reporter

Comment 13

•

13 years ago

How much of what you are doing depends on this decision? Is it just a few lines of code, or does it affect the architecture? I could certainly go through the pros and cons of it with the IDN special interest group, but I don't want to hold you up. Gerv

Simon Montagu :smontagu

Assignee

Comment 14

•

13 years ago

It's just a few lines of code. It also wouldn't be particularly hard to include both code paths and enable Highly Restrictive or Moderately Restrictive conditionally (per domain, or by pref, or something -- I'm not saying that is necessarily something we want to do, but it's not a problem if we do).

Gervase Markham [:gerv]

Reporter

Comment 15

•

13 years ago

Simon: if it seems easier, make it a (global) pref. We can then have more time to work outwhat the correct default value should be. (I'm not seeing much of a risk that there will be a mass revolt of websites telling people to change it.) Gerv

Simon Montagu :smontagu

Assignee

Comment 16

•

13 years ago

Attached patch w-i-p patch (obsolete) (deleted) — Details — Splinter Review

There are tryserver builds with this patch at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-b7a042eb454e/ if anybody wants to experiment. There are a whole lot of unexpected fails and passes in the tryserver results, some of which may be because the expected results of the tests need to be updated to new behaviour -- I haven't had time yet to analyze the results. The patch implements the script restrictions from https://wiki.mozilla.org/IDN_Display_Algorithm#Algorithm. By default it uses the Highly Restrictive profile, but can be changed to use Moderately Restrictive by setting network.IDN.highly_restrictive_profile to false in about:config. It also implements "Display as Punycode labels which have sequences of the same nonspacing mark". It doesn't implement "Display as Punycode labels which use more than one numbering system" for reasons given above: I'm not sure that it adds much, and it's not currently sufficiently clearly defined to implement. It also doesn't implement "Display as Punycode labels which contain both simplified-only and traditional-only Chinese characters". This will require adding a new API to nsUnicodeProperties or somewhere, but it should be straightforward enough to do that.

Gervase Markham [:gerv]

Reporter

Comment 17

•

13 years ago

A few thoughts: - I think the preference should be "network.IDN.restriction_profile", and the value "high". > MOZ_SCRIPT_LATIN_JPAN_COMBO Any reason this lost the A? A suggestion: would the legalScriptCombo() function be better implemented as a table-driven FSM, returning the new state, or 0 if the state would be disallowed? RelevantScripts = [MOZ_SCRIPT_LATIN, MOZ_SCRIPT_HAN, ..., MOZ_SCRIPT_LATIN_CHINESE_COMBO, ...] lastScriptIdx = findIdx(lastScript) thisScriptIdx = findIdx(thisScript) /* thisScript: LATIN HAN ... */ table = [ /* lastScript */ /* LATIN */ [LATIN, HAN, ... ], /* HAN */ [HAN, HAN, ... ], ... ]; if (lastScriptIdx && thisScriptIdx) { return table[thisScriptIdx][lastScriptIdx]; } That kind of thing? Just a thought... Gerv

Gervase Markham [:gerv]

Reporter

Comment 18

•

13 years ago

Simon: it appears that there is an updated copy of UTR 39, which now contains the detection algorithm we are using (section 5). Mark Davis asks if it answers our questions, as outlined on the wiki page. I note that it seems to increase the complexity of the algorithm, as a character can now have multiple scripts. http://www.unicode.org/reports/tr39/tr39-5.html I would guess that the data which tells us about the multiple-scripts of a character is something new, and not in our platform. Do we plan to ship that? How much bloat would that be? This seems like it would require an algorithm rewrite - is that correct? Gerv

Simon Montagu :smontagu

Assignee

Comment 19

•

13 years ago

The multiple-scripts data is in http://unicode.org/Public/6.1.0/ucd/ScriptExtensions.txt, and is indeed new in Unicode 6.1. The bloat isn't huge: there are only 372 codepoints defined in the file, with between 2 and 6 scripts each. I suggest we open a separate bug to add this data to UnicodePropertyData.cpp (it may well be useful in the font selection code that already uses script codes too) and add support for it in IDN display as a follow-up. www.unicode.org/reports/tr39/tr39-5.html#Restriction_Level_Detection has another very significant change: it introduces a distinction between "Recommended", "Aspirational" and "Limited" scripts (defined in http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts and following), and while characters from Limited scripts are not permitted in the Moderately Restrictive profile, characters from Aspirational scripts are permitted. This distinction seems rather arbitrary to me.

Gervase Markham [:gerv]

Reporter

Comment 20

•

13 years ago

Simon: your first para sounds like a plan. How hard would it be to implement the stuff in your second para? I think there is value in sticking to their algorithm, for consistency. Gerv

Simon Montagu :smontagu

Assignee

Updated

•

13 years ago

Depends on: 738101

Simon Montagu :smontagu

Assignee

Comment 21

•

13 years ago

Most of the data we need for the second para is in http://www.unicode.org/Public/security/beta/xidmodifications.txt, and that should be a whole lot easier to parse than ScriptExtensions.txt There doesn't seem to be any data file that includes the list of Aspirational scripts, so I guess we will have to hard-code that for now (it's only five scripts currently). Opened bug 737931 on Script_Extensions and bug 738101 on Identitifier Modifications.

Simon Montagu :smontagu

Assignee

Comment 22

•

13 years ago

Attached patch Patch part 1 (obsolete) (deleted) — Details — Splinter Review

This includes: A pref network.IDN.restriction_profile that can be set to "high" or "moderate", and also to "ASCII" to turn off this feature completely and restore the status quo. Testing for character not in the Identifier Profile as defined in http://www.unicode.org/Public/security/latest/xidmodifications.txt. N.B. This is NOT http://www.unicode.org/Public/security/beta/xidmodifications.txt as referred to in earlier comments. I don't think the file is yet mature enough for use. I submitted some comments on it myself and I heard from the Unicode folks that they will be posting a corrected version soon. I'll update to the newer version in a later iteration. Checking for mixed scripts as defined in http://www.unicode.org/reports/tr39/tr39-5.html#Mixed_Script_Detection, except for the "Aspirational Scripts" part which only makes sense after updating to the newer version of xidmodifications.txt Checking for mixed numbering systems as defined in http://www.unicode.org/reports/tr39/tr39-5.html#Mixed_Number Checking for repeated sequences of the same non-spacing mark. Checking for both simplified-only and traditional-only Chinese characters in the same label.

Simon Montagu :smontagu

Assignee

Comment 23

•

13 years ago

Attached patch Patch part 2 (obsolete) (deleted) — Details — Splinter Review

I separated out the change to use an FSM in isLegalScriptCombo into a separate patch because it seems a bit clumsy and I'm not sure that I totally get it.

Simon Montagu :smontagu

Assignee

Comment 24

•

13 years ago

Attached patch Changes to existing tests (obsolete) (deleted) — Details — Splinter Review

Some existing tests depend on non-whitespaced TLDs not displaying IDNs as Unicode. This patch sets network.IDN.restriction_profile to "ASCII" for those tests in order to retain that behaviour.

Simon Montagu :smontagu

Assignee

Comment 25

•

13 years ago

Attached patch Tests for this bug (obsolete) (deleted) — Details — Splinter Review

Simon Montagu :smontagu

Assignee

Comment 26

•

13 years ago

I'm not sure who should review this. Does it also need Security Review?

Gervase Markham [:gerv]

Reporter

Comment 27

•

13 years ago

I'm not sure this comes to the level where it needs security review (there is no risk of computer compromise, and domain obfuscation is only one component of data loss via phishing) but we could ask curtisk. I don't know about the most appropriate normal reviewer. Gerv

Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]]

Comment 28

•

13 years ago

I am going to mark this sec-review-needed & privacy-review-needed so we can triage this and make a definitive decision as a group as to whether or not either of these activities are warranted.

Keywords: privacy-review-needed, sec-review-needed

Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]]

Comment 29

•

13 years ago

assigning to dveditz and removing privacy flag per his recommendation

Keywords: privacy-review-needed

Whiteboard: [secr:dveditz]

Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]]

Updated

•

13 years ago

Whiteboard: [secr:dveditz] → [sec-assigned:dveditz]

Gervase Markham [:gerv]

Reporter

Comment 30

•

12 years ago

dveditz: ping? Do you know when you might get to security review for this? Gerv

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Flags: sec-review?(dveditz)

Gervase Markham [:gerv]

Reporter

Comment 31

•

12 years ago

(Pinged dveditz and curtisk by email today.) Gerv

Gervase Markham [:gerv]

Reporter

Comment 32

•

12 years ago

And again today. Gerv

Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]]

Updated

•

12 years ago

Depends on: 840036

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Depends on: 843689

Gordon P. Hemsley [:GPHemsley]

Updated

•

12 years ago

Blocks: 843689

No longer depends on: 843689

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Depends on: 843739

Daniel Veditz [:dveditz]

Updated

•

12 years ago

No longer blocks: 843689

Gervase Markham [:gerv]

Reporter

Comment 33

•

12 years ago

smontagu: can you please update your patch to: - fix any bitrot - add a bool pref, network.IDN.use_whitelist, defaulting to "true", which bypasses the whitelist when set to false - include the removal of .com, .net and .name from the whitelist in all.js (bug 843739) ? I would like to get this checked in in the next few days and then after it's baked for a short time we can consider an uplift to Aurora. Gerv

Gervase Markham [:gerv]

Reporter

Comment 34

•

12 years ago

Also, can you confirm that we do restrict the scripts permitted to those on the "Recommended" and "Aspirational" lists? I think you said in the security review that we did, but I just wanted to be sure. If we do, we can remove special processing for Cherokee, because it is in neither list. Gerv

Gervase Markham [:gerv]

Reporter

Comment 35

•

12 years ago

Also, it looks like there might be some useful test data attached to bug 316727, which I just discovered. It seems like an earlier attempt at doing this. Gerv

Simon Montagu :smontagu

Assignee

Comment 36

•

12 years ago

I just pushed a try build with the changes suggested in comment 33: https://tbpl.mozilla.org/?tree=Try&rev=dd961ea5f775 I don't have time to look at comments 34 and 35 today, will come back to them.

Simon Montagu :smontagu

Assignee

Comment 37

•

12 years ago

The try build mentioned in comment 36 had errors. A later successful try build is at https://tbpl.mozilla.org/?tree=Try&rev=59a29fb39801, and builds with this changeset can be downloaded from ftp://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/smontagu@mozilla.com-59a29fb39801/ (In reply to Gervase Markham [:gerv] from comment #34) > Also, can you confirm that we do restrict the scripts permitted to those on > the "Recommended" and "Aspirational" lists? I think you said in the security > review that we did, but I just wanted to be sure. More accurately, we restrict the codepoints permitted to those with "allowed" in http://www.unicode.org/Public/security/latest/xidmodifications.txt. In the current version of Unicode, this only includes "Recommended" scripts. I'll add special-casing for "Aspirational" scripts, which seems to be a change in UTR39 since I originally wrote the patch. > If we do, we can remove special processing for Cherokee, because it is in > neither list. As of now There is no special processing for Cherokee.

Simon Montagu :smontagu

Assignee

Comment 38

•

12 years ago

Attached patch Updated patch (deleted) — Details — Splinter Review

Attachment #606806 - Attachment is obsolete: true

Attachment #610298 - Attachment is obsolete: true

Attachment #610299 - Attachment is obsolete: true

Attachment #718598 - Flags: review?(honzab.moz)

Simon Montagu :smontagu

Assignee

Comment 39

•

12 years ago

Attached patch Add pref to enable/disable the IDN whitelist (deleted) — Details — Splinter Review

Attachment #718601 - Flags: review?(honzab.moz)

Simon Montagu :smontagu

Assignee

Comment 40

•

12 years ago

Attached patch Changes to existing tests (deleted) — Details — Splinter Review

Attachment #610301 - Attachment is obsolete: true

Attachment #718602 - Flags: review?(honzab.moz)

Simon Montagu :smontagu

Assignee

Comment 41

•

12 years ago

Attached patch Tests for this bug (deleted) — Details — Splinter Review

Attachment #610302 - Attachment is obsolete: true

Attachment #718605 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 42

•

12 years ago

Comment on attachment 718601 [details] [diff] [review] Add pref to enable/disable the IDN whitelist Review of attachment 718601 [details] [diff] [review]: ----------------------------------------------------------------- r=honzab

Attachment #718601 - Flags: review?(honzab.moz) → review+

Honza Bambas (:mayhemer)

Comment 43

•

12 years ago

Could I get a little context for the new 'convertAllLabels' argument?

Simon Montagu :smontagu

Assignee

Comment 44

•

12 years ago

See comment 4 through comment 6. Before this patch, any IRI was either all displayed in Unicode or all displayed in punycode, depending on whether the TLD was in the white list and the IRI passed stringprep validation. With the patch, we check each label individually, and display in Unicode only those labels that pass the new display algorithm.

Honza Bambas (:mayhemer)

Comment 45

•

12 years ago

Comment on attachment 718598 [details] [diff] [review] Updated patch Review of attachment 718598 [details] [diff] [review]: ----------------------------------------------------------------- OK, took a while since I didn't know this code that well. Also, I may not be the right to review this code. Technically the code seems to be correct, and as far as I understand and know the algorithm, it does what it has to. The nsIDNService::isLabelSafe is a piece of work, really! r=honzab Comments are mostly nits. ::: netwerk/dns/nsIDNService.cpp @@ +114,5 @@ > + if (NS_FAILED(prefBranch->GetCharPref(NS_NET_PREF_IDNRESTRICTION, > + getter_Copies(profile)))) { > + profile.Truncate(); > + } > + if (profile.Equals("moderate")) { using NS_LITERAL_CSTRING is a bit faster (no rt strlen calculation). @@ +354,5 @@ > } else { > *_isASCII = true; > } > } else { > + // First convert any ACE labels to UTF8 Can we add more detailed comment why? @@ +359,5 @@ > + nsAutoCString temp; > + if (isACE) { > + ACEtoUTF8(input, temp, false, true); > + } else { > + temp = input; is this sharing buffers? @@ +735,5 @@ > + if (mRestrictionProfile == eASCIIOnlyProfile) { > + return false; > + } > + > + nsAString::const_iterator start, end; Rename start to caret or current or so, it's confusing in the code bellow. @@ +744,5 @@ > + uint32_t previousChar = 0; > + uint32_t savedNumberingSystem = 0; > + HanVariantType savedHanVariant = HVT_NotHan; > + > + mSavedScriptIndex = -1; According the implementation you may just pass ref to some local var to illegalScriptCombo method, instead of holding this as a member. @@ +763,5 @@ > + (script == MOZ_SCRIPT_CANADIAN_ABORIGINAL || > + script == MOZ_SCRIPT_MIAO || > + script == MOZ_SCRIPT_MONGOLIAN || > + script == MOZ_SCRIPT_TIFINAGH || > + script == MOZ_SCRIPT_YI))) { Should have prefs instead of hard coding? ::: netwerk/dns/nsIDNService.h @@ +45,5 @@ > bool allowUnassigned); > nsresult decodeACE(const nsACString& in, nsACString& out, > + bool allowUnassigned, bool convertAllLabels); > + nsresult SelectiveUTF8toACE(const nsACString& in, nsACString& out); > + nsresult SelectiveACEtoUTF8(const nsACString&in, nsACString& out); nsACString&in can you keep the arg names in sync with Convert* methods? Makes the code more readable. @@ +52,2 @@ > nsresult ACEtoUTF8(const nsACString& in, nsACString& out, > + bool allowUnassigned, bool convertAllLabels); I've spend the longest time by figuring out where all convertAllLabels arg flows with what values. Docs needed! For me it means: * false only when a domain we want to display is not on whitelist and is non-ascii/punicoded (needs conversion) ; false value makes the conversion routines leave intact: - domain portions that *are unsafe* to convert when doing ACE->UTF8 - domain portions that *are safe* to convert when doing UTF8->ACE * true otherwise, what means to convert all or nothing fix me if I'm wrong @@ +54,4 @@ > bool isInWhitelist(const nsACString &host); > void prefsChanged(nsIPrefBranch *prefBranch, const PRUnichar *pref); > + bool isLabelSafe(const nsAString &label); > + bool illegalScriptCombo(int32_t script); Definitely (and just these two) need more comments. Can be added after landed, tho, since this is in rush.

Attachment #718598 - Flags: review?(honzab.moz) → review+

Honza Bambas (:mayhemer)

Comment 46

•

12 years ago

Comment on attachment 718602 [details] [diff] [review] Changes to existing tests Review of attachment 718602 [details] [diff] [review]: ----------------------------------------------------------------- I've not checked what other test would need this, tho.

Attachment #718602 - Flags: review?(honzab.moz) → review+

Honza Bambas (:mayhemer)

Comment 47

•

12 years ago

Comment on attachment 718605 [details] [diff] [review] Tests for this bug Review of attachment 718605 [details] [diff] [review]: ----------------------------------------------------------------- Good work. r=honzab ::: netwerk/test/unit/test_bug722299.js @@ +1,3 @@ > +// Test algorithm for unicode display of IDNA URL (bug 722299) > +const testcases = [ > + // Original Punycode or Expected result by profile Expected *UTF8* by profile ? @@ +4,5 @@ > + // URL normalized form ASCII-Only, High, Moderate > + // > + // Latin script > + ["cuillère", "xn--cuillre-6xa", false, true, true], > + // repeated non-spacing marks I think adding a blank line like bellow can make this even more readable. // comment ["idn", "puny", results], // comment ["idn", "puny", results], // comment ["idn", "puny", results], @@ +37,5 @@ > + ["beethovenドイツ語ではルートヴィヒ", > + "xn--beethoven-8d4h2cy4aka26b5bl3npswf7fo191r", > + false, true, true], > + // Han with common > + ["中国〆", "xn--v6j902gn7f", false, true, true], What is the common character here? Could we have the both common first/common last char variants? (actually for all tests to check the table jumping is correct as much as possible) @@ +72,5 @@ > + // Hangul and katakana > + ["한글ハングル", > + "xn--qck1c2d4a9266lkmzb", false, false, false] > +]; > + Can we have a test for UCS4 surrogates (preferably also broken 1-part only at the end) ?

Attachment #718605 - Flags: review?(honzab.moz) → review+

Gervase Markham [:gerv]

Reporter

Comment 48

•

12 years ago

smontagu: can we get this checked in (with review fixes) today? Gerv

Simon Montagu :smontagu

Assignee

Comment 49

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/58cdc6e6fad9 https://hg.mozilla.org/integration/mozilla-inbound/rev/abb653a2a022 https://hg.mozilla.org/integration/mozilla-inbound/rev/a9c14516f31a https://hg.mozilla.org/integration/mozilla-inbound/rev/eab8e07eea58

Flags: in-testsuite+

Ryan VanderMeulen [:RyanVM]

Comment 50

•

12 years ago

Backed out for mochitest failures on B2G. https://hg.mozilla.org/integration/mozilla-inbound/rev/72c49203f293 https://tbpl.mozilla.org/php/getParsedLog.php?id=20200673&tree=Mozilla-Inbound 15:57:48 INFO - 975 ERROR TEST-UNEXPECTED-PASS | /tests/dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html | wrong sender - http://sub1.exämple.test should equal http://sub1.exämple.test 15:57:48 INFO - 976 ERROR TEST-UNEXPECTED-PASS | /tests/dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html | wrong sender - http://sub1.exämple.test should not equal http://sub1.xn--exmple-cua.test 15:57:48 INFO - 979 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html | wrong response for IDN - got idn-nowhitelist-response wrong-initial-domain(sub1.exämple.test), expected idn-nowhitelist-response 15:57:48 INFO - 981 ERROR TEST-UNEXPECTED-PASS | /tests/dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html | wrong sender - http://sub1.exämple.test should equal http://sub1.exämple.test 15:57:48 INFO - 982 ERROR TEST-UNEXPECTED-PASS | /tests/dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html | wrong sender - http://sub1.exämple.test should not equal http://sub1.xn--exmple-cua.test 15:57:48 INFO - 985 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/dom-level0/test_setting_document.domain_idn.html | wrong response for punycode - got punycode-nowhitelist-response wrong-initial-domain(sub1.exämple.test), expected punycode-nowhitelist-response

Jeff Walden [:Waldo]

Comment 51

•

12 years ago

Looks like you're being bitten by bug 414090, specifically that various places origins/URLs are exposed in the DOM are affected by IDN-display decisions. The comments in the test are reasonably extensive as to what was going on, what should go on, and why various things were tested as they were. Find a few new (different) representative URLs to put in the tests, and you should be good to go again. If anything's unclear, feel free to poke me with questions -- I did write the test, even if it was, um, five years ago. :-)

Simon Montagu :smontagu

Assignee

Comment 52

•

12 years ago

Attachment 718602 [details] [diff] includes a workaround for exactly that problem, which works on other platforms and even on B2G for the other affected tests. I don't see why it doesn't help with this particular test. :S

Simon Montagu :smontagu

Assignee

Comment 53

•

12 years ago

Repushed with changes to the URLs as Waldo suggested. I think this is in any case a better long term solution than using the network.IDN.restriction_profile pref, which we probably don't want to keep around for ever. https://hg.mozilla.org/integration/mozilla-inbound/rev/84b9049008e9 https://hg.mozilla.org/integration/mozilla-inbound/rev/d475e6d54520 https://hg.mozilla.org/integration/mozilla-inbound/rev/fd84967d0389 https://hg.mozilla.org/integration/mozilla-inbound/rev/eea81f144307

Ryan VanderMeulen [:RyanVM]

Comment 54

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/84b9049008e9 https://hg.mozilla.org/mozilla-central/rev/d475e6d54520 https://hg.mozilla.org/mozilla-central/rev/fd84967d0389 https://hg.mozilla.org/mozilla-central/rev/eea81f144307

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla22

Philip Chee

Comment 55

•

12 years ago

make.py[2]: Leaving directory 'c:\t1\hg\objdir-sm\mozilla\netwerk\cookie' make.py[2]: Entering directory 'c:\t1\hg\objdir-sm\mozilla\netwerk\dns' nsIDNService.cpp c:\t1\hg\comm-central\mozilla\config\rules.mk:1055:0$ c:/t1/hg/objdir-sm/mozilla/_virtualenv/Scripts /python.exe -O c:/t1/hg/comm-central/mozilla/build/cl.py cl -FonsIDNService.obj -c -D_HAS_EXCEPTIONS =0 -I../../dist/stl_wrappers -DMOZILLA_INTERNAL_API -D_IMPL_NS_COM -DEXPORT_XPT_API -DEXPORT_XPTC_A PI -D_IMPL_NS_GFX -D_IMPL_NS_WIDGET -DIMPL_XREAPI -DIMPL_NS_NET -DIMPL_THEBES -DMOZ_SUITE=1 -DNO_NS PR_10_SUPPORT -DIMPL_NS_NET -Ic:/t1/hg/comm-central/mozilla/netwerk/dns/../base/src -I. -Ic:/t1/hg/ comm-central/mozilla/netwerk/dns -I. -I../../dist/include -Ic:/t1/hg/objdir-sm/mozilla/dist/include /nspr -Ic:/t1/hg/objdir-sm/mozilla/dist/include/nss -wd4099 -TP -nologo -W3 -Gy -Fdgenerated. pdb -wd4251 -wd4345 -wd4351 -wd4482 -wd4800 -wd4819 -we4553 -GR- -DNDEBUG -DTRIMMED -Zi -UDEBUG -DN DEBUG -O1 -Oy -MD -FI ../../dist/include/mozilla-config.h -DMOZILLA_CLIENT c:/t1/hg/com m-central/mozilla/netwerk/dns/nsIDNService.cpp nsIDNService.cpp c:\t1\hg\objdir-sm\mozilla\dist\include\harfbuzz\hb-common.h(56) : error C2371: 'int8_t' : redefinit ion; different basic types c:\t1\hg\objdir-sm\mozilla\dist\include\mozilla/MSStdInt.h(82) : see declaration of 'int8_t'

Philip Chee

Comment 56

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/84b9049008e9#l2.11 2.11 #include "nsUnicharUtils.h" 2.12 +#include "nsUnicodeProperties.h" 2.13 +#include "nsUnicodeScriptCodes.h" 2.14 +#include "harfbuzz/hb.h" I'm using VS2008. Ms2ger says: That someone's using harfbuzz includes outside the five dirs that define HB_DONT_DEFINE_STDINT

Philip Chee

Comment 57

•

12 years ago

Attached patch Patch to get VS2008/VC9 building. (deleted) — Details — Splinter Review

This allows my build to continue.

Attachment #720285 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Updated

•

12 years ago

Attachment #720285 - Flags: review?(honzab.moz) → review+

Bill Gianopoulos [:WG9s]

Updated

•

12 years ago

Depends on: 847078

Jonathan Kew [:jfkthame]

Comment 58

•

12 years ago

Pushed the followup to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/333e89511b31

Ryan VanderMeulen [:RyanVM]

Comment 59

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/333e89511b31

u279076

Comment 60

•

12 years ago

There was a request via email to have some QA eyes on this. I'm setting Matt Wobensmith as QA Contact since he is our security QA lead. I know he is at CanSecWest today but I think he'll be back in the office on Monday. I can help assist if this can't wait until then, pending guidance from Matt.

Keywords: qawanted, verifyme

QA Contact: mwobensmith

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Flags: sec-review?(dveditz) → sec-review+

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Depends on: 853226

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Depends on: 853231

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Depends on: 853237

Simon Montagu :smontagu

Assignee

Updated

•

12 years ago

Depends on: 857481

Simon Montagu :smontagu

Assignee

Updated

•

12 years ago

Depends on: 857490

Simon Montagu :smontagu

Assignee

Updated

•

12 years ago

Depends on: 858417

Gervase Markham [:gerv]

Reporter

Comment 61

•

12 years ago

Comment on attachment 718598 [details] [diff] [review] Updated patch [Approval Request Comment] This uplift request is for all patches on this bug, the patch on bug 843739 (which removes .com, .net and .name from the whitelist) and the follow-up fixes from bug 857481 and bug 857490 which were the result of the QA pass. (smontagu: please add to that list of bugs if I've missed any.) If you need a rollup patch prepared, perhaps smontagu would be kind enough to do that. Bug caused by (feature/regressing bug #): This is part regression fix, part feature. In bug 770877, we added .com, .net and .name to the IDN TLD whitelist. However, we subsequently found that, although Verisign's current policies are OK, there were some historical registrations in .com which were spoofy and these have not been revoked. To fix that without regressing legitimate .com domains, we finished off a patch to switch from a whitelist to an algorithm - see https://wiki.mozilla.org/IDN_Display_Algorithm - and married that with a patch to remove .com, .net and .name so those TLDs rely solely on the algorithm. User impact if declined: possible spoofing security risk. Also, ESR will not have new algorithmic code, which will mean IDNs in all non-whitelisted TLDs will not work. The list of TLDs will soon be greatly expanded and IDN is starting to become widely used. Also, we want to deprecate and eventually remove the whitelist; if ESR is still relying on it, we can't do that totally for a much longer period than would otherwise be the case. Testing completed (on m-c, etc.): Yes. Extensive testing has been done by the awesome Matt Wobensmith over in bug 854041, and a couple of fixes have been made as a result (see bug numbers above). There are a small number of corner cases outstanding but they are not ship-stoppers. The code is currently on Aurora. Risk to taking this patch (and alternatives if risky): An alternative would be to take _only_ the patch to bug 843739. That would mean all .com, .net and .name IDNs would all go back to "not displaying correctly" for an entire ESR cycle, and IDN would not display correctly on ESR in any of the non-whitelisted TLDs or new TLDs. String or IDL/UUID changes made by this patch: None. Please let me know if there are further questions. Gerv

Attachment #718598 - Flags: approval-mozilla-beta?

bhavana bajaj [:bajaj]

Comment 62

•

12 years ago

Comment on attachment 718598 [details] [diff] [review] Updated patch Discussed/Clarified offline with Gerv on this.Given the risk of uplifting such a huge change in second half of the cycle, we feel this should ride the trains instead . In addition the user impact/risk around ESR is not applicable here given our next ESR is ESR24.

Attachment #718598 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

3ric Johanson

Comment 63

•

11 years ago

What on earth does "ride the trains" mean?

Gervase Markham [:gerv]

Reporter

Comment 64

•

11 years ago

3ric: Firefox development is organized into "release trains", which pass from nightly through Aurora and Beta to Release. "Riding the trains" means letting the patch follow the normal path rather than forward-porting it. The code here is due for release tomorrow in Firefox 22, after which I hope you will find that the problematic URLs you have been discussing no longer show as Unicode in released Firefox. Gerv

Tracy Walker [:tracy]

Comment 65

•

11 years ago

mass remove verifyme requests greater than 4 months old

Keywords: verifyme

Peter Kasting

Comment 66

•

11 years ago

FWIW, I filed https://code.google.com/p/chromium/issues/detail?id=336973 about implementing this algorithm in Chrome. Patches welcome in case any Mozilla folks want to write a Chrome patch and/or contribute to more cross-browser behavioral alignment :)

w-i-p patch 13 years ago Simon Montagu :smontagu (deleted), patch		Details \| Diff \| Splinter Review
Patch part 1 13 years ago Simon Montagu :smontagu (deleted), patch		Details \| Diff \| Splinter Review
Patch part 2 13 years ago Simon Montagu :smontagu (deleted), patch		Details \| Diff \| Splinter Review
Changes to existing tests 13 years ago Simon Montagu :smontagu (deleted), patch		Details \| Diff \| Splinter Review
Tests for this bug 13 years ago Simon Montagu :smontagu (deleted), patch		Details \| Diff \| Splinter Review
Updated patch 12 years ago Simon Montagu :smontagu (deleted), patch	mayhemer : review+ bajaj : approval-mozilla-beta-	Details \| Diff \| Splinter Review
Add pref to enable/disable the IDN whitelist 12 years ago Simon Montagu :smontagu (deleted), patch	mayhemer : review+	Details \| Diff \| Splinter Review
Changes to existing tests 12 years ago Simon Montagu :smontagu (deleted), patch	mayhemer : review+	Details \| Diff \| Splinter Review
Tests for this bug 12 years ago Simon Montagu :smontagu (deleted), patch	mayhemer : review+	Details \| Diff \| Splinter Review
Patch to get VS2008/VC9 building. 12 years ago Philip Chee (deleted), patch	mayhemer : review+	Details \| Diff \| Splinter Review