231162 - text-transform is not using language dependent casing rules

Reporter

Description

21 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

CSS calls for using language-specific rules and has ever since CSS 1. I decided to write a test case involving the dotted and dotless i's. In both Turkish and Azerbaijani, the normal casing rules are not followed for the letters I (U+0049) and i (U+0069). In those two languages, the lowercase of U+0049 is ı (U+0131) and the uppercase of U+0069 is İ (U+0130). However, even after I downloaded the Turkish locale (tested with 1.5) and switched to it, text marked as either language still used the default casing with both text-transform: uppercase and text-transform: lowercase.

Reproducible: Always

Steps to Reproduce:
1. Mark up a section of HTML as either Turkish (lang="tr") or Azerbaijani (lang="az").
2. Apply a CSS text-transform to it.

Actual Results:
text-transform: lowercase of U+0049 was displayed as U+0069.
text-transform: uppercase of U+0069 was displayed as U+0049.

Expected Results:
text-transform: lowercase of U+0049 should have been displayed as U+0131.
text-transform: uppercase of U+0069 should have been displayed as U+0130.

This may affect other language-dependent casing as well, but I have only tested the dotted/dotless I's of Turkish/Azerbaijani. I will also attach one test file I used. Another one, which placed the language info on the same element that had the text-transform applied to it, will not be uploaded, as it produced the same non-result.
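The expected results above can be sketched as a thin layer over the default Unicode case mapping. This is a minimal Python sketch, not Mozilla's implementation; the names turkic_upper and turkic_lower are hypothetical, and only the four dotted/dotless I code points are special-cased (the full Unicode SpecialCasing rules for tr/az also involve U+0307 COMBINING DOT ABOVE, which this sketch ignores).

```python
# Hypothetical sketch: Turkish/Azerbaijani casing for the dotted and dotless I.
# Only U+0049, U+0069, U+0130 and U+0131 get special treatment; every other
# character falls back to Python's default (language-independent) case mapping.

TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}  # i -> İ, ı -> I
TURKIC_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> ı, İ -> i


def turkic_upper(text: str) -> str:
    """Uppercase text using the tr/az rules for the letter i."""
    return "".join(TURKIC_UPPER.get(ch, ch.upper()) for ch in text)


def turkic_lower(text: str) -> str:
    """Lowercase text using the tr/az rules for the letter I."""
    return "".join(TURKIC_LOWER.get(ch, ch.lower()) for ch in text)


if __name__ == "__main__":
    print("i".upper(), turkic_upper("i"))  # default I vs. Turkish İ
    print("I".lower(), turkic_lower("I"))  # default i vs. Turkish ı
```

The "Actual Results" correspond to the default str.upper/str.lower calls; the "Expected Results" correspond to the two turkic_ functions.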

Ernest Cline

Reporter

Comment 1

21 years ago

Attached file HTML Testcase (deleted)

David Baron :dbaron:

Updated

21 years ago

Assignee: nobody → smontagu

Status: UNCONFIRMED → NEW

Component: Layout: Fonts and Text → Internationalization

Ever confirmed: true

QA Contact: core.layout.fonts-and-text → amyy

Simon Montagu :smontagu

Comment 2

21 years ago

I assume that this is a regression, and it's odd because we have code to handle it in intl/unicharutil/src/nsCaseConversionImp2.cpp (at least for lang="tr": we also need to add lang="az"). I guess that code is somehow not being reached.
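Deciding when the Turkic rules apply is a separate step from the mapping itself. A minimal sketch, assuming the decision is made from the element's language tag alone: since language tags are case-insensitive and may carry script or region subtags, comparing only the primary subtag means values such as "tr-TR" or "az-Latn-AZ" also qualify. The helper name uses_turkic_casing is hypothetical and is not the code in nsCaseConversionImp2.cpp.

```python
# Hypothetical helper (not Mozilla code): does a language tag call for Turkic casing?
# Language tags are case-insensitive and may carry script or region subtags
# ("tr-TR", "az-Latn-AZ"), so only the primary language subtag is compared.

TURKIC_CASING_LANGS = frozenset({"tr", "az"})


def uses_turkic_casing(lang_tag: str) -> bool:
    """Return True if the tag's primary language subtag is Turkish or Azerbaijani."""
    primary = lang_tag.strip().split("-", 1)[0].lower()
    return primary in TURKIC_CASING_LANGS


for tag in ["tr", "tr-TR", "AZ", "az-Latn-AZ", "en-US", ""]:
    print(repr(tag), uses_turkic_casing(tag))
```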

Simon Montagu :smontagu

Comment 3

21 years ago

As far as I can tell ToUpper() with the locale argument is just never being called.

Status: NEW → ASSIGNED

Simon Montagu :smontagu

Comment 4

21 years ago

> CSS calls for using language specific rules and has even since CSS 1.

For the record, this is not strictly accurate: CSS3 is the first version to require conformance to language-specific rules.

CSS1: http://www.w3.org/TR/CSS1.html#text-transform
CSS1 core: UAs may ignore 'text-transform' (i.e., treat it as 'none') for characters that are not from the Latin-1 repertoire and for elements in languages for which the transformation is different from that specified by the case-conversion tables of Unicode.

CSS2: http://www.w3.org/TR/REC-CSS2/text.html#caps-prop, unchanged in CSS2.1 http://www.w3.org/TR/CSS21/text.html#caps-prop
Conforming user agents may consider the value of 'text-transform' to be 'none' for characters that are not from the Latin-1 repertoire and for elements in languages for which the transformation is different from that specified by the case-conversion tables of ISO 10646.

CSS3: http://www.w3.org/TR/css3-text/#caps-prop
Conforming user agents MUST support case mapping rules according to the Unicode Standard for all characters specified by that standard.

fantasai

Updated

21 years ago

OS: Windows XP → All

Hardware: PC → All

Ernest Cline

Reporter

Comment 5

21 years ago

I could buy that, but in that case shouldn't Mozilla be leaving the dotted I and dotless i alone since they aren’t in Latin-1? Or are the two clauses supposed to be independent? I read it as saying if a UA doesn’t implement language sensitive case mapping rules, it must case map Latin-1 only. If the two were supposed to be independent, I would expect an "or" instead of the "and" in the quotes you gave from CSS 1 and CSS 2. Of course, all this is a battle of semantics for no purpose, since the dotted/dotless I case mapping for Turkish and Azerbaijani is mentioned in the Unicode Standard and so CSS 3 clearly requires this be supported.

Simon Montagu :smontagu

Comment 6

21 years ago

Attached patch Remove unused code (deleted)

Let's begin by removing the existing code, because:
(a) it isn't used;
(b) it wouldn't work correctly if it were used, due to errors such as
if(kDot_I == *s) *s = kDot_I;
so it isn't even useful as the basis for a working implementation;
(c) having it in the tree is misleading and creates a superficial impression that we support Turkic casing.

Simon Montagu :smontagu

Comment 7

21 years ago

Comment on attachment 139462: Remove unused code

Requesting reviews for removing the unused and inaccurate version.

Attachment #139462 - Flags: superreview?(dbaron)

Attachment #139462 - Flags: review?(jshin)

David Baron :dbaron:

Updated

21 years ago

Attachment #139462 - Flags: superreview?(dbaron) → superreview+

Jungshik Shin

Comment 8

21 years ago

Comment on attachment 139462: Remove unused code

r=jshin

Just a reminder (you may not need it, but just in case): we have bug 210501, in which we have to overhaul the case conversion APIs anyway.

Attachment #139462 - Flags: review?(jshin) → review+

Simon Montagu :smontagu

Comment 9

21 years ago

Comment on attachment 139462: Remove unused code

Checking this in made all non-clobber tinderboxen go orange, so I backed it out again.

Brian Ryner (not reading)

Comment 10

21 years ago

Attached patch fix dependency problem (obsolete) (deleted)

We've had a longstanding dependency problem with the unicharutil_s static library: we weren't relinking things that use this library when the unicharutil library changes. Rather than add EXTRA_DEPS to a couple dozen Makefiles, I opted to just handle the dependency in rules.mk. I think this is the cause of the dep tinderbox bustage.

Brian Ryner (not reading)

Updated

21 years ago

Attachment #139555 - Flags: review?(cls)

cls

Comment 11

21 years ago

Comment on attachment 139555: fix dependency problem

Grumblesmurf. We didn't have the dependency problem when MOZ_UNICHARUTIL_LIBS were used as part of SHARED_LIBRARY_LIBS. I think the special casing is a bad idea in the long run. There should be a generic mechanism to track dependencies from EXTRA_DSO_LDOPTS. Something like:

DSO_LDOPTS_DEPS = $(filter %.$(LIB_SUFFIX) %$(DLL_SUFFIX), $(EXTRA_DSO_LDOPTS))

At some point, the VPATH issue would need to be fixed so that the -lfoo dependencies could be tracked as well.
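To make the proposed rule concrete, here is a small Python emulation of GNU make's $(filter) function applied to a hypothetical link line. The library path, the -l flags and the suffix values (a for LIB_SUFFIX, .so for DLL_SUFFIX, as on a typical Linux build) are assumptions for illustration. It shows why such a rule would catch a library named by its full file name but not a -lfoo flag.

```python
from __future__ import annotations

# Emulates GNU make's $(filter PATTERN..., TEXT) to show which entries of a
# link line the proposed DSO_LDOPTS_DEPS rule would pick up as dependencies.
# The suffix values and link-line entries below are illustrative assumptions.


def make_filter(patterns: list[str], words: list[str]) -> list[str]:
    """Return the words that match at least one make-style '%' pattern."""

    def matches(pattern: str, word: str) -> bool:
        if "%" not in pattern:
            return pattern == word
        # Only the first '%' is a wildcard in make patterns.
        prefix, suffix = pattern.split("%", 1)
        return (
            len(word) >= len(prefix) + len(suffix)
            and word.startswith(prefix)
            and word.endswith(suffix)
        )

    return [w for w in words if any(matches(p, w) for p in patterns)]


LIB_SUFFIX = "a"    # assumed static-library suffix (no leading dot)
DLL_SUFFIX = ".so"  # assumed shared-library suffix (with leading dot)

extra_dso_ldopts = ["../unicharutil/libunicharutil_s.a", "-L../dist/lib", "-lxpcom", "-lm"]
deps = make_filter([f"%.{LIB_SUFFIX}", f"%{DLL_SUFFIX}"], extra_dso_ldopts)
print(deps)  # ['../unicharutil/libunicharutil_s.a']
```

The -lxpcom entry is not matched because make cannot know which file it resolves to without searching the -L paths, which is the part left for later.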

Attachment #139555 - Flags: review?(cls) → review-

Brian Ryner (not reading)

Comment 12

21 years ago

Right, but we don't want to use --whole-archive for this static library. I'd be OK with adding a mechanism like you suggest, and it would fix this case even if we don't address the -lfoo case right now. And rather than trying to track through the linker -L switches to figure out a library path to search (ugh), I'd rather just say that the full library name should be used when linking against static libraries. For shared libraries, chances are that nothing will break if we don't relink: if a symbol was removed from the shared library, we should be relinking anyway to remove references to it from the current module, and if new symbols that the current module doesn't use were added to the shared library, it won't affect anything.

Brian Ryner (not reading)

Comment 13

21 years ago

Attached patch handle static library dependencies (deleted)

Attachment #139555 - Attachment is obsolete: true

Brian Ryner (not reading)

Updated

21 years ago

Attachment #139709 - Flags: review?(cls)

cls

Updated

21 years ago

Attachment #139709 - Flags: review?(cls) → review+

Brian Ryner (not reading)

Comment 14

21 years ago

Comment on attachment 139709: handle static library dependencies

This is checked in. It should now be possible to land the original patch in this bug without breaking the depend tinderboxes.

Phil Ringnalda (:philor)

Updated

15 years ago

QA Contact: amyy → i18n

Avram Lyon

Comment 17

15 years ago

Before this lands, please note that there are other affected languages. The modern Latin scripts for Crimean Tatar (crh), Volga Tatar (tt), and Bashkir (ba) all use the Turkish/Azerbaijani-style i/İ and ı/I pairings. All three have both Cyrillic and Latin scripts, and only the latter is affected, so perhaps this would require the use of script variants (e.g., tt-Latn), but Azerbaijani also has a well-represented Cyrillic script. I would like to see the new text-transform behavior apply to tt, crh, and ba as well.
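The script-variant idea in this comment could look roughly like the following sketch. It is an assumption, not anything that landed: tr and az always get Turkic casing (the special mapping only touches the Latin I/i letters, so Cyrillic text is unaffected), while crh, tt and ba get it only when the tag explicitly names the Latin script, as in "tt-Latn". The helper names are hypothetical, and the script-subtag detection is simplified.

```python
from __future__ import annotations

# Hypothetical extension of a language-tag check: tr and az always get Turkic
# casing, while crh, tt and ba get it only with an explicit Latn script subtag.

ALWAYS_TURKIC = frozenset({"tr", "az"})
TURKIC_IF_LATIN = frozenset({"crh", "tt", "ba"})


def script_subtag(subtags: list[str]) -> str:
    """Return the first four-letter alphabetic subtag after the language, or ''.

    Simplified: script subtags are four letters, but this ignores edge cases
    such as private-use subtags.
    """
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            return sub
    return ""


def needs_turkic_casing(lang_tag: str) -> bool:
    subtags = lang_tag.strip().lower().split("-")
    primary = subtags[0]
    if primary in ALWAYS_TURKIC:
        return True
    return primary in TURKIC_IF_LATIN and script_subtag(subtags) == "latn"


for tag in ["tt", "tt-Latn", "crh-Latn-UA", "ba-Cyrl", "az-Cyrl"]:
    print(tag, needs_turkic_casing(tag))
```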

Riz

Comment 19

14 years ago

I am the submitter of the duplicate bug 570333. I am sorry to have missed this bug when I searched the database. I would like to add my comments, if I may.
1- The Turkish/Turkic i/I case transform is NOT limited to the text-transform CSS property; it applies to font-variant: small-caps as well.
2- I can see in this thread that you have known about this bug since the beginning of 2004. That is SIX years! What is stopping you from fixing it so that 100+ million people can use their language as fully as other Latin-alphabet users? There is a stark reality here: this bug stops Turks from properly formatting their own country's name, Turkiye! Shouldn't it be a priority?

David Baron :dbaron:

Updated

14 years ago

Blocks: css2.1-tests

Tantek Çelik

Comment 21

13 years ago

adding cc. Not that I'm personally biased/affected or anything. ;)

Jonathan Kew [:jfkthame]

Assignee

Comment 22

13 years ago

Attached patch patch, add Turkish support for text-transform and small-caps (deleted)

This adds support for the Turkish-style İ/i and I/ı casing behavior in text-run transformations. It's handled within the transforming text run, rather than by adding locale support to the low-level Unicode case mapping functions; I think it makes more sense at this level given the limited scope of the changes needed. I'm expecting further changes and refactoring of nsCaseTransformTextRunFactory::RebuildTextRun in bug 307039 (for Greek support), but this can at least serve as a starting point for adding language-sensitive behavior.
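The idea of handling this at the text-run level, rather than in the low-level case-mapping functions, can be illustrated with a simplified model in which a "run" is just a (text, lang) pair. This Python sketch is an assumption for illustration only and does not reflect the structure of nsCaseTransformTextRunFactory::RebuildTextRun; the function names are hypothetical. It also shows why small-caps is affected: synthesized small caps reuse the uppercase mapping for lowercase letters, so a Turkish i must become a reduced-size İ.

```python
from __future__ import annotations

# Hypothetical sketch (not the patch itself): apply text-transform per text run,
# choosing Turkic rules from each run's language, and reuse the uppercase
# mapping for synthesized small caps.


def is_turkic(lang: str) -> bool:
    return lang.strip().split("-", 1)[0].lower() in {"tr", "az"}


def transform_char(ch: str, transform: str, turkic: bool) -> str:
    """Case-map one character for text-transform: uppercase or lowercase."""
    if transform == "uppercase":
        if turkic and ch == "i":
            return "\u0130"  # i -> İ
        return ch.upper()
    if transform == "lowercase":
        if turkic and ch == "I":
            return "\u0131"  # I -> ı
        if turkic and ch == "\u0130":
            return "i"  # İ -> i
        return ch.lower()
    return ch  # "none" and any other value (e.g. capitalize) pass through unchanged


def transform_runs(runs: list[tuple[str, str]], transform: str) -> str:
    """Transform a sequence of (text, lang) runs and join the results."""
    return "".join(
        "".join(transform_char(ch, transform, is_turkic(lang)) for ch in text)
        for text, lang in runs
    )


def small_caps(text: str, lang: str) -> list[tuple[str, bool]]:
    """For font-variant: small-caps, pair each displayed form with whether it is
    drawn as a reduced-size capital (i.e. the source letter was lowercase)."""
    turkic = is_turkic(lang)
    return [(transform_char(ch, "uppercase", turkic), ch.islower()) for ch in text]


print(transform_runs([("istanbul", "tr"), (" is", "en")], "uppercase"))  # İSTANBUL IS
print(small_caps("iI", "tr"))
```

Keeping the decision per run means a single paragraph can mix Turkish and English words and still case each one correctly.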

Attachment #606188 - Flags: review?(smontagu)

Jonathan Kew [:jfkthame]

Assignee

Comment 23

13 years ago

Attached patch reftests for Turkish casing behavior (deleted)

Attachment #606212 - Flags: review?(smontagu)

Simon Montagu :smontagu

Comment 24

13 years ago

Comment on attachment 606188: patch, add Turkish support for text-transform and small-caps

This doesn't work when the language is specified with xml:lang (which means that the tests in http://www.w3.org/International/tests/html-css/list-text-transform#special fail).

I also don't understand how it handles I with U+307 COMBINING DOT ABOVE, though as far as I can tell it does do so correctly. Is the text normalized before it reaches this code?

Jonathan Kew [:jfkthame]

Assignee

Comment 25

13 years ago

(In reply to Simon Montagu from comment #24)
> This doesn't work when the language is specified with xml:lang

This is due to bug 702121 (perhaps duplicating bug 234485), I think....

> (which means
> that the tests in
> http://www.w3.org/International/tests/html-css/list-text-transform#special
> fail).

....although according to bz in bug 234485 comment 40, "xml:lang is ignored in text/html content, as it should be", which makes me suspect some of those w3.org testcases (the non-XHTML ones) may be incorrect, as they're using xml:lang in html content.

Jonathan Kew [:jfkthame]

Assignee

Comment 26

13 years ago

(In reply to Simon Montagu from comment #24)
> I also don't understand how it handles I with U+307 COMBINING DOT ABOVE,
> though as far as I can tell it does do so correctly. Is the text normalized
> before it reaches this code?

No, we don't apply any normalization. The sequence <U+0049, U+0307> (İ) will be lowercased as <U+0069, U+0307> (i̇) regardless of whether the element is lang="tr" or not. Many, though not all, fonts will suppress the "extra" dot in this case, either by ligating the "i" and the dot to a simple "i" glyph, or by contextually replacing "i" by "ı" when followed by a diacritic above.

Arguably, it would be good to _remove_ the combining dot when lowercasing this sequence, so that it reliably displays as "i" without an extra dot, but this is a separate issue from the Turkish behavior.
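Python's str.lower() applies Unicode's default, language-independent full case mapping, so it reproduces the behavior described in this comment and is a convenient way to see it. The helper lower_without_extra_dot below is a hypothetical, naive version of the dot removal suggested as a separate issue; dropping every U+0307 after i would be wrong for languages such as Lithuanian, which deliberately keep a dot above i in some contexts.

```python
# Python's str.lower() uses Unicode's default, language-independent full case
# mapping, so it shows the behavior described above.
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

print(ascii("I\u0307".lower()))  # 'i\u0307': the combining dot survives lowercasing
print(ascii("\u0130".lower()))   # 'i\u0307': precomposed İ also yields i + combining dot


def lower_without_extra_dot(text: str) -> str:
    """Hypothetical variant: lowercase, then drop a combining dot above that
    directly follows i, so the result reliably displays as a plain i."""
    return text.lower().replace("i" + DOT_ABOVE, "i")


print(lower_without_extra_dot("I\u0307stanbul"))  # istanbul
```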

Simon Montagu :smontagu

Comment 27

13 years ago

Comment on attachment 606188: patch, add Turkish support for text-transform and small-caps

r=me (assuming rebasing for bug 605021)

Attachment #606188 - Flags: review?(smontagu) → review+

Simon Montagu :smontagu

Updated

13 years ago

Attachment #606212 - Flags: review?(smontagu) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 28

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e28b565455d
https://hg.mozilla.org/integration/mozilla-inbound/rev/c510b7d0069c

Assignee: smontagu → jfkthame

Target Milestone: --- → mozilla14

Jonathan Kew [:jfkthame]

Assignee

Comment 29

13 years ago

Once these patches for Turkish etc are merged to m-c, I think we should resolve this bug as fixed, and file followups for any additional languages that require special case-mapping treatment. (We already have work in progress in bug 307039 for Greek.)

Ed Morley [:emorley]

Comment 30

13 years ago

https://hg.mozilla.org/mozilla-central/rev/4e28b565455d
https://hg.mozilla.org/mozilla-central/rev/c510b7d0069c

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Jean-Yves Perrier [:teoli]

Updated

13 years ago

Blocks: 740477

Jean-Yves Perrier [:teoli]

Updated

13 years ago

Keywords: dev-doc-needed

Jean-Yves Perrier [:teoli]

Comment 31

13 years ago

I updated:
https://developer.mozilla.org/en/Firefox_14_for_developers#section_1
https://developer.mozilla.org/en/CSS/text-transform
https://developer.mozilla.org/en/CSS/font-variant

Keywords: dev-doc-needed → dev-doc-complete

Selim Şumlu

Comment 32

13 years ago

I never thought this was going to be fixed. Thank you guys, on behalf of all Turkish web makers! Is this fix live yet? http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 doesn't seem to be working in the latest Nightly.

Simon Montagu :smontagu

Comment 33

13 years ago

(In reply to Selim Sumlu from comment #32)
> Is this fix live yet?
> http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 doesn't seem to be working in the latest Nightly.

See comment 25 above. Perhaps we should file a bug on the tests?

Selim Şumlu

Comment 34

13 years ago

Sorry, I missed that. I've just tested a few more Turkish websites on Nightly and they all work well.

Jonathan Kew [:jfkthame]

Assignee

Comment 35

13 years ago

(In reply to Simon Montagu from comment #33)
> (In reply to Selim Sumlu from comment #32)
> > Is this fix live yet?
> > http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 doesn't seem to be working in the latest Nightly.
>
> See comment 25 above. Perhaps we should file a bug on the tests?

I think that would be appropriate. I just checked the current text at http://dev.w3.org/html5/spec/single-page.html#the-lang-and-xml:lang-attributes, and it says (in part):

<quote>
Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.

NOTE: The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.
</quote>

AFAICS, the way xml:lang is used in the testcase mentioned (and other similar ones) violates this, and Firefox is correct to ignore it and just respect the lang="en" setting from the root element.
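The consequence of the quoted rule for text/html documents can be sketched as a simple ancestor walk: only the no-namespace lang attribute counts, and a literal "xml:lang" attribute is skipped. The Element class and effective_lang function are hypothetical stand-ins, not a DOM API, and the sketch deliberately does not model XHTML, where xml:lang is honored.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Minimal stand-in for an element in a text/html document (hypothetical)."""
    attrs: dict[str, str] = field(default_factory=dict)
    parent: Optional[Element] = None


def effective_lang(el: Optional[Element]) -> str:
    """Walk up to the nearest element with a no-namespace lang attribute.

    In text/html, a literal "xml:lang" attribute has no effect on language
    processing, so it is deliberately never consulted here.
    """
    while el is not None:
        if "lang" in el.attrs:
            return el.attrs["lang"]
        el = el.parent
    return ""  # no language information found


root = Element({"lang": "en"})
para = Element({"xml:lang": "tr"}, parent=root)
print(effective_lang(para))  # en
```

This mirrors the testcase situation described above: the paragraph's xml:lang="tr" is ignored, so the Turkish casing rules are never triggered and the text is cased as English.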

Jonathan Kew [:jfkthame]

Assignee

Comment 36

13 years ago

cc'ing Richard Ishida as author of the testcase concerned.

Richard Ishida

Comment 37

13 years ago

Thanks for pointing out that bug, Jonathan. http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 and associated tests should be fixed now.

Gordon P. Hemsley [:GPHemsley]

Updated

12 years ago

Blocks: 772268

Maximilian Franzke

Comment 38

12 years ago

Sadly, this isn't completely solved: with the valid value "tr-TR" for the lang attribute, the described problem still occurs.

Jonathan Kew [:jfkthame]

Assignee

Comment 39

12 years ago

(In reply to Maximilian Franzke from comment #38)
> sadly this isn't totally solved right now - regarding the allowed value
> "tr-TR" for the lang-attribute, the described problem still occurs.

You mean it works as expected with "tr", but fails with "tr-TR"? If so, please file a new bug to track the remaining issue - thanks.

David Baron :dbaron:

Updated

11 years ago

Depends on: 905381

Selim Şumlu

Updated

9 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1225827

HTML Testcase	21 years ago	Ernest Cline	(deleted), text/html
Remove unused code	21 years ago	Simon Montagu :smontagu	(deleted), patch	jshin1987: review+, dbaron: superreview+
fix dependency problem	21 years ago	Brian Ryner (not reading)	(deleted), patch	cls: review-
handle static library dependencies	21 years ago	Brian Ryner (not reading)	(deleted), patch	cls: review+
patch, add Turkish support for text-transform and small-caps	13 years ago	Jonathan Kew [:jfkthame]	(deleted), patch	smontagu: review+
reftests for Turkish casing behavior	13 years ago	Jonathan Kew [:jfkthame]	(deleted), patch	smontagu: review+