Closed Bug 231162 Opened 21 years ago Closed 13 years ago

text-transform is not using language dependent casing rules

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla14

People

(Reporter: ernestcline, Assigned: jfkthame)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(5 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113

CSS calls for using language-specific rules, and has ever since CSS 1. I decided to write a test case involving the dotted and dotless i's. In both Turkish and Azerbaijani, the normal casing rules are not followed for the letters I (U+0049) and i (U+0069). In those two languages, lowercase U+0049 is ı (U+0131) and uppercase U+0069 is İ (U+0130). Instead, even when I downloaded the Turkish locale (tested with 1.5) and switched to it, text marked as either language still used the default casing with both text-transform: uppercase and text-transform: lowercase.

Reproducible: Always

Steps to Reproduce:
1. Mark up a section of HTML as either Turkish (lang="tr") or Azerbaijani (lang="az").
2. Apply a CSS text-transform to it.

Actual Results:
text-transform: lowercase of U+0049 was displayed as U+0069.
text-transform: uppercase of U+0069 was displayed as U+0049.

Expected Results:
text-transform: lowercase of U+0049 should have been displayed as U+0131.
text-transform: uppercase of U+0069 should have been displayed as U+0130.

May affect other language-dependent casing as well, but I have only tested the dotted/undotted I's of Turkish/Azerbaijani. I will also attach one test file I used. Another, which placed the language info on the same element as the text-transform, will not be uploaded as it produced the same non-result.
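The expected mapping in the report can be illustrated with a small sketch. It is not Gecko code: the function names and the `TURKIC_LANGS` set are hypothetical, and it assumes the language is given as the bare tag "tr" or "az", as in the steps to reproduce. It relies on Python's built-in `str.upper()` and `str.lower()`, which apply the default, language-independent Unicode mappings.

```python
# Illustrative sketch only; names are hypothetical and not taken from Gecko.
TURKIC_LANGS = {"tr", "az"}   # the two languages named in the report

DOTLESS_SMALL_I = "\u0131"    # ı
DOTTED_CAPITAL_I = "\u0130"   # İ


def upper_for_lang(text: str, lang: str) -> str:
    """Uppercase text, mapping i (U+0069) to İ (U+0130) for Turkic languages."""
    if lang in TURKIC_LANGS:
        # Handle the language-specific letter first, then fall back to
        # the default mapping for everything else.
        text = text.replace("i", DOTTED_CAPITAL_I)
    return text.upper()


def lower_for_lang(text: str, lang: str) -> str:
    """Lowercase text, mapping I (U+0049) to ı (U+0131) for Turkic languages."""
    if lang in TURKIC_LANGS:
        # İ must become plain i before the default mapping sees it,
        # and I must become dotless ı instead of i.
        text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()
```

With any other language tag both functions fall through to the default mapping, which corresponds to the behavior reported under "Actual Results".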
Attached file HTML Testcase (deleted) —
Assignee: nobody → smontagu
Status: UNCONFIRMED → NEW
Component: Layout: Fonts and Text → Internationalization
Ever confirmed: true
QA Contact: core.layout.fonts-and-text → amyy
I assume that this is a regression, and it's odd because we have code to handle it in intl/unicharutil/src/nsCaseConversionImp2.cpp (at least for lang="tr": we also need to add lang="az"). I guess that code is somehow not being reached.
As far as I can tell ToUpper() with the locale argument is just never being called.
Status: NEW → ASSIGNED
> CSS calls for using language specific rules and has ever since CSS 1.

For the record, this is not strictly accurate: CSS3 is the first version to require conformance to language-specific rules.

CSS1: http://www.w3.org/TR/CSS1.html#text-transform
CSS1 core: UAs may ignore 'text-transform' (i.e., treat it as 'none') for characters that are not from the Latin-1 repertoire and for elements in languages for which the transformation is different from that specified by the case-conversion tables of Unicode.

CSS2: http://www.w3.org/TR/REC-CSS2/text.html#caps-prop, unchanged in CSS2.1 http://www.w3.org/TR/CSS21/text.html#caps-prop
Conforming user agents may consider the value of 'text-transform' to be 'none' for characters that are not from the Latin-1 repertoire and for elements in languages for which the transformation is different from that specified by the case-conversion tables of ISO 10646.

CSS3: http://www.w3.org/TR/css3-text/#caps-prop
Conforming user agents MUST support case mapping rules according to the Unicode Standard for all characters specified by that standard.
OS: Windows XP → All
Hardware: PC → All
I could buy that, but in that case shouldn't Mozilla be leaving the dotted I and dotless i alone since they aren’t in Latin-1? Or are the two clauses supposed to be independent? I read it as saying if a UA doesn’t implement language sensitive case mapping rules, it must case map Latin-1 only. If the two were supposed to be independent, I would expect an "or" instead of the "and" in the quotes you gave from CSS 1 and CSS 2. Of course, all this is a battle of semantics for no purpose, since the dotted/dotless I case mapping for Turkish and Azerbaijani is mentioned in the Unicode Standard and so CSS 3 clearly requires this be supported.
Attached patch Remove unused code (deleted) — Splinter Review
Let's begin by removing the existing code, because:
(a) it isn't used;
(b) it wouldn't work correctly if it were used, due to errors such as:
    if(kDot_I == *s) *s = kDot_I;
so it isn't even useful as the basis for a working implementation;
(c) having it in the tree is misleading and creates a superficial impression that we support Turkic casing.
Comment on attachment 139462 [details] [diff] [review] Remove unused code Requesting reviews for removing the unused and inaccurate version.
Attachment #139462 - Flags: superreview?(dbaron)
Attachment #139462 - Flags: review?(jshin)
Attachment #139462 - Flags: superreview?(dbaron) → superreview+
Comment on attachment 139462 [details] [diff] [review] Remove unused code r=jshin just a reminder (you may not need it, but just in case), we have bug 210501 in which we have to overhaul the case conversion APIs anyway.
Attachment #139462 - Flags: review?(jshin) → review+
Comment on attachment 139462 [details] [diff] [review] Remove unused code Checking this in made all non-clobber tinderboxen go orange, so I backed it out again.
Attached patch fix dependency problem (obsolete) (deleted) — Splinter Review
We've had a longstanding dependency problem with the unicharutil_s static library: we weren't relinking things that use this library when the unicharutil library changes. Rather than go add EXTRA_DEPS in a couple dozen Makefiles, I opted to just handle the dependency in rules.mk. I think this is the cause of the dep tinderbox bustage.
Attachment #139555 - Flags: review?(cls)
Comment on attachment 139555 [details] [diff] [review] fix dependency problem Grumblesmurf. We didn't have the dependency problem when MOZ_UNICHARUTIL_LIBS were used as part of SHARED_LIBRARY_LIBS. I think the special casing is a bad idea in the long run. There should be a generic mechanism to track dependencies from EXTRA_DSO_LDOPTS. Something like: DSO_LDOPTS_DEPS = $(filter %.$(LIB_SUFFIX) %$(DLL_SUFFIX), $(EXTRA_DSO_LDOPTS)) At some point, the VPATH issue would need to be fixed so that the -lfoo dependencies could be tracked as well.
Attachment #139555 - Flags: review?(cls) → review-
Right, but we don't want to use --whole-archive for this static library. I'd be ok with adding a mechanism like you suggest, and it would fix this case even if we don't address the -lfoo case right now. And rather than trying to track through the linker -L switches figuring out a library path to search in (ugh), I think I'd rather just say that the full library name should be used for linking against static libraries, and for linking against shared libraries, chances are that nothing will break if we don't relink (in the case where a symbol was removed from the shared library, we should be relinking anyway to remove references to that from the current module, and if new symbols were added to the shared library that the current module doesn't use, it won't affect anything).
Attachment #139555 - Attachment is obsolete: true
Attachment #139709 - Flags: review?(cls)
Attachment #139709 - Flags: review?(cls) → review+
Comment on attachment 139709 [details] [diff] [review] handle static library dependencies This is checked in. It should now be possible to land the original patch in this bug without breaking the depend tinderboxes.
QA Contact: amyy → i18n
Before this lands, please note that there are other affected languages. The modern Latin scripts for Crimean Tatar (crh), Volga Tatar (tt), and Bashkir (ba) all use the Turkish/Azerbaijani-style i/İ and ı/I pairings. All three have both Cyrillic and Latin scripts, and only the latter is affected, so perhaps this would require the use of script variants (e.g., tt-Latn), but Azerbaijani also has a well-represented Cyrillic script. I would like to see the new text-transform behavior apply to tt, crh, and ba as well.
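One way to picture the script-variant question raised above is a check on the language tag. This is only a sketch under stated assumptions: the function name is hypothetical, the language list is taken from this bug, a tag without a script subtag is treated as Latin (a policy choice this bug does not settle), and the parsing is a simplification of BCP 47 that only looks at the second subtag.

```python
# Illustrative sketch only; not Gecko code.
# Languages mentioned in this bug whose Latin orthography pairs i/İ and ı/I.
TURKIC_CASING_LANGS = {"tr", "az", "crh", "tt", "ba"}


def uses_turkic_casing(lang_tag: str) -> bool:
    """Return True if the tag should get the Turkish-style i/İ, ı/I mapping."""
    subtags = lang_tag.lower().split("-")
    if subtags[0] not in TURKIC_CASING_LANGS:
        return False
    # In BCP 47 a four-letter alphabetic subtag is a script subtag.
    # Simplification: only the second position is inspected.
    script = None
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        script = subtags[1]
    # ASSUMPTION: no script subtag means Latin.
    return script in (None, "latn")
```

Because only the primary subtag and an optional script subtag are compared, a region-qualified tag such as "tr-TR" is treated the same as "tr", while "az-Cyrl" is excluded.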
I am the submitter of the duplicate bug 570333. I am sorry to have missed this bug when I searched the database. I would like to add my comments if I may.
1- The Turkish/Turkic i/I case transform is NOT limited to the text-transform CSS property; it also affects font-variant: small-caps.
2- I can see in this thread that you guys have known of this bug since the beginning of 2004. That is SIX years! What is stopping a fix, so that 100+ million people can enjoy using their language as fully as other Latin alphabet users? There is a stark reality here: this bug stops Turks from formatting their own country's name, Turkiye, properly! Shouldn't it be a priority?
Blocks: css2.1-tests
adding cc. Not that I'm personally biased/affected or anything. ;)
This adds support for the Turkish-style İ/i and I/ı casing behavior in text-run transformations. It's handled within the transforming text run, rather than by adding locale support to the low-level Unicode case mapping functions; I think it makes more sense at this level given the limited scope of the changes needed. I'm expecting further changes and refactoring of nsCaseTransformTextRunFactory::RebuildTextRun in bug 307039 (for Greek support), but this can at least serve as a starting point for adding language-sensitive behavior.
Attachment #606188 - Flags: review?(smontagu)
Attachment #606212 - Flags: review?(smontagu)
Comment on attachment 606188 [details] [diff] [review]
patch, add Turkish support for text-transform and small-caps

Review of attachment 606188 [details] [diff] [review]:
-----------------------------------------------------------------

This doesn't work when the language is specified with xml:lang (which means that the tests in http://www.w3.org/International/tests/html-css/list-text-transform#special fail).

I also don't understand how it handles I with U+307 COMBINING DOT ABOVE, though as far as I can tell it does do so correctly. Is the text normalized before it reaches this code?
(In reply to Simon Montagu from comment #24)
> This doesn't work when the language is specified with xml:lang

This is due to bug 702121 (perhaps duplicating bug 234485), I think....

> (which means that the tests in
> http://www.w3.org/International/tests/html-css/list-text-transform#special
> fail).

....although according to bz in bug 234485 comment 40, "xml:lang is ignored in text/html content, as it should be", which makes me suspect some of those w3.org testcases (the non-XHTML ones) may be incorrect, as they're using xml:lang in html content.
(In reply to Simon Montagu from comment #24)
> I also don't understand how it handles I with U+307 COMBINING DOT ABOVE,
> though as far as I can tell it does do so correctly. Is the text normalized
> before it reaches this code?

No, we don't apply any normalization. The sequence <U+0049, U+0307> (İ) will be lowercased as <U+0069, U+0307> (i̇) regardless of whether the element is lang="tr" or not. Many, though not all, fonts will suppress the "extra" dot in this case, either by ligating the "i" and the dot to a simple "i" glyph, or by contextually replacing "i" by "ı" when followed by a diacritic above.

Arguably, it would be good to _remove_ the combining dot when lowercasing this sequence, so that it reliably displays as "i" without an extra dot, but this is a separate issue from the Turkish behavior.
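The behavior described above can be reproduced outside the browser. The sketch below uses Python's default, language-independent `str.lower()` as a stand-in for the case mapping discussed here; it is not Gecko code, and `lower_dropping_dot` is a hypothetical illustration of the "remove the combining dot" idea, not something this bug implements.

```python
import unicodedata

# U+0049 LATIN CAPITAL LETTER I followed by U+0307 COMBINING DOT ABOVE.
DECOMPOSED_DOTTED_I = "I\u0307"


def lower_default(text: str) -> str:
    """Default lowercasing: the combining dot is carried over unchanged."""
    return text.lower()


def lower_dropping_dot(text: str) -> str:
    """Hypothetical variant: drop the combining dot after I before lowercasing."""
    return text.replace(DECOMPOSED_DOTTED_I, "i").lower()


def compose(text: str) -> str:
    """NFC normalization, which the comment above says is NOT applied."""
    return unicodedata.normalize("NFC", text)
```

Here `lower_default` leaves a two-code-point result (i plus the combining dot), which is why the rendering depends on the font, whereas `compose` shows that normalizing first would have produced the single precomposed İ (U+0130).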
Comment on attachment 606188 [details] [diff] [review]
patch, add Turkish support for text-transform and small-caps

Review of attachment 606188 [details] [diff] [review]:
-----------------------------------------------------------------

r=me (assuming rebasing for bug 605021)
Attachment #606188 - Flags: review?(smontagu) → review+
Attachment #606212 - Flags: review?(smontagu) → review+
Once these patches for Turkish etc are merged to m-c, I think we should resolve this bug as fixed, and file followups for any additional languages that require special case-mapping treatment. (We already have work in progress in bug 307039 for Greek.)
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 740477
I never thought this was going to be fixed. Thank you guys, on behalf of all Turkish web makers! Is this fix live yet? http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 doesn't seem to be working in the latest Nightly.
(In reply to Selim Sumlu from comment #32)
> Is this fix live yet?
> http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5
> doesn't seem to be working in the latest Nightly.

See comment 25 above. Perhaps we should file a bug on the tests?
Sorry, I've missed that. I've just tested a few more Turkish websites on Nightly and they all work well.
(In reply to Simon Montagu from comment #33)
> (In reply to Selim Sumlu from comment #32)
> > Is this fix live yet?
> > http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5
> > doesn't seem to be working in the latest Nightly.
>
> See comment 25 above. Perhaps we should file a bug on the tests?

I think that would be appropriate. I just checked the current text at http://dev.w3.org/html5/spec/single-page.html#the-lang-and-xml:lang-attributes, and it says (in part):

<quote>
Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.

NOTE: The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.
</quote>

AFAICS, the way xml:lang is used in the testcase mentioned (and other similar ones) violates this, and Firefox is correct to ignore it and just respect the lang="en" setting from the root element.
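The lookup this implies for text/html content can be sketched as follows. The `Element` class and `effective_lang_html` function are hypothetical stand-ins rather than a real DOM API, and the sketch deliberately leaves out other sources of language information (for example document-level defaults).

```python
from typing import Dict, Optional


class Element:
    """Minimal stand-in for an HTML element in a text/html document."""

    def __init__(self, attrs: Dict[str, str], parent: Optional["Element"] = None):
        self.attrs = attrs
        self.parent = parent


def effective_lang_html(element: Optional[Element]) -> str:
    """Walk up the ancestors, honoring only the no-namespace lang attribute.

    An attribute whose literal name is "xml:lang" is ignored, matching the
    NOTE quoted above for HTML elements in HTML documents.
    """
    while element is not None:
        if "lang" in element.attrs:
            return element.attrs["lang"]
        element = element.parent
    return ""  # language unknown
```

For a testcase shaped like the one discussed, a child carrying only xml:lang="tr" under a root with lang="en" resolves to "en", so no Turkish casing would be expected there.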
cc'ing Richard Ishida as author of the testcase concerned.
Thanks for pointing out that bug, Jonathan. http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 and associated tests should be fixed now.
Sadly, this isn't totally solved yet: with the allowed value "tr-TR" for the lang attribute, the described problem still occurs.
(In reply to Maximilian Franzke from comment #38) > sadly this isn't totally solved right now - regarding the allowed value > "tr-TR" for the lang-attribute, the described problem still occurs. You mean it works as expected with "tr", but fails with "tr-TR"? If so, please file a new bug to track the remaining issue - thanks.
Depends on: 905381