Closed
Bug 231162
Opened 21 years ago
Closed 13 years ago
text-transform is not using language dependent casing rules
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
mozilla14
People
(Reporter: ernestcline, Assigned: jfkthame)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: dev-doc-complete)
Attachments
(5 files, 1 obsolete file)
(deleted), text/html
(deleted), patch | jshin1987: review+, dbaron: superreview+
(deleted), patch | cls: review+
(deleted), patch | smontagu: review+
(deleted), patch | smontagu: review+
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113
CSS calls for using language specific rules and has ever since CSS 1.
I decided to write a test case involving the dotted and dotless i's.
In both Turkish and Azerbaijani, the normal casing rules are not followed
for the letters I (U+0049) and i (U+0069). In those two languages, the
lowercase of U+0049 is ı (U+0131) and the uppercase of U+0069 is İ (U+0130).
However, even when I downloaded the Turkish locale (tested with 1.5)
and switched to it, text marked as either language still used the default
casing with both text-transform: uppercase and text-transform: lowercase.
Reproducible: Always
Steps to Reproduce:
1. Mark up a section of HTML as either Turkish (lang="tr") or Azerbaijani (lang="az")
2. Apply a CSS text-transform to it.
Actual Results:
text-transform:lowercase of U+0049 was displayed as U+0069.
text-transform:uppercase of U+0069 was displayed as U+0049.
Expected Results:
text-transform:lowercase of U+0049 should have been displayed as U+0131.
text-transform:uppercase of U+0069 should have been displayed as U+0130.
This may affect other language-dependent casing as well, but I have only tested
the dotted/dotless I's of Turkish/Azerbaijani. I will also attach one test file I
used. Another, which placed the language info on the same element that had the
text-transform applied to it, will not be uploaded as it produced
the same non-result.
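The actual and expected results above can be reproduced outside the browser. The following is a minimal Python sketch, not Gecko code: `turkic_lower` and `turkic_upper` are hypothetical helper names, and they cover only the two letter pairs discussed in this report.

```python
# Hypothetical helpers illustrating the expected results; not Gecko code.
DOTLESS_SMALL_I = "\u0131"   # ı
DOTTED_CAPITAL_I = "\u0130"  # İ


def turkic_lower(text: str) -> str:
    """Lowercase with the Turkish/Azerbaijani rule I (U+0049) -> ı (U+0131)."""
    # Handle İ and I before the generic lower() so it never sees them.
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


def turkic_upper(text: str) -> str:
    """Uppercase with the Turkish/Azerbaijani rule i (U+0069) -> İ (U+0130)."""
    # Handle i and ı before the generic upper() so it never sees them.
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()
```

The language-neutral `"I".lower()` and `"i".upper()` give U+0069 and U+0049, which matches the Actual Results; the two helpers give U+0131 and U+0130, which matches the Expected Results.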
Reporter | ||
Comment 1•21 years ago
|
||
Updated•21 years ago
|
Assignee: nobody → smontagu
Status: UNCONFIRMED → NEW
Component: Layout: Fonts and Text → Internationalization
Ever confirmed: true
QA Contact: core.layout.fonts-and-text → amyy
Comment 2•21 years ago
|
||
I assume that this is a regression, and it's odd because we have code to handle
it in intl/unicharutil/src/nsCaseConversionImp2.cpp (at least for lang="tr": we
also need to add lang="az"). I guess that code is somehow not being reached.
Comment 3•21 years ago
|
||
As far as I can tell ToUpper() with the locale argument is just never being called.
Status: NEW → ASSIGNED
Comment 4•21 years ago
|
||
>CSS calls for using language specific rules and has ever since CSS 1.
For the record, this is not strictly accurate: CSS3 is the first version to
require conformance to language specific rules.
CSS1: http://www.w3.org/TR/CSS1.html#text-transform
CSS1 core: UAs may ignore 'text-transform' (i.e., treat it as 'none') for
characters that are not from the Latin-1 repertoire and for elements in
languages for which the transformation is different from that specified by the
case-conversion tables of Unicode.
CSS2: http://www.w3.org/TR/REC-CSS2/text.html#caps-prop, unchanged in CSS2.1
http://www.w3.org/TR/CSS21/text.html#caps-prop
Conforming user agents may consider the value of 'text-transform' to be 'none'
for characters that are not from the Latin-1 repertoire and for elements in
languages for which the transformation is different from that specified by the
case-conversion tables of ISO 10646
http://www.w3.org/TR/css3-text/#caps-prop
Conforming user agents MUST support case mapping rules according to the Unicode
Standard for all characters specified by that standard.
Reporter | ||
Comment 5•21 years ago
|
||
I could buy that, but in that case shouldn't Mozilla be leaving the dotted I and
dotless i alone since they aren’t in Latin-1? Or are the two clauses supposed
to be independent? I read it as saying if a UA doesn’t implement language
sensitive case mapping rules, it must case map Latin-1 only. If the two were
supposed to be independent, I would expect an "or" instead of the "and" in the
quotes you gave from CSS 1 and CSS 2.
Of course, all this is a battle of semantics for no purpose, since the
dotted/dotless I case mapping for Turkish and Azerbaijani is mentioned in the
Unicode Standard and so CSS 3 clearly requires this be supported.
Comment 6•21 years ago
|
||
Let's begin by removing the existing code, because:
(a) it isn't used
(b) it wouldn't work correctly if it were used, due to errors such as:
if(kDot_I == *s)
*s = kDot_I;
so it isn't even useful as the basis for a working implementation.
(c) having it in the tree is misleading and creates a superficial impression
that we support Turkic casing.
Comment 7•21 years ago
|
||
Comment on attachment 139462 [details] [diff] [review]
Remove unused code
Requesting reviews for removing the unused and inaccurate version.
Attachment #139462 -
Flags: superreview?(dbaron)
Attachment #139462 -
Flags: review?(jshin)
Updated•21 years ago
|
Attachment #139462 -
Flags: superreview?(dbaron) → superreview+
Comment 8•21 years ago
|
||
Comment on attachment 139462 [details] [diff] [review]
Remove unused code
r=jshin
just a reminder (you may not need it, but just in case), we have bug 210501 in
which we have to overhaul the case conversion APIs anyway.
Attachment #139462 -
Flags: review?(jshin) → review+
Comment 9•21 years ago
|
||
Comment on attachment 139462 [details] [diff] [review]
Remove unused code
Checking this in made all non-clobber tinderboxen go orange, so I backed it out
again.
Comment 10•21 years ago
|
||
we've had a longstanding dependency problem with the unicharutil_s static
library. We weren't relinking things that use this library when the
unicharutil library changes.
Rather than go add EXTRA_DEPS in a couple dozen Makefiles, I opted to just
handle the dependency in rules.mk.
I think this is the cause of the dep tinderbox bustage.
Updated•21 years ago
|
Attachment #139555 -
Flags: review?(cls)
Comment 11•21 years ago
|
||
Comment on attachment 139555 [details] [diff] [review]
fix dependency problem
Grumblesmurf. We didn't have the dependency problem when MOZ_UNICHARUTIL_LIBS
were used as part of SHARED_LIBRARY_LIBS. I think the special casing is a bad
idea in the long run. There should be a generic mechanism to track
dependencies from EXTRA_DSO_LDOPTS. Something like:
DSO_LDOPTS_DEPS = $(filter %.$(LIB_SUFFIX) %$(DLL_SUFFIX), $(EXTRA_DSO_LDOPTS))
At some point, the VPATH issue would need to be fixed so that the -lfoo
dependencies could be tracked as well.
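To illustrate what that `$(filter ...)` expression would select, here is a rough Python sketch of the same word filtering. It is only a model of make's behavior; the suffix defaults (`a` and `.so`) and the library paths in the usage note are assumptions, not values taken from the tree.

```python
def filter_library_deps(ldopts: str, lib_suffix: str = "a", dll_suffix: str = ".so"):
    """Model of $(filter %.$(LIB_SUFFIX) %$(DLL_SUFFIX), $(EXTRA_DSO_LDOPTS)).

    Keeps only the words that name a library file in full; linker
    switches such as -lfoo or -L<dir> are dropped.
    """
    suffixes = ("." + lib_suffix, dll_suffix)
    return [word for word in ldopts.split() if word.endswith(suffixes)]
```

For a hypothetical option string such as `-L../dist/lib -lfoo ../dist/lib/libunicharutil_s.a libbar.so`, only the last two words survive, so only those can be tracked as dependencies. The `-lfoo` form is exactly the case left open above.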
Attachment #139555 -
Flags: review?(cls) → review-
Comment 12•21 years ago
|
||
Right, but we don't want to use --whole-archive for this static library.
I'd be ok with adding a mechanism like you suggest, and it would fix this case
even if we don't address the -lfoo case right now. And rather than trying to
track through the linker -L switches figuring out a library path to search in
(ugh), I think I'd rather just say that the full library name should be used for
linking against static libraries, and for linking against shared libraries,
chances are that nothing will break if we don't relink (in the case where a
symbol was removed from the shared library, we should be relinking anyway to
remove references to that from the current module, and if new symbols were added
to the shared library that the current module doesn't use, it won't affect
anything).
Comment 13•21 years ago
|
||
Attachment #139555 -
Attachment is obsolete: true
Updated•21 years ago
|
Attachment #139709 -
Flags: review?(cls)
Attachment #139709 -
Flags: review?(cls) → review+
Comment 14•21 years ago
|
||
Comment on attachment 139709 [details] [diff] [review]
handle static library dependencies
This is checked in. It should now be possible to land the original patch in
this bug without breaking the depend tinderboxes.
Updated•15 years ago
|
QA Contact: amyy → i18n
Comment 17•15 years ago
|
||
Before this lands, please note that there are other affected languages. The modern
Latin scripts for Crimean Tatar (crh), Volga Tatar (tt), and Bashkir (ba) all
use the Turkish/Azerbaijani-style i/İ and ı/I pairings. All three have both
Cyrillic and Latin scripts, and only the latter is affected, so perhaps this
would require the use of script variants (e.g., tt-Latn), but Azerbaijani also
has a well-represented Cyrillic script. I would like to see the new
text-transform behavior apply to tt, crh, and ba as well.
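As a sketch of how such a language check might look, assuming the rule is keyed on the primary language subtag and is switched off only when a non-Latin script subtag is given explicitly (that policy, the function name, and the language set are assumptions for illustration, not what any patch here does):

```python
# Languages named in this bug whose Latin orthography pairs i/İ and ı/I.
TURKIC_I_LANGUAGES = {"tr", "az", "crh", "tt", "ba"}


def uses_turkic_i(lang_tag: str) -> bool:
    """Return True if text tagged lang_tag should get the i/İ and ı/I pairings."""
    subtags = lang_tag.lower().split("-")
    if subtags[0] not in TURKIC_I_LANGUAGES:
        return False
    # In BCP 47 a four-letter alphabetic subtag is a script code (e.g. "Latn").
    scripts = [s for s in subtags[1:] if len(s) == 4 and s.isalpha()]
    # Assumed policy: apply the rule unless a non-Latin script is explicit.
    return not scripts or scripts[0] == "latn"
```

With this policy `tt-Latn`, `tr` and `tr-TR` qualify, while `az-Cyrl` does not. Whether untagged `tt` or `ba` content should default to Latin behavior is a separate decision that this sketch does not settle.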
Comment 19•14 years ago
|
||
I am the submitter of the duplicate bug 570333. I am sorry to have missed this bug when I searched the database.
I would like to add my comments if I may.
1- The Turkish/Turkic i/I case transform problem is NOT limited to the text-transform CSS property; it also affects font-variant:small-caps.
2- I can see in this thread that you have known about this bug since the beginning of 2004. That is SIX years! What is stopping it from being fixed, so that 100+ million people can use their language as fully as other Latin alphabet users?
There is a stark reality here: this bug stops Turks from formatting their own country's name, Turkiye, properly!
Shouldn't this be a priority?
Updated•14 years ago
|
Blocks: css2.1-tests
Comment 21•13 years ago
|
||
adding cc. Not that I'm personally biased/affected or anything. ;)
Assignee | ||
Comment 22•13 years ago
|
||
This adds support for the Turkish-style İ/i and I/ı casing behavior in text-run transformations. It's handled within the transforming text run, rather than by adding locale support to the low-level Unicode case mapping functions; I think it makes more sense at this level given the limited scope of the changes needed.
I'm expecting further changes and refactoring of nsCaseTransformTextRunFactory::RebuildTextRun in bug 307039 (for Greek support), but this can at least serve as a starting point for adding language-sensitive behavior.
Attachment #606188 -
Flags: review?(smontagu)
Assignee | ||
Comment 23•13 years ago
|
||
Attachment #606212 -
Flags: review?(smontagu)
Comment 24•13 years ago
|
||
Comment on attachment 606188 [details] [diff] [review]
patch, add Turkish support for text-transform and small-caps
Review of attachment 606188 [details] [diff] [review]:
-----------------------------------------------------------------
This doesn't work when the language is specified with xml:lang (which means that the tests in http://www.w3.org/International/tests/html-css/list-text-transform#special fail).
I also don't understand how it handles I with U+307 COMBINING DOT ABOVE, though as far as I can tell it does do so correctly. Is the text normalized before it reaches this code?
Assignee | ||
Comment 25•13 years ago
|
||
(In reply to Simon Montagu from comment #24)
> This doesn't work when the language is specified with xml:lang
This is due to bug 702121 (perhaps duplicating bug 234485), I think....
> (which means
> that the tests in
> http://www.w3.org/International/tests/html-css/list-text-transform#special
> fail).
....although according to bz in bug 234485 comment 40, "xml:lang is ignored in text/html content, as it should be", which makes me suspect some of those w3.org testcases (the non-XHTML ones) may be incorrect, as they're using xml:lang in html content.
Assignee | ||
Comment 26•13 years ago
|
||
(In reply to Simon Montagu from comment #24)
> I also don't understand how it handles I with U+307 COMBINING DOT ABOVE,
> though as far as I can tell it does do so correctly. Is the text normalized
> before it reaches this code?
No, we don't apply any normalization. The sequence <U+0049, U+0307> (İ) will be lowercased as <U+0069, U+0307> (i̇) regardless of whether the element is lang="tr" or not. Many, though not all, fonts will suppress the "extra" dot in this case, either by ligating the "i" and the dot to a simple "i" glyph, or by contextually replacing "i" by "ı" when followed by a diacritic above.
Arguably, it would be good to _remove_ the combining dot when lowercasing this sequence, so that it reliably displays as "i" without an extra dot, but this is a separate issue from the Turkish behavior.
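The sequence behavior described here is ordinary Unicode case mapping, so it can be checked with Python's built-in string methods. The `lower_dropping_dot` helper below is hypothetical and only sketches the "remove the combining dot" idea; it is not what the patch does.

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"

# <U+0049, U+0307>: capital I followed by a combining dot above.
decomposed = "I" + COMBINING_DOT_ABOVE

# Language-neutral lowercasing keeps the combining dot: <U+0069, U+0307>.
lowered = decomposed.lower()

# Normalizing to NFC first would instead compose the pair into U+0130 (İ).
composed = unicodedata.normalize("NFC", decomposed)


def lower_dropping_dot(text: str) -> str:
    """Lowercase, then drop a combining dot that directly follows an i."""
    return text.lower().replace("i" + COMBINING_DOT_ABOVE, "i")
```

Note that the helper would also strip the dot from an "i" plus U+0307 sequence that was already lowercase in the input, which may or may not be wanted.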
Comment 27•13 years ago
|
||
Comment on attachment 606188 [details] [diff] [review]
patch, add Turkish support for text-transform and small-caps
Review of attachment 606188 [details] [diff] [review]:
-----------------------------------------------------------------
r=me (assuming rebasing for bug 605021)
Attachment #606188 -
Flags: review?(smontagu) → review+
Updated•13 years ago
|
Attachment #606212 -
Flags: review?(smontagu) → review+
Assignee | ||
Comment 28•13 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/4e28b565455d
https://hg.mozilla.org/integration/mozilla-inbound/rev/c510b7d0069c
Assignee: smontagu → jfkthame
Target Milestone: --- → mozilla14
Assignee | ||
Comment 29•13 years ago
|
||
Once these patches for Turkish etc are merged to m-c, I think we should resolve this bug as fixed, and file followups for any additional languages that require special case-mapping treatment. (We already have work in progress in bug 307039 for Greek.)
Comment 30•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/4e28b565455d
https://hg.mozilla.org/mozilla-central/rev/c510b7d0069c
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
|
Keywords: dev-doc-needed
Comment 31•13 years ago
|
||
I updated:
https://developer.mozilla.org/en/Firefox_14_for_developers#section_1
https://developer.mozilla.org/en/CSS/text-transform
and
https://developer.mozilla.org/en/CSS/font-variant
Keywords: dev-doc-needed → dev-doc-complete
Comment 32•13 years ago
|
||
I never thought this was going to be fixed. Thank you guys, on behalf of all Turkish web makers!
Is this fix live yet?
http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 doesn't seem to be working in the latest Nightly.
Comment 33•13 years ago
|
||
(In reply to Selim Sumlu from comment #32)
> Is this fix live yet?
> http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5
> doesn't seem to be working in the latest Nightly.
See comment 25 above. Perhaps we should file a bug on the tests?
Comment 34•13 years ago
|
||
Sorry, I've missed that.
I've just tested a few more Turkish websites on Nightly and they all work well.
Assignee | ||
Comment 35•13 years ago
|
||
(In reply to Simon Montagu from comment #33)
> (In reply to Selim Sumlu from comment #32)
> > Is this fix live yet?
> > http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5
> > doesn't seem to be working in the latest Nightly.
>
> See comment 25 above. Perhaps we should file a bug on the tests?
I think that would be appropriate. I just checked the current text at http://dev.w3.org/html5/spec/single-page.html#the-lang-and-xml:lang-attributes, and it says (in part):
<quote>
Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.
NOTE: The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.
</quote>
AFAICS, the way xml:lang is used in the testcase mentioned (and other similar ones) violates this, and Firefox is correct to ignore it and just respect the lang="en" setting from the root element.
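A small model of that rule, for HTML documents only: the language of an element comes from the nearest `lang` attribute on the element or its ancestors, and a no-namespace attribute literally named `xml:lang` is skipped. The data shape (a list of attribute dicts, innermost element first) and the function name are assumptions made for this sketch.

```python
def effective_lang(ancestor_attrs):
    """Return the language for an element in a text/html document.

    ancestor_attrs lists attribute dicts from the element itself up to
    the root. Only "lang" is consulted; "xml:lang" has no effect here.
    """
    for attrs in ancestor_attrs:
        if "lang" in attrs:
            return attrs["lang"]
    return ""  # no language information found
```

For an element carrying only `xml:lang="tr"` inside a root with `lang="en"`, this returns `en`, which is the result described above for the testcase. XHTML served as XML is a different case and is not modeled.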
Assignee | ||
Comment 36•13 years ago
|
||
cc'ing Richard Ishida as author of the testcase concerned.
Comment 37•13 years ago
|
||
Thanks for pointing out that bug, Jonathan. http://www.w3.org/International/tests/html-css/generate?test=text-transform-040&format=h5 and associated tests should be fixed now.
Comment 38•12 years ago
|
||
sadly this isn't totally solved right now - regarding the allowed value "tr-TR" for the lang-attribute, the described problem still occurs.
Assignee | ||
Comment 39•12 years ago
|
||
(In reply to Maximilian Franzke from comment #38)
> sadly this isn't totally solved right now - regarding the allowed value
> "tr-TR" for the lang-attribute, the described problem still occurs.
You mean it works as expected with "tr", but fails with "tr-TR"? If so, please file a new bug to track the remaining issue - thanks.
Updated•9 years ago
|