Closed Bug 204272 Opened 22 years ago Closed 19 years ago

Why do we classify shaped Arabic characters as left-to-right?

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: smontagu, Assigned: smontagu)

References

(Blocks 1 open bug, )

Details

Attachments

(5 files, 5 obsolete files)

I just noticed a problem in ordering in the testcase for bug 192008 (attachment 113746 [details]), which I think is caused by the following code in nsBidiPresUtils.cpp which sets |charType = eCharType_LeftToRight| for shaped Arabic characters in the U+FExx range. As far as I remember we originally did that for compatibility with IE on Windows, but I don't see the same problem in IE6 on Windows 2000. Is it an outdated hack which we can now remove?
The recent testing of GetStringTypeW(CT_CTYPE2, ...) and GetCharacterPlacementW (..., GCP_REORDER) on W2K and WinNT shows that shaped Arabic characters are properly classifies as RTL and get reordered by the mentioned platforms. However, IIRC we assigned |eCharType_LeftToRight| to shaped characters for compatibility with the bidi engine of [some version of] Windows, which treated those as LTR. So is it possible to ask the platform itself about character type? ***** BTW, I don't see a problem in attachment 113746 [details]. What platform is experiencing it?
Does the platform (in which the problem is encountered) have native bidi support?
Attached file Testcase (obsolete) (deleted) —
I see the bug on Windows 2000 with support installed for Arabic, Hebrew, Indic, and Japanese (default language is Hebrew and user locale is English United States). I don't see it on current trunk, but I do see it on 1.3, and also with attachment 122359 [details] [diff] [review] from bug 192088 applied.
Attached image Screenshot (deleted) —
Display of the testcase on the system described above. Debugging, I see that both lines are passed to ExtTextOutW in the same (visual) ordering, but as the screenshot shows, the first line apparently gets reordered again by Windows (which is theoretically correct) while the second is displayed as-is (which is the expected bug which we try to work around, as Lina has said).
Attached file Extended testcase. (deleted) —
Curiouser and curiouser: in this version of the testcase, cases 2-4 are displayed correctly in Mozilla 1.3, and cases 1-3 are displayed correctly in IE 6. My head is aching.
Attachment #122497 - Attachment is obsolete: true
Attached patch Patch (obsolete) (deleted) — Splinter Review
I think this is the correct fix, subject to testing on various flavours of Windows. (ResolveBackwards is only ever called when mRightToLeftText is TRUE, i.e. only for right to left text and only for Windows with right-to-left support, so this won't cause any slowdown for non-Bidi pages) A big advantage of the patch is that it moves the workaround for a bug in Windows to a Windows-only code path. The current code will cause problems on non-Windows platforms with their own Bidi support.
Comment on attachment 122521 [details] [diff] [review] Patch Simon: 1. Why don't we provide a similar fix in nsFontMetricsWin.cpp when mRightToLeftText is FALSE? 2. Can it be acceptable to normalize (convert to 06xx range) Arabic text before rendering? (Though I think that once we rejected it, probably because of possible loss of composed characters.) 3. The removed code in CalculateCharType in nsBidiPresUtils.cpp seems to be affecting only visual text. It shouldn't impact the attached testcases. I'm saying that because it may help us in understanding the bizarre behaviour on windows (and it must be removed if buggy, of course :).)
What I said in comment 7.3, applies to bidi platform. I doesn't seem to affect text of any type on non-bidi one.
Oops, I'm sorry. Mozilla performs its own ordering when character type is inconsistent with the embedding level. So the code in nsBidiPresUtils involves both types of text.
>Why don't we provide a similar fix in nsFontMetricsWin.cpp when >mRightToLeftText is FALSE? mRightToLeftText is set according to the value of isRightToLeftOnBidiPlatform in nsTextFrame::PaintUnicodeText(), where isRightToLeftOnBidiPlatform = (isBidiSystem && (eCharType_RightToLeft == charType || eCharType_RightToLeftArabic == charType)); After the change to nsBidiPresUtils.cpp, charType will be eCharType_RightToLeftArabic, and isBidiSystem will be TRUE on Bidi versions of Windows (which is where the bug exists), so mRightToLeftText will be TRUE in all cases where we need the workaround, no?
>Can it be acceptable to normalize (convert to 06xx range) Arabic text before >rendering? (Though I think that once we rejected it, probably because of >possible loss of composed characters.) Not only the possible loss of composed characters. Windows has at least one bug with rendering 06xx range characters (Farsi yeh), and if the page author is deliberately using preshaped characters to avoid that bug, they will get a nasty shock if we normalize :-)
Comment on attachment 122521 [details] [diff] [review] Patch Being agree with Simon's comment #10 and comment #11, I think that the suggested patch is the best workaround for now. However, though rendering character by character will work 100%, ideally, in my opinion, we should try to investigate the problem more thoroughly and seek fine tuned solution. Parentheses seem to be missing in + IS_FE_CHAR(*currChar+1)) { It probably should be + IS_FE_CHAR(*(currChar+1))) {
Attached image Screenshot (obsolete) (deleted) —
Illustrates how ExtTextOutW handles shaped characters on Windows 2000: 1. Homogeneous string of shaped characters, without output flags, is not ordered. 2. Same string, with ETO_IGNORELANGUAGE flag (disabling international support), looks identically. 3. Same string with attached 06xx range character, without output flags, is reordered. 4. String as above, but with ETO_IGNORELANGUAGE flag, is not ordered.
Attached image Screenshot (deleted) —
Correct image (with *Arabic* Alef)
Attachment #123030 - Attachment is obsolete: true
Summarizing behavior of ExtTextOutW (and maybe other Windows GDI functions), it looks like ordering of shaped Arabic characters doesn't take place unless there is a strong RTL character or BIDI control (which supposes implicit or explicit text ordering). This explains described in comment 3, comment 4, and comment 5 (in Mozilla, but maybe also in IE). On one hand, such Windows behavior apparently contradicts the Unicode Bidi Algorithm, on the other hand, it's pretty reasonable that pre-shaped data would also be pre-ordered (visual); AFAIK shaped characters are commonly used to represent visually formatted data. I wonder how other bidi platforms handle shaped Arabic characters?
As far as I can tell from looking at the testcases from here and bug 154399 in Konqueror, qt treats shaped Arabic characters as right-to-left. It would be interesting to know what OSX does.
At least in some circumstances this patch also prevents the regression of bug 192088 even with the original fix backed out. Time to head for reviews.
Windows seems to have a similar bug with Hebrew presentation forms as well judging by this testcase, which is reduced from the "Exemplar characters" in http://oss.software.ibm.com/cgi-bin/icu/lx/en/?_=he IE exhibits the bug in the testcase, but not in the original page, which is consistent with what Lina said in comment 15. Whether Mozilla exhibits it depends on font prefs. If the presentation characters are rendered in a different font, and so by a separate call to ExtTextOut, they are not ordered correctly.
Attached patch Patch to fix the Hebrew case as well (obsolete) (deleted) — Splinter Review
Attachment #122521 - Attachment is obsolete: true
Pango needs this patch too
Behdad, would you take a look at this please?
Blocks: Persian
Attached patch Patch synched to trunk (obsolete) (deleted) — Splinter Review
Attachment #132169 - Attachment is obsolete: true
Attachment #178243 - Attachment is obsolete: true
Attachment #178363 - Flags: superreview?(rbs)
Attachment #178363 - Flags: review?(rbs)
Comment on attachment 178363 [details] [diff] [review] Fixed a bug in the previous patch r+sr=rbs
Attachment #178363 - Flags: superreview?(rbs)
Attachment #178363 - Flags: superreview+
Attachment #178363 - Flags: review?(rbs)
Attachment #178363 - Flags: review+
Any reason this was never checked in?
This is low risk, but I didn't think it was major enough to be worth asking for drivers' approval during beta freeze. If you can find real-world sites that are broken without the patch, feel free to request a1.8b3.
Oh, sorry. I didn't realize that the trunk was already frozen in April.
Attachment #178363 - Flags: approval1.8b3?
Comment on attachment 178363 [details] [diff] [review] Fixed a bug in the previous patch a=chofmann
Attachment #178363 - Flags: approval1.8b3? → approval1.8b3+
As you wish... Checked in.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: