Closed
Bug 240943
Opened 21 years ago
Closed 20 years ago
Update bidi data files to Unicode 4.0.1
Categories
(Core :: Layout: Text and Fonts, defect)
Core
Layout: Text and Fonts
Tracking
()
RESOLVED
FIXED
People
(Reporter: smontagu, Assigned: smontagu)
References
()
Details
(Keywords: fixed1.7)
Attachments
(3 files, 1 obsolete file)
(deleted),
patch
|
jshin1987
:
review+
blizzard
:
superreview+
mkaply
:
approval1.7+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
smontagu
:
review+
rbs
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
roc
:
review+
roc
:
superreview+
|
Details | Diff | Splinter Review |
Unicode 4.0.1 includes changes to some Bidi properties, and Unicode 4.0 already
added several new right-to-left characters, including some beyond the BMP.
We need at least to rerun layout/tools/genbidicattable.pl and
layout/tools/gensymmtable.pl on the latest version of the Unicode Character
Database, and the scripts themselves may need some modifications.
Comment 1•20 years ago
|
||
Asking for blocking1.7. I think making the next stable release compliant with
Unicode 4.0.1 makes a lot of sense. Moreover, fixing this will also resolve Bug
73251 which currently represents the most frequent and obvious rendering bug
with Hebrew pages.
Prog.
Flags: blocking1.7?
Assignee | ||
Comment 2•20 years ago
|
||
I wouldn't like to take the risk of adding support for Bidi surrogate
characters to 1.7 at this late stage, so here is a patch which only changes the
character types within the BMP, generated from a subset of the UCDB.
Assignee | ||
Comment 3•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review]
Patch for 1.7
blizzard, can you rs this change to the generated file bidicattable.h? No
actual code changes are involved, only modifications and additions to the bidi
character types.
Attachment #147221 -
Flags: superreview?(blizzard)
Comment 4•20 years ago
|
||
Given that smontagu doesn't want to take take the risk of adding Bidi surrogate
characters, is there good reason to hold the release of 1.7 for this patch? (I'm
not suggesting we wouldn't approve a reviewed patch, just questioning whether or
not we should block the release for this change).
Comment 5•20 years ago
|
||
Asa, see comment #1 for a couple of good reasons to block 1.7 for this bug.
Prog.
Assignee | ||
Comment 6•20 years ago
|
||
(In reply to comment #4)
> Given that smontagu doesn't want to take take the risk of adding Bidi surrogate
> characters, is there good reason to hold the release of 1.7 for this patch?
Let me clarify two points: (1) this patch only includes the parts that I don't
think are risky; (2) as far as I know, no Bidi surrogate characters are yet in
use in the wild, although Unicode 4.0 defined a few.
Assignee | ||
Comment 7•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review]
Patch for 1.7
I guess I was forgetting correct process and thinking that an "rs" request
didn't need r= first. jshin, can you review?
Attachment #147221 -
Flags: review?(jshin)
Assignee | ||
Comment 8•20 years ago
|
||
As well as the necessary changes for non-BMP characters, this includes some
tightening up of the code in GetBidiCat(), and changes to the license block to
match the manual changes that were made to bidicattable.h.
I'll cut out the #ifdef DEBUG_Simon part before checkin.
Assignee | ||
Comment 9•20 years ago
|
||
Comment on attachment 147705 [details] [diff] [review]
Patch for trunk
Jungshik, can you review this one too?
> #define CHAR_IS_BIDI(c) ( (IS_HINDI_DIGIT(c) ) || (IS_HEBREW_CHAR(c) ) \
>- || (IS_06_CHAR(c) ) || (IS_FE_CHAR(c) ) )
>+ || (IS_06_CHAR(c) ) || (IS_FE_CHAR(c) ) \
>+ || (IS_CYPRIOT_CHAR(c) ) )
I don't like this part very much myself. There are a whole bunch of RTL scripts
roadmapped for addition to Unicode, and I don't like the idea of adding another
condition to this #define for each one.
The way to go is probably to move bidicattable.h back into intl/unicharutils
and call it directly, but let's do that separately.
Attachment #147705 -
Flags: review?(jshin)
Comment 10•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review]
Patch for 1.7
r=jshin
Attachment #147221 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 11•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review]
Patch for 1.7
transferring rs request. See comment 3 for justification of rs.
Attachment #147221 -
Flags: superreview?(blizzard) → superreview?(roc)
Updated•20 years ago
|
Attachment #147221 -
Flags: superreview?(roc) → superreview+
Assignee | ||
Comment 12•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review]
Patch for 1.7
Requesting approval for 1.7.
The behaviour of hyphen-minus between right-to-left characters and numbers has
long been one of the most obvious difference between the Bidi behaviours of
Mozilla and IE, and has led to many bug reports (see bug 73251 and its
duplicates).
In the past Mozilla was conforming to the Unicode standard better than IE, but
meanwhile the standard has changed under us and we have become non-compliant.
risk: minimal. The patch has no semantic code changes, only additions and
modifications to the table of Bidi categories.
Attachment #147221 -
Flags: approval1.7?
Comment 13•20 years ago
|
||
Comment on attachment 147221 [details] [diff] [review]
Patch for 1.7
a=mkaply
Attachment #147221 -
Flags: approval1.7? → approval1.7+
Comment 14•20 years ago
|
||
I'm removing my blocking1.7 request, this is already checked into the Branch. I
hope the Trunk patch will soon follow (reminder: jshin).
Prog.
Flags: blocking1.7?
Comment 15•20 years ago
|
||
Could you land the branch patch on the trunk until the trunk patch is ready so
that the branch patch gets more testing?
Assignee | ||
Comment 16•20 years ago
|
||
(In reply to comment #15)
> Could you land the branch patch on the trunk until the trunk patch is ready so
> that the branch patch gets more testing?
I did so, unfortunately regressing the recent manual change to the license block
of bidicattable.h, but that will be corrected when I check in the final version.
Comment 17•20 years ago
|
||
Comment on attachment 147705 [details] [diff] [review]
Patch for trunk
r=jshin sorry for the delay.
>Index: content/shared/src/nsTextFragment.cpp
>@@ -344,17 +345,24 @@
> while (cp < end) {
>- PRUnichar ch = *cp++;
>- if (CHAR_IS_BIDI(ch) ) {
>+ PRUnichar ch1 = *cp++;
>+ PRUint32 utf32Char = ch1;
>+ if (IS_HIGH_SURROGATE(ch1) && cp < end) {
>+ PRUnichar ch2 = *cp++;
>+ if (IS_LOW_SURROGATE(ch2)) {
>+ utf32Char = SURROGATE_TO_UCS4(ch1, ch2);
>+ }
>+ }
>+ if (CHAR_IS_BIDI(utf32Char) ) {
> mState.mIsBidi = PR_TRUE;
> break;
an edge case: If a high surrogate code point is followed by a 'BIDI character'
(in BMP), it wouldn't be detected. Therefore, it may be necessary to add 'else
if (CHAR_IS_BIDI(ch2))' after 'if (IS_LOW_SURROGATE(ch2))'
Comment 18•20 years ago
|
||
Comment on attachment 147705 [details] [diff] [review]
Patch for trunk
sorry that i forgot to set the r flag.
Attachment #147705 -
Flags: review?(jshin) → review+
Assignee | ||
Comment 19•20 years ago
|
||
Thanks for review.
> an edge case: If a high surrogate code point is followed by a 'BIDI character'
> (in BMP), it wouldn't be detected. Therefore, it may be necessary to add 'else
> if (CHAR_IS_BIDI(ch2))' after 'if (IS_LOW_SURROGATE(ch2))'
Good catch! And if we are talking edge cases, I suppose we also have to consider
the case of an isolated high surrogate followed by a well-formed surrogate pair.
Assignee | ||
Comment 20•20 years ago
|
||
I changed that to:
+ PRUnichar ch1 = *cp++;
+ PRUint32 utf32Char = ch1;
+ if (IS_HIGH_SURROGATE(ch1) &&
+ cp < end &&
+ IS_LOW_SURROGATE(*cp)) {
+ PRUnichar ch2 = *cp++;
+ utf32Char = SURROGATE_TO_UCS4(ch1, ch2);
+ }
and removed the debugging code.
Assignee | ||
Updated•20 years ago
|
Attachment #147705 -
Attachment is obsolete: true
Assignee | ||
Comment 21•20 years ago
|
||
Comment on attachment 148662 [details] [diff] [review]
Patch for trunk
Transferring r and requesting sr
Attachment #148662 -
Flags: superreview?(rbs)
Attachment #148662 -
Flags: review+
Comment 22•20 years ago
|
||
Comment on attachment 148662 [details] [diff] [review]
Patch for trunk
sr=rbs
Attachment #148662 -
Flags: superreview?(rbs) → superreview+
Assignee | ||
Comment 23•20 years ago
|
||
Fix checked in.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 24•20 years ago
|
||
This isn't working the way you want. On Linux gcc throws pairs of warnings like
these:
nsTextBoxFrame.cpp:510:
warning: comparison is always false due to limited range of data type
nsTextBoxFrame.cpp:510:
warning: comparison is always true due to limited range of data type
This happens because CHAR_IS_BIDI now contains IS_CYPRIOT_CHAR and
IS_CYPRIOT_CHAR tests against values > 0xffff. Gcc knows that PRUnichar is only
16 bits so it always evaluates IS_CYPRIOT_CHAR as false.
If you're expecting to handle U+1xxxx characters you'll have to do something else.
I'm reopening this bug because the patch clearly doesn't work as designed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 25•20 years ago
|
||
I saw the warning, but why does it mean that the patch is not working as expected?
I have suggested getting rid of CHAR_IS_BIDI anyway for performance reasons (bug
244036). If we don't go ahead with that we could at least split it into two to
remove the warning, assuming that the caller will always know whether the
character is BMP or above:
#define UCS2_CHAR_IS_BIDI(c) ( (IS_HINDI_DIGIT(c) ) || (IS_HEBREW_CHAR(c) ) \
|| (IS_06_CHAR(c) ) || (IS_FE_CHAR(c) ) )
#define UTF32_CHAR_IS_BIDI(c) ( (IS_CYPRIOT_CHAR(c)) )
Comment 26•20 years ago
|
||
> I saw the warning, but why does it mean that the patch is not working as
> expected?
I was basing my statement on what's in the code. One would normally get the
impression that CHAR_IS_BIDI can detect Cypriot glyphs from reading the code.
One would have to know that this doesn't work for mozilla's PRUnichar. I think
that's a sure way to create subtle bugs as other people work on the code. While
gcc handles this at compile time other compilers might leave the tests in which
wouldn't help performance.
I like the idea of useing UCS2 in its name. That makes it more obvious what's
happening. I wonder about having both UCS2 and UTF32/UCS4 running around in
mozilla. Someone will eventually manage to assign one to the other and back
again which obviously doesn't work. Personally I would have gone with 32-bit
quantities form the beginning but I realize why that wasn't done.
Comment 27•20 years ago
|
||
(In reply to comment #9)
> ... There are a whole bunch of RTL scripts
> roadmapped for addition to Unicode, and I don't like the idea of adding another
> condition to this #define for each one.
Unicode has clearly roadmapped 0590-08FF and 00010800-00010FFF for RTL scripts,
see http://www.unicode.org/roadmaps/ - although these blocks contain some
combining marks which are formally bidi neutral and in principle usable with LTR
scripts. Although this roadmap is not set in stone, it is reasonable at this
stage to add just two conditions to cover these whole blocks. There are some
other RTL characters, apparently FB1D-FDFF and FE70-FEFF. But the safest thing
is to check with the Unicode bidi tables.
Assignee | ||
Comment 28•20 years ago
|
||
I think Peter Kirk's suggestion is excellent. It will make CHAR_IS_BIDI a
coarser-grained and faster test, more appropriate for the places it is used on
every character in a document, e.g. nsTextFragment::SetBidiFlag, leaving the
more precise and inevitably slower test for the detailed processing of Bidi text
in nsBidi::GetDirProps.
Assignee | ||
Comment 29•20 years ago
|
||
Assignee | ||
Updated•20 years ago
|
Attachment #152143 -
Flags: superreview?(roc)
Attachment #152143 -
Flags: review?(roc)
Attachment #152143 -
Flags: superreview?(roc)
Attachment #152143 -
Flags: superreview+
Attachment #152143 -
Flags: review?(roc)
Attachment #152143 -
Flags: review+
Assignee | ||
Comment 30•20 years ago
|
||
Checked in.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Comment 31•20 years ago
|
||
*** Bug 291511 has been marked as a duplicate of this bug. ***
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
You need to log in
before you can comment on or make changes to this bug.
Description
•