Closed Bug 168944 Opened 22 years ago Closed 17 years ago

hebrew numbers 5 and 5000 written the same

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: ian, Assigned: mkaply)

References

()

Details

(Keywords: testcase)

Attachments

(1 file)

I don't know if this is a bug. Our list-style-type:hebrew numbering system gives the same output for the numbers five (5) and five thousand (5000), as shown in: http://www.hixie.ch/tests/adhoc/css/box/list/list-style-type/001.xml Is this the right behaviour? From a western point of view it seems wrong, but I can't find any source that says the numbers 5 and 5000 are written differently in Hebrew. How do you write "look at page 5000, then look at page 5, and..."?
Keywords: testcase
The only reference I could find on Google is http://www.qsm.co.il/Hebrew/Gimatria.htm but it doesn't have much of an explanation.
That's the document ftang used to implement the algorithm.
The well-structured and well-commented code to generate the list text is here, I believe: http://lxr.mozilla.org/seamonkey/source/layout/html/base/src/nsBulletFrame.cpp#788
This weakness is inherent in the Hebrew numbering system. I was aware of it when working on the IBMBIDI parts of the Hebrew numbering code, but thought that we could live with it because in an actual numbered list, the distinction will normally be clear from the context of the preceding and/or following numbers. Alternatively, maybe we should bail out to decimal at some upper limit such as 999,999 or even 999.
Based on your comments on IRC, we could go up to at least 999,999 (using the special "thousands" word). Can you chain thousands to go up to a million? (As in, one thousand thousand?) We should definitely fall back to decimal for numbers outside the range of the numbering system.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I think we may have been taking each other out of context on IRC. Using the "thousands" word works for an isolated number, but I'm not sure about it as part of a numbered list.

Firstly, the transition is unnatural, as you can see from the English equivalent: 4,998; 4,999; 5 thousand; 5,001; 5,002.

Secondly, there are special cases that need to be handled. "Beit Alef-Lamed-Pe-Yud-Mem Sofit" is a solecism for 2,000 (it should be "Alef-Lamed-Pe-Yud-Yud-Mem Sofit") and "Alef Alef-Lamed-Pe-Yud-Mem Sofit" for 1,000 is impossible (it should be "Alef-Lamed-Pe Sofit").

With those caveats, chaining is certainly an option. n * 10^6 could be expressed as "[0x5cf+n] Alef-Lamed-Pe-Yud Alef-Lamed-Pe-Yud-Mem Sofit"; n * 10^9 as "[0x5cf+n] Alef-Lamed-Pe-Yud Alef-Lamed-Pe-Yud Alef-Lamed-Pe-Yud-Mem Sofit" and so on.
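The chaining scheme in the comment above can be sketched in a few lines of Python. This is only an illustration, not code from the tree: the function name `round_thousands` and its parameters are hypothetical, and applying the special forms for 1,000 and 2,000 only at the first power of a thousand is an assumption, since the comment does not say how they interact with chaining.

```python
# Hebrew words used for chaining, written as Unicode escapes.
ELEF = "\u05d0\u05dc\u05e3"                        # Alef-Lamed-Final Pe (1,000)
ALPAYIM = "\u05d0\u05dc\u05e4\u05d9\u05d9\u05dd"   # Alef-Lamed-Pe-Yud-Yud-Final Mem (2,000)
ALAFIM = "\u05d0\u05dc\u05e4\u05d9\u05dd"          # Alef-Lamed-Pe-Yud-Final Mem
ALFEI = "\u05d0\u05dc\u05e4\u05d9"                 # Alef-Lamed-Pe-Yud


def round_thousands(n: int, power: int) -> str:
    """Spell n * 1000**power for a single digit n (1..9) and power >= 1."""
    if not 1 <= n <= 9 or power < 1:
        raise ValueError("expected 1 <= n <= 9 and power >= 1")
    # Special cases named in the comment (assumed to apply only to 10^3).
    if power == 1 and n == 1:
        return ELEF
    if power == 1 and n == 2:
        return ALPAYIM
    digit = chr(0x5CF + n)  # U+05D0 (Alef) for 1 through U+05D8 (Tet) for 9
    # One ALFEI per extra power of a thousand, then the closing ALAFIM.
    return " ".join([digit] + [ALFEI] * (power - 1) + [ALAFIM])
```

With this sketch, `round_thousands(5, 2)` yields the letter He followed by the two chained words, matching the n * 10^6 pattern quoted above.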
Sorry, I didn't use the correct Unicode terms. For Mem Sofit and Pe Sofit, read Final Mem and Final Pe.
Sigh. The Hebrew numbering system is so ridiculously complicated. Anyway. Because the list style numbering systems are going to be used in other contexts too, in CSS3, we need to find an algorithm that does the Right Thing whether in a list or standing on its own. Is that possible? Given that lists are not likely to reach 4999, 5000, 5001, I think it is safe to ignore the problem with 5000 looking out of place there. If the number 5000 is involved, it is more likely to be reached in large steps (4000, 5000, 6000) or stand on its own (through the conversion of an attribute value to a number, for instance, if certain CSS3 proposals happen).
Could the "other contexts" you mention include numbers embedded in text? The Right Thing in that case is different, I'm afraid. I'll try to work up a mini white paper which lays out the issues and possible solutions better than I can do in bugzilla comments.
That would be excellent, thanks.
> Our list-style-type:hebrew numbering system gives the same output for the
> numbers five (5) and five thousand (5000), as shown in:
>
> http://www.hixie.ch/tests/adhoc/css/box/list/list-style-type/001.xml

In any case, the heh should be separated with a geresh from the rest of the number, isn't that correct, Simon?

> Sigh. The hebrew numbering system is so ridiculously complicated.

That was always my opinion of the Roman numbering system (I II III IV V etc.)
(In reply to comment #12)
> In any case, the heh should be separated with a geresh from the rest of the
> number, isn't that correct, Simon?

Not in my opinion; see the document linked in comment 11: "When numbers appear in isolation, e.g. as page numbers or as list indexes, they should be written with the letters alone. If they appear embedded in other text, punctuation marks are added to clarify that they are numbers and not words."
Attached file: General Hebrew List Testcase (deleted)
Testcase from http://www.w3.org/TR/css3-lists/#hebrew0. This testcase shows that currently, Hebrew numbering over 1000 is massively flawed: not only are 2 and 2000 indistinguishable, numbers like 2001 are backwards. Patch coming up.
If you're able to make a patch, does that mean you understand what the algorithm should be? Because the algorithm in that spec is known to be wrong, although nobody can tell me exactly how to fix it.
Never mind about posting a patch; this needs more discussion. What exactly was the complaint with the CSS3 draft's version? I might be able to figure out something that takes that into account. Excepting punctuation and the fact that numbering over 1000 isn't well-defined, I'm pretty sure there isn't any better way than what CSS3 Lists says. Another option is to cap Hebrew numbers at 999. I think this is viable, since numbers over 999 are rarely used. Another option is to use repeating "tav"s, which would be unwieldy but correct. By the way, this bug shouldn't be marked Windows 2000.
We don't want to cap at 999: counters can start at any arbitrary number, and Hebrew numbering is possible above 999. I don't know what the error in the algorithm is. I was just told that the current text was not correct in all cases.
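To show why the repeating-tav option is unwieldy, here is a rough Python sketch. It assumes that "repeating tav" means extending the purely additive letter scheme past 999 by repeating Tav (400) as often as needed; the comment does not spell this out, and the name `additive_hebrew` is hypothetical.

```python
HUNDREDS = "\u05e7\u05e8\u05e9"  # Qof (100), Resh (200), Shin (300)
# Yod (10), Kaf (20), Lamed (30), Mem (40), Nun (50),
# Samekh (60), Ayin (70), Pe (80), Tsadi (90)
TENS = "\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6"
TAV = "\u05ea"  # 400, the largest letter value
TET = "\u05d8"  # 9


def additive_hebrew(n: int) -> str:
    """Write n >= 1 purely additively, repeating Tav for every 400."""
    if n < 1:
        raise ValueError("expected n >= 1")
    out = [TAV * (n // 400)]
    n %= 400
    if n >= 100:
        out.append(HUNDREDS[n // 100 - 1])
        n %= 100
    if n in (15, 16):
        # Conventionally written as 9+6 and 9+7 rather than 10+5 and 10+6.
        out.append(TET)
        n -= 9
    if n >= 10:
        out.append(TENS[n // 10 - 1])
        n %= 10
    if n:
        out.append(chr(0x5CF + n))  # Alef (1) through Tet (9)
    return "".join(out)
```

Under these assumptions 5 and 5000 are no longer ambiguous, but 5000 already needs twelve Tavs followed by a Resh, which is the unwieldiness the comment refers to.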
=> All/All and => Internationalization, since this isn't really a layout issue (and certainly not a bidi issue).
Component: Layout: BiDi Hebrew & Arabic → Internationalization
OS: Windows 2000 → All
Hardware: PC → All
Summary: hebrew 5 and 5000 written the same → hebrew numbers 5 and 5000 written the same
Depends on: 413928
Fixed by bug 413928. We now display 5 as ה, ‎5000 as ה׳, and 5000000 as 5000000, all of which is the same as Safari.
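For readers who want to reproduce the three outputs quoted above, here is a minimal Python sketch. It is not the code from bug 413928: the names `hebrew_marker`, `_letters` and `ASSUMED_MAX` are hypothetical, and the upper bound of 999,999 is borrowed from a suggestion earlier in this bug rather than taken from the actual fix.

```python
HUNDREDS = "\u05e7\u05e8\u05e9"  # Qof (100), Resh (200), Shin (300)
# Yod (10), Kaf (20), Lamed (30), Mem (40), Nun (50),
# Samekh (60), Ayin (70), Pe (80), Tsadi (90)
TENS = "\u05d9\u05db\u05dc\u05de\u05e0\u05e1\u05e2\u05e4\u05e6"
TAV = "\u05ea"     # 400
TET = "\u05d8"     # 9
GERESH = "\u05f3"  # Hebrew punctuation geresh, marks the thousands group
ASSUMED_MAX = 999_999  # assumption: the real limit in the fix may differ


def _letters(n: int) -> str:
    """Additive letters for 1 <= n <= 999."""
    out = [TAV * (n // 400)]
    n %= 400
    if n >= 100:
        out.append(HUNDREDS[n // 100 - 1])
        n %= 100
    if n in (15, 16):
        # Written as 9+6 and 9+7 by convention.
        out.append(TET)
        n -= 9
    if n >= 10:
        out.append(TENS[n // 10 - 1])
        n %= 10
    if n:
        out.append(chr(0x5CF + n))  # Alef (1) through Tet (9)
    return "".join(out)


def hebrew_marker(n: int) -> str:
    """Hebrew list marker, falling back to decimal outside the assumed range."""
    if not 1 <= n <= ASSUMED_MAX:
        return str(n)
    thousands, rest = divmod(n, 1000)
    text = _letters(thousands) + GERESH if thousands else ""
    if rest:
        text += _letters(rest)
    return text
```

The geresh after the thousands group is what separates 5 from 5000 in this sketch, and anything above the assumed limit is simply printed in decimal.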
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED