Closed Bug 454 Opened 27 years ago Closed 24 years ago

Unix: 0x80-0x9F in cp1252 do not display correctly

Categories

(Core :: Internationalization, defect, P2)

x86
Linux

Tracking

()

VERIFIED FIXED

People

(Reporter: tim, Assigned: erik)

References

Details

(Keywords: platform-parity)

Attachments

(1 file)

Created by Tim Eliseo (tim@quiknet.com) on Friday, June 19, 1998 8:48:18 PM PDT

Additional Details:

Many Web pages use quote characters in the range 0x91-0x94 which are Microsoft codepage 1252 extensions. For X these are currently mapped to the ? character rather than normal quote characters. A patch follows to fix this. Note that sequences such as &#145; are currently mapped properly; this problem only shows up when the actual characters are in the file.

--- mozilla/lib/libi18n/sbconvtb.c Sat May 9 03:57:48 1998
+++ mozilla/lib/libi18n/sbconvtb.c.new Fri Jun 19 20:28:19 1998
@@ -71,7 +71,7 @@
 /* Tables for Win CP1252 -> ISO 8859-1 */
 PRIVATE unsigned char cp1252_to_iso8859_1[] = {
 /*8x*/ '?', '?', ',', 'f', '?', '?', '?', '?', '^', '?', 'S', '<', '?', '?', '?', '?',
-/*9x*/ '?', '?', '?', '?', '?', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
+/*9x*/ '?', '`', '\'', '"', '"', '*', '-', '-', '~', '?', 's', '>', '?', '?', '?', 'Y',
 /*Ax*/ 0xA0,0xA1,0xA2,0xA3,0xA4,0xA5,0xA6,0xA7,0xA8,0xA9,0xAA,0xAB,0xAC,0xAD,0xAE,0xAF,
 /*Bx*/ 0xB0,0xB1,0xB2,0xB3,0xB4,0xB5,0xB6,0xB7,0xB8,0xB9,0xBA,0xBB,0xBC,0xBD,0xBE,0xBF,
 /*Cx*/ 0xC0,0xC1,0xC2,0xC3,0xC4,0xC5,0xC6,0xC7,0xC8,0xC9,0xCA,0xCB,0xCC,0xCD,0xCE,0xCF,

For those of you like myself annoyed by this bug in the commercial Netscape version, here's a quick fix:

adb -w netscape
cp1252_to_iso8859_1+0x11?W 0x22222760
^d

This is correct for little-endian architectures.
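A minimal sketch of what the patch and the adb one-liner in the report amount to, assuming the table starts at byte 0x80 so that offset 0x11 is the entry for 0x91. The helper names `cp1252_9x_fallback` and `store_le32` are hypothetical and do not come from the Mozilla source; only the table row is copied from the diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Patched 0x90-0x9F row, copied from the "+" line of the diff in the report. */
static const unsigned char cp1252_9x_patched[16] = {
    '?', '`', '\'', '"', '"', '*', '-', '-',
    '~', '?', 's', '>', '?', '?', '?', 'Y'
};

/* Hypothetical helper: map a CP1252 byte in 0x90-0x9F to its ASCII
 * fallback; every other byte is returned unchanged. */
unsigned char cp1252_9x_fallback(unsigned char c)
{
    if (c >= 0x90 && c <= 0x9F)
        return cp1252_9x_patched[c - 0x90];
    return c;
}

/* Store a 32-bit word least-significant byte first, the way a
 * little-endian machine lays it out in memory. */
void store_le32(unsigned char out[4], uint32_t w)
{
    out[0] = (unsigned char)(w & 0xFF);
    out[1] = (unsigned char)((w >> 8) & 0xFF);
    out[2] = (unsigned char)((w >> 16) & 0xFF);
    out[3] = (unsigned char)((w >> 24) & 0xFF);
}
```

Storing 0x22222760 this way yields the bytes 0x60, 0x27, 0x22, 0x22, which are the four quote characters the patch places at 0x91-0x94. On a big-endian machine the same word would land in the opposite byte order, which is why the report restricts the adb fix to little-endian architectures.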
Assignee: bobj → ftang
Status: NEW → ASSIGNED
Reassigned this to erik. The patch does not work because it breaks JavaScript string literals, forcing them to terminate earlier than they should. We have to move the fallback to the XFE, but we need to keep those character values.
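To illustrate the objection, here is a toy sketch, not the actual Netscape conversion or JavaScript code: if 0x92 is rewritten to an ASCII apostrophe before the script is scanned, a single-quoted literal that contains it ends early. Both function names are hypothetical, and the scanner ignores escape sequences.

```c
#include <stddef.h>

/* Hypothetical early conversion step: CP1252 0x92 becomes an ASCII apostrophe. */
void convert_quotes_in_place(unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == 0x92)
            s[i] = '\'';
    }
}

/* Toy scanner: for text that starts with a single quote, return the number
 * of bytes between it and the next single quote, or -1 if there is none.
 * Escape sequences are deliberately not handled. */
long single_quoted_length(const unsigned char *s, size_t n)
{
    if (n == 0 || s[0] != '\'')
        return -1;
    for (size_t i = 1; i < n; i++) {
        if (s[i] == '\'')
            return (long)(i - 1);
    }
    return -1;
}
```

For the six bytes `'`, `i`, `t`, 0x92, `s`, `'`, the scanner sees a four-byte literal before conversion and a two-byte literal after it, leaving a stray `s'` behind. Doing the fallback at rendering time avoids this because the stored character values stay intact.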
Summary: codepage 1252 quote characters not mapped properly → 0x80-0x9F in cp1252 does not display correctly on Mac and UNIX
We won't take the same approach, but we need to put code in the X rendering engine to render the Unicode code points that correspond to 0x80-0x9F in cp1252. Changed the summary to: 0x80-0x9F in cp1252 does not display correctly on Mac and UNIX
QA Contact: 3851
Mac and Windows are now working in apprunner and viewer. I don't think UNIX is working. IQA, could you verify? We need to fix GTK GFX ...
I18n component in Bugzilla being retired. Moving these bugs to Internationalization component.
OS: other → Linux
Summary: 0x80-0x9F in cp1252 does not display correctly on Mac and UNIX → 0x80-0x9F in cp1252 does not display correctly on UNIX
Whiteboard: Mac is fixed. Unix is not.
Changed summary from "0x80-0x9F in cp1252 does not display correctly on Mac and UNIX" to "0x80-0x9F in cp1252 does not display correctly on UNIX". I believe Mac is now working. IQA, please verify Mac. If Mac is not working, please open a separate bug. One bug for two platforms is difficult to track. Thanks.
Assignee: ftang → erik
Status: ASSIGNED → NEW
Target Milestone: M5
Reassigning the UNIX rendering bug to erik and marking the target fix as M5.
Status: NEW → ASSIGNED
Summary: 0x80-0x9F in cp1252 does not display correctly on UNIX → UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX
Summary: UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX → [PP]UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX
Summary: [PP]UNIX GFX Unicode Text Drawing- 0x80-0x9F in cp1252 does not display correctly on UNIX → [PP] Unix: 0x80-0x9F in cp1252 do not display correctly
Target Milestone: M5 → M6
Target Milestone: M6 → M7
Target Milestone: M7 → M10
*** Bug 7880 has been marked as a duplicate of this bug. ***
Target Milestone: M10 → M12
Target Milestone: M12 → M15
*** Bug 5383 has been marked as a duplicate of this bug. ***
Blocks: 16507
Added bug 16507 as dependent on this.
Has this been tested on a font server configured to serve fonts as windows-1252 or in utf-16 or something where they are accessible through the proper unicode codepoints? Probably as many of these characters as possible should be displayed using things like ' and " and -- if the correct glyphs aren't available rather than displaying a character-not-displayed character.
Agreed -- displaying nothing at all (the current behavior) is even worse than the 4.x behavior of showing the entity, since it means you don't see that you're missing characters. Showing something vaguely close to the right character would be a lot better than showing nothing.
Keywords: pp
Moving all of my M15s to M16. Please add comments if you disagree.
Target Milestone: M15 → M16
Summary: [PP] Unix: 0x80-0x9F in cp1252 do not display correctly → Unix: 0x80-0x9F in cp1252 do not display correctly
Subject: Quotes problem
Date: Fri, 11 Feb 2000 12:52:38 +0000
From: Markus Kuhn <Markus.Kuhn@cl.cam.ac.uk>

Erik,

Could you check out, whether the following small character conversion table fix could be made on the Netscape Web browser:

As you can see on the test page

http://www.cl.cam.ac.uk/~mgk25/ucs/CP1252.html

My Netscape Navigator 4.6 for Linux maps &#8216; (LEFT SINGLE QUOTATION MARK &#x2018;) to 0x60 (GRAVE ACCENT). While this does look good in the current X11 Adobe fonts which follow the old Adobe standard encoding for ASCII and have on 0x27 "quoteright" and on 0x60 "quoteleft", the new X11 fonts will follow the modern Adobe Unicode mapping

<http://partners.adobe.com/asn/developer/typeforum/unicodegn.html>

and have accordingly instead on 0x27 "quotesingle" and on 0x60 "grave" (because Unicode fonts have on U+2018 "quoteleft" and on U+2019 "quoteright".)

In other words: The ASCII text 'quote' will look acceptable with both old and new fonts but `quote' will look slightly ugly with the new fonts (this has been the case for a long time on MS-Windows already). The advantage of the new fonts is that you will now find the proper directional quotation marks on 0x2018 and 0x2019 such that you can show all forms of the quotation marks accurately.

Therefore my urgent suggestion: Whenever you do a Unicode -> Latin-1 mapping, then please map both U+2018 and U+2019 to 0x27 and do NOT map U+2018 to 0x60.

For details and background information on this issue, please read

http://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html

Sorry if you have fixed all this already long ago in Mozilla.

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
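The suggestion in the message above could be sketched as follows. This is an illustration only: the function name `quote_to_latin1` is hypothetical, and the handling of the double quotation marks and of other code points is an assumption that the message does not spell out.

```c
/* Hypothetical Unicode -> Latin-1 fallback for quotation marks.
 * Only the U+2018/U+2019 -> 0x27 rule comes from the message above;
 * the other cases are assumptions made for this sketch. */
unsigned char quote_to_latin1(unsigned int ucs)
{
    switch (ucs) {
    case 0x2018: /* LEFT SINGLE QUOTATION MARK */
    case 0x2019: /* RIGHT SINGLE QUOTATION MARK */
        return 0x27; /* APOSTROPHE for both, never 0x60 GRAVE ACCENT */
    case 0x201C: /* LEFT DOUBLE QUOTATION MARK (assumed mapping) */
    case 0x201D: /* RIGHT DOUBLE QUOTATION MARK (assumed mapping) */
        return 0x22; /* QUOTATION MARK */
    default:
        /* Latin-1 code points pass through; anything else becomes '?'. */
        return ucs <= 0xFF ? (unsigned char)ucs : (unsigned char)'?';
    }
}
```

Mapping both single quotes to 0x27 keeps the output acceptable with fonts that follow either the old Adobe encoding or the newer Unicode-based one, at the cost of losing the left/right distinction.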
&#8220; and &#8221; are lost completely. They are generated by SGML-tools for the DocBook tag "quote".
We should address bug 31252 first, to get some basic fallback in place, and then address this bug, so I'm targeting M17.
Target Milestone: M16 → M17
*** Bug 16872 has been marked as a duplicate of this bug. ***
*** Bug 24924 has been marked as a duplicate of this bug. ***
Blocks: 17962
I am working on this right now.
Severity: trivial → normal
Priority: P3 → P2
Target Milestone: M17 → M15
It's done. I would like a code review. Anybody?
erik, if you attach the patch, I *may* take a look at it (no promises at all).
Hi Pav, I'm about to attach the diffs to get "smart quotes", trademark, ellipsis, and all those other windows-1252 characters to display on ordinary Unix systems via fallbacks. If you're OK with these, I'd like to check in.
Roger, I have written the code to do fallbacks for windows-1252 characters (e.g. ellipsis -> ...) and '?' for others on Unix. The fix is attached to this bug. It is quite similar to the code you wrote recently for Windows (thanks). Would you be willing to review it for me so that I can check in?
OK, I have read the diff. The following + nsFontGTK* font = FindFont('a'); should be based on the actual REPLACEMENT_CHAR, i.e., FindFont('?'). This way, if someone has, e.g., font-family: Symbol, the search will still return straight away because Symbol has '?'. Other than that, the patch looks fine.
I decided to use 'a' instead of '?' as the argument to FindFont because the replacements are strings such as "EUR" (for euro), "OE" (for OE ligature), "..." (for ellipsis), and so on. So we need to pass something that is likely to return a font that has all of those characters. On Unix, there are several fonts that do not even contain 'a'. For example, all the East Asian fonts (Japanese, Chinese, Korean). Also, Symbol does not contain all of the upper-case and lower-case letters A-Z and a-z. Ideally, nsFontGTKSubstitute would actually do some font switching of its own in GetWidth and DrawString, but since all of the current replacement chars (e.g. "EUR") are from ASCII, and since all fonts that contain 'a' also contain the rest of the ASCII characters, I think FindFont('a') is the best first step we can take in this development. Maybe we'll do actual font switching later. Thanks for the review, Roger! Checked in; marking FIXED.
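A rough sketch of the fallback idea described in this comment, written in C for illustration rather than taken from the checked-in nsFontGTK code. The table, `ascii_fallback`, and `is_all_ascii` are hypothetical names; only the three replacement strings and the '?' default come from the discussion.

```c
#include <stddef.h>

/* One fallback entry: a Unicode code point and its ASCII replacement string. */
struct fallback {
    unsigned int ucs;
    const char *ascii;
};

/* Replacement strings named in the comment above; a real table would be longer. */
static const struct fallback fallbacks[] = {
    { 0x20AC, "EUR" }, /* EURO SIGN */
    { 0x0152, "OE"  }, /* LATIN CAPITAL LIGATURE OE */
    { 0x2026, "..." }, /* HORIZONTAL ELLIPSIS */
};

/* Return the ASCII replacement for a code point, or "?" if none is listed. */
const char *ascii_fallback(unsigned int ucs)
{
    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++) {
        if (fallbacks[i].ucs == ucs)
            return fallbacks[i].ascii;
    }
    return "?";
}

/* Return 1 if every byte of s is 7-bit ASCII, so one font that covers
 * ASCII can draw the whole replacement; return 0 otherwise. */
int is_all_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s > 0x7F)
            return 0;
    }
    return 1;
}
```

Because every replacement string is pure ASCII, choosing a font by probing for a letter such as 'a' is a reasonable proxy for "this font can draw all of the replacements", which is the argument made for FindFont('a') over FindFont('?').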
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
I verified this in 2000041307 Mac and 2000041310 Linux build.
Status: RESOLVED → VERIFIED
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Setting platform to OS/2 and clearing status whiteboard.
OS: Linux → OS/2
Hardware: Other → PC
Whiteboard: Mac is fixed. Unix is not.
Target Milestone: M15 → ---
Daniel, please create a separate bug for OS/2. This bug is specifically for Unix. Marking FIXED again.
Status: REOPENED → RESOLVED
Closed: 25 years ago → 24 years ago
OS: OS/2 → Linux
Resolution: --- → FIXED
Verifying, based on Teruko's comment.
Status: RESOLVED → VERIFIED