Closed Bug 186463 Opened 22 years ago Closed 21 years ago

Request to provide Tamil character coding (TSCII) support

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: ev122, Assigned: smontagu)

References

()

Details

(Keywords: intl)

Attachments

(6 files, 1 obsolete file)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130

At present, Mozilla doesn't provide support for viewing Tamil web pages. It would be better if one could select Tamil via View > Character Coding > More > SE & SW Asian > Tamil. The necessary fonts (and related info) may be obtained here: http://www.tamil.net/tscii/

Reproducible: Always

Steps to Reproduce:
1. Please go to: http://www.tamil.net/projectmadurai//pub/pm0143/kprose1.html

Actual Results:
See the words below "in Tamil Script, TSCII format)". This shouldn't be a bunch of question marks.

Expected Results:
A sample of Tamil characters: http://www.tamil.net/projectmadurai/pmdr0.gif
Two good Tamil fonts: TSCMaduram (Serif) and TSCArial (Sans-Serif)
see also bug 140013
The Tamil characters are in Unicode (see http://www.unicode.org/charts/ ) so presuming the fonts are correctly encoded (which bug 140013 suggests *may* not be the case), then the characters should be supported when using UTF-8 or UCS2 encoding without any support from Mozilla. Do you know if the encoding is documented somewhere? Is it in use on the web? What other browsers support it? If none, then it's probably better to just encourage sites to use UTF-8 or UCS2. ->i18n
Assignee: font → smontagu
Component: Layout: Fonts and Text → Internationalization
QA Contact: ian → ylong
The page in question has <META HTTP-EQUIV="Content-Type" CONTENT="text/html"; charset="x-user-defined"> I really don't see how Mozilla is supposed to know what the character encoding is. Therefore it's going to assume some default, and since the Tamil font you specified doesn't have glyphs for the characters Mozilla thinks are there, it will (and should) use some other font.
Thanks for the suggestions. Someday, of course, UTF-8 will become the preferred encoding for all Tamil webmasters. At present, however, most webmasters continue to write Tamil pages using the TSCII encoding, mostly because UTF-8 is not supported by WinMe, Win98, and Win95. More info: http://www.tamil.net/tscii/tscii.html

As to the "x-user-defined", it was simply a temporary measure, as explained below:

"Internationalisation part of HTML standards propose usage of "character set" to display non-roman language materials. One of the near-term goals of the Internet Working Group for TSCII is to get Internet Protocols Standardisation Agencies such as IETF to accept the proposed Encoding scheme TSCII as a "char-set" for Tamil. This is along the same lines of specific character sets we have for Russian, Korean, Japanese, Greek etc. Then we can have TSCII as one of the recognized character set to invoke in HTML files.

"Till that time, an immediate option is to invoke "x-user-defined" case for the char-set in the META Header of the HTML file and have the end-user choose TSCII-conformant Tamil font as the font to use for the "User-defined" encoding (using Browser Preferences Menu)." [Archived at: http://www.geocities.com/athens/5180/tscguide.html ]
Oh btw, IE 6.0 does have a provision to select "Tamil" ( Tools > Internet Options > Fonts > Tamil ). Netscape and Mozilla don't have any such provision. :(
TSCII is the encoding most widely used by Tamils as of now. Mandrake Linux supports TSCII. Until Unicode becomes popular, it's necessary for Mozilla to support the 8-bit character encoding TSCII as well.
Keywords: intl
TSCII is the 8-bit glyph encoding standard. glibc 2.3.1 includes TSCII support. TSCII is fully explained on this page: http://www.geocities.com/athens/5180/tscii.html The TSCII encoding is shown in this gif: http://www.tamil.net/tscii/charset17.gif Google search for Tamil is also based on the TSCII encoding only: http://dmoz.org/World/Tamil/

Because there is no support in Mozilla, it is difficult for a normal 'joe' user to view Tamil sites. (Every site is currently forced to use x-user-defined.) As an alternative, most of the sites are using dynamic fonts, which are not supported in Mozilla. Since this is similar to ISO8859-1, full support was included for TSCII in Mandrake Linux 9.0.

The Tamil community is 70 million strong and spread across India, Sri Lanka, Singapore, Malaysia, Canada, USA & other countries. TSCII is the encoding most commonly used on the Internet by Tamils from all countries around the world. The list of sites using TSCII is documented here: http://groups.yahoo.com/group/e-Uthavi/database?method=reportRows&tbl=2

We're ready to provide TSCII fonts along with OpenType fonts for inclusion with Mozilla, if required, under any license of your choice.
Severity: enhancement → normal
The l10n is well underway, and its alpha should be available for download at http://thamizha.com before the end of Jan '03. However, the lack of TSCII support would deny Mozilla-based browsers the much-needed 'edge' that might be necessary to 'convert' Tamil surfers around the world.
Hello sir, Please provide the Tamil font coding (TSCII) support for our new browser. It is a request from 5 million people in South India. Tamil is universally accepted as an old language, which has good grammar, poems, etc. Please provide the acceptance. Thanks, Mathavan, tamilan
Please fix this bug in your next release and help everybody. This will def. increase the number of netscape users.
> "Till that time, an immediate option is to invoke "x-user-defined" case for the
> char-set in the META Header of the HTML file and have the end-user choose
> TSCII-conformant Tamil font as the font to use for the "User-defined" encoding
> (using Browser Preferences Menu)." [Archived at:
> http://www.geocities.com/athens/5180/tscguide.html ]

You could have used 'x-tscii', but then browsers have to recognize it, which no browser does at the moment. Anyway, to support TSCII, we need the mapping (tabular or algorithmic) between TSCII and Unicode. Can you provide that? With that, it _might_ (or might not) be possible to do something quickly.

> Since this is similar to ISO8859-1, full support was included for TSCII in

Well, there's a big difference between ISO 8859-x and TSCII in that the former has a well-defined *one-to-one* _character_ mapping to Unicode while the latter appears to be a 'font/glyph encoding' rather than a character encoding. Therefore, mapping between TSCII and Unicode is not as simple as mapping between Unicode and ISO-8859-x. Nonetheless, it's possible if we have the mapping between TSCII and Unicode. The TSCII page (http://www.tamil.net/tscii/tscii.html) doesn't seem to have any mapping table other than font-glyph tables. Do you have any TSCII <=> Unicode mapping table available? It cannot be 1 to 1, so it may come as a kind of pseudo-code. Do you have anything like that?

My version of glibc doesn't support TSCII and I wonder how Mandrake 9.0 supports TSCII. (Supporting Tamil with Unicode is different from supporting Tamil with TSCII.) I'll look into the Yudit source code, which supports Tamil in both TSCII and Unicode.

> The Tamil characters are in Unicode (see http://www.unicode.org/charts/ ) so
> presuming the fonts are correctly encoded (which bug 140013 suggests *may* not
> be the case), then the characters should be supported when using UTF-8 or UCS2
> encoding without any support from Mozilla.

It's not that simple, because Tamil script is a complex script, as other scripts in South Asia are. It takes some work on Mozilla's side (reordering, ligatures for conjuncts, etc.). Under Windows XP (or even 2k), TextOutW may do the magic by invoking Uniscribe and OTLS on its own if Tamil is supported by Uniscribe. Yeah, it is (see http://www.microsoft.com/typography/otfntdev/tamilot/).

Have you tried UTF-8 encoded Tamil web pages with Mozilla running under Windows XP **after** installing a Tamil **opentype** font? Win XP may come with at least one Tamil OT font. If not, you can use the CODE2000 font (at http://home.att.net/~jameskass), which has the opentype layout table for Tamil.

On other platforms, Mozilla has more work to do unless it becomes capable of taking advantage of 'native' rendering/layout libraries like Pango and AAT(? what's the name of Apple's rough equivalent of Uniscribe/OTLS?). BTW, you may find it interesting to see http://sila.mozdev.org/
I found the mapping table between TSCII and Unicode at http://www.tamil.net/tscii/faq5.html As expected, it's context-dependent and m to n.
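To illustrate why the mapping is context-dependent and m to n, here is a minimal sketch in Python. TSCII stores glyphs in visual order, so a vowel sign drawn to the left of a consonant comes first in the byte stream, while Unicode stores it after the consonant; a left sign plus a right sign around one consonant can collapse into a single Unicode vowel sign. The byte values below (0xA1, 0xA6, 0xA7, 0xA8, 0xB8) are assumptions covering only a handful of glyphs and should be checked against the table at faq5.html. The function name `tscii_to_unicode` is hypothetical, and this is not the converter used by Yudit, glibc, or Mozilla.

```python
# Assumed TSCII byte values, for illustration only; verify against the real table.
CONSONANTS = {0xB8: "\u0B95"}          # TAMIL LETTER KA
PREFIX_SIGNS = {                       # vowel signs drawn to the LEFT of the consonant
    0xA6: "\u0BC6",                    # TAMIL VOWEL SIGN E
    0xA7: "\u0BC7",                    # TAMIL VOWEL SIGN EE
    0xA8: "\u0BC8",                    # TAMIL VOWEL SIGN AI
}
AA_SIGN_BYTE = 0xA1                    # vowel sign AA, drawn to the RIGHT
AA_SIGN = "\u0BBE"
# Left sign + consonant + AA sign is one two-part Unicode vowel sign.
TWO_PART = {"\u0BC6": "\u0BCA", "\u0BC7": "\u0BCB"}   # E+AA -> O, EE+AA -> OO


def tscii_to_unicode(data: bytes) -> str:
    """Convert a tiny subset of TSCII (visual order) to Unicode (logical order)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII passes through unchanged.
            out.append(chr(b))
            i += 1
        elif b in PREFIX_SIGNS and i + 1 < len(data) and data[i + 1] in CONSONANTS:
            # Reorder: the sign precedes the consonant in TSCII, follows it in Unicode.
            sign = PREFIX_SIGNS[b]
            consonant = CONSONANTS[data[i + 1]]
            i += 2
            # Merge: a trailing AA sign turns E/EE into the two-part sign O/OO.
            if sign in TWO_PART and i < len(data) and data[i] == AA_SIGN_BYTE:
                sign = TWO_PART[sign]
                i += 1
            out.append(consonant + sign)
        elif b in CONSONANTS:
            out.append(CONSONANTS[b])
            i += 1
        elif b == AA_SIGN_BYTE:
            out.append(AA_SIGN)
            i += 1
        else:
            # Anything else (including a stray prefix sign) is outside this sketch.
            raise ValueError(f"byte 0x{b:02X} is not covered by this sketch")
    return "".join(out)
```

Under these assumptions, the three bytes A6 B8 A1 become the two code points U+0B95 U+0BCA, which is the kind of m-to-n, lookahead-dependent step a real TSCII converter has to perform for the whole table.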
This bug can be split into three parts:

1. supporting TSCII (x-tscii) as a text/document encoding.
2. rendering Tamil text with a TSCII font (custom-encoded font). (A temporary/stop-gap measure until the third item below is implemented and existing Tamil truetype fonts are converted to opentype fonts.)
3. rendering Tamil text with opentype fonts for Tamil.

The first one is pretty straightforward, and the second one is not so difficult with the latest patch for bug 176290 checked in. The third one is platform-dependent. It seems that under Windows XP (and it might be true of 9X as well with the latest version of Uniscribe installed) Mozilla does not have much to do (if I'm not mistaken). For gtk, implementing nsFontMetricsXftPango(? derived from nsFontMetricsXft) that delegates most of layout/rendering to Pango could well do 'the trick' (as in SILA; actually a lot more extensive delegation can/should be done to Pango than to Graphite, I think). At the moment, built-in PangoLite is used for Thai and Devanagari in gtk-x11 (but not in gtk-xft) when CTL is turned on.
> At present, Mozilla doesn't provide support to view Tamil webpages

As you and numerous other people in South Asia know, you can view non-standard-compliant web pages in Indic scripts by treating them as if they're ISO-8859-1 or Windows-1252 and setting fonts to one of the 'custom-encoded' fonts. It's an old trick that has been used for Indic scripts perhaps since Netscape 3.x, isn't it? What's missing is rendering support for *properly* encoded Tamil web pages (in UTF-8 or other Unicode transformation formats) with opentype Tamil fonts and, as a temporary measure, with TSCII-encoded fonts (the third and the second items in my list). I realized that that's the realm of bug 140013.

> Thanks for the suggestions. Someday, of course, UTF-8 would become the preferred
> encoding for all Tamil webmasters.
> More info: http://www.tamil.net/tscii/tscii.html
> As to the "x-user-defined", it was simply a temporary measure.

Considering that there are quite a lot of web pages in TSCII, Mozilla may support TSCII as a document encoding, but I think it's not a top priority because supporting TSCII as a document encoding is a dead end (the first item in my list). [1] TSCII is a glyph encoding (as opposed to a character encoding). As such, it appears to me that it cannot represent Tamil text as faithfully as Unicode. I guess those at Tamil.net and elsewhere trying to support TSCII should consider this point and decide which way is better in the long run: to keep promoting a limited measure like TSCII, or to help and encourage people to switch over to Unicode (including the conversion of TSCII-encoded truetype fonts to opentype fonts). Everybody is moving to Unicode, and other Indic script users have been making the transition.

> However at present, most webmasters continue
> write Tamil pages using TSCII encoding. (Mostly because UTF-8 is not supported
> by WinMe, Win98, Win95.)

Those OSes don't support Unicode as well as Win2k/XP do, but there are a couple of freely available Unicode editors that run under Win9x/ME. For instance, try Yudit at http://www.yudit.org. The Windows version is hidden somewhere :-) It supports Tamil well and can import TSCII-encoded text and export to UTF-8. You can also try 'SC Unipad' (try to google it). If you don't feel at home in Yudit, you can just edit your html files with your favorite editor, save them in TSCII, and convert them to UTF-8 with 'uniconv -decode tscii -encode utf-8 input.tscii output.utf8'
My opinion is that, as mentioned in point no. 1, we should still go for a new document encoding [x-tscii] to support the TSCII encoding. This will also ensure that we are able to move smoothly from TSCII --> Unicode. This is important considering the vast amount of data available in TSCII format, and TSCII documents are still being created on a daily basis. It might take quite a long time for Tamil users to change over completely to Unicode. So it's very important for Mozilla to support a new encoding type 'x-tscii' as an interim solution for Tamil users. This will also ensure that Tamil language usage doesn't suffer until Unicode adoption is complete.
Attached image a screenshot of Mozilla rendering Tamil poetry (obsolete) (deleted) —
How did I get the screenshot? It's easy with my patch for bug 176290 applied. (So please vote for bug 176290 if you want Tamil pages encoded in TSCII and tagged as x-user-defined to be rendered in Mozilla-Xft 1.4. Even better is to speak up that you want that feature for 1.4 :-)) I had to do nothing other than add an entry to the fontEncoding.properties file (in the res/fonts directory) like this. (I suspected this would work, but my first experiment failed because I typed 'i' in place of 'l' in 'tsc_avarangal'.)

# Tamil fonts
encoding.tsc_avarangal.ttf = x-user-defined

You can add as many entries like the above as you want. Then, set View|Character Coding to User-defined. You also have to set the fonts to use for user-defined in Edit|Preference|Appearance|Fonts to Tamil fonts. You can avoid this step if your web pages specify the fonts to use with either the old font-face or the new CSS font-family.

BTW, Mozilla on Windows already has custom-font encoding support, so you can enjoy the feature right now by following the procedure above. I think with this there's little need to support TSCII as a document encoding (item 1 in my list). Needless to say, we need to work on item 2 and item 3, but that's for bug 1400??.
Sorry for attachment 121512 [details]. Even though I don't know Tamil, I thought it looked strange, but couldn't pinpoint what was wrong. Now I know.... Please take a look at this new screenshot and let me know whether it's rendered correctly this time.

While trying to figure out why Yudit's mapping of TSCII <-> Unicode is different from the mapping at tscii.net, I found out that TSCII truetype fonts have a _bogus_ Mac Roman cmap (PID=1, EID=0). (Additionally, some of them have a PID=0, EID=0 cmap, the Unicode default, and an MS Symbol cmap, while others have a PID=3, EID=1 cmap, MS Unicode.) This is the same problem that the Mathematica fonts have. That problem was solved by specifying the cmap to use for the Freetype library (see bug 176290 comment #75 and the other comments referenced therein). Therefore, for each TSCII font, two lines are necessary in the fontEncoding.properties file, as shown below.

# Tamil fonts (TSCII encoding : see http://www.tscii.net)
encoding.tsc_avarangal.ttf = x-user-defined
encoding.tsc_aparanarpdf.ttf = x-user-defined
encoding.tsc_avarangal.ttf = x-user-defined
encoding.tsc_aandaal.ttf = x-user-defined
encoding.tsc_avarangalfxd.ttf = x-user-defined
encoding.tsc_paranbold.ttf = x-user-defined
encoding.tsc_paranarpdf.ttf = x-user-defined
encoding.tsc_paranarho.ttf = x-user-defined
encoding.tsc_kannadaasan.ttf = x-user-defined

# TSCII fonts have a pseudo Apple Roman cmap as Mathematica fonts do.
encoding.tsc_avarangal.ftcmap=apple-roman
encoding.tsc_aparanarpdf.ftcmap=apple-roman
encoding.tsc_avarangal.ftcmap=apple-roman
encoding.tsc_aandaal.ftcmap=apple-roman
encoding.tsc_avarangalfxd.ftcmap=apple-roman
encoding.tsc_paranbold.ftcmap=apple-roman
encoding.tsc_paranarpdf.ftcmap=apple-roman
encoding.tsc_paranarho.ftcmap=apple-roman
encoding.tsc_kannadaasan.ftcmap=apple-roman
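Since every TSCII font needs the same pair of lines, the entries can be generated rather than typed by hand, which also avoids the kind of one-letter typo mentioned earlier. The following is a minimal sketch; the helper name `font_encoding_entries` is hypothetical, and it assumes the keys are simply the lowercased font file names, as in the entries listed in this comment.

```python
def font_encoding_entries(font_files, encoding="x-user-defined", cmap="apple-roman"):
    """Return fontEncoding.properties lines (two per font) for the given font files.

    Assumption: the property key is the lowercased file name without ".ttf",
    matching the hand-written entries in this bug.
    """
    lines = []
    seen = set()
    for name in font_files:
        stem = name.lower()
        if stem.endswith(".ttf"):
            stem = stem[:-4]
        if stem in seen:
            continue  # skip fonts that were listed more than once
        seen.add(stem)
        lines.append(f"encoding.{stem}.ttf = {encoding}")
        lines.append(f"encoding.{stem}.ftcmap={cmap}")
    return lines


if __name__ == "__main__":
    fonts = ["tsc_avarangal.ttf", "tsc_aandaal.ttf", "tsc_kannadaasan.ttf"]
    print("# Tamil fonts (TSCII encoding)")
    print("\n".join(font_encoding_entries(fonts)))
```

The output can be pasted into fontEncoding.properties; the order of lines within the file should not matter for a properties file, so the two lines for each font are kept together here for readability.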
Attachment #121512 - Attachment is obsolete: true
Comment on attachment 121630 [details] another incorrect rendering of www.tamil.net/poetry this is still incorrect. x-user-defined mapping is straight pass-thru converter so that it should work.... something is wrong with the font I used for testing... I'll keep trying..
Attachment #121630 - Attachment description: the correct rendering of www.tamil.net/poetry → another incorrect rendering of www.tamil.net/poetry
This screenshot was obtained with Moz-Win 1.3 with fonts for user-defined set to TSC_Kannadassan. (http://www.tamil.net/projectmadurai/pub/pm0098/kanavin.html) With the same setting, MS IE 6 rendered the page exactly identically. So, I think we can take this shot as 'the' reference. User-defined converter is just a straight-through single byte converter (with 0x80-0xFF stored in U+F780 - U+F7FF when in UTF-16) so that this should just work fine under Linux as well. But it does not...
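For reference, the straight-through behaviour described above can be sketched in a few lines of Python. This is only an illustration of the byte-to-code-point arithmetic stated in this comment (ASCII unchanged, 0x80-0xFF stored in U+F780-U+F7FF); the function names are hypothetical and this is not Mozilla's converter code.

```python
def decode_user_defined(data: bytes) -> str:
    """Pass bytes straight through: ASCII as-is, 0x80-0xFF -> U+F780-U+F7FF."""
    return "".join(chr(b) if b < 0x80 else chr(0xF700 + b) for b in data)


def encode_user_defined(text: str) -> bytes:
    """Inverse of decode_user_defined; rejects anything outside the two ranges."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif 0xF780 <= cp <= 0xF7FF:
            out.append(cp - 0xF700)
        else:
            raise ValueError(f"U+{cp:04X} cannot be encoded as user-defined")
    return bytes(out)
```

Because the high bytes land in the Private Use Area, no Tamil-specific knowledge is involved at this stage: whether the right glyph appears depends entirely on which cmap of the selected font is consulted for those code points.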
Attached image TSCII table with a TSC font (deleted) —
This is Mozilla-Xft's rendering of the TSCII table (in x-user-defined) at http://jshin.net/i18n/tscii2.html. Compare it with http://www.tamil.net/tscii/charset17.gif and you'll see that they match exactly. With this particular font (TSCAparanar.ttf), what I described in comment #18 works (along with the patch for bug 176290). Why not with other fonts? See below.

> a straight-through single byte converter (with 0x80-0xFF stored in U+F780 -
> U+F7FF when in UTF-16) so that this should
> just work fine under Linux as well. But it does not...

This turned out to be because 'TSCII-compliant' truetype fonts come with various Truetype cmaps that are internally inconsistent. Some fonts come with PID=0, EID=0 (Unicode cmap) and others with PID=3, EID=1 (MS Unicode cmap). Still others come with PID=3, EID=0 (MS Symbol). _All_ of them come with a MacRoman cmap (PID=1, EID=0). So I assumed that using the MacRoman cmap would give Mozilla the same glyph arrangement as given in <http://www.tamil.net/tscii/charset17.gif>. It turned out that the MacRoman cmaps are not consistent with the PID=3, EID=1 / PID=3, EID=0 cmaps. Among the 7 or so TSCII fonts I downloaded, only TSCAparanar.ttf has a MacRoman cmap that matches the TSCII 1.7 glyph arrangement. I don't know how they work under Windows (they do, according to my test). Anyway, to render Tamil text with these fonts and Mozilla-Xft, we need to write a new converter (TSCII codepoints [0x80 - 0x9F] mapped as if they're in Windows-1252), assuming that the Unicode cmap has pseudo-Unicode glyph indices for TSCII glyphs.
Supporting TSCII is not such a good idea. It's better to write UTF-8 encoded Tamil web pages and ask Mozilla to write converters to make the existing TSCII-encoded fonts show UTF-8 encoded Tamil pages properly. I understand the infrastructure for doing this is already available and has been implemented for other Indic scripts. That way webmasters can create UTF-8 encoded pages and viewers don't need new fonts to view them either. FTR, http://dmoz.org/World/Tamil, on which Google's Tamil search is based, has been converted to UTF-8; in fact the whole of http://dmoz.org is about to convert to UTF-8. The sooner we convert Tamil pages to UTF-8 the better. It's a big task which will become bigger as each day passes, so we might as well start soon. Enabling TSCII would only delay the transition to UTF-8.
Google in Tamil http://google.com -> Preferences, choose Tamil is also moving from Latin script to Tamil script, UTF-8 encoded.
> It's better to write utf-8 encoded
> tamil web pages and ask mozilla to write converters to make the existing tscii
> encoded fonts show utf-8 encoded tamil pages properly.

Sure. See bug 204039. Writing a TSCII->Unicode converter is certainly doable, but is a low-priority item per your comment and because user-defined should work with some fonts.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX
jshin: Do you agree with the WONTFIX (26 votes so far) ?
(In reply to comment #26)
> jshin:
> Do you agree with the WONTFIX (26 votes so far) ?

True, but I don't think anyone is attempting to fix this bug. Moreover, as time goes on, it becomes less and less useful, with UTF-8 becoming the preferred encoding among Tamil webmasters... I would rather see effort put into fixing UTF-8 related bugs, especially the 'justify' problem!
*** Bug 261239 has been marked as a duplicate of this bug. ***
*** Bug 266617 has been marked as a duplicate of this bug. ***
*** Bug 292946 has been marked as a duplicate of this bug. ***