Closed
Bug 7969
Opened 25 years ago
Closed 17 years ago
need dictionary based Thai line breaker and intergrate into line/word breaker
Categories
(Core :: Internationalization, defect, P3)
Core
Internationalization
Tracking
()
RESOLVED
WONTFIX
Future
People
(Reporter: ftang, Assigned: smontagu)
References
(Blocks 2 open bugs)
Details
(Keywords: helpwanted, intl)
Attachments
(8 files, 17 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
application/octet-stream
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
jshin1987
:
review-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
Reporter | ||
Updated•25 years ago
|
Status: NEW → ASSIGNED
Target Milestone: M10
Reporter | ||
Updated•25 years ago
|
Target Milestone: M10 → M15
change to M15 since this is post beta
Reporter | ||
Updated•25 years ago
|
Whiteboard: Help Wanted
Updated•25 years ago
|
Keywords: helpwanted
Whiteboard: Help Wanted
Reporter | ||
Comment 2•24 years ago
|
||
line / work breaking related issue. Reassign to shanjian and cc erik/nhotta
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
Updated•24 years ago
|
Status: NEW → ASSIGNED
Line breaking issue is big issue for Thai. Browser without Line breaking
(Netscape 4) display text worse than IE5 (with line-breaking) so much.
This also impontant in Word Processing too. (should be in composer too).
What will you all do with this bug?
Comment 5•24 years ago
|
||
We need help to resolve this problem. Could you provide us help in this area?
I can explain the current code to you if necessary.
Comment 6•23 years ago
|
||
for a thai dictionary for wordbreaker, i think samphan@thai.net can help you.
Comment 7•23 years ago
|
||
What is the current status of this issue ?
Here's the patch for dictionary-based Thai line breaker for Mozilla.
The work is not finished and have been tested on Linux only.
There're some problems with it such as :-
- On Linux the dictionary will be loaded as "/usr/share/inthai/inthai.lex"
- On other platform it'll be loaded as "inthai.lex" so it must be in
current directory! See mozilla/intl/inthai/src/PackedTrie.cpp
Set your browser to "http://opensource.thai.net" if you want some test data in
Thai.
- cd to mozilla/.. and tar -xzvf libinthai-moz.tar.gz
The library will be installed in mozilla/intl/inthai
- Copy the dictionary file mozilla/intl/inthai/data/inthai.lex to
/usr/share/inthai for Linux or the executable directory for Win.
(you must mkdir /usr/share/inthai first).
Comment 10•23 years ago
|
||
Sorry, I forgot to tell that the above patch is made on mozilla 0.9.8 source.
Comment 11•23 years ago
|
||
How does the functionality of this patch compare to the functionality of
Mozilla's CTL(=Complex Text Layout) extension ?
Comment 12•23 years ago
|
||
as of my knowledge,
CTL provide the functionalities of:
- render Thai characters (cell-base, 4 levels of characters in 1 cell)
- cell-base caret travelling, delete, backspace
- cell-base highlight selection
There's NO wordbreak functionality in CTL (at least, for now).
Prabhat may better clarify this.
Reporter | ||
Comment 13•23 years ago
|
||
please do not attach application/octect-stream type in bugzilla attachment.
Comment 14•23 years ago
|
||
I looked at this patch. With some minor changes we could integrate this
to co-exist with current CTL code. One issue for discussion could be if
anyone in intl team is planning to integrate ICU (which i believe also
provides good thai word-breaking support). Could someone in intl team
comment on this?
Comment 15•22 years ago
|
||
I'd love to hear about using ICU in mozilla. IMHO, this will solve many problems.
> ICU (which i believe also provides good thai word-breaking support)
Thai dictionary-based word-breaker in ICU is not good enough. People from
IBM/Sun Thailand who use/test ICU with Thai language can tell you that. The
problems come from :-
1) Greedy, longest-matching algorithm use in ICU can't handle ambiguous word
sequences at all (it was mentioned as over-kill). Modern algorithms in Thai use
maximal-matching algorithm which use dynamic programming technic to handle
ambiguous sequences efficiently.
2) ICU algorithm don't handle unknown words (words not in the dictionary) very
well. Unknown words happend a lot in real Thai texts so it's *very* serious.
3) It's not been tested enough with Thai so there may be cases that cause error.
One case I can think of is the use of '.' in Thai to denote abbreviations. The
'.' could be threated as part of Thai word or as punctuation (some use it to
denote end of sentences) depend on context and dictionary. The dictionary should
also contain list of abbreviations for this to work.
On the other hand, IMO, ICU's BreakIterator API is the way to go. Libinthai
actually adopted BreakIterator API from the beginning. The interface you see in
my Mozilla patch is the result of painfully adapting the stateful BreakIterator
interface to Mozilla's stateless breaker API. Internally libinthai is ICU
compatible. I think Mozilla should consider using BreakIterator API directly in
the future.
Comment 16•22 years ago
|
||
> The work is not finished and have been tested on Linux only.
> There're some problems with it such as :-
> - On Linux the dictionary will be loaded as "/usr/share/inthai/inthai.lex"
> - On other platform it'll be loaded as "inthai.lex" so it must be in
> current directory! See mozilla/intl/inthai/src/PackedTrie.cpp
The dependency on the dictionary data file (inthai.lex) in libinthai has been
solved. The (default) main dictionary is compiled-in as a module as in ICU.
We're working on (re)building libinthai on Windows. Now libinthai also has
spelling checking/correction functionality too (spelling check and word breaking
for Thai are the two sides of the same coin).
Comment 17•22 years ago
|
||
hi samphan - Please post the latest patch and library with dictionary dependency
removed so that i can get our thai team to verify this.
thanks,
prabhat
Comment 18•22 years ago
|
||
The new libinthai can be d/l at
ftp://opensource.thai.net/pub/office-tle/libinthai_0.9.tar.gz
I haven't look at the patches for a while but will do ASAP if you want to.
BTW, what's the current plan?
1) change current code to use ICU's BreakIterator API and
1.1) go directly to ICU now
1.2) use libinthai until Thai BreakIterator in ICU is ok
2) use libinthai with its mozilla specific API as a temporaly solution
After all, please note that this particular patch (from above) fix the 'bug'
that prevent word-breaking for Thai eventhough the rule-based code is already
check-in (mozilla/intl/lwbrk/src/rulebrk.c) and used to work in old versions.
diff -ur tmp/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
dev/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
--- tmp/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp Wed Jan 16 07:03:16 2002
+++ dev/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp Wed Feb 13 18:06:52 2002
@@ -234,7 +233,8 @@
return ((0x1100 <= (u) && (u) <= 0x11ff) ||
(0x2e80 <= (u) && (u) <= 0xd7ff) ||
(0xf900 <= (u) && (u) <= 0xfaff) ||
- (0xff00 <= (u) && (u) <= 0xffff) );
+ (0xff00 <= (u) && (u) <= 0xffff) ) ||
+ (0x0e00 <= (u) && (u) <= 0x0e5f);
}
static inline int
Comment 19•20 years ago
|
||
Simplest solution for line break will be to allow separation after every Thai
Character
A better solution will be to separate before
characters in Thai Code page 874: 0xE0 to 0xE4
or Unicode: 0x0E40 to 0x0E44
Best will be to have a complete algorithm for word separation as e.g. in MS
Internet Explorer.
Comment 20•20 years ago
|
||
This patch is just for reviewing. It is for mozilla/firefox to use ICU for
line breaking. It works on both Linux and Win32. The problem with it is that
- you need ICU installed in the target machine
- to build it is quite ad-hoc
- on linux you have to put ICU in /usr/lib
- on win32 you need to add icu/lib path to the LIB variable
Anyway, integrating a lite version of ICU into the tree will solve this problem.
Other than that it works quite well. The patched firefox has been released
and being used by many people.
A flaw in the patch, nsJISx4051LineBreaker::BreakInBetween() is not implemented.
It simply do this :-
if((CLASS_THAI == c1) && (CLASS_THAI == c2))
{
*oCanBreak = 0;
}
else
{
*oCanBreak = GetPair(c1,c2);
}
I'm not sure if this is ok but I guess that BreakInBetween() is only need for
CJK (cause the patch work for Thai) and I can't think of a way to implement
BreakInBetween() using ICU.
Any comment?
Comment 21•20 years ago
|
||
Comment 22•20 years ago
|
||
This is the new patch that fix a bug that make some space missing when it is at
the end of line in HTML source. It simply use
DictionaryBasedBreakIterator::CreateWordInstance instead of CreateLineInstance.
Many people are using the patched firefox on Linux. I also distribute a setup
program on Windows that contain patched firefox and ICU together so Windows
users can start using it.
Attachment #172558 -
Attachment is obsolete: true
Reporter | ||
Comment 23•20 years ago
|
||
shanjian is no longer working on mozilla for 2 years and these bugs are still
here. Mark them won't fix. If you want to reopen it, find a good owner first.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Comment 24•20 years ago
|
||
Mass Re-open of Frank Tangs Won't fix debacle. Spam is his responsibility not my own
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 25•20 years ago
|
||
Mass Re-assinging Frank Tangs old bugs that he closed won't fix and had to be
re-open. Spam is his fault not my own
Assignee: shanjian → nobody
Status: REOPENED → NEW
Updated•20 years ago
|
Assignee: nobody → smontagu
QA Contact: ftang → amyy
Comment 26•19 years ago
|
||
On Windows, I built ICU with --enable-static and modified the source as shown
in the next patch. Then I statically link the ICU binaries with the patched
firefox (which is built statically with default --enable-optimize). The result
firefox.exe with Thai line-breaking feature is only 328K bigger.
This proofs that it may be possible to include ICU in Mozilla/Firefox w/o too
much size penalty. ICU size can be reduced a lot. And static linking will pull
only the code we need from ICU.
Updated•19 years ago
|
Attachment #173704 -
Attachment is obsolete: true
Attachment #186465 -
Flags: review?(jshin1987)
Comment 27•19 years ago
|
||
This patch remove a lot of stuffs. I make sure only that Thai
DictionaryBasedBreakIterator is working.
Attachment #186467 -
Flags: review?(jshin1987)
Comment 28•19 years ago
|
||
I've just build the original firefox source with the same MOZCONFIG/setting I
used for the modified source. The result firefox.exe is a little bit smaller
than Mozilla.org official build. So I recompute the actual size that my ICU add
to firefox.exe. The modified firefox.exe is actually 528K bigger.
Comment 29•19 years ago
|
||
Comment on attachment 186465 [details] [diff] [review]
Patch to use ICU for Thai line-breaking, statically link on Windows
Thanks for your work, but introducing a new dependency (although only at the
compilation time) is not under my 'jurisdiction'. Perhaps, we need to figure
out a better way to make use of ICU (including a subset of them in our tree may
be an option).
Attachment #186465 -
Flags: review?(jshin1987)
Comment 30•19 years ago
|
||
1) Is there any other possible use of ICU in Mozilla now? There'll be more
chance to include ICU in the tree if there're many uses of it.
2) Is it possible to introduce a configure option, e.g. --with-icu[=icu-prefix],
to enable the use of ICU at compile time? This should not interfere with the
normal build while providing the option to use ICU in Mozilla (anywhere).
Comment 31•19 years ago
|
||
(In reply to comment #30)
> 1) Is there any other possible use of ICU in Mozilla now? There'll be more
> chance to include ICU in the tree if there're many uses of it.
There may be many if we decide to use ICU, but ...
> 2) Is it possible to introduce a configure option, e.g. --with-icu[=icu-prefix],
> to enable the use of ICU at compile time?
That's another possibility. Perhaps, that's a good interim solution. Can you
make a patch to implement that?
Comment 32•19 years ago
|
||
I don't know how yet but I'll try.
I've looked at similar stuff in configure.in.
Comment 33•19 years ago
|
||
I split the patch into two parts. The patch to configure.in (add --with-icu) is
filed as another bug #299324 which this bug depends. This patch now use ICU for
Thai line-breaking only if configure with --with-icu flag (which will #define
ENABLE_ICU).
Attachment #186465 -
Attachment is obsolete: true
Attachment #187900 -
Flags: review?(jshin1987)
Comment 34•19 years ago
|
||
Comment on attachment 187900 [details] [diff] [review]
Patch to optionally use ICU for Thai line breaking
Sorry for the delay.
We have to keep the current Thai line breaker if ENABLE_ICU is NOT defined,
don't we?
Attachment #187900 -
Flags: review?(jshin1987) → review-
Comment 35•19 years ago
|
||
Comment on attachment 186467 [details] [diff] [review]
Patch to reduce the size of ICU for Thai line-breaking
This is not releavant at least for now
Attachment #186467 -
Flags: review?(jshin1987)
Comment 36•19 years ago
|
||
> We have to keep the current Thai line breaker if ENABLE_ICU is NOT defined,
don't we?
No. The old rule-based Thai word-breaking code should be removed. It doesn't get
called anyway. nsJISx4501LineBreaker.cpp w/o the new patch only process CJK now
(it used to called the Thai rule-based code). That's why Thai word-breaking is
currently not working.
Comment 37•19 years ago
|
||
You will need to install libthai (http://libthai.sf.net/) on your system for
the Thai word breaking function to work.
The code loads libthai.so dynamically so if there is no libthai in the system,
it will do nothing. The benefit is that the patch is small and doesn't hurt
other users who don't speak Thai. For Thai users, they just have to install
libthai in their system and the vanilla version Mozilla/Firefox should work
immediately, no need to re-compile in order to have Thai support.
The code uses dlopen() (tested on Linux Debian unstable +Firefox-1.0.6), to be
more cross-platform, we might switch to GLib's GModule? I have included an
experimental code in the patch but had a problem with Makefile/INCLUDE stuff so
didn't test it yet.
For Windows, instead of libthai, we can use the word breaking function provided
by Uniscribe? I have not much idea on this so Windows guru could you help?
Comment 38•19 years ago
|
||
The ability to break Thai words of web browsers is really needed for Thai
users. The libthai-based patch above (modified from Samphan's ICU patch,
tested) is a solution to add this support to Mozilla/Firefox. It loads
libthai.so (http://libthai.sf.net/), if exists in the system, dynamically, so
no need to do ./configure --enable-something and/or recompile Mozilla/Firefox
for every new releases.
Not sure if Mozilla will eventually go for ICU/Pango/... in the future but for
the time being, is a solution like this acceptable?
Comment 39•19 years ago
|
||
Will someone in Mozilla read this thread/bug?
Comment 40•19 years ago
|
||
*** Bug 257935 has been marked as a duplicate of this bug. ***
Comment 41•19 years ago
|
||
I have another idea. Since incorporating ICU into Mozilla code base seems to be impossible. Is it possible to write an extension for the job?
The extension will do this :-
1) It'll implement the interface nsILineBreaker, using ICU for Thai line breaking
2) It'll register the new interface in the service NS_LWBRK_CONTRACTID, so nsDocument::GetLineBreaker() will get the new implementation from the extension instead.
I know nothing about XPCOM.
- Is this possible? Is it possible to replace the interface implementation from an extension?
What do you think? The way to go?
Comment 42•19 years ago
|
||
An updated version of libthai patch. It uses NSPR instead of dlopen() to load libthai according to the information here:
http://www.mozilla.org/projects/nspr/reference/html/prlink.html#24072
Thank you Jean-Marc Desperrier for poiting this out.
What is the proper way to seach for libthai.so from default paths specified in /etc/ld.so.conf or LD_LIBRARY_PATH? Now I just hard-code it to load from /usr/lib.
pattara
Comment 43•19 years ago
|
||
Without looking into all details of the bug, does
http://developer.mozilla.org/en/docs/Using_Dependent_Libraries_In_Extension_Components
help? Maybe not, but still putting the reference here.
Comment 44•19 years ago
|
||
(In reply to comment #42)
> What is the proper way to seach for libthai.so from default paths specified in
> /etc/ld.so.conf or LD_LIBRARY_PATH? Now I just hard-code it to load from
> /usr/lib.
If you use PR_LoadLibrary, I think you don't have to do anything. NSPR does the job for you afaik.
Comment 45•19 years ago
|
||
Comment on attachment 194910 [details] [diff] [review]
Patch to add Thai word break support to Mozilla/Firefox using libthai
obsoleted by the libthai NSPR patch
Attachment #194910 -
Attachment is obsolete: true
Comment 46•19 years ago
|
||
Hi,
Months have passed. The patch is there. The patch does fix the problem. But it is left there.
Just wondering how long will it take until Mozilla/Firefox can break Thai words correctly. Who is the right contact point for this? Where is the right place to post? If not here then what is this place for?
Does Mozilla/Firefox usually accept contribution like this from users? Is it the same for other bugs or this is the special case that nobody seems to care at all? Is it a technical problem or an administrative one? If the patch doesn't conform to some guidelines please let us know what to fix.
Just need some responses. Please at least show me that Mozilla is an Open Source project with open collaboration from its users. Otherwise how should I tell my users when things like Thai word-breaking don't work and don't seem to get fixed in any near future?
Pattara
Updated•19 years ago
|
Attachment #211242 -
Flags: review?(jshin1987)
Comment 47•19 years ago
|
||
Updated libthai patch as modified when submitted for inclusion in Ubuntu dapper. It now loads libthai.so.0 (with library version info specified) instead of just libthai.so, to guarantee ABI. Besides, the debug messages are now put only if DEBUG is defined.
Comment 48•19 years ago
|
||
The code lacks platform ifdefs to boot.
Re Windows and ICU, going the extension way may be a good idea. A post to
mozilla.dev.platform and .i18n may be good here (.platform is probably more
important than .i18n).
Any idea about the mac? jshin, is any of this related to the indic problems we
have?
Comment 49•19 years ago
|
||
(In reply to comment #48)
> The code lacks platform ifdefs to boot.
you mean something like
#ifdef LINUX
[...the added part...]
#endif
?
If not could you give me an example or point to some URLs?
Would you like to help test the patch?
> Re Windows and ICU, going the extension way may be a good idea. A post to
Windows has Uniscribe, so IMHO we should just use it (i don't know how).
> mozilla.dev.platform and .i18n may be good here (.platform is probably more
> important than .i18n).
> Any idea about the mac? jshin, is any of this related to the indic problems we
> have?
I think the final solution which should work on all platforms is to use Pango. I heard Theppitak is working on it. But meanwhile, for the sake of Thai Linux users, temporary-but-works-and-doesn't-hurt-anybody solution like libthai should not be ignored.
Comment 50•19 years ago
|
||
I've rewritten the LibThai patch. It's against MOZILLA_1_7_BRANCH, but I think still applies to MOZILLA_1_8_BRANCH as well (but not HEAD, of course).
In this patch, I split the libthai stuffs into a separate nsLibThaiLineBreaker class and call it from nsJISx4501LineBreaker at appropriate places. (At first, I was about to add a case in nsLWBreakerFimp factory, but it seemed all the callers passed a blank parameter. So, having more than one implementation is just impossible.)
Also, a preprocessor is added so that libthai is called only on Unix platforms, although making libthai available on Windows is also possible with cygwin/mingw. But that can be added later.
Attachment #218666 -
Attachment is obsolete: true
Comment 51•19 years ago
|
||
OK. I managed to rediff the rewritten libthai patch against 1.8 branch, by using debian's xulrunner 1.8.0.1 pristine source.
Attachment #220497 -
Attachment is obsolete: true
Comment 52•19 years ago
|
||
Related info: I've opened Bug #336959 for Pango/Uniscribe approach to line breaking in Trunk.
Comment 53•19 years ago
|
||
As another alternative, libthai dependency may be better for deployment in some distributions that also ship libthai.
Updated•18 years ago
|
Attachment #220509 -
Flags: review?(jshin1987)
Updated•18 years ago
|
Attachment #221289 -
Flags: review?(jshin1987)
Comment 54•18 years ago
|
||
I'm really sorry for dragging my feet here. Would you please make a new patch against the trunk instead of 1.8 branch? (note that nsILineBreaker is deCOMtaminated so that you need to make some changes). Without baking in the trunk for a while, it can't be landed in 1.8 branch (if it's not too late). BTW, why don't you use 'cvs diff' to make your patch?
Comment 55•18 years ago
|
||
> BTW, why don't you use 'cvs diff' to make your patch?
Because new files were introduced, and without CVS write permission,
I couldn't 'cvs add' them before diff-ing. They would thus have been
lost in the patch.
Regarding working with trunk or branch, it's quite a long story.
In short, it's a mix of results from Thai team's timing and from distro
contacts which forward the bug back upstream. So, I lose the focus on
trunk. Now we seem to miss all the plans, anyway.
Sorry for the wrongly targeted patch. I shall rework it against trunk
soon.
Comment 56•18 years ago
|
||
(In reply to comment #55)
> > BTW, why don't you use 'cvs diff' to make your patch?
>
> Because new files were introduced, and without CVS write permission,
> I couldn't 'cvs add' them before diff-ing. They would thus have been
> lost in the patch.
Ok. It doesn't matter much, but in a sense it may be more convenient for you to upload two files (one is a patch against existing files obtained with 'cvs diff' and the other contains new files).
> Regarding working with trunk or branch, it's quite a long story.
....
> trunk. Now we seem to miss all the plans, anyway.
Sorry I'm partly to blame ....
Comment 57•18 years ago
|
||
OK. I have reworked the patch against trunk, using the configure switch instead of PR_LoadLibrary. Using the switch makes the solution available on other platforms than XP_UNIX as well, provided that libthai is made available.
This patch is adjusted a little bit from previous ones. The existing rule-based Thai word segmentation in rulebrk.c is added back as the fallback when libthai is not linked.
Two additional files are coming.
Attachment #220509 -
Attachment is obsolete: true
Attachment #221289 -
Attachment is obsolete: true
Attachment #220509 -
Flags: review?(jshin1987)
Attachment #221289 -
Flags: review?(jshin1987)
Attachment #230603 -
Flags: review?(jshin1987)
Comment 58•18 years ago
|
||
The PR_LoadLibrary thing was actually better... You don't need to rebuild mozilla to enable/disable libthai support...
Comment 59•18 years ago
|
||
Attachment #230604 -
Flags: review?(jshin1987)
Comment 60•18 years ago
|
||
Attachment #230607 -
Flags: review?(jshin1987)
Comment 61•18 years ago
|
||
Mike Hommey (#58) wrote:
> The PR_LoadLibrary thing was actually better... You don't need to rebuild
> mozilla to enable/disable libthai support...
One drawback is that library names and directories for various platforms will have to be listed, right?
With --enable switch, you turn it on only if libthai is available on your platform, eliminating explicit conditions for different platforms in the code.
Any other points I miss?
Comment 62•18 years ago
|
||
Wouldn't the availability of libthai depend on installation instead of platform?
Comment 63•18 years ago
|
||
(In reply to comment #62)
Yes, libthai availability depends on installation. But its location and name depends on platform.
From my old patch in attachment #220509 [details] [diff] [review]:
+static const char* LibraryPaths[] = {
+ "/usr/local/lib",
+ "/usr/lib",
+ nsnull
+};
+
+nsLibThaiLineBreaker::nsLibThaiLineBreaker()
+ : mLibThai (nsnull)
+{
+ for (const char** ppLibPath = LibraryPaths; *ppLibPath; ++ppLibPath) {
+ char* pFullName = PR_GetLibraryName (*ppLibPath, "libthai.so.0");
+ if (pFullName) {
+ mLibThai = PR_LoadLibrary (pFullName);
+ PR_FreeLibraryName (pFullName);
+ if (mLibThai)
+ break;
+ }
+ }
+}
Yes, I can list all possible directories (/usr/lib, /usr/local/lib, c:/windows,
c:/Program\ Files/Firefox, etc.) and library names (libthai.so.0, libthai-0.dll, etc.) if this approach is chosen. Just feeling it's a bit awkward.
On the other hand, in distro maintainer's point of view, it might be convenient not to rebuild the whole suite to have libthai support.
This morning, I got another improvement idea: let's encapsulate all the mess and fallbacks for Thai in a class, say nsThaiLineBreaker. And the code in nsJISx4501LineBreaker would become a little bit cleaner. I will post the patch soon after testing it. But I will not switch back to PR_LoadLibrary yet at the moment, until we reach some decision.
Updated•18 years ago
|
Attachment #230603 -
Flags: review?(jshin1987)
Updated•18 years ago
|
Attachment #230604 -
Flags: review?(jshin1987)
Updated•18 years ago
|
Attachment #230607 -
Flags: review?(jshin1987)
Comment 64•18 years ago
|
||
Let me reiterate what i told you in the bugreport you did on debian: put it in a component. Then, the component can be linked against libthai without problem. It won't be loaded if libthai is not present at runtime.
Comment 65•18 years ago
|
||
I know. And my reply was that the approach was currently not supported in the code base. Currently, JISx4501LineBreaker is practically the sole LineBreaker ever created to handle Unicode string in general. There is no string analysis in Pango's manner which allows multiple engines to be created for individual scripts yet.
Comment 66•18 years ago
|
||
Ahem, I decide to cancel the idea about the nsThaiLineBreaker encapsulation. Leaving the fallback in nsJISx4501LineBreaker.cpp should be more beneficial for componentization in the future. The new component just does its job. And the fallback is there on its absence. Isn't that good?
Anyway, I've added some caching to nsLibThaiLineBreaker to improve its performance. I'm posting the new version here.
Comment 67•18 years ago
|
||
Attachment #230604 -
Attachment is obsolete: true
Comment 68•18 years ago
|
||
Attachment #230607 -
Attachment is obsolete: true
Comment 69•18 years ago
|
||
Re: the path of libthai
How about adding a pref. entry (to be set in about:config) to point to the list of candidate paths (with the reasonable *platform-dependent* values set by default)? On Windows, we need to do a little more than that ... This is not a very clean solution, but there's a precedent of a similar approach (for libfreetype), I think.
Comment 70•18 years ago
|
||
I agree with using about:config. Could you suggest some sample code?
Comment 71•18 years ago
|
||
Can we put "[line-breaking]" in "Status Whiteboard" filed ?
(just as Bug #157967 "Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF")
Updated•18 years ago
|
Blocks: line-breaking
Comment 72•18 years ago
|
||
Meta bug:
Bug #206152 [meta] line breaking bugs
Comment 73•18 years ago
|
||
(In reply to comment #69)
> This is not a very clean solution, but there's a precedent of a similar approach (for libfreetype), I think.
This is indeed very unclean, and FWIW, the libfreetype thing is useless nowadays...
Comment 74•18 years ago
|
||
(In reply to comment #73)
> the libfreetype thing is useless nowadays...
I know that at least as well as anybody else here.
Adding a pref. entry can be considered if there's no better way than hard-coding the library path or assuming the build-time availability to be the same as the run-time availability. (the latter will work if firefox is built for , say, a Thai linux distribution)
In reply to comment #70)
> I agree with using about:config. Could you suggest some sample code?
See
http://lxr.mozilla.org/seamonkey/source/gfx/src/freetype/nsFreeType.cpp#498
for a sample.
Comment 75•18 years ago
|
||
(In reply to comment #74)
> Adding a pref. entry can be considered if there's no better way than
> hard-coding the library path or assuming the build-time availability to be the
> same as the run-time availability. (the latter will work if firefox is built
> for , say, a Thai linux distribution)
There's also the possibility to do like with the mozgnome component: if the gnome libs are present at build time, build the component, and if they are not at run time, the component won't be loaded (the latter doesn't require anything, the component loading system just can't load a component when its dependencies are not here ; the former is not hard to do).
Comment 76•18 years ago
|
||
(In reply to comment #75)
All right, that would be perfect, as both gecko and libthai will then rely on component system to get connected, rather than hard-coding or low-level configuration. Can we make that interface Thai-specific for the moment, to be called from nsJISx4501LineBreaker for Thai chunks?
Being Thai-specific means only Thai Unicode range is supported for the moment, until a proper Pango-like text analysis is implemented.
I know nothing about XPCOM, either, anyway.
Comment 77•18 years ago
|
||
I managed to work out the mozlibthai component.
Attachment #230603 -
Attachment is obsolete: true
Comment 78•18 years ago
|
||
Attachment #230731 -
Attachment is obsolete: true
Attachment #230732 -
Attachment is obsolete: true
Comment 79•18 years ago
|
||
Hmm... It seems splitting the word breaker as a component has solved many issues. Now I have adjusted the patch so that:
- nsJISx4501LineBreaker _always_ tries to load a component of generic nsIThaiLineBreaker interface, with rule-based fallback on its absence. (That is, it's not guarded by libthai build condition any more.)
- The mozlibthai component installs itself as an implementation of nsIThaiLineBreaker interface, without direct link to libi18n.so. It will be built only if libthai is available at configure time, and --disable-libthai option is not set.
This way, distributions can build firefox with libthai support and split the mozlibthai component as a companion package, without enforcing dependency on non-Thai users. Meanwhile, a separate language pack is also possible on Windows in the same manner. (Thanks to Mike Hommey for mentioning mozgnome example.)
We can generalize the interface for more South East Asian languages later. It's still Thai-specific for the moment.
Comment 80•18 years ago
|
||
May I request for a review?
Attachment #230985 -
Attachment is obsolete: true
Attachment #231067 -
Flags: review?(jshin1987)
Comment 81•18 years ago
|
||
Attachment #230986 -
Attachment is obsolete: true
Attachment #231068 -
Flags: review?(jshin1987)
Comment 82•18 years ago
|
||
I have some updates for the last two patches. They happened to work by accident on my system, due to some trace left between the builds. Now I have moved the IDL into the component's directory, so that the .xpt file gets built and installed. Besides, the code is also adjusted a little bit.
Comment 83•18 years ago
|
||
Attachment #231067 -
Attachment is obsolete: true
Attachment #231067 -
Flags: review?(jshin1987)
Attachment #241552 -
Flags: review?(jshin1987)
Comment 84•18 years ago
|
||
Attachment #231068 -
Attachment is obsolete: true
Attachment #231068 -
Flags: review?(jshin1987)
Attachment #241553 -
Flags: review?(jshin1987)
Comment 85•18 years ago
|
||
Note: To test the patches, you need libthai, which is available at:
ftp://linux.thai.net/pub/thailinux/software/libthai
or libthai-dev package in Debian/Ubuntu.
Comment 86•18 years ago
|
||
jshin, can you review the latest patches?
simon, I wonder if you could help out and review to if jshin doesn't have cycles to review.
thanks
Comment 87•18 years ago
|
||
While caching up recent changes in trunk to update the patch, and weighing on alternatives, I think another way might be simpler: just replace nsJIS4051LineBreaker with full Pango break support.
I have updated the patch in Bug 336959 (Attachment 260003 [details] [diff]). And the performance seems now acceptable. So, please also consider such option.
Comment 88•17 years ago
|
||
According to my tests, Thai line breaking looks fixed (or at least very strongly enhanced) since the checkin of bug 255990. The recent checkin of bug 336959 did not change anything to the behavior.
If the current behavior is correct, maybe this bug can be closed ?
NB : For info, I created the following test page to help test the line breaking behavior for thai : http://jmdesp.free.fr/i18n/linebreak/LBThai.html
Is there a test in the testsuite already to check for that ?
Comment 89•17 years ago
|
||
Bug 336959 is for linux only, we still need help for windows and mac.
Comment 90•17 years ago
|
||
AFAIK, bug 255990 was about applying line breaks to ASCII chars. Probably, the formerly-bypassed rule-based Thai line breaker got called as a side-effect of the patch. But that's not what we need here, after all. We need more precise approch like dictionary-based breaker.
Bug 336959 makes Thai chunks be analyzed by Pango breaker, which in turn handles Thai line breaking with dictionary-based engine--what we try to achieve in this bug.
But it's only for Unix platforms. We still need more works on Windows and Mac.
Comment 92•17 years ago
|
||
(In reply to comment #90)
> Bug 336959 makes Thai chunks be analyzed by Pango breaker, which in turn
> handles Thai line breaking with dictionary-based engine--what we try to achieve
> in this bug.
>
> But it's only for Unix platforms. We still need more works on Windows and Mac.
>
Thep, though I see that using pango would be elegant for linux, why don't we try to land something like what we currently have in ubuntu?
Comment 93•17 years ago
|
||
Alexander, it would clearly help the general audience if you'd made a more precise statement on what ubuntu currently has.
Anyway, as probably said elsewhere, too, but at least on http://weblogs.mozillazine.org/roc/archives/2007/07/status_3.html, we went for using pango on linux, "and modern Pangos can hook up to libthai to make this happen." Thus I'm not sure why we would need to do additional stuff for linux/unix.
Comment 94•17 years ago
|
||
Alexander meaned my another patch currently shipped with Ubuntu. It calls libthai without Pango in between, in an XPCOM component. In other words, the last patches proposed in this bug, as inspired by Mike's suggestion.
Why not? Well, many reasons:
- You can see it has got no response so far, while the Pango patch gets reviewed and committed. But that's probably a non-scientific reason. :)
- It's roc's suggestion, which I agree at some degree. It's just simpler, while allowing supports for other languages like Lao and Khmer immediately when their Pango works are ready. (But availibility on other platforms is an affordable trade-off.)
- My patch here may still require more adjustments to be checked-in. The component communicates with nsJISx4501LineBreaker via nsIThaiLineBreaker interface, which, as suggested by its name, is still Thai-specific. A Pango-like layer will still be needed to separately analyze Lao and Khmer in the future. Actually, while I like to actually have such layer in Mozilla, the change may require architectural changes, which seem to be too huge for me to work alone without any comment to the initially proposed patch.
- Packaging is another issue. While libthai is available on Debian and Ubuntu, and probably other Linux distros as well, it's not available on Windows and Mac yet. The introduced external dependency may still require further re-arrangement, both for libthai itself and Mozilla's build system.
So, as Thai line breaking is now available in most platforms, simply exploiting them appears to be a shorter path.
Comment 95•17 years ago
|
||
So we need two additionnal bugs blocking this one :
- Use Uniscribe breaker to analyze Thai chunks under Windows
- Use ATSUI breaker to analyze Thai chunks under Mac OS
And if those two are done this bug will be fully solved.
It will be easier to get that in that it would have been to add libthai as an additional dependency under linux.
Comment 96•17 years ago
|
||
Oups, I meant "than to add libthai as an additional dependency under windows and Mac OS"
Comment 97•17 years ago
|
||
Bug 389520 has been filed for Mac.
Comment 98•17 years ago
|
||
Bug #390048 has been filed for Windows.
Comment 99•17 years ago
|
||
Since Bug #336959 (Linux), Bug #389520 (Mac) and Bug #390048 (Windows) are all fixed and checked-in, the current dependent Bug #299324 (ICU) approach is obsolete. Could we change the dependent list so that this bug will get fixed?
Comment 100•17 years ago
|
||
In fact, it seems to me this means this bug is now fixed ?
Can we close as fixed and thus get rid of another 4 digit bug :-) ?
Comment 101•17 years ago
|
||
(In reply to comment #100)
> In fact, it seems to me this means this bug is now fixed ?
> Can we close as fixed and thus get rid of another 4 digit bug :-) ?
Agreed.
Assignee | ||
Comment 102•17 years ago
|
||
If I understand the last few comments correctly, the correct resolution here is WONTFIX, since Thai line breaking has been fixed by other means.
Status: NEW → RESOLVED
Closed: 20 years ago → 17 years ago
Resolution: --- → WONTFIX
Updated•16 years ago
|
Attachment #241552 -
Flags: review?(jshin1987)
Comment 103•16 years ago
|
||
Comment on attachment 241552 [details] [diff] [review]
Patch to lwbrk so it calls mozlibthai component at appropriate places
clear outstanding reviews
Updated•16 years ago
|
Attachment #211242 -
Flags: review?(jshin1987)
Updated•16 years ago
|
Attachment #241553 -
Flags: review?(jshin1987)
You need to log in
before you can comment on or make changes to this bug.
Description
•