Closed
Bug 345823
Opened 18 years ago
Closed 1 year ago
Implement Unicode word breaking (UAX #29, section 4)
Categories
(Core :: Internationalization, defect)
Tracking
RESOLVED DUPLICATE of bug 1719535
People
(Reporter: uriber, Assigned: m_kato)
References
(Blocks 1 open bug)
Details
(Keywords: intl)
We currently use two separate algorithms to determine word boundaries (for ctrl-left/right, double-click, etc.).
One handles "ASCII"-only (really, Latin-1-only) text and is implemented directly in nsTextTransformer. The other is a very simplistic algorithm, implemented by nsSampleWordBreaker, that is used for any text containing non-"ASCII" characters.
We should replace both (or at least nsSampleWordBreaker) with a word breaker that implements the Unicode word breaking algorithm, described in section 4 of UAX #29 (see URL).
See bug 56652 for the equivalent line-breaking issue.
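For reference, the Python below sketches a subset of the UAX #29 word boundary rules. It is a minimal, heavily simplified sketch. It is not nsSampleWordBreaker, ICU4X, or any Gecko code, and the function names (`word_break_class`, `is_boundary`, `word_boundaries`, `split_words`) are hypothetical. The sketch makes several assumptions:

* Character classes are approximated with `str.isalpha()`, `str.isdecimal()` and a few hand-picked punctuation sets, rather than the real Word_Break property data.
* Rule WB3c is omitted.
* Rule WB4 is omitted, so Extend, Format and ZWJ characters are not ignored.
* The Hebrew_Letter and Katakana rules are omitted.
* Rules WB15 and WB16 (regional indicators) are omitted.

```python
# Minimal, simplified sketch of a subset of the UAX #29 word boundary rules.
# Assumption: the character classes below are hand-picked approximations; a
# real implementation would use the Word_Break property data from the UCD.

MID_LETTER = {":", "\u00b7", "\u2027"}
# MidNumLet plus Single_Quote (together called MidNumLetQ in UAX #29).
MID_NUM_LET_Q = {".", "'", "\u2018", "\u2019", "\u2024"}
MID_NUM = {",", ";"}
EXTEND_NUM_LET = {"_"}
NEWLINE = {"\u000b", "\u000c", "\u0085", "\u2028", "\u2029"}


def word_break_class(ch: str) -> str:
    """Approximate the Word_Break property value of a single character."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch in NEWLINE:
        return "Newline"
    if ch == " ":
        return "WSegSpace"
    if ch.isdecimal():
        return "Numeric"
    if ch.isalpha():
        # Approximation: str.isalpha() also accepts e.g. CJK ideographs,
        # which are not ALetter in UAX #29.
        return "ALetter"
    if ch in MID_LETTER:
        return "MidLetter"
    if ch in MID_NUM_LET_Q:
        return "MidNumLetQ"
    if ch in MID_NUM:
        return "MidNum"
    if ch in EXTEND_NUM_LET:
        return "ExtendNumLet"
    return "Other"


def is_boundary(classes: list[str], i: int) -> bool:
    """True if a word boundary falls between classes[i - 1] and classes[i]."""
    before, after = classes[i - 1], classes[i]
    before2 = classes[i - 2] if i >= 2 else None
    after2 = classes[i + 1] if i + 1 < len(classes) else None
    mid_letter = ("MidLetter", "MidNumLetQ")
    mid_num = ("MidNum", "MidNumLetQ")

    if before == "CR" and after == "LF":  # WB3
        return False
    if before in ("CR", "LF", "Newline") or after in ("CR", "LF", "Newline"):
        return True  # WB3a, WB3b
    if before == "WSegSpace" and after == "WSegSpace":  # WB3d
        return False
    if before in ("ALetter", "Numeric") and after in ("ALetter", "Numeric"):
        return False  # WB5, WB8, WB9, WB10
    if before == "ALetter" and after in mid_letter and after2 == "ALetter":
        return False  # WB6
    if before2 == "ALetter" and before in mid_letter and after == "ALetter":
        return False  # WB7
    if before2 == "Numeric" and before in mid_num and after == "Numeric":
        return False  # WB11
    if before == "Numeric" and after in mid_num and after2 == "Numeric":
        return False  # WB12
    if before in ("ALetter", "Numeric", "ExtendNumLet") and after == "ExtendNumLet":
        return False  # WB13a
    if before == "ExtendNumLet" and after in ("ALetter", "Numeric"):
        return False  # WB13b
    return True  # WB999: otherwise, break everywhere


def word_boundaries(text: str) -> list[int]:
    """Return boundary offsets (Python string indices, i.e. code points)."""
    if not text:
        return []
    classes = [word_break_class(ch) for ch in text]
    inner = [i for i in range(1, len(text)) if is_boundary(classes, i)]
    return [0] + inner + [len(text)]  # WB1, WB2: break at start and end of text


def split_words(text: str) -> list[str]:
    """Split text into the segments between consecutive word boundaries."""
    bounds = word_boundaries(text)
    return [text[start:end] for start, end in zip(bounds, bounds[1:])]
```

For example, `split_words("can't stop")` returns `["can't", " ", "stop"]` because WB6/WB7 keep the apostrophe inside the word. Similarly, `split_words("3.14 apples")` keeps `"3.14"` together because of WB11/WB12. Note that the offsets are code point indices, whereas Gecko text is UTF-16, so a real integration would also have to map offsets.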
Comment 1•18 years ago
Another related bug is bug 229896 (for grapheme clusters). This would be better filed under i18n, because it could potentially be used in places other than layout.
Component: Layout: Fonts and Text → Internationalization
Reporter
Updated•18 years ago
Assignee: nobody → smontagu
QA Contact: layout.fonts-and-text → amyy
Reporter
Updated•18 years ago
Blocks: word-select
Updated•15 years ago
QA Contact: amyy → i18n
Assignee
Updated•3 years ago
Assignee: smontagu → m_kato
Updated•2 years ago
Severity: normal → S3
Comment 2•1 year ago
We've integrated the ICU4X word segmenter, which is UAX #29 compatible, in bug 1719535.