Closed Bug 345823 Opened 18 years ago Closed 1 year ago

Implement Unicode word breaking (UAX #29, section 4)

Status:

RESOLVED DUPLICATE of bug 1719535

People

(Reporter: uriber, Assigned: m_kato)

References

(Blocks 1 open bug, URL)

Details

(Keywords: intl)

Uri Bernstein (Google)

Reporter

Description

•

18 years ago

We currently use two separate algorithms to determine word boundaries (for Ctrl+Left/Right, double-click, etc.). One handles "ASCII"-only (really, Latin-1-only) text and is implemented directly in nsTextTransformer. The other is a very simplistic algorithm implemented by nsSampleWordBreaker, used for any text that contains non-"ASCII" characters. We should replace both (or at least nsSampleWordBreaker) with a word breaker that implements the Unicode word breaking algorithm described in section 4 of UAX #29 (see URL). See bug 56652 for the equivalent line-breaking issue.
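To illustrate the difference in behavior, here is a minimal Python sketch, not the Gecko code. It contrasts a hypothetical class-change breaker with a small subset of the UAX #29 word rules (WB5 through WB12). The function names, the tiny punctuation sets, and the use of `str.isalpha()` / `str.isdecimal()` as stand-ins for the ALetter and Numeric properties are all assumptions made for illustration; a real implementation would use the full Word_Break property data and the remaining rules.

```python
from itertools import groupby

# Assumed, heavily reduced stand-ins for the UAX #29 property classes.
MID_LETTER = {"'", "\u2019", "."}      # may join letters: can't, example.com
MID_NUM = {",", ".", "'", "\u2019"}    # may join digits: 3.14, 1,000


def char_class(ch):
    """Approximate Word_Break class: L(etter), N(umeric) or O(ther)."""
    if ch.isalpha():
        return "L"
    if ch.isdecimal():
        return "N"
    return "O"


def naive_segments(text):
    """Hypothetical simplistic breaker: break on every class change."""
    return ["".join(group) for _, group in groupby(text, key=char_class)]


def no_break_between(text, i):
    """True if the subset of rules forbids a break before text[i]."""
    prev, cur = text[i - 1], text[i]
    before_prev = text[i - 2:i - 1]   # "" when out of range
    after_cur = text[i + 1:i + 2]     # "" when out of range
    pc, cc = char_class(prev), char_class(cur)
    # WB5, WB8, WB9, WB10: letters and digits stick together.
    if pc in "LN" and cc in "LN":
        return True
    # WB6: letter x mid-letter, when another letter follows.
    if pc == "L" and cur in MID_LETTER and char_class(after_cur) == "L":
        return True
    # WB7: mid-letter x letter, when a letter precedes.
    if prev in MID_LETTER and cc == "L" and char_class(before_prev) == "L":
        return True
    # WB12: digit x mid-num, when another digit follows.
    if pc == "N" and cur in MID_NUM and char_class(after_cur) == "N":
        return True
    # WB11: mid-num x digit, when a digit precedes.
    if prev in MID_NUM and cc == "N" and char_class(before_prev) == "N":
        return True
    return False  # WB999: otherwise, break.


def segments(text):
    """Split text into segments using the reduced rule set above."""
    if not text:
        return []
    out, start = [], 0
    for i in range(1, len(text)):
        if not no_break_between(text, i):
            out.append(text[start:i])
            start = i
    out.append(text[start:])
    return out
```

With this sketch, `naive_segments("can't")` splits the word at the apostrophe, while `segments("can't stop")` keeps `can't` as one segment and `segments("end.")` still separates the trailing full stop, which is the kind of behavior double-click selection and Ctrl+Left/Right would want.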

Jungshik Shin

Comment 1

•

18 years ago

Another related bug is bug 229896 (for grapheme clusters). This should be filed under i18n, because it could potentially be used in places other than layout.

Component: Layout: Fonts and Text → Internationalization

Uri Bernstein (Google)

Reporter

Updated

•

18 years ago

Assignee: nobody → smontagu

QA Contact: layout.fonts-and-text → amyy

Uri Bernstein (Google)

Reporter

Updated

•

18 years ago

Blocks: word-select

Phil Ringnalda (:philor)

Updated

•

15 years ago

QA Contact: amyy → i18n

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

4 years ago

Blocks: segmenter

Makoto Kato [:m_kato]

Assignee

Updated

•

3 years ago

Assignee: smontagu → m_kato

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Ting-Yu Lin [:TYLin] (UTC-8)

Comment 2

•

1 year ago

We've integrated the ICU4X word segmenter, which is UAX #29 compatible, in bug 1719535.

Status: NEW → RESOLVED

Closed: 1 year ago

Duplicate of bug: 1719535

Resolution: --- → DUPLICATE

Categories

(Core :: Internationalization, defect)
