Closed Bug 1730809 Opened 3 years ago Closed 3 years ago

Make WordBreaker::GetClass public

Categories

(Core :: Internationalization, task)

task

Tracking

()

RESOLVED WONTFIX

People

(Reporter: Jamie, Assigned: Jamie)

References

Details

Attachments

(1 obsolete file)

In a fantastic example of bad timing, I've been working on patches over in bug 1729407 which rely on WordBreaker::GetClass... but the just landed bug 1728708 made it private!

Currently, accessibility uses nsIFrame::PeekOffset to calculate word boundaries. We want to switch to calculating word boundaries ourselves. Among other reasons, this is necessary because we will be caching text from content processes in the parent process and we can't use layout in this case.

Our boundaries still need to be (mostly) consistent with layout, but layout has a slightly different concept of words to WordBreaker. For example, it doesn't treat space or punctuation as a separate word. We also need to handle words which are broken across nodes.

I was using WordBreaker::GetClass to implement this functionality. For example, we might need to scan backwards (potentially across nodes) looking for punctuation or space.

Could this be made public again? I'm happy to write the (trivial) patch.

Flags: needinfo?(aethanyc)
Blocks: 1729407

These were made private in bug 1728708.
However, accessibility will need them as part of the new text implementation in bug 1729407.

Assignee: nobody → jteh
Status: NEW → ASSIGNED

Ah, this is really a bad timing. Our current WordBreaker is old and doesn't conform to the Unicode Text Segmentation standard and it also doesn't find correct word boundary for Chinese, Japanese (bug 774965), or other East Asian languages like Thai, Lao, etc. So we are in the process of cleaning it up, and replacing it with a new word breaker implementation in Rust in bug 1684927. WordBreaker::GetClass is made private because it is a implementation detail of the WordBreaker, and I don't want people to introduce a new usage for it.

Jamie: I see you are crafting a WordBreak for accessbility usage in Bug 1729407 part 3, and pointing out there are bugs in nsIFrame::PeekOffset that prevent you from using it. Could you elaborate a bit more? I'm not familiar with the code, but maybe Jonathan can chime in and we can figure out the possible ways to moving forward.

Flags: needinfo?(aethanyc) → needinfo?(jteh)

(In reply to Ting-Yu Lin [:TYLin] (UTC-7) from comment #2)

Jamie: I see you are crafting a WordBreak for accessbility usage in Bug 1729407 part 3, and pointing out there are bugs in nsIFrame::PeekOffset that prevent you from using it. Could you elaborate a bit more?

It's debatable as to whether they are bugs, quirks, just incompatibilities with what a11y needs or all of the above. As an example, layout is somewhat asymmetric with respect to punctuation. For example, if you have this:
a @b c
Next word from "a" will put you on the "@", but previous word from "c" will put you on the "b". A11y really needs word boundaries to be symmetric, regardless of traversal direction. Many of the bugs (both open and closed) blocking bug 368895 were caused in some way by Peekoffset not giving a11y what it expected.

Aside from that, the more important issue (as noted in comment 0) is that we need to be able to get word boundaries from cached text. For performance and stability reasons, we're moving towards caching the entire a11y tree of content processes in the parent process. That means we don't have access to the layout frame tree, so we can't use PeekOffset. The alternative would be sending all word boundaries to the parent process and caching them, but that is pretty expensive.

Flags: needinfo?(jteh)

(In reply to Ting-Yu Lin [:TYLin] (UTC-7) from comment #2)

we are in the process of cleaning it up, and replacing it with a new word breaker implementation in Rust in bug 1684927.

Will this still be usable from C++? I assume so, since layout needs it.

WordBreaker::GetClass is made private because it is a implementation detail of the WordBreaker, and I don't want people to introduce a new usage for it.

The other alternative is for me to copy the layout code for determining punctuation and space from ClusterIterator. I was hoping to use existing code, but if that's the only path forward, I'll do it.

Flags: needinfo?(aethanyc)

Based on comment 2, I think it's going to be best for a11y to maintain its own code for this. We only really need to know about punctuation and space anyway; the other WordBreakClass values aren't relevant to us.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
Attachment #9241261 - Attachment is obsolete: true

Thanks for the explanation in comment 3. If a11y only needs to know punctuation and space for a given character, and the text cache lives in parent process, then it is reasonable to me to design a11y's own logic. I'm glad this bug is not a road blocker for you.

Will this still be usable from C++? I assume so, since layout needs it.

Yes, the new Rust word breaker will have FFI for C++ consumers.

Flags: needinfo?(aethanyc)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: