Open Bug 1125644 Opened 10 years ago Updated 2 years ago

Word Joiner (Unicode U+2060) doesn't inhibit line breaks at some characters like U+2009 (Thin Space,  )

Categories: Core :: Layout: Text and Fonts (defect)
Platform: x86_64, Linux
People: Reporter cujyaz; Unassigned
References: Depends on 1 open bug; blocks 1 open bug
Attachments: 1 file

Attached file: testcase (deleted)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0 SeaMonkey/2.31
Build ID: 20141206170726

Steps to reproduce:
Load the attached testcase.

Actual results:
Word Joiner (Unicode U+2060) doesn't inhibit line breaks, particularly after the spacing characters U+2000..U+200A, where it is useful for turning them into non-breaking spaces. A few other cases worked after bug 911849 was fixed, but they don't work in today's nightly either.

Expected results:
Word Joiner (Unicode U+2060) should inhibit a line break after every character except U+200B (Zero Width Space). http://unicode.org/reports/tr14/
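Since the original testcase has been deleted, here is a minimal sketch of how a comparable one could be regenerated. This is a hypothetical reconstruction, not the deleted attachment: it writes one narrow box per spacing character U+2000..U+200A, each followed by U+2060, plus U+200B as a control case where a break is still expected. The function name `make_testcase` and the `title` labels are illustrative assumptions.

```python
# Hypothetical reconstruction of a testcase in the spirit of the deleted attachment.
# It is NOT the original file; it only exercises the cases described in the report.


def make_testcase() -> str:
    rows = []
    # U+2000..U+200A are the spacing characters named in the report;
    # U+200B ZERO WIDTH SPACE is added as a control case.
    for cp in list(range(0x2000, 0x200B)) + [0x200B]:
        # Per UAX #14, LB11 forbids a break around WORD JOINER, but LB8
        # (break after ZW) is applied earlier, so U+200B still allows one.
        expect = "break allowed" if cp == 0x200B else "no break"
        rows.append(
            f'\t\t<div title="U+{cp:04X}: {expect}">'
            f"Hello,&#x{cp:04X};&#x2060;World</div>"
        )
    return "\n".join([
        "<!DOCTYPE html>",
        '<html lang="en">',
        "\t<head>",
        '\t\t<meta charset="utf-8"/>',
        "\t\t<title>Word Joiner after spacing characters</title>",
        "\t\t<style>div { outline: 1px solid red; margin: 10px; "
        "width: 32px; font-size: 16px; }</style>",
        "\t</head>",
        "\t<body>",
        *rows,
        "\t</body>",
        "</html>",
    ])


if __name__ == "__main__":
    print(make_testcase())
```

Saving the output to an `.html` file and loading it shows, box by box, which characters the engine lets break despite the Word Joiner; hovering a box shows the expected behaviour in its tooltip.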
Argh, I used the wrong build to test. There is actually no recent regression, but Word Joiner still doesn't work after some (but not all) of the characters where it should. Summary adjusted.
Summary: Word Joiner (Unicode U+2060) doesn't inhibit line breaks → Word Joiner (Unicode U+2060) doesn't inhibit line breaks at some characters like U+2009 (Thin Space,  )

I know this is an old bug, but this is still a problem in the latest Firefox Developer Edition [66.0b5 (64-bit) on macOS at least].

Safari and Chrome both exhibit the expected behaviour on macOS.

Hello!

What kind of "confirmation" is needed to confirm the issue and (eventually, I hope) get it fixed? I can personally confirm that the problem still exists. :)

The Unicode standard also "can confirm", because it absolutely requires U+2060 WORD JOINER to retain its break-blocking property.

I made a very simple demo; it works as expected (no line breaks) in Safari and Chrome, but not in Firefox.

<!DOCTYPE html>
<html lang="en">
	<head>
		<meta charset="utf-8"/>
		<title>Demo for #1125644</title>
		<style>
			div {
				outline: 1px solid red;
				margin: 10px;
				width: 32px;
				font-size: 16px;
			}
		</style>
	</head>
	<body>
		<!-- U+200A HAIR SPACE + U+2060 WORD JOINER -->
		<div>Hello,&#x200A;&#x2060;World</div>

		<!-- U+2009 THIN SPACE + U+2060 WORD JOINER -->
		<div>Hello,&#x2009;&#x2060;World</div>
	</body>
</html>

You can also copy the strings from my demo and paste them into this Unicode Utility (in "Line" test mode) to verify that there should be no break opportunity when U+2060 WJ is used.

Presumably the new unified segmenter will fix this (bug 1684927). Ting-Yu, do you know if there's a way to test it in Gecko yet?

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(aethanyc)

Yes, this is a bug in our old line breaker, per rule LB11 of the Unicode Line Breaking Algorithm: https://www.unicode.org/reports/tr14/#Algorithm

LB11: Do not break before or after Word joiner and related characters.

This bug should be fixed once we integrate ICU4X's line segmenter into Gecko. There is currently no way to test it in Gecko, but we are targeting 2022Q2 for the integration.
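To illustrate why LB11 blocks the break in the demo, here is a deliberately simplified sketch of a pair-table line breaker. It is not Gecko's or ICU4X's implementation. It assumes a hand-picked subset of line break classes (U+2000..U+2006 and U+2008..U+200A as BA, U+2007/U+00A0/U+202F as GL, U+200B as ZW, U+2060 as WJ, U+0020 as SP) and lumps every other character, including punctuation, into AL. Only the rules needed for these cases (LB7, LB8, LB11, LB12, LB12a, LB18, LB21, a reduced LB28, LB31) are applied, in UAX #14 order. The function names are hypothetical.

```python
# Simplified illustration of UAX #14 rule ordering; NOT a complete line breaker.
SP, ZW, WJ, GL, BA, AL = "SP", "ZW", "WJ", "GL", "BA", "AL"


def line_break_class(ch: str) -> str:
    """Hypothetical classifier covering only the code points discussed here."""
    cp = ord(ch)
    if cp == 0x0020:
        return SP
    if cp == 0x200B:
        return ZW
    if cp == 0x2060:
        return WJ
    if cp in (0x00A0, 0x2007, 0x202F):
        return GL
    if 0x2000 <= cp <= 0x200A:  # U+2007 was already classified as GL above
        return BA
    return AL  # crude fallback: everything else is treated as alphabetic


def break_opportunities(text: str) -> list[int]:
    """Return indices i where a line break is allowed before text[i]."""
    classes = [line_break_class(c) for c in text]
    result = []
    for i in range(1, len(text)):
        before, after = classes[i - 1], classes[i]
        # LB7: do not break before spaces or ZERO WIDTH SPACE.
        if after in (SP, ZW):
            continue
        # LB8: break after ZW, even if spaces intervene (ZW SP* ÷).
        j = i - 1
        while j >= 0 and classes[j] == SP:
            j -= 1
        if j >= 0 and classes[j] == ZW:
            result.append(i)
            continue
        # LB11: do not break before or after WORD JOINER.
        if before == WJ or after == WJ:
            continue
        # LB12: do not break after glue such as NO-BREAK SPACE.
        if before == GL:
            continue
        # LB12a: do not break before glue, except after SP or BA.
        if after == GL and before not in (SP, BA):
            continue
        # LB18: break after spaces.
        if before == SP:
            result.append(i)
            continue
        # LB21: do not break before BA characters such as THIN SPACE.
        if after == BA:
            continue
        # LB28 (reduced): do not break between alphabetics.
        if before == AL and after == AL:
            continue
        # LB31: break everywhere else (this is what allows "THIN SPACE ÷ W").
        result.append(i)
    return result


if __name__ == "__main__":
    for s in ("Hello,\u2009World", "Hello,\u2009\u2060World",
              "Hello,\u200b\u2060World"):
        print(repr(s), break_opportunities(s))
```

Without the Word Joiner, the thin space falls through to LB31 and a break is allowed before "World"; with it, LB11 fires first and suppresses that break. U+200B is the exception because LB8 is checked before LB11, which matches the expected results in the original report.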

Depends on: segmenter
Flags: needinfo?(aethanyc)

Wow, I will look forward to it, thank you!

Severity: normal → S3