Closed
Bug 1290022
Opened 8 years ago
Closed 4 years ago
Investigating we can use Rust based xi-unicode as Linebreaker
Categories
(Core :: Internationalization, defect, P3)
Core
Internationalization
Tracking
()
RESOLVED
WONTFIX
Tracking | Status | |
---|---|---|
firefox50 | --- | affected |
People
(Reporter: m_kato, Assigned: m_kato)
References
Details
Attachments
(2 files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
Actually, our legacy line breaker is too old. Although I am considering that it replace with ICU (bug 820261), it causes that binary size (including data size) is 2-3+MB.
Now, servo uses xi-unicode, so we will be able to use it for replacement.
Assignee | ||
Comment 1•8 years ago
|
||
xi-unicode supports Unicode 8.0. I send a PR to update to 9.0 (rev 37). Also, our line breaker rule isn't same as UAX#14 and pretty omplex...
Comment 2•8 years ago
|
||
What would the codesize of xi-unicode?
I wonder whether we can replace our bidi engine with unicode-bidi crate as well.
Comment 3•8 years ago
|
||
Simon, thoughts?
+1 on sharing unicode handling with servo.
Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #2)
> What would the codesize of xi-unicode?
>
> I wonder whether we can replace our bidi engine with unicode-bidi crate as
> well.
This data is on Linux 64 opt. about 30KB.
Before: libxul.so ... 93709728 bytes
After: libxul.so ... 93738960 bytes
This data is just UAX#14 rule only. We need additional rules for historical fix.
Assignee | ||
Comment 5•8 years ago
|
||
(WIP doesn't remove old table to 8bit character. So If removing old table, it reduces increase of size.)
Assignee | ||
Comment 6•8 years ago
|
||
The following test isn't same rule against UAX#14.
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/between-whitespaces.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/chemical-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/datetime-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/emoji-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/hyphens-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/hyphens-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/leaders-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/markup-src-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/parentheses-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/quotationmarks-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/smileys-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/smileys-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-1.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-2.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/url-3.html
REFTEST TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/layout/reftests/line-breaking/winpath-1.html
We need add more rules for old compatibility...
Comment 7•8 years ago
|
||
> I wonder whether we can replace our bidi engine with unicode-bidi crate as well.
I would not recommend this right now. unicode-bidi is not quite spec compliant yet, and it also is missing some significant optimizations compared to Gecko. (I consulted the Gecko code while writing the unicode-bidi crate, and there are issues filed for the optimization work.)
Assignee | ||
Comment 8•8 years ago
|
||
Assignee | ||
Comment 9•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Priority: -- → P3
Comment 10•8 years ago
|
||
Comment on attachment 8793196 [details] [diff] [review]
WIP: Part 1. Import xi-unicode and capi to our tree
Review of attachment 8793196 [details] [diff] [review]:
-----------------------------------------------------------------
Do we need to vendor xi-unicode inside intl/?
In case you wish to edit xi-unicode, this may make sense (but then you'd need to come up with a syncing mechanism).
If you don't (I personally prefer this -- if you do need to make a change to xi-unicode best do it upstream), using the solution for rust-url-capi and mp4parse-capi would be better -- use the regular Rust build system cargo file.
We did this with rust-url-capi (https://hg.mozilla.org/mozilla-central/rev/54a28d6fbed4, parent commit contains the vendoring and is huge). In this case we had more transitive dependencies.
To vendor, add xi-unicode-capi as a path dependency to toolkit/library/rust, add xi-unicode as a regular version dependency to xi-unicode-capi; cd to the rust folder and run cargo update; and then run `cargo vendor ../../../third_party/rust`. May need to do something similar for gtest.
Comment 11•8 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #3)
> Simon, thoughts?
>
> +1 on sharing unicode handling with servo.
More Rust code in Gecko and more code shared with Servo sounds good in general, but no opinion beyond that. I’m not familiar with xi-unicode (other than it exists and gives better results than what Servo had before) or Gecko’s line breaking code.
Assignee | ||
Comment 12•8 years ago
|
||
Like Manish's comment (comment #10), I already create capi crate as https://github.com/makotokato/xi-unicode_capi for WIP. Big problem is that line breaker rule into Gecko isn't same as UAX#14. Due to a lot of histrionically reason and bugs (ex. adding URL breaking rule), it is too complex.
But we may be able to use this when implementing line-break property in Gecko. So I need discuss Ishii-san about strict line breaker rule.
Assignee | ||
Comment 13•4 years ago
|
||
I decide that I can rewrite new line breaker next year. closed as WONTFIX.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•