Closed Bug 724531 Opened 13 years ago Closed 11 years ago

import ICU library into the mozilla tree

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: jfkthame, Assigned: mozillabugs)

References

(Blocks 3 open bugs)

Details

Attachments

(3 files, 4 obsolete files)

Import the ICU4C source (see http://site.icu-project.org/download).

Current release is 4.8.1.1, though we should be able to drop in new versions as they're released with minimal disruption.

Note that ICU intends to change the version numbering scheme, so that the next release after the 4.8 series will be 49, then 50, etc.
If this is to be used by the JS engine and also by other parts of Gecko, where in the tree do we want the code to live?
Depends on: 724533
Blocks: 724533
No longer depends on: 724533
Are we sure we want to do this?  ICU is *huge*.  The copy that chrome ships with is 9.5 MB, which is about 62% of the size of libxul.
Blocks: 724534
Blocks: 724535
Blocks: 724538
Blocks: 724540
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #2)
> Are we sure we want to do this?  ICU is *huge*.  The copy that chrome ships
> with is 9.5 MB, which is about 62% of the size of libxul.

Yes, ICU is huge.... most of that is data, however. That data would either
(a) replace data we're currently shipping (e.g. unicode character properties, charset mappings, locale-specific formatting); or
(b) support new features that we want to add (e.g. JS i18n APIs), and can't unless we ship the data in some form; or
(c) not be required for any feature we offer, in which case we can customize the ICU build to exclude it.

This was initially motivated by the upcoming ECMAscript globalization APIs that we want to be able to implement, but if we go this way we'll want to replace a bunch of existing code and data with ICU-based implementations, which should do a lot to offset the size of ICU.

It is of course possible we'll decide this isn't the best way forward, but it seems worth investigating, at least.
Assignee: nobody → mozillabugs
Regarding the legal bits, Gervase Markham quoth "Yes, the ICU licence is fine."
I'm really glad to see this development in Mozilla. I fully agree with what Jonathan wrote and I'm more than willing to help with ICU customization (as I and others at Google have done for Chrome and Android).
Blocks: 820260
Blocks: 820261
This script is based on the one to import Breakpad (toolkit/crashreporter/update-breakpad.sh). It omits major chunks of ICU that are not needed to implement the ECMAScript Internationalization API (bug 769872), but may be needed later to implement other features in Mozilla. When run against the ICU 50.1.1 release sources, the script imports ~3250 files out of ~5350.

Two questions:
- Is a top-level directory "icu" the right place to put ICU, or is there another directory where it fits in?
- I put the ICU sources into a subdirectory icu/icu-src to keep ICU-maintained files separate from Mozilla-maintained ones (and to allow for possibly necessary configure.in and Makefile.in), but it makes for an awkward hierarchy icu/icu-src/source. Should icu and icu-src be merged?
Attachment #697616 - Flags: feedback?(ted)
Comment on attachment 697616 [details] [diff] [review]
Script to import ICU library into the Mozilla tree (WIP)

Review of attachment 697616 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable.
Attachment #697616 - Flags: feedback?(ted) → feedback+
(In reply to Norbert Lindenberg from comment #6)
> - Is a top-level directory "icu" the right place to put ICU, or is there
> another directory where it fits in?

brendan has to sign off on new top-level directories, but that seems fine to me. Alternately you could put it in intl/icu.

> - I put the ICU sources into a subdirectory icu/icu-src to keep
> ICU-maintained files separate from Mozilla-maintained ones (and to allow for
> possibly necessary configure.in and Makefile.in), but it makes for an
> awkward hierarchy icu/icu-src/source. Should icu and icu-src be merged?

That does sound pretty awkward. For Breakpad we just stick our Makefile.ins alongside the Breakpad code and don't worry about it too much. As long as your update script does things correctly it shouldn't matter.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> brendan has to sign off on new top-level directories, but that seems fine to
> me. Alternately you could put it in intl/icu.

I discussed this with Brendan; he prefers intl/icu.
Blocks: es-intl
Moved ICU sources to intl/icu and update script into intl/.
Attachment #697616 - Attachment is obsolete: true
Attachment #710032 - Flags: review?(ted)
Comment on attachment 710032 [details] [diff] [review]
Script to import ICU library into the Mozilla tree

Review of attachment 710032 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/update-icu.sh
@@ +1,1 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public

needs a #!/bin/sh
Attachment #710032 - Flags: review?(ted) → review+
Updated with #!/bin/bash. Carrying r+ted.
Attachment #710032 - Attachment is obsolete: true
Attachment #715028 - Flags: review+
Comment on attachment 715028 [details] [diff] [review]
Script to import ICU library into the Mozilla tree

Review of attachment 715028 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/update-icu.sh
@@ +1,1 @@
> +#!/bin/bash

/bin/sh
(In reply to Mike Hommey [:glandium] from comment #13)

> > +#!/bin/bash
> 
> /bin/sh

I used /bin/bash because I don't have a non-bash sh to test with, and update-breakpad.sh, after which this script is modeled, is using the same. Why should the script reference a tool that we don't actually use?
AFAICT your script doesn't have any bashism, and it should work in any posix shell, thus /bin/sh.
Updated per comment 13. Carrying r+ted.
Attachment #715028 - Attachment is obsolete: true
Attachment #715882 - Flags: review+
Attached patch Update license.html (obsolete) (deleted) — Splinter Review
Updated toolkit/content/license.html to contain the relevant portions of
http://source.icu-project.org/repos/icu/icu/trunk/license.html

1) Added intl/icu to the existing section with the ICU license and updated the year.

2) Added a section with the Unicode license. Included a reference to js/src/vm, which also uses Unicode data.

3) Did not add a section for cjdict.txt because we're not currently importing that file.

4) Did not add a section for the Time Zone Database because there's no requirement to do so.
Attachment #718147 - Flags: review?(gerv)
Attached patch Imported ICU source files (deleted) — Splinter Review
This attachment is an abbreviated form of the full patch importing the ICU source files; it includes the SVN info with the tag used and the names of all 3239 new files. The full patch is available at
http://lindenbergsoftware.com/mozilla/icu-src.patch.gz
Even in gzip form it's over the 4MB limit for Bugzilla attachments.

The patch was generated by running
$ cd intl
$ ./update-icu.sh http://source.icu-project.org/repos/icu/icu/tags/release-50-1-2

r? requests permission to land the imported ICU sources, not actual review of ICU.
Attachment #718157 - Flags: review?(dmandelin)
Comment on attachment 718157 [details] [diff] [review]
Imported ICU source files

Review of attachment 718157 [details] [diff] [review]:
-----------------------------------------------------------------

Just to avoid possible confusion: 

- Let's not land this until the licensing file changes are reviewed by Gerv and landed, so we know that the licensing issues have been squared away.
- This r+ covers landing the sources to the tree. Including them in any build is a separate step that we're working through with responsible people for each product.
Attachment #718157 - Flags: review?(dmandelin) → review+
Comment on attachment 718147 [details] [diff] [review]
Update license.html

Re: The Unicode License, you only need the parts between "Copyright © 1991-2012 Unicode" and "copyright holder", inclusive. The rest is not necessary. r=gerv with that change.

Gerv
Attachment #718147 - Flags: review?(gerv) → review+
Attached patch Update license.html (deleted) — Splinter Review
Updated per comment 20. Carrying r+gerv.
Attachment #718147 - Attachment is obsolete: true
Attachment #718512 - Flags: review+
Attachment #718512 - Flags: checkin?(jwalden+bmo)
Attachment #715882 - Flags: checkin?(jwalden+bmo)
Comment on attachment 718157 [details] [diff] [review]
Imported ICU source files

The updated full patch carrying r+dmandelin is at
http://lindenbergsoftware.com/mozilla/icu-src.patch.gz
Attachment #718157 - Flags: checkin?(jwalden+bmo)
Is this off by default? The last time this was proposed in the newsgroups, I (and several others) raised significant objections about what this would cost in terms of footprint: the last action on that thread was "I'm working on a document discussing the issues and possible solutions, including some that haven't come up in the discussion yet. This will hopefully provide a basis for further discussion and decision."
bsmedberg: comment 19 might shed some light.

Gerv
Comment on attachment 715882 [details] [diff] [review]
Script to import ICU library into the Mozilla tree

Yes, I'm told it's off for now.  The last I was told was that Asa was coming around on the matter wrt Firefox, and a link to a comment to that effect was provided (I don't remember it offhand); I don't know about anything else.
Attachment #715882 - Flags: checkin?(jwalden+bmo)
Attachment #718157 - Flags: checkin?(jwalden+bmo)
Attachment #718512 - Flags: checkin?(jwalden+bmo)
Benjamin: Yes, this is off. Build integration is under construction in bug 724533, and all the other code for the ECMAScript Internationalization API that we're currently landing is still disabled. Dave, Asa, and others are continuing the discussion about where and how to turn things on.

Jonathan and other internationalizers: I should make clear that I've so far only imported the parts of ICU that the ECMAScript Internationalization API needs. Replacing existing code in other parts of Firefox may require importing more code and data. To do this, you'd modify the update-icu.sh script to delete fewer of the files it gets from the ICU repository. If and when the file cjdict.txt is imported, the license file also needs to be updated with the additional bits for cjdict.
Assuming someone implements --with-system-icu=, as seems likely, they might need to deal with allocator mismatching for NumberingSystem, in the "part 4" patch in bug 837957, if I understand our allocator-hooking correctly:

<Waldo> does our malloc hooking in Gecko also properly hook new/delete?  does it hook either in system libraries, should they be used?
<Waldo> in particular, do things work right if there's a |delete foo| in Gecko, but the corresponding |new Foo| was in a system library?

Noting this here on the entirely off chance that whoever does that will see it, or the review will ensure this issue is addressed, for lack of a better place to note this...  :-\
Blocks: 853301
Blocks: 864843
Blocks: 866301
Blocks: 866305
ICU 52.1 is expected to release tomorrow. We'd like it to get the updated UNICODE bi-di algorithm. Are you able to update the tree with that version? http://site.icu-project.org/
Flags: needinfo?(mozillabugs)
I'm point for ICU right now.  And yes, that should be feasible -- with bug 853301 landing, the next necessary task before shipping with Intl on is to update all the imported libraries, data lists, etc. that haven't been updated since April (because with it not being shipped, why bother yet?).  It should be a matter of running through the updating information on <https://wiki.mozilla.org/User:Waldo/Internationalization_API> (will be moved to a better location), and if a new ICU's going to be released really soon, of course we should pick up that version.  (This assumes things Just Work with the updated version -- quite possible, but not guaranteed.)
Flags: needinfo?(mozillabugs)
(In reply to Jet Villegas (:jet) from comment #30)
> ICU 52.1 is expected to release tomorrow. We'd like it to get the updated
> UNICODE bi-di algorithm. Are you able to update the tree with that version?
> http://site.icu-project.org/

I don't believe updating ICU will in itself give us the updated bidi algorithm, as we don't currently rely purely on ICU to implement that; we have our own code (see layout/base/nsBidi.{h,cpp}) that will need to be updated.

See also bug 922963.
(In reply to comment #32)
> (In reply to Jet Villegas (:jet) from comment #30)
> > ICU 52.1 is expected to release tomorrow. We'd like it to get the updated
> > UNICODE bi-di algorithm. Are you able to update the tree with that version?
> > http://site.icu-project.org/
> 
> I don't believe updating ICU will in itself give us the updated bidi algorithm,
> as we don't currently rely purely on ICU to implement that; we have our own
> code (see layout/base/nsBidi.{h,cpp}) that will need to be updated.

Yeah, we don't use ICU's bidi algorithm at all, so if that is the only reason to update ICU, it can wait I guess.
No, we do use ICU's implementation of the bidi algorithm: our code in nsBidi.h/cpp is all originally ICU code, but modified at the time that it was imported to use our calling conventions and an XPCOM wrapper, and with the parts we don't use #ifdef'd out.

As I said in bug 922963, there are two possible ways forward: either update nsBidi from the new version of ICU, or start using ICU instead of nsBidi, in which case updating ICU will give us the new version of the algorithm more or less for free. I think I prefer the second option, and I've just opened bug 924851 for the first part of that.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: