Closed Bug 157993 Opened 22 years ago Closed 22 years ago

consolidate uconv libraries

Categories

(Core :: Internationalization, defect, P2)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla1.2alpha

People

(Reporter: alecf, Assigned: alecf)

References

Details

(Keywords: intl)

Attachments

(1 file, 6 obsolete files)

right now we have a bunch of charset converter libraries. I don't know why we need all these dlls, when we could combine them into one or two DLLs. Here's what I'm thinking: Combine into ucvbase.dll: ucvcn.dll ucvibm.dll ucvlatin.dll ucvmath.dll (only 19k, we might as well just make it part of the default) Combine into ucvasia.dll: ucvja.dll ucvko.dll ucvtw.dll ucvtw2.dll This should make the total size smaller: right now anywhere from 10-50% of EACH DLL is a copy and pasted nsUCv*Support.cpp - all these files appear to be identical. This "Support" file contains a bunch of base classes that all the decoders/encoders derive from. we can share these classes in a static library, and then when we combine the above dlls, we'll see a savings because the library will be shared within the dll. After this work is completed, I have a further optimization which gets rid of all the leaf classes for these converters and instead allows each one to be created off a specific base class. This should reduce the overhead of each converter down to a simple static function that makes a call to a shared static constructor. But I'll file that in another bug. CC'ing some i18n people so they know this is going on. The one question I have is how we should combine the libraries: does it make sense to combine them all into one library? If we do two libraries, does the above division look correct?
as a preliminary test, I combined the ucvja and ucvtw2 DLLs (the two largest) into one dll...now, this is a debug build, but here are the results: Seperate: 237 ucvja/ucvja.dll 233 ucvtw2/ucvtw2.dll 470 total Combined: 429 ucvasia/ucvasia.dll So this is a savings of 41k, or about 9%, just from combining these two libraries. I imagine we can save an additional 30-50k per library that we add to this combination. Note that in release builds, the converter libraries add up to 761k. Based on this first test, I'm going to estimate that we can knock this down to about 600k, and perhaps even smaller!
Status: NEW → ASSIGNED
Priority: -- → P2
Summary: consolidate uconv libraries → consolidate uconv libraries
Target Milestone: --- → mozilla1.2alpha
Blocks: 158003
Dear alecf, this work will increase run time footprint of asian users. For example, if I am a Japanese users, most of my time will only view document in Japanese and Wester encoding so Gecko will only load ucvja and uconv into the memory, ucvko,ucvlatin, ucvcn,ucvibm,ucvtw and ucvtw2 won't get load into the memory. if you merge them, then we will increase the in-memory footprint by 500K for most Japanese users. The same thing happen to Simplified Chinese user, Korean users, and Traditional Chinese users. Remember, Japanese users don't read Korean just like you and me. If we don't want to let English users take those footprint hit from Korean converters, there are no reason to ask Japanese users to take the footprint hit from Korean converters neither. ucvcn is for simplified chinese users in China ucvtw and ucvtw2 are for traditional chineusers in Taiwan I don't mind you combine ucvmath or ucvibm into ucvlatin but usually no body use them anyway.
code issue, QA to yokoyama@netscape.com for now.
Keywords: intl
QA Contact: ruixu → yokoyama
ftang: ultimately, I think I can consolidate these libraries to well under 500k. Also, DLLs are paged into memory on a page-by-page basis, usually with a granularity of 4k - this will not increase the heap or code footprint of the product, because only those pages which are loaded (i.e. via nsIModule.getFactory) will be paged in.
ok, now I've combined ucvja, ucvtw, and ucvtw2 into one library: 165 ucvtw/ucvtw.dll 233 ucvtw2/ucvtw2.dll 237 ucvja/ucvja.dll 635 total and combined: 553 ucvasia.dll so now we've saved 82k, or about 12%.. its getting better!
and finally, having combined all 4 asian libraries into the same library: 165 ucvtw/ucvtw.dll 233 ucvtw2/ucvtw2.dll 237 ucvja/ucvja.dll 157 ucvko/ucvko.dll 792 total and combined: 669 ucvasia.dll a savings of 123k or about 16%! based on these results, I think we can knock down the release version of asia libraries from 552k to 466k, a savings of 86k. now to combine ucvcn/ucvibm/ucvlatin/ucvmath (currently 424k in in debug builds, or 209k in release builds..)
I suggest not merging ucvmath so that I don't run into lock-in when trying to effect changes there.
what's a lock-in?
lock-in means I may have to wait forever to effect changes if people perceive that they may have wider implications. Often, MathML-related changes are tangential to other developers, and they don't give them due diligence.
an update on ucvwestern: after combining ucvcn, ucvlatin, and ucvibm, I ended up with these results...(on a debug build) before: 57 ucvibm/ucvibm.dll 133 ucvcn/ucvcn.dll 181 ucvlatin/ucvlatin.dll 371 total after: 285 ucvwest.dll a savings of 86k, or 23%. Based on this estimation, I think that this will save another 44k on release builds, bringing our total up to 130k... and that's before fixing bug 158003! As for ucvmath, I don't see why we can't allow you to keep working on ucvmath stuff, with the understanding that the code is very seperate even though technically they're in the same DLL. I'll certainly make sure it's all #ifdefed appropriately.
Keeping it separate will avoid undue hassle (I speak from experience) -- it is only 19K and doesn't cause harm to anyone.
Attached file consolidate everything but ucvmath (obsolete) (deleted) —
ok fine. for now I'll skip ucvmath .. but I may be back! :) Anyway, this gzipped patch is for Unix and Win32.. I'm going to apply it to my mac build and attempt to create 3 new projects.. (ucvutil, ucvwestern, and ucvasia) wish me luck :) I don't know what mimetype to use for gzip.. the un-gzipped patch is 551k, most of which is very mechanical removal of GetMaxLength from MANY classes, and removal of all the nsUC*Support.* files, which are now located in intl/uconv/util as a static library.
Attachment #92455 - Attachment mime type: application/octet-stream → application/x-gzip
Attached patch consolidate everything but ucvmath v1.1 (obsolete) (deleted) — Splinter Review
A minor update, and without the -N, so the patch is smaller and doesn't need to be gzipped - for some reason I couldn't read ungzipped patches on my mac.
Attachment #92455 - Attachment is obsolete: true
Comment on attachment 92746 [details] [diff] [review] consolidate everything but ucvmath v1.1 ok, I've tested this patch to the best of my ability on windows, mac, and Linux. when the tree opens today, I'll check in the updated mac project files (not yet part of the build) Next I'll do a test build on each platform (or maybe get someone to help me) and get it to IQA. Who should I ask in IQA to test this?
Attached patch update mac build scripts (obsolete) (deleted) — Splinter Review
this additional patch for mac will update the build scripts to only build the two new libraries - it also removes the use of the MANIFEST files, where were completely unnecessary - the manifest files only exported the CID files, which are only used by the module code. oh, I'll have yet another patch for packaging updates.
Attached patch updating packaging files (obsolete) (deleted) — Splinter Review
and this updates the packaging files, and removes the old library from an existing install. I'll bet I have to do this on the commercial side too... *sigh*
Ahhh, i should have paid more attention to your consolidation........... ucvcn doesn't belong to ucvwest.dll. It's for Simplified Chinese users. I'd really appreciate if you can remove the ucvcn from ucvwest.dll and add to ucvasia.dll Sorry, alec ;( >Next I'll do a test build on each platform (or maybe get someone to help me) >and get it to IQA. Who should I ask in IQA to test this? I can help you on Win if you'd like. I believe ylong may be the right iQA to test some sites. andreas: how is ylong's workload? alec's patch will impact all CJK and western pages. (ie. all the pages except MathML)
oh! absolutely no problem :) I didn't understand all these codepage things anyway! I'll do that and update the patch here.
Attached patch consolidate everything v1.2 (obsolete) (deleted) — Splinter Review
ok, this patch is one big patch which covers 1) moving ucvcn from ucvwestern to ucvasia during which I had to fix the g_AsciiMapping table that conflicted between a few modules - there is now a g_ucvko_AsciiMapping so that there is no more confusion. 2) mac build scripts 3) packaging hopefully this is the final patch, before reviews anyway. You'll also notice that during this process, I removed a bunch of implementations of GetMaxLength, and implemented support for a "MaxLengthFactor" which is the factor by which you multiply the srclen to get the destlen - this way, 95% of the classes can use the base class and not implement their own GetMaxLength, just to multiply by some constant number... this saves us a vtable and code footprint for every converter.
Attachment #92746 - Attachment is obsolete: true
Attachment #93163 - Attachment is obsolete: true
Attachment #93167 - Attachment is obsolete: true
roy - thanks for the offer to help - do you need a build, or would you like to just apply the patch to your own build? one thing I noticed after I attached the patch was that in TestUConv.cpp you have to put an #ifdef NS_DEBUG around: + if (dec) { + nsCOMPtr<nsIBasicDecoder> isBasic = do_QueryInterface(dec); + if (isBasic) { + basicDecCount++; + printf("b"); + } + else printf(" "); + } + else printf(" "); and around the equivalent code for nsIBasicEncoder - it is only a test of course I'm working on getting a linux build now.. I could really use some help on the mac though, my little iMac is going to take forever to build a fresh release build...
>roy - thanks for the offer to help - do you need a build, or would you like to >just apply the patch to your own build? I just thought you may need my Windows build help; but I guess you have no problem on Windows platform. >I could really use some help on the mac though naoki? can you help alec on Mac?
ok, one more nit on that last patch.. I just applied this to a totally fresh tree - you need to switch the order of "src" and "util" in intl/uconv/Makefile.in - the static library in "util" is required to build the dynamic library in "src" I'll make a new patch once I'm certain that this builds once and for all :)
Attached patch consolidate everything v1.21 (obsolete) (deleted) — Splinter Review
ok, last patch, I hope :) This just cleans up a few minor things: 1) g_ucvko_AsciiMapping stuff - causing missing symbols during link 2) NS_DEBUG stuff in TestUConv.cpp 3) build ordering in intl/uconv/Makefile.in
Attachment #93181 - Attachment is obsolete: true
ok, that last patch does in fact work. So who do we have lined up to test? Windows: Roy or ylong? Mac: Naoki? Linux: ylong? Thanks a LOT for your help guys. What platforms do we need actual builds on, and what platforms can we just apply the above patch? The patch is obviously easier because distributing binaries around can be pretty difficult :) Another option is that I could also branch the intl/uconv directory and someone could update that directory to the branch before building.
>ftang: ultimately, I think I can consolidate these libraries to well under 500k. >Also, DLLs are paged into memory on a page-by-page basis, usually with a >granularity of 4k - this will not increase the heap or code footprint of the >product, because only those pages which are loaded (i.e. via >nsIModule.getFactory) will be paged in. If that is the case, why don't we combin them together into one uconv.dll and let English user see the same impact ? Please either (1) leave the dll structure as is, or (2) combine them together with uconv, ucvlatin and every converter into one dll. I don't want to see in one hand we want to keep minimum download for US/English uers and in the other hand force Asian embedding build increase download size. IF embedding care about dll download size for English users. I care that for Asian embedding custmer who wil lonly include japanese dll but not chinese dll too.
What asian embedding customers are demanding (for instance) a japanese-only dll? I'd like to know before we hypothesize about what markets are demanding what configuration. I understand that Japanese/Chinese/etc are independent users but I am trying to weigh the overall cost of having all these seperate DLLs - to generate the most benefit for the largest markets. Right now we're hurting all markets by having bloated DLLs. The way I look at it is this: After these changes, here are the net benefits for each market: Include all libraries | only include market-specific Non-Asian -130k | -86k Asian -130k | +382k (where Asian means a specific asian market like Japanese - the increase is the cost of including the extra non-Japanese converters that you get with the combined DLL, minus the total savings of combining the DLLs) If I were to combine them all into one dll, the results would look something like: Include all libraries | Don't include them Non-Asian -130k | +382k Asian -130k | +382k Basically I'm optimizing for 3 markets right now. The most important markets to me are both of the "Include all libraries" markets - the ones who want full browser support. The next most important are the non-asian who don't want the libraries, because they are currently the next most demanding embedding customer. All this being said, all of these patches do not prevent us later adding some tweaks to the Makefiles which will allow individual DLLs to be created - for instance, I could forsee a new flag like MOZ_SEPERATE_CONVERTERS=ja which would cause ucvja to be generated as a DLL in addition to a static library, thus satisfying the japanese-only market.
naoki/ylong/roy, I need some help getting this tested. Can I get some help? It would work best if I could get you to apply the patch and do a build, but if I need to deliver binaries to someone, I will!
Alec, please talk to Frank and get the patch reviewed first. This is not a small patch, I prefer if you could provide a binray to IQA (assuming the change is already in your local build).
>What asian embedding customers are demanding (for instance) a japanese-only dll? let me ask a different question: What asian embedding customers are demanding (for instance) a english-only dll? that is what we have in the embedding project right now, right ? for the very same reason, (if there are a need for english only embedding), you will have a need for japaense only. alecf: why don't we sit down in a meeting together first.
ok, talked with frank over e-mail - we decided that one DLL really is the way to go. I have a working patch for linux/windows and I just need to get the mac project files lined up before I attach another patch.
alright, I have a windows release build with everything combined into one dll. The results: Before consolidation: 48 ./src/uconv.dll 84 ./ucvcn/ucvcn.dll 36 ./ucvibm/ucvibm.dll 184 ./ucvja/ucvja.dll 120 ./ucvko/ucvko.dll 100 ./ucvlatin/ucvlatin.dll 28 ./ucvmath/ucvmath.dll 128 ./ucvtw/ucvtw.dll 184 ./ucvtw2/ucvtw2.dll 912 total after consolidation: 728 src/uconv.dll that's a savings of 186k! with the 2-dll solution, it would have been 768k, so the final consolidation saved us an additional 40k.
Attached patch consolidate everything v1.3 (deleted) — Splinter Review
ok, I have consolidated all 8 dlls into a single dll. The attached patch is everything except the differences to uconv.xml on mac - this patch is 380k, and the patch to uconv.xml alone is 345k, due to some strange changes codewarrior made when I added some files. Anyway, ftang, can you review this? 90% of the changes are changes to classes such that GetMaxLength() is calcluated by the base class as a factor of the srclen - for those classes which had a custom GetMaxLength() - for instance UTF8 conversion, and so forth, I just left GetMaxLength alone.
Attachment #93342 - Attachment is obsolete: true
+// ucvlatin +#include "nsUCvLatinCID.h" +#include "nsUCvLatinDll.h" [...etc...] what about having a #define list that is hosted in each ucvXXX. This way, you could merge ucvmath as well, knowing that any changes over there won't have to require to change the driver files explicitly because the setup is distributed.
rbs: true enough.. I'm going to make another pass through for bug 158003 so how about I do that there?
OK then.
Comment on attachment 94056 [details] [diff] [review] consolidate everything v1.3 r=ftang except the part of ucvasia.dll,so ucvwest.dll,so and mozucth.xpt in xpinstall/packager which I mention to alecf in private email. I think we should land this asap so IQA can test it and we have enough time to react befor m1.2 alpha if there are any mistake
Attachment #94056 - Flags: review+
I think now is the perfect timing to land this kind of mass changes. please get sr=
I agree - sr requests have been made, I'm just waiting for them to come in...
Comment on attachment 94056 [details] [diff] [review] consolidate everything v1.3 sr=blizzard (Really with a patch this large and this repetitive it's a rubber stamp, but I did look through the whole patch and didn't see anything bad.)
Attachment #94056 - Flags: superreview+
ok, this has landed - still lots of time to keep testing this for mozilla1.2
FYI this patch appears to have cost us roughly 1% at startup (see comet).
FYI, static builds broke with this checkin. For some reason, ucvmath & the merged uconv dlls are redefining symbols in some Decoder/Encoder support classes: nsBasicDecoderSupport nsBufferDecoderSupport nsTableDecoderSupport nsMultiTableEncoderSupport etc
darin: yeah, I noticed that :( I'm trying to see what I can do.. I have something that will further reduce this dll by another 30k or so (See bug 158003) seawood: yes, before each library had their own unique set of shared classes - ucvja had "nsUCvJABasicDecoderSupport", ucvko had "nsUCvKOBasicDecoderSupport" and so forth - now they all share the same set of classes, as defined in the library intl/uconv/util - ucvmath shares these as well now.. I don't know exactly how the static build works, but I'm guessing we need to make sure that library doesn't get added twice.
ok, marking this fixed for now... still working on a way to recover the startup cost...
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
I was looking at the static build problem -- it seems that ucvmath still has its own copy of what's in uconv/util. To fix the static build, it seems like you need to make ucvmath use the util library, and then list the util library in LIBS instead of SHARED_LIBRARY_LIBS for one or both of src and ucvmath.
The static build bustage is now covered by bug 163217.
Blocks: 163737
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: