Closed Bug 11219 Opened 25 years ago Closed 23 years ago

Dependencies not strong enough for parallel builds

Categories

(SeaMonkey :: Build Config, defect, P3)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.0

People

(Reporter: tor, Assigned: cls)

Attachments

(6 files)

Doing a parallel build (gmake MAKE="gmake -j12") reveals some weak dependencies in the build system. In the following directories, an attempt is made to link the library together before all the object files are ready:
    xpcom/build
    netwerk/build
    dom/src/build
    layout/build
    rdf/build
    extensions/wallet/build
I cleaned up several of these makefiles at just about the time you submitted this bug. Have you noticed any improvements since then? I think this is going to be an on-going problem until AOL deigns to actually put some money into my group's hands so we can buy the *Unix* machines we need to test stuff like this. I'd really like to set up a multi-processor SPARC server as a Tinderbox machine using gmake -jN.
Status: NEW → ASSIGNED
Here's a list of directories with dependency problems in an 8/15 cvs pull:
    xpcom/build
    netwerk/build
    dom/src/build
    widget/src/gtk
    layout/build
    rdf/chrome/build
    extensions/wallet/build
    mailnews/db/mork/build
    mailnews/base/build
    mailnews/addrbook/build
I talked to tor about this one on #mozilla a while back. It seems the problem occurs when we extract the objects from the "static" libs to build the huge shared lib. We tried a few fixes but they only seemed to make the problem intermittent. Why do we bother making those "static" libs anyways? Wouldn't it be much easier (and faster) to just leave the objs in the subdirs and keep track of the objs from the build dir?
I argued for that when kipp first introduced the "composite" shared libs made up of static libs. I claimed it would save time and disk space if we just built .o's and linked them into whatever .so needed them. But it was decided that that would add too much complexity to the makefiles and too much local knowledge of "unrelated" modules. We _still_ haven't achieved real modularity, but apparently using static "sub-libs" allows people to convince themselves that mozilla is a truly modular product.
brian: if you make these changes for mailnews, I'll approve them
Adding kipp & shaver to the cc:. We are _never_ going to be completely modular if people don't care who they link against. In all of the cases I've checked so far ** (extensions/wallet/build, db/build, netwerk/build, rdf/build), the other "modules" that the shared lib depends upon are really submodules of the shared lib's module or in some cases part of the shared lib's module. Sure, now we can say the msgmork library is "independent" of the mork library by building a static lib. But we should not make mork dissect msgmork just for the sake of "independence." Especially when it appears to be the cause of some build breakage. NSPR does something similar to this but handles it sanely. Take a look at pr/src/md/unix/objs.mk & pr/src/Makefile.in on the AUTOCONF_NSPR_WIN32_XCOMPILE_19990621_BRANCH .
** I just ran into the exceptions:
    xpcom/appshell/eventloop/photon/Makefile.in, which uses xp...but shaver informed me that xp was being removed from the build.
    widget/src/$TOOLKIT, which *each* dissect widget/src/xpwidgets/libraptorbasewidget_s.a; the primary toolkit is in turn dissected in widget/src/build along with libraptorbasewidget_s.a again.
    mailnews/addrbook/libaddrbook.so, which includes rdf/util/src/librdfutil_s.a
    mailnews/base/build/libmsgbase.so, which includes rdf/util/src/librdfutil_s.a
    mailnews/local/build/libmsglocal.so, which includes rdf/util/src/librdfutil_s.a
So 7.5 exceptions to the 28 cases where this would be better handled by an objs.mk like NSPR uses rather than the dissection rule. All of which should be thrown out as they are causing symbols to appear in multiple libraries.
mass reassigning briano's open bugs to me while he's on sabbatical.
accept bug.
mass move to M14.
Target Milestone: M14 → M18
Ok, I made what I think is some progress on this. I managed to remove about half of the SHARED_LIBRARY_LIBS usage from my tree. Basically, for each lib*_s.a, I create an objs.mk in the srcdir that creates the library. Both the Makefile.in that creates the lib*_s.a and the Makefile.in that links in that lib*_s.a include the objs.mk file. This gives us better dependency support for the lib*_s.a's source files and we don't have to dissect the lib*_s.a. There is a caveat though. :( Because I'm including every object file individually on the link line with full relative paths from DEPTH, the link lines can become fairly huge. When I converted xpcom over, the link line for libxpcom.so was over 3k! I'm worried that we may hit some shell or process argument size limit on some of our non-tier1 unix boxes. I fully expect the link line for layout to be twice the length of xpcom's. Maybe I'll look into incremental linking some more.
Attached patch: Changes to xpcom & libreg (deleted)
Attached file: tgz of all new objs.mk files (deleted)
Also, because of the way we can potentially have additional CFLAGS and/or DEFINES in each makefile, we cannot build the object files we depend upon from the current directory as suggested in 'Recursive Make Considered Harmful'. Instead, we need to fork a make in the directory where the dependent objects need to be built.
Great job, Chris. Looks like you're all over this. Do you want to reassign this bug to yourself and just put me on the Cc? As for the 3k command, I had thought 2K was the typical line length limit for sh and csh which is why we need xargs, but I must be mistaken since the 3K command worked. Even if it breaks some of the older non-tier 1 systems, it's a step in the right direction.
reassigning to cls per our conversation.
Assignee: granrose → cls
Status: ASSIGNED → NEW
Ok, so I underestimated just a tad. Using --enable-mathml, the length of the link line for libraptorhtml.so came to just under 13k. I think the POSIX standard length is 4k so this isn't going to work. Do we know of any linkers that don't do partial (or incremental) linking?
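The size concern in the comment above can be checked on any given machine. What follows is a minimal sketch, assuming a POSIX shell with `getconf` available. The object paths and the count of 300 are invented for illustration and are not the real libraptorhtml.so object list.

```shell
#!/bin/sh
# Hypothetical object list: 300 objects with full relative paths, similar in
# shape (not in content) to what an objs.mk-based link line would carry.
objs=""
i=1
while [ "$i" -le 300 ]; do
    objs="$objs ../../layout/html/base/src/nsHypotheticalFrame$i.o"
    i=$((i + 1))
done

# Length in bytes of the argument string the link command would carry.
link_len=$(printf '%s' "$objs" | wc -c | tr -d ' ')

# ARG_MAX is this system's limit on exec() arguments plus environment.
# POSIX only guarantees a minimum of 4096 bytes (_POSIX_ARG_MAX).
arg_max=$(getconf ARG_MAX)

echo "link line: $link_len bytes; ARG_MAX here: $arg_max bytes"
if [ "$link_len" -gt 4096 ]; then
    echo "exceeds the 4096-byte POSIX minimum"
fi
```

The 4096 figure is only the guaranteed floor. Many systems report a much larger ARG_MAX, so the value that matters is the one on the most constrained platform being supported.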
Status: NEW → ASSIGNED
mass re-assign of all bugs where i was listed as the qa contact
QA Contact: cyeh → chofmann
I applied the changes to the m16 tree so I would have a stable tree to work from. The link line for libraptorhtml.so is up to 15k now. After a brief conversation with Brad on irc, I'm not as concerned about the command line length. I realized that for non-gcc builds, some of our compile commands are over 5k. If a platform has a small shell line limit, chances are that they cannot build mozilla anyways. And if we can split up libraptorhtml.so (bug #43142), then all of this should be a moot point.
Also, I forgot to mention that these changes significantly reduce the amount of space needed to build mozilla since they remove the unneeded lib*_s.a files. On linux, I see a savings of about 350M and on solaris, I see a savings of 600M. Both sets of builds were configured with: --enable-nspr-autoconf --enable-mathml --enable-svg --with-extensions
Attached file: tgz updated with openbsd changes. (deleted)
I have been informed by Colin that the proposed changes will not work on OpenVMS as it has a 4k cmd line limit. (Previous comment about 5k lines rescinded ...copy/paste error) He suggested using a linker script which appears to be supported by GNU ld but not Sun ld. To make things more interesting, on a number of platforms, we call $(CXX) or $(CC) to link, not $(LD). Passing the linker script options to the linker via the compiler flag -Wl does not work.
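For reference, one form such a linker script could take with GNU ld is an INPUT(...) command that names the objects, so that the command line only carries the script's file name. The sketch below only generates the script and does not invoke a linker. The object paths, the `objs.ld` file name, and the use of `mktemp -d` are assumptions for illustration. How the script is handed to the linker is left open, since the comment above reports trouble passing linker-script options through the compiler driver.

```shell
#!/bin/sh
# Sketch: write a GNU ld script that names the objects via INPUT(...).
# All paths below are hypothetical placeholders, not real mozilla objects.
workdir=$(mktemp -d)
script="$workdir/objs.ld"

{
    echo "INPUT("
    for obj in ../src/nsHypotheticalA.o ../src/nsHypotheticalB.o ../io/nsHypotheticalC.o; do
        echo "  $obj"
    done
    echo ")"
} > "$script"

# Show the generated script and count its lines.
cat "$script"
script_lines=$(wc -l < "$script" | tr -d ' ')
```

This does nothing for Sun ld, which the thread says lacks the feature, so it would only cover the GNU ld platforms.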
To help us focus on the Sun/Solaris specific bugs when we do a bug query, I'm moving this one to Platform/OS category of PC/Linux which is the tier-1 supported Unix platform. There needs to be an AllUnix Bugzilla platform category.
OS: Solaris → Linux
Hardware: Sun → PC
adding myself to this one...
On the long drive home this weekend, I had an epiphany. It was so simple. Why don't we just use symlinks? As in, symlink the dependent obj from two directories away into the current directory and actually link against the local symlink. In rules.mk, add:
    LDEP_OBJS = $(notdir $(DEP_OBJS))
and for each target that uses DEP_OBJS, add:
    @echo $(LDEP_OBJS) | xargs rm -f
    @$(foreach f, $(DEP_OBJS), ln -s $f $(notdir $f);)
    ...
    rm -f $(LDEP_OBJS)
Using the local symlinks causes the link lines to shrink by about 50%. Unfortunately, due to the number of files in layout this is still too large (where's Jenny Craig when you need it?).
    floating:obj> wc foo2
          1     362    6593 foo2
The other question is how do OpenVMS & OS/2 handle symlinks? Will they be able to take advantage of such a change or do I need to head back to the drawing board?
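The symlink idea above can be tried outside of make. This is a minimal sketch, assuming a POSIX shell plus `mktemp -d`. The directory layout and object names are hypothetical, and empty files stand in for real object files.

```shell
#!/bin/sh
# Sketch of the symlink idea; names are hypothetical stand-ins.
set -e
top=$(mktemp -d)
mkdir -p "$top/layout/html/base/src" "$top/layout/build"
for name in nsFrameA nsFrameB nsFrameC; do
    : > "$top/layout/html/base/src/$name.o"
done

cd "$top/layout/build"
DEP_OBJS="../html/base/src/nsFrameA.o ../html/base/src/nsFrameB.o ../html/base/src/nsFrameC.o"

# Shell equivalent of $(notdir $(DEP_OBJS)): strip the directory part.
LDEP_OBJS=""
for f in $DEP_OBJS; do
    LDEP_OBJS="$LDEP_OBJS ${f##*/}"
done
LDEP_OBJS=${LDEP_OBJS# }

# Remove stale links, then link each dependent object into this directory.
rm -f $LDEP_OBJS
for f in $DEP_OBJS; do
    ln -s "$f" "${f##*/}"
done

echo "full paths:  ${#DEP_OBJS} characters"
echo "local names: ${#LDEP_OBJS} characters"

# The real rule would run the link against $LDEP_OBJS here and then
# clean up with: rm -f $LDEP_OBJS
```

The saving depends entirely on how long the directory prefixes are relative to the file names, which is why a directory with many objects can still produce an oversized line.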
symlinks are already used throughout the build, so using symlinks here shouldn't be a problem. Now if you could just name those local libraries L1, L2, L3... we might be able to get the size of the command line down to something reasonable (unfortunately at the cost of readability).
Target Milestone: M18 → mozilla1.0
The original focus of this bug has been fixed as we've had -j4 tinderboxes & nightly builds for a long time now. We no longer allow the building of static & non-static libs in the same tree & the "static" build uses a completely different process so the bug that triggered this problem shouldn't occur again. I still want to get rid of those intermediate libs but that's for some indeterminate future date. Marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Product: Browser → Seamonkey