Open Bug 988976 Opened 11 years ago Updated 2 years ago

BEWARE: ccache does not work efficiently with -DGRE_BUILDID=YYYYMMDDhhmmss compiler option

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect

Tracking

(Not tracked)

People

(Reporter: ishikawa, Unassigned)

References

Details

(This applies to OSX also, and Windows, too, if ccache becomes available there.) BEWARE: ccache does not work efficiently with -DGRE_BUILDID=YYYYMMDDhhmmss (This was originally posted Bug 905646 - Enable -gsplit-dwarf by default on local builds. by mistake as https://bugzilla.mozilla.org/show_bug.cgi?id=905646#c22 Robert Kaiser pointed out that this should be put in a DIFFERENT bugzilla entry, and so here it is after a minor edit.) On a PC (Debian GNU/Linux 32-bit), I noticed ccache did not use cache very well. Where I thought hits would have been produced, there were cache misses. After investigation, I realize the problem. Depending on the setup in mozconfig file, I think, [and maybe the use of |mach| command ?] a following option in the C compiler command line is produced: e.g., -DGRE_BUILDID=20140327174446 It looks like a timestamp of a sort. Since ccache uses the command line arguments as well as the preprocessed C source file to create the hash key for cached object lookup, if the value of GRE_BUILDID is different at the each invocation of compiler even if the preprocessed source file is the same, so is the hash and there is no cache hit. [I am not sure when the GRE_BUILDID becomes different. It may be that it is recreated at the time of reconfigure. I have only seen this behavior yesterday and do not have much to say yet.] Also, ccache has a feature called "direct hit" check which seems to use only the timestamps of involved files (basically the list of files produced by -MD to, say, gcc), and the command line arguments. If the source file and involved files have not changed, and the compiler command arguments are the same, then we don't even need to create a preprocessed output, and can pick up the cached object immediately. This obviously won't work if there is a changing GRE_BUILDID define on the command line.) So we have to be careful in setting up the mozconfig for local compilation to make use of ccache effectively. I am not sure which mozconfig setting controls the appearance of -DGRE_BUILDID=YYYYMMHHmmss on the command line. (Or can it be an environment variable setting? Does the use of |mach| affect this?) At least, this -DGRE_BUILDID is not seen on another PC, so I think it has something to do with different mozconfig setup or something. Anyway, this is something one wants to be aware of when one wants to use ccache for local development. (Such an argument to C compiler won't make ccache useful on shared compilation server, too.) Someone in the know can fill in when this -DGRE_BUILDID is placed on the command line, and when the value changes. TIA cf. I checked if there is any related entry that handles ccache efficiency head on. Here is a summary. The short answer is not quite. The long answer is as follows. I checked if there is relevant entries for ccache. After searching for "ccache" without the quotes, and brief inspection to filter unrelated entries, I ended up with the following possibly relevant entries. Bug 927499 Core Build Config NEW --- bootstrap.py should install ccache 2013-10-16 Bug 911640 Core mach NEW --- mach should warn if using ccache but the cache size is small 2013-09-03 Bug 795183 Core Build Config NEW --- mach misses build warnings that are printed using relative paths (due to use of "--with-ccache" and CCACHE_BASEDIR) 2012-09-30 Bug 789317 Core Build Config NEW --- Disable ccache 2012-09-10 Bug 778711 Thunderbird Build Config NEW --- Remove --with-ccache from Thunderbird mozconfigs 2012-07-30 275790 Firefox Build Config REOP --- more ccache friendly NSPR_CFLAGS 2010-03-03 Note that no entries seem to be related to efficiency, i.e., of cache hits directly. (Bug 911640 is somewhat related. It asks for the warning from |mach| when cache size is too small (and so there can not be many cache hits. I use 3GB for cache directory.) Bug 275790 (filed in 2004) discusses the issue of superficially different pathnames that actually point to the files of exactly the same content in the source tree. Such source files ought to produce the same object and thus can produce cache hit, but due to the different paths used on the command line which was used as part of the input to hash function, it did not produce the same hash value and thus there was no hit. (But later version of ccache seemed to handle this different path issue cleverly according to a comment filed in 2010. https://bugzilla.mozilla.org/show_bug.cgi?id=275790#c8 I have not verified the behavior of ccache yet.) Bug 789317 and Bug 778711 seem rather disturbing because it asks for disabling the use of ccache in the standard config (!?). Bug 789317 - Disable ccache I read this and it reported a rather dismal performance of ccache WHEN THERE ARE VERY FEW HITS. This is to be expected. It all depends on the compilation scenario. As is noted in https://bugzilla.mozilla.org/show_bug.cgi?id=789317#c6 I think it is worthwhile on the try server when someone pushes the build a few times, etc. or the configuration is changed, but the source files remain the same (as in clobber), etc. Bug 778711 has a more ominous title. But as I read it, I realize it has something to do with the use (or non-use) of ccache when gcc/clang transition took place(?) on OS X. Eventually, then in "Bug 775305 - reenable ccache on os x", ccache was shown to be effective on try server with some data. ( https://bugzilla.mozilla.org/show_bug.cgi?id=775305#c9 and the followups.) So was installed back again on OSX. So the use of ccache was resurrected (Bug 703232 - Clean up mozconfigs a bit), I think. I now realize Bug 778711 "Remove --with-ccache from Thunderbird" was never acted upon although the bugzilla entry was entered(?). So using ccache with thunderbird should be OK for your local config. TIA
Hi, This is the difference of the content of mozconfig for different PCs. /tmp/t.tmp.1 is the sorted non-comment lines from the MOZCONFIG where -DGRE_BUILDID is NOT produced. /tmp/t.tmp.2 is the same data from a PC where -DGRE_BUILDID was produced on the command line. --- /tmp/t.tmp.1 2014-03-28 14:08:12.000000000 +0900 +++ /tmp/t.tmp.2 2014-03-28 14:08:12.000000000 +0900 @@ -1,27 +1,17 @@ + ac_add_options --disable-gstreamer ac_add_options --disable-jemalloc ac_add_options --disable-libjpeg-turbo -ac_add_options --disable-necko-wifi -ac_add_options --disable-necko-wifi ac_add_options --disable-unified-compilation ac_add_options --disable-webrtc ac_add_options --enable-application=mail ac_add_options --enable-crashreporter ac_add_options --enable-debug ac_add_options --enable-debug-symbols=-gsplit-dwarf -ac_add_options --enable-default-toolkit=cairo-gtk2 -ac_add_options --enable-gc-zeal -ac_add_options --enable-official-branding ac_add_options --enable-optimize="-g -O2 -freorder-blocks" -ac_add_options --enable-shared ac_add_options --enable-tests ac_add_options --enable-valgrind ac_add_options --with-ccache=/usr/bin/ccache -ac_add_options --with-system-ply -mk_add_options BUILD_MODULES=all -mk_add_options BUILD_OFFICIAL=1 -mk_add_options MOZILLA_OFFICIAL=1 -mk_add_options MOZ_CO_PROJECT=mail -mk_add_options MOZ_MAKE_FLAGS="-j4" -mk_add_options MOZ_MAKE_FLAGS="-s" -mk_add_options MOZ_OBJDIR=/REF-OBJ-DIR/objdir-tb3 +mk_add_options MOZ_MAKE_FLAGS="-j2" +mk_add_options MOZ_MAKE_FLAGS="-s" +mk_add_options MOZ_OBJDIR=/extra/ishikawa/download/TB-3HG/objdir-tb3 I think the following settings, which I didn't pay attention to much when I created mozconfig for my local use, may have bearings on the appearance of -DGRE_BUILDID. I don't have the following settings on a PC (Debian GNU/Linux 32-bit) where -DGRE_BUILDID was produced on the command line, whereas I do have the following on a PC (Debian GNU/Linux 64-bit) where I don't see -DGRE_BUILDID on the command line. -ac_add_options --enable-official-branding -ac_add_options --enable-shared -ac_add_options --with-system-ply -mk_add_options BUILD_MODULES=all -mk_add_options BUILD_OFFICIAL=1 -mk_add_options MOZILLA_OFFICIAL=1 -mk_add_options MOZ_CO_PROJECT=mail I have no idea which one may be responsible. (Also, maybe I should check the environment variable setting?) Or the script to invoke configure. I searched for GRE_BUILDID in mxr.mozilla.org, but could not figure out exactly where things are done this way. On a production server that creates a daily release, having a timestamp makes sense. But on a tryserver (or local development machine), this timestamp may not have that much value (unless one is doing a private release to third parties, etc.). So a warning about this would be appreciated in MDN document SOMEWHERE. (I could not find it using google after a few tries. It means it is hidden very well if it is written at all. :-( ) TIA PS: All I want is a "rock-solid mail client (tm)" that can be used in a cross-platform manner. Well, it takes a long time to reach "rock-solid" part. It is as if I am going through an adventure game :-)
OK, this problem may happen only for the top-level (in view of application structure) nsAppRunner.o, etc. If this were the case, the problem may not have such a big negative impact. I am checking it in the background. Stay tuned. TIA
OK, I tested a few compilation scenarios. The situation is not as bad as it seemed at first sight. 1 - There *ARE* files that are compiled with -DGRE_BUILDID with different time stamps for each compilation after hg qpop -a; and hg qpush -a. (The particular set of qpop/qpush on my PC did not cause configure to run.) 2 - But the number of such files are not that large (only a couple in my couple of dozen hg patch queues). They are under mozilla/xre/ subdirectory of c-c. 3 - Also, although "direct hit" does NOT occur for these files, it turned out that cache hit after the preprocessed matching DID produce cache hit. (I am not sure why I didn't see it before. It could be that the cache was purged without my knowing it: unlikely, but possible. After the incident, I cleared the cache used by ccache for local testing.) I am not sure why, in my particular observed compilation, the configure took place (after qpop/qpush, etc.), and excessive number of recompilation took place. But given my observation above, if someone in the know can explain how/where -DGRE_BUILDID is generated on the fly to compile certain files, I think my evaluation of the issue to be of grave importance can be degraded to something that can be ignored safely (especially considering that the large change to the files often results in configure, or even clobber.) On an individual development machine, that is. Yes, though, on a shared machine, if many users submit jobs as on tryserver, the appearance of -DGRE_BUILDID=... with varying time will result in certainly a loss of direct hit, and unnecessary CPP invocation that could have been saved. Looking at the discussion of build infrastructure machine usage and maintenance, anything that can lessen the workload should be tried. This may be one such target. So if someone can explain why/how -DGRE_BUILDID is produced on the fly, I will be very delighted. TIA
> 3 - Also, although "direct hit" does NOT occur for these files, it turned > out that cache hit after > the preprocessed matching DID produce cache hit. I would think the preprocessed file would only match for cases where we pass in -DGRE_BUILDID, but the file doesn't actually use the macro. For example, in toolkit/xre, only nsAppRunner.cpp uses GRE_BUILDID, but the define is passed into all the .cpp files in that directory. So I'd expect you'd get no direct hits, then get preprocessor hits for *.cpp minus nsAppRunner.cpp, then a cache miss for nsAppRunner.cpp > I am not sure why, in my particular observed compilation, the configure took > place (after qpop/qpush, etc.), and excessive number of recompilation took > place. But given my observation above, > if someone in the know can explain how/where -DGRE_BUILDID is generated on > the fly to compile > certain files, I think my evaluation of the issue to be of grave importance > can be > degraded to something that can be ignored safely (especially considering > that the > large change to the files often results in configure, or even clobber.) > On an individual development machine, that is. The buildid is generated from config/Makefile.in: # Generate a new buildid every time we "export" in config... that's only # supposed to be once per-build! export:: ifdef MOZ_BUILD_DATE printf '%s' $(MOZ_BUILD_DATE) > buildid else $(PYTHON) $(topsrcdir)/toolkit/xre/make-platformini.py --print-buildid > buildid endif When you do a top-level build, you'll see a line like this in the output: /home/mshal/mozilla-central-git/obj-x86_64-unknown-linux-gnu/_virtualenv/bin/python /home/mshal/mozilla-central-git/toolkit/xre/make-platformini.py --print-buildid > buildid That generates the new buildid file, which is used by some Makefiles to generate the '-DGRE_BUILDID=XYZ' command-line argument. Although I agree the problem is not as big as you initially suspected (-DGRE_BUILDID is confined to ~7 Makefiles, gets passed in to ~23 .cpp files, and is only actually used in ~3 .cpp files), we could still make some improvements here: 1) Only pass in -DGRE_BUILDID to files that actually use it (this would help with the direct hit problem) 2) Disable caching on files that do use GRE_BUILDID, since they'll pretty much never get a cache hit anyway 3) Fix other the buildid dependency issue (bug 729488)
(In reply to Michael Shal [:mshal] from comment #4) > > 3 - Also, although "direct hit" does NOT occur for these files, it turned > > out that cache hit after > > the preprocessed matching DID produce cache hit. > > I would think the preprocessed file would only match for cases where we pass > in -DGRE_BUILDID, but the file doesn't actually use the macro. For example, > in toolkit/xre, only nsAppRunner.cpp uses GRE_BUILDID, but the define is > passed into all the .cpp files in that directory. So I'd expect you'd get no > direct hits, then get preprocessor hits for *.cpp minus nsAppRunner.cpp, > then a cache miss for nsAppRunner.cpp > > > I am not sure why, in my particular observed compilation, the configure took > > place (after qpop/qpush, etc.), and excessive number of recompilation took > > place. But given my observation above, > > if someone in the know can explain how/where -DGRE_BUILDID is generated on > > the fly to compile > > certain files, I think my evaluation of the issue to be of grave importance > > can be > > degraded to something that can be ignored safely (especially considering > > that the > > large change to the files often results in configure, or even clobber.) > > On an individual development machine, that is. > > The buildid is generated from config/Makefile.in: > > # Generate a new buildid every time we "export" in config... that's only > # supposed to be once per-build! > export:: > ifdef MOZ_BUILD_DATE > printf '%s' $(MOZ_BUILD_DATE) > buildid > else > $(PYTHON) $(topsrcdir)/toolkit/xre/make-platformini.py > --print-buildid > buildid > endif > > When you do a top-level build, you'll see a line like this in the output: > > /home/mshal/mozilla-central-git/obj-x86_64-unknown-linux-gnu/_virtualenv/bin/ > python /home/mshal/mozilla-central-git/toolkit/xre/make-platformini.py > --print-buildid > buildid > > That generates the new buildid file, which is used by some Makefiles to > generate the '-DGRE_BUILDID=XYZ' command-line argument. > I see, thank you for the explanation. > Although I agree the problem is not as big as you initially suspected > (-DGRE_BUILDID is confined to ~7 Makefiles, gets passed in to ~23 .cpp > files, and is only actually used in ~3 .cpp files), we could still make some > improvements here: Yes, it is true. If this happens on a shared compilation farm or something, we would like to use up CPU to the last remaining cycle and the following optimization would be desirable. > 1) Only pass in -DGRE_BUILDID to files that actually use it (this would help > with the direct hit problem) > 2) Disable caching on files that do use GRE_BUILDID, since they'll pretty > much never get a cache hit anyway > 3) Fix other the buildid dependency issue (bug 729488) TIA
Product: Core → Firefox Build System
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.