Closed Bug 391958 Opened 17 years ago Closed 17 years ago

Improve partial patch generation performance

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mtschrep, Assigned: coop)

References

(
URL
)

Details

Attachments

(4 files, 3 obsolete files)

Patch to patcher2.pl to allow it to optionally call make_incremental_updates.py 17 years ago Mike Schroepfer (deleted), patch	preed : review+	Details \| Diff \| Splinter Review
New incremental update implementation [checked in] 17 years ago Mike Schroepfer (deleted), text/plain	benjamin : review+	Details
patcher2.pl patch to use make_incremental_updates.py, my take 17 years ago J. Paul Reed [:preed] (deleted), patch		Details \| Diff \| Splinter Review
patcher2.pl patch to use make_incremental_updates.py, my take II [checked in] 17 years ago J. Paul Reed [:preed] (deleted), patch	coop : review+	Details \| Diff \| Splinter Review
wrap RunShellCommand for better logging/error handling 17 years ago Robert Helmer [:rhelmer] (deleted), patch	coop : review+	Details \| Diff \| Splinter Review
wrap RunShellCommand for better logging/error handling [checked in] [as landed] 17 years ago Robert Helmer [:rhelmer] (deleted), patch		Details \| Diff \| Splinter Review
[checked in] set the ENV to include mozilla/dist/host/bin in fast mode 17 years ago Robert Helmer [:rhelmer] (deleted), patch	coop : review+	Details \| Diff \| Splinter Review

Mike Schroepfer

Reporter

Description

•

17 years ago

Attached patch Patch to patcher2.pl to allow it to optionally call make_incremental_updates.py (obsolete) (deleted) — Details — Splinter Review

When I saw the computer time required to complete an entire build process I got a little upset so I decided to look into it a bit. The patch generation time looked to me to be the largest single entry so I focused there. Running patcher2.pl on my local system it became obvious that partial patch generation was the single largest operation. Our current process looks roughly like this (leaving out lots of edge cases): 1) for each pair of builds (e.g. 2004-mac-af, 2005-mac-af) 2) unpack from full mar (2004-mac-af.mar) into a temp dir mar -x 2004-mac-af.mar bunzip each entry in tmp from dir 3) unpack to mar (2005-mac-af.mar) into a temp dir mar -x 2005-mac-af.mar bunzip each entry in tmp to dir 4) Compare the from/to dirs: For files in both: Calculate patch file using mbsdiff bzip patch file bzip original file if patch file is smaller add patch file to manifest else add full file to manifest For files only in to: bzip file and add to manfiest For files only in from: add removed instruction to manifest Look for removed-files file in to dir if exists add each entry to manifest 5) bzip update.manifest 6) build mar of all files from step 4 + update.manifest the process with make_incremental_updates.py looks like this: 1) for each pair of builds (e.g. 2004-mac-af, 2005-mac-af) 2) unpack from full mar (2004-mac-af.mar) into a temp dir mar -x 2004-mac-af.mar 3) unpack to mar (2005-mac-af.mar) into a temp dir mar -x 2005-mac-af.mar 4) Compare the from/to dirs: For files in both: compute tuple (from.digest,to.digest) composed of sha digest of from/to file (in bzip form) if I've seen this tuple before use previously generated patch else bunzip from/to files Calculate patch file using mbsdiff bzip patch file bzip original file if patch file is smaller add patch file to manifest else add full file to manifest add results to cache with key (from.digest,to.digest) For files only in to: add file to manifest For files only in from: add removed instruction to manifest Look for removed-files file in to dir if exists add each entry to manifest 5) bzip update.manifest 6) build mar of all files from step 4 + update.manifest The two primary optimizations are: a) We never run mbsdiff on the same pair of files twice. Since the large binary files on the same platform (e.g. 2004-mac-firefox-bin, 2005-mac-firefox-bin) are the same for 40+ locales this is a *huge* savings b) We compare files using a sha.digest of the bziped version of them. This means skip bunziping files that are the same in from/to On my MBP it takes 2h 40Minutes to generate all 130 partial patches for 2004->2005 using the existing system. The new way takes 8minutes. There is a third optimization which is possible by implementing a mar file reader in python. This lets me calculate the sha digests in-place without extracting the mar into a tmp dir. This takes the runtime down from 8mins to about 6mins - but I'm not super-thrilled about having yet another mar impl (even though it is a very simple format) and a 20x speedup is not too shabby to start. Building patches on each mac/win/linux builder is likely a better optimization. Threading it doesn't seem to be much help (running 3 instances goes about 5% faster) at this point. Other fun stuff I discovered along the way: 1) make_incremental_update.sh relies on sort to compare files in from/to. The sort order in our production patches made no sense at all to me (and was different than on my local system). Turns out: http://www.gnu.org/software/fileutils/doc/faq/core-utils-faq.html#Sort%20does%20not%20sort%20in%20normal%20order! looks to be true on the production system used to build patches. It results in a manifest file in a nonsense order. 2) unwrap_full_update.pl doesn't seem to work on MacOS without some hacks around case-sensitivity 3) Removed the .5s delay in the inner loop of CreatePartialPatches select(undef, undef, undef, 0.5); 4) Not sure why we are using /dev/shm/tmp as our TMPDIR, changed to /tmp in my local config 5) There are two separate places in make_incremental_update.sh where we exclude processing of certain files. In one place a filename (e.g. "update.manifest") will be excluded if it is anywhere in the dir hierarchy. In the other it will be excluded only at the top dir level. I tried to rationalize this in the new impl 6) make_incremental_updates.sh does not properly strip spaces in files list in removed-files 7) There are a surprising number of special cases (searchplugins/extensions/etc) in incremental update generation. So this patch includes: a) make_incremental_updates.py as a replacement for: make_incremental_update.sh unwrap_full_update.pl parts of patcher2.pl b) patches to patch2.pl to optionally use make_incremental_updates.py c) A test directory with a set of specially constructed mar files to attempt to execute every code path and special case in mar construction. Also includes some hacky utils to compare and build mars. I've rebuild the 2004-2005 patches on my system using both the old and new, and run through the unit tests. make_incremental_update.py produces the same results as existing tools in both cases with the following exceptions: a) manifest file is slightly different order b) make_incremental_updates.py properly strips spaces from removed files My python mojo is about a 2/10 (perl=1/10) so be nice since I'm sure I've made some humorous errors here :-).

Mike Schroepfer

Reporter

Comment 1

•

17 years ago

Attached file New incremental update implementation [checked in] (deleted) — Details

Mike Schroepfer

Reporter

Comment 2

•

17 years ago

Added .zip of test dir at the URL (too big to attach). A few final notes: a) I re-wrote make_incremental_update.sh because in order to implement the first optimization I needed to carry state from one patch generation to the next. The easiest way to do this was to have a single program process all partial patches. b) This is designed to drop into the existing systems. Configuration and processing is still the same in patcher so none of the 'which patches do I generate' logic has to be re-tested. c) The only aspect I haven't tested/implemented is forced updates. It is trivial to implement in miu.py - but i didn't have a sample patcher.cfg that had them to test it end-to-end.

URL: http://people.mozilla.com/~schrep/tes...

Reed Loden [:reed]

Updated

•

17 years ago

Assignee: build → mtschrep

J. Paul Reed [:preed]

Updated

•

17 years ago

Attachment #276398 - Attachment mime type: text/x-python → text/plain

Mike Schroepfer

Reporter

Comment 3

•

17 years ago

Comment on attachment 276398 [details] New incremental update implementation [checked in] Since Darin wrote make_incremental_updates.sh not sure who should review this...

Attachment #276398 - Flags: review?(benjamin)

Mike Schroepfer

Reporter

Comment 4

•

17 years ago

Comment on attachment 276397 [details] [diff] [review] Patch to patcher2.pl to allow it to optionally call make_incremental_updates.py integration with patcher a quick hack - so I'm sure there is a better way to do this...

Attachment #276397 - Flags: review?(preed)

J. Paul Reed [:preed]

Comment 6

•

17 years ago

(In reply to comment #4) > (From update of attachment 276397 [details] [diff] [review]) > integration with patcher a quick hack - so I'm sure there is a better way to do > this... Couple quick things: I want to set this up on the machines we do the patch generation on for releases, so we can make sure it'll work (I don't have any reason to believe it won't just want to test :-) Also, I think we may be able to integrate this a bit cleaner by tweaking MozAUSLib::CreatePartialMar() so it just Does The Right Thing. I'll poke around how difficult that would be when I do the first part. This is cool!

Benjamin Smedberg

Comment 7

•

17 years ago

Comment on attachment 276398 [details] New incremental update implementation [checked in] > def __init__(self, work_dir, file_exclusion_list, path_exclusion_list): > self.work_dir=work_dir > self.archive_files=[] > self.manifest=[] > self.file_exclusion_list=file_exclusion_list > self.path_exclusion_list=path_exclusion_list Nit: most places you use spaces around =, which IMO is more readable. Can you do that here as well? > if filename.startswith("extensions/"): > testdir = "extensions/"+filename.split("/")[1] # Dir immediately following extensions is used for the test > self.manifest.append('add-if "'+testdir+'" "'+filename+'"') > else: > self.manifest.append('add "'+filename+'"') I think that here and throughout the script this would be more readable using the string-formatting operator: self.manifest.append('add "%s"' % filename) >def exec_shell_cmd(cmd): > """Execs shell cmd and raises an exception if the cmd fails""" > if (os.system(cmd)): > raise Exception, "cmd failed "+cmd Do we need to support python 2.3 with this script? If not, please use subprocess and check_call. See http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/tools/cvs2hg-import.py&rev=1.9&mark=15-26 for something that works in python 2.4+, and you can just "from subprocess import check_call" if you only need to support python 2.5+ >def bzip_file(filename): > """ Bzip's the file in place. The original file is replaced with a bzip'd version of itself > assumes the path is absolute""" > exec_shell_cmd('bzip2 -z9 "' + filename+'"') > os.rename(filename+".bz2",filename) Do you want to do this instead of using the python builtin bz2 module? I see you using bz2 elsewhere in this script. >def extract_mar(filename, work_dir): > """ Extracts the marfile intot he work_dir > assumes work_dir already exists otherwise will throw osError""" > print "Extracting "+filename+" to "+work_dir > saved_path = os.getcwd() > try: > os.chdir(work_dir) > exec_shell_cmd("mar -x "+filename) > finally: > os.chdir(saved_path) I'd like the "mar" command to be configurable, so that we don't have to put mar in the PATH: we could pick up MAR from the environment: mar = os.environ.get('MAR', 'mar'); or from the commandline (e.g. --mar=/path/to/mar) r=me with nits fixed... would be good to have a second-review from pickone('luser', 'preed', 'rhelmer')

Attachment #276398 - Flags: review?(benjamin) → review+

J. Paul Reed [:preed]

Comment 8

•

17 years ago

Attached patch patcher2.pl patch to use make_incremental_updates.py, my take (obsolete) (deleted) — Details — Splinter Review

J. Paul Reed [:preed]

Comment 9

•

17 years ago

Comment on attachment 276397 [details] [diff] [review] Patch to patcher2.pl to allow it to optionally call make_incremental_updates.py After playing around with this a bit more, I don't think it's worth it to move this code around... patcher2 is due for a rework soon anyway. I attached a take on this; I ran patcher2.pl with the attached patch here, and then ran patcher2 with the patch I attached, and diffed the result; they're the same. The only changes I made relate to some variable renaming, so it was clearer when the fast-patcher option was being used, and using some perl-idioms, to match the style of the rest of the patcher2 code.

Attachment #276397 - Flags: review?(preed) → review+

J. Paul Reed [:preed]

Comment 10

•

17 years ago

Attached patch patcher2.pl patch to use make_incremental_updates.py, my take II [checked in] (deleted) — Details — Splinter Review

Ugh... last version of this patch had a lot of garbage in it that wasn't relevant to the patch. This version is just the relevant stuff. diff had -w on it, too, to ignore the indentation changes from Schrep's original patch.

Attachment #278639 - Attachment is obsolete: true

J. Paul Reed [:preed]

Comment 11

•

17 years ago

Reassigning to build, per a request from joduinn and an ok from schrep.

Assignee: mtschrep → build

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Updated

•

17 years ago

Priority: -- → P3

Reed Loden [:reed]

Updated

•

17 years ago

Assignee: build → nobody

QA Contact: mozpreed → build

Chris Cooper [:coop] (he/him)

Assignee

Updated

•

17 years ago

Assignee: nobody → ccooper

Priority: P3 → P2

Chris Cooper [:coop] (he/him)

Assignee

Updated

•

17 years ago

Status: NEW → ASSIGNED

Chris Cooper [:coop] (he/him)

Assignee

Comment 12

•

17 years ago

Comment on attachment 278645 [details] [diff] [review] patcher2.pl patch to use make_incremental_updates.py, my take II [checked in] Some minor comment typos, and I would consider including Data::Dumper at the outset (and possibly using it elsewhere to log data structures), but otherwise looks good.

Attachment #278645 - Flags: review+

Chris Cooper [:coop] (he/him)

Assignee

Comment 13

•

17 years ago

Comment on attachment 276398 [details] New incremental update implementation [checked in] Checked in on trunk.

Attachment #276398 - Attachment description: New incremental update implementation → New incremental update implementation [checked in]

Chris Cooper [:coop] (he/him)

Assignee

Updated

•

17 years ago

Attachment #276397 - Attachment is obsolete: true

Chris Cooper [:coop] (he/him)

Assignee

Comment 14

•

17 years ago

Comment on attachment 278645 [details] [diff] [review] patcher2.pl patch to use make_incremental_updates.py, my take II [checked in] Checked in on trunk.

Attachment #278645 - Attachment description: patcher2.pl patch to use make_incremental_updates.py, my take II → patcher2.pl patch to use make_incremental_updates.py, my take II [checked in]

Chris Cooper [:coop] (he/him)

Assignee

Comment 15

•

17 years ago

The patcher changes are causing a build process failure during the 2.0.0.11 release. I've reverted the change and am trying to diagnose.

Chris Cooper [:coop] (he/him)

Assignee

Comment 16

•

17 years ago

Planning to land this again on Friday when rhelmer is back around to help out.

Chris Cooper [:coop] (he/him)

Assignee

Comment 17

•

17 years ago

The patch has been re-landed along with a fix to make_incremental_updates.py to fix the atchlist_file / patchlist_filename discrepancy. Getting ready to test it out in the automation staging env.

Mike Schroepfer

Reporter

Comment 18

•

17 years ago

Just a note that the URL field as a link to a directory used for unit testing. It was too big to attach directly.