Closed
Bug 699579
Opened 13 years ago
Closed 9 years ago
Only virus scan release files we haven't already scanned
Categories
(Release Engineering :: Release Automation: Other, defect, P3)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1210538
People
(Reporter: nthomas, Unassigned)
References
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1916] [release][automation])
When virus scanning releases, hash files after extracting them from dmg/mar/etc, and don't rescan if filename and hash match a previously scanned file. This will make the scanning much quicker and not hammer stage so hard.
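A minimal sketch of the idea, assuming an in-memory cache that lives only for one scanning run; the names (SCANNED, file_key, maybe_scan, scan_file) are illustrative and not taken from extract_and_run_command.py:

import hashlib
import os

# Illustrative in-memory cache for a single scanning run.
SCANNED = set()

def file_key(path):
    """Return (basename, sha1 digest) for a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return (os.path.basename(path), h.hexdigest())

def maybe_scan(path, scan_file):
    """Run scan_file(path) only if an identical (name, hash) pair hasn't
    already been scanned during this run."""
    key = file_key(path)
    if key in SCANNED:
        return
    scan_file(path)
    SCANNED.add(key)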
Reporter
Updated•13 years ago
Whiteboard: [release][automation]
Comment 1•13 years ago
Just to point out a tradeoff being made here: if updated anti-virus protection data becomes available that can detect a previously undetected virus, we won't catch that virus if we don't rescan the file.
eg.
- upload fileA which contains virusXYZ
- scan fileA and don't detect virus since the scanner doesn't yet detect virusXYZ
- update anti-virus definitions so we now detect virusXYZ
- skip scan of fileA, hence don't catch instance of virusXYZ
One workaround could be to only cache the scan result for a certain amount of time.
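A small sketch of that workaround, assuming a timestamped cache; CACHE_TTL, SCAN_TIMES and is_fresh are hypothetical names, and the one-day TTL is an arbitrary placeholder:

import time

CACHE_TTL = 24 * 60 * 60  # one day, chosen arbitrarily for this sketch
SCAN_TIMES = {}  # (basename, sha1) -> timestamp of the last clean scan

def is_fresh(key, now=None):
    """True if this file was scanned recently enough to skip rescanning."""
    now = time.time() if now is None else now
    scanned_at = SCAN_TIMES.get(key)
    return scanned_at is not None and (now - scanned_at) < CACHE_TTL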
Reporter
Comment 2•13 years ago
It's a fair point. To clarify, I'm only talking about caching within a single scanning run, which takes about 3 hours at the moment but would be much shorter with this change. The cache wouldn't persist across separate jobs.
Comment 3•13 years ago
Or save the hash of the virus definitions along with the file hash ? :)
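A sketch of that approach: fold a digest of the definitions file into the cache key, so any definitions update invalidates every cached result. definitions_fingerprint and defs_path are illustrative names; where the definitions actually live depends on the scanner in use:

import hashlib

def definitions_fingerprint(defs_path):
    """Hash the anti-virus definitions file so the cache key changes
    whenever the definitions are updated."""
    h = hashlib.sha1()
    with open(defs_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

# The cache key then becomes (basename, file_sha1, definitions_fingerprint(...)).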
Updated•13 years ago
Blocks: hg-automation
Comment 4•13 years ago
I'm taking a stab at this here: https://github.com/bhearsum/tools/compare/master...caching-scan
My initial tests with only one installer and one mar (of the same locale) show that it's slower with caching, I guess because of all the hashing. I'm going to try a larger-scale test locally tomorrow, but I'd like to do a full test on surf after things calm down, maybe over the weekend.
Assignee: nobody → bhearsum
Comment 5•13 years ago
todo: walk directories in a predictable order, probably alphabetical
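A possible way to do that, assuming the traversal is driven by os.walk; this is a sketch, not the existing extract_and_run_command.py code:

import os

def walk_sorted(root):
    """Yield file paths under root with directories and files visited in
    alphabetical order, so the scan order is stable between runs."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk honours in-place edits to dirnames
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)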
Reporter
Comment 6•13 years ago
I'd love to do win32/ and update/win32/ first instead of last.
Comment 7•13 years ago
(In reply to Nick Thomas [:nthomas] from comment #6)
> I'd love to do win32/ and update/win32/ first instead of last.
Hmm, I think we could just adjust the automation piece to do this. Rather than giving it "/pub/mozilla.org/firefox/10.0.1-candidates/build1" to scan, we could give it the individual platform + update dirs, in whatever order we think best. I'd rather do that than hardcode something in extract_and_run_command.py, or try to implement some generic sorting arguments. Like this: https://github.com/bhearsum/tools/compare/master...caching-scan#L0L15
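A rough sketch of that driver-side ordering, with a stub in place of the real scan invocation (the actual extract_and_run_command.py command line is not shown here, and the directory names beyond win32 are placeholders):

def run_scan(directory):
    """Placeholder for however the automation invokes the scan on one
    directory (e.g. via extract_and_run_command.py); details elided."""
    print("would scan %s" % directory)

candidates = "/pub/mozilla.org/firefox/10.0.1-candidates/build1"

# Hypothetical ordering: Windows directories first (comment 6), then the rest.
scan_dirs = [
    candidates + "/win32",
    candidates + "/update/win32",
    # ...remaining platform and update dirs in whatever order suits...
]

for directory in scan_dirs:
    run_scan(directory)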
Comment 8•13 years ago
Well, I ran my tests over the weekend and I'm not encouraged by the results of the current patch. Here they are:
   | CACHED  | NOT CACHED
j2 | 1h39min | 1h47min
j4 | 54min   | 56min
j6 | 55min   | 54min
j8 | 56min   | 54min
The tests were run on an ix machine, which has 4 logical CPU cores. It looks to me like once you start maxing out the CPUs at -j4, caching with md5 makes no difference.
I'm going to try swapping in a faster hashing algorithm from http://pypi.python.org/pypi/pyhash/0.4.1, probably SuperFastHash.
Comment 9•13 years ago
...and I can't even get this library to install on the build machine; it requires some very specific Boost library to compile, or something.
Comment 10•13 years ago
I ended up looking at http://mxr.mozilla.org/mozilla-central/source/tools/update-packaging/make_incremental_updates.py to get some hints on speeding this up and found that it uses sha1. Despite sha1 supposedly being slower than md5, it seems to perform faster for this use case. In my tests it ran in 46min at -j6 and -j8, and 48min at -j4, which is 8min faster than no caching in both cases. I don't really understand how this can be the case, so if someone else does, please fill me in!
I also found that make_incremental_updates.py has a cache of the hashes of the files themselves...which I bet is pretty hot considering how many guaranteed dupes you have per platform when generating partials. I bet this is a big part of where it gets its speed from. I'm not sure if we can do the same thing in extract_and_run_command.py without making it really specific for the task...but we might be able to throw the existing logic into a class and make it subclassable. If we do that, we could do something similar to what make_incremental_updates.py does, and cache the sha1 sums by file path + size + locale (we need locale, because helper.exe and probably some non-binary files could easily collide due to the only differences being strings...).
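A sketch of what such a cache could look like, keyed on (path, size, locale) as described above; HashCache and its method are illustrative names, not existing code in make_incremental_updates.py or extract_and_run_command.py:

import hashlib
import os

class HashCache(object):
    """Cache sha1 sums keyed by (path, size, locale) so duplicate files
    are only hashed once per run."""

    def __init__(self):
        self._sums = {}

    def sha1(self, path, locale):
        key = (path, os.path.getsize(path), locale)
        if key not in self._sums:
            h = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1024 * 1024), b""):
                    h.update(chunk)
            self._sums[key] = h.hexdigest()
        return self._sums[key]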
I'm not going to spend any more time on this at the moment, though; I want to see how things are with the new candidates/ directory before coming back to it.
Comment 11•13 years ago
We're living with the current state of things, and I don't think I'll be touching this anytime soon.
Assignee: bhearsum → nobody
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Comment 12•13 years ago
Bulk move of bugs to Release Automation component.
Component: Release Engineering: Automation (General) → Release Engineering: Automation (Release Automation)
Comment 13•13 years ago
Mass move of bugs to Release Automation component.
Updated•13 years ago
No longer blocks: hg-automation
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•10 years ago
Whiteboard: [release][automation] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1916] [release][automation]
Comment 14•9 years ago
Possibly something to tackle as part of release promotion/beetmover?
Blocks: release-promotion
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE