Closed Bug 1771620 Opened 2 years ago Closed 2 years ago

Further optimize output.sh phase by using parallel's `--pipe` and/or `--pipepart` mechanisms to provide output-file with filenames via stdin

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: asuth, Assigned: asuth)

Attachments

(1 file)

The instrumentation added in bug 1567724 tells us that bootstrapping each output-file.rs invocation for mozilla-central takes about 21.5 seconds. GNU parallel's joblog indicates we currently trigger 341 of these invocations. Our m5d.2xlarge instances have 8 vCPUs each (a count that already includes hyper-threading), so (341 - 8) = 333 of these loads are redundant. Multiplying 333 by 21.5 seconds and then dividing by the 8 vCPUs, we find that each vCPU spends about 14.9 minutes on redundant initialization that wouldn't be necessary if we passed the list of files to output over stdin to output-file. Since the output-file phase currently takes about 23 minutes, cutting out 15 minutes of waste would be quite a helpful improvement!
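For anyone who wants to re-run the estimate with different numbers, the arithmetic above can be written out as a short Python sketch. The constants are the figures quoted in this comment; the function name is just an illustration and is not part of the Searchfox scripts.

```python
# Figures quoted in the comment above.
BOOTSTRAP_SECONDS = 21.5  # bootstrap cost per output-file invocation (mozilla-central)
INVOCATIONS = 341         # invocation count from GNU parallel's joblog
VCPUS = 8                 # vCPUs on an m5d.2xlarge


def redundant_minutes_per_vcpu(bootstrap_s, invocations, vcpus):
    """Minutes each vCPU spends on bootstraps beyond the one it needs."""
    redundant_loads = invocations - vcpus        # one load per vCPU is unavoidable
    total_redundant_s = redundant_loads * bootstrap_s
    return total_redundant_s / vcpus / 60


minutes = redundant_minutes_per_vcpu(BOOTSTRAP_SECONDS, INVOCATIONS, VCPUS)
print(f"{minutes:.1f} minutes of redundant bootstrapping per vCPU")
```

The estimate assumes the redundant loads are spread evenly over the vCPUs, which is the same simplification the prose makes.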

This does assume that output-file does not experience unbounded memory growth. In https://bugzilla.mozilla.org/show_bug.cgi?id=1567724#c19 I had noticed that the TreeDiffCache did grow our memory usage, but with the removal of blame-skipping logic, we no longer have a TreeDiffCache.
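The shape of the proposed change, and the memory assumption it depends on, can be illustrated with a small Python sketch. This is not the real output-file code (which is Rust); `bootstrap`, `render`, and `run_worker` are hypothetical stand-ins that only show the pattern of paying the setup cost once and then handling one filename per stdin line without retaining per-file state.

```python
import io

bootstrap_calls = 0


def bootstrap():
    # Hypothetical stand-in for the expensive one-time setup (~21.5 s in the bug).
    global bootstrap_calls
    bootstrap_calls += 1
    return {"ready": True}


def render(state, path):
    # Hypothetical stand-in for producing the output for a single source file.
    return f"<html>{path}</html>"


def run_worker(stream, sink):
    """Read one filename per line, render it, and keep nothing per file."""
    state = bootstrap()              # paid once per process, not once per file
    count = 0
    for line in stream:
        path = line.strip()
        if not path:
            continue
        sink.write(render(state, path) + "\n")  # result is written, then dropped
        count += 1
    return count


sink = io.StringIO()
processed = run_worker(io.StringIO("a.cpp\nb.rs\n\nc.js\n"), sink)
print(processed, "files rendered with", bootstrap_calls, "bootstrap call(s)")
```

In a real run the stream would be `sys.stdin` and the sink would be files on disk. The point of the sketch is that memory stays flat only if nothing like a cache accumulates in `state` as files go by.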

When I do this, I'm also going to add whatever logging is needed so the metrics scripts can tell how long build-codesearch.py takes.

I'm going to trigger the config2 indexer job now to see how much it benefits. In particular, I want to see whether work is distributed unevenly among the output-file invocations, which might necessitate quick follow-up changes such as randomly shuffling all-files.
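To make the uneven-distribution concern concrete, here is a rough Python sketch with made-up costs. It assumes the file list is split into contiguous chunks, one per worker, which is a simplification of however GNU parallel actually divides the input. The file costs, the chunking helper, and the seed are all hypothetical.

```python
import random


def contiguous_chunks(items, n):
    """Split items into n contiguous chunks (the last may be shorter)."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def worker_loads(costs, n):
    """Total cost handed to each worker under contiguous chunking."""
    return [sum(chunk) for chunk in contiguous_chunks(costs, n)]


# Hypothetical per-file costs: the expensive files cluster together, as they
# might when the list is sorted by directory.
costs = [10] * 8 + [1] * 56

in_order = worker_loads(costs, 8)

shuffled_costs = costs[:]
random.Random(0).shuffle(shuffled_costs)  # fixed seed only for repeatability
shuffled = worker_loads(shuffled_costs, 8)

print("slowest worker, original order:", max(in_order))
print("slowest worker, shuffled order:", max(shuffled))
```

The phase finishes when the slowest worker does, so what matters is the maximum load rather than the average. Shuffling does not guarantee balance; it only makes a pathological clustering like the one above unlikely.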

The config2 indexing job is now at 2h49m, down from 3h55m (Thursday), which was itself down from 7h52m (before Thursday).

Here is the key excerpt of the indexer-logs-analyze.sh output, produced with the additional new logging and analyze logic. mozilla-beta now gets to build-codesearch.py at 28:50, down from 44:30, which was itself down from 1:44:07. But note that we now capture through check-index.sh, which is a better gauge because it also includes livegrep/codesearch index building and the time required to gzip-compress all the HTML; for mozilla-beta that timestamp is 34:58.

├── mozilla-beta                                                                               
│   └──                                                                
│         script                time since start   apparent duration  
│        ──────────────────────────────────────────────────────────── 
│         find-repo-files.py    0:00:00            0:01:23            
│         build.sh              0:01:23            0:07:07            
│         js-analyze.sh         0:08:30            0:03:12            
│         idl-analyze.sh        0:11:42            0:00:12                                     
│         ipdl-analyze.sh       0:11:54            0:00:00            
│         crossref.sh           0:11:54            0:08:42            
│         output.sh             0:20:36            0:08:14            
│         build-codesearch.py   0:28:50            0:03:19            
│         compress-outputs.sh   0:32:09            0:02:49            
│         check-index.sh        0:34:58                                
│                                                                      
├── mozilla-release                                                                            
│   └──                                                                
│         script                time since start   apparent duration  
│        ──────────────────────────────────────────────────────────── 
│         find-repo-files.py    0:00:00            0:01:25            
│         build.sh              0:01:25            0:07:06                                     
│         js-analyze.sh         0:08:31            0:03:12            
│         idl-analyze.sh        0:11:43            0:00:12            
│         ipdl-analyze.sh       0:11:55            0:00:00            
│         crossref.sh           0:11:55            0:08:40            
│         output.sh             0:20:35            0:08:12            
│         build-codesearch.py   0:28:47            0:03:21            
│         compress-outputs.sh   0:32:08            0:02:49            
│         check-index.sh        0:34:57                                
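The "apparent duration" column in the tables above is consistent with each script's duration being the gap until the next script starts. Whether indexer-logs-analyze.sh computes it exactly this way is an assumption; the Python sketch below only reproduces the mozilla-beta numbers from the "time since start" column.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an 'H:MM:SS' offset into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# "time since start" values copied from the mozilla-beta table above.
starts = [
    ("find-repo-files.py", "0:00:00"),
    ("build.sh", "0:01:23"),
    ("js-analyze.sh", "0:08:30"),
    ("idl-analyze.sh", "0:11:42"),
    ("ipdl-analyze.sh", "0:11:54"),
    ("crossref.sh", "0:11:54"),
    ("output.sh", "0:20:36"),
    ("build-codesearch.py", "0:28:50"),
    ("compress-outputs.sh", "0:32:09"),
    ("check-index.sh", "0:34:58"),
]


def apparent_durations(rows):
    """Duration of each script = next script's start minus its own start."""
    parsed = [(name, parse_hms(offset)) for name, offset in rows]
    return {
        name: str(next_start - start)
        for (name, start), (_, next_start) in zip(parsed, parsed[1:])
    }


durations = apparent_durations(starts)
for name, duration in durations.items():
    print(f"{name:<22}{duration}")
```

The last script has no successor in the log, which matches the blank duration cell for check-index.sh.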
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Blocks: 1771633