Closed Bug 958286 Opened 11 years ago Closed 10 years ago

Determine if new remotefilelog extension will work in build farm.

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: hwine, Unassigned)

The new remotefilelog extension developed by Facebook (https://bitbucket.org/facebook/remotefilelog) looks very promising for use cases commonly encountered in the buildfarm. In that environment, the current requirements of the extension are achievable:
 - only usable on linux (client & server)
 - requires a high bandwidth connection client to server
 - requires ssh connections.

The benefits of remotefilelog are based on not needing to transfer full repository history, which translates to:
 - faster clones, especially for near-tip builds
 - reduced bandwidth usage (important for AWS)
 - reduced disk space usage on build boxes

The first step is to determine whether the extension would work in our hg environment. Facebook fronts its server with memcache, which offloads the hg server. We don't have this at present, and without it the extension may put too much load on the main ssh server.

If the extension looks workable, we would then need to set up a dev/staging server to:
 - explore how the extension impacts the client caching currently used in the build farm
 - check whether the benefits are realized for try builds
 - plan a rollout/conversion strategy

I don't believe Mozilla has reached the scale for this extension to do much good. From my experience, incremental pulls don't take too long. And the overhead of storing the full history is negligible. And full clones should be rare (if we ever get our act together - see bug 851270).

(Keep in mind Facebook's repo is larger than ours and is growing at a faster rate.)

Furthermore, performance of hg.mozilla.org is not very good. We're serving files over NFS (slow and buggy) and we could probably throw more CPU at the servers. I set up a mirror of mozilla-central on EC2 and am able to clone 30% faster than hg.mozilla.org!

This extension is likely a lot of effort to get deployed. I think we should focus on 1) making hg.mozilla.org and its mirroring infrastructure better and faster, and 2) fixing our slaves so they don't clobber existing clones so often.

I concede this extension makes sense given the setup we have. But our current setup is severely flawed, and installing this extension simply papers over that suckitude. I think we should prioritize fixing what we have before introducing more complexity and new pieces to the solution. If we still have a perf problem once we're doing things optimally, then I think we should move forward with it.

Just my $0.02.

Fixing what we have is already prioritized and on track. It should be available for testing soon. See bug 937732 for the latest update on that.

If you're connected to a VPN you're welcome to try cloning from hgweb1.dmz.scl3.mozilla.com/mozilla-central and comparing the speed to hg.mozilla.org.

Thanks for the update, bkero!

Local disk only addresses concern #1 from above. Automation still has the problem that it purges source checkouts (bug 851270).

Also, distributing repositories via bundles should be faster than `hg clone`. It also reduces load on the Mercurial server, since clones require a non-trivial amount of CPU and memory. Downloading static files, however...

(In reply to Gregory Szorc [:gps] from comment #3)
> Also, distributing repositories via bundles should be faster than `hg
> clone`. It also reduces load on the Mercurial server, since clones require a
> non-trivial amount of CPU and memory. Downloading static files,
> however...

We already unbundle instead of cloning. I measured that a long time ago, and unbundling was much slower than cloning. For good measure, I also (re)measured yesterday, and that still holds true: 7 minutes for a clone vs 12 minutes for an unbundle.

`hg bundle` by default uses bzip2 compression. revlogs use zlib compression. I'm pretty sure the wire protocol uses zlib compression as well.

As is common knowledge, bzip2 is slower than zlib. But, just to add some data, I created 3 flavors of bundles for 1f835fe670d7 of m-c:

compression     size    wall time
uncompressed   2591 MB     3:17
gzip            783 MB     3:59
bzip2           628 MB     6:13

If we care about wall time, we should not be using bzip2 for bundles. (We still likely want to offer bzip2 bundles for external users not having the bandwidth to clone.) For reference, hg bundle's -t takes {none,gzip,bzip2}.
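
To make the trade-off between bundle size and wall time concrete, here is a rough Python sketch using the figures from the table above. It assumes (this is not stated above) that the wall time is purely local processing time and that download time simply adds on top of it, with no overlap between the two. The function names are made up for illustration.

```python
# Figures copied from the table above: (size in MB, wall time in seconds).
BUNDLES = {
    "none": (2591, 3 * 60 + 17),
    "gzip": (783, 3 * 60 + 59),
    "bzip2": (628, 6 * 60 + 13),
}


def total_seconds(kind, bandwidth_mb_per_s):
    """Assumed model: download time plus the measured wall time, no overlap."""
    size_mb, wall_s = BUNDLES[kind]
    return size_mb / bandwidth_mb_per_s + wall_s


def break_even_bandwidth(smaller, larger):
    """Bandwidth (MB/s) at which both bundle types take the same total time.

    Below this bandwidth the smaller bundle wins; above it the larger,
    faster-to-process bundle wins.
    """
    small_size, small_wall = BUNDLES[smaller]
    large_size, large_wall = BUNDLES[larger]
    return (large_size - small_size) / (small_wall - large_wall)


def fastest(bandwidth_mb_per_s):
    """Name of the bundle type with the lowest total time at this bandwidth."""
    return min(BUNDLES, key=lambda kind: total_seconds(kind, bandwidth_mb_per_s))


if __name__ == "__main__":
    for bandwidth in (0.5, 10, 100):
        print(bandwidth, "MB/s:", fastest(bandwidth))
```

Under that model, bzip2 only wins below roughly 1.2 MB/s, and the uncompressed bundle only wins above roughly 43 MB/s. In between, gzip has the lowest total time.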

Another idea is to use uncompressed clone (hg clone --uncompressed).

Enough talk about clone speed. Can we just minimize the number of clones by fixing bug 851270?

(In reply to Gregory Szorc [:gps] from comment #5)
> `hg bundle` by default uses bzip2 compression. revlogs use zlib compression.
> I'm pretty sure the wire protocol uses zlib compression as well.
> 
> As is common knowledge, bzip2 is slower than zlib. But, just to add some
> data, I created 3 flavors of bundles for 1f835fe670d7 of m-c:
> 
> compression     size    wall time
> uncompressed   2591 MB     3:17
> gzip            783 MB     3:59
> bzip2           628 MB     6:13

This is what I get on an AWS instance with an SSD (which is not what we are currently using):
  unbundle gz    5:07
  unbundle bz2  11:27
  clone -U       5:01

(The unbundles used a bundle on local disk; the clone ran over HTTP from a repository served locally by hg.)

That makes more sense because, afaik, a clone essentially just fetches a bundle under the hood (the server bundles and the client unbundles, with some negotiation beforehand).
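
For anyone wanting to repeat this kind of comparison, a minimal Python timing harness might look like the sketch below. It is not the script used for the numbers above. The helper names are hypothetical, and the bundle path, URL and destination directories are whatever you pass in. It assumes `hg` is on the PATH.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv to completion and return the elapsed wall time in seconds."""
    start = time.monotonic()
    subprocess.run(argv, cwd=cwd, check=True)
    return time.monotonic() - start


def unbundle_commands(bundle_path, dest):
    """Commands to create an empty repository and apply a local bundle to it."""
    return [
        ["hg", "init", dest],
        ["hg", "-R", dest, "unbundle", bundle_path],
    ]


def clone_command(source_url, dest):
    """Command for a clone that skips the working-directory checkout (-U)."""
    return ["hg", "clone", "-U", source_url, dest]


def time_all(commands):
    """Total wall time, in seconds, for running each command in order."""
    return sum(time_command(argv) for argv in commands)
```

Usage would be along the lines of `time_all(unbundle_commands(bundle, "from-bundle"))` versus `time_all([clone_command(url, "from-clone")])`, with fresh destination directories for each run.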

> Enough talk about clone speed. Can we just minimize the number of clones by
> fixing bug 851270?

Clone speed is actually important: we now use AWS spot instances, and a faster clone lets their first builds start much sooner. Alternatively, we could make slaves unavailable for build jobs until they have a local clone of anything.
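
As a sketch of that alternative, a slave could refuse jobs until a check like the following passes. The function name and the directory layout (one checkout per subdirectory of a common root) are assumptions for illustration, not how the build farm is actually set up. Note that it only looks for a `.hg` directory, so it cannot tell a finished clone from an interrupted one.

```python
import os


def has_local_clone(checkout_root):
    """Return True if any directory directly under checkout_root is an hg repo.

    A Mercurial repository is recognised here by its `.hg` subdirectory.
    """
    if not os.path.isdir(checkout_root):
        return False
    for name in os.listdir(checkout_root):
        if os.path.isdir(os.path.join(checkout_root, name, ".hg")):
            return True
    return False
```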
Based on our research, this does not create any noticeable speedup for us. If anybody is interested in proving this wrong, please respond and we can discuss it then.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Component: WebOps: Source Control → General
Product: Infrastructure & Operations → Developer Services