Closed Bug 827123 Opened 12 years ago Closed 10 years ago

Replicate sqlite pushlog files to all mercurial hosts so we can eliminate NFS

Categories: Developer Services :: General (task)
Platform: All / Other
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: fox2mike; Assignee: Unassigned
References: Blocks 1 open bug
Attachments: 2 files, 1 obsolete file

Tracker/Discussion bug (spawned from bug 826099)
Assignee: server-ops → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
I've recently been considering the idea of switching pushlog over to a real database (with a table per repo). This would make life much easier for the hgweb heads, since it would remove the requirement of rsyncing (or recalculating) a new pushlog file with every commit.

Callek said he has no problems with this, although he needs to consult with the rest of releng before making this decision.
This could help with bug 498641.
(In reply to Ben Kero [:bkero] from comment #1)
> I've recently been considering the idea of switching pushlog over to a real
> database (with a table per repo). This would make life much easier for the
> hgweb heads, since that will remove the requirement of rsyncing (or
> recalculating) a new pushlog file with every commit.

I'm fine with this. The schema could probably stand a once-over from a DBA while we're at it.
Ted,

Can we get the current schema attached here, please? Or copy-pasted if it's small enough.

Ben,

We should probably have a call with Sheeri to discuss this. It might need its own MySQL cluster in scl3.
(In reply to Shyam Mani [:fox2mike] from comment #4)
> Ted,
> 
> Can we get the current schema attached here please? or copy pasted if it's
> small enough.

I'm not ted, but:

(from http://hg.mozilla.org/hgcustom/hghooks/file/3b0c66182bb0/mozhghooks/pushlog.py#l40 )

CREATE TABLE IF NOT EXISTS changesets (pushid INTEGER, rev INTEGER, node text);
CREATE TABLE IF NOT EXISTS pushlog (id INTEGER PRIMARY KEY AUTOINCREMENT, user TEXT, date INTEGER);
CREATE UNIQUE INDEX IF NOT EXISTS changeset_node ON changesets (node);
CREATE UNIQUE INDEX IF NOT EXISTS changeset_rev ON changesets (rev);
CREATE INDEX IF NOT EXISTS changeset_pushid ON changesets (pushid);
CREATE INDEX IF NOT EXISTS pushlog_date ON pushlog (date);
CREATE INDEX IF NOT EXISTS pushlog_user ON pushlog (user);

with the knowledge that we can happily adjust the schema as we migrate off sqlite, and the fact that we do need to occasionally "reset" the pushlog (data) entirely (e.g. due to twig resets, a try reset, or new repos)

----

My [not-a-dba] recommendation for the new schema, if we do this: have a separate trees table that records which tree a push belongs to, and store all pushes in a single (partitioned) table, so that we keep the last month or so of pushes in fast-access storage and older pushes can be slower to reach.

Store which tree a push belongs to in a new column, paired with new indexes.

When resetting a tree, assign a new tree ID (set the old tree's name in the trees table to "" or NULL), and purge that tree's old pushes some time after it becomes defunct (once we know we don't need them anymore).
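A minimal sketch of what that could look like (table, column, and function names here are illustrative only, and it runs against SQLite purely so the snippet is self-contained; the actual target would be MySQL, where the pushes table would additionally be partitioned by date):

import sqlite3

# Illustrative shared-pushlog schema with a trees table, per the suggestion
# above. Names are hypothetical; this is not a DBA-reviewed proposal.
db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE trees (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
CREATE TABLE pushes (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     tree_id INTEGER, user TEXT, date INTEGER);
CREATE TABLE changesets (push_id INTEGER, rev INTEGER, node TEXT);
CREATE INDEX pushes_tree_date ON pushes (tree_id, date);
CREATE INDEX changesets_push ON changesets (push_id);
""")

def reset_tree(db, name):
    """Retire a tree: blank the old row's name and insert a fresh row,
    so the tree gets a new id and its old pushes become orphaned."""
    db.execute("UPDATE trees SET name = NULL WHERE name = ?", (name,))
    db.execute("INSERT INTO trees (name) VALUES (?)", (name,))

def purge_defunct_pushes(db):
    """Delete pushes (and their changesets) that belong to retired trees."""
    db.execute("DELETE FROM changesets WHERE push_id IN "
               "(SELECT p.id FROM pushes p JOIN trees t ON p.tree_id = t.id "
               " WHERE t.name IS NULL)")
    db.execute("DELETE FROM pushes WHERE tree_id IN "
               "(SELECT id FROM trees WHERE name IS NULL)")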
How many pushes are there per week/month/time period? 

What gets stored in the "node" field, that has the TEXT type?

What gets stored in the "user" field, that has the TEXT type?

When do pushes get defunct? Is it mostly a time-based thing or does it depend on versions?

Moving it to a real db is possible. There is a generic cluster in scl3 (right now tbpl is on the generic cluster in phx1, so it's not a stretch that this would be on a shared server), and the scl3 generic doesn't have a ton on it, so I'm willing to try it there first.
There are dozens of pushes per day, with two INSERTs for each commit. Here's an example of the data.

sqlite> .schema changesets
CREATE TABLE changesets (pushid INTEGER, rev INTEGER, node text);
CREATE UNIQUE INDEX changeset_node ON changesets (node);
CREATE INDEX changeset_pushid ON changesets (pushid);
CREATE UNIQUE INDEX changeset_rev ON changesets (rev);

sqlite> .schema pushlog
CREATE TABLE pushlog (id INTEGER PRIMARY KEY AUTOINCREMENT, user TEXT, date INTEGER);
CREATE INDEX pushlog_date ON pushlog (date);
CREATE INDEX pushlog_user ON pushlog (user);

sqlite> select * from changesets limit 10;
1|13382|61007906a1f8ad5c303b0815ac4e9821168d3937
1|0|8ba995b74e18334ab3707f27e9eb8f4e37ba3d29
1|1|9b2a99adc05e53cd4010de512f50118594756650
1|2|10cab3c34c28b0746436f8bc7ffc8c47f421ee23
1|3|a00ac31e8ae4fe6cdf5f40a007c1ab36ae01ffae
1|4|e943454a2e49b2860353c9449359c1822cb14827
1|5|331cb67f2a3cb141465e0da88f8cd1ef36e85ffc
1|6|dad02d3ebc7d9e5fdfed17234d31d10e3b1b55ec
1|7|cd100ce4677919334ec2e3ffb57b444aabf81141
1|8|33654b51bca91fab0faed723e281c76bd65896c1

sqlite> select * from pushlog limit 10;
1|bsmedberg@mozilla.com|1206031764
2|jorendorff@mozilla.com|1206553146
3|bsmedberg@mozilla.com|1206641628
4|jorendorff@mozilla.com|1206727429
5|jorendorff@mozilla.com|1207177491
6|jorendorff@mozilla.com|1207607933
7|bsmedberg@mozilla.com|1207691072
8|bsmedberg@mozilla.com|1207714702
9|bsmedberg@mozilla.com|1207846866
10|bsmedberg@mozilla.com|1207850431
The TEXT data type in MySQL will be vast overkill for those fields, then. It's not a matter of space on disk, but when MySQL is processing those in memory, it becomes trickier. Check out http://www.pythian.com/blog/text-vs-varchar/

Is the changeset node always 40 characters? If so, I'd recommend using CHAR(40). If it's variable, VARCHAR(x) should be fine, like VARCHAR(50) or VARCHAR(100).

Also, the e-mail address doesn't need to be TEXT. VARCHAR(50) should probably be enough, unless you have different guidelines for e-mail addresses (the ones above are about 20 characters long).

This data is pretty small, so I think we can handle the purging. I'm still waiting for the answer to:

When do pushes get defunct? Is it mostly a time-based thing or does it depend on versions?
It's a hash, it's always going to be 40 characters.  I don't know if we even have a policy about what to accept as storage formats for email addresses (I'm guessing some have UTF-8 encoding).

AFAIK the pushes never become defunct. Unless something goes really wrong (like sensitive data is committed) and I have to go tear it out manually.

Sometimes branches of bigger repositories like mozilla-central are cloned onto 'twigs'. These twigs can be 'reset' to be a fresh clone of mozilla-central again, although they start out with blank pushlogs.

One of our oldest (and probably biggest) pushlogs belongs to mozilla-central:
-rw-rw-r--   1 hg                  scm_level_3  19M Feb 11 11:12 pushlog2.db
Hrm, maybe :Callek can address what he meant by pushes going defunct in comment 5?
(In reply to Ben Kero [:bkero] from comment #9)
> Sometimes branches of bigger repositories like mozilla-central are cloned
> onto 'twigs'. These twigs can be 'reset' to be a fresh clone of
> mozilla-central again, although start out with blank pushlogs.

This! We must have a very safe way of being able to wipe a pushlog for a branch if we are combining all the branches into one database.
I don't think it's important to be able to save the old data for a branch if we reset it; we don't currently do that. We can simply delete all the pushlog records for that branch and start clean.
Also note that the column types are what they are just because it's sqlite, not for any important reasons.
Summary: Figure out a caching mechanism for pushlog → Move pushlog to MySQL backend
(In reply to Sheeri Cabral [:sheeri] from comment #10)
> Hrm, maybe :Callek can address what he meant by pushes going defunct in
> comment 5?

Yeah, I merely meant defunct as in "we reset a tree" (try, project branch, etc.). Currently that purging is done by deleting the .sqlite file. If we use a real shared DB it might make sense to do this slightly differently, but just deleting the rows will certainly be fine.

The ability to easily reset a repo to a clean/purged state is needed, but beyond that the pushlog data doesn't expire.
(In reply to Sheeri Cabral [:sheeri] from comment #6)

> Moving it to a real db is possible. There is a generic in scl3 (right now
> tbpl is on generic in phx1 so it's not a stretch that it would be on a
> shared server), and scl3 generic doesn't have a ton on it, so I'm willing to
> try it there first.

This needs to have uber uptime. Any issues with this will essentially kill hg operations since pushlog is enabled globally. I'm a little paranoid about sharing hardware for this for the above reason, but will go with what the DBAs recommend. More than happy to order hardware to run this too, if needed.
Callek - thanx for clarifying, I was trying to assess if we'd need some kind of partitioning or defragmentation if we are constantly deleting stuff.

--------------------

Shyam - Right now only pootle and graphite_mozilla_org are on the generic cluster in scl3, but I'm totally OK with having separate hardware too. This is important, like putting bouncer on its own cluster.

Can we reuse some of the old addons in scl3 hardware for this? or is that already claimed? (we had one master and several slaves, like 3-6 slaves)

-------------------------

Callek, bkero, whoever:

Is there a sense of the "working set" of data? e.g. what "most" queries will be? Usually in a system like this it's something like "the most recent pushes". I'm just trying to get a sense of how large a buffer pool size we'd need.
We should get some access logs off the hg.mo webheads and see what queries get hit most often. I suspect the answer there is "whatever queries TBPL uses", and everything else is noise.
Ben, can you help with what ted needs here? 

Ted/Sheeri : I'll keep an eye, but I'd like Ben to drive the project.

Sheeri : we will probably order hardware for this when the time is right (aka before we go to production).
Assignee: server-ops-devservices → bkero
Whiteboard: [2013Q2]
Blocks: 498641
Note, one pushlog offender is the pushlog scraper on the l10n dashboard, which is currently pounding the /json-pushes/ API on some 800 repos. Queries look like json-pushes?startID=2134&endID=2234, i.e., we're getting new pushes, but at most some 200 at a time.

We'd love to move that to a single db query, though, at which point that'd collapse to one frequent ping to the central database, if we get access to the db or an API on top of that db.
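For reference, the polling pattern described above boils down to roughly this (base URL and repo names are placeholders; startID/endID are the same parameters the dashboard already uses):

import json
import urllib2  # the current infrastructure is Python 2

def poll_new_pushes(base_url, repo, last_known_id, batch=200):
    """Ask a repo's pushlog for pushes newer than last_known_id.

    Returns a dict mapping push id -> {'user', 'date', 'changesets'};
    an empty dict means nothing new, which is the common case."""
    url = '%s/%s/json-pushes?startID=%d&endID=%d' % (
        base_url, repo, last_known_id, last_known_id + batch)
    return json.load(urllib2.urlopen(url))

# Hypothetical usage against an l10n repo:
#   pushes = poll_new_pushes('https://hg.mozilla.org', 'l10n-central/ab', 1000)
#   for push_id in sorted(pushes, key=int):
#       print pushes[push_id]['user'], pushes[push_id]['changesets']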
Can we accelerate this?

It seems like the cleanest approach to making sure we're not running into issues like bugs 842536 and 847864, where repos don't record new pushes due to file permissions.

The l10n repos in particular for new locales are not as reliable as we'd like them to be, and that makes it hard for us to hit our goals this quarter.
I filed what I think are the prerequisites to getting the pushlog code to the point where we could make this switch.
Whiteboard: [2013Q2]
Attached patch Use revlogs for storing pushlog data (obsolete) (deleted) — Splinter Review
Assignee: bkero → gps
Comment on attachment 771888 [details] [diff] [review]
Use revlogs for storing pushlog data

Gah, forgot to pass -e to bzexport.

Anyway, I was a bit hungover today and decided to try something funky: storing pushlog data in Mercurial revlogs (the append-only data structure Mercurial uses for practically everything).

I believe that any flavor of SQL is excessive for the goals of pushlog recording. My understanding is SQLite was initially used because it was easy. And, we now are pushing for MySQL mainly because it has "SQL" in the name. In addition, there is a nebulous requirement that it might be nice to have a unified API for querying pushlog data across repositories. While such an API could be useful, we have to consider the operational implications - notably the introduction of a new service that Mercurial will now depend on and the fact that a single repository's data will no longer be isolated in a single directory (there will be info in a MySQL database somewhere).

I like keeping data stored inside the repository because it doesn't increase the surface area for failure and doesn't require us to reconfigure our repository maintenance scripts/tools. And, because there is an Atom and JSON API for pushlog data, nobody is inhibited from spidering all the repositories and creating a unified pushlog database (if that really is a legit goal).

Anyway, the patch is only partially complete. Notably missing is work into the actual hook bits. I'm pretty sure this will not behave properly if you push something to the repo. My main goal so far was to experiment with revlog storage and prove out the concept. I believe I have achieved that goal. I have code that imports data from the old pushlog SQLite database into a revlog and I have updated the querying code to consult the revlog instead of a database. Existing tests in runtests.py all pass!

I am somewhat concerned about performance of this code. By ditching SQL, I've ditched a rich querying language and replaced it with loading all the data into memory and then manipulating it. On the face of it, this won't scale. However, revlogs are surprisingly fast. And, we're only talking about tens of thousands of entries (as opposed to say millions). I'm optimistic the code will scale for mozilla-central and mozilla-inbound for the foreseeable future. If someone gives me a copy of our largest pushlog databases, I can confirm this. If things are too slow, I can introduce caching/indices for slow/popular queries. There are also things I can be much more intelligent about.

Balancing out the loss of SQL are a) No more SQLite and the craptitude associated with it (excessive fsync()'s, very flaky NFS performance, random I/O) and b) a "more Mercurial" storage backend (this includes better filesystem and in-memory caching of data since revlogs are friendly to page caching and Mercurial can cache them in memory).

If there is interest in trying out this solution, I'll finish up the patch and we can look into deploying it. With a few modifications, we should be able to deploy it side-by-side with the existing SQLite pushlog and we can evaluate its effectiveness. If we still want to go with MySQL, sure. But I think this patch provides a compelling alternative.
Attachment #771888 - Flags: feedback?(ted)
Oh, one advantage of revlogs I forgot to mention is that everything is covered by Mercurial's transaction guarantees. When Mercurial commits a set of changesets, it obtains a global write lock on the repository and performs everything within a transaction. If the transaction is aborted (e.g. if a hook says something is bad), the entire transaction is rolled back. The revlog updating in this patch is performed within the context of this global lock and transaction. Contrast this with SQLite or MySQL which have their own independent sets of locks and transactions. We currently have issues with held locks on the pushlog SQLite databases, so removing them should hopefully mean higher reliability of the Mercurial service. I believe we'll get stronger guarantees with a revlog approach than with MySQL.
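Roughly, the write path rides the repository's own lock and transaction like this (a sketch only, not the actual hook code; the path and write_push() are hypothetical placeholders):

from mercurial import hg, ui as uimod

repo = hg.repository(uimod.ui(), '/path/to/repo')  # placeholder path

lock = repo.lock()
try:
    tr = repo.transaction('pushlog')
    try:
        # ... the push adds changesets here ...
        # write_push(repo, push_id, user, date, nodes)  # hypothetical helper
        tr.close()    # commit: changelog and pushlog succeed or fail together
    finally:
        tr.release()  # rolls everything back if close() was never reached
finally:
    lock.release()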
I personally don't see the beauty of mysql in sql, but in a single database to query.

Performance would be rather crucial for systems that poll for pushes. My poller takes 9-10 minutes to get through all the repos already, for queries of the form "/l10n/ab/json-pushes?startID=1000&endID=1201", common case being that that's an empty result, i.e., startID is the latest push.

PS: IMHO, we've set up the hooks wrong in terms of what happens on errors. IIRC it had something to do with the ordering of the hooks, and how some hooks need to fail before others. Could've been tree closure vs pushlog or something.
I think saving pushlog data into a revlog is basically a sane idea; IMO the Mercurial-level transactions are nice, and I agree with Gregory that keeping everything inside a repository is better than distributing stuff elsewhere.

Some things to think about:

- The pushlog data probably don't create great deltas, so you lose some of the storage benefit that revlogs were built for. Still, you basically have similar problems with the changelog revlog already, and it's not much of an issue (particularly since each revision should be relatively small).

- Performance might be an issue, this is definitely something to test for. I would want to get an access log to determine the more popular queries and test based on that.

- Crazy idea: I wonder if it's worth it to somehow bloat the pushes revlog with empty revisions for each non-push-head, such that actual revs line up with their corresponding changelog revision. This would blow up the index quite a bit, but wouldn't affect the data that much, and it may make querying quite a bit faster.

As for the code, changing simplejson to stdlib json is probably a bad idea; IIRC simplejson is still quite a bit faster (especially for 2.x stdlib json!).
I don't have a preference here; if mercurial will work for you, and the extra load won't break mercurial, then go for it. Let me know what you need from the db side; we take daily exports and keep them for 4 days, so we can send you an export any time.
Summary: Move pushlog to MySQL backend → Move pushlog to different backend
(In reply to Axel Hecht [:Pike] from comment #25)
> I personally don't see the beauty of mysql in sql, but in a single database
> to query.

That's a nice benefit!

> Performance would be rather crucial for systems that poll for pushes. My
> poller takes 9-10 minutes to get through all the repos already, for queries
> of the form "/l10n/ab/json-pushes?startID=1000&endID=1201", common case
> being that that's an empty result, i.e., startID is the latest push.

I think this has more to do with the crappy performance of SQLite + NFS than anything else. SQLite can incur a lot of I/O when opening a database and on first read. If my understanding of the existing code is correct, we open a new SQLite connection on every HTTP request and repository push operation. This is horribly inefficient and likely introduces a lot of latency. It will be interesting to see if the naive revlog approach taken thus far is better.



(In reply to Dirkjan Ochtman (:djc) from comment #26)
> - The pushlog data probably don't create great deltas, so you lose some of
> the storage benefit that revlogs were built for. Still, you basically have
> similar problems with the changelog revlog already, and it's not much of an
> issue (particularly since each revision should be relatively small).

Indeed. I originally thought of writing a new-line delimited or similar file. But, I figured revlogs were close enough. It should be trivial to swap out the storage backend, however.

> - Crazy idea: I wonder if it's worth it to somehow bloat the pushes revlog
> with empty revisions for each non-push-head, such that actual revs line up
> with their corresponding changelog revision. This would blow up the index
> quite a bit, but wouldn't affect the data that much, and it may make
> querying quite a bit faster.

My initial version actually introduced two pushlogs: 1 with revisions for each push and 1 with revisions corresponding to each changeset. The latter (since removed) redundantly defined the push id, user, and time. This facilitated rapid lookup from rev/changeset to push ID. It can be reintroduced if performance concerns warrant it. I can also introduce empty revs in the push revlog so rev == push ID. If all the existing SQLite databases have push IDs [1..N], I suppose I could just assume push ID == rev + 1 or I could just add an empty rev 0. Or, if we don't care about consistent numbering across conversion (I assumed we did), we could just reset the push IDs.

> As for the code, changing simplejson to stdlib json is probably a bad idea;
> IIRC simplejson is still quite a bit faster (especially for 2.x stdlib
> json!).

Boo. I wonder if it's even measurable...
bkero provided me with the sqlite db's for central, inbound, and try along with the HTTP access logs for a webhead on a representative weekday. I will load the SQLite DB's into a revlog and will analyze perf characteristics.

Also, when we eventually move off NFS, we'll need a way to replicate the pushlog to the read-only slaves. I'm thinking we could write an extension for the slaves that causes them to pull the pushlog revlog when pulling remote changesets. But, this could be a follow-up feature/bug. Still something to think about...
(In reply to Gregory Szorc [:gps] from comment #29)

> Also, when we eventually move off NFS, we'll need a way to replicate the
> pushlog to the read-only slaves. I'm thinking we could write an extension
> for the slaves that causes them to pull the pushlog revlog when pulling
> remote changesets. But, this could be a follow-up feature/bug. Still
> something to think about...

This needs to happen at the same time, because a lot of things depend on "reading" pushlog too :)
QA Contact: shyam → nmaul
I don't believe the move off NFS needs to happen at the same time as switching pushlog to a different backend. As long as all consumers of the SQLite database are updated at the same time, the pushlog change can be independent of the NFS move. Now, there are pushlog considerations for the move off NFS, but they can be handled later.

That being said, if we are comfortable making two major changes during one maintenance, we can do that :)
I'm upgrading the severity, because at this point try is pretty much unusable and has been for a while.  I've lost about a day of work in the last week to this bug, as has Joe, and I doubt we're the only ones...
Severity: minor → major
(In reply to Gregory Szorc [:gps] from comment #31)
> I don't believe the move off NFS needs to happen at the same time as
> switching pushlog to a different backend. As long as all consumers of the
> SQLite database are updated at the same time, the pushlog change can be
> independent of the NFS move. Now, there are pushlog considerations for the
> move off NFS, but they can be handled later.
> 
> That being said, if we are comfortable making two major changes during one
> maintenance, we can do that :)

Let's do these in two separate downtimes. Given the scope of these changes, and the lack of a load-testable staging environment, I recommend we do the database switchover first and watch for a few days/a week to see if there's any need to roll back. Once that's proven to hold up under load, without any surprises, we can then proceed to the switch from NFS, and again watch for a few days/a week to see if there's a need to roll back.
Blocks: 770811
(In reply to Boris Zbarsky (:bz) from comment #32)
> I'm upgrading the severity, because at this point try is pretty much
> unusable and has been for a while.  I've lost about a day of work in the
> last week to this bug, as has Joe, and I doubt we're the only ones...

Filed bug 894429 for a try reset to get it working again in the meantime.
Shyam - if this is happening now, we should order database hardware ASAP. Unless we want to migrate first to generic in scl3. I'm not sure how fast this all should happen.
The current plan is *not* to use a database apart from the hg revlog format.
Ah! Thanx.
From offline email with bz:

1) upgrading the sql db is a prerequisite to moving hg off NFS (big hg-wide perf gain). While we believe that will improve things, I'm not aware of anything that would cause that sql db to *degrade* recently like you are seeing.

2) Some of these perf issues were supposed to be fixed when we upgraded to a newer hg version on the servers in late May 2013. Details in https://bugzilla.mozilla.org/show_bug.cgi?id=781012. Do you know if you saw the problems start soon after that server upgrade?

3) bug#691459 is tracking getting IT nagios monitoring set up to avoid exactly this situation impacting developers. I'd much prefer to have an automated warning when we get close to the limit, instead of reacting like this when a human has to resort to filing a bug!

4) We last reset the try repo 2013-03-21 in bug#853697. It's not certain that resetting the try repo again today will improve things (mixed experiences in the past), but we'll try it right now before PDT wakes up and see. Expect a brief try closure soon. We've filed bug#894429 and are tracking down all the RelEng+IT+Sheriffs needed as I type.
No longer blocks: 770811
Blocks: 770811
Comment on attachment 771888 [details] [diff] [review]
Use revlogs for storing pushlog data

Review of attachment 771888 [details] [diff] [review]:
-----------------------------------------------------------------

ted asked me to review this, so some comments follow.

Also, we definitely want the real-sized db performance tests before we move on this.

Do we have tests for stripping?

::: hgwebjson.py
@@ -3,5 @@
>  from mercurial.hgweb import webcommands
>  from mercurial import templatefilters
>  
>  demandimport.disable()
> -import simplejson

IMO, this should be a separate changeset, probably with some separate benchmarks. I did some nanobenchmarks and it looks like json is faster there:

djc@enrai ~ $ python -m timeit "import simplejson; s = '{\"key\": 0}'" "simplejson.loads(s)"
10000 loops, best of 3: 26.6 usec per loop
djc@enrai ~ $ python -m timeit "import json; s = '{\"key\": 0}'" "json.loads(s)"
10000 loops, best of 3: 25.5 usec per loop
djc@enrai ~ $ python -m timeit "import simplejson" "simplejson.dumps({'key': 0})"
10000 loops, best of 3: 49.8 usec per loop
djc@enrai ~ $ python -m timeit "import json" "json.dumps({'key': 0})"
10000 loops, best of 3: 25.2 usec per loop

::: pushlog-feed.py
@@ +66,5 @@
> +
> +        pushes = {}
> +        for node in self.repo.pushlogpushes:
> +            push_id, time, user, changesets = self.repo.pushlogpushes.read_node(node)
> +            pushes[push_id] = (time, user, changesets)

Nit: unneeded parentheses are sadface.

::: pushlog.py
@@ +23,5 @@
> +        if push_id is None:
> +            push_id = self.read(tip)[0] + 1
> +
> +        text = '\1\n%d\n%d\n%s\n%s' % (push_id, date, user,
> +            '\n'.join(changesets))

Not sure about this encoding, should we worry about user containing \n? Looks like hg forbids that on the command-line, so we should be okay:

djc@enrai tmp $ hg init test
djc@enrai tmp $ cd test
djc@enrai test $ ls -l
total 0
djc@enrai test $ echo a > a
djc@enrai test $ hg ci -Ama -u "Floozy
> crap"
adding a
transaction abort!
rollback completed
abort: username 'Floozy\ncrap' contains a newline!

Is it worth thinking about \0 instead?

@@ +60,5 @@
> +        push_id, rev, node = row
> +        changesets[rev] = push_id
> +        pushes[push_id][2].append(node)
> +
> +    repo.pushlogpushes.strip(0, txn)

Why do we need this? Maybe add a comment?
Comment on attachment 771888 [details] [diff] [review]
Use revlogs for storing pushlog data

Review of attachment 771888 [details] [diff] [review]:
-----------------------------------------------------------------

I don't feel confident about reviewing most of the pushlog extension, since I know exactly nothing about hg revlogs. I do have a few random comments here and there.

::: pushlog-feed.py
@@ +66,5 @@
> +
> +        pushes = {}
> +        for node in self.repo.pushlogpushes:
> +            push_id, time, user, changesets = self.repo.pushlogpushes.read_node(node)
> +            pushes[push_id] = (time, user, changesets)

So this loads all push data into memory and then filters from there? I would be interested to know how large our current m-c/m-i pushlogs are in revlog form.

@@ +83,5 @@
> +            offset = (self.page - 1) * self.querystart_value
> +            added = 0
> +            for i, push_id in enumerate(sorted(pushes, reverse=True)):
> +                if offset > i:
> +                    continue

Seems a little wasteful given that these are sequential integers.

::: pushlog.py
@@ +4,5 @@
> +# Push data is stored in revlogs, just like other pieces of Mercurial data.
> +# To facilitate efficient access over multiple query patterns, push data is
> +# stored in multiple revlogs. There is some redundancy, but this is the price
> +# you may for rapid retrieval. The amount of data stored is small, so this
> +# shouldn't be a major concern.

I was going to complain about the lack of a license header, but apparently we don't have them for anything in this repo. Oops.

@@ +40,5 @@
> +        return int(push_id), int(date), user, lines[3:]
> +
> +def convert_sqlite_pushlog(ui, repo, path, txn):
> +    """Converts SQLite pushlog to revlog format."""
> +    import sqlite3

I feel like this is tricky enough that we probably shouldn't try to do it on the fly. I think taking a bit of explicit downtime on the hg servers, running a conversion script, then updating to the new pushlog version would be a better plan.

@@ +75,5 @@
> +def push_hook(ui, repo, node=None, source=None, url=None, **kwargs):
> +    try:
> +        ui.warn('push hook!\n')
> +        repo.maybe_convert_pushlog()
> +        tip = self['tip']

I'm confused about where "self" is coming from here.

@@ +84,5 @@
> +        if source == 'strip':
> +            # Find affected pushlogs and strip. Or, consider not supporting
> +            # this since rewriting pushlog history is kinda silly.
> +            txn.close()
> +            return

If we still have issues with the number of heads on try in the future, it would be nice to support this. I had a script that would strip off old heads, but the current pushlog hook barfs on that.

::: runtests.py
@@ +41,2 @@
>  style=gitweb_mozilla
> +""".format(mydir=mydir, templates=os.environ['HG_TEMPLATES']))

Need to be careful here, I don't know what version of Python we're running on the hg servers.

@@ +81,5 @@
>          # subclasses should override this to do real work
>          self.setUpRepo()
>          write_hgrc(self.repodir)
>          self.repo = hg.repository(self.ui, self.repodir)
> +        self.repo.maybe_convert_pushlog()

You should probably just put new test repositories into the repo. You can use an old one if you want to explicitly test conversion, but it seems silly to do that for every single test.
Attachment #771888 - Flags: feedback?(ted)
There are many concerns over this bug. Let me try to sort through them.

Concern: We need to work on this bug ASAP.

While I would love to see this rolled out and I would love to see Mercurial not hosted on NFS, it is my understanding the locking issue with Try is solved by periodic Try resets. Hacky, yes. But, there is a known workaround. When instituted properly, pushlog being on SQLite is a moderate annoyance, not a fire drill requiring me (or possibly others) to drop other in-progress tasks.

Concern: Rolling this patch out to production will be risky.

I agree. We're talking about a major change with wide-ranging impact if we do it wrong.

I think rolling out a pushlog replacement to production should be done deliberately, incrementally, and with many tests (both automated and manual).

We should consider having the new revlog pushlog exist side-by-side with the existing SQLite pushlog. On push, we write to both. We have the HTTP API serve from revlogs. If revlogs don't work out, we revert the HTTP API to serve from SQLite (like today). We then continue to iterate on improvements until revlogs (or something else) replaces SQLite. If revlogs work great, we turn off writing to SQLite. We should be able to control all of this via settings in the repository's hgrc file.
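For illustration, the switch could be driven by per-repository config along these lines (the [pushlog] section and option names are made up; the real names would be whatever the extension ends up defining):

# Hypothetical excerpt from the push hook: write to both backends, and let
# each repository's hgrc decide which backend the HTTP API reads from.
def write_sqlite_push(repo, push_id, user, date, nodes):
    pass  # the existing SQLite INSERTs would go here

def write_revlog_push(repo, push_id, user, date, nodes):
    pass  # the new revlog append would go here

def record_push(repo, push_id, user, date, nodes):
    ui = repo.ui
    if ui.configbool('pushlog', 'writesqlite', True):   # illustrative option
        write_sqlite_push(repo, push_id, user, date, nodes)
    if ui.configbool('pushlog', 'writerevlog', False):  # illustrative option
        write_revlog_push(repo, push_id, user, date, nodes)

def read_backend(ui):
    # 'sqlite' by default; flip a repo's hgrc to 'revlog' to try the new
    # code, and flip it back if anything goes wrong.
    return ui.config('pushlog', 'readfrom', 'sqlite')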

I would also like us to incrementally roll out the revlog pushlog to repositories. e.g. let's start with Try and/or some lesser used project branches. If things work, great - we roll out to everywhere. If they don't work, we revert to SQLite.

Can someone confirm that each repository has a separate hgrc file and that we're able to easily tweak the extensions, etc on each repository? Out of curiosity, where do these configs live in version control?

Concern: Existing patch is lacking tooling.

I agree! The patch should be supplemented with Mercurial commands that can interact with the pushlog revlogs so IT can manually correct any issues that arise. We arguably didn't need these before because you could just issue SQL to the database. Modifying revlogs will be much harder.

Concern: Existing patch is lacking testing.

I agree! I don't land code without tests (usually). I certainly don't land code as important as this bug without tests. If I'm finishing this patch, I will not request review until sufficient test coverage is in place.
(In reply to Gregory Szorc [:gps] from comment #41)
> I think rolling out a pushlog replacement to production should be done
> deliberately, incrementally, and with many tests (both automated and manual).

Agreed. I think we should start by rolling this out to Try - given that if there are any problems, we can just reset it & won't mind data loss from the pushlog (as opposed to the same happening on mozilla-central/inbound).
(In reply to Ed Morley [:edmorley UTC+1] from comment #42)
> (In reply to Gregory Szorc [:gps] from comment #41)
> > I think rolling out a pushlog replacement to production should be done
> > deliberately, incrementally, and with many tests (both automated and manual).
> 
> Agreed. I think we should start by rolling this out to Try - given that if
> there are any problems, we can just reset it & won't mind data loss from the
> pushlog (as opposed to the same happening on mozilla-central/inbound).

I had a long discussion and knowledge transfer with rhelmer yesterday. We definitely need to iron out the deployment and rollback strategy before we can get serious about finalizing the code.

Here's my proposal.

New pushlog hook is implemented which records data to revlogs. It works side-by-side with the existing SQLite pushlog hook. So any pushes with both configured write to both SQLite and revlogs.

The HTTP server code in the pushlog repository is updated to pull data from revlogs.

The new pushlog extension will provide an hg command to migrate data from the SQLite database to revlogs.

For rollout, we select a small subset of repositories to activate the new pushlog code on. The new revlog pushlog extension is activated on those repos. Data is migrated from SQLite to revlogs. Writes go to both SQLite and revlogs but reads from the HTTP interface come from revlogs. If there is a problem, we deactivate the revlog pushlog extension and roll back the HTTP pushlog code for that repo. Since writes have been going to SQLite the whole time, no data loss occurs. If revlogs prove themselves, we deactivate the old SQLite pushlog hook.

One issue I see here is that repositories may all share the same checkout of the pushlog repository today. We'll need separate checkouts - one that supports reading from SQLite and one from revlogs. If we can't operate from multiple checkouts, that will introduce a bit more coding work, since we'll need to support both paths in the same code base. Doable, sure. But, a bit more complicated and prone to failure. Multiple checkouts is highly preferred.

Does IT have any more requirements around pushlog APIs? i.e. do you need a command to inject or modify data in the pushlogs? I'm not sure what was done via SQL before. You won't have a tool to modify revlogs, so we need to bake an API and/or additional Mercurial commands into the extension.
Flags: needinfo?(bkero)
A requirement from me, which I don't see discussed or tested based on the conversations here, is *data* on behavior for what happens when:

Local user without pushlog extension [alias: lnp]
Remote repo with new pushlog hook [alias rpr]
Remote repo with old pushlog hook [alias rpo]


-lnp pulls from rpr, then pushes into rpo; rpo is later turned on for the new hook and the conversion script is attempted.

-lnp pulls from rpr, and pushes to a *different* rpr (e.g. inbound to try).

-lnp pulls from rpr and pushes to rpo, THEN a different lnp pulls from rpo and pushes to a rpr.

-IT clones a rpo or rpr, from an existing rpr (e.g. twig resets)

-Similar scenarios

===

My concern stems from: "does the revlog case mean that when we merge m-i to m-c, will m-c now have *separate* push entries for all pushes between the last merge and now?" If so, that's a complete breakage of the assumptions we must (currently) maintain.

The same reasons apply for all of the above cases (e.g. pushing any branch to try, uplifts, etc.)

And MOST importantly, that if we roll this out to a single (or a few) trees, we don't shoot ourselves in the foot by creating a problem on local users' side that we then have to permanently work around with magic in future work.
(In reply to Justin Wood (:Callek) from comment #44)
> My concern stems around, "does the revlog case mean that when we merge m-i
> to m-c will m-c now have *seperate* push entries for all pushes between last
> merge and now" if so thats a complete breakage of the assumptions we must
> (currently) maintain.

I don't grok this. Any time changesets are added/pushed to a repository, a new pushlog ID is recorded. Since m-i and m-c are separate repositories, the set of pushes are completely separate. e.g. inbound will have 12 pushes between changesets A..B. But, when we merge m-i into m-c, those 12 pushes and changesets A..B are performed in a single push, so only 1 new push will be recorded in m-c.

That's how things work with SQLite today. That's how they'll work with revlogs tomorrow. Also, these revlogs are server-side only: they won't be synced to clients. Although, we'll eventually need the servers to replicate them, so we'll extend the extension to support that at a future date (to unblock the migration off NFS). I believe it's out of scope for the initial landing.
(In reply to Gregory Szorc [:gps] from comment #45)
> (In reply to Justin Wood (:Callek) from comment #44)
> > My concern stems around, "does the revlog case mean that when we merge m-i
> > to m-c will m-c now have *seperate* push entries for all pushes between last
> > merge and now" if so thats a complete breakage of the assumptions we must
> > (currently) maintain.
> 
> I don't grok this. Any time changesets are added/pushed to a repository, a
> new pushlog ID is recorded. Since m-i and m-c are separate repositories, the
> set of pushes are completely separate. e.g. inbound will have 12 pushes
> between changesets A..B. But, when we merge m-i into m-c, those 12 pushes
> and changesets A..B are performed in a single push, so only 1 new push will
> be recorded in m-c.
> 
> That's how things work with SQLite today. That's how they'll work with
> revlogs tomorrow. Also, these revlogs are server-side only: they won't be
> synced to clients.

That was the crux of my question: whether the pushlog info gets pushed/pulled from a copy of the repo with the new pushlog info whenever the repo is pushed/pulled at all.

I was basically asking if the new revlog stuff is even "noticed" and "replicated" in any way by mercurial, even if it doesn't know to access or read the pushlog data. It sounds like you are asserting that is the case; I am requesting data-driven validation of that assertion.
I told rhelmer earlier this week that I wanted to sand off some of the
rough edges in this patch before "throwing it over the fence." I believe
I have done that.

Changes since last patch:

* We now have 2 revlogs containing pushlog data. 1 contains per-push
  info. The other mirrors the changelog and allows rapid lookup of
  pushlog info given a specific changeset.

* There is a |hg pushlog-import-sqlite| command for importing data from
  SQLite. You run it and the revlogs are truncated and repopulated with
  data from SQLite. It even has progress bars if you have the
  progress extension enabled \o/.

* We no longer load the entire pushlog data into memory when processing
  HTTP requests for pushlog data. The new implementation is better,
  but not perfect. There is tons of room for improvement (e.g. using
  algorithms more intelligent than linear traversal). However, unless
  someone can demonstrate a performance problem, I'm going to assume the
  implementation is good enough.

* The push hook is fixed and there are minimal tests for it.

* Storage format inside revlogs updated.

* General code cleanup.

The tests all pass with Mercurial 2.5.4.

Known limitations:

* stripping isn't handled. If someone strips changesets from the
  repository (using e.g. |hg strip|), this extension isn't going to be
  happy. It will likely complain about a revision mismatch on the next
  push. I'd love to fix this, however I'm not sure how to intercept
  strip events in Mercurial. Looking through mercurial.repair.strip(),
  I don't see any obvious hook points. We /could/ override
  localrepository.changelog to intercept strip(), but that feels hacky
  and I'm not sure there is precedent for that. Ditto for
  localrepository.destroying() and localrepository.destroyed(). I guess
  another option is implementing a custom basestore.basestore? I may
  have to reach out to the Mercurial folks for a recommended solution
  here. If we never run |hg strip|, then this shouldn't be an issue for
  us. Still, it would be nice to handle gracefully.

* No revlog cloning. We'll need to do this eventually. But, it's not
  required for the initial release (only for moving off NFS). Worst case
  we need to perform a one-time migration later. But, we'll presumably
  be in downtime for the move off NFS, so it shouldn't be too bad.

* New repositories will have their pushlog ID start at 0, not 1. This is
  because SQLite primary keys start at 1 and revlogs start at 0. As I
  was writing this comment, I saw some entries in the HTTP logs where
  pushID == 0. Since we use > and not >= for query filtering, this means
  those clients will not pick up pushes for pushID==0. This needs to be
  fixed (likely by inserting an empty pushlog entry at revision 0).

* There are no commands for querying or modifying pushlog data. If
  things get messed up, IT will be powerless. We should consider what
  functionality IT will need (if any) and provide that. I say "if any"
  because if this extension is implemented properly, we should never run
  into an issue. Since the pushlog is updated as part of adding new
  changesets to the repository and is part of the same repository
  transaction, if pushlog fails, the push should fail and everything
  should be rolled back. The only obvious issue I see is if the pushlog
  extension doesn't get enabled and pushes come into the repository.
  Then, this extension will likely complain on the next push. This
  could be remedied by providing a command to insert empty revisions
  into the revlogs when pushlog info is missing.

If IT says we never run |hg strip|, and this implementation passes load
tests, the definite blocker to initial deployment is the pushID==0
backwards incompatibility. There is a soft blocker on the missing tools
to query and mutate pushlog data. Other than those, I think we're good
to go.

At this point, I think I'm done with my initial responsibility on this
bug and will let rhelmer take over for the 2nd 95%.
Attachment #771888 - Attachment is obsolete: true
(In reply to Gregory Szorc [:gps] from comment #47)
> Know limitations:
> 
> * stripping isn't handled. If someone strips changesets from the

So, I don't consider supporting stripping a blocker (from my seat); however, it is indeed a "we want to have at least _some_ solution for it, even if it's a manual separate tool".

The only cases I have ever seen us strip for:
* Unintended private data got checked into a repo, and it's severe enough to not leave in repo.
* Repo (in my memory only try) got corrupted in a very weird way with the sqlite pushlog not taking new entries, so we stripped to last existing-in-pushlog entry, and re-pushed to verify things were good.

Both of which assumes a Human is already on the server for some related critical issue, so a solution that is not directly built into mercurial is "ok" imo (even if not ideal).

> * No revlog cloning. We'll need to do this eventually. But, it's not

This is a good thing in the default case, and certainly a thing to think about in this implementation and to add the ability for going forward!

> * There are no commands for querying or modifying pushlog data. If
>   things get messed up, IT will be powerless. We should consider what
>   functionality IT will need (if any) and provide that. I say "if any"
>   because if this extension is implemented properly, we should never run
>   into an issue. Since the pushlog is updated as part of adding new
>   changesets to the repository and is part of the same repository
>   transaction, if pushlog fails, the push should fail and everything
>   should be rolled back. The only obvious issue I see is if the pushlog
>   extension doesn't get enabled and pushes come into the repository.
>   Then, this extension will likely complain on the next push. This
>   could be remedied by providing a command to insert empty revisions
>   into the revlogs when pushlog info is missing.

Until we have proven stability in the hook I personally would love a way to 'repair missing pushlog entries' at least, which will notice unmatched csets and insert a new 'fake' push per head of unmatched pushes... such that we have a way to repair if we learn that ^C or some other silly thing corrupts this state. [I should note that as I understand the implementation and mercurial this should _never_ happen, but we're still on NFS so I don't trust the "aiui" stuff]

Options for "modifying" already existing pushlog data or querying pushlog data manually are imo not a blocker, especially since if we have THAT method described we can run it for any repo that has a corrupted/confused pushlog, in a new raw clone, and have a recoverable sane state even if it's lossy on the 'who pushed'.

> At this point, I think I'm done with my initial responsibility on this
> bug and will let rhelmer take over for the 2nd 95%.
What's the proper way for us to license Mercurial extensions that import core Mercurial Python modules?

Mercurial is licensed as follows:

# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

I know MPL 2.0 isn't compatible with GPL. But I'm not sure if Python code counts. GPL's language around linking and non-compiled languages has always thrown me off...

Gerv?
Flags: needinfo?(gerv)
I'm pretty sure the SFC (which handles legal counsel for the Mercurial project) considers Mercurial extensions to require GPL-compatible licensing, because extensions basically have unfettered access to Mercurial internals.
(In reply to Gregory Szorc [:gps] from comment #47)

> If IT says we never run |hg strip|, and this implementation passes load
> tests, the definite blocker to initial deployment is the pushID==0
> backwards incompatibility. There is a soft blocker on the missing tools
> to query and mutate pushlog data. Other than those, I think we're good
> to go.

We don't want to ever run strip, but there are cases every now and then that we are asked to run hg strip. So that's always a possibility.
The Mercurial project's opinion is that extensions should be GPL2+:
http://mercurial.selenic.com/wiki/License

Therefore, we should license extensions we write that way. If the plan is to include code under a GPLv2+-incompatible licence, then let me know and we'll work out a way forward.

MPL 2 is compatible with GPL 2, in the sense that you can certainly incorporate MPL 2-ed code into a larger work which is GPL 2-ed (which is what we are doing here). (This is only not true if the MPL 2-ed work has the Incompatible Software clause added to its licensing, which is not the default.)

Gerv
Flags: needinfo?(gerv)
There was concern that not using a centralized database for aggregating pushlog data would make it harder to... aggregate pushlog data.

Since the pushlog data is exposed via HTTP+JSON, it was actually quite trivial to pull data to a central database.

http://gregoryszorc.com/blog/2013/07/25/track-pushes-and-train-riding-with-mercurial/

Someone could easily replicate this using MySQL and repository polling. Or, if you needed lower latency, you could hook into the AMQP system broadcasting pushes or write a hook to notify the aggregation system.
Note that we already license our existing Hg hooks as GPL:
http://hg.mozilla.org/hgcustom/hghooks/file/2ce2c5286ed6/COPYING
rhelmer agreed to pick up review and landing of this patch - discussed last week.
Please bear in mind that the primary reason IT is interested in the pushlog extension is so that we can migrate off of NFS. A solution that doesn't enable that doesn't unblock us, and therefore doesn't let us move forward with what we believe to be the biggest performance improvement we can possibly deliver to Mercurial. That's why the original proposal was to replace SQLite with MySQL or PostgreSQL with minimal other changes.

If there are other good things we want to accomplish at the same time, that's okay too, but I don't want us to lose sight of the primary objective. I don't know what those other goals (if any) would be, so I can't rank them objectively against this one (getting off of NFS).


My feeling here is that we're maybe putting a lot of effort into moving from SQLite to built-in revlogs... but that this is essentially just moving from one storage backend to another, without accomplishing the goal.

Maybe I'm missing something, so let me ask some questions:

1) Is it easier to switch from sqlite to revlogs to cloning revlogs, or easier to switch from sqlite to mysql/postgres?

2) Are cloned revlogs easier to maintain long-term than a central database would be?

3) Are we accomplishing extra stuff that I just don't know about?


Thank you for entertaining my intrusion into this issue.
We could accomplish much of the same with pushlogs and MySQL. However, I think pushlogs are better from an operational perspective because you don't have a dependency on an external service and the data for a repository will continue to be isolated to a repository's location on disk. Furthermore, Mercurial has a mechanism for transferring revlog data, so replication of pushlogs will be relatively straightforward. This opens the door to clients consuming pushlog data directly from Mercurial as opposed to going over the HTTP API.
s/pushlog/revlog/ for most of my last comment for it to make sense.
(In reply to Jake Maul [:jakem] from comment #56)
> Please bear in mind that the primary reason IT is interested in the pushlog
> extension is that so we can migrate off of NFS. A solution that doesn't
> enable that doesn't unblock us, and therefore doesn't let us move forward
> with what we believe to be the biggest performance improvement we can
> possibly deliver to Mercurial. That's why the original proposal was to
> replace SQLite with MySQL or PostgreSQL with minimal other changes.
> 
> If there are other good things we want to accomplish at the same time,
> that's okay too, but I don't want us to lose sight of the primary objective.
> I don't know what those other goals (if any) would be, so I can't rank them
> objectively against this one (getting off of NFS).
> 
> 
> My feeling here is that we're maybe putting a lot of effort into moving from
> SQLite to built-in revlogs... but that this is essentially just moving from
> one storage backend to another, without accomplishing the goal.

I just chatted with gps about this - here's the current situation as I understand it (please correct if this is wrong):

1) pushlog currently uses sqlite as a backend
2) backend is populated by an extension that records extra information from hg push
3) backend is stored on NFS mount, which the webheads mount r/o and serve

Right now I am working on load-testing and reviewing the patch in this bug, which will change #1 only (which is indeed just moving from one storage backend to another). We should be able to replace #3 by using "hg pull" from the webheads, but we need to figure out how to transfer custom revlogs, and how exactly this sync should be scheduled.

So this is a step on the way, but just deploying the patch in this bug will not enable us to turn off NFS.

Are there other reasons that it's urgent to get off of NFS (and SQLite) or is all this purely about making hg faster? Has anyone looked into syncing the SQLite DBs instead of sharing over NFS, as an interim step?

The (implicit) plan here has been to change the backend, ship that, then change the distribution mechanism as outlined above - I think it complicates things slightly but it might make sense to do this the other way around (change distribution mechanism, then change backend) if we're really dying for a performance win here (I don't know the backstory, sorry). I'd really like to see some data backing up assertions about what's causing the performance problems before we try to do something strategic like this, though.

Do we have a staging/testing environment right now, that uses NFS in the same way production does?
Flags: needinfo?(nmaul)
I'd add

4) pushlog is stored per repository

This is one of my major pain points. I need to query hundreds of web APIs to get a few data items per hour.
I believe aggregating pushlog data should be out of scope as far as repository hosting is concerned.

Aggregating pushlog data isn't that difficult and can easily be pushed onto clients. Here are some ideas:

1) Use the code in my Mercurial extension at https://hg.mozilla.org/users/gszorc_mozilla.com/hgext-gecko-dev. It creates a SQLite database containing all the pushlog data for most of the release trees. Hook it up to a CRON and write your own custom SQL queries against that DB.
2) Ask A*Team to provide a unified pushlog API as part of treeherder.
3) Set up a Stackato service for exposing aggregated pushlog data (https://mana.mozilla.org/wiki/display/websites/paas.allizom.org)

Someone could combine #1 and #3 in a few hours of work.
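As a rough sketch of combining #1 and #3 (repo list, database path, and base URL are placeholders), mirroring each repo's JSON pushlog into one local SQLite database looks something like:

import json
import sqlite3
import urllib2

REPOS = ['mozilla-central', 'integration/mozilla-inbound']  # placeholders

def sync(dbpath='pushes.db', base='https://hg.mozilla.org'):
    """Incrementally copy per-repo pushlog data into one queryable DB."""
    db = sqlite3.connect(dbpath)
    db.execute('CREATE TABLE IF NOT EXISTS pushes '
               '(repo TEXT, push_id INTEGER, user TEXT, date INTEGER, '
               ' PRIMARY KEY (repo, push_id))')
    for repo in REPOS:
        row = db.execute('SELECT MAX(push_id) FROM pushes WHERE repo = ?',
                         (repo,)).fetchone()
        start = row[0] or 0
        url = '%s/%s/json-pushes?startID=%d&endID=%d' % (
            base, repo, start, start + 200)
        pushes = json.load(urllib2.urlopen(url))
        for push_id, info in pushes.items():
            db.execute('INSERT OR REPLACE INTO pushes VALUES (?, ?, ?, ?)',
                       (repo, int(push_id), info['user'], info['date']))
    db.commit()

Hook that up to cron and you can write whatever SQL you like against pushes.db.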
It's not hard, it's just wrong. I have my own service up and running, and it takes 10 minutes to cycle through all the repos that I'm interested in.

The right solution is to push and not to poll in those cases.
(In reply to Axel Hecht [:Pike] from comment #62)
> It's not hard, it's just wrong. I have my own service up and running, and it
> takes 10 minutes to cycle around all repos that I'm interested.
> 
> The right solution is to push and not to poll in those cases.

How many repos are you polling? What takes 10 minutes? Is the pushlog HTTP API too slow? Are you performing queries for the full pushlog data (slow) or only the data since the last poll (fast)?
840 repos, more next week thanks to gaia branching. This is l10n; any branch I look into comes with 80-100 repos. I'm querying json-pushes?startID=my_latest_known&endID=that+200, i.e., I usually get empty responses, and if I get responses, they have little data. I trigger a pull and get the details locally afterwards.
Assignee: gps → server-ops-devservices
No need to page oncall :)
Severity: major → normal
(In reply to Robert Helmer [:rhelmer] from comment #59)
> I just chatted with gps about this - here's the current situation as I
> understand it (please correct if this is wrong):
> 
> 1) pushlog currently uses sqlite as a backend
> 2) backend is populated by an extension that records extra information from
> hg push
> 3) backend is stored on NFS mount, which the webheads mount r/o and serve
> 
> Right now I am working on load-testing and reviewing the patch in this bug,
> which will change #1 only (which is indeed just moving from one storage
> backend to another). We should be able to replace #3 by using "hg pull" from
> the webheads, but we need to figure out how to transfer custom revlogs, and
> how exactly this sync should be scheduled.
> 
> So this is a step on the way, but just deploying the patch in this bug will
> not enable us to turn off NFS.
> 
> Are there other reasons that it's urgent to get off of NFS (and SQLite) or
> is all this purely about making hg faster?

It's more than just hg... it's anything that uses NFS in SCL3. Hg has a tendency to hog all the IOPS, and even then it's still starving. :)

We also can't get much support from upstream with our current setup. They've told us in no uncertain terms that the way we're running Mercurial is terrible, and they can't really help us until we get away from NFS-backed servers.


> Has anyone looked into syncing
> the SQLite DBs instead of sharing over NFS, as an interim step?

I believe :bkero has started looking into how we can synchronize the SQLite files. This is non-trivial for a couple reasons, but might be workable.

This project has gone nowhere for long enough that we're starting to dig into this sort of sysadmin-side solution, because *we* (IT) need to get this unblocked... and if that means we have to hack some shoddy rsync scripts together rather than have a real solution, then so be it.

I'm not happy about this (I'm sure there will be consequences, like replication lag), but I'd rather do that and get this behind us than keep waiting while we debate the merits of various implementation details which are, AFAICT, really not that significant to the overall functionality of the system. I'm unhappy that we're 10 months and 66 comments into this bug and we're still talking about how to do this.

The perfect is the enemy of the good.


> The (implicit) plan here has been to change the backend, ship that, then
> change the distribution mechanism as outlined above - I think it complicates
> things slightly but might make sense to do this the other way around (change
> distribution mechanism, then change backend) if we're really dying for a
> perfomance win here (I don't know the backstory sorry). I'd really like to
> see some data backing up assertions about what's causing performance
> problems though before we try to do something strategic like this though.

:bkero might be able to help here. I'm not sure what data we actually have. Part of the problem (IMO) is that the Linux NFS client isn't particularly conducive to extracting good performance stats.

Generally speaking though, writing to a shared volume is inherently difficult... and that's what pushlog does in spades. On top of that, SQLite is not really designed to work with multiple servers accessing the same database file simultaneously... which pushlog also does. Honestly, I think we'll get a sizable performance boost by moving pushlog to MySQL or some other backend all by itself.

A couple choice quotes (among many good ones) from http://www.sqlite.org/whentouse.html :

"A good rule of thumb is that you should avoid using SQLite in situations where the same database will be accessed simultaneously from many computers over a network filesystem."

"... [if] you are thinking of splitting the database component off onto a separate machine, then you should definitely consider using an enterprise-class client/server database engine instead of SQLite."

The TL;DR I get out of this is: our usage of SQLite here is killing kittens. We should bite the bullet and *at least* rewrite this for MySQL.


> Do we have a staging/testing environment right now, that uses NFS in the
> same way production does?

There is a dev/staging system, but I don't know how comparable it is in this sense. It was put together to test the last Mercurial upgrades. The design is likely somewhat different in ways that didn't matter for that use case, but might for this one. This is another question for :bkero.
Flags: needinfo?(nmaul)
Another good reason to do this: if we get off of NFS, we have a *much* better chance of multi-homing this between SCL3 and PHX1. Right now that's basically a non-starter.
We have every intention to move off NFS. Operational risk mitigation is why we don't plan to go directly there. See previous comments.
I was replying to comment 59, which was asking for clarification on the specific reasons.
> One issue I see here is that repositories may all share the same checkout of
> the pushlog repository today. We'll need separate checkouts - one that
> supports reading from SQLite and one from revlogs. If we can't operate from
> multiple checkouts, that will introduce a bit more coding work, since we'll
> need to support both paths in the same code base. Doable, sure. But, a bit
> more complicated and prone to failure. Multiple checkouts is highly
> preferred.

I'm not sure what you mean by many repositories 'sharing the same checkout'. Are you referring to every repository using a global hook defined in /etc/mercurial/hgrc, or are you referring to pushlog being cloned when mozilla-central is lifted into mozilla-beta, etc?

> Does IT have any more requirements around pushlog APIs? i.e. do you need a
> command to inject or modify data in the pushlogs? I'm not sure what was done
> via SQL before. You won't have a tool to modify revlogs, so we need to bake
> an API and/or additional Mercurial commands into the extension.

IT sometimes needs to modify pushlog entries when pushes get CTRL+c'd during some stages of uploading. Additionally there are some other reasons for manual editing, such as when a developer accidentally pushes sensitive information to the repositories.
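For the record, that sort of manual surgery on the sqlite file looks something like the sketch below. This is not an official tool: the pushlog2.db path and the pushlog/changesets table and column names are assumptions about the hook's schema, and you'd only run it with pushes to the repo quiesced.

import sqlite3

def delete_push(dbpath, pushid):
    """Remove one push record and the changesets attributed to it."""
    conn = sqlite3.connect(dbpath)
    with conn:  # commits on success, rolls back on error
        conn.execute('DELETE FROM changesets WHERE pushid = ?', (pushid,))
        conn.execute('DELETE FROM pushlog WHERE id = ?', (pushid,))
    conn.close()

# e.g. strip push 12345 from try's pushlog (path is illustrative)
delete_push('/repo/hg/mozilla/try/.hg/pushlog2.db', 12345)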
Flags: needinfo?(bkero)
I suppose I should update this with the latest information:

Bug 937732 tracks migrating hg to local disks. To this end I've created a system to replicate pushlog files to each webhead on push. This allows us to have autonomous webhead machines, and should get us most of the way towards a performant and multi-homed HG. Feel free to peruse that bug if you're interested in the latest on this.

It certainly can be done with AMQP or MySQL, and perhaps should be. I don't have the time to code this up, since I have other projects to be working on and would have to learn SQLAlchemy to do this.

Is storing pushlog data in revlogs still a feasible action, or will we need to consider moving this to another SQL solution?
Flags: needinfo?(bkero)
Storing in revlogs should still be a feasible action. But we don't necessarily need to store in revlogs.

Assuming the hg master -> slave replication works via `hg pull` or `hg push`, all we need is a Mercurial extension installed on both master and slave that knows how to exchange pushlog data. You could still have SQL powering storage on either end.

In the time since I initially worked on this bug, I've actually implemented some Mercurial extensions that extend the Mercurial server protocol. So, I now have the knowledge to implement pushlog sync via Mercurial. I could code something myself, or I could have a knowledge exchange with rhelmer.
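To give a flavour of what I mean, here is a rough sketch of the server half of such a sync extension. This is not the attached patch, and the wire protocol registration has shifted between Mercurial releases, so treat it purely as illustrative; repo.pushlog.pushes_after() is a made-up accessor standing in for however the extension reads its sqlite data.

from mercurial import wireproto

def listpushes(repo, proto, firstpush):
    """Serve pushlog rows newer than firstpush, one per line."""
    lines = []
    for pushid, user, date, nodes in repo.pushlog.pushes_after(int(firstpush)):
        lines.append('%d %s %d %s' % (pushid, user, date, ' '.join(nodes)))
    return '\n'.join(lines)

# register the new wire protocol command; a client-side hook wrapping
# `hg pull` would call it and insert the returned rows into its local store
wireproto.commands['listpushes'] = (listpushes, 'firstpush')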

bkero: we should also chat about Mercurial hosting techniques. I've been experimenting with ZFS (specifically transparent deduplication) on Mercurial servers and have seen promising results. I'd love to see our new server architecture support some "crazy" ideas I think Mozilla will want to support down the road.
CC'ing our czar of developer productivity so he knows about the bug preventing our Mercurial servers from running faster and with fewer bugs.
The future architecture simply uses SSH identities to tell the hgweb hosts to 'pull', so the revlog method should work with that.

I've actually packaged our hgweb heads to a point where they're easily deployable on EC2 or with Docker using the puppet module. You're welcome to access it here and set it up in a VM (or in Docker). http://github.com/bkero/puppet-module-hg/

Have you done any research into how much deduplication actually helps with Mercurial's storage formats? I've recently gone through our repositories and ensured they're "bare" checkouts (i.e. they have a null working copy). As for ZFS, it sounds interesting, although my regard for ZFS on Linux is not that high (the FUSE module is slow, there's a license incompatibility with the Linux kernel, and there have been memory corruption problems with the new native ZFS module).

I would be interested to see what the performance is like, though, especially with compression and deduplication on real hardware.
(In reply to Ben Kero [:bkero] from comment #74)
> The future architecture simply uses SSH identities to tell the hgweb hosts
> to 'pull', so the revlog method should work with that.

This shouldn't be too hard to implement. We could even hack something together that still uses sqlite on both ends if that's what we wanted!

> Have you done any search as to how much deduplication actually helps with
> mercurial's storage formats? I've recently gone through our repositories and
> ensured they're "bare" checkouts (ie have a 'null working copy). As for ZFS,
> it sounds interesting, although my regard for ZFS on Linux are not that high
> (FUSE module is slow, license incompatibility with the Linux kernel, memory
> corruption problems for the new ZFS module).
> 
> I would be interested to see what the performance is, especially with
> compression and deduplication on real hardware though.

I have an EC2 instance running http://zfsonlinux.org/ (native kernel module) and am seeing promising results without any of the memory corruption you speak of (so far at least). I have the major Firefox repos all cloned and syncing and am seeing a 2.05x dedupe ratio. But that's with an automatically chosen record size of 128k. I'm pretty sure the dedupe ratio would be better with a smaller record size.

There are some "hidden" Mercurial modes to change the internal storage format to facilitate better dedupe ratios. I've also considered the idea of writing a Mercurial extension to change Mercurial's storage format to block align so a block-level deduping filesystem (like ZFS) would effectively only store each distinct changeset once on disk. But I need to run these ideas by Mercurial core devs first.

IMO the big win for ZFS on the server would be practically instantaneous and free server-side clones of existing repos. Want a personal clone of mozilla-central? Just take a zfs snapshot of the mozilla-central filesystem and run `zfs clone`. The operation should take a few milliseconds and a few dozen kb on disk. Contrast with multi-minute wait times today. This kind of flexibility opens up all kinds of possible developer workflows, including throwaway repositories. Just imagine if pushing to try involved zfs cloning mozilla-central on the fly. Then, you could have your own personal repo/sandbox to play around in with almost zero effort or overhead.

If you're concerned about ZFS on Linux (I think that's a rational concern), we could always serve Mercurial from a BSD or even an Illumos distro. Both have rock solid ZFS implementations. And Mercurial is merely Python + HTTP server + SSH, so it can really be hosted on pretty much any OS with relative ease.
I'm all for a better Mercurial system, but let's not lump that in here. This is a specific piece of the puzzle. Larger redesigns are out of scope for this change.

We need to stop talking about various ways to do this and actually *do* it.

Ben, you and I have already talked about your work to synchronize the sqlite files. Let's proceed with that. It's not perfect, but it unblocks the more important project (NFS), and does so in such a way that we don't need any special programming ability or outside help... something that cannot be said about any of the other proposed solutions.


All discussions about moving to revlogs, MySQL, or ZFS should be shunted to other (new, and probably separate) bugs as potential improvements we could make to the architecture, *after* this one. I'm happy to entertain them, but not in here and not right now.

To that end, I'm retitling this bug to be more specific.
Summary: Move pushlog to different backend → Replicate sqlite pushlog files to all mercurial hosts so we can eliminate NFS
This patch implements a Mercurial extension that transfers pushlog data
over the Mercurial wire protocol. It does this seamlessly when running `hg
pull` from a client. It satisfies the requirements of this bug and keeps
pushlog stored in sqlite (for better or worse).

More info is documented in the extension. Activate the extension and run
`hg help -e pushlogsync` to see what it can do.

I haven't written any tests. But I did try this out locally with the
sqlite db copies from m-c that were provided to me a few months ago.
Assignee: server-ops-devservices → gps
Status: NEW → ASSIGNED
Oh, bzexport and its greedy bug grabbing.
Assignee: gps → server-ops-devservices
(In reply to Gregory Szorc [:gps] (in southeast Asia until Dec 14) from comment #78)
> Oh, bzexport and its greedy bug grabbing.

I believe --no_take_bug should work :-)

https://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport/file/c604ded99c59/__init__.py#l1014
Latest information on deploying the sqlite replication:

Currently weighing options for how to execute the hook as the committing user with very permissive key permissions vs running it as the 'hg' user with a very-locked-down sudo command.

Past that, it's simply a matter of adding a hook (repo-specific for testing first, then global) of:

$ /usr/local/bin/repo-push.py $REPO

This is an example of the time and output.

$ time /usr/local/bin/repo-push.py integration/mozilla-inbound
integration/mozilla-inbound already exists, pulling
pulling from ssh://hg.mozilla.org/integration/mozilla-inbound
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 6 changes to 6 files
(run 'hg update' to get a working copy)

real    0m4.031s
user    0m0.008s
sys     0m0.002s

Using this system we can add webhead mirrors on any host the master can reach over SSH. Right now that's only one machine.

$ cat /etc/mercurial/mirrors
hgweb1.dmz.scl3.mozilla.com
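repo-push.py isn't attached here, but the fan-out it performs is roughly the following. This is only a sketch: the remote user and the mirror-side command name are assumptions.

import subprocess
import sys

MIRRORS = '/etc/mercurial/mirrors'

def notify_mirrors(repo):
    """SSH to each webhead listed in the mirrors file and ask it to pull repo."""
    with open(MIRRORS) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    for host in hosts:
        # blocks until the mirror finishes its pull; a failure aborts the hook
        subprocess.check_call(['ssh', 'hg@' + host, 'mirror-pull', repo])

if __name__ == '__main__':
    notify_mirrors(sys.argv[1])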
bkero: where is repo-push.py?

I'm looking at https://github.com/bkero/puppet-module-hg/blob/master/templates/mirror-pull.erb and have a few comments.

1) It isn't necessary to create a lock when doing hg pull. Mercurial itself will obtain a write lock on the local repo when doing any kind of write operation. If we actually need this lock, my money is on extensions we have deployed not obtaining locks properly. This reminds me - the patch I posted last night doesn't obtain this lock, so it is prone to race conditions.

2) When doing hg pull on the mirror, I /think/ we should specify a changeset to pull, e.g. `hg pull -r aae7add536d9`. In case there are weird file atomicity issues on the server, this will prevent clients from pulling down partial writes. That should /never/ happen; alas, it happens for us now because we are using NFS. Another benefit of this approach is that it prevents clients from getting farther ahead than we want them to. The downside is that pushes in rapid succession may take slightly longer to sync, but I doubt we have high enough volume on many repos for this to be a practical concern. All that being said, I can go both ways on this issue. Once NFS is out of the way, the corruption-on-pull issue *should* go away, so `hg pull` should be completely safe.
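For point 2, the mirror-side pull only needs a small change, something like the sketch below (the local repo root and master URL are assumptions; the point is just the optional `-r <node>`):

import subprocess

MASTER = 'ssh://hg.mozilla.org'
REPO_ROOT = '/repo/hg/mozilla'

def mirror_pull(repo, node=None):
    """Pull repo from the master, optionally limited to one changeset."""
    cmd = ['hg', '-R', '%s/%s' % (REPO_ROOT, repo), 'pull']
    if node:
        cmd += ['-r', node]  # only node and its ancestors are pulled
    cmd.append('%s/%s' % (MASTER, repo))
    subprocess.check_call(cmd)

mirror_pull('integration/mozilla-inbound', node='aae7add536d9')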
gps:

repo-push.py is still in the old (misnamed) hg-new module, which is not published. Since the software stack on the commit servers is more difficult to replace, I've held that off until the new webhead format is done.

Will the Mercurial lock block until it's free, or will it just fail? We're looking for the blocking behavior here. If that's available with Mercurial, I can remove the lockfile machinery. That would simplify things and make me happy inside.

I can certainly add a flag for passing a changeset to pull. Will this behave as expected with repositories with multiple heads?

The pace of changes coming in is not that great. For example, here is try/, our busiest repository for yesterday, with email addresses x'd out to protect the innocent.

[root@hgssh1.dmz.scl3 log]# grep "^Fri Dec 13" hg.log|grep try$
Fri Dec 13 00:30:05 PST 2013 - x - try
Fri Dec 13 00:39:53 PST 2013 - x - try
Fri Dec 13 00:58:53 PST 2013 - x - try
Fri Dec 13 01:22:09 PST 2013 - x - try
Fri Dec 13 02:24:46 PST 2013 - x - try
Fri Dec 13 02:35:33 PST 2013 - x - try
Fri Dec 13 02:36:34 PST 2013 - x - try
Fri Dec 13 02:52:39 PST 2013 - x - try
Fri Dec 13 02:55:40 PST 2013 - x - try
Fri Dec 13 03:19:38 PST 2013 - x - try
Fri Dec 13 03:28:00 PST 2013 - x - try
Fri Dec 13 03:39:47 PST 2013 - x - try
Fri Dec 13 03:44:35 PST 2013 - x - try
Fri Dec 13 04:13:35 PST 2013 - x - try
Fri Dec 13 04:39:36 PST 2013 - x - try
Fri Dec 13 04:57:24 PST 2013 - x - try
Fri Dec 13 05:10:56 PST 2013 - x - try
Fri Dec 13 05:23:43 PST 2013 - x - try
Fri Dec 13 05:28:52 PST 2013 - x - try
Fri Dec 13 05:37:01 PST 2013 - x - try
Fri Dec 13 05:48:09 PST 2013 - x - try
Fri Dec 13 05:57:40 PST 2013 - x - try
Fri Dec 13 05:58:55 PST 2013 - x - try
Fri Dec 13 06:00:01 PST 2013 - x - try
Fri Dec 13 06:04:26 PST 2013 - x - try
Fri Dec 13 06:05:48 PST 2013 - x - try
Fri Dec 13 06:06:13 PST 2013 - x - try
Fri Dec 13 06:06:18 PST 2013 - x - try
Fri Dec 13 06:06:44 PST 2013 - x - try
Fri Dec 13 06:14:52 PST 2013 - x - try
Fri Dec 13 06:22:56 PST 2013 - x - try
Fri Dec 13 06:29:57 PST 2013 - x - try
-r was already taken, so I used '-c' (changeset) as the flag in the repo-push script. This argument is optional.

Diff is available in the github commit log. https://github.com/bkero/puppet-module-hg/commits/master
(In reply to Ben Kero [:bkero] from comment #82)
> gps:
> 
> repo-push.py is still in the old (misnamed) hg-new module, which is not
> published. Since the software stack on the commit servers is more difficult
> to replace, I've held that off until the new webhead format is done.

Boo. We should really get some more eyeballs on it.

> Will the mercurial lock block until free, or will it just fail? We're
> looking for the blocking behavior here. If that's available with mercurial I
> can remove the lockfile workings. That would simplify things and make me
> happy inside.

For most repo operations that acquire a lock, Mercurial will wait up to ui.timeout (default 600) seconds before giving up and aborting.
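In code terms, a minimal sketch of the behaviour the mirror script would be relying on (assuming a 2.x-era Mercurial and an illustrative repo path):

from mercurial import error, hg, ui as uimod

u = uimod.ui()
repo = hg.repository(u, '/repo/hg/mozilla/try')  # illustrative path
try:
    # blocks for up to ui.timeout seconds if another writer holds the lock
    lock = repo.lock()
except error.LockHeld:
    raise SystemExit('still locked after ui.timeout seconds')
try:
    pass  # the write operation (e.g. the pull) happens here
finally:
    lock.release()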

> I can certainly add a flag for passing a changeset to pull. Will this behave
> as expected with repositories with multiple heads?

If you say `hg pull -r X`, only X and all of its ancestors will be pulled. For multi-headed repos, you'll need to be sure to pull in every head.

Thinking about it more, it should be safe to pull down every changeset - assuming the master isn't hosted on NFS. The problem we have today is that changesets can be exposed to clients in a partial state because NFS reorders when writes become visible. If a client pulls at just the right time (typically during or immediately after a large push), the client repo can get corrupted. Moving off NFS should eliminate the reordering, make this a non-issue, and make `hg pull` safe for replication.
The work for this (mirroring the sqlite pushlog files out to the hgweb hosts) has been done. If you're interested in commenting on the actual implementation of the mirroring or pushlog scripts, please file a new bug or ping me on IRC.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services