Closed Bug 1104374 Opened 10 years ago Closed 8 years ago

Query pushlog from last push ID, not changeset

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Automation scheduling currently performs an incremental pushlog fetch by querying for "changes since changeset X was pushed:"

https://hg.mozilla.org/build/buildbotcustom/file/151dc655a2cc/changes/hgpoller.py#l231
https://hg.mozilla.org/build/buildbotcustom/file/151dc655a2cc/bin/hgpoller.py#l72
https://hg.mozilla.org/build/mozharness/file/3eb428902957/scripts/b2g_bumper.py#l339

This works and is technically correct. However, our new Try server architecture won't work this way. In the new Try world, a single changeset can be pushed multiple times to the repository and it will be stored in a separate bundle for each push it was part of. Therefore, querying for "changes since changeset X" is ambiguous since X can occur multiple times.

We need automation to change its json-pushes polling to be based on "since push ID Y" instead.

An alternative is "pushes since date Z." However, there are no essentially no locks in the new Try server and multiple pushes can occur with the same timestamp. The numeric push IDs are a stronger identifier for synchronization purposes and should be used.

This change blocks the deployment of the new Try server.

Tell me who to tag for review and I'll write the patches for this bug.
I don't think the existing queries will, in practice, request for "since changeset X" where X is not a previous push head.
I wonder if this impacts Treeherder; I believe Treeherder queries from changeset X, rather than push ID Y atm.  Flagging for confirmation.
Flags: needinfo?(emorley)
I was just looking at the treeherder code trying to make sense of it...
(In reply to Mike Hommey [:glandium] from comment #1)
> I don't think the existing queries will, in practice, request for "since
> changeset X" where X is not a previous push head.

Correct.

I could make this work without changing the pollers. However, that requires reimplementing or refactoring the json-pushes hgweb endpoint to be push-head aware instead of naive. I would strongly prefer to avoid hacking up that code. Although, maybe it ends up being easier. The pushlog code on the server is a mess :/
(In reply to Jonathan Griffin (:jgriffin) from comment #2)
> I wonder if this impacts Treeherder; I believe Treeherder queries from
> changeset X, rather than push ID Y atm.  Flagging for confirmation.

(In reply to Gregory Szorc [:gps] from comment #3)
> I was just looking at the treeherder code trying to make sense of it...

Treeherder used to use &startID=N rather than &fromchange=SHA, but was switched to the latter (IMO unnecessarily) as part of trying to fix the pushlog ingestion races/failed inserts. Using &fromchange=SHA also requires an additional join (aiui), so I'd happily switch back to improve perf (and presumably reduce the number of timeouts we hit). Bug 1076826 is filed for this.

Note that switching back to &startID=N means we'd hit bug 1065771 again - if we could get that fixed first (or come up with another solution for handling reset repos), that would be awesome :-)
Depends on: 1076826, 1065771
Flags: needinfo?(emorley)
I can review buildbot hg poller patches.
Depends on: 1114843
Fixed by bug 1289514 I think.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.