Closed Bug 725362 Opened 13 years ago Closed 11 years ago

Configure hg.mozilla.org/try and try-comm-central repositories as non-publishing (Mercurial 2.1 "phases")

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mbrubeck, Assigned: bkero)

References

Details

(Whiteboard: [hg 2.1])

Attachments

(1 obsolete file)

Mercurial 2.1 added a new "phase" feature to prevent accidental rewriting of published commits.  However, this breaks the normal way of pushing in-progress patches to the Try repository.  For details, see:
https://wiki.mozilla.org/ReleaseEngineering/TryServer#hg_phases

This can be fixed by adding the following to the hgrc file for the Try repo on hg.mozilla.org:

[phases]
publish = False

This requires the server to be running hg 2.1.  If an upgrade is not already planned, then a separate bug should be filed and added as a blocker to this bug.

For more documentation, see:
http://selenic.com/hg/file/default/mercurial/help/phases.txt
http://mercurial.selenic.com/wiki/Phases
Assignee: nobody → server-ops
Component: Hg: Customizations → Server Operations
QA Contact: hg.customizations → cshields
No plans for hg 2.1 right now (will probably do that in the SCL3 migration), will make this hgrc change when we do.
Assignee: server-ops → cshields
When is the SCL3 migration slated for?

As developers update their hg clients, more and more of us are going to run into this problem.  By far the simplest fix would be a server-side change.
An upgrade is nowhere on my radar right now and fitting one in is a huge inconvenience to other urgent work we have going on right now.

Can you describe "this breaks the normal way of pushing in-progress patches to the Try repository." in more detail please?
(In reply to Corey Shields [:cshields] from comment #3)
> Can you describe "this breaks the normal way of pushing in-progress patches
> to the Try repository." in more detail please?

After pushing any local patches to Try, the developer will get an error message the next time they try to change their patch.  There's a workaround for these errors, though not a very obvious one.  Justin documented the workaround here:

https://wiki.mozilla.org/ReleaseEngineering/TryServer#hg_phases

Until the server-side config change is in place, pretty much everyone who pushes to Try with hg 2.1 will run into this error.  Using the workaround is not much trouble once you know about, but there'll be a lot of confusion from developers trying to figure out what is happening.  We could mitigate this by posting notices the newsgroups and Planet Mozilla.
Component: Server Operations → Hg: Customizations
Component: Hg: Customizations → Server Operations
Yesterday I've (guilelessly) updated via MacPorts to Mercurial 2.1. Since then I'm not able anymore to update comm-beta via client.py. Is this also because of this new phase feature? And yes, how can I fix it?


$ python client.py checkout --mozilla-repo=http://hg.mozilla.org/releases/mozilla-beta/ --skip-chatzilla --skip-venkma
Executing command: ['hg', 'pull', '-R', './.']
Rufe von http://hg.mozilla.org/releases/comm-beta/ ab
Suche nach Änderungen
Keine Änderungen gefunden
The exception was:
subprocess.CalledProcessError: Command '['hg', 'pull', '-R', './.']' returned non-zero exit status 1

Retrying previous command: 1 of 1 time(s)
Executing command: ['hg', 'pull', '-R', './.']
Rufe von http://hg.mozilla.org/releases/comm-beta/ ab
Suche nach Änderungen
Keine Änderungen gefunden
The exception was:
subprocess.CalledProcessError: Command '['hg', 'pull', '-R', './.']' returned non-zero exit status 1

Traceback (most recent call last):
  File "client.py", line 609, in <module>
    do_hg_pull('.', options.comm_repo, options.hg, options.comm_rev)
  File "client.py", line 330, in do_hg_pull
    check_call_noisy(cmd, retryMax=options.retries)
  File "client.py", line 158, in check_call_noisy
    raise Exception("Command '%s' failed %d time(s). Giving up." % (cmd, retryMax + 1))
Exception: Command '['hg', 'pull', '-R', './.']' failed 2 time(s). Giving up.
Did you read the Mozilla wiki page in comment 0 and try the solution it suggested?
Yes, I've read that. I've added this "Disable hg phases with a post-push hook" line to my .hgrc, but it doesn't help.
Then yours is likely a separate bug, possibly also caused by hg 2.1.
(In reply to Nomis101 from comment #7)
> Yes, I've read that. I've added this "Disable hg phases with a post-push
> hook" line to my .hgrc, but it doesn't help.

Adding that line to your .hgrc will prevent the problem from recurring, but it won't correct the state of already-pushed changesets. You need to run the command (|hg phase --force --draft "mq()"|) manually for that.
When this is done, please do the same for try-comm-central as well.
Summary: Configure hg.mozilla.org/try repository as non-publishing (Mercurial 2.1 "phases") → Configure hg.mozilla.org/try and try-comm-central repositories as non-publishing (Mercurial 2.1 "phases")
Assignee: cshields → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
QA Contact: cshields → shyam
Whiteboard: [hg 2.1]
Assignee: server-ops-devservices → bkero
bc suggested that another cause of the high try server load might be:

bc	edmorley: how many of the runs on try are due to empty submissions related to that hg phase/publish foo?
bc	if you have hg 2.1 or later, you get into trouble and can sometimes submit a try run with just the try patch instead of your others.
...
bc	that phase/publish thing marks your mq patches as public or something. if you submit a job and then cancel it and try to submit a new one without fixing your existing queue new try submissions will just have the 'try patch'

I can't investigate what proportion of runs is hitting the above at the moment, due to hg.m.o/try/foo timing out (bug 770811).

If it is a sizable factor, I guess we could add a hook to check for it, to cover us until this bug is resolved?
Blocks: 772458
Although might have been a false alarm:

bc	 : looking at tbpl try i see a number of commits with only the try message. anyway to put a hook in to prevent empty changesets?
edmorley : tbpl only shows changesets it hasn't seen before in the display (pending bug 771651), but still actually runs what it should
edmorley : some people also put their patches in the try: cset
edmorley : are there ones you've seen that aren't either of those?
bc       : not sure. i can't load csets from tpbl. it just times out and resets.
bc       : maybe what i was seeing during my recent try was just the non display of previous csets.
No longer blocks: 772458
Depends on: 741353
Attached file extension to solve issue server side (obsolete) (deleted) —
This issues got reported to me once again. I took 20 minutes forging a basic extensions that solve the issue *server* side. So contributor will not need to jungles with hook anymore.

I still recommend upgrade to newer version.
We are testing out a 2.5.4 update which should be rolling out later this week.
Can the .hgrc change be made during the downtime for the hg 2.5.4 upgrade? It's one of the main reasons for the upgrade IMO.
Flags: needinfo?(bkero)
Flags: needinfo?(bkero)
This is now actionable, and would be a good developer win made possibly by our mercurial server upgrade, ben when do we suspect we can get to it?

Hal do you think this is a change that needs CAB [I'm inclined to think not, but worth asking]?
Flags: needinfo?(hwine)
Flags: needinfo?(bkero)
(In reply to Justin Wood (:Callek) from comment #18)
> Hal do you think this is a change that needs CAB [I'm inclined to think not,
> but worth asking]?

Regardless of going through cab, we need to have a plan to a) make the change, b) test the change, c) roll back if any issues.

Also, this will change behavior for users (or we wouldn't do it). We'll need to at least announce before or after that the change is happening, so folks don't get surprised.

Based on how big these updates are will determine if it actually needs to go to CAB.
Flags: needinfo?(hwine)
I have configured the hgssh.stage.dmz.scl3 server's mozilla-central to be non-publishing. Testing can be done on there.

From my reading it seems the Mercurial developers made this to be backwards compatible and unobtrusive to users who did not wish to use phases.

Please correct me if I'm wrong, but enabling and using non-publishing phases will not affect the workflow of anybody who does not specifically want to use it.

This is easy to enable on a global basis (in /etc/mercurial/hgrc), or on a per-repository level. Since the change is unobtrusive I don't see a problem enabling it globally.

Questions? Concerns?
Flags: needinfo?(bkero)
(In reply to Ben Kero [:bkero] from comment #20)
> I have configured the hgssh.stage.dmz.scl3 server's mozilla-central to be
> non-publishing. Testing can be done on there.
> 
> From my reading it seems the Mercurial developers made this to be backwards
> compatible and unobtrusive to users who did not wish to use phases.
> 
> Please correct me if I'm wrong, but enabling and using non-publishing phases
> will not affect the workflow of anybody who does not specifically want to
> use it.
> 
> This is easy to enable on a global basis (in /etc/mercurial/hgrc), or on a
> per-repository level. Since the change is unobtrusive I don't see a problem
> enabling it globally.

phases were indeed meant to be unobtrusively a new feature, however its also a correctness/safety net feature.

Which basically by default provides for "if you push it to a public repo, we make it really hard for you to alter the cset locally" -- which is a state that makes sense for m-c, the pref flip this bug advocates changes the default for clients that know about phases only, to allow try to be "non-publishing" basically tells the client that anything pushed to try/ is not necessarily used anywhere. Which allows people to easily rebase, modify, etc their csets.

I would only advocate enabling it on try and try-comm-central
That sort of defeats the purpose of phases altogether, though. After all, pushing to try isn't really "publishing" a changeset, but pushing to other repositories really is.
> Please correct me if I'm wrong, but enabling and using non-publishing phases will not affect the 
> workflow of anybody who does not specifically want to use it.

> That sort of defeats the purpose of phases altogether, though. After all, pushing to try isn't 
> really "publishing" a changeset, but pushing to other repositories really is.

Right.  We should configure only temporary repos to be non-publishing; the others should be left as-is.
I can and am certainly willing to push this change to /try and /try-comm-central whenever it is deemed appropriate.

I believe :hwine wanted to do testing on this, and on hgssh.stage I have provided him a means to do so.
I'm personally less concerned with "in advance" testing on this one, but queue :hwine for further consideration/testing.
Flags: needinfo?(hwine)
(disclaimer: I'm the main developer of phases and changesets evolution in Mercurial)

The comment #21 from Justin Wood (:Callek) got it all right.

TLDR; repo people pull from should be publishing. Repo where changesets are only pushed can be non-publishing.

All repos used to share changesets (used for both push//pull) should be left as is with the default (publishing=True). This will prevent rewriting of changesets already shared with other people and associated issues.

The try and try-comm-central are push only repositories, used only for testing purposes. People can safely rewrite changesets pushed there (its even the purpose of the try server). Try servers should be set to publishing=False.

There should not be performance impact for using a try server, let use know if anything suspicious is detected.
(In reply to Justin Wood (:Callek) from comment #25)
> I'm personally less concerned with "in advance" testing on this one, but
> queue :hwine for further consideration/testing.

I'm more concerned that we have a plan to test the effect on try than when/where it happens. And that plan isn't completely "let the users find the problems". (When/where should be based on risk.)

Based on comment 21 and comment 26, there appears to be little risk to the hg repository integrity itself. However, no one has yet mentioned interaction with pushlog and tbpl. 

The "rewrites" only exist in the dev's local repository. When pushed to the server, they appear as multiple heads just as they always have. (I've demonstrated that locally.)

Since the server contents don't change, everything _should_ work okay. I recommend that post deployment testing include pushing a mutated cset. Both the original revision and the mutated revision should be accessible via tbpl.
Flags: needinfo?(hwine)
From the server's perspective, a 'mutated' changeset has no relation to the original. The only thing phases does (AIUI) is mark changesets you pushed to a (publishing, which is the default) server as "published", so that mq will refuse to change or delete them afterwards. If you mutate a changeset then you've created an entirely new changeset with the default unpublished phase.

Also, setting this should functionally just make the server act like an older version of Mercurial (or like it does with older client versions).
What's the status of this bug?
I enabled this feature on the hg staging server for testing use long ago. I'm waiting for the OK from hwine or releng to deploy this.
Could I suggest, in the absence of a formal test plan (comment 27), we simply roll this out to try and rely on users to find and report issues?

It seems to me that the absolute worst thing that could happen here is that we have to reset try.
Given the low impact of this change I would be fine with that. I need the go-ahead from releng though. Marking infoneeded.
Flags: needinfo?(hwine)
Sheriffs are more than happy to just push to prod and see - this is only affecting Try & we already periodically reset it anyway, so impact would be minimal if it didn't work out.
(In reply to Ben Kero [:bkero] from comment #32)
> Given the low impact of this change I would be fine with that. I need the
> go-ahead from releng though. Marking infoneeded.

Ben, lets make this change. And see if anyone complains, I don't expect a single issue though.

Remember *only* on try, and try-comm-central.
Flags: needinfo?(hwine)
To elaborate on c#34, joduinn wants to clear this round of releases and the weekend b2g needs.

Lets schedule this for any day next week. Per IRC convo, both me and jlebar will jointly be on the hook if anything goes wrong, though neither of us expect a thing we do accept it is a non-zero risk level.
Before this goes live in production:

1) has anyone answered hwine's questions in comment#27?
2) what testing has been done? And ™what is the rollback plan if anything goes wrong?
3) please confirm this *only* impacts try repo and try-comm-central. Explicitly *no* other repos will be changed.

In irc by jlebar and callek that this is "low risk, but non-zero". However, given the current ongoing FF24.0b1 fun and the upcoming B2G handoff this weekend, we agree to *not* do this now and will do this next week at a low-disruption time.
1) This only affects try/try-comm-central.
2) An hg core developer says this change will have minimal impact (comment 26).
3) Worst case scenario we have to reset try, something that happens at short notice & not infrequently anyway.
4) Resetting try does not require closure of other trees.

It's just pretty frustrating that 18 months in, it seems like we're still dragging our feet here, for something that does not affect the main development trees & has an easy rollback plan :-(
(In reply to Hal Wine [:hwine] from comment #27)
> (In reply to Justin Wood (:Callek) from comment #25)
...
> Based on comment 21 and comment 26, there appears to be little risk to the
> hg repository integrity itself. However, no one has yet mentioned
> interaction with pushlog and tbpl. 
...

Who can answer this question? Do we know that pushlog and tbpl will continue to work?
(In reply to John O'Duinn [:joduinn] from comment #38)
> Who can answer this question? Do we know that pushlog and tbpl will continue
> to work?

Yup
> Who can answer this question? Do we know that pushlog and tbpl will continue to work?

Everyone in this bug outside of releng believes that they will continue to work fine, including the developer of the feature.

Without running a test on the actual hardware with the actual production configuration, we don't have the ability to be 100% certain of this.  If your requirement is that we be 100% certain, then the release engineering org will have to test this themselves; nobody else is able to.
(In reply to Hal Wine [:hwine] from comment #27)
> (In reply to Justin Wood (:Callek) from comment #25)
> > I'm personally less concerned with "in advance" testing on this one, but
> > queue :hwine for further consideration/testing.
> 
> I'm more concerned that we have a plan to test the effect on try than
> when/where it happens. And that plan isn't completely "let the users find
> the problems". (When/where should be based on risk.)

Both myself and a few others have tested this with local clone mini examples. Other orgs have also tested the feature live. No-one (to my knowledge) has tested this explicitly with hg.m.o. However local users have used extensions and hgrc configs to force the same affect as this change would do for years since we deployed hg 2.1+ for our server.

So I don't feel any additional testing is required [wouldn't hurt, but I don't think we should block on it]

> Based on comment 21 and comment 26, there appears to be little risk to the
> hg repository integrity itself. However, no one has yet mentioned
> interaction with pushlog and tbpl. 

This would not affect pushlog in any way, the data store is completely seperate from what this code touches. TBPL is entirely fed off pushlog.

> Since the server contents don't change, everything _should_ work okay. I
> recommend that post deployment testing include pushing a mutated cset. Both
> the original revision and the mutated revision should be accessible via tbpl.

I agree, but given the above I don't feel we necessarily need to block anything on said post-deployment testing happening, since if this mutated thing fails we're as we are today, if it works that is the exact reason of this bug! That said, I wonder if anyone here can agree to do such?

(In reply to John O'Duinn [:joduinn] from comment #36)
> Before this goes live in production:
> 
> 1) has anyone answered hwine's questions in comment#27?

I copied his comment in and answered above. (I think some points were sporradically answered already, but I'm being explicit)

> 2) what testing has been done? And ™what is the rollback plan if anything
> goes wrong?

for what testing has been done, see above.
For rollback plan - One of two solutions, depending on nature of the problem. We can either just drop the config setting itself, which will be a minutes change, or we can reset try which would be an hours thing.

> 3) please confirm this *only* impacts try repo and try-comm-central.
> Explicitly *no* other repos will be changed.

The feature is configurable either globally or per-repo, I'm explicitly asking IT to set it per-repo for only try and try-comm-central, setting it anywhere else is *completely* out of scope, and undesired.
(In reply to Ed Morley [:edmorley UTC+1] from comment #39)
> (In reply to John O'Duinn [:joduinn] from comment #38)
> > Who can answer this question? Do we know that pushlog and tbpl will continue
> > to work?
> 
> Yup

Great, thanks Ed.
(In reply to Ed Morley [:edmorley UTC+1] from comment #37)
> 1) This only affects try/try-comm-central.
yep.

> 2) An hg core developer says this change will have minimal impact (comment
> 26).
yep.

> 3) Worst case scenario we have to reset try, something that happens at short
> notice & not infrequently anyway.
As already discussed above, and in irc, with everyone around, its a few hour try closure to turn the config off, reset the try repo and see new builds/tests go green on the new repo. From chat w/callek, the OMG rollback plan will be: 
** close try branch
** revert config setting
** reset try repo
** trigger clean builds/tests
** reopen try branch
Lets coordinate having all the right people present when we do this, and pick a day/time next week that does not impact any work for releases. 


> 4) Resetting try does not require closure of other trees.
Correct, but it *does* block developers who are landing patches to *-inbound and then onwards to m-c/b2g18 (for b2g) and developers who are landing fennec respin patches (for FF24.0b1). Hence the defer.


> It's just pretty frustrating that 18 months in, it seems like we're still
> dragging our feet here, for something that does not affect the main
> development trees & has an easy rollback plan :-(
Yep, agreed this took a long time. To be fair, the hg server+client upgrades from 2.0.2 --> 2.5.4 were "non-trivial" pre-requisites that took a ton of work. Turning this config on is now possible because of all that upgrade work completed in Q2.
(In reply to Ben Kero [:bkero] from comment #32)
> Given the low impact of this change I would be fine with that. I need the
> go-ahead from releng though. Marking infoneeded.

Sorry I missed this question. As has been documented above, the folks who will bear the burden have agreed to go live and we have a documented rollback plan.

One last "nice to have" would be instructions to folks who have already implemented the work around described at https://wiki.mozilla.org/Build:TryServer#hg_phases (and that page will need updating).

I wasn't clear above -- I'm hoping someone who has been actively working with phases already has the answers and can provide that information to the rest of the dev community. I do not have these answers.
> One last "nice to have" would be instructions to folks who have already implemented the work around 
> described at https://wiki.mozilla.org/Build:TryServer#hg_phases (and that page will need updating).

No action by people who have implemented the workaround is needed.  The workaround forces hg to apply the 'draft' phase to your patch queue.  With this bug fixed, the workaround will be redundant but harmless.

I'll be happy to rework the wiki page wrt the work-around once we fix this bug.
(In reply to comment #45)
> > One last "nice to have" would be instructions to folks who have already implemented the work around 
> > described at https://wiki.mozilla.org/Build:TryServer#hg_phases (and that page will need updating).
> 
> No action by people who have implemented the workaround is needed.  The
> workaround forces hg to apply the 'draft' phase to your patch queue.  With this
> bug fixed, the workaround will be redundant but harmless.
> 
> I'll be happy to rework the wiki page wrt the work-around once we fix this bug.

I can also post to dev.platform about it.
Assuming all reasonable concerns have been satisfied, when should we schedule to have this deployed?
Flags: needinfo?(hwine)
Hey Ben,

So I had a todo to address the scheduling here today, but was holding out for concrete resolutions on Beta L10n releng issues we had hit.  We're still working on those, but I expect us to be done "soon", so I will press forward with a caveat that there may be a slight delay if the bubble gum and chicken wire we used to hold together the l10n process breaks free while we do beta2.

Per chat with :joduinn we're comfortable with this happening outside of the normal CAB/downtime schedule, and would prefer it to be a weekday (so we're all around if anything goes wrong) and early ET (so we can do so at relatively low traffic).

I propose Thursday, Aug 15, at 6am PT (9am ET), does that work for you?

I also ask that we plan for 1 hour of "must be here, able to dedicate to this task" at that time, but actual try closure will likely be smaller than that.

Someone from IT and Releng ["me"] will need to remain around during the day incase anything goes wrong ready to do either a simple pref flip or all-out Try Reset [as our worst-case]

Is this time something you (or someone else who understands the hg.m.o setup, and can troubleshoot if anything seems wonky) can allocate yourself to?

Sheriffs (edmorley/RyanVM) is there anything in the above you would disagree with enough to postpone or cancel the plan here?
Flags: needinfo?(ryanvm)
Flags: needinfo?(hwine)
Flags: needinfo?(emorley)
Flags: needinfo?(bkero)
Sounds great to me.
Flags: needinfo?(ryanvm)
Flags: needinfo?(emorley)
Aug 15, 6am PDT/9am EDT works. I'll handle the hgrc changes and be around all morning.
Flags: needinfo?(bkero)
sgtm :-)
Working fine :-)

{
$ hg qnew trychooser-foo.diff -m "testing tryserver phases update; try: -b o -p win32 -u none -t none on a CLOSED TREE"

$ hg push ssh://hg.mozilla.org/try -f
pushing to ssh://hg.mozilla.org/try
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 15 changesets with 11 changes to 48 files (+1 heads)
remote: Tree try is CLOSED! (https://treestatus.mozilla.org/try?format=json) - Short closure for Bug 725362
remote: But you included the magic words.  Hope you had permission!
remote: Looks like you used try syntax, going ahead with the push.
remote: If you don't get what you expected, check http://trychooser.pub.build.mozilla.org/
 for help with building your trychooser request.
remote: Thanks for helping save resources, you're the best!
remote: Trying to insert into pushlog.
remote: Please do not interrupt...
remote: Inserted into the pushlog db successfully.
remote: You can view the progress of your build at the following URL:
remote:   https://tbpl.mozilla.org/?tree=Try&rev=fade20500af2

$ hg qpop -a
popping trychooser-foo.diff
patch queue now empty
}

(ie previously wouldn't have been able to pop the mq, since I'd have to change the phase to draft first)
Attachment #701377 - Attachment is obsolete: true
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I posted a PSA to dev.platform as promised.
Now when I try to get back to a previouse mq patch, it seems to also apply every other patch I pushed to try before?
(In reply to Martijn Wargers [:mwargers] (QA) from comment #54)
> Now when I try to get back to a previouse mq patch, it seems to also apply
> every other patch I pushed to try before?

How are you pushing to try? Did you clone the try repo? IIUC, you can push to try from another repos to avoid this: https://wiki.mozilla.org/ReleaseEngineering/TryServer#Pushing_to_try
Depends on: 963074
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: