Closed Bug 858953 Opened 12 years ago Closed 11 years ago

Updates can get messed up for *en-US* and/or locales if the nightly builds happen on older changesets than the latest nightly

Categories

(Release Engineering :: General, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Unassigned)

References

Details

(Whiteboard: [qa-automation-blocked])

Our Mozmill tests have shown that we do not offer updates to today's localized builds on the Nightly and Aurora channels. On both channels an empty AUS snippet is delivered; updates work only for en-US.
Aurora test results: http://mozmill-ci.blargon7.com/#/update/report/36bf148e4d23c6904e5ade75a42cff00
Nightly: http://mozmill-ci.blargon7.com/#/update/report/36bf148e4d23c6904e5ade75a43fb0a2
This is still happening with yesterday's builds. AUS snippets are still empty.
As I write (nightlies for 2013-04-08 are running, so this will soon be out of date), there should mostly be snippets pointing at the 2013-04-07 nightly (only mostly because some locales failed the win32 07 nightly: ml, es-AR, mk, kn, ku). But in aus3-staging:/opt/aus2/incoming/2/Firefox/mozilla-central/WINNT_x86-msvc/20130406030922 there is a range of things:
* pointing at 2013-04-05: 32 locales
* pointing at 2013-04-06: 54 locales
* pointing at 2013-04-07: 20 locales (bg, de, en-US, es-ES, fy-NL ...)

The first two are unexpected. Taking da as an example of the second case, the partial points at url=http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/04/2013-04-06-03-09-22-mozilla-central-l10n/firefox-23.0a1.da.win32.partial.20130406030922-20130406030922.mar

NB: the buildIDs are identical, and it's a tiny mar file (18KB). This strongly implies that downloading the previous .exe, .zip and complete.mar is going wrong in the nightly repack jobs. Possibly a slave change, possibly ftp.m.o not returning the freshest data.
Nick, something related to bug 703559?
I just checked Firefox Nightly 23.0a1 2013-04-07 de and it received partial updates to 2013-04-08.
The new builds seem to work now. At least for de, fr, and it. Those are the locales we currently test.
Sounds like this is working now, so closing. If I've missed something that still needs to be done, please reopen with details.
Status: NEW → RESOLVED
Closed: 12 years ago
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
Resolution: --- → FIXED
Not completely fixed. AUS snippets are still empty, at least for updates of Nightly builds on Windows. Aurora seems to be fine. https://aus3.mozilla.org/update/3/Firefox/23.0a1/20130405103453/WINNT_x86-msvc/fr/nightly/Windows_NT%206.2/default/default/update.xml?force=1
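Checks like the one above can be scripted. The following is a minimal sketch that assumes the path layout read off the aus3 URLs quoted in this bug: product, version, buildid, build target, locale, channel, URL-encoded OS version, then two `default` segments, with `?force=1` appended. The helper names `update_url` and `is_empty_snippet` are hypothetical. Fetching the response is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Base taken from the aus3.mozilla.org URLs quoted in this bug.
AUS_BASE = "https://aus3.mozilla.org/update/3"


def update_url(product, version, buildid, build_target, locale, channel,
               os_version, distribution="default",
               distribution_version="default", force=True):
    """Build an update.xml query URL (segment order assumed from this bug)."""
    segments = [product, version, buildid, build_target, locale, channel,
                quote(os_version, safe=""),  # spaces and parens get %-encoded
                distribution, distribution_version]
    url = f"{AUS_BASE}/{'/'.join(segments)}/update.xml"
    # ?force=1 is appended because every query quoted in this bug uses it.
    return url + "?force=1" if force else url


def is_empty_snippet(update_xml: str) -> bool:
    """True for an <updates> root with no <update> children (no update offered)."""
    root = ET.fromstring(update_xml)
    return root.tag == "updates" and len(root.findall("update")) == 0
```

With the arguments from the Windows query above (`"Firefox", "23.0a1", "20130405103453", "WINNT_x86-msvc", "fr", "nightly", "Windows_NT 6.2"`), `update_url` reproduces that URL exactly. An `<updates></updates>` body with no children is what the tests report as an empty snippet.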
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This is a chronic issue; I'm morphing the bug to talk about that rather than having this bug flop back and forth as we hit specific instances of it.

The reason this came up most recently on Nightly is that Nightly was respun before all the repacks from the regularly scheduled one had completed. It's a known issue that the current AUS server can't cope with some locales being missing for some nightlies. This happens because the server looks at the second most recent buildid for a branch, and if there's no snippet there for the locale, it assumes there's no update (https://mxr.mozilla.org/mozilla/source/webtools/aus/xml/inc/patch.class.php#704). When two or more nightlies are triggered close together, l10n repacks end up stomping on each other in some way. I'm having trouble remembering precisely how, but I believe it has something to do with the earlier repacks downloading the newer build and then uploading to the wrong place on AUS.

There's no real fix for this with the current update server AFAIK. Here are some ideas for avoiding the situation. They may or may not be possible, and may or may not be a net win:
* Don't trigger l10n for manually triggered nightlies.
** Means we wouldn't get l10n at all in cases where the nightly burned and had to be respun.
** Might break updates for locales (or maybe just partial updates for locales) in the subsequent nightly.
* Coalesce repack jobs where the locale matches.
** Dunno if this is possible right now.
** Definitely not possible when we move to chunked l10n builds.
** Might break partials for l10n whenever coalescing happens.
* Make repacks download en-US from dated dirs, to make sure we never download the wrong one.
** Dunno if the repack jobs have enough information to do this.
** Might break partials for l10n whenever we have overlap.

The longer-term fix for this is the new update server.
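The lookup rule described above can be expressed as a toy model. This is a deliberately simplified sketch based only on this comment's description, not a transcription of `patch.class.php`. The data layout (buildid mapped to each locale's snippet target) and the name `offered_update` are hypothetical. Note that this analysis is questioned in a later comment in this bug.

```python
from typing import Dict, Optional

# Hypothetical layout: buildid directory -> {locale: buildid the snippet points to}.
SnippetDirs = Dict[str, Dict[str, str]]


def offered_update(snippet_dirs: SnippetDirs, locale: str) -> Optional[str]:
    """Return the buildid offered to `locale`, or None for an empty snippet.

    Models the rule as described: only the second most recent buildid for
    the branch is consulted, and a missing locale there means no update.
    """
    buildids = sorted(snippet_dirs)  # 14-digit buildids sort chronologically
    if len(buildids) < 2:
        return None
    return snippet_dirs[buildids[-2]].get(locale)
```

Under this model, suppose the second most recent buildid directory holds an en-US snippet but no de snippet (for example, because that nightly's de repack never uploaded). A de client then gets nothing, even though older and newer directories both contain de snippets.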
We're actively working on getting it to the point where we can move Nightly users over to it, that work is tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=832454. There's no definitive timeline on that work yet.
Summary: Empty AUS snippets offered for localized Aurora and Nightly builds → nightly-style builds stop serving updates for some locales when new nightlies are triggered before the previous one's repacks finish
(In reply to Ben Hearsum [:bhearsum] from comment #8)
> This is a chronic issue; I'm morphing the bug to talk about that rather than
> having this bug flop back and forth as we hit specific instances of it.
>
> The reason this came up most recently on nightly is because Nightly was
> respun before all the repacks from the regularly scheduled one had completed.

Oh, so it looks like I filed a dupe then. See bug 848406.

> * Don't trigger l10n for manually triggered nightlies.
> ** Means we wouldn't get l10n at all in cases where the nightly burned and
> had to be respun.
> ** Might break updates for locales (or maybe just partial updates for
> locales) in the subsequent nightly.

Our testing is based on Pulse. If the status of a build is failing, we do not test it. But our update testing will be affected, correct. Couldn't we simply set the previous_buildid of those builds to None? That way we could make sure the update tests are not executed. That might be the easiest solution for now, without a lot of work.

> * Make repacks download en-US from dated dirs, to make sure we never
> download the wrong one
> ** Dunno if the repack jobs have enough information to do this

Have you ever looked into mozdownload? It can download a build given the branch and the build id or date: http://pypi.python.org/pypi/mozdownload. But I'm not sure if you want to add a dependency, and there might be cases where we still fail.

> working on getting it to the point where we can move Nightly users over to
> it, that work is tracked in
> https://bugzilla.mozilla.org/show_bug.cgi?id=832454

Thanks.
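The "download from dated dirs" idea depends on the fact that a buildid alone determines its nightly directory. The following minimal sketch infers the directory layout from the ftp.mozilla.org URLs quoted in this bug (`nightly/YYYY/MM/YYYY-MM-DD-HH-MM-SS-<branch>`, with `-l10n` appended for repacks). `dated_nightly_dir` is a hypothetical helper, not part of mozdownload's API.

```python
from datetime import datetime

# Base and layout inferred from the ftp.mozilla.org URLs in this bug.
NIGHTLY_BASE = "https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly"


def dated_nightly_dir(buildid: str, branch: str, l10n: bool = False) -> str:
    """Map a 14-digit buildid to its dated nightly directory URL.

    Assumption: a repack job that knows the buildid it was triggered for
    could fetch en-US from here rather than from a location that a newer
    nightly may already have replaced.
    """
    ts = datetime.strptime(buildid, "%Y%m%d%H%M%S")
    suffix = "-l10n" if l10n else ""
    return f"{NIGHTLY_BASE}/{ts:%Y/%m}/{ts:%Y-%m-%d-%H-%M-%S}-{branch}{suffix}/"
```

For buildid `20130406030922` on mozilla-central with `l10n=True`, this gives the directory that contains the da partial mentioned earlier in this bug.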
(In reply to Henrik Skupin (:whimboo) from comment #9)
> (In reply to Ben Hearsum [:bhearsum] from comment #8)
> > This is a chronic issue; I'm morphing the bug to talk about that rather than
> > having this bug flop back and forth as we hit specific instances of it.
> >
> > The reason this came up most recently on nightly is because Nightly was
> > respun before all the repacks from the regularly scheduled one had completed.
>
> Oh, so it looks like I filed a dupe then. See bug 848406.

Nick and I were talking about this yesterday, and I'm not sure the analysis in my previous comment is correct. He pointed out that bug 517947 has been fixed for years, which means we shouldn't be losing updates entirely when nightlies are respun before repacks complete. However, we did notice some other weirdness that we couldn't get to the bottom of. I suspect that this and bug 848406 are the same, regardless of the above.
Thanks. Feel free to close appropriately when it's clear.
This has been getting more and more annoying over the last couple of days. We are getting spammed with broken update test results because of the underlying problem. If it doesn't stop or can't be fixed in the short term, we might have to disable update tests for localized nightly builds.
Whiteboard: [qa-automation-blocked]
Blocks: 808550
Axel, how do localizers feel about missing updates on Aurora? Have they noticed them? I think I will disable our Mozmill update checks for localized builds by Monday, given that we have been getting almost nothing but failures over the last couple of days. This is not a satisfying situation. Sorry.
Flags: needinfo?(l10n)
I'm not responsible for localized builds, John's team is, thus forwarding that needinfo to him.
Flags: needinfo?(l10n) → needinfo?(joduinn)
Bhearsum, what should we do in this case? It seems like our only option is to turn off the tests until y'all either finish the new update server or fix this issue. However, if we turn these tests off then we won't know how frequently we're hitting this issue. We're between a rock and a hard place. :/
I don't have any good suggestions...this bug simply isn't high priority enough to get real attention.
That's fine, we all have priorities. Then I suppose we either make the tests mask failures by retrying a whole bunch and, if they still fail, marking them as "don't care" in the tracker, or we turn them off altogether.
I don't think we should keep re-running those update tests until they pass. That will most likely only happen the next day, by which time the buildids have changed and our tests will fail anyway. So it looks like we will have to go the hard way and stop testing updates for localized builds on Nightly, Aurora, and ESR17. I will try to get this changed in the next couple of days.
I have disabled update tests for localized builds in mozmill-ci for now: https://github.com/mozilla/mozmill-ci/pull/238
Status: REOPENED → NEW
This is strange. Today we got the same or a similar failure for en-US builds. The update snippet for a build from May 20th is empty: https://aus3.mozilla.org/update/3/Firefox/23.0a2/20130520004018/Linux_x86-gcc3/en-US/aurora/Linux%203.5.0-27-generic%20%28GTK%202.24.13%29/default/default/update.xml?force=1 What's wrong here? Does this bug affect more than just localized builds?
Product: mozilla.org → Release Engineering
We had a similar situation again for Aurora builds from Aug 10th to 12th, which didn't update:
http://mozmill-daily.blargon7.com/#/update/report/b7ef1fb3d9703aeaf2c46e07d272051c
http://mozmill-daily.blargon7.com/#/update/report/b7ef1fb3d9703aeaf2c46e07d26a72a9
As of now, it looks like new update snippets are available for today's builds.
Looks like the issue with Aurora is not fixed yet and we still don't get updates. I will file a new bug for this particular issue.
Blocks: 905623
Blocks: 920453
NOTE: This happens when a more recent nightly build is triggered on an older changeset. At that point things get messed up. This happens for *en-US* as well.

This analysis should be useful for this bug 20453:

Sequence of events
- 63a505ec015c checked in on Saturday (ffxbld, Sat Sep 21 03:23:54 2013 PDT)
- Nightly build starts on Sunday, Sep 22 2013 00:40:30
-- for push 63a505ec015c (correct changeset)
- No nightly build on Monday, since nothing was checked in after the last nightly
- 07a29b018edc checked in on Monday (bugzilla@standard8.plus.com, Mon Sep 23 04:04:02 2013 PDT)
- Many more check-ins on Monday
- b34384409be6 checked in on Monday (ffxbld, Mon Sep 23 16:30:43 2013 PDT)
- Nightly build runs on Tuesday, Sep 24 2013 00:51:18 to 03:48
-- for push b34384409be6 (correct changeset)
-- it finished before the next nightly was to be triggered
- Nightly build starts on Tuesday, Sep 24 2013 05:56:06
-- for push 63a505ec015c (__older__ changeset!!)

The last nightly should not have happened IIUC. All 3 nightly builds say this: "The Nightly scheduler named 'mozilla-aurora nightly' triggered this build"
http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/Linux%20x86-64%20mozilla-aurora%20nightly/builds/4
http://buildbot-master58.srv.releng.usw2.mozilla.com:8001/builders/Linux%20x86-64%20mozilla-aurora%20nightly/builds/31
http://buildbot-master63.srv.releng.use1.mozilla.com:8001/builders/Linux%20x86-64%20mozilla-aurora%20nightly/builds/19

[1] https://tbpl.mozilla.org/php/getParsedLog.php?id=28205526&tree=Mozilla-Aurora&full=1 buildid: 20130922004001
[2] https://tbpl.mozilla.org/php/getParsedLog.php?id=28278063&tree=Mozilla-Aurora&full=1 buildid: 20130924004001
[3] https://tbpl.mozilla.org/php/getParsedLog.php?id=28287311&tree=Mozilla-Aurora&full=1 buildid: 20130923004006 -- buildid of the *23rd*?

The upload dates show this as well.
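The sequence above suggests a simple sanity check: walk the nightlies in the order they were started and flag any build whose changeset was pushed earlier than a changeset an earlier nightly already built. The following is a minimal sketch. `NightlyBuild` and `regressed_builds` are hypothetical names, and push times are treated as naive datetimes in a single zone (PDT, as given above).

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class NightlyBuild(NamedTuple):
    buildid: str
    revision: str
    pushed_at: datetime  # push time of the changeset the nightly built


def regressed_builds(builds: List[NightlyBuild]) -> List[NightlyBuild]:
    """Flag nightlies built from a changeset older than one already built.

    `builds` must be ordered by the time each nightly was started.
    """
    flagged = []
    newest_push: Optional[datetime] = None
    for build in builds:
        if newest_push is not None and build.pushed_at < newest_push:
            flagged.append(build)  # went backwards in push order
        else:
            newest_push = build.pushed_at
    return flagged


builds = [
    NightlyBuild("20130922004001", "63a505ec015c", datetime(2013, 9, 21, 3, 23, 54)),
    NightlyBuild("20130924004001", "b34384409be6", datetime(2013, 9, 23, 16, 30, 43)),
    NightlyBuild("20130923004006", "63a505ec015c", datetime(2013, 9, 21, 3, 23, 54)),
]
```

With the three Aurora nightlies listed above in start order, only the last one (buildid 20130923004006, built from 63a505ec015c after b34384409be6 had already been built) is flagged.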
Here are the last 4 nightly partial mar files in *descending* order (https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/?C=M;O=D):
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/2013-09-25-00-40-04-mozilla-aurora/firefox-26.0a2.en-US.linux-x86_64.partial.20130923004006-20130925004004.mar 25-Sep-2013 10:34, 23rd -> 25th (2 days jump!)
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/2013-09-23-00-40-06-mozilla-aurora/firefox-26.0a2.en-US.linux-x86_64.partial.20130924004001-20130923004006.mar 24-Sep-2013 15:16, 24th -> 23rd (backwards!)
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/2013-09-24-00-40-01-mozilla-aurora/firefox-26.0a2.en-US.linux-x86_64.partial.20130922004001-20130924004001.mar 24-Sep-2013 10:45, 22nd -> 24th (2 days jump!)
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/2013-09-22-00-40-01-mozilla-aurora/firefox-26.0a2.en-US.linux-x86_64.partial.20130921004001-20130922004001.mar 22-Sep-2013 09:55, 21st -> 22nd (normal)

I can verify that I can update from the 22nd (complete update to the 23rd???), but I can't update from either the 23rd or the 24th build.
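The annotations on that list can be derived mechanically from the filenames. The following minimal sketch assumes the `...partial.<from buildid>-<to buildid>.mar` naming seen in this bug. `classify_partial` and its labels are hypothetical. It also treats identical from/to buildids as "degenerate", the case seen earlier in this bug with the tiny da partial.

```python
import re
from datetime import datetime

# Naming pattern assumed from the partial mar URLs quoted in this bug.
PARTIAL_RE = re.compile(r"\.partial\.(\d{14})-(\d{14})\.mar$")


def classify_partial(filename: str) -> str:
    """Label a partial mar by the direction and size of its buildid step."""
    m = PARTIAL_RE.search(filename)
    if m is None:
        raise ValueError(f"not a partial mar name: {filename}")
    src, dst = (datetime.strptime(g, "%Y%m%d%H%M%S") for g in m.groups())
    if dst == src:
        return "degenerate"  # from == to: the partial contains no real update
    if dst < src:
        return "backwards"
    days = (dst.date() - src.date()).days
    return "normal" if days <= 1 else f"jump of {days} days"
```

Applied to the four filenames above, this reproduces the annotations: two "jump of 2 days", one "backwards", and one "normal".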
Summary: nightly-style builds stop serving updates for some locales when new nightlies are triggered before the previous one's repacks finish → Updates can get messed up for *en-US* and/or locales if the nightly builds happen on older changesets than the latest nightly
Found in triage:
1) From comment#25, it sounds like this happens when a nightly build is somehow declared not good, and a 2nd nightly build is triggered on a previous/older changeset.
2) Balrog is currently serving updates for mozilla-central nightly builds, and soon for aurora builds as well! bug#933161 is fixing this in the new update server (Balrog).
3) Given that (1) and (2) reduce the likelihood of this happening, and given how complex this would be to fix in the old AUS update server, I'd vote to close this bug as GOOD_ENOUGH_UNTIL_BALROG_RULES_THEM_ALL and focus efforts on bug#933161.

bhearsum, whimboo, what say you?
Flags: needinfo?(joduinn)
Flags: needinfo?(hskupin)
Flags: needinfo?(bhearsum)
Balrog has been serving nightly updates on Aurora since at least last week. This is fixed everywhere except esr17/24. esr17 dies soon, and we'll probably backport balrog nightly updates to esr24 at some point. Even if we don't, we won't be doing anything to fix this in AUS3.
Status: NEW → RESOLVED
Closed: 12 years ago → 11 years ago
Flags: needinfo?(hskupin)
Flags: needinfo?(bhearsum)
Resolution: --- → FIXED
That's good to hear. Thanks all!
Component: General Automation → General