Nightlies based on the same revision end up adding two entries for the same file in Balrog blobs
Categories
(Release Engineering Graveyard :: Applications: Balrog (backend), defect, P1)
Tracking
(Not tracked)
People
(Reporter: mtabara, Unassigned)
References
Details
(Whiteboard: [releaseduty])
Somewhat related to bug 1501167, we hit something interesting earlier today on mozilla-central.
Mozilla-central was closed for most of the day on Wednesday, 20 March, so the 10:00am and 10:00pm UTC nightlies were based on the same revision. Because they were based on the same revision, they shared the same decision task and hence the same buildid, which comes from parameters.yml's moz_build_date.
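As a minimal sketch (assumed names, not the actual taskgraph or Balrog code), everything downstream that is keyed on that shared buildid collides, including the Balrog blob the nightlies are submitted to:

# Minimal sketch with assumed naming, not the real taskgraph/Balrog code:
# both nightly graphs reuse the same decision task, so they share
# moz_build_date, and anything keyed on the buildid collides downstream.
def nightly_blob_name(product, branch, moz_build_date):
    # The blob name is assumed to embed the 14-digit buildid timestamp.
    return f"{product}-{branch}-nightly-{moz_build_date}"

first_graph = nightly_blob_name("Firefox", "mozilla-central", "20190320112939")
second_graph = nightly_blob_name("Firefox", "mozilla-central", "20190320112939")
assert first_graph == second_graph  # both submissions land in the same blob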
In Balrog, this resulted in two "completes" entries pointing at the same file URL, but with different sizes and hashes:
"completes": [
{
"fileUrl": "https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar",
"filesize": 58334587,
"from": "*",
"hashValue": "7c46b0cce414b9ca0e757a71fc3a8d9e80ba1cdb33be6caa7fe321995fc2daac875bb8a4317c41915cdd5b19bd3dff74b945f35a6aa8b110856bbcf9414c2c90"
},
{
"fileUrl": "https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar",
"filesize": 58328179,
"from": "*",
"hashValue": "de545dc946ea19c84fedbbfec6534f01312ed28cf7559637938440438987bb7ad778a72283359ef0db85377eb7c2da9306948308b921e2b12c62f3773dc5d0cb"
}
I suppose this confused Balrog, and we've seen nightlies fail to update with:
AUS:SVC Downloader:_selectPatch - found existing patch with state: null
AUS:SVC Downloader:downloadUpdate - url: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, path: /Users/shawn/Library/Caches/Mozilla/updates/Applications/Firefox Nightly/updates/0/update.mar, interval: 0
AUS:SVC Downloader:onStartRequest - original URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, final URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar
AUS:SVC Downloader:onProgress - progress: 30864/58328179
AUS:SVC Downloader:onProgress - maxProgress: 58328179 is not equal to expected patch size: 58334587
AUS:SVC Downloader: cancel
AUS:SVC Downloader:onProgress - progress: 0/58328179
AUS:SVC Downloader:onProgress - maxProgress: 58328179 is not equal to expected patch size: 58334587
AUS:SVC Downloader: cancel
AUS:SVC Downloader:onStopRequest - original URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, final URI spec: https://archive.mozilla.org/pub/firefox/nightly/2019/03/2019-03-20-11-29-39-mozilla-central-l10n/firefox-68.0a1.zh-TW.mac.complete.mar, status: 2147549183
AUS:SVC Downloader:onStopRequest - status: 2147549183, current fail: 0, max fail: 10, retryTimeout: 2000
AUS:SVC Downloader:onStopRequest - non-verification failure
AUS:SVC getStatusTextFromCode - transfer error: 失敗 (不明原因), default code: 2152398849
AUS:SVC Downloader:onStopRequest - setting state to: download-failed
AUS:SVC Downloader:onStopRequest - notifying observers of error. topic: update-error, status: download-attempts-exceeded, downloadAttempts: 23 maxAttempts: 2
AUS:SVC UpdateManager:_writeUpdatesToXMLFile - no updates to write. removing file: /Users/shawn/Library/Caches/Mozilla/updates/Applications/Firefox Nightly/active-update.xml
UTM:SVC TimerManager:registerTimer - id: telemetry_modules_ping
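What the log shows, in short: the update snippet advertised the filesize from one completes entry (58334587) while the MAR actually served at the shared URL was the other build's (58328179 bytes), so the size check failed on every retry until the download was abandoned. A rough sketch of that check, with hypothetical names (the real logic lives in the updater's Downloader code):

# Rough sketch with hypothetical names; mirrors the
# "maxProgress ... is not equal to expected patch size" line above.
def patch_size_matches(expected_size: int, served_size: int) -> bool:
    # expected_size comes from the update XML, served_size from the actual
    # download; a mismatch cancels the download attempt.
    return served_size == expected_size

# Values from the log above: advertised 58334587, served 58328179.
assert not patch_size_matches(58334587, 58328179)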
Reporter
Comment 1•6 years ago
Temporary solution to unblock this: we've frozen nightlies to the previously known-good buildid, 20190319215514.
Reporter
Updated•6 years ago
Reporter
Comment 2•6 years ago
:RyanVM raised this during the channel meeting today and indicated that we didn't have this problem back in the buildbot days. We should amend our logic to prevent scheduling a new nightly graph when one already exists for that particular revision.
Comment 3•6 years ago
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)
:RyanVM raised this during the channel meeting today and indicated that we didn't have this problem back in the buildbot days. We should amend our logic to prevent scheduling a new nightly graph when one already exists for that particular revision.
That might be tricky while nightlies are triggered by a hook that creates a new graph, but it might be doable.
I think it'll become easier when we're triggering promotion on shippable builds.
Comment 4•6 years ago
Callek, should we block this on the Nightly Promotion project? We plan to do that in Q2 after shippable builds are done. Is there a placeholder bug for it yet?
Comment 5•6 years ago
(In reply to Aki Sasaki [:aki] from comment #3)
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #2)
:RyanVM raised this during the channel meeting today and indicated that we didn't have this problem back in the buildbot days. We should amend our logic to prevent scheduling a new nightly graph when one already exists for that particular revision.
That might be tricky while nightlies are triggered by a hook that creates a new graph, but it might be doable.
I think it'll become easier when we're triggering promotion on shippable builds.
Could we have the hook logic check something in the index to determine if we've already done nightlies for a given revision?
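A rough sketch of what such a check could look like, assuming the 2019-era index service and a gecko.v2-style nightly route (both the endpoint and the namespace here are assumptions, not verified):

# Hypothetical guard: before the hook schedules a new nightly graph, ask the
# Taskcluster index whether a nightly task is already indexed for this revision.
# The endpoint and namespace below are assumptions, not verified routes.
import requests

INDEX_URL = "https://index.taskcluster.net/v1/task/"

def nightlies_already_ran(project: str, revision: str) -> bool:
    namespace = f"gecko.v2.{project}.nightly.revision.{revision}.firefox.linux64-opt"
    # 200 means something is indexed for this revision; 404 means nothing ran yet.
    return requests.get(INDEX_URL + namespace).status_code == 200

One namespace like this only proves a single platform finished, though, so a real guard would still need a policy for partially completed graphs.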
Comment 6•6 years ago
This was discussed briefly at the channel meeting. We could have something check the index to see whether nightlies were already run, but it would be kludgy and harder to validate (e.g. do we want to prevent triggering Windows again on a given revision if all the Linux builds finished?).
That said, there will be work in Q2 that should solve that aspect as well, namely Nightly Promotion. I say we wait for that unless this becomes a more prevalent problem.
Reporter
Updated•5 years ago
Reporter
Comment 8•5 years ago
This happened again.
Reporter
Comment 9•5 years ago
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #8)
This happened again.
Turns out we had two nightlies triggered last night, based on the same revision.
It's https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=nightly&revision=b283a7ef186c216d765631f6cb1260a3fa2ee42c
At 2019-08-15T19:45:20.912Z, the first decision task was triggered - https://tools.taskcluster.net/groups/c9bVskeaT5Os0Uy7IJ7ZxA/tasks/c9bVskeaT5Os0Uy7IJ7ZxA/details
At 2019-08-15T22:00:26.923Z, the second decision task was triggered - https://tools.taskcluster.net/groups/dlzpxOqnTxGRm5l0aBLkxw/tasks/dlzpxOqnTxGRm5l0aBLkxw/details
Since revision b283a7ef186c was an "inbound to mozilla-central. a=merge" merge, I'm tempted to believe that someone manually triggered the nightlies. Then, 2h15min later, the cron job fired automatically and, since no other revision had been pushed in between, it triggered the nightlies again, based on the same revision.
Comment 10•5 years ago
Your theory is correct. Sheriffs had to trigger Nightlies earlier (to ship the backout).
Reporter
Comment 11•5 years ago
Duping this against bug 1579415, where we'll track the first solution: preventing a new trigger when another graph already exists for that revision.
Updated•5 years ago