Rerunning a balrog submit results in duplicate partials information
Categories
(Release Engineering Graveyard :: Applications: Balrog (backend), enhancement)
Tracking
(Not tracked)
People
(Reporter: nthomas, Unassigned)
Attachments
(1 file: text/x-github-pull-request, deleted)
Comment 1•6 years ago
It feels like maybe the right thing to do here is to return a 409 Conflict whenever we receive an update that would add multiple partials with the same "from" entry.
This would cause automation to fail the first time, and require someone to remove the old entry before re-running. Would this be a good solution and decrease confusion?
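A minimal sketch of the check proposed above, assuming each partial is a dict with a "from" key. The names `PartialConflict` and `check_partials` and the sample values are hypothetical and are not Balrog's actual code; a web layer would still have to map the exception to a 409 response.

```python
from collections import Counter


class PartialConflict(Exception):
    """Hypothetical error that a web layer could translate into HTTP 409."""


def check_partials(partials):
    """Raise PartialConflict if two partial entries share the same "from" value."""
    counts = Counter(entry["from"] for entry in partials)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise PartialConflict("duplicate partials from: " + ", ".join(duplicates))


# Illustrative data only: two entries for the same "from" release.
partials = [
    {"from": "example-previous-release", "hashValue": "aaa"},
    {"from": "example-previous-release", "hashValue": "bbb"},
]

try:
    check_partials(partials)
except PartialConflict as exc:
    print("would return 409:", exc)
```

Rejecting the update keeps the stored data unambiguous, at the cost of the manual cleanup step described above.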
Comment 2•6 years ago (Reporter)
From a practical point of view, if all the other Balrog submissions are complete then it's relatively straightforward to manually download the JSON, remove the old partial info, and resubmit it (manual work, though we could make it a little safer with scripting). If other platforms were still running it would be necessary to wait until they were done, to avoid reverting all the automation submissions made between the manual download and the resubmit.
Given we submit all the partials plus the complete in a single job, what are the downsides to replacing a locale entry instead?
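The manual cleanup could be scripted roughly as follows. This is only a sketch: it assumes the downloaded JSON has a platforms -> locales -> partials layout, that each partial carries a "from" key, and that the later of two entries with the same "from" is the one to keep. The function name and the sample blob are made up for illustration.

```python
import json


def dedupe_release_partials(blob):
    """Drop stale partials in place; return how many entries were removed."""
    removed = 0
    for platform in blob.get("platforms", {}).values():
        for locale in platform.get("locales", {}).values():
            if "partials" not in locale:
                continue
            latest = {}
            for entry in locale["partials"]:
                # Assumption: a later entry replaces an earlier one with the same "from".
                latest[entry["from"]] = entry
            removed += len(locale["partials"]) - len(latest)
            locale["partials"] = list(latest.values())
    return removed


# Illustrative blob, not real release data.
blob = {
    "platforms": {
        "example-platform": {
            "locales": {
                "tl": {
                    "partials": [
                        {"from": "old-release", "hashValue": "stale"},
                        {"from": "old-release", "hashValue": "fresh"},
                    ]
                }
            }
        }
    }
}

removed = dedupe_release_partials(blob)
print(removed, json.dumps(blob["platforms"]["example-platform"]["locales"]["tl"]))
```

As noted above, running something like this is only safe once no other submission is still writing to the same release.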
Comment 3•6 years ago
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #2)
From a practical point of view, if all the other Balrog submissions are
complete then it's relatively straightforward to manually download the JSON,
remove the old partial info, and resubmit it (manual work, though we could
make it a little safer with scripting). If other platforms were still running
it would be necessary to wait until they were done, to avoid reverting all the
automation submissions made between the manual download and the resubmit.
Given we submit all the partials plus the complete in a single job, what are
the downsides to replacing a locale entry instead?
I think I'm a bit confused about the exact circumstances that trigger this bug. It sounds like we have instances where we have multiple different submissions for the same partial/complete, but with different data? And we hit this when they run at the same time and run the update merge code? And all of this only happens when we're retriggering in circumstances like bug 1501113?
In any case, it's possible to handle this in merge_lists, although it would be a hack (I think we'd have to make some guesses on whether or not we're in a list of partials/completes based on the shape of the data). Depending on exactly how this is happening, we might be able to find a better fix.
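To make the "guess from the shape of the data" idea concrete, here is a rough sketch. It is not Balrog's real merge_lists; `merge_lists_sketch` and `looks_like_partials` are hypothetical names, and the heuristic (every item is a dict with a "from" key) is an assumption about what partial entries look like.

```python
def looks_like_partials(items):
    """Heuristic guess: a non-empty list where every item is a dict with a "from" key."""
    return bool(items) and all(isinstance(i, dict) and "from" in i for i in items)


def merge_lists_sketch(*lists):
    """Union the lists in order; for partial-shaped data, keep one entry per "from"."""
    merged = []
    for lst in lists:
        for item in lst:
            if item not in merged:
                merged.append(item)
    if looks_like_partials(merged):
        by_from = {}
        for item in merged:
            by_from[item["from"]] = item  # the entry from the later list wins
        merged = list(by_from.values())
    return merged


print(merge_lists_sketch(["a", "b"], ["b", "c"]))
print(merge_lists_sketch(
    [{"from": "x", "hashValue": "1"}],
    [{"from": "x", "hashValue": "2"}],
))
```

The weak point is the heuristic itself: any other list of dicts that happens to carry a "from" key would be deduplicated too, which is why this counts as a hack.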
Comment 4•6 years ago (Reporter)
(In reply to bhearsum@mozilla.com (:bhearsum) from comment #3)
I think I'm a bit confused about the exact circumstances that trigger this bug. It sounds like we have instances where we have multiple different submissions for the same partial/complete, but with different data? And we hit this when they run at the same time and run the update merge code? And all of this only happens when we're retriggering in circumstances like bug 1501113?
Comment #0 was a problem generating partials where we reran the generation and downstreams, meaning we had different mar files and wanted to replace data already in Balrog. Pretty uncommon thing to happen. Having looked through today's code I'm not sure how this happened, as addLocaleToRelease() appears to replace, and we really need that for Firefox-mozilla-central-nightly-latest.
Comment 5•6 years ago
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #4)
(In reply to bhearsum@mozilla.com (:bhearsum) from comment #3)
I think I'm a bit confused about the exact circumstances that trigger this bug. It sounds like we have instances where we have multiple different submissions for the same partial/complete, but with different data? And we hit this when they run at the same time and run the update merge code? And all of this only happens when we're retriggering in circumstances like bug 1501113?
Comment #0 was a problem generating partials where we reran the generation
and downstreams, meaning we had different mar files and wanted to replace
data already in Balrog. Pretty uncommon thing to happen. Having looked
through today's code I'm not sure how this happened, as
[addLocaleToRelease()](https://github.com/mozilla/balrog/blob/695f19fa6b7167b2b27beb65d3bfae5a21ff32dc/auslib/db.py#L2083) appears to
replace, and we really need that for Firefox-mozilla-central-nightly-latest.
Hm, maybe we should add some logging code to make it possible to figure out what happened when it happens again. The scenario I described in comment #3 was the only one I could imagine hitting this, and it sounds like those weren't our circumstances when this was filed.
Comment 7•6 years ago
We've now got extra debugging statements for this in Balrog prod. The next time we hit it again I hope we'll be able to get enough information to make sense of it.
Comment 8•5 years ago
(In reply to bhearsum@mozilla.com (:bhearsum) from comment #7)
We've now got extra debugging statements for this in Balrog prod. The next
time we hit it again I hope we'll be able to get enough information to make
sense of it.
Looks like we hit this again with the 2019060222 nightlies, for at least some locales (eg: "tl"): https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=6d71d3ca012438d7eac6e8f9471e198a10eabc70
Comment 9•5 years ago
We hit this again last night in bug 1574404.
Comment 10•5 years ago
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #9)
We hit this again last night in bug 1574404.
Well, not exactly a rerun; it was more like bug 1537710.
Comment 11•5 years ago
Taking a step back to look at what else could cause this, in bug 1537710.
Comment 13•5 years ago
We're going to burn down Balrog submissions soon, see https://github.com/mozilla-releng/balrog/issues/1049. We'll look into this more, and ensure it's not an issue, as part of that.