Closed Bug 741478 Opened 13 years ago Closed 13 years ago

Move updateSUT.py invocation from clientproxy.cp to verify.py

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: Callek)

References

Details

Attachments

(3 files, 5 obsolete files)

Moving updateSUT out of clientproxy and buildbot has great benefits, mostly consolidating the verification steps into one place. The other benefit is not trying to update SUT until we verify that we can both connect and ping the tegra, and being sure a failure to update SUT doesn't |sys.exit()| out of clientproxy.
Depends on: 690311
Attached patch [tools] v0.5 (obsolete) (deleted) — Splinter Review
This should do it (for tools part) but is untested so far, so could have obvious mistakes. I'll be testing over the next few days and weekend on foopy5.
Assignee: nobody → bugspam.Callek
Status: NEW → ASSIGNED
Attachment #612346 - Flags: feedback?(armenzg)
Comment on attachment 612346 [details] [diff] [review] [tools] v0.5 Review of attachment 612346 [details] [diff] [review]: ----------------------------------------------------------------- Something around these lines is good. I feel that we should deprecate updateSUT.py In fact, we should use verify.py on the slaves. For now, you could even get rid of updateSUT.py without affecting the jobs since we grab a specific revision of it: http://hg.mozilla.org/build/tools/raw-file/eba35f2aeabd/sut_tools/updateSUT.py Thanks! ::: sut_tools/verify.py @@ +65,4 @@ > sleepDuration = 0 > while curRetry < MAX_RETRIES: > try: > + dm = updateSUT.connect(tegra, sleepDuration) I am not exactly liking that we are using "connect" from updateSUT.py. It feels that such function should live in sut_lib.py and both updateSUT.py and verify.py should grab it from there. what do you think? @@ +87,4 @@ > if not dmAlive(dm): > return False > > + ver = updateSUT.version(dm) The same with the function version. It feels odd. Perhaps a lot of the functions of updateVerify.py should live in sut_lib.py or in verify.py. I can change buildbotcustom to deprecate the usage of updateSUT.py and instead call verify.py. What do you think? @@ +87,5 @@ > if not dmAlive(dm): > return False > > + ver = updateSUT.version(dm) > + if not updateSUT.isVersionCorrect(ver=ver): Again, we call updateSUT.py. It feels that it should live within verify.py or sut_lib.py.
Attachment #612346 - Flags: feedback?(armenzg) → feedback+
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #2) > Comment on attachment 612346 [details] [diff] [review] > [tools] v0.5 > > Review of attachment 612346 [details] [diff] [review]: > ----------------------------------------------------------------- > > Something around these lines is good. > I feel that we should deprecate updateSUT.py > > In fact, we should use verify.py on the slaves. > > For now, you could even get rid of updateSUT.py without affecting the jobs > since we grab a specific revision of it: > http://hg.mozilla.org/build/tools/raw-file/eba35f2aeabd/sut_tools/updateSUT. > py > > Thanks! Overall I agree with you, my primary goal here is/was to stop doing updateSUT from cp itself (which causes us to exit(1) on failure) and instead do it in verify.py (so that we can properly test pingable/etc first). And rip it out of buildbot. A further goal I was hoping to defer to other followup bugs [if you don't object] is to "run verify.py in a sorta noop mode in buildbot" basically allowing the job to hit RETRY if we can't verify instead of causing us to try doing these correcting-steps in buildbot, when letting another "already ok" tegra take the job is better. Also future, is to reorganize how we have verify.py/updateSUT.py/cleanup.py/etc. organized in sut_tools so that we actually do stick the util functions like |connect| etc in sut_lib rather than as a part of updateSUT.py etc. What do you think, or do we want to extend our scope here to include that? [NOTE: this (run updateSUT as part of cp) can (and has already at least a few times) cause cp itself to exit when it really does not need to -- now that we do reboot from outside of buildbot in the verify command]
Fair point. Let's focus on fixing that scenario for CP. We can defer all that work for later on. Send the review my way once you are ready. Thanks Justin!
Attached patch [tools] v1 (obsolete) (deleted) — Splinter Review
So This should do the tools part. Modulo those cleanups that are far easier once this is done... That said, I the buildbot side I don't think wins us much atm, though I can certainly do it, it does have a downside until we get the "stop buildbot at the end of every run, and then do verify again" accomplished... We would then require a manual foopy stop_cp/start_cp combo when we need to update SUTAgent because we would not be checking/doing it on every talos run like currently. All in all I actually think its a good tradeoff short-term, but given that its not a patch for largest benefit, (even though its an easy patch to create) I'll await your assurance to do [or not do] right now.
Attachment #612346 - Attachment is obsolete: true
Attachment #613874 - Flags: review?(armenzg)
Comment on attachment 613874 [details] [diff] [review] [tools] v1 Review of attachment 613874 [details] [diff] [review]: ----------------------------------------------------------------- If you could move those 2 functions to sut_lib that would be great. If this would really really slow you down too much I could compromise and say to do it in a follow up bug. Sorry for taking this long. ::: sut_tools/verify.py @@ +65,4 @@ > sleepDuration = 0 > while curRetry < MAX_RETRIES: > try: > + dm = updateSUT.connect(tegra, sleepDuration) I can't recall what we were heading towards but if possible I would like the connect() function to be moved from updateSUT.py to sut_lib.py. I know that is a pain but I don't want to carry on doing it this way. Can you please do this for me? :) @@ +87,5 @@ > if not dmAlive(dm): > return False > > + ver = updateSUT.version(dm) > + if not updateSUT.isVersionCorrect(ver=ver): Can this be "updateSUT.isVersionCorrect(dm=dm):" and get rid of the previous line?
Attachment #613874 - Flags: review?(armenzg) → review-
Attached patch [tools] v2 (obsolete) (deleted) — Splinter Review
Armen, as discussed on IRC this does not do all the refactoring you suggested -- in anticipation of doing it in another patch. Bear, r? you since you were asking over this patch in IRC today. In addition to suggested changes from v1, this fixes some except blocks that were broken and causing unhelpful log messages for bear.
Attachment #613874 - Attachment is obsolete: true
Attachment #614907 - Flags: review?(bear)
Attachment #614907 - Flags: review?(armenzg)
Comment on attachment 614907 [details] [diff] [review] [tools] v2 some nits: - for the connect() parameter list can we use sleep=False for the default since it's a boolean by behaviour
Attachment #614907 - Flags: review?(bear) → review+
Attached patch [tools] v2.1 (deleted) — Splinter Review
changed sleep default to False as suggested. Going to consider this bug done after this lands, since we'll rip it out of buildbot once we start doing some parts of verify in buildbot rather than now. Since at the moment we also do not dive back into verify on every job start.
Attachment #614907 - Attachment is obsolete: true
Attachment #614907 - Flags: review?(armenzg)
Attachment #614915 - Flags: review+
Summary: Move updateSUT.py invocation from both clientproxy.cp and buildbot to verify.py → Move updateSUT.py invocation from clientproxy.cp to verify.py
http://hg.mozilla.org/build/tools/rev/d417a465f9f9 (accidentally left my @gmail as author instead of my @m.c which breaks my self-imposed work-freetime seperation for commits)
As discussed on IRC we were failing the download of the apk due to path issues, we actually have cwd of / here, so we need to full-path the file. All current callers of this have os.environ SUT_NAME set, and an exception at this level *is* good if someone violates that assumption anyway.
Attachment #618015 - Flags: review?(bear)
Attachment #618015 - Flags: review?(bear) → review+
Attached patch [tools] Fix part 3 (obsolete) (deleted) — Splinter Review
Sadly the apkFilename is used directly as the "to-file" on the tegra, and I misremembered how that worked (forgot we pass `data` not a local filename to that). This should fix this up
Attachment #618067 - Flags: review?(bear)
Comment on attachment 618067 [details] [diff] [review] [tools] Fix part 3 are you testing this in staging now? and can you test by forcing the sutagent to be wrong, old, missing?
Attachment #618067 - Flags: review?(bear) → review-
Attached patch [tools] Fix part 3 (obsolete) (deleted) — Splinter Review
This is tested. And fixes some slight other issues. There is a slight outstanding issue with this code that I should call out, even though it existed before... the |while tries ..| we have for the *re*connect to the tegra after we install a newer SUTAgent doesn't [usually] retry there, we return a valid deviceManager (just a non-connected-one) in the following circumstances: * tegra is not up * tegra is up but not accepting socket connections devicemanager itself retries to connect 5 times though, and would throw if it got an #AGENT-ERROR# but I think since this part existed before now its easier not to worry about it, and since this all gets called before buildbot is run, it won't kill off jobs.
Attachment #618067 - Attachment is obsolete: true
Attachment #618465 - Flags: review?(bear)
Comment on attachment 618465 [details] [diff] [review] [tools] Fix part 3 Review of attachment 618465 [details] [diff] [review]: ----------------------------------------------------------------- This is not the most up to date version of the patch (I just found out from armen, so thanks armen!) I'll attach the correct one early tomorrow.
Attachment #618465 - Flags: review?(bear) → review-
Attachment #614915 - Flags: checked-in+
Attachment #618015 - Flags: checked-in+
Attached patch [tools] Fix part 3 (v2) (deleted) — Splinter Review
This is part 3 only with an added fix that I have had locally for a while
Attachment #618465 - Attachment is obsolete: true
Attachment #620386 - Flags: review?(bear)
Attachment #620386 - Flags: review?(bear) → review+
This last patch deployed Wednesday.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: