Closed Bug 805016 Opened 12 years ago Closed 12 years ago

integrating Buildbot/Foopies with BMM/MozPool/Lifeguard for Android

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Unassigned)

Details

From an email sent a few days ago. This is where we're farthest behind in terms of panda-based Android and B2G support, so we should figure this out ASAP and get moving on the necessary coding and configuration. I'm not sure who in releng is the point person for this. Whoever is, please take the bug?

----

We're still working on getting production hardware in place for the pandas, but this is a good time to work on connecting the systems together. As I understand it, we'll need to put Android into prod immediately when the hardware is available; B2G is not far behind, and foopyless configurations are important to consider but out of scope for implementation at the moment. Let me know if that prioritization is incorrect.

BMM/MozPool/Lifeguard are still sorting out what does what, but essentially we'll have an HTTP endpoint to request that a board be power-cycled, and a similar endpoint to request re-imaging, where the request specifies the desired image. For Android, AIUI power-cycling would be part of the production process, clearing the device between runs, while re-imaging is used for automatic failure remediation. For B2G, reimaging would occur on just about every boot. There are provisions in place to pass a JSON "config blob" with the reimage request, which would indicate precisely which B2G image should be downloaded and installed. We don't yet have the live-image scripts required to install B2G.

As Mark and I work out how BMM, MozPool, and Lifeguard work together, it'd be helpful to have releng's integration vision. So, Callek, Aki, and/or Kim (or who?), in broad strokes, how do you see this process working? Random points to jog your thoughts:

* We could add BMM servers for tegras, too, to make a single reboot API that would work for both (automatically reimaging tegras is not currently possible)

* Will foopies "check out" a particular board? What happens if that board is not functional? When does the board get rebooted? In a B2G context, when does it get reimaged? What happens if the reboot or reimage fails?

* BMM servers are co-located with the hardware they manage; there's no central server. If you hit one BMM server with a request for a board it doesn't manage, it will redirect (302) to the correct BMM server. BMM servers for each board are also listed in inventory. So, if foopies hit BMM servers directly, it will need to be a bit more complex than a single curl or python-requests call, but not too bad.

So, let me know what you're thinking and planning, and we'll bring this together. I don't think we face any particularly challenging coding issues, once we agree on a design. And we have a surfeit of coders, so we should be in good shape.
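To make the 302 point above concrete, here is a minimal sketch of how a foopy-side client might re-send a request to whichever BMM server the first one points at. Everything named here is an assumption for illustration: the `send` transport callable, the hostnames, and the URL path are hypothetical, since the real BMM API had not been designed at this point. The transport is injected so the redirect logic can be exercised without a network; one reason to handle the redirect explicitly is that many HTTP clients turn a redirected POST into a GET.

```python
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin

# A transport takes (method, url, body) and returns (status, headers).
# This shape is an assumption for the sketch, not a BMM interface.
Response = Tuple[int, Dict[str, str]]
Transport = Callable[[str, str, Optional[bytes]], Response]


def request_with_redirects(send: Transport, method: str, url: str,
                           body: Optional[bytes] = None,
                           max_hops: int = 3) -> Tuple[str, int]:
    """Send a request and, on a 302, re-send it unchanged to Location.

    Returns (final_url, final_status). Raises RuntimeError if a 302 has
    no Location header or if more than max_hops redirects are seen.
    """
    for _ in range(max_hops + 1):
        status, headers = send(method, url, body)
        if status != 302:
            return url, status
        location = headers.get("Location")
        if not location:
            raise RuntimeError("302 without Location header from %s" % url)
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")


if __name__ == "__main__":
    # Hypothetical two-server setup: bmm-a does not manage the board.
    def fake_send(method, url, body):
        if url.startswith("http://bmm-a.example/"):
            return 302, {"Location": url.replace("bmm-a", "bmm-b")}
        return 200, {}

    print(request_with_redirects(
        fake_send, "POST", "http://bmm-a.example/board/panda-0001/reboot"))
```

A real client would swap `fake_send` for a thin wrapper around its HTTP library with automatic redirect following turned off, or look up the right BMM server in inventory first and skip the redirect entirely.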
(The initial email went to only a few people, and I probably guessed the wrong ones. Don't worry if you didn't get it, as it's copied in full above.)
Summary: integrating Buildbot/Foopies with BMM/MozPool/Lifeguard → integrating Buildbot/Foopies with BMM/MozPool/Lifeguard for Android
I think that after clientproxy.py runs verify.py we can determine whether a board needs to be re-imaged and talk to BMM.

> * We could add BMM servers for tegras, too, to make a single reboot API that would
> work for both (automatically reimaging tegras is not currently possible)

I would suggest not getting entangled with tegras until we iron everything out with pandas for Android/B2G, but it seems like a great idea for keeping the pool at maximum.

> * Will foopies "check out" a particular board? What happens if that board is not
> functional? When does the board get rebooted? In a B2G context, when does it get
> reimaged? What happens if the reboot or reimage fails?

clientproxy.py does not currently check out boards. Can we assume a board is not functional if it has been re-imaged once and still cannot pass verify.py? I will defer the other questions to kmoir and Callek.

> * BMM servers are co-located with the hardware they manage; there's no central
> server. If you hit one BMM server with a request for a board it doesn't manage,
> it will redirect (302) to the correct BMM server. BMM servers for each board are
> also listed in inventory. So, if foopies hit BMM servers directly, it will need
> to be a bit more complex than a single curl or python-requests call, but not too
> bad.

This sounds great! I don't see a question on this last point; it reads more as an FYI.
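The policy floated in this comment (verify, re-image once on failure, verify again, then give up) can be sketched as below. This is only an illustration of the proposal under discussion, not how clientproxy.py works: `remediate_board`, the `verify` and `reimage` callables, and the returned status strings are all hypothetical stand-ins for verify.py and a BMM re-image request.

```python
from typing import Callable


def remediate_board(board: str,
                    verify: Callable[[str], bool],
                    reimage: Callable[[str], None]) -> str:
    """Apply the 're-image once, then give up' policy to one board.

    verify(board) stands in for running verify.py against the board.
    reimage(board) stands in for asking BMM to re-image it.
    Returns "ok", "recovered", or "non-functional".
    """
    if verify(board):
        return "ok"
    # First failure: try exactly one automatic re-image.
    reimage(board)
    if verify(board):
        return "recovered"
    # Still failing after a fresh image: treat the board as broken.
    return "non-functional"


if __name__ == "__main__":
    # Fake board that becomes healthy once it has been re-imaged.
    healthy = set()
    print(remediate_board("panda-0001",
                          verify=lambda b: b in healthy,
                          reimage=healthy.add))
```

Whether a single re-image attempt is the right threshold, and what should happen when the re-image request itself fails, are exactly the open questions raised in this thread; the sketch simply picks the simplest answer.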
We may have a fairly large disconnect on these projects, based on prior discussions with folks not on the bug. I'm going to take that discussion to email to sort out, and will post the result here.

fwiw, the basic understanding from the releng point of view is:

- pandas-for-android need none of this. ateam may use some of the back end to implement bug 797868 (s/a bug 797868)

- pandas-for-b2g will need buildbot control of reimaging (bmm?). The API releng would use has yet to be designed - discussions are happening in email with ateam about that API.

ATM, releng isn't aware of any direct interaction between our systems and either the lifeguard or mozpool efforts. If there's a disconnect, we'll find it in the email, and that will be a good thing!
Please be sure to include dustin and dividehex on this mail thread.
https://etherpad.mozilla.org/panda-b2g-imaging

From Hal's action items at the bottom:

* dustin to coordinate with kmoir as existing panda chassis connected to BMM (primarily for dcops usage atm).

* relops/ateam to focus on android issues first (as we're expecting some android image changes over the next bit of time)

* b2g images coming w/in a week (missed date from Clint)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering