Closed Bug 1361058 Opened 8 years ago Closed 7 years ago

Generate Windows Error Reporting Metadata for all official builds

Categories

(Release Engineering :: Release Automation: Other, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tjr, Assigned: tjr)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

While we catch most crashes using breakpad, some crashes still bypass it. These _may_ be reported to Microsoft by the built-in Windows Error Reporting tool. We can access these crashes, but to do so we have to process each relevant file (e.g. xul.dll) and upload the metadata to Microsoft. If we don't upload the metadata to MSFT, we won't be able to access crashes for that software that are submitted to Microsoft.

This bug tracks generating the metadata automatically for all official builds (Nightly, Beta, Release; x86 and x64).

----

To do this, we have to install Microsoft's "Product Mapping Tool" (available behind the login at the SysDev portal). This tool is old. It provides two components: a GUI application and a PowerShell library.

To install that software, we need a Windows Live Essentials installation that provides the sign-in library needed by the primary application. This is at https://www.microsoft.com/en-us/download/details.aspx?id=26686 It is out of support, and preliminary testing indicates it will not install on Windows 10.

Once the Product Mapping Tool is installed, we can run a PowerShell script that will scan the generated files and produce a metadata file. That metadata file will be uploaded to Microsoft. The upload of the metadata file is outside the scope of this particular bug.
I spoke with Amy Rich. She indicated we have a few options:
1) The ops team can install the tools on the buildbot and taskcluster build machines
2) I can check the tools into tree

The installers are (probably) GUI-only, which means that #1 will require baking them into the base AMI, which is really unpleasant (but not impossible). While the powershell script is appropriate for in-tree, the software is not. Right now we really only have builds and tests.

Another option is to add a TaskCluster step that occurs after builds and is separate from the build. This seems preferable: we don't have to muck with the security-sensitive build machines, for one. But it's not clear if final signing occurs in TaskCluster... this needs to be explored at length.

Another concern is repacks and locales, and making sure we would get crashes for them.

Finally, we want to be sure we only generate the metadata for official builds (not try). It seems possible we perform _some_ signing on try builds, but this should not be done with a publicly trusted key, so that should give us something to cue off of...
Ok, tjr and I discussed this at length in IRC. A few points we discussed:
* Buildbot builds output signed binaries
* Taskcluster builds do not (signing happens as separate tasks)
* It's unclear if the WER tool even works on unsigned binaries.
* It's unclear if the WER tool's output would be invalidated if run on a binary with a different signature than it was first run on.
  -- For a better example, PDB files stay valid before/after signing; it's unclear if the validity of these files is keyed off their SHA or some other metric that can change during signing.
* We don't currently guarantee that internal binaries stay the same between en-US signed and {locale} signed.
* This work blocks "Control Flow Guard" (CFG) security checks from riding the trains, since without it crashes caused by those checks are not visible to our teams.
* The work involved to enable this will (likely) be different between buildbot and taskcluster.
* Taskcluster L10n currently takes en-US unsigned output as input to its repacks.
* A small gap in coverage of this feature [WER reporting of crashes] (as in, during buildbot->taskcluster flag day + a week or two) may be acceptable, but it is unclear that it would be.

Plan of record for taskcluster (as of this writing):
* Ready for windows users to use taskcluster builds late Q3/early Q4
* Taskcluster flow looks like:
  -> en-US build, produces unsigned binaries (.zip/etc)
  -> signingworker (special access to signing servers) takes the unsigned en-US build, and signs all the internals (firefox.exe, updater.exe, xul.dll etc)
  -> 'repackage' task (takes all the signed artifacts from the last step and produces a new .zip/installer.exe/complete_update.mar) using the signed inner binaries
  -> signingworker takes those binaries and signs them as they are (so the installer itself is signed, and the update is allowed to apply to existing users).
Plan now that we've taken into account this bug and requirements:
* Taskcluster L10n will use en-US's signed output if possible (at least for the signed innards) [leaves little extra work to do on new SHAs for all innards]
* This bug will get implemented for Buildbot's output ~now, unrelated to the Scheduler for Windows Taskcluster.
* We will plan to have the WER reporting support in Taskcluster by the time we switch users to Taskcluster Produced Builds [If this becomes a schedule issue, we can revisit]
* tjr to figure out (with :catlee or others) who will do code and automation work to support these needs on Buildbot.
Small update: The Windows Live Essentials installer is a GUI installer, but has no options. It automatically installs when you run it (including when invoked from the command line). The install completed in 27 seconds on the slave loaner I have. Then it hangs waiting for you to press a 'Close' button, but that can probably be automated with taskkill.

The Product Mapping Tool can be installed from the command line using a command like:
> msiexec.exe /package "C:\Users\cltbld\Downloads\MetadataExchange.msi" /QN /L*V "C:\Users\cltbld\AppData\Local\Temp\msilog.log"
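The two install steps above could be scripted along these lines. This is only a minimal sketch, not the automation that was deployed: the helper names are hypothetical, the msiexec switches are the ones quoted in the comment, and the image name of the lingering Windows Live Essentials window is left as a parameter because it is not given in this bug.

```python
import subprocess


def msiexec_install_command(msi_path, log_path):
    """Build the silent-install command line quoted in the comment above."""
    return ["msiexec.exe", "/package", msi_path, "/QN", "/L*V", log_path]


def taskkill_command(image_name):
    """Build a taskkill command that force-closes a process by image name.

    image_name is an assumption: the bug does not say which process owns
    the 'Close' dialog, so the caller has to supply it.
    """
    return ["taskkill", "/IM", image_name, "/F"]


def install_product_mapping_tool(msi_path, log_path):
    """Run the MSI install; raises CalledProcessError on a non-zero exit.

    Windows only: msiexec.exe must be on PATH.
    """
    subprocess.check_call(msiexec_install_command(msi_path, log_path))
```

Keeping the command construction separate from the subprocess call makes the command lines easy to check on a machine that is not a Windows builder.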
Priority: -- → P1
Blocks: 1362494
Alright, I'm ready to request review of the design of how I'm integrating this into buildbot. I'm not done, but if I need to refactor my approach I'd prefer to do it now rather than after I do a bunch more work.

I'll replace test-create-metadata with create-metadata once I finish the script. It'll have to identify when we have official releases (of nightly/aurora/beta/release) vs normal try runs, which is TBD (any pointers helpful!). But what I have now is a new tooltool artifact consisting of the powershell libraries, and a new build step that copies those artifacts and then calls out to the powershell script.

Try Run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=58e4dc283dcca09779d0cc9a1ec3a53b0be7b5a9

You can look in the raw log for 'generate_metadata' and in https://tools.taskcluster.net/task-inspector/#BBMxBa-7QaumO1JiUiSRbw/0 -> Artifacts to see that the metadata.emx file gets uploaded at the end.
Flags: needinfo?(catlee)
Comment on attachment 8870125 [details] Bug 1361058 Prepare a metadata.emx file for uploading to Microsft for Windows Error Reporting as part of the Windows build process https://reviewboard.mozilla.org/r/141586/#review146446

Do the .emx files end up getting uploaded only to Taskcluster, or also to archive.m.o in 'make upload'? Do we need them on archive.m.o?

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1446 (Diff revision 1)
> '.rpm',
> '.tar.bz2',
> '.tar.gz',
> '.zip',
> '.json',
> + '.emx',

I don't think we need this here. This is used to help identify which artifacts are the firefox package vs. other files. I think the .emx files will still be uploaded due to your other changes below.

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1949 (Diff revision 1)
> +    def generate_metadata(self):
> +        dirs = self.query_abs_dirs()
> +        # The below must match the name given to the file that is pulled from tooltool
> +        powershell_directory = "wer-powershell"
> +        powershell_directory = os.path.join(dirs['abs_src_dir'], powershell_directory)
> +        script_direcotry = os.path.join(dirs['abs_src_dir'], "build", "win32", "wer-powershell")

small typo here

::: testing/mozharness/mozharness/mozilla/building/buildbase.py:1968 (Diff revision 1)
> +        cmds = []
> +        cmds.append("C:\\Windows\\SysWOW64\\WindowsPowerShell\\v1.0\\powershell.exe")
> +        cmds.append("Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope CurrentUser -Force;")
> +        cmds.append(". \"" + powershell_filename + "\";")
> +        cmds.append("&createMetadata '" + os.path.join(dirs['abs_src_dir'], powershell_directory) + "' '" + dirs['base_work_dir'] + "'")
> +

is this one big command, or a series of commands?
Attachment #8870125 - Flags: review-
Comment on attachment 8870125 [details] Bug 1361058 Prepare a metadata.emx file for uploading to Microsft for Windows Error Reporting as part of the Windows build process https://reviewboard.mozilla.org/r/141586/#review146454
Comment on attachment 8870125 [details] Bug 1361058 Prepare a metadata.emx file for uploading to Microsft for Windows Error Reporting as part of the Windows build process https://reviewboard.mozilla.org/r/141586/#review146446

There is (probably) no need for them to be uploaded to archive.m.o. Before this is completed I'm going to need to devise a way to be alerted when they are generated and to get a copy of them.

> is this one big command, or a series of commands?

Kinda both. It calls powershell with a three-component command. The first component relaxes the script loading requirements for the machine. This could be broken out into a separate call, I suppose. The second loads a powershell script and the third calls a function in that script; those two have to be combined so the function can be found in the interpreter.
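To make the three-component command concrete, here is a rough sketch of how it could be assembled as a single PowerShell invocation. It is not the code from the patch: `build_metadata_command` is a hypothetical helper, and passing the joined statements through an explicit `-Command` argument is a choice made for this sketch. The statements themselves mirror the ones quoted in the review.

```python
def build_metadata_command(powershell_exe, script_path, tool_dir, work_dir):
    """Return an argv list that runs all three steps in one interpreter.

    The dot-source step and the function call must share one PowerShell
    session, otherwise createMetadata would not be defined when called.
    Paths are wrapped in single quotes, so this sketch assumes they do
    not contain an apostrophe.
    """
    steps = [
        # 1. Relax the script-loading policy for the current user.
        "Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope CurrentUser -Force",
        # 2. Dot-source the script so its functions land in this session.
        ". '{}'".format(script_path),
        # 3. Call the function defined by that script.
        "& createMetadata '{}' '{}'".format(tool_dir, work_dir),
    ]
    return [powershell_exe, "-Command", "; ".join(steps)]
```

Splitting step 1 into its own call, as suggested above, would only require dropping the first entry from `steps` and running it separately; steps 2 and 3 have to stay together.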
This patch will cause this metadata to be generated for all builds done, is that your intent? Or does this only need to happen for nightly / release builds?
Flags: needinfo?(catlee)
(In reply to Chris AtLee [:catlee] from comment #9)
> This patch will cause this metadata to be generated for all builds done, is
> that your intent?

No

> Or does this only need to happen for nightly / release
> builds?

Once I figure out how to detect the official nightly/aurora/beta/release builds I will make it apply only to those.
I spoke with rail a bunch today about how to move this forward. He noted that we build (and sign) many more 'releases' than we actually ship to users. (But we don't sign anything with a production signing key unless it comes from the corresponding repository, such as -beta and so forth.) Because of the (currently) manual upload process and the antiquity of the upload tool, I am nervous about trying to submit metadata for versions we build and sign but don't actually ship to users.

The most complete way to solve this problem would be to replicate the work done in Bug 1342974 or postrelease_mark_as_shipped.py and run it as a pure post-release job. In the interim though, it should be sufficient to generate the metadata as in the current patch (adding a generate_metadata step to buildbase.py) and use a pulse event to select the specific version to submit to Microsoft. This makes it far easier for me to develop, and also allows us to run and test the script continually and not just on a postrelease basis.

I'm going to use a method similar to https://dxr.mozilla.org/mozilla-central/source/browser/confvars.sh#16 to detect which builds I want to generate the metadata for.
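As a rough illustration of that gating, the check might look something like the sketch below. What the linked confvars.sh line actually tests is not stated in this bug, so both inputs (an update channel string and an "official build" flag) and the function name are assumptions made for the example; only the channel names come from the discussion above.

```python
# Channels named in this bug as the ones we ship to users.
OFFICIAL_CHANNELS = frozenset(["nightly", "aurora", "beta", "release"])


def should_generate_metadata(update_channel, is_official_build):
    """Decide whether a build should get a metadata.emx file.

    Both arguments are hypothetical inputs: update_channel is whatever
    channel string the build configuration exposes (may be None), and
    is_official_build says whether the build is an official one.
    """
    if not is_official_build:
        return False
    channel = (update_channel or "").strip().lower()
    return channel in OFFICIAL_CHANNELS
```

A try build would then skip the step either because it is not official or because its channel is not in the list.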
Blocks: winqual
tjr, so to clarify [for my taskcluster information needs]:
* WER needs to be generated on Signed things
* WER is only needed for the internals of the package (the XUL.dll and such), not the installer (firefox-setup.exe, whatever)
* We only want to generate WER for:
  * Nightlies
  * Beta
  * Release...
  * Basically anything we reasonably expect to be in our users' hands...
* You'll eventually be happy if this is run in releases as a "post-release" step and not before (as in, we don't want to run it for jobs that are merely potentially going out to users, but not yet confirmed to ship)

Can you indicate yes/no on those? I'll likely want some help in getting this going for Taskcluster nightlies, since we'd need separate code (that operates *outside* of the main build), but I'm not there yet.
(In reply to Justin Wood (:Callek) from comment #16)
> tjr, so to clarify [for my taskcluster information needs]:
>
> * WER needs to be generated on Signed things

Yes

> * WER is only needed for the internals of the package (the XUL.dll and such)
> not the installer (firefox-setup.exe whatever)

Yes. I suppose in theory if the installer crashed, having WER reports of it would be useful, but this is not in scope right now.

> * We only want to generate WER for:
> * Nightlies
> * Beta
> * Release...
> * Basically anything we reasonably expect to be in our users hands...

Yes

> * You'll eventually be happy if this is run in releases as a "post-release"
> step and not before (as in, we don't want to run it for jobs that are merely
> potentially going out to users, but not yet confirmed to ship)

Yes. The thing I am building is going to have a semi-manual or manual process to separate the maybes from the definitelys. Moving it to post-release is the better architecture.
At this point I'm ready to request review of the patch. There are a couple of hang-ups before it can be merged, but I don't want to wait indefinitely before requesting feedback.

The Microsoft library is failing to run on 64-bit hosts. I have emailed them and hope this can be fixed. Because of that, I cannot get it to produce a complete metadata.emx file to test the upload.

Finally, the script only generates the metadata file for Authenticode-signed binaries signed with a trusted cert. This should only happen on official channel builds, and thus that part hasn't been tested yet.

So any significant changes will get a re-review, but I'd like to get some eyes on it now as well.
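How the script performs its signature check is not shown in this bug. One way to approximate it, sketched here purely as an assumption, is to ask PowerShell's `Get-AuthenticodeSignature` cmdlet for a file's signature status and treat only `Valid` as trusted. The helper names are hypothetical, and `Valid` only means the signature verified and chained to a certificate trusted on that machine; it does not prove the binary was signed with our production key.

```python
import subprocess

POWERSHELL = "powershell.exe"


def signature_status_command(binary_path):
    """Build a PowerShell invocation that prints the signature Status.

    The path is wrapped in single quotes, so this sketch assumes it
    does not contain an apostrophe.
    """
    query = "(Get-AuthenticodeSignature -FilePath '{}').Status".format(binary_path)
    return [POWERSHELL, "-NoProfile", "-Command", query]


def is_trusted_status(status_text):
    """Treat only a 'Valid' status as a trusted signature."""
    return status_text.strip() == "Valid"


def is_trusted_signed(binary_path):
    """Run the query (Windows only) and interpret its output."""
    output = subprocess.check_output(signature_status_command(binary_path))
    return is_trusted_status(output.decode("utf-8", "replace"))
```

Unsigned or try-signed binaries would report a status other than `Valid` (for example `NotSigned`) and be skipped.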
An update on this. We can't run the powershell metadata tool on 64-bit hosts, so we can't do it as part of our build system. I've emailed Microsoft about this and they said, summarized:

That tool is really old and it relies on components that are EOL. Starting in Fall 2017 the reliability reporting functionality will move to DevCenter. DevCenter will use a different approach to claiming ownership of files and more information on that will be coming soon. I will not recommend pursuing Sysdev partner portal path because Microsoft CSS doesn’t support this. My recommendation is for you to wait for the new DevCenter which is under active development and is fully supported by CSS.

So... I guess we wait.
Closing this out. The new Windows Dashboard we have access to populates data automatically for anything signed with our Authenticode signing key. So we don't need to do anything to generate metadata. Talk to sledru for access to the dashboard!
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(sledru)
Resolution: --- → FIXED
Huge thanks for all this work!
Flags: needinfo?(sledru)