remove broken tooltool precache from windows worker types
Categories
(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)
Tracking
(Not tracked)
People
(Reporter: grenade, Assigned: grenade)
whilst testing some changes for bug 1524592 in taskcluster windows build worker types, i accidentally discovered that the sequence of paths within the system PATH environment variable may cause the firefox build to break. whilst this is an understandable consequence of system config, the specific ordering required for a successful build is not likely to be the default configuration of a normal windows system, and as such the build breaks in a way that is not easy to debug or diagnose.
specifically, in order for a build to succeed, the paths to components included in mozilla-build must precede paths already included in a default windows install.
if the system PATH environment variable looks like this, the build will succeed (succeeding build):
C:\Program Files\Mercurial;C:\mozilla-build\bin;C:\mozilla-build\kdiff3;C:\mozilla-build\moztools-x64\bin;C:\mozilla-build\mozmake;C:\mozilla-build\msys\bin;C:\mozilla-build\msys\local\bin;C:\mozilla-build\nsis-3.01;C:\mozilla-build\python;C:\mozilla-build\python\Scripts;C:\mozilla-build\python3;
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;
C:\Program Files\Amazon\cfn-bootstrap\;C:\Program Files (x86)\GNU\GnuPG\pub;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;
c:\mozilla-build\python\lib\site-packages\pywin32_system32
if the system PATH environment variable looks like this, the build will fail (failing build):
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;
C:\Program Files\Amazon\cfn-bootstrap\;C:\Program Files (x86)\GNU\GnuPG\pub;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;
C:\Program Files\Mercurial;C:\mozilla-build\bin;C:\mozilla-build\kdiff3;C:\mozilla-build\moztools-x64\bin;C:\mozilla-build\mozmake;C:\mozilla-build\msys\bin;C:\mozilla-build\msys\local\bin;C:\mozilla-build\nsis-3.01;C:\mozilla-build\python;C:\mozilla-build\python\Scripts;C:\mozilla-build\python3;
c:\mozilla-build\python\lib\site-packages\pywin32_system32
the linked builds differ only in that they have a different sequence in their respective system PATH environment variables. the tests were separated by a deployment to infra that changed only the PATH environment variable sequence. note that i have split the examples above into four lines for ease of spotting the difference in sequencing (the actual PATH env var doesn't contain line breaks).
- the line beginning with C:\Windows\system32 is part of the default Windows configuration
- the line beginning with C:\Program Files\Amazon\cfn-bootstrap\ is added by software installations that occurred after Windows setup
- the line containing only c:\mozilla-build\python\lib\site-packages\pywin32_system32 is added by the build system
- the line beginning with C:\Program Files\Mercurial is added by OCC after the installation of mozilla-build (and hg). This is the line that must precede the rest, in order for builds to be successful.
i haven't yet determined what specifically breaks the build. there must be an executable or assembly file in mozilla-build as well as on the system that is referenced by the build system without its full path. or perhaps it's more obscure and there is a dll that something referenced by the build system depends on. in any case i'd like to find whatever it is and either fix the referencing (in the build system) so that it is explicitly referenced by its specific path, or at the very least document here which specific path from the mozilla-build install contains whatever is referenced by the build system in such a way that it must precede the system components of PATH.
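one way to hunt for such a file is to list every executable or dll name that appears in more than one PATH directory, since those are the only names whose resolution can change with ordering. the following is a minimal sketch of that idea, not something taken from OCC or the build system; the function name find_shadowed_files and the extension list are assumptions made for illustration.

```python
import os
from collections import defaultdict


def find_shadowed_files(path_value, extensions=(".exe", ".dll", ".bat", ".cmd")):
    """Return {file name: [directories]} for names present in 2+ PATH entries.

    Directories are listed in PATH order, so the first one is the copy that
    a bare (path-less) reference would normally resolve to.
    find_shadowed_files and the default extensions are hypothetical choices.
    """
    found = defaultdict(list)
    for directory in path_value.split(os.pathsep):
        # Skip empty entries and directories that do not exist on this host.
        if not directory or not os.path.isdir(directory):
            continue
        for name in os.listdir(directory):
            key = name.lower()  # Windows file names are case-insensitive
            if key.endswith(extensions) and directory not in found[key]:
                found[key].append(directory)
    # Only names found in more than one directory are ordering-sensitive.
    return {name: dirs for name, dirs in found.items() if len(dirs) > 1}
```

calling find_shadowed_files(os.environ["PATH"]) on a worker with each of the two PATH orderings and comparing the first directory reported for each name should narrow down the candidates.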
Comment 1•6 years ago
It looks like the build itself is successful; it's some of the Python tests that run after the build that are failing. Specifically those logs show:
13:51:14 WARNING - ..\python\mozbuild\mozbuild\test\backend\test_build.py::TestBuild::test_faster_make TEST-UNEXPECTED-FAIL
13:51:14 WARNING - ..\python\mozbuild\mozbuild\test\backend\test_build.py::TestBuild::test_faster_recursive_make TEST-UNEXPECTED-FAIL
13:51:14 WARNING - ..\python\mozbuild\mozbuild\test\backend\test_build.py::TestBuild::test_recursive_make TEST-UNEXPECTED-FAIL
Interestingly, in the pytest output I noticed:
13:51:14 INFO - args = ['c:\Windows\system32\mozmake.EXE', '-C', 'z:/build/build/src/tmptsabls/faster', '-j16', 'BUILD_VERBOSE_LOG=1', '-w', ...]
Is there normally a mozmake.exe in system32?
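A quick way to confirm which copy of mozmake a given PATH ordering selects is to ask Python's shutil.which to search an explicit list of directories. This is only a sketch: the helper name first_match is an assumption for illustration, and it may not mirror exactly how the mozbuild tests locate mozmake.

```python
import os
import shutil


def first_match(command, directories):
    """Return the first file matching `command` in `directories`, or None.

    `directories` is searched in the order given, the same way a PATH
    value would be. first_match is a hypothetical helper name.
    """
    return shutil.which(command, path=os.pathsep.join(directories))
```

For example, first_match("mozmake.exe", [r"C:\Windows\system32", r"C:\mozilla-build\mozmake"]) would return the system32 copy if one exists there, and swapping the two directories would return the mozilla-build copy instead.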
Assignee | ||
Comment 2•6 years ago
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #1)
Interestingly, in the pytest output I noticed:
13:51:14 INFO - args = ['c:\Windows\system32\mozmake.EXE', '-C', 'z:/build/build/src/tmptsabls/faster', '-j16', 'BUILD_VERBOSE_LOG=1', '-w', ...]
Is there normally a mozmake.exe in system32?
thanks Ted! nice catch. i think i know the culprit now and it's not what i thought and nothing to do with the path sequencing.
i did notice errors in some worker deployment logs regarding a failure to extract mozmake from its tooltool artifact. i think the problem is caused by an unset TOOLTOOL_HOME variable during the worker tooltool precache (we've been meaning to get rid of the tooltool precache for a while, looks like a good time to do it). i think the precache routine is dumping mozmake in system32 because of the unset var, then the test is later finding it there.
i am grateful for your assistance.
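to illustrate the suspected failure mode (this is a hypothetical sketch in python, not the actual OCC precache code): joining an empty directory with a file name yields a bare relative path, which is then resolved against the process working directory. if that directory happened to be C:\Windows\system32, which is an assumption here, the file would land there. a guard like the one below refuses to continue when the variable is unset; resolve_cache_dir is an invented name.

```python
import os


def resolve_cache_dir(env, var_name="TOOLTOOL_HOME"):
    """Return the cache directory named by `var_name`, or raise if unusable.

    resolve_cache_dir is a hypothetical helper; `env` is any mapping such
    as os.environ.
    """
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(var_name + " is not set; refusing to guess a cache directory")
    if not os.path.isabs(value):
        raise RuntimeError(var_name + " must be an absolute path, got: " + value)
    return os.path.normpath(value)


# Without a guard, an unset variable silently degrades to a relative path:
# the result is just "mozmake.exe", resolved against the working directory.
unguarded_target = os.path.join(os.environ.get("TOOLTOOL_HOME_UNSET_EXAMPLE", ""), "mozmake.exe")
```

failing loudly at this point would have surfaced the unset variable in the deployment log rather than in a later test run.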
Assignee | ||
Comment 3•6 years ago
after removing precaching from the occ manifests and going back to appending (rather than prepending) the mozilla-build paths, the builds broke again.
since removing the precaching we don't see the system32 mozmake any more, but there is another TEST-UNEXPECTED-FAIL that i don't really understand.
Comment 4•6 years ago
That is not a very useful diff output, no:
12:37:24 INFO - F--- z:/build/build/src/config/tests/src-simple/../test.manifest.jar 2019-02-15 11:48:57 +0000
12:37:24 INFO - +++ - 2019-02-15 12:37:24 +0000
12:37:24 INFO - @@ -1,4 +1,4 @@
12:37:24 INFO - -content test jar:test.jar!/one
12:37:24 INFO - -locale ab-X-stuff jar:test.jar!/three
12:37:24 INFO - -overlay chrome://one/file.xml chrome://two/otherfile.xml
12:37:24 INFO - -skin test classic jar:test.jar!/one
12:37:24 INFO - +content test jar:test.jar!/one
12:37:24 INFO - +locale ab-X-stuff jar:test.jar!/three
12:37:24 INFO - +overlay chrome://one/file.xml chrome://two/otherfile.xml
12:37:24 INFO - +skin test classic jar:test.jar!/one
my best guess would be line endings?
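If line endings are the cause, a diff that prints the repr of each line would make the difference visible, and counting the terminators in the raw bytes would confirm it. The sketch below assumes nothing about the test harness; count_line_endings and repr_diff are hypothetical helper names.

```python
import difflib


def count_line_endings(data):
    """Count CRLF, bare LF and bare CR terminators in a bytes object."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,
        "cr": data.count(b"\r") - crlf,
    }


def repr_diff(expected, actual):
    """Unified diff over repr() of each line, so \\r and \\n show up as text."""
    exp_lines = [repr(line) for line in expected.splitlines(keepends=True)]
    act_lines = [repr(line) for line in actual.splitlines(keepends=True)]
    return list(difflib.unified_diff(exp_lines, act_lines, "expected", "actual", lineterm=""))
```

The file needs to be read in binary mode (or in text mode with newline="") for this to work; otherwise Python normalizes the line endings on read and the difference disappears.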