Closed Bug 1527970 Opened 6 years ago Closed 6 years ago

remove broken tooltool precache from windows worker types

Categories

(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)

Desktop
Windows
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: grenade, Assigned: grenade)

References

Details

whilst testing some changes for bug 1524592 in taskcluster windows build worker types, i accidentally discovered that the sequence of paths within the system PATH environment variable may cause the firefox build to break. whilst this is an understandable consequence of system config, the specific ordering required for a successful build is not likely to be the default configuration of a normal windows system, and as such the build breaks in a way that is not easy to debug or diagnose.

specifically, in order for a build to succeed, the paths to components included in mozilla-build must precede paths already included in a default windows install.

if the system PATH environment variable looks like this, the build will succeed (succeeding build):

C:\Program Files\Mercurial;C:\mozilla-build\bin;C:\mozilla-build\kdiff3;C:\mozilla-build\moztools-x64\bin;C:\mozilla-build\mozmake;C:\mozilla-build\msys\bin;C:\mozilla-build\msys\local\bin;C:\mozilla-build\nsis-3.01;C:\mozilla-build\python;C:\mozilla-build\python\Scripts;C:\mozilla-build\python3;
C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;
C:\Program Files\Amazon\cfn-bootstrap\;C:\Program Files (x86)\GNU\GnuPG\pub;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;
c:\mozilla-build\python\lib\site-packages\pywin32_system32

if the system PATH environment variable looks like this, the build will fail (failing build):

C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;
C:\Program Files\Amazon\cfn-bootstrap\;C:\Program Files (x86)\GNU\GnuPG\pub;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;
C:\Program Files\Mercurial;C:\mozilla-build\bin;C:\mozilla-build\kdiff3;C:\mozilla-build\moztools-x64\bin;C:\mozilla-build\mozmake;C:\mozilla-build\msys\bin;C:\mozilla-build\msys\local\bin;C:\mozilla-build\nsis-3.01;C:\mozilla-build\python;C:\mozilla-build\python\Scripts;C:\mozilla-build\python3;
c:\mozilla-build\python\lib\site-packages\pywin32_system32

the linked builds differ only in that they have a different sequence in their respective system PATH environment variables. the tests were separated by a deployment to infra that changed only the PATH environment variable sequence. note that i have split the examples above into four lines for ease of spotting the difference in sequencing (the actual PATH env var doesn't contain line breaks).

  • the line beginning with C:\Windows\system32 is part of the default Windows configuration
  • the line beginning with C:\Program Files\Amazon\cfn-bootstrap\ is added by software installations that occurred after Windows setup
  • the line containing only c:\mozilla-build\python\lib\site-packages\pywin32_system32 is added by the build system
  • the line beginning with C:\Program Files\Mercurial is added by OCC after the installation of mozilla-build (and hg). This is the line that must precede the rest, in order for builds to be successful.
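
as a rough illustration of the ordering difference described above, the following python sketch checks whether the first mozilla-build entry in a PATH value comes before the first C:\Windows entry. the PATH strings are the ones from this report; the function names are hypothetical and not part of OCC or the build system.

```python
# PATH fragments copied from the report, joined back into single strings.
MOZILLA_BUILD = (
    r"C:\Program Files\Mercurial;C:\mozilla-build\bin;C:\mozilla-build\kdiff3;"
    r"C:\mozilla-build\moztools-x64\bin;C:\mozilla-build\mozmake;"
    r"C:\mozilla-build\msys\bin;C:\mozilla-build\msys\local\bin;"
    r"C:\mozilla-build\nsis-3.01;C:\mozilla-build\python;"
    r"C:\mozilla-build\python\Scripts;C:\mozilla-build\python3;"
)
SYSTEM = (
    r"C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;"
    r"C:\Windows\System32\WindowsPowerShell\v1.0\;"
    r"C:\Program Files\Amazon\cfn-bootstrap\;"
    r"C:\Program Files (x86)\GNU\GnuPG\pub;"
    r"C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;"
)
PYWIN32 = r"c:\mozilla-build\python\lib\site-packages\pywin32_system32"

SUCCEEDING_PATH = MOZILLA_BUILD + SYSTEM + PYWIN32
FAILING_PATH = SYSTEM + MOZILLA_BUILD + PYWIN32


def first_index(entries, prefix):
    """Return the position of the first entry starting with prefix, or None."""
    for index, entry in enumerate(entries):
        if entry.startswith(prefix):
            return index
    return None


def mozilla_build_first(path_value):
    """True if a mozilla-build entry appears before any C:\\Windows entry."""
    # Windows paths are case-insensitive, so compare in lower case.
    entries = [e.strip().lower() for e in path_value.split(";") if e.strip()]
    moz = first_index(entries, "c:\\mozilla-build\\")
    win = first_index(entries, "c:\\windows")
    return moz is not None and (win is None or moz < win)
```

`mozilla_build_first(SUCCEEDING_PATH)` is true and `mozilla_build_first(FAILING_PATH)` is false. this only describes the ordering; it says nothing about which file on those paths is responsible for the failure.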

i haven't yet determined what specifically breaks the build. there must be an executable or assembly file in mozilla-build as well as on the system that is referenced by the build system without its full path. or perhaps it's more obscure and there is a dll that something referenced by the build system depends on. in any case i'd like to find whatever it is and either fix the referencing (in the build system) so that it is explicitly referenced by its specific path, or at the very least document here which specific path from the mozilla-build install contains whatever is referenced by the build system in such a way that it must precede the system components of PATH.
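
one way to hunt for such a file is to list every file name that exists in more than one PATH directory. the following is a minimal python sketch of that idea, not something used by OCC or the build system; the function name and the extension list are assumptions chosen for illustration.

```python
import os
from collections import defaultdict


def find_path_collisions(path_value, extensions=(".exe", ".dll", ".bat", ".cmd")):
    """Map each file name present in more than one PATH directory to the
    directories that contain it, in PATH order (the first one wins a lookup)."""
    seen = defaultdict(list)
    for directory in path_value.split(os.pathsep):
        directory = directory.strip()
        if not directory or not os.path.isdir(directory):
            continue
        try:
            names = os.listdir(directory)
        except OSError:
            # Skip directories that cannot be read (permissions, races).
            continue
        for name in names:
            key = name.lower()  # Windows file names are case-insensitive
            if key.endswith(extensions) and directory not in seen[key]:
                seen[key].append(directory)
    return {name: dirs for name, dirs in seen.items() if len(dirs) > 1}
```

calling `find_path_collisions(os.environ["PATH"])` on a worker would return a dict of candidate names; any name listed under both a system directory and a mozilla-build directory is a suspect.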

It looks like the build itself is successful; it's some of the Python tests that run after the build that are failing. Specifically those logs show:

13:51:14  WARNING - ..\python\mozbuild\mozbuild\test\backend\test_build.py::TestBuild::test_faster_make TEST-UNEXPECTED-FAIL
13:51:14  WARNING - ..\python\mozbuild\mozbuild\test\backend\test_build.py::TestBuild::test_faster_recursive_make TEST-UNEXPECTED-FAIL
13:51:14  WARNING - ..\python\mozbuild\mozbuild\test\backend\test_build.py::TestBuild::test_recursive_make TEST-UNEXPECTED-FAIL

Interestingly, in the pytest output I noticed:

13:51:14     INFO - args = ['c:\Windows\system32\mozmake.EXE', '-C', 'z:/build/build/src/tmptsabls/faster', '-j16', 'BUILD_VERBOSE_LOG=1', '-w', ...]

Is there normally a mozmake.exe in system32?

(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #1)


> Interestingly, in the pytest output I noticed:
>
> 13:51:14     INFO - args = ['c:\Windows\system32\mozmake.EXE', '-C', 'z:/build/build/src/tmptsabls/faster', '-j16', 'BUILD_VERBOSE_LOG=1', '-w', ...]
>
> Is there normally a mozmake.exe in system32?

thanks Ted! nice catch. i think i know the culprit now and it's not what i thought and nothing to do with the path sequencing.

i did notice errors in some worker deployment logs regarding a failure to extract mozmake from its tooltool artifact. i think the problem is caused by an unset TOOLTOOL_HOME variable during the worker tooltool precache (we've been meaning to get rid of the tooltool precache for a while, looks like a good time to do it). i think the precache routine is dumping mozmake in system32 because of the unset var, then the test is later finding it there.
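
to illustrate the suspected mechanism, here is a python sketch of how an empty base directory can turn an extraction target into a relative path that lands in the process's working directory. this is not the precache code. the function name, the example cache directory and the assumption that the precache ran with C:\Windows\system32 as its working directory are all hypothetical.

```python
import ntpath  # Windows path semantics, importable on any platform


def extraction_target(env, filename, cwd):
    """Return where `filename` would be written if the base directory is
    taken from TOOLTOOL_HOME and relative paths resolve against `cwd`."""
    # Assumption: an unset variable is treated as an empty string.
    base = env.get("TOOLTOOL_HOME", "")
    target = ntpath.join(base, filename)
    if not ntpath.isabs(target):
        # A relative target resolves against the current working directory.
        target = ntpath.join(cwd, target)
    return ntpath.normpath(target)


# Hypothetical values for illustration only.
unset = extraction_target({}, "mozmake.exe", r"C:\Windows\system32")
configured = extraction_target(
    {"TOOLTOOL_HOME": r"C:\builds\tooltool_cache"},
    "mozmake.exe",
    r"C:\Windows\system32",
)
```

under those assumptions `unset` resolves inside system32 while `configured` stays inside the cache directory, which would match the stray mozmake.EXE in the pytest output.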

i am grateful for your assistance.

Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Component: General → RelOps: OpenCloudConfig
Product: Firefox Build System → Infrastructure & Operations
QA Contact: rthijssen
Summary: Firefox build depends on mozilla-build paths preceding system paths in system PATH env var → remove broken tooltool precache from windows worke types
Summary: remove broken tooltool precache from windows worke types → remove broken tooltool precache from windows worker types

after removing precaching from the occ manifests and going back to appending (rather than prepending) the mozilla-build paths, the builds broke again.

since removing the precaching we don't see the system32 mozmake any more, but there is another TEST-UNEXPECTED-FAIL that i don't really understand.

That is not a very useful diff output, no:

12:37:24     INFO - F--- z:/build/build/src/config/tests/src-simple/../test.manifest.jar	2019-02-15 11:48:57 +0000
12:37:24     INFO - +++ -	2019-02-15 12:37:24 +0000
12:37:24     INFO - @@ -1,4 +1,4 @@
12:37:24     INFO - -content test jar:test.jar!/one
12:37:24     INFO - -locale ab-X-stuff jar:test.jar!/three
12:37:24     INFO - -overlay chrome://one/file.xml chrome://two/otherfile.xml
12:37:24     INFO - -skin test classic jar:test.jar!/one
12:37:24     INFO - +content test jar:test.jar!/one
12:37:24     INFO - +locale ab-X-stuff jar:test.jar!/three
12:37:24     INFO - +overlay chrome://one/file.xml chrome://two/otherfile.xml
12:37:24     INFO - +skin test classic jar:test.jar!/one

my best guess would be line endings?
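
a quick way to test the line endings guess is to compare the two texts with and without their line terminators, and to diff the repr of each line so that \r\n and \n become visible. the following python sketch uses only the standard library; the function names are hypothetical and this is not how the failing test produces its diff.

```python
import difflib


def explain_diff(expected, actual):
    """Classify the difference between two texts."""
    exp_lines = expected.splitlines(keepends=True)
    act_lines = actual.splitlines(keepends=True)
    if exp_lines == act_lines:
        return "identical"
    # Strip only CR/LF so that other trailing whitespace still counts.
    if [l.rstrip("\r\n") for l in exp_lines] == [l.rstrip("\r\n") for l in act_lines]:
        return "line endings only"
    return "content differs"


def visible_diff(expected, actual):
    """Unified diff over repr() of each line, so terminators show up."""
    exp = [repr(l) for l in expected.splitlines(keepends=True)]
    act = [repr(l) for l in actual.splitlines(keepends=True)]
    return list(difflib.unified_diff(exp, act, "expected", "actual", lineterm=""))
```

if `explain_diff` reports "line endings only" for the manifest on disk versus the generated output, the diff above is explained; `visible_diff` then shows which side carries the \r\n.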

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED