Closed Bug 1396845 Opened 7 years ago Closed 6 years ago

Intermittent sccache: encountered fatal error

Categories

(Firefox Build System :: General, defect, P1)

defect

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: aryx, Assigned: ted)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell infra][stockwell fixed:product])

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=128529143&repo=autoland [task 2017-09-05T14:06:13.005910Z] 14:06:13 INFO - gmake[5]: Entering directory '/builds/worker/workspace/build/src/obj-firefox/dom/workers' [task 2017-09-05T14:06:13.009152Z] 14:06:13 INFO - /builds/worker/workspace/build/src/sccache2/sccache /builds/worker/workspace/build/src/clang/bin/clang++ -B /builds/worker/workspace/build/src/gcc/bin -std=gnu++11 -o Unified_cpp_dom_workers2.o -c -I/builds/worker/workspace/build/src/obj-firefox/dist/stl_wrappers -I/builds/worker/workspace/build/src/obj-firefox/dist/system_wrappers -include /builds/worker/workspace/build/src/config/gcc_hidden.h -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DSTATIC_EXPORTABLE_JS_API -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -I/builds/worker/workspace/build/src/dom/workers -I/builds/worker/workspace/build/src/obj-firefox/dom/workers -I/builds/worker/workspace/build/src/dom/base -I/builds/worker/workspace/build/src/dom/system -I/builds/worker/workspace/build/src/dom/base -I/builds/worker/workspace/build/src/dom/bindings -I/builds/worker/workspace/build/src/xpcom/build -I/builds/worker/workspace/build/src/xpcom/threads -I/builds/worker/workspace/build/src/obj-firefox/ipc/ipdl/_ipdlheaders -I/builds/worker/workspace/build/src/ipc/chromium/src -I/builds/worker/workspace/build/src/ipc/glue -I/builds/worker/workspace/build/src/obj-firefox/dist/include -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nspr -I/builds/worker/workspace/build/src/obj-firefox/dist/include/nss -fPIC -DMOZILLA_CLIENT -include /builds/worker/workspace/build/src/obj-firefox/mozilla-config.h -MD -MP -MF .deps/Unified_cpp_dom_workers2.o.pp -Qunused-arguments -Qunused-arguments -Wall -Wc++11-compat -Wempty-body -Wignored-qualifiers -Woverloaded-virtual -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wunreachable-code-return -Wwrite-strings -Wno-invalid-offsetof -Wclass-varargs -Wloop-analysis -Wc++11-compat-pedantic -Wc++14-compat -Wc++14-compat-pedantic -Wc++1z-compat -Wcomma -Wimplicit-fallthrough -Werror=non-literal-null-conversion -Wstring-conversion -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wformat -Wno-gnu-zero-variadic-macro-arguments -Wformat-security -Wno-unknown-warning-option -Wno-return-type-c-linkage -fsanitize=address -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -D_GLIBCXX_USE_CXX11_ABI=0 -pipe -g -O1 -fno-omit-frame-pointer -Werror -Wno-error=shadow /builds/worker/workspace/build/src/obj-firefox/dom/workers/Unified_cpp_dom_workers2.cpp [task 2017-09-05T14:06:13.009239Z] 14:06:13 INFO - sccache: encountered fatal error [task 2017-09-05T14:06:13.009274Z] 14:06:13 INFO - sccache: error : Invalid checksum [task 2017-09-05T14:06:13.009303Z] 14:06:13 INFO - sccache: cause: Invalid checksum [task 2017-09-05T14:06:13.009339Z] 14:06:13 INFO - /builds/worker/workspace/build/src/config/rules.mk:1064: recipe for target 'Unified_cpp_dom_workers2.o' failed [task 2017-09-05T14:06:13.009365Z] 14:06:13 INFO - gmake[5]: *** [Unified_cpp_dom_workers2.o] Error 254 [task 2017-09-05T14:06:13.009394Z] 14:06:13 INFO - gmake[5]: Leaving directory '/builds/worker/workspace/build/src/obj-firefox/dom/workers' [task 2017-09-05T14:06:13.009425Z] 14:06:13 INFO - /builds/worker/workspace/build/src/config/recurse.mk:73: recipe for target 'dom/workers/target' failed [task 2017-09-05T14:06:13.009449Z] 14:06:13 INFO - gmake[4]: *** [dom/workers/target] Error 2 [task 2017-09-05T14:06:13.009473Z] 14:06:13 INFO - gmake[4]: *** Waiting for unfinished jobs....
Severity: normal → blocker
Component: Build Config → Buildduty
Flags: needinfo?(ted)
Product: Core → Release Engineering
QA Contact: catlee
There must be a bad entry in the cache that only these builds are hitting. sccache does have an env var that can be used to skip cache lookups, `SCCACHE_RECACHE=1`, but we don't have a good way to set that for a single build type. I guess the closest thing we could do would be to stick `mk_add_options export SCCACHE_RECACHE=1` in the mozconfig for this specific build type, then back it out once you get a green build?
Flags: needinfo?(ted)
And now, Linux32 opt.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3) > There must be a bad entry in the cache that only these builds are hitting. > sccache does have an env var that can be used to skip cache lookups, > `SCCACHE_RECACHE=1`, but we don't have a good way to set that for a single > build type. I guess the closest thing we could do would be to stick > `mk_add_options export SCCACHE_RECACHE=1` in the mozconfig for this specific > build type, then back it out once you get a green build? I wonder if cache wouldn't invalidate eventually. Worth a try with :ted's solution too.
Priority: -- → P1
Given Buildduty manifesto[1], this seems like an error that doesn't fall on Buildduty's plate. Moving this over to General Automation for now, until we find a more suited owner. Please move back if I'm wrong. [1]: https://wiki.mozilla.org/ReleaseEngineering/Buildduty_manifesto#Things_Buildduty_isn.E2.80.99t_responsible_for
Component: Buildduty → General Automation
For future reference, here is something that I just did, and should work for future occurrences of the problem: - Select the busted job on treeherder - Click "...", then "Edit and Retrigger" - In the create task textbox, under env, add "SCCACHE_RECACHE: '1'" - Create the task. It will show up in treeherder. - Wait for it to finish. - Once it's green, other failing jobs of the same type can then be retriggered (so if a linux32-opt busted and you triggered a task with SCCACHE_RECACHE for linux32-opt, you can then retrigger the other busted linux32-opts, assuming they were getting the same broken object).
We have 47 failures on this bug in the last 7 days. They occur on Windows2012 x64, 32 and the affected build type is opt. Recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=155713818&lineNumber=38760 and a relevant part of it: 20:48:29 INFO - Running `z:/build/build/src/sccache2/sccache.exe 'z:/build/build/src/rustc/bin/rustc.exe' --crate-name regex_syntax 'Z:\build\build\src\third_party\rust\regex-syntax\src\lib.rs' --crate-type lib --emit=dep-info,link -C opt-level=2 -C metadata=56cac34a1113c987 -C extra-filename=-56cac34a1113c987 --out-dir 'z:/build/build/src/obj-firefox/toolkit/library\i686-pc-windows-msvc\release\deps' --target i686-pc-windows-msvc -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\i686-pc-windows-msvc\release\deps' -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps' --cap-lints allow -C opt-level=2 -C debuginfo=2` 38759 20:48:29 INFO - sccache: encountered fatal error 38760 20:48:29 INFO - sccache: error : Failed to create temp dir 38761 20:48:29 INFO - sccache: cause: Failed to create temp dir 38762 20:48:29 INFO - error: Could not compile `stable_deref_trait`. 38763 20:48:29 INFO - Caused by: 38764 20:48:29 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe z:/build/build/src/rustc/bin/rustc.exe --crate-name stable_deref_trait Z:\build\build\src\third_party\rust\stable_deref_trait\src\lib.rs --crate-type lib --emit=dep-info,link -C opt-level=2 --cfg feature="default" --cfg feature="std" -C metadata=de8ffe798b49d872 -C extra-filename=-de8ffe798b49d872 --out-dir z:/build/build/src/obj-firefox/toolkit/library\i686-pc-windows-msvc\release\deps --target i686-pc-windows-msvc -L dependency=z:/build/build/src/obj-firefox/toolkit/library\i686-pc-windows-msvc\release\deps -L dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps --cap-lints allow -C opt-level=2 -C debuginfo=2` (exit code: 4294967294) 38765 20:48:29 INFO - warning: build failed, waiting for other jobs to finish... 38766 20:48:29 INFO - sccache: encountered fatal error 38767 20:48:29 INFO - sccache: error : Failed to create temp dir 38768 20:48:29 INFO - sccache: cause: Failed to create temp dir 38769 20:48:29 INFO - sccache: encountered fatal error :catlee, can you please take a look at this?
Flags: needinfo?(catlee)
Whiteboard: [stockwell needswork]
Flags: needinfo?(catlee) → needinfo?(ted)
Whiteboard: [stockwell needswork] → [stockwell infra]
This is different from the error message in comment 0. This seems to be failing here: https://github.com/mozilla/sccache/blob/39504428e28dc9f53ae7abdef77f8676d1c0e525/src/compiler/rust.rs#L131 `TempDir::new` isn't doing anything very fancy, the guts of it look like: https://github.com/rust-lang-nursery/tempdir/blob/41377a0fb637de79a2e14426ee489f218ea02c19/src/lib.rs#L190 If sccache is failing to create a temp directory there's probably something else bad happening that's making that happen.
Flags: needinfo?(ted)
I changed the error handling for that case so that it should print the underlying error when this fails: https://github.com/mozilla/sccache/commit/4fe33a6da14b14adc33db539b0cc963e1fa6f8d8 I'll get a patch up to update the version of sccache we use in CI and we can at least hopefully get better diagnostics.
Apparently I never got around to updating sccache to pick up that fix until last week (bug 1445218). Looking at the most recent logs from orangefactor shows that the errors all look like: 01:40:09 INFO - sccache: encountered fatal error 01:40:09 INFO - sccache: error : Failed to create temp dir 01:40:09 INFO - sccache: cause: Failed to create temp dir 01:40:09 INFO - sccache: cause: The system cannot find the path specified. (os error 3) The Rust standard library winds up calling `CreateDirectoryW`: https://msdn.microsoft.com/en-us/library/windows/desktop/aa363855(v=vs.85).aspx So this is that function returning with an error, and the error being `ERROR_PATH_NOT_FOUND`. Given that, this means that the temp directory itself does not exist, so this must be a system configuration issue somehow.
Component: General Automation → Worker
Product: Release Engineering → Taskcluster
QA Contact: catlee → pmoore
There are 33 occurrences in the last 7 days, on windows2012-32 opt and windows2012-64 opt.
Whiteboard: [stockwell infra] → [stockwell infra][stockwell needswork]
Update: There were 33 failures in the last 7 days on windows2012-64 & windows2012-32. Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=189683020&repo=autoland&lineNumber=67666 02:05:41 INFO - Comparing manifests... 02:05:41 INFO - Comparing packages... 02:05:41 INFO - mozmake.EXE[3]: Leaving directory 'z:/build/build/src/obj-firefox/config/tests/src-simple' 02:05:41 INFO - mozmake.EXE[2]: Leaving directory 'z:/build/build/src/obj-firefox/config' 02:05:41 INFO - Compiling itoa v0.4.1 02:05:41 INFO - Running `z:/build/build/src/sccache2/sccache.exe 'z:/build/build/src/rustc/bin/rustc.exe' --crate-name itoa 'third_party\rust\itoa\src\lib.rs' --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 --cfg 'feature="default"' --cfg 'feature="std"' -C metadata=91a63898de3b9caa -C extra-filename=-91a63898de3b9caa --out-dir 'z:/build/build/src/obj-firefox/toolkit/library\x86_64-pc-windows-msvc\release\deps' --target x86_64-pc-windows-msvc -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\x86_64-pc-windows-msvc\release\deps' -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps' --cap-lints warn -C opt-level=2 -C debuginfo=2` 02:05:42 INFO - ......sccache: encountered fatal error 02:05:42 INFO - sccache: error : Failed to create temp dir 02:05:42 INFO - sccache: cause: Failed to create temp dir 02:05:42 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 02:05:42 INFO - error: Could not compile `siphasher`. 02:05:42 INFO - Caused by: 02:05:42 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe z:/build/build/src/rustc/bin/rustc.exe --crate-name siphasher third_party\rust\siphasher\src\lib.rs --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 -C metadata=757e07074f2cab3b -C extra-filename=-757e07074f2cab3b --out-dir z:/build/build/src/obj-firefox/toolkit/library\release\deps -L dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps --cap-lints warn` (exit code: 4294967294) 02:05:42 INFO - warning: build failed, waiting for other jobs to finish... 02:05:42 INFO - .TEST-PASS | autobinscope.py | ../../dist/bin/xul.dll succeeded 02:05:42 INFO - z:/build/build/src/obj-firefox/_virtualenvs/init/Scripts/python.exe z:/build/build/src/build/win32/autobinscope.py ../../dist/bin/nss3.dll ../../dist/crashreporter-symbols/ 02:05:42 INFO - sccache: encountered fatal error 02:05:42 INFO - sccache: error : Failed to create temp dir 02:05:42 INFO - sccache: cause: Failed to create temp dir 02:05:42 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 02:05:43 INFO - error: Could not compile `unicode-xid`. 02:05:43 INFO - Caused by: 02:05:43 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe z:/build/build/src/rustc/bin/rustc.exe --crate-name unicode_xid third_party\rust\unicode-xid\src\lib.rs --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 --cfg feature="default" -C metadata=ec858df19fcf253b -C extra-filename=-ec858df19fcf253b --out-dir z:/build/build/src/obj-firefox/toolkit/library\release\deps -L dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps --cap-lints warn` (exit code: 4294967294) 02:05:43 INFO - warning: build failed, waiting for other jobs to finish...
Update: There have been 41 failures in the last 7 days, all on opt build type (with one exception) Failures on platforms: windows2012-64: 23 windows2012-32: 17 osx-10-10 / debug: 1 Recent log file with the failure: https://treeherder.mozilla.org/logviewer.html#?job_id=191394972&repo=mozilla-inbound&lineNumber=64638
Flags: needinfo?(pmoore)
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #32) > Apparently I never got around to updating sccache to pick up that fix until > last week (bug 1445218). Looking at the most recent logs from orangefactor > shows that the errors all look like: > > 01:40:09 INFO - sccache: encountered fatal error > 01:40:09 INFO - sccache: error : Failed to create temp dir > 01:40:09 INFO - sccache: cause: Failed to create temp dir > 01:40:09 INFO - sccache: cause: The system cannot find the path > specified. (os error 3) > > The Rust standard library winds up calling `CreateDirectoryW`: > https://msdn.microsoft.com/en-us/library/windows/desktop/aa363855(v=vs.85). > aspx > > So this is that function returning with an error, and the error being > `ERROR_PATH_NOT_FOUND`. Given that, this means that the temp directory > itself does not exist, so this must be a system configuration issue somehow. Hi Ted, Do you know what the path is of the directory it is trying to create?
Flags: needinfo?(pmoore) → needinfo?(ted)
There are 27 total failures in the last 7 days on windows2012. Pete/Ted is there any update here?
(In reply to Pete Moore [:pmoore][:pete] from comment #56) > Hi Ted, > > Do you know what the path is of the directory it is trying to create? I don't know exactly because the path doesn't get propagated up with the error unfortunately. The `tempdir` crate calls `env::temp_dir()` to get the temp directory in which to create the new directory, and the Rust docs say this: https://doc.rust-lang.org/std/env/fn.temp_dir.html#windows "Returns the value of, in order, the TMP, TEMP, USERPROFILE environment variable if any are set and not the empty string. Otherwise, temp_dir returns the path of the Windows directory. This behavior is identical to that of GetTempPath, which this function uses internally." The Windows implementation of that function literally just calls `GetTempPathW`: https://github.com/rust-lang/rust/blob/e847f30f571e617990f7193665ab10127100e9df/src/libstd/sys/windows/os.rs#L291
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #61) > (In reply to Pete Moore [:pmoore][:pete] from comment #56) > > Hi Ted, > > > > Do you know what the path is of the directory it is trying to create? > > I don't know exactly because the path doesn't get propagated up with the > error unfortunately. The `tempdir` crate calls `env::temp_dir()` to get the > temp directory in which to create the new directory, and the Rust docs say > this: > https://doc.rust-lang.org/std/env/fn.temp_dir.html#windows > > "Returns the value of, in order, the TMP, TEMP, USERPROFILE environment > variable if any are set and not the empty string. Otherwise, temp_dir > returns the path of the Windows directory. This behavior is identical to > that of GetTempPath, which this function uses internally." > > The Windows implementation of that function literally just calls > `GetTempPathW`: > https://github.com/rust-lang/rust/blob/ > e847f30f571e617990f7193665ab10127100e9df/src/libstd/sys/windows/os.rs#L291 The rust docs look consistent with https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-gettemppathw#remarks - so this probably is accurate. I'm curious whether those env vars intermittently have invalid values or are not set etc, or whether the temp directory genuinely goes missing. If there is any way to get some debug in there to show what directory it is trying to delete, and to perhaps to dump the contents of the parent directory, and show DACLs associated to the directory if it exists, or its closest parent that does exist (using e.g. icacls command) that could possibly shed some light. If the temp directory is getting deleted, I suspect this may be triggered by some process/thread of the task that is executing, rather than by the worker - but anything is possible...
This bug failed 36 in the last 7 days. Occurs on windows2012 platforms on opt build type. Recent log: https://treeherder.mozilla.org/logviewer.html#?job_id=200252674&repo=mozilla-inbound INFO - ......sccache: encountered fatal error 20:38:09 INFO - sccache: error : Failed to create temp dir 20:38:09 INFO - sccache: cause: Failed to create temp dir 20:38:09 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:09 INFO - error: Could not compile `siphasher`. 20:38:09 INFO - Caused by: 20:38:09 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe 'z:/build/build/src/rustc/bin/rustc.exe' --crate-name siphasher 'third_party\rust\siphasher\src\lib.rs' --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 -C metadata=2f1deef41a0ba09e -C extra-filename=-2f1deef41a0ba09e --out-dir 'z:/build/build/src/obj-firefox/toolkit/library\release\deps' -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps' --cap-lints warn` (exit code: 4294967294) 20:38:09 INFO - warning: build failed, waiting for other jobs to finish... 20:38:09 INFO - .sccache: encountered fatal error 20:38:09 INFO - sccache: error : Failed to create temp dir 20:38:09 INFO - sccache: cause: Failed to create temp dir 20:38:09 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:09 INFO - error: Could not compile `libc`. 20:38:09 INFO - Caused by: 20:38:09 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe 'z:/build/build/src/rustc/bin/rustc.exe' --crate-name libc 'third_party\rust\libc\src\lib.rs' --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 --cfg 'feature="default"' --cfg 'feature="use_std"' -C metadata=b63034c45cb453f7 -C extra-filename=-b63034c45cb453f7 --out-dir 'z:/build/build/src/obj-firefox/toolkit/library\release\deps' -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps' --cap-lints warn` (exit code: 4294967294) 20:38:09 INFO - warning: build failed, waiting for other jobs to finish... 20:38:09 INFO - TEST-PASS | autobinscope.py | ../../dist/bin/xul.dll succeeded 20:38:09 INFO - z:/build/build/src/obj-firefox/_virtualenvs/init/Scripts/python.exe z:/build/build/src/build/win32/autobinscope.py ../../dist/bin/nss3.dll ../../dist/crashreporter-symbols/ 20:38:10 INFO - sccache: encountered fatal error 20:38:10 INFO - sccache: error : Failed to create temp dir 20:38:10 INFO - sccache: cause: Failed to create temp dir 20:38:10 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:10 INFO - error: Could not compile `unicode-xid`. 20:38:10 INFO - Caused by: 20:38:10 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe 'z:/build/build/src/rustc/bin/rustc.exe' --crate-name unicode_xid 'third_party\rust\unicode-xid\src\lib.rs' --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 --cfg 'feature="default"' -C metadata=5bd8ab608523f65a -C extra-filename=-5bd8ab608523f65a --out-dir 'z:/build/build/src/obj-firefox/toolkit/library\release\deps' -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps' --cap-lints warn` (exit code: 4294967294) 20:38:10 INFO - warning: build failed, waiting for other jobs to finish... 20:38:10 INFO - sccache: encountered fatal error 20:38:10 INFO - sccache: error : Failed to create temp dir 20:38:10 INFO - sccache: cause: Failed to create temp dir 20:38:10 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:10 INFO - sccache: encountered fatal error 20:38:10 INFO - sccache: error : Failed to create temp dir 20:38:10 INFO - sccache: cause: Failed to create temp dir 20:38:10 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:10 INFO - sccache: encountered fatal error 20:38:10 INFO - sccache: error : Failed to create temp dir 20:38:10 INFO - sccache: cause: Failed to create temp dir 20:38:10 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:10 INFO - sccache: encountered fatal error 20:38:10 INFO - sccache: error : Failed to create temp dir 20:38:10 INFO - sccache: cause: Failed to create temp dir 20:38:10 INFO - sccache: cause: The system cannot find the path specified. (os error 3) 20:38:10 INFO - error: Could not compile `dtoa`. 20:38:10 INFO - Caused by: 20:38:10 INFO - process didn't exit successfully: `z:/build/build/src/sccache2/sccache.exe 'z:/build/build/src/rustc/bin/rustc.exe' --crate-name dtoa 'third_party\rust\dtoa\src\lib.rs' --crate-type lib --emit=dep-info,link -C opt-level=2 -C codegen-units=1 -C metadata=4012af297245995f -C extra-filename=-4012af297245995f --out-dir 'z:/build/build/src/obj-firefox/toolkit/library\x86_64-pc-windows-msvc\release\deps' --target x86_64-pc-windows-msvc -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\x86_64-pc-windows-msvc\release\deps' -L 'dependency=z:/build/build/src/obj-firefox/toolkit/library\release\deps' --cap-lints warn -C opt-level=2 -C debuginfo=2` (exit code: 4294967294) 20:38:10 INFO - warning: build failed, waiting for other jobs to finish... 20:38:10 INFO - error: Could not compile `siphasher`. pmoore: Can you please take a look at this bug?
Flags: needinfo?(pmoore)
Hey ted, is there any way we can inject some code into the failing process that could dump the info described in comment 62? Thanks!
Flags: needinfo?(pmoore) → needinfo?(ted)
Hm, yeah, I think we could fork the tempdir crate and put in extra logging for testing purposes.
I forked that crate and wrote a simple patch to put the path in the error message: https://github.com/luser/tempdir/commit/bc045d5e96c85074da83efdf7ca512a2cfd5826b We'll just need to change the sccache toolchain task to build sccache using this version of the crate.
I looked through some of the builds from comment 66 and I noticed that basically all of those examples are rusttest builds, which is interesting.
Flags: needinfo?(ted)
I managed to reproduce the failure on that try push, but my override of the tempdir crate didn't work properly. :-/
That worked, but I can't say I like the answer it's giving me: https://treeherder.mozilla.org/logviewer.html#?job_id=201758699&repo=try&lineNumber=45655 19:58:44 INFO - sccache: cause: Failed to create temp dir 19:58:44 INFO - sccache: cause: Error creating temp directory "z:\\build\\build\\src\\c\\Users\\task_1537988825\\AppData\\Local\\Temp\\sccache.HQRR1j9faz6I": The system cannot find the path specified. (os error 3) That's a pretty messed up path! I assume something has screwed up the TEMPDIR path by eating the colon or something like that? If it was set to `c/Users/.../Temp` then I could see it producing this path.
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #77) > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=f5a422b240cf8c5b9db8a359a8273d5dbe0f255e I added a dump of the full environment to the error message in this push. Hopefully we can catch the failure and get some more info.
Took quite a few retries this time but I did finally get a failure: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f5a422b240cf8c5b9db8a359a8273d5dbe0f255e&selectedJob=202697224 The log shows that both TMP and TEMP are set incorrectly: 20:35:00 INFO - TEMP=c/Users/task_1538422421/AppData/Local/Temp 20:35:00 INFO - TERM=cygwin 20:35:00 INFO - TINDERBOX_OUTPUT=1 20:35:00 INFO - TMP=c/Users/task_1538422421/AppData/Local/Temp Looking further up in the logs to where mozharness is dumping the environment, I see: 19:48:50 INFO - 'TEMP': 'C:\\Users\\task_1538422421\\AppData\\Local\\Temp', 19:48:50 INFO - 'TINDERBOX_OUTPUT': '1', 19:48:50 INFO - 'TMP': 'C:\\Users\\task_1538422421\\AppData\\Local\\Temp', Those are consistent with the values from successful builds, so I haven't found anything definitive yet.
I caught a failure with that last try push: https://treeherder.mozilla.org/logviewer.html#?job_id=202917973&repo=try&lineNumber=46042 I added extra logging to dump the environment using `env` from make before launching sccache, and from sccache itself when starting. Those both produce sensible results. From env: 17:19:27 INFO - TEMP=/tmp 17:19:27 INFO - TERM=cygwin 17:19:27 INFO - TINDERBOX_OUTPUT=1 17:19:27 INFO - TMP=/tmp From sccache: 17:19:27 INFO - TEMP=C:/Users/task_1538499626/AppData/Local/Temp 17:19:27 INFO - TERM=cygwin 17:19:27 INFO - TINDERBOX_OUTPUT=1 17:19:27 INFO - TMP=C:/Users/task_1538499626/AppData/Local/Temp However, when it failed it produced the same busted output as previously: 18:00:16 INFO - TEMP=c/Users/task_1538499626/AppData/Local/Temp 18:00:16 INFO - TERM=cygwin 18:00:16 INFO - TINDERBOX_OUTPUT=1 18:00:16 INFO - TMP=c/Users/task_1538499626/AppData/Local/Temp
I realized while looking into that log that these builds also exhibit bug 1495798, which I just filed today. It seems pretty plausible that fixing that bug would fix this issue, since it looks like sccache is getting launched with correct environment variables the first time. If we weren't shutting it down and then re-starting it in an ad-hoc manner during `make check` I don't think this would happen.
Component: Worker → General
Depends on: 1495798
Product: Taskcluster → Firefox Build System
QA Contact: pmoore
Blocks: 1470384
I retriggered a bunch of win64-rusttest builds on my try push in bug 1495798 and I hit something that looked like this error: https://treeherder.mozilla.org/logviewer.html#?job_id=203191252&repo=try&lineNumber=48397 21:17:36 INFO - ..error: couldn't create a temp dir: The system cannot find the path specified. (os error 3) 21:17:36 INFO - error: aborting due to previous error 21:17:36 INFO - error: Could not compile `stylo_tests`. ...but when I looked closer I realized the error message was different. That error comes from rustc itself: https://github.com/rust-lang/rust/blob/8a0e5cbf124cea5254bb09111cae6af1b86b21e4/src/librustc_codegen_llvm/back/link.rs#L299 Interestingly, that code is trying to create a temp dir inside the output directory, so if that fails things are definitely broken in a strange way!
No longer blocks: 1470384
Although, the fact that it's breaking in that way is a useful data point. My patch in bug 1495798 ensures that we keep the sccache server started during the build alive for the `make check` step, so most Rust compiles in that step should be handled properly through the existing sccache server that has a working environment. Looking at sccache.log verifies that the server did get the request for the compile that failed: DEBUG:sccache::server: parse_arguments: CannotCache(crate-type): ["--crate-name", "build_script_build", "servo\\ports\\geckolib\\tests\\build.rs", "--crate-type", "bin", "--emit=dep-info,link", "-C", "opt-level=2", "-C", "codegen-units=1", "-C", "metadata=43baea8111549fcb", "-C", "extra-filename=-43baea8111549fcb", "--out-dir", "z:/build/build/src/obj-firefox/toolkit/library\\release\\build\\stylo_tests-43baea8111549fcb", "-L", "dependency=z:/build/build/src/obj-firefox/toolkit/library\\release\\deps", "--extern", "regex=z:/build/build/src/obj-firefox/toolkit/library\\release\\deps\\libregex-9bf31ce226af7ac4.rlib"] ...but it refused to cache it because it's compiling a build script binary, which means rustc was then invoked from the sccache *client* process, where it failed!
I mentioned this on IRC, but looking at the log in comment 83 again I noticed that while not all env vars are messed up, some of them definitely are, and PATH is one of those: 18:00:16 INFO - PATH=z:/build/build/src/obj-firefox/toolkit/library\release\deps;Z:\task_1538499626\build\src\rustc\bin;z/build/build/src/clang/bin:z/build/build/src/clang/bin:z/build/build/src/vs2017_15.8.4/VC/bin/Hostx64/x64:z/build/build/src/vs2017_15.8.4/SDK/bin/10.0.17134.0/x64:z/build/build/src/vs2017_15.8.4/VC/redist/x64/Microsoft.VC141.CRT:z/build/build/src/vs2017_15.8.4/SDK/Redist/ucrt/DLLs/x64:z/build/build/src/vs2017_15.8.4/DIA SDK/bin/amd64:c/Program Files/Mercurial:c/mozilla-build/bin:c/mozilla-build/kdiff3:c/mozilla-build/moztools-x64/bin:c/mozilla-build/mozmake:/bin:/local/bin:c/mozilla-build/nsis-3.01:c/mozilla-build/python:c/mozilla-build/python/Scripts:c/Windows/system32:c/Windows:c/Windows/System32/Wbem:c/Windows/System32/WindowsPowerShell/v1.0/:c/Program Files/Amazon/cfn-bootstrap/:c/Program Files (x86)/GNU/GnuPG/pub:c/Program Files (x86)/Windows Kits/10/Windows Performance Toolkit/:c/mozilla-build/python/lib/site-packages/pywin32_system32:c/mozilla-build/python/lib/site-packages/pywin32_system32 That could definitely explain the symptoms in bug 1470384!
Given the way that cargo prepares the PATH env var for rustc subprocesses it seems like the environment is getting screwed up between make and cargo: https://github.com/rust-lang/cargo/blob/b1e1d388ea09ecc27515345b180dd02263b087df/src/cargo/core/compiler/compilation.rs#L202-L205 From comment 87 you can see that there are two paths at the start of PATH that look correct, and those are what cargo adds. The rest are mangled, and presumably came from cargo's environment.
I ran 101 build-win64-rusttests/opt tasks on that last try push and of those 96 passed, 4 failed in known xpcshell selftest intermittents, and 1 failed due to an infra error downloading a toolchain. I didn't see a single instance of this or bug 1470384, which seems pretty good!
Assignee: nobody → ted
Blocks: 1470384
No longer depends on: 1495798
Historically when invoking mach from mozharness on windows we have started it by way of bash. However, we haven't been doing that for `mach build check`. This hadn't caused noticeable issues in the past, but now that we are running rusttest builds, which wind up compiling extra Rust code in that step, intermittent failures are showing up because of this. The root cause is likely a bug in msys' translation of environment variables, but msys is unsupported and we're already using this workaround for the rest of the build.
Blocks: 1463695
Pushed by tmielczarek@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/af465aa90d73 invoke mach via bash on windows when running `mach build check`. r=froydnj
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
The graph from the previous comment shows this failure disappearing as of 2018-10-06, so it looks like this patch did fix the issue!
Whiteboard: [stockwell infra][stockwell needswork] → [stockwell infra][stockwell fixed:product]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: