84.0b1 codeql-cpp jobs broken during configure
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox-esr78 unaffected, firefox83 unaffected, firefox84- fixed, firefox85 fixed)
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox83 | --- | unaffected |
firefox84 | - | fixed |
firefox85 | --- | fixed |
People
(Reporter: RyanVM, Assigned: tjr)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-release+
|
Details |
We shipped 84.0b1 today and the codeql-cpp hit fatal errors while running configure:
https://firefoxci.taskcluster-artifacts.net/XATXkY3lQeymtCnN21v8JA/1/public/logs/live_backing.log
AFAIK, this isn't a release-blocking issue but I assume we want to get this fixed before 84 goes to release in a month.
Assignee | ||
Comment 1•4 years ago
|
||
I've traced this to Bug 1671424 / https://hg.mozilla.org/mozilla-central/rev/8af2e5440a95
Updated•4 years ago
|
Assignee | ||
Comment 2•4 years ago
|
||
Thsi issue is not related to the fact that we're now calling _run_command_in_objdir; inasmuch as silencing both stdout and stderr in the our process runner does not solve the problem.
Comment 3•4 years ago
|
||
There's this at the end of old-configure:
[T 23:00:10 987] Recognising file, first four bytes: { 7f, 45, 4c, 46 }
[T 23:00:10 987] ELF 64-bit x86-64.
[T 23:00:10 987] PH#1: PT_DYNAMIC or PT_INTERP - so dynamic.
Something is making autoconf output extra garbage.
Assignee | ||
Comment 4•4 years ago
|
||
Oh; and to describe the actual problem, the following output is being prepended to old-configure. The output comes from the codeql tool that wraps the build job (and configure step.) I've been debugging with rm -rf old-configure obj-x86_64-pc-linux-gnu js/src/old-configure codeql-db && ~/packages/codeql/codeql database create --language=cpp --command="./mach configure" codeql-db
[T 17:00:06 22974] Recognising file, first four bytes: { 7f, 45, 4c, 46 }
[T 17:00:06 22974] ELF 64-bit x86-64.
[T 17:00:06 22974] PH#1: PT_DYNAMIC or PT_INTERP - so dynamic.
[T 17:00:06 22975] Recognising file, first four bytes: { 7f, 45, 4c, 46 }
[T 17:00:06 22975] ELF 64-bit x86-64.
[T 17:00:06 22975] PH#1: PT_DYNAMIC or PT_INTERP - so dynamic.
Comment 5•4 years ago
|
||
The important part is that it's at the end. The part at the beginning is mostly ignored, but the last line being a failure makes the entire script fail. (the last line of a script without set -e gets its exit code propagated up)
Comment 6•4 years ago
|
||
Set release status flags based on info from the regressing bug 1671424
Comment hidden (Intermittent Failures Robot) |
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Reporter | ||
Comment 9•4 years ago
|
||
Any updates on this bug? We're a week away from 84 going to RC.
Updated•4 years ago
|
Assignee | ||
Comment 10•4 years ago
|
||
For some reason I thoughts I commented on this.... The CodeQL engineer gave me a guess:
My SWAG is that there's an interaction between autoconf's hard-wired choice of file-descriptor IDs in the configure scripts and the fact that codeql has a file open to do its internal logging to. (But that's without proof).
I haven't tried to work off this to try and fix it (yet.)
Assignee | ||
Comment 11•4 years ago
|
||
That does seem to be the problem. Changing this to 5 fixes old-configure, but there's still errors later on. I'm trying to figure out what an appropriate mechanism would be, it doesn't seem like opening a new file descriptor (e.g. 400) works or using a variable as described here doesn't seem to work...
Comment 12•4 years ago
|
||
exec 4>&1
is the same as dup2(1, 4)
. I don't see a reason for an existing file descriptor to cause problems with that.
Assignee | ||
Comment 13•4 years ago
|
||
In Bug 1671424 we changed the way we execute autoconf and it broke the codeql database
generation job. This job wraps the entire build process in a way similar to
codeql --command="./mach build"
codeql actually clone()'s the subprocesses that execute, meaning that they share file
descriptors with the original codeql process. The original process happens to use
file descriptor 4 as an internal logging destination.
autoconf grabs file descriptor 4 and uses it a temporary redirection point either
to a file or stdout. When it does so, it closes the original file descriptor 4 and
points it at the new location, which also affects the codeql process, resulting in
undesired logging output going into the configure script.
Because this file descriptor trick is only used to avoid duplicating a few lines of
code, I removed the trick and duplicated the code.
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 15•4 years ago
|
||
Comment on attachment 9191094 [details]
Bug 1677797 - Change autoconf to avoid using a hard-coded file descriptor r?glandium
Beta/Release Uplift Approval Request
- User impact if declined: The codeql jobs will fail
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Affects the configure script. If the browser stops building, this would be why, but otherwise if it builds it hasn't broken anything.
- String changes made/needed:
Assignee | ||
Comment 16•4 years ago
|
||
[Tracking Requested - why for this release]: Would like to uplift this so the codeql jobs will work when we release 84
Comment 17•4 years ago
|
||
Reporter | ||
Updated•4 years ago
|
Comment 18•4 years ago
|
||
bugherder |
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 19•4 years ago
|
||
Comment on attachment 9191094 [details]
Bug 1677797 - Change autoconf to avoid using a hard-coded file descriptor r?glandium
Approved for 84.0rc2.
Reporter | ||
Comment 20•4 years ago
|
||
uplift |
Description
•