Closed Bug 1677797 Opened 4 years ago Closed 4 years ago

84.0b1 codeql-cpp jobs broken during configure

Categories

(Firefox Build System :: General, defect)

defect

Tracking

(firefox-esr78 unaffected, firefox83 unaffected, firefox84- fixed, firefox85 fixed)

RESOLVED FIXED
85 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- unaffected
firefox84 - fixed
firefox85 --- fixed

People

(Reporter: RyanVM, Assigned: tjr)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

We shipped 84.0b1 today and the codeql-cpp hit fatal errors while running configure:
https://firefoxci.taskcluster-artifacts.net/XATXkY3lQeymtCnN21v8JA/1/public/logs/live_backing.log

AFAIK, this isn't a release-blocking issue but I assume we want to get this fixed before 84 goes to release in a month.

Flags: needinfo?(tom)
Has Regression Range: --- → yes

Thsi issue is not related to the fact that we're now calling _run_command_in_objdir; inasmuch as silencing both stdout and stderr in the our process runner does not solve the problem.

Flags: needinfo?(tom)

There's this at the end of old-configure:

[T 23:00:10 987] Recognising file, first four bytes: { 7f, 45, 4c, 46 }
[T 23:00:10 987]   ELF 64-bit x86-64.
[T 23:00:10 987]   PH#1: PT_DYNAMIC or PT_INTERP - so dynamic.

Something is making autoconf output extra garbage.

Oh; and to describe the actual problem, the following output is being prepended to old-configure. The output comes from the codeql tool that wraps the build job (and configure step.) I've been debugging with rm -rf old-configure obj-x86_64-pc-linux-gnu js/src/old-configure codeql-db && ~/packages/codeql/codeql database create --language=cpp --command="./mach configure" codeql-db

[T 17:00:06 22974] Recognising file, first four bytes: { 7f, 45, 4c, 46 }
[T 17:00:06 22974]   ELF 64-bit x86-64.
[T 17:00:06 22974]   PH#1: PT_DYNAMIC or PT_INTERP - so dynamic.
[T 17:00:06 22975] Recognising file, first four bytes: { 7f, 45, 4c, 46 }
[T 17:00:06 22975]   ELF 64-bit x86-64.
[T 17:00:06 22975]   PH#1: PT_DYNAMIC or PT_INTERP - so dynamic.

The important part is that it's at the end. The part at the beginning is mostly ignored, but the last line being a failure makes the entire script fail. (the last line of a script without set -e gets its exit code propagated up)

Set release status flags based on info from the regressing bug 1671424

Any updates on this bug? We're a week away from 84 going to RC.

Flags: needinfo?(tom)

For some reason I thoughts I commented on this.... The CodeQL engineer gave me a guess:

My SWAG is that there's an interaction between autoconf's hard-wired choice of file-descriptor IDs in the configure scripts and the fact that codeql has a file open to do its internal logging to. (But that's without proof).

I haven't tried to work off this to try and fix it (yet.)

That does seem to be the problem. Changing this to 5 fixes old-configure, but there's still errors later on. I'm trying to figure out what an appropriate mechanism would be, it doesn't seem like opening a new file descriptor (e.g. 400) works or using a variable as described here doesn't seem to work...

exec 4>&1 is the same as dup2(1, 4). I don't see a reason for an existing file descriptor to cause problems with that.

In Bug 1671424 we changed the way we execute autoconf and it broke the codeql database
generation job. This job wraps the entire build process in a way similar to
codeql --command="./mach build"

codeql actually clone()'s the subprocesses that execute, meaning that they share file
descriptors with the original codeql process. The original process happens to use
file descriptor 4 as an internal logging destination.

autoconf grabs file descriptor 4 and uses it a temporary redirection point either
to a file or stdout. When it does so, it closes the original file descriptor 4 and
points it at the new location, which also affects the codeql process, resulting in
undesired logging output going into the configure script.

Because this file descriptor trick is only used to avoid duplicating a few lines of
code, I removed the trick and duplicated the code.

Assignee: nobody → tom
Status: NEW → ASSIGNED

Comment on attachment 9191094 [details]
Bug 1677797 - Change autoconf to avoid using a hard-coded file descriptor r?glandium

Beta/Release Uplift Approval Request

  • User impact if declined: The codeql jobs will fail
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Affects the configure script. If the browser stops building, this would be why, but otherwise if it builds it hasn't broken anything.
  • String changes made/needed:
Flags: needinfo?(tom)
Attachment #9191094 - Flags: approval-mozilla-beta?

[Tracking Requested - why for this release]: Would like to uplift this so the codeql jobs will work when we release 84

Pushed by tritter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7c8dd800fa54 Change autoconf to avoid using a hard-coded file descriptor r=glandium
Attachment #9191094 - Flags: approval-mozilla-beta? → approval-mozilla-release?
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch

Comment on attachment 9191094 [details]
Bug 1677797 - Change autoconf to avoid using a hard-coded file descriptor r?glandium

Approved for 84.0rc2.

Attachment #9191094 - Flags: approval-mozilla-release? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: