Closed Bug 1730222 Opened 3 years ago Closed 3 years ago

Perma toolkit/crashreporter/test/unit/test_crash_phc.js | run_test - [run_test : 2] "undefined" == "FreedPage" | after xpcshell return code: 0

Categories

(Core :: Memory Allocator, defect, P5)

Tracking

RESOLVED FIXED
94 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- unaffected
firefox93 --- unaffected
firefox94 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gsvelto)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered])

Attachments

(1 file)

Filed by: imoraru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=351174153&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Fl6KUnJCQreDkiQuPUCDAg/runs/0/artifacts/public/logs/live_backing.log


[task 2021-09-10T17:49:47.028Z] 17:49:47     INFO -  TEST-START | toolkit/crashreporter/test/unit/test_crash_phc.js
[task 2021-09-10T17:49:47.247Z] 17:49:47  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit/test_crash_phc.js | xpcshell return code: 0
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  TEST-INFO took 220ms
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  >>>>>>>
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  running event loop
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  toolkit/crashreporter/test/unit/test_crash_phc.js | Starting run_test
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  (xpcshell/head.js) | test run_test pending (2)
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  TEST-PASS | toolkit/crashreporter/test/unit/test_crash_phc.js | run_test - [run_test : 27] true == true
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2021-09-10T17:49:47.248Z] 17:49:47  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/crashreporter/test/unit/test_crash_phc.js | run_test - [run_test : 2] "undefined" == "FreedPage"
[task 2021-09-10T17:49:47.248Z] 17:49:47     INFO -  /opt/worker/tasks/task_1631295470/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/test_crash_phc.js:check:2
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  /opt/worker/tasks/task_1631295470/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/test_crash_phc.js:run_test/<:33
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  /opt/worker/tasks/task_1631295470/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/head_crashreporter.js:handleMinidump:165
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  /opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:_do_main:240
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  /opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:_execute_test:597
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  -e:null:1
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  exiting test
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  Unexpected exception NS_ERROR_ABORT:
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  _abort_failed_test@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:860:20
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  do_report_result@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:961:5
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  Assert<@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:75:21
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  proto.report@resource://testing-common/Assert.jsm:228:10
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  equal@resource://testing-common/Assert.jsm:270:8
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  check@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/test_crash_phc.js:2:10
[task 2021-09-10T17:49:47.249Z] 17:49:47     INFO -  run_test/<@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/test_crash_phc.js:33:12
[task 2021-09-10T17:49:47.250Z] 17:49:47     INFO -  handleMinidump@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/tests/toolkit/crashreporter/test/unit/head_crashreporter.js:165:11
[task 2021-09-10T17:49:47.250Z] 17:49:47     INFO -  _do_main@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:240:6
[task 2021-09-10T17:49:47.250Z] 17:49:47     INFO -  _execute_test@/opt/worker/tasks/task_1631295470/build/tests/xpcshell/head.js:597:5
[task 2021-09-10T17:49:47.250Z] 17:49:47     INFO -  @-e:1:1
[task 2021-09-10T17:49:47.250Z] 17:49:47     INFO -  exiting test
[task 2021-09-10T17:49:47.250Z] 17:49:47     INFO -  <<<<<<<
[task 2021-09-10T17:49:47.251Z] 17:49:47     INFO -  TEST-START | toolkit/crashreporter/test/unit_ipc/test_content_phc3.js

Hi Mike! Can you please take a look at this failure?

Flags: needinfo?(mh+mozilla)
Regressed by: 1576515
Has Regression Range: --- → yes
Whiteboard: [retriggered]

Set release status flags based on info from the regressing bug 1576515

Flags: needinfo?(mh+mozilla) → needinfo?(kwright)
Component: Crash Reporting → Memory Allocator
Product: Toolkit → Core

It looks like the extra file being generated is missing elements. I've seen issues generating crash data on M1 in the past, but I'm pretty sure they have since been fixed, so they probably aren't related.

Also of note: bug 1536217 disabled this test on ARM64 Windows because of frequent failures, though it doesn't elaborate on the cause, so I'm curious whether it's the same issue. I may need to disable the test on ARM64 Macs as well.

Gabriele, do you know what might be happening here?

Flags: needinfo?(kwright) → needinfo?(gsvelto)

The PHC annotations are missing. Not having an ARM-based Mac to test on, I can only speculate as to why:

  • The crash might not be recognized as a PHC crash. You should check what happens in the exception handler and specifically here. One thing worth noting is that, with my recent additions to the exception handler, that check must be changed because exception_subcode can be non-zero for other exceptions too. We should use something like this instead: if ((exception_code == EXC_BAD_ACCESS) && exception_subcode)
  • Alternatively, you should check whether we're indeed producing a PHC allocation. We should crash if we can't, and that doesn't seem to be happening here.
  • Last but not least, if both of the above check out and we are indeed detecting the crash as a PHC one, we might not be writing out the annotation properly. I find that unlikely given that the code is platform independent, but you never know.
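
The tightened check suggested in the first bullet can be sketched as below. This is a Python stand-in for logic that lives in the C++ exception handler, not the patch that landed. The names exception_code and exception_subcode are taken from the comment above; the function names are hypothetical.

```python
# Mach exception types, values as defined in <mach/exception_types.h>.
EXC_BAD_ACCESS = 1
EXC_BAD_INSTRUCTION = 2


def loose_phc_check(exception_subcode: int) -> bool:
    """Old-style check (as described above): any non-zero subcode qualifies."""
    return exception_subcode != 0


def tightened_phc_check(exception_code: int, exception_subcode: int) -> bool:
    """Suggested check: only bad-access exceptions with a non-zero subcode."""
    return exception_code == EXC_BAD_ACCESS and exception_subcode != 0


if __name__ == "__main__":
    # A non-bad-access exception with a non-zero subcode passes the loose
    # check but is rejected by the tightened one.
    print(loose_phc_check(0x1000))
    print(tightened_phc_check(EXC_BAD_INSTRUCTION, 0x1000))
    print(tightened_phc_check(EXC_BAD_ACCESS, 0x1000))
```

Only crashes that pass the tightened check would then be examined further as possible PHC crashes.
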
Flags: needinfo?(gsvelto)

Mihai, this failure is on an M1 Mac. Are we running the tests in ARM mode, or are we still having issues with x86 emulation kicking in?

Flags: needinfo?(mtabara)

302 to Ben for this

Flags: needinfo?(mtabara) → needinfo?(bhearsum)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)

Mihai, this failure is on an M1 Mac. Are we running the tests in ARM mode, or are we still having issues with x86 emulation kicking in?

As far as I know, everything is running in ARM mode now. dhouse - is there any extra confirmation we can do to verify that?

I also wonder if granting a loaner would be useful in this circumstance.

Flags: needinfo?(bhearsum) → needinfo?(dhouse)

On this one worker, macmini-m1-30, on Sept 10, the only process I caught [1] executing as x86 was node(js).
I'll check the system logs to see if Rosetta reports executing anything else under x86_64 emulation.

[1] My M1 architecture execution monitoring still checks processes only once each minute, as I didn't change it or find an alternative.

Summary: Perma [tier 2] toolkit/crashreporter/test/unit/test_crash_phc.js | run_test - [run_test : 2] "undefined" == "FreedPage" | after xpcshell return code: 0 → Perma toolkit/crashreporter/test/unit/test_crash_phc.js | run_test - [run_test : 2] "undefined" == "FreedPage" | after xpcshell return code: 0

I'm not certain whether something else was running as x86_64 between my 1-minute checks, but I know that the processes basename, bash, Firefox, generic-worker-simple, launchd, livelog, Python, ssltunnel, start-worker and xpcshell executed as arm64 (and none was caught as x86_64 from 16:00 to 20:00 UTC on Sept 10th) on the worker macmini-m1-30 for https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Fl6KUnJCQreDkiQuPUCDAg/runs/0/artifacts/public/logs/live_backing.log. I didn't find any logging of Rosetta emulation.

Same for the recent failure, https://treeherder.mozilla.org/logviewer?job_id=351785984&repo=mozilla-central&lineNumber=5461, on macmini-m1-40 from 2021-09-16T18:00Z to 2021-09-16T20:00Z: no processes were found executing as x86_64 (checked at 1-minute intervals). These processes were found executing as arm64 from 18:44 to 18:50: awk, BadCertAndPinningServer, bash, diskimages-helper, firefox, Firefox, generic-worker-simple, grep, hdiutil, http3server, launchd, livelog, node, plugin-container, Python, ssltunnel, start-worker, xpcshell.

Flags: needinfo?(dhouse)

For testing, would it be helpful to have a patch that causes Firefox to crash if it is being run emulated?

(In reply to Haik Aftandilian [:haik] from comment #13)

For testing, would it be helpful to have a patch that causes Firefox to crash if it is being run emulated?

how would you make that not ship?

(In reply to bhearsum@mozilla.com (:bhearsum) from comment #9)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)

Mihai, this failure is on an M1 Mac. Are we running the tests in ARM mode, or are we still having issues with x86 emulation kicking in?

As far as I know, everything is running in ARM mode now. dhouse - is there any extra confirmation we can do to verify that?

It would be good for the task names to reflect that.

(In reply to Mike Hommey [:glandium] from comment #14)

(In reply to Haik Aftandilian [:haik] from comment #13)

For testing, would it be helpful to have a patch that causes Firefox to crash if it is being run emulated?

how would you make that not ship?

I just meant a local patch to help validate that tests are not unexpectedly run under emulation, i.e., call nsCocoaFeatures::ProcessIsRosettaTranslated() at startup and crash if it returns true.

And we could have a shippable test to check for this. nsIMacUtils::isTranslated already exposes whether or not we are running under Rosetta, but it needs a fix to work for arm64.

:bhearsum, what do you think? I imagine logging (into the try task log) the process architecture could be helpful too (if the task runner checks firefox/processes when executed).
And I think we have capacity to loan m1 workers (shall I loan one to :gsvelto?)

Flags: needinfo?(bhearsum)

(In reply to Dave House [:dhouse] from comment #16)

:bhearsum, what do you think? I imagine logging (into the try task log) the process architecture could be helpful too (if the task runner checks firefox/processes when executed).

Logging definitely sounds useful, but I think the most useful way to do that is within Firefox or xpcshell? (If we're just looking from test harnesses, we can't be 100% certain if it's running under Rosetta, unless I'm missing something).

And I think we have capacity to loan m1 workers (shall I loan one to :gsvelto?)

If it's useful to him, I think it's a good idea.

Flags: needinfo?(bhearsum)

:gsvelto would an m1 loaner be useful for you? I'm assuming it could be used to reproduce and prove or fix this test. If you do want one, I'll put it in a separate worker pool so that you can run try tasks through it and access it directly with ssh+vnc.

Flags: needinfo?(gsvelto)

(In reply to Dave House [:dhouse] from comment #18)

:gsvelto would an m1 loaner be useful for you? I'm assuming it could be used to reproduce and prove or fix this test. If you do want one, I'll put it in a separate worker pool so that you can run try tasks through it and access it directly with ssh+vnc.

Yes, it would be useful. I'd need to set some time aside for this though, as I'm working on other stuff ATM.

Flags: needinfo?(gsvelto)

:bhearsum, something like https://gist.github.com/davehouse/8dfccfbd9366eeb8ee59d165818fed1c from the test harness could check if a process is running through Rosetta. (I don't know what is useful since I don't know the Firefox/xpcshell side or the test harness.)
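
For illustration, here is a minimal sketch of one way a Python harness could ask macOS whether its own process is translated, using the sysctl.proc_translated key that Apple documents for Rosetta detection. It does not reproduce the gist linked above, it only inspects the calling process (not arbitrary other processes), and both function names are hypothetical.

```python
import ctypes
import errno
import platform
from typing import Optional


def interpret_proc_translated(rc: int, value: int, err: int) -> Optional[bool]:
    """Map a sysctlbyname() result onto True, False, or None (unknown)."""
    if rc == 0:
        return value == 1
    if err == errno.ENOENT:
        # The key does not exist, so this system has no translation layer.
        return False
    return None


def current_process_is_translated() -> Optional[bool]:
    """Return True if this process runs under Rosetta, None if unknown."""
    if platform.system() != "Darwin":
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    value = ctypes.c_int(0)
    size = ctypes.c_size_t(ctypes.sizeof(value))
    rc = libc.sysctlbyname(
        b"sysctl.proc_translated",
        ctypes.byref(value),
        ctypes.byref(size),
        None,
        ctypes.c_size_t(0),
    )
    return interpret_proc_translated(rc, value.value, ctypes.get_errno())


if __name__ == "__main__":
    print("translated:", current_process_is_translated())
```

Logging this value at the start of a task would only show how the harness itself is running; as noted earlier in the thread, a check inside Firefox or xpcshell is still needed to be sure about those processes.
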

Having fun with the loaner. We're not detecting the crashing address as a PHC one, now trying to figure out why.

Assignee: nobody → gsvelto
Status: NEW → ASSIGNED

Alright, I know what's going on. PHC was designed to work only on 4KiB pages. By default it will never create guard pages when requesting something larger than 4KiB. However, Apple uses 16KiB pages on ARM processors, which means PHC will never allocate a guard page even when it can. So the only way to use this is to relax this particular constraint; I'll give it a spin tomorrow.
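
As a rough illustration of why a hardcoded 4KiB assumption clashes with 16KiB pages: memory protection is applied in whole OS pages, so a 4KiB slot cannot be guarded on its own when the OS page is 16KiB. The sketch below shows only that general constraint, not PHC's actual logic or the patch; the constant and function names are hypothetical.

```python
import mmap

# Page size PHC was designed around, per the comment above.
PHC_ASSUMED_PAGE_SIZE = 4 * 1024
# Page size reported for Apple ARM processors in the comment above.
APPLE_ARM_PAGE_SIZE = 16 * 1024


def can_protect_independently(slot_size: int, os_page_size: int) -> bool:
    """A slot can get its own protection only if it spans whole OS pages."""
    return slot_size % os_page_size == 0


if __name__ == "__main__":
    print("OS page size here:", mmap.PAGESIZE)
    # Holds on a 4KiB-page system.
    print(can_protect_independently(PHC_ASSUMED_PAGE_SIZE, 4 * 1024))
    # Fails once the OS page is 16KiB.
    print(can_protect_independently(PHC_ASSUMED_PAGE_SIZE, APPLE_ARM_PAGE_SIZE))
    # Holds again if the slot size follows the OS page size.
    print(can_protect_independently(APPLE_ARM_PAGE_SIZE, APPLE_ARM_PAGE_SIZE))
```
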

This also tightens the check for PHC-type crashes in the exception handler
so that we don't accidentally try to interpret unrelated crash types as
potential PHC ones.

This is perma failing on tier 1 jobs. I see that there is a patch submitted for review 6 days ago.
Mike can you please review the patch? Thank you!

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d1144997cc18 Support PHC on macOS when running on ARM-based machines r=glandium
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch