All 64-bit mode crashes on OS X with reason EXC_BAD_ACCESS have crash addresses truncated to 32-bits
Categories
(Toolkit :: Crash Reporting, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox91 | --- | fixed |
People
(Reporter: smichaud, Assigned: gsvelto)
References
(Blocks 4 open bugs)
Details
Attachments
(1 file, 1 obsolete file)
(deleted),
text/x-phabricator-request
|
Details |
Reporter | ||
Updated•10 years ago
|
Reporter | ||
Comment 1•10 years ago
|
||
Updated•7 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 3•6 years ago
|
||
Assignee | ||
Comment 4•6 years ago
|
||
I've updated the patch trying to address the review comments as best as I could. I've also double-checked other implementations and run both automated and manual tests and everything seems to work correctly. Heavy emphasis on seems considering how many unknown corner-cases there might be.
Assignee | ||
Comment 5•6 years ago
|
||
There seems to be an issue with marionette tests, it didn't show up locally in debug builds only try opt ones: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd1961ad3347270a4827ab719dff4fd303cd18cb&selectedJob=238146499
I'm testing again with a local opt build.
Assignee | ||
Comment 6•6 years ago
|
||
While I couldn't reproduce the failure locally it seems like it's a race in the minidump generation or something along the lines. In the marionette logs I can see three lock failures before the actual error:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=238146533&repo=try&lineNumber=39510-39513
Locally everything works fine, even after hundreds of runs with a heavy background load to slow down the system. This is definitely a race and possibly a narrow one. My changes here might have made it wide enough to reproduce frequently on try.
Interestingly we have a bunch of similar issues regarding marionette that occur only very rarely and I haven't been able to debug them yet (see bug 1523583). Hopefully this is something similar. There is also a known race in the crash reporting code (bug 1489536) but I don't think that's what causing the problem here.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 7•5 years ago
|
||
I finally found some time to investigate the issue I encountered in comment 6... and it's gone. I'm running more tests to make sure it's really gone for good. It looked like a race between the crash generation and the front-end code so either the race is still there but not triggered or it was solved (most likely in the front-end code). Either way if testing comes up clean I'll land this, it's about time. This is my latest try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=900578d15112851fecfe57dd53dde0a8e04dcc2f
Comment 9•5 years ago
|
||
bugherder |
Comment 10•5 years ago
|
||
Backed out for causing bug 1574515.
Backout: https://hg.mozilla.org/mozilla-central/rev/1868b438bd8307c67a4b3f569239410b843722e0
Assignee | ||
Comment 11•5 years ago
|
||
I have just tested on my Mac... and I don't reproduce, no matter how hard I try. I'll push this to try again but I'm wondering why this doesn't reproduce locally nor it happened on try before landing. Sigh. I'll never land this fix.
Assignee | ||
Comment 13•5 years ago
|
||
I gave this another spin on try and the issue with gtests seems to have gone away... I'll try to land this again and hope that it sticks this time.
Assignee | ||
Comment 14•5 years ago
|
||
Nope, I was wrong, I was looking at the opt build but it's the debug one that was failing.
Assignee | ||
Comment 15•3 years ago
|
||
I did more investigation on this today, and spent a long time poring over Apple's kernel code as well as crashpad's implementation. Long story short, we need a lot more changes here compared to what we have in attachment 9052569 [details]. I'll work on this starting with next week.
Assignee | ||
Comment 16•3 years ago
|
||
Some results from my investigation: recent version of macOS deliver a lot more exception as signals and Chrome is mostly using these to catch crashes... maybe we should too? It's definitely easier than dealing with mach exceptions. That being said I finally figured out what's wrong with gtests: for some reason on mac we're calling XRE_mainInit()
which sets the exception handler twice, so when we disable it only one of the two exception ports we had set up get removed and the following crash is directed to the second one. How did this ever work is beyond me but somehow this patch was needed to surface the problem.
I'm now trying to figure out how this is happening given that this most definitely does not happen on Linux.
Updated•3 years ago
|
Assignee | ||
Comment 17•3 years ago
|
||
Alright, I think I finally fixed the last issue with this. The behavior I saw in comment 16 was "normal" in the sense that we were forking the main process and that's supposed to happen (I'm not sure why it doesn't on Linux but I don't care). The actual issue was that with the changes in the patch the following chain of events happened:
- The child process would crash, with crash reporting enabled (in the build) but disabled (at runtime)
- Since the child process' exception handler was disabled the exception would be delivered to the main process as its exception handler had been fork()'d into the child (note, the exception was not delivered to the copy of the main process' exception handler in the child because the test disabled all exception handling)
- The main process would realize that the exception was not its own and tried to forward it
- Previously we would tell the macOS kernel to handle the exception on its own, as it was not meant for the main process, but the forked code tried to exit the main process - for some reason this didn't work
- At this point the exception handler thread would hang waiting for a shutdown message that never came and the process would be stuck
The fix involved not trying to close the main process if we receive an exception not meant for it, but just ignore the exception and proceed as normal.
Let's see if this sticks.
Comment 18•3 years ago
|
||
Comment 19•3 years ago
|
||
bugherder |
Description
•