Closed Bug 1493342 Opened 6 years ago Closed 2 years ago

Report crashes at amd64 non-canonical addresses correctly

Categories

(Toolkit :: Crash Reporting, enhancement, P2)

x86_64
Unspecified
enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: jld, Assigned: cmartin)

References

Details

(Keywords: sec-want)

AMD64 (x86_64) has what it calls non-canonical addresses: 64-bit values that aren't a sign-extended 48-bit integer and don't correspond to anything in the page table. Accessing them fails with #GP instead of #PF and doesn't report the address to the fault handler. Linux and OS X report the address as 0; Windows reports it as -1 (0xffffffffffffffff). In particular, this is a problem for us because use-after-free crashes (where the address is a memory poison pattern) can be misinterpreted as null pointer accesses, which can contribute to security bugs taking unnecessarily long to be analyzed. However, it's usually possible to determine the fault address anyway: disassemble the faulting instruction and use the register values to compute the effective address. This can be done either when generating the crash dump or when processing it, because memory near the faulting rIP is included. (In practice, the value of the base register might be enough to identify cases like memory poison patterns, but if this is going to substitute for the crash address — and not have a constant known value like the current situation — it seems better to include the displacement/index.)
I have a Breakpad patch that does this: https://github.com/luser/breakpad/commit/c8c0bbcdba178b74db171c21d2c6f4a59efd0131 I never quite got up the energy to upstream it because it relies on pulling in a better disassembler (capstone).
It's possible we could dupe this to one of those other bugs, but this is a long-standing issue that I'd love to figure out how to fix.
It's kind of hack, but another approach I was thinking of is hand-writing enough of a disassembler to compute the EA for the most common types of instruction. It just needs to parse enough of the opcode to find the ModR/M, and it doesn't need to handle anything that isn't using 64-bit addressing or doesn't touch memory. But I don't know how that compares to importing an existing disassembler.
Keywords: sec-want

Has there been any progress on this bug? It would be very helpful to have correct data on crash-stats for these platforms.

Flags: needinfo?(gsvelto)

Sadly there hasn't. It requires some significant surgery on the Socorro side of crash analysis and none of us has had enough spare cycles to look into it.

Flags: needinfo?(gsvelto)

BTW since we're rewriting most of this stuff in Rust we could implement this either by using capstone's Rust bindings or iced-x86.

Assignee: nobody → cmartin
Severity: normal → S3
Priority: -- → P2

I'm currently merging a PR for the majority of functionality needed to implement this. The rest of the implementation afterwards should be fairly straightforward.

PR is merged \o/. Chris do you want to handle the last few steps now that you've done all the disassembler integration? IIUC the only thing left to do is use get_instruction_memory_access() in the stack-walker and handle the case where the result doesn't match the address reported in the exception context. Maybe we should give some thought to the output. For example we could replace the address in case we think the one obtained via disassembly is better, but then we should probably leave a field in the output to mark the fact that we're reporting something different than what was in the exception. Alternatively we could add a new field in the JSON output with what we think is the probable crash address when it's different than the exception one and leave the latter untouched. This would require some changes in Socorro to pick up the new field but it's not a big deal either and I could handle the Socorro changes myself.

Flags: needinfo?(cmartin)

I gave this some thought and did a few experiments. I think that the fastest way to surface this functionality is to outright replace the crash address in rust-minidump's stackwalker JSON output when we're 100% certain it's wrong. This would apply to crashes with the following conditions:

  • Processor is x86-64
  • Address of accessed memory decoded via disassembly is between 0xffff800000000000 and 0x00007fffffffffff (exclusive)
  • On Windows
    • Crash address is 0xffffffffffffffff
    • Crash reason is EXCEPTION_ACCESS_VIOLATION_READ
  • On Linux
    • Crash address is 0x0000000000000000
    • Crash reason is SIGBUS / SI_KERNEL or SIGSEGV / SI_KERNEL
  • on macOS
    • Crash address is 0x0000000000000000
    • Crash reason is EXC_BAD_ACCESS / EXC_I386_GPFLT

It would also be great to surface the data that gets pretty-printed into the Crashing instruction: and Memory accessed by instruction: fields. If we could put it somewhere in the JSON output we could surface it on Socorro which would be very useful during crash analysis.

Sounds like a good plan to me! I'll start working on it later today. Shouldn't take long :)

Flags: needinfo?(cmartin)

PR 665 should be able to close this bug

Yes, we can close it after your PR lands. I'll open a separate one in Socorro to deploy the change which might not happen immediately because AFAIK Will is still working on the schema.

And with the deploy to Socorro in bug 1801122 this bug is finally fixed! Thanks to everybody who helped implement this! Also know that Breakpad landed a similar fix to their stackwalker after we did but it's a bit clunky to use an euphemism. Let's say it involves calling spitting out the bytes to a temporary file, calling objdump on it and then parsing the text output (in C++). I'm VERY HAPPY we've moved on to Rust.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.