1493342 - Report crashes at amd64 non-canonical addresses correctly

Reporter

Description

•

6 years ago

AMD64 (x86_64) has what it calls non-canonical addresses: 64-bit values that aren't a sign-extended 48-bit integer and don't correspond to anything in the page table. Accessing them fails with #GP instead of #PF and doesn't report the address to the fault handler. Linux and OS X report the address as 0; Windows reports it as -1 (0xffffffffffffffff). In particular, this is a problem for us because use-after-free crashes (where the address is a memory poison pattern) can be misinterpreted as null pointer accesses, which can contribute to security bugs taking unnecessarily long to be analyzed. However, it's usually possible to determine the fault address anyway: disassemble the faulting instruction and use the register values to compute the effective address. This can be done either when generating the crash dump or when processing it, because memory near the faulting rIP is included. (In practice, the value of the base register might be enough to identify cases like memory poison patterns, but if this is going to substitute for the crash address — and not have a constant known value like the current situation — it seems better to include the displacement/index.)

(not currently active) Ted Mielczarek

Comment 1

•

6 years ago

I have a Breakpad patch that does this: https://github.com/luser/breakpad/commit/c8c0bbcdba178b74db171c21d2c6f4a59efd0131 I never quite got up the energy to upstream it because it relies on pulling in a better disassembler (capstone).

(not currently active) Ted Mielczarek

Updated

•

6 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=974420

(not currently active) Ted Mielczarek

Updated

•

6 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1002564

(not currently active) Ted Mielczarek

Updated

•

6 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1378836

(not currently active) Ted Mielczarek

Comment 2

•

6 years ago

It's possible we could dupe this to one of those other bugs, but this is a long-standing issue that I'd love to figure out how to fix.

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Reporter

Comment 3

•

6 years ago

It's kind of hack, but another approach I was thinking of is hand-writing enough of a disassembler to compute the EA for the most common types of instruction. It just needs to parse enough of the opcode to find the ModR/M, and it doesn't need to handle anything that isn't using 64-bit addressing or doesn't touch memory. But I don't know how that compares to importing an existing disassembler.

Daniel Veditz [:dveditz]

Updated

•

5 years ago

Keywords: sec-want

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 5

•

4 years ago

Has there been any progress on this bug? It would be very helpful to have correct data on crash-stats for these platforms.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 6

•

4 years ago

Sadly there hasn't. It requires some significant surgery on the Socorro side of crash analysis and none of us has had enough spare cycles to look into it.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 7

•

4 years ago

BTW since we're rewriting most of this stuff in Rust we could implement this either by using capstone's Rust bindings or iced-x86.

Gian-Carlo Pascutto [:gcp]

Updated

•

2 years ago

Assignee: nobody → cmartin

Severity: normal → S3

Priority: -- → P2

Chris Martin [:cmartin]

Assignee

Comment 8

•

2 years ago

I'm currently merging a PR for the majority of functionality needed to implement this. The rest of the implementation afterwards should be fairly straightforward.

Gabriele Svelto [:gsvelto]

Comment 9

•

2 years ago

PR is merged \o/. Chris do you want to handle the last few steps now that you've done all the disassembler integration? IIUC the only thing left to do is use get_instruction_memory_access() in the stack-walker and handle the case where the result doesn't match the address reported in the exception context. Maybe we should give some thought to the output. For example we could replace the address in case we think the one obtained via disassembly is better, but then we should probably leave a field in the output to mark the fact that we're reporting something different than what was in the exception. Alternatively we could add a new field in the JSON output with what we think is the probable crash address when it's different than the exception one and leave the latter untouched. This would require some changes in Socorro to pick up the new field but it's not a big deal either and I could handle the Socorro changes myself.

Flags: needinfo?(cmartin)

Gabriele Svelto [:gsvelto]

Comment 10

•

2 years ago

I gave this some thought and did a few experiments. I think that the fastest way to surface this functionality is to outright replace the crash address in rust-minidump's stackwalker JSON output when we're 100% certain it's wrong. This would apply to crashes with the following conditions:

Processor is x86-64
Address of accessed memory decoded via disassembly is between 0xffff800000000000 and 0x00007fffffffffff (exclusive)
On Windows
- Crash address is 0xffffffffffffffff
- Crash reason is EXCEPTION_ACCESS_VIOLATION_READ
On Linux
- Crash address is 0x0000000000000000
- Crash reason is SIGBUS / SI_KERNEL or SIGSEGV / SI_KERNEL
on macOS
- Crash address is 0x0000000000000000
- Crash reason is EXC_BAD_ACCESS / EXC_I386_GPFLT

It would also be great to surface the data that gets pretty-printed into the Crashing instruction: and Memory accessed by instruction: fields. If we could put it somewhere in the JSON output we could surface it on Socorro which would be very useful during crash analysis.

Chris Martin [:cmartin]

Assignee

Comment 11

•

2 years ago

Sounds like a good plan to me! I'll start working on it later today. Shouldn't take long :)

Flags: needinfo?(cmartin)

Chris Martin [:cmartin]

Assignee

Comment 12

•

2 years ago

PR 665 should be able to close this bug

Gabriele Svelto [:gsvelto]

Comment 13

•

2 years ago

Yes, we can close it after your PR lands. I'll open a separate one in Socorro to deploy the change which might not happen immediately because AFAIK Will is still working on the schema.

Gabriele Svelto [:gsvelto]

Comment 14

•

2 years ago

And with the deploy to Socorro in bug 1801122 this bug is finally fixed! Thanks to everybody who helped implement this! Also know that Breakpad landed a similar fix to their stackwalker after we did but it's a bit clunky to use an euphemism. Let's say it involves calling spitting out the bytes to a temporary file, calling objdump on it and then parsing the text output (in C++). I'm VERY HAPPY we've moved on to Rust.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → FIXED

Bugzilla

Report crashes at amd64 non-canonical addresses correctly

Categories

(Toolkit :: Crash Reporting, enhancement, P2)

Tracking

()

People

(Reporter: jld, Assigned: cmartin)

References

Details

(Keywords: sec-want)

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Updated

Updated

Comment 2

Comment 3

Updated

Comment 5

Comment 6

Comment 7

Updated

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14