Open Bug 995649 Opened 11 years ago Updated 2 years ago

mprotect the GC heap when we are not running in the JS engine

Tracking

()

Status:

NEW

Milestone:

flash10

People

(Reporter: terrence, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files)

mprotect_the_gcheap-v0.diff 11 years ago Terrence Cole [:terrence] (deleted), patch
mprotect the gc heap (rebase on gecko 32) 10 years ago Nicolas B. Pierron [:nbp] (deleted), patch
bug995649-mprotect-gc-heap 7 years ago Jon Coppeard (:jonco) (PTO until 14th September) (deleted), patch

Terrence Cole [:terrence]

Reporter

Description

11 years ago

We suspect that some of our consistent, low-volume crashes may come from buggy code elsewhere in Firefox accidentally scribbling over the GC heap. We could find these quite quickly by setting the GC heap to PROT_NONE when we are not in a request. This should be pretty easy to implement: loop over the chunk-set and chunk-pool and mprotect the chunks in JSEnter/LeaveRequest. It would be rather slow, but it should tell us very quickly whether this is actually the problem. Moreover, it would point us right at the offending code.
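A minimal sketch of the approach described above, assuming a POSIX system. `GCHeap`, `Chunk`, `AutoEnterEngine`, and `probeReadable` are hypothetical names invented for illustration and are not SpiderMonkey APIs; the real chunk-set and chunk-pool are stood in for by a plain vector of mapped regions.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one GC chunk: a page-aligned mapped region.
struct Chunk {
    void*  base;
    size_t size;
};

// Hypothetical stand-in for the chunk-set and chunk-pool.
class GCHeap {
  public:
    // Map a new chunk that starts out inaccessible (PROT_NONE).
    void* addChunk(size_t size) {
        void* p = mmap(nullptr, size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return nullptr;
        chunks_.push_back({p, size});
        return p;
    }

    // Apply one protection to every chunk; false if any mprotect call fails.
    bool protectAll(int prot) {
        bool ok = true;
        for (const Chunk& c : chunks_) {
            if (mprotect(c.base, c.size, prot) != 0) ok = false;
        }
        return ok;
    }

    ~GCHeap() {
        for (const Chunk& c : chunks_) munmap(c.base, c.size);
    }

  private:
    std::vector<Chunk> chunks_;
};

// RAII guard playing the role of entering/leaving a request: the heap is
// readable and writable only while a guard is alive.
class AutoEnterEngine {
  public:
    explicit AutoEnterEngine(GCHeap& heap) : heap_(heap) {
        bool ok = heap_.protectAll(PROT_READ | PROT_WRITE);
        assert(ok);
        (void)ok;
    }
    ~AutoEnterEngine() { heap_.protectAll(PROT_NONE); }

  private:
    GCHeap& heap_;
};

// Check readability without faulting: the kernel copies one byte from addr
// into a pipe, and write() fails instead of raising SIGSEGV if it cannot.
inline bool probeReadable(const void* addr) {
    int fds[2];
    if (pipe(fds) != 0) return false;
    bool ok = (write(fds[1], addr, 1) == 1);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

With this shape, any stray access made while no `AutoEnterEngine` is alive faults at the offending instruction. The cost is one `mprotect` call per chunk on every enter and leave, which is the slowness the description anticipates.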

Terrence Cole [:terrence]

Reporter

Comment 1

11 years ago

Attached patch mprotect_the_gcheap-v0.diff (deleted)

This patch works; a little too well, even. It has identified places where we read from the normal heap off the main thread. These are all safe accesses, either to the script (which is protected from both collection and execution by the background compilation) or to constant data on the runtime. Still, we need to fix these before we can implement the technique here.

Assignee: nobody → terrence

Status: NEW → ASSIGNED

Terrence Cole [:terrence]

Reporter

Updated

11 years ago

Depends on: 998129

Nicolas B. Pierron [:nbp]

Comment 2

10 years ago

Attached patch mprotect the gc heap (rebase on gecko 32) (deleted)

Terrence Cole [:terrence]

Reporter

Comment 3

8 years ago

Note that I'm not actually working on this. There are way too many places where we have ridiculous special cases that allow access outside a request. I think it would be a nice boon to have this: it would probably have caught several cases where Ion's OMT compile thread was reading and writing from the main heap concurrently with GC. On the other hand, TSan also caught this case immediately, without us having to do 6 months of difficult, performance-sensitive rewriting and debugging. So I guess we could do this, but it's a super, super low priority.

Status: ASSIGNED → NEW

Terrence Cole [:terrence]

Reporter

Updated

8 years ago

Assignee: terrence → nobody

Jon Coppeard (:jonco) (PTO until 14th September)

Updated

7 years ago

Summary: mprotect the GC heap when we are out of a JSRequest → mprotect the GC heap when we are not running in the JS engine

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 4

7 years ago

This could be useful to us to track down some of our crashes. Continually protecting/unprotecting the whole heap will be slow, so we could unprotect pages on demand if we detect an access to them in the fault handler. Also, we might want to only write-protect pages. Note that this can only detect access to GC things themselves, not associated malloc memory.

Emanuel Hoogeveen [:ehoogeveen]

Comment 5

7 years ago

If we do this, we should register each protected region with MemoryProtectionExceptionHandler so attempts to access them are annotated in crash stats. Unprotecting on demand sounds nice, but how would the fault handler differentiate between legitimate accesses and corruption?

Lars T Hansen [:lth]

Comment 6

7 years ago

With shared memory there can be concurrent faults from multiple threads, so probably a special case?

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 7

7 years ago

Attached patch bug995649-mprotect-gc-heap (deleted)

Initial attempt at rebasing the patch. The code to protect empty (unused) chunks is commented out. 10% of tests fail.

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 8

7 years ago

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #5)
> Unprotecting on demand sounds nice, but how would the fault handler
> differentiate between legitimate accesses and corruption?

We'd store a flag somewhere to say if access was allowed, i.e. we're running inside the engine.

Emanuel Hoogeveen [:ehoogeveen]

Comment 9

7 years ago

Hmm, I guess it is mostly outside influences we're worried about. We could store a list of pages that were unprotected while the flag was set, and reprotect them when we clear it.
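The scheme discussed in comments 4, 8 and 9 could look roughly like the sketch below. It assumes Linux-style fault delivery (`SIGSEGV` with `si_addr` set), a single heap region, and a single thread. Every name in the `lazyprot` namespace is hypothetical; none of this is taken from the attached patches. Calling `mprotect` from a signal handler is not guaranteed async-signal-safe by POSIX, so that part is an assumption that holds in practice on Linux.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

namespace lazyprot {  // hypothetical names, for illustration only

constexpr size_t kMaxTouched = 1024;

static char*  gBase = nullptr;               // start of the protected heap
static size_t gSize = 0;                     // heap size in bytes
static size_t gPage = 0;                     // system page size
static volatile sig_atomic_t gInEngine = 0;  // the "access allowed" flag
static void*  gTouched[kMaxTouched];         // pages unprotected on demand
static volatile size_t gTouchedCount = 0;

// Fault handler: while the flag is set, unprotect the faulting page, record
// it, and return so the faulting instruction is retried. Otherwise restore
// the default action; the retried access then crashes at the offending code.
static void onFault(int sig, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    bool inHeap = addr >= gBase && addr < gBase + gSize;
    if (inHeap && gInEngine && gTouchedCount < kMaxTouched) {
        char* page = gBase + ((addr - gBase) / gPage) * gPage;
        if (mprotect(page, gPage, PROT_READ | PROT_WRITE) == 0) {
            gTouched[gTouchedCount] = page;
            gTouchedCount = gTouchedCount + 1;
            return;
        }
    }
    signal(sig, SIG_DFL);
}

// Map `pages` inaccessible pages and install the handler.
bool init(size_t pages) {
    gPage = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    gSize = pages * gPage;
    void* p = mmap(nullptr, gSize, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    gBase = static_cast<char*>(p);

    struct sigaction sa = {};
    sa.sa_sigaction = onFault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGBUS, &sa, nullptr) == 0;
}

void enterEngine() {
    assert(gBase != nullptr);
    gInEngine = 1;
}

// Clear the flag and reprotect only the pages that were actually touched.
void leaveEngine() {
    gInEngine = 0;
    for (size_t i = 0; i < gTouchedCount; i++) {
        mprotect(gTouched[i], gPage, PROT_NONE);
    }
    gTouchedCount = 0;
}

char*  heapBase()     { return gBase; }
size_t touchedCount() { return gTouchedCount; }

}  // namespace lazyprot
```

Entering and leaving the engine then costs one `mprotect` per page touched rather than one per chunk. As comment 6 points out, concurrent faults from multiple threads would need extra care; this sketch does not attempt that.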

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 10

7 years ago

It seems there is a new feature for Intel CPUs called "memory protection keys" that could be very useful here: https://lwn.net/Articles/643797/ AIUI with this we wouldn't have to mprotect every time we enter or leave the JS engine, only change the permissions for the current thread.
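For reference, a sketch of how protection keys might be used for this, assuming Linux with glibc 2.27 or later, which provides `pkey_alloc`, `pkey_mprotect`, `pkey_set` and `pkey_free` in `<sys/mman.h>`. The helper names (`tagHeapWithKey`, `allowHeapAccess`, `denyHeapAccess`) are hypothetical. On CPUs or kernels without protection keys, `pkey_alloc` fails and the helper reports that the feature is unavailable.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for the pkey_* declarations in <sys/mman.h>
#endif
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Tag [base, base + size) with a freshly allocated protection key whose
// initial rights deny all access for the calling thread. Returns false if no
// key could be allocated (for example, the CPU or kernel lacks support) or
// the region could not be tagged.
bool tagHeapWithKey(void* base, size_t size, int* outKey) {
    int key = pkey_alloc(0, PKEY_DISABLE_ACCESS);
    if (key < 0) return false;
    // The page permissions stay read/write; the key decides who may use them.
    if (pkey_mprotect(base, size, PROT_READ | PROT_WRITE, key) != 0) {
        pkey_free(key);
        return false;
    }
    *outKey = key;
    return true;
}

// Entering the engine: grant the current thread access. This only updates
// the thread's key rights; no page tables are modified.
inline bool allowHeapAccess(int key) {
    return pkey_set(key, 0) == 0;
}

// Leaving the engine: revoke the current thread's access again.
inline bool denyHeapAccess(int key) {
    return pkey_set(key, PKEY_DISABLE_ACCESS) == 0;
}
```

The one-time `pkey_mprotect` call replaces the per-transition `mprotect` calls. Because key rights are per thread, each thread that runs the engine would need to call `allowHeapAccess` and `denyHeapAccess` for itself.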

Steve Fink [:sfink] [:s:]

Comment 11

7 years ago

Whoa, that's pretty cool. Seems useful for Spectre mitigations too. Looks like it's still described as only available in "future Skylake server CPUs."

Tom S [:evilpie]

Updated

7 years ago

Blocks: exploit-mitigation

Chris Peterson [:cpeterson]

Updated

6 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1514113

Paul Bone [:pbone]

Comment 12

6 years ago

(In reply to Jon Coppeard (:jonco) from comment #4)

> This could be useful to us to track down some of our crashes.
>
> Continually protecting/unprotecting the whole heap will be slow, so we could
> unprotect pages on demand if we detect an access to them in the fault
> handler. Also, we might want to only write-protect pages.

It could be done on some builds only to catch bugs, and enabled for short periods if we like. But that won't help protect against exploits.

Point is, it's useful to have this capability even if we don't use it all that much.

Priority: -- → P3

BMO Automation

Updated

2 years ago

Severity: normal → S3
