Closed Bug 864220 Opened 11 years ago Closed 11 years ago

IonMonkey: Implement loop interrupt checks using mprotect

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla26

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 8 obsolete files)

asm.js implements interrupt checking by having the interrupting thread mprotect() the module's executable memory so that the executing thread segfaults, from which it can do whatever and either resume or terminate execution.  Vanilla IonMonkey, in contrast, emits instructions at the loop header to explicitly check for an interrupt.  Per bug 860923 comment 1 the interrupt checks can have a big effect on performance, with nearly a 10% improvement by eliminating them on fannkuch(10) using the backtracking allocator.  They also make a noticeable impact on benchmarks; I seem to get SS and octane improvements of 1-2% by eliminating them.
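
Concretely, the mechanism looks roughly like this (a minimal POSIX-only sketch; TriggerInterrupt, FaultHandler, codeBase and codeLength are made-up names for illustration, not SpiderMonkey APIs, and the Mach/Windows exception paths are omitted):

  #include <signal.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <sys/mman.h>

  static uint8_t *codeBase;   // start of the module's executable memory (page-aligned)
  static size_t codeLength;   // length of that region, a multiple of the page size

  // Interrupting thread: revoke all access so the executing thread faults on
  // its next instruction fetch anywhere in the module's code.
  void TriggerInterrupt() {
      mprotect(codeBase, codeLength, PROT_NONE);
  }

  // Executing thread's SIGSEGV handler: if the fault is inside our code
  // region, restore access and run the interrupt logic (resume or unwind).
  static void FaultHandler(int sig, siginfo_t *info, void *ctx) {
      uint8_t *addr = static_cast<uint8_t *>(info->si_addr);
      if (addr >= codeBase && addr < codeBase + codeLength) {
          mprotect(codeBase, codeLength, PROT_READ | PROT_EXEC);
          // ... invoke the operation callback, then either return (re-executing
          // the faulting instruction) or arrange to terminate execution ...
      }
  }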

It would be nifty if vanilla IonMonkey could use the same interrupt mechanism as asm.js.  There are concerns, however:

- IonMonkey can't be interrupted at arbitrary points and forcefully unwound in the same manner as asm.js, as it may be in the middle of updating the heap and leave things in a bad state.

- IonMonkey can't be interrupted at arbitrary points and have a traceable stack for GC.  asm.js doesn't store GC thing pointers on the stack anywhere so this is not a concern there.

- IonMonkey allocates code in scattered chunks from the runtime's executable code allocator, rather than as one big per-module slab.  This could have performance problems if Ion code needs to be interrupted frequently.  I'm hoping this is a nonissue, or that using a separate allocator for Ion code to reduce fragmentation will be enough, but it's hard to say for sure.

The first two points seem like they should be solvable using a variation on the technique.  Instead of invoking the operation callback immediately after segfaulting, the signal handler could just patch loop backedges to jump to an OOL stub which invokes the operation callback.  The signal handler would do this patching, unprotect the code pages, resume execution and very shortly afterwards invoke the operation callback with a known state.
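
To make that concrete, here is a hedged sketch of the backedge-flipping step; PatchableBackedge, PatchJump and UnprotectIonCode are placeholders, not the names used in the patches below:

  #include <vector>

  // Hypothetical helpers: repoint a jump instruction, and restore execute
  // permission on the runtime's Ion code.
  void PatchJump(void *jumpLocation, void *newTarget);
  void UnprotectIonCode();

  struct PatchableBackedge {
      void *jumpLocation;   // address of the loop backedge jump
      void *loopHeader;     // normal target: top of the loop
      void *interruptStub;  // OOL stub that invokes the operation callback
  };

  void HandleFault(std::vector<PatchableBackedge> &backedges) {
      // Redirect every loop backedge to the out-of-line interrupt stub.
      for (PatchableBackedge &edge : backedges)
          PatchJump(edge.jumpLocation, edge.interruptStub);

      UnprotectIonCode();
      // Returning from the handler re-executes the faulting instruction; the
      // next backedge taken lands in the stub, which calls the operation
      // callback at a known-safe point and repoints the backedges back to
      // their loop headers.
  }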
(In reply to Brian Hackett (:bhackett) from comment #0)
> asm.js implements interrupt checking by having the interrupting thread
> mprotect() the module's executable memory so that the executing thread
> segfaults, from which it can do whatever and either resume or terminate
> execution.  Vanilla IonMonkey, in contrast, emits instructions at the loop
> header to explicitly check for an interrupt.  Per bug 860923 comment 1 the
> interrupt checks can have a big effect on performance, with nearly a 10%
> improvement by eliminating them on fannkuch(10) using the backtracking
> allocator.  They also make a noticeable impact on benchmarks; I seem to get
> SS and octane improvements of 1-2% by eliminating them.

Are those browser results?  Which benchmarks use interrupts during their execution, Dromaeo maybe?

> It would be nifty if vanilla IonMonkey could use the same interrupt
> mechanism as asm.js.  There are concerns, however:
> 
> - IonMonkey can't be interrupted at arbitrary points and forcefully unwound
> in the same manner as asm.js, as it may be in the middle of updating the
> heap and leave things in a bad state.
> 
> - IonMonkey can't be interrupted at arbitrary points and have a traceable
> stack for GC.  asm.js doesn't store GC thing pointers on the stack anywhere
> so this is not a concern there.

For these points, we can do it in 2 phases.  In a first pass we protect every part of the code, see that we are in the middle of nowhere inside some Ion code (no need for Ion caches or VM calls as they will return to the Ion code).  In a second pass we can protect only the loopHeader and return statements which we can make safe for stack inspection.  This would be less trouble than making the engine safe in all locations.

> - IonMonkey allocates code in scattered chunks from the runtime's executable
> code allocator, rather than as one big per-module slab.  This could have
> performance problems if Ion code needs to be interrupted frequently.  I'm
> hoping this is a nonissue, or that using a separate allocator for Ion code
> to reduce fragmentation will be enough, but it's hard to say for sure.

We can be greedy if we do it in 2 passes.

shu: I don't know if this will work with PJS, so we might need to keep using the interrupt check there, as it is currently used as a synchronization point from what I understand.  I don't know if the signal handlers are set up properly on other threads either.
Flags: needinfo?(shu)
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> (In reply to Brian Hackett (:bhackett) from comment #0)
> > Per bug 860923 comment 1 the
> > interrupt checks can have a big effect on performance, with nearly a 10%
> > improvement by eliminating them on fannkuch(10) using the backtracking
> > allocator.  They also make a noticeable impact on benchmarks; I seem to get
> > SS and octane improvements of 1-2% by eliminating them.
> 
> Are those browser results?  Which benchmarks use interrupts during their
> execution, Dromaeo maybe?

The numbers I got were with shell builds.  I don't think any benchmarks actually *use* the interrupt mechanism, this improvement is just what we get from removing explicit interrupt checks.

> For these points, we can do it in 2 phases.  In a first pass we protect
> every part of the code, see that we are in the middle of nowhere inside some
> Ion code (no need for Ion caches or VM calls as they will return to the Ion
> code).  In a second pass we can protect only the loopHeader and return
> statements which we can make safe for stack inspection.  This would be less
> trouble than making the engine safe in all locations.

I don't see how this would work with mprotect(), which has a page granularity so you can't protect individual addresses.  The last para of comment 0 outlines a way that we can still call the operation callback in predictable places, even though the thread may segv at an arbitrary point in Ion code.
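
For illustration, the granularity constraint (ProtectContainingPage is a made-up helper; the smallest unit mprotect can change is the page containing an address, not a single loop header):

  #include <stdint.h>
  #include <sys/mman.h>
  #include <unistd.h>

  // Protect the page containing |addr|; everything else on that page loses
  // access too, which is why per-instruction protection isn't possible.
  int ProtectContainingPage(void *addr) {
      long page = sysconf(_SC_PAGESIZE);
      uintptr_t start = reinterpret_cast<uintptr_t>(addr) & ~(uintptr_t(page) - 1);
      return mprotect(reinterpret_cast<void *>(start), page, PROT_NONE);
  }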
(In reply to Brian Hackett (:bhackett) from comment #2)
> (In reply to Nicolas B. Pierron [:nbp] from comment #1)
> > For these points, we can do it in 2 phases.  In a first pass we protect
> > every part of the code, see that we are in the middle of nowhere inside some
> > Ion code (no need for Ion caches or VM calls as they will return to the Ion
> > code).  In a second pass we can protect only the loopHeader and return
> > statements which we can make safe for stack inspection.  This would be less
> > trouble than making the engine safe in all locations.
> 
> I don't see how this would work with mprotect(), which has a page
> granularity so you can't protect individual addresses.  The last para of
> comment 0 outlines a way that we can still call the operation callback in
> predictable places, even though the thread may segv at an arbitrary point in
> Ion code.

We can add a trap like gdb does for breakpoints.
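
Something along these lines, assuming x86 (InsertTrap is a made-up helper; this is essentially what gdb does for a software breakpoint, not something in the eventual patch):

  #include <stdint.h>
  #include <sys/mman.h>
  #include <unistd.h>

  // Overwrite the first byte of the target instruction with int3 (0xCC),
  // which raises SIGTRAP when executed; the handler would restore the saved
  // byte before resuming.
  bool InsertTrap(uint8_t *target, uint8_t *savedByte) {
      long page = sysconf(_SC_PAGESIZE);
      uintptr_t start = reinterpret_cast<uintptr_t>(target) & ~(uintptr_t(page) - 1);
      // Code pages are normally read+execute, so make the page writable first.
      if (mprotect(reinterpret_cast<void *>(start), page,
                   PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
          return false;
      *savedByte = *target;
      *target = 0xCC;  // int3
      return mprotect(reinterpret_cast<void *>(start), page,
                      PROT_READ | PROT_EXEC) == 0;
  }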
Summary: IonMonkey: Implementing loop interrupt checks using mprotect → IonMonkey: Implement loop interrupt checks using mprotect
Depends on: 864957
Attached patch patch (ebf1b0f1920c) (obsolete) (deleted) — Splinter Review
I still need to get performance numbers for this (should have some later today or tomorrow) and do more testing, but this patch seems to work and is, I think, pretty clean.  If we need to trigger an operation callback then all the runtime's Ion code is protected (and only the Ion code, not baseline or regexp or anything); the executing thread then segvs, and the handler unprotects the code and patches all backedges in Ion code to call the interrupt handler.  The next time one of those backedges is taken and calls the handler, the backedges are repointed to their normal target.
Attachment #742026 - Flags: review?(luke)
Attachment #742026 - Flags: review?(dvander)
Attached patch updated (obsolete) (deleted) — Splinter Review
This fixes a bug that showed up while running experiments on the browser.  If the main thread accessed protected code memory while holding the operation callback lock, the signal handler would deadlock.

I also measured the cost of the mprotects while playing the bananabread demo.  On average, on an x64/linux build, protecting code memory on the interrupting thread takes 127us for me, and unprotecting it on the main thread takes 32us.  The latter number is really the only one relevant to perf, as the main thread will hardly ever be blocked on the interrupt thread while waiting to acquire the operation callback lock.  The callback seems to be triggered once or twice a second, so this adds up to something like 10ms of main thread overhead after a few minutes of playing.  Doesn't seem like a big concern.

I see similar numbers when having several tabs open and using a JS-intensive page like maps.  The number of tabs is generally not relevant to the overhead of protecting the code, though, since we flush Ion code from background pages on every GC (they're not playing animations).
Attachment #742026 - Attachment is obsolete: true
Attachment #742026 - Flags: review?(luke)
Attachment #742026 - Flags: review?(dvander)
Attachment #742709 - Flags: review?(luke)
Attachment #742709 - Flags: review?(dvander)
Comment on attachment 742709 [details] [diff] [review]
updated

I rebased and applied the patch to the tip.  I then wrapped the call to IonRuntime::handleAccessViolation with gettimeofday to print how long it took.  Running unrealengine.com/html5 with asmjs=false shows pauses in the range 200us to 1ms.  Running the demo in two tabs shows pauses in the 1-4ms range.  I think this is too much when the total target frame time is 16ms.  Consider also that we intend to stop throwing away Ion code on GC, which will further exacerbate the problem.
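
(For reference, the instrumentation was just a wrapper along these lines; TimeCall is a made-up name, not part of the patch:)

  #include <cstdio>
  #include <sys/time.h>

  // Bracket a call with gettimeofday and report the elapsed microseconds.
  template <typename F>
  void TimeCall(const char *label, F &&f) {
      timeval start, end;
      gettimeofday(&start, nullptr);
      f();
      gettimeofday(&end, nullptr);
      long us = (end.tv_sec - start.tv_sec) * 1000000L + (end.tv_usec - start.tv_usec);
      fprintf(stderr, "%s took %ld us\n", label, us);
  }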

Also, the frequency of these pauses is way more than once a second.  This is because the watchdog timer isn't the only caller of TriggerOperationCallback, the alloc trigger and background compilation are two other frequent sources.  With asm.js, the mprotect only happens if asm.js code is actually on the stack, and so it observably happens far less frequently (after the loading bar completes).

So, overall, I'm against the patch.  (Also, I haven't seen any measured speedups to justify it.)
Attachment #742709 - Flags: review?(luke) → review-
Per bug 860923 comment 1 I see a 9.5% improvement on emscripten-fannkuch by removing interrupt checks.  I also seem to get a 2% improvement on e.g. octane, though that is noisy.

I posted some numbers, and you posted some numbers, and they're not quite in agreement.  Your numbers don't seem to be in the range of 'this is obviously a clunker that shouldn't land'.

It seems to me like we need some more data, and a simple way to get that is to get this thing in the tree so that more people can test it on more machines.  It will be easy to turn off later if needed.

There are several ways the overhead could be improved.  Some obvious ones:

- Background compilation triggers never need to protect memory.  These triggers are for attaching new IonScripts, and if code is already running in Ion then delaying that attach arbitrarily long will not affect perf.

- Watchdog and alloc triggers are also not that time sensitive.  They could try a normal trigger, and only protect memory if N ms have passed with no callback invoke.  With the overhead I saw on normal web workloads this doesn't seem like it would be worth doing though.

- First try protecting the stack rather than code memory.  Almost all loops will have some spill code, and if not, the code memory could additionally be protected after N ms, as above.  This is nice because it is insensitive to the amount of Ion code that exists.

- More prosaically, increase the size of the slabs of executable memory used for Ion scripts, to reduce fragmentation.  If Ion scripts become very long lived (an idea I don't know if I'm really on board with to begin with), they could be relocated into a contiguous slab.

None of these is worth doing until there is a known need for them, though.
> Watchdog and alloc triggers are also not that time sensitive.

Please don't change how alloc triggers work. It's incredibly easy to OOM a b2g phone, so I don't think we should be relaxing this. Also, alloc triggers should hit very rarely. They cause a non-incremental GC, so we already go to a lot of trouble to avoid them.
(In reply to Bill McCloskey (:billm) from comment #8)
> > Watchdog and alloc triggers are also not that time sensitive.
> 
> Please don't change how alloc triggers work. It's incredibly easy to OOM a
> b2g phone, so I don't think we should be relaxing this. Also, alloc triggers
> should hit very rarely. They cause a non-incremental GC, so we already go to
> a lot of trouble to avoid them.

Do alloc triggers always occur on the main thread?  If so then the protection isn't necessary in the first place, as all it's doing is finding a way to stop the main thread while the loop backedges are being patched.
(In reply to Brian Hackett (:bhackett) from comment #9)
> Do alloc triggers always occur on the main thread?  If so then the
> protection isn't necessary in the first place, as all it's doing is finding
> a way to stop the main thread while the loop backedges are being patched.

Currently yes. I think Nicolas may be changing this soon. If we ever reach a low-memory state (which would be detected on a separate thread), he wants to trigger a GC. However, it would be perfectly fine to do something slow in that case.

Anyway, it sounds like we have a lot of options for dealing with the alloc trigger.
(In reply to Brian Hackett (:bhackett) from comment #7)
> I posted some numbers, and you posted some numbers, and they're not quite in
> agreement.

I don't know what it means for them to be "not quite in agreement".  The cost of the mechanism in this patch is clearly O(n) in the quantity of Ion code in the entire main thread's runtime (both for the mprotect calls and for the loop-backedge patching).  The unreal code clearly shows that this O(n) can grow to be non-negligible.  We can expect this to happen more over time when Ion code is not discarded on GC and as apps get bigger.

(In reply to Brian Hackett (:bhackett) from comment #7)
> Your numbers don't seem to be in the range of 'this is obviously
> a clunker that shouldn't land'.

From a games responsiveness/latency perspective, I think they do.

> It seems to me like we need some more data, and a simple way to get that is
> to get this thing in the tree so that more people can test it on more
> machines.  It will be easy to turn off later if needed.

I'm not sure what else we would learn; we have data for a real application which the web is really moving towards which shows a problem.  We already have one global source of per-frame jank, we should not add a second.
Trying to make progress, I think there is a viable strategy that avoids the O(n) sporadic pauses on the main thread and even the multi-threaded access to ExecutableAlloc:
 - when background compilation completes, it's not important to halt running Ion code; clearly before we run the completed compilation we'll go through C++ and thus JS_CHECK_INTERRUPT, so all we need to do is set rt->interrupt in this case
 - for alloc triggers, we just need enough record-keeping (which we might already have, I'm not familiar with VM calls) to patch the return address of the innermost ion-to-C++ call so that we return to TriggerGC.  The only problem is that I thought I remembered that there is a faster call path that knows there can be no GC but we still want to trigger a GC.  It seems like we could work around that in various ways, though
 - for the slow script:
   1. fix the DOM's watchdog thread so that it only calls JS_TriggerOperationCallback when a *single* JS evaluation has been running for longer than X seconds (so there is no worry about janking interactive content, which should finish frames in far less time)
   2. instead of locking and mprotecting all the ExecutableAlloc code, suspend the thread (SuspendThread on windows, pthread_kill(thread, SIGSEGV) on Unices) and then do the backedge flipping as you have in this patch (for recursive cases, we'd also need to patch prologues)
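
A rough sketch of what (2) could look like on POSIX; SIGUSR1 and the helper names are illustrative (the suggestion above reuses the SIGSEGV handler), and a later comment discusses the races this would need to worry about:

  #include <atomic>
  #include <pthread.h>
  #include <signal.h>

  static std::atomic<bool> patchingDone(false);

  static void ParkHandler(int) {
      // Main thread parks here while the watchdog patches the backedges.
      while (!patchingDone.load(std::memory_order_acquire)) { /* spin */ }
  }

  void PatchAllBackedges();  // hypothetical: flip backedges to the interrupt stub

  void InterruptFromWatchdog(pthread_t mainThread) {
      signal(SIGUSR1, ParkHandler);
      pthread_kill(mainThread, SIGUSR1);  // park the main thread in ParkHandler
      PatchAllBackedges();                // only safe if the backedge list can't race
      patchingDone.store(true, std::memory_order_release);
  }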
Cool, thanks.

(In reply to Luke Wagner [:luke] from comment #12)
> Trying to make progress, I think there is a viable strategy that avoids the
>  - for alloc triggers, we just need enough record-keeping (which we might
> already have, I'm not familiar with VM calls) to patch the return address of
> the innermost ion-to-C++ call so that we return to TriggerGC.  The only
> problem is that I thought I remembered that there is a faster call path that
> knows there can be no GC but we still want to trigger a GC.  It seems like
> we could work around that in various ways, though

There isn't really a mechanism for patching return addresses in Ion, but it would be fine to just patch the backedges immediately since we're still on the main thread.  The operation callback will be invoked at the same point it currently is.
(In reply to Brian Hackett (:bhackett) from comment #13)
> There isn't really a mechanism for patching return addresses in Ion, but it
> would be fine to just patch the backedges immediately since we're still on
> the main thread.

My concern there was that patching all the backedges is an O(n) operation and we do expect the alloc trigger to happen somewhat frequently (even in Emscripten games).  Talking to dvander, it sounds like we already have the ability to synchronously patch the return address for Ion-to-C++ VM calls.  I also asked and it sounds like we don't need to do the alloc trigger for non-VM calls.
Depends on: 871079
Attached patch patch (4ff1e574e509) (obsolete) (deleted) — Splinter Review
This is pretty similar to the last patch.  Callback triggers due to background compilation and alloc triggers have been modified to avoid needing to protect anything, as discussed here and on IRC.  Bug 871079 has a patch which avoids triggering the callback from the watchdog thread unless a single JS request has been running for more than a second, which should avoid ever needing to trigger the callback in an interactive application.

This patch still uses memory protection of the code for actually triggering the callback from off thread, however.  I am worried about patching backedges after doing a SuspendThread / pthread_kill, as the main thread could be at any instruction and there does not seem to be any way to avoid races when reading the list of backedges in the signal handler.  Carefully writing the code that manipulates the backedge list and sprinkling around enough 'volatile' might do the trick, but I think doing this without proper locking / fences is a bad idea.  Especially so when the alternative reuses the same mechanism as asm.js.
Assignee: general → bhackett1024
Attachment #742709 - Attachment is obsolete: true
Attachment #742709 - Flags: review?(dvander)
Attachment #748810 - Flags: review?(luke)
Attachment #748810 - Flags: review?(dvander)
Clearing the needinfo; I don't think I have anything of value to add here.
Flags: needinfo?(shu)
Attached patch rebase (2287b2667565) (obsolete) (deleted) — Splinter Review
Attachment #748810 - Attachment is obsolete: true
Attachment #748810 - Flags: review?(luke)
Attachment #748810 - Flags: review?(dvander)
Attachment #782310 - Flags: review?(luke)
Attachment #782310 - Flags: review?(jdemooij)
Attached patch rebase (2287b2667565) (obsolete) (deleted) — Splinter Review
Oops, didn't test this enough.  A stray use of an obsolete #ifdef was keeping signal handlers from being installed.
Attachment #782310 - Attachment is obsolete: true
Attachment #782310 - Flags: review?(luke)
Attachment #782310 - Flags: review?(jdemooij)
Attachment #782313 - Flags: review?(luke)
Attachment #782313 - Flags: review?(jdemooij)
Attached patch rebase #3 (2287b2667565) (obsolete) (deleted) — Splinter Review
And I still didn't test this enough.  while (true) {} loops work fine, but infinite loops with non-trivial tests like while (n) {} don't.  The problem is that after critical edge unsplitting the last instruction in the backedge can be something other than a goto, which wasn't possible back when I wrote this patch.
Attachment #782313 - Attachment is obsolete: true
Attachment #782313 - Flags: review?(luke)
Attachment #782313 - Flags: review?(jdemooij)
Attachment #782339 - Flags: review?(luke)
Attachment #782339 - Flags: review?(jdemooij)
Using an --enable-threadsafe debug shell build, I can pretty reliably (1 out of 3 times) get an assert in jit-test/tests/ion/bug858586.js if I crank up the operation callback rate ("sleepDuration = PR_TicksPerSecond() / 100000;" in WatchdogMain) and force on the operation callback (adding SetTimeoutValue(cx, 100) in SetContextOptions).  Here's the stack: http://pastebin.mozilla.org/2734339

Also, a bunch of the jit-tests in parallel/ fail with the above changes, so probably something is just missing in the parallel compilation path.  Here's the first two:
    tests/parallel/Array-mapPar-assign-to-def-prop.js
    tests/parallel/Array-mapPar-factorial.js
Attached patch patch (2287b2667565) (obsolete) (deleted) — Splinter Review
Good catch; that assert indicates a potential deadlock.  If the watchdog thread triggers a callback by protecting the memory and the main thread then triggers a callback itself before executing any more Ion code, it will segv while holding the operation callback lock and end up trying to reacquire it (side note: it's kind of dumb that PR locks aren't reentrant; this kind of lock causes all sorts of bugs in systems code).

I'm not sure if you meant this in your post but I get similar failures in jit-tests/tests/parallel/ with those shell/js.cpp changes applied, whether this patch is also applied or not.
Attachment #782339 - Attachment is obsolete: true
Attachment #782339 - Flags: review?(luke)
Attachment #782339 - Flags: review?(jdemooij)
Attachment #782579 - Flags: review?(luke)
Attachment #782579 - Flags: review?(jdemooij)
(In reply to Brian Hackett (:bhackett) from comment #21)
> I'm not sure if you meant this in your post but I get similar failures in
> jit-tests/tests/parallel/ with those shell/js.cpp changes applied, whether
> this patch is also applied or not.

Ah hah, you're right!  I compared w/ and w/o your patch, but not w/o your patch w/ tweaks.  I'll file a bug.
Actually, it's probably not even a bug; the failures aren't crashes, they are reporting "ForkJoin: mode=parallel status=warmup bailouts=1", which probably means the op callback sequentializes execution, which is exactly what the test is checking doesn't happen.
Another high-level review request: could you add a new memory reporting entry for the new backedge list stored in the IonRuntime?  It seems like this could get big, esp. if we start not discarding all Ion jit code on every GC, which it sounds like we're considering doing now.
(In reply to Luke Wagner [:luke] from comment #24)
> Another high-level review request: could you add a new memory reporting
> entry for the new backedge list stored in the IonRuntime?  It seems like
> this could get big, esp. if we start not discarding all Ion jit code on
> every GC, which it sounds like we're considering doing now.

Well, the runtime's backedge list is threaded through the individual IonScripts, so the memory used by this list will be reported along with all the other IonScript memory.  Separating this out would add complexity and I don't really see what it would buy vs. separately reporting all the other bits and bobs that make up an IonScript --- the number of backedges in an IonScript will be <= the number of backedges in the LIR graph so won't grow out of proportion with the rest of the IonScript or its associated executable buffer.
(In reply to Brian Hackett (:bhackett) from comment #25)
> Well, the runtime's backedge list is threaded through the individual
> IonScripts, so the memory used by this list will be reported along with all
> the other IonScript memory.

Ah, I hadn't read that far yet, I thought it was just a big list in the IonRuntime; that sounds fine then.
Along the same lines as the above opcallback stress tests, I pushed a patch that increased the frequency of operation callbacks in the browser's watchdog thread (to 10,000) to try:

  https://tbpl.mozilla.org/?tree=Try&rev=371f8a0a5b1d

Several of the failures are expected: the parallel tests mentioned above and a few timeouts in mochitests that I'm assuming are caused by slow tests running slower b/c of all the interrupts.  (The exception mechanism on Mac is like 100x slower than on Unix/Windows, so the timeout in 'make package' is, I assume, because of that.)  However, there are some worrying crashes in the shell:
  https://tbpl.mozilla.org/php/getParsedLog.php?id=25880022&tree=Try#error0
  https://tbpl.mozilla.org/php/getParsedLog.php?id=25880356&tree=Try&full=1
and in the browser:
  https://tbpl.mozilla.org/php/getParsedLog.php?id=25880786&tree=Try&full=1#error1
  https://tbpl.mozilla.org/php/getParsedLog.php?id=25880285&tree=Try&full=1#error1
  https://tbpl.mozilla.org/php/getParsedLog.php?id=25880939&tree=Try&full=1#error1

(also some Android build bustage)
Attached patch patch (2287b2667565) (obsolete) (deleted) — Splinter Review
I don't think the browser crashes are actually crashes: they are all preceded by a test timeout and fail with SIGABRT, so I think the log is just showing the stack at the point it decided to kill the process (usually inside an mprotect call, not surprisingly).

The shell crashes indicate a couple of real bugs though, which this patch fixes.  The first is that invalidating an IonScript involves clobbering parts of its code which overlap the backedges, and patching those backedges afterwards trashes the modifications.  This is fixed by removing backedge information from invalidated Ion scripts (the code won't run anymore so there's no value in patching its backedges).  The second is that the mechanism added to make backedge codegen easier in quirky loops like 'while (objectThatMightEmulateUndefined) {}' was broken if the loop test had out-of-line blocks referencing the created labels: those labels were stack allocated and no longer existed, so random parts of the stack got trashed.
Attachment #782579 - Attachment is obsolete: true
Attachment #782579 - Flags: review?(luke)
Attachment #782579 - Flags: review?(jdemooij)
Attachment #784562 - Flags: review?(luke)
Attachment #784562 - Flags: review?(jdemooij)
Comment on attachment 784562 [details] [diff] [review]
patch (2287b2667565)

Review of attachment 784562 [details] [diff] [review]:
-----------------------------------------------------------------

I have one remaining serious concern.  If we can address this (as suggested below or otherwise) I'll r+.

Since EnsureAsmJSSignalHandlersInstalled() is now always executed, every seg-fault in the browser is going to funnel through here.  That means if we ever have a bug that leaves ExecutableAllocator in an inconsistent state, we'll crash without calling breakpad.  Technically this was true before, but the asm.js data structure is way simpler and the handlers are only installed on the first use of asm.js.

I really think we need to mitigate this problem:
 - on Mac, I think we're good; if we fault, it'll be on the handler thread and no one is listening to that, so the OS will call the process-level breakpad handler (but please check)
 - on Unix, pass SA_NODEFER to sigaction (so that we'll fault if we fault from our handler) and have a per-runtime atomic word that we set on entry to the handler and unset on exit such that if we enter a handler and it's set, we immediately chain to our next handler
 - on Windows, I don't know what happens if you fault from the vectored handler.  If we're lucky, it'll reenter the handler (as if SA_NODEFER) and we can apply the same trick.

In the end, what I think we need to verify on all tier 1 platforms is that if you build with --enable-crashreporter and stick *(int*)0 = 0; inside handleAccessViolation, we still bring up the crash reporter.
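
A sketch of the Unix guard being asked for (a single global flag for brevity where the patch would use a per-runtime word; chaining to the previously installed breakpad handler is replaced here by re-raising with the default disposition):

  #include <signal.h>
  #include <string.h>

  static volatile sig_atomic_t handlingSignal = 0;

  static void FaultHandler(int sig, siginfo_t *info, void *ctx) {
      if (handlingSignal) {
          // We faulted while already inside the handler: give up and let the
          // crash reporter / OS see the crash.
          signal(sig, SIG_DFL);
          raise(sig);
          return;
      }
      handlingSignal = 1;
      // ... handle the access violation (unprotect code, patch backedges) ...
      handlingSignal = 0;
  }

  void InstallHandler() {
      struct sigaction act;
      memset(&act, 0, sizeof(act));
      act.sa_sigaction = &FaultHandler;
      sigemptyset(&act.sa_mask);
      // SA_NODEFER: don't block SIGSEGV while the handler runs, so a fault
      // inside the handler re-enters it and hits the guard above.
      act.sa_flags = SA_SIGINFO | SA_NODEFER;
      sigaction(SIGSEGV, &act, nullptr);
  }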

::: js/src/ion/AsmJS.h
@@ +182,5 @@
>  }
>  #endif
>  
> +bool
> +EnsureAsmJSSignalHandlersInstalled(JSRuntime *rt);

Can you move this to ion/AsmJSSignalHandlers.h and add a comment?

::: js/src/ion/Ion.cpp
@@ +190,5 @@
>  
>  IonRuntime::~IonRuntime()
>  {
>      js_delete(functionWrappers_);
> +    js_delete(ionAlloc_);

Might be nice to put a comment here to the effect "~IonRuntime is only called from ~JSRuntime so we don't have to (and cannot) AutoLockForOperationCallback".

@@ +316,5 @@
> +IonRuntime::createIonAlloc(JSContext *cx)
> +{
> +    JSC::AllocationBehavior randomize =
> +        cx->runtime()->jitHardening ? JSC::AllocationCanRandomize : JSC::AllocationDeterministic;
> +    ionAlloc_ = js_new<JSC::ExecutableAllocator>(randomize);

It is asserted by the callers, but to make it consistent with the IonRuntime:: calls below, can you assert currentThreadOwnsOperationCallbackLock()?

@@ +338,5 @@
> +}
> +
> +bool
> +IonRuntime::handleAccessViolation(JSRuntime *rt, void *faultingAddress)
> +{

Can you rt->assertValidThread() here?  (I'm really thinking that method would be better named assertOnOwnerThread...)  Also it'd be nice to hoist the JS_ASSERT(!rt->currentThreadOwnsOperationCallbackLock()) up here so it reads like a precondition not like something contingent on the following test.

@@ +360,5 @@
> +
> +void
> +IonRuntime::ensureIonCodeAccessible(JSRuntime *rt)
> +{
> +    JS_ASSERT(rt->currentThreadOwnsOperationCallbackLock());

Can you also rt->assertValidThread() here too?  It seems to hold from the callers and while it may not be technically necessary, it makes it easier to reason about the whole process as funneling in from all threads to the main thread on which all the un-protecting/patching happens.

@@ +385,5 @@
> +
> +void
> +IonRuntime::patchIonBackedges(BackedgeTarget target)
> +{
> +    // Patch all loop backedges in Ion code so that they either jump to the

Similarly, could you rt->assertValidThread() here too?

::: js/src/ion/IonCompartment.h
@@ +263,5 @@
> +    bool signalHandlersInstalled() {
> +        return signalHandlersInstalled_;
> +    }
> +    bool ionCodeProtected() {
> +        return ionCodeProtected_;

Could you also JS_ASSERT(rt->currentThreadOwnsOperationCallbackLock()); for these two and the two below as well?

::: js/src/vm/Runtime.cpp
@@ +69,5 @@
>  
> +    {
> +        AutoLockForOperationCallback lock(this);
> +        JSC::ExecutableAllocator *ionAlloc = ionRuntime() ? ionRuntime()->ionAlloc(this) : NULL;
> +        if (ionAlloc)

if (JSC::ExecutableAllocator *ionAlloc = ...)

@@ +116,3 @@
>      TriggerOperationCallbackForAsmJSCode(this);
> +
> +    if (ionRuntime()) {

For symmetry, can you hoist this block of code into a TriggerOperationCallbackForIonCode in Ion.cpp.

@@ +125,5 @@
> +            // patching the backedges, to avoid deadlocking inside the signal
> +            // handler.
> +            assertValidThread();
> +            ionRuntime()->ensureIonCodeAccessible(this);
> +            ionRuntime()->patchIonBackedges(ion::IonRuntime::BackedgeInterruptCheck);

ensureIonCodeAccessible() will call patchIonBackedges, so I think this call is superfluous.
Attached patch patch (f24d9d59a689) (deleted) — Splinter Review
Rebase and update wrt patch comments.  I tested this (well, functionally identical changes) on OS X, Linux and Windows and was able to get the crash reporter showing on all platforms when crashing inside JSRuntime::handleAccessViolation.  Windows will deliver a signal to the handler when crashing inside that handler, as with SA_NODEFER.
Attachment #784562 - Attachment is obsolete: true
Attachment #784562 - Flags: review?(luke)
Attachment #784562 - Flags: review?(jdemooij)
Attachment #785789 - Flags: review?(luke)
Attachment #785789 - Flags: review?(jdemooij)
(In reply to Brian Hackett (:bhackett) from comment #30)
> Windows will deliver a signal to the
> handler when crashing inside that handler, as with SA_NODEFER.

phew, sweet!
Comment on attachment 785789 [details] [diff] [review]
patch (f24d9d59a689)

Review of attachment 785789 [details] [diff] [review]:
-----------------------------------------------------------------

Great, thanks for addressing all my requests.  My last request would have been to write a test for crash-during-handler.  Unfortunately, the current crashreporter unit tests that we'd be adding to initialize the breakpad hooks lazily, and those hooks clobber the JS handlers (but only in xpcshell; the browser initializes them eagerly, but there are no crashreporter mochitests).  I filed bug 842728 on this around the Odin landing (bumping again now).  When it gets fixed, I'll write some tests.

::: js/src/ion/AsmJSSignalHandlers.cpp
@@ +441,5 @@
> +    void *faultingAddress = (void*)record->ExceptionInformation[1];
> +
> +    JSRuntime *rt = RuntimeForCurrentThread();
> +    if (!rt || rt->handlingSignal)
> +        return false;

Can you write a comment in front of this block explaining the situation we're guarding against here, and paste it above the other 2 copies below.

@@ +1006,5 @@
>  # else  // assume Unix
>      struct sigaction sigAction;
>      sigAction.sa_sigaction = &AsmJSFaultHandler;
>      sigemptyset(&sigAction.sa_mask);
> +    sigAction.sa_flags = SA_SIGINFO | SA_NODEFER;

Could you add
  // see handlingSignal comment above
to the end of this line?
Attachment #785789 - Flags: review?(luke) → review+
Comment on attachment 785789 [details] [diff] [review]
patch (f24d9d59a689)

Review of attachment 785789 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. It's a good win on Octane-crypto and some Kraken tests with tight loops. I will r+ when these comments are addressed; leaving r? to avoid confusion with multiple reviewers.

::: js/src/ion/CodeGenerator.cpp
@@ +5643,5 @@
>      ionScript->copyScriptEntries(graph.mir().scripts());
>      if (callTargets.length() > 0)
>          ionScript->copyCallTargetEntries(callTargets.begin());
>  
> +    for (size_t i = 0; i < patchableBackedges_.length(); i++) {

Can we add IonScript::copyPatchableBackedges for this, like the other copy* methods?

::: js/src/ion/Ion.cpp
@@ +281,5 @@
>          if (!generateVMWrapper(cx, *fun))
>              return false;
>      }
>  
> +    signalHandlersInstalled_ = EnsureAsmJSSignalHandlersInstalled(cx->runtime());

Just curious, is it possible for signalHandlersInstalled_ to ever be false on one of our tier-1 platforms?

::: js/src/ion/IonLinker.h
@@ +77,5 @@
> +        return newCode(cx, cx->compartment()->ionCompartment()->execAlloc(), kind);
> +    }
> +
> +    IonCode *newCodeForIonScript(JSContext *cx)
> +    {

Style nit: { on previous line.

::: js/src/ion/shared/CodeGenerator-shared.cpp
@@ +679,5 @@
> +typedef bool (*InterruptCheckFn)(JSContext *);
> +const VMFunction InterruptCheckInfo = FunctionInfo<InterruptCheckFn>(InterruptCheck);
> +
> +Label *
> +CodeGeneratorShared::isBackedgeWithImplicitCheck(MBasicBlock *mir)

The name of this method sounds like it returns a bool, maybe labelOfBackedgeWithImplicitCheck or something.

@@ +684,5 @@
> +{
> +    // If this is a loop backedge to a loop header with an implicit interrupt
> +    // check, use a patchable jump. Skip this search if compiling without a
> +    // script for asm.js, as there will be no interrupt check instruction.
> +    if (mir->info().script() && mir->isLoopHeader() && mir->id() <= current->mir()->id()) {

!mir->compilingAsmJS(). Also the last part could be more robust I think:

if (!mir->compilingAsmJS() && current->mir()->loopHeaderOfBackedge() == mir) {
    ...
}

@@ +717,5 @@
> +        CodeOffsetJump backedge = masm.jumpWithPatch(&rejoin);
> +        masm.bind(&rejoin);
> +
> +        if (!patchableBackedges_.append(PatchableBackedgeInfo(backedge, mir->lir()->label(), oolEntry)))
> +            MOZ_CRASH();

Why the MOZ_CRASH instead of returning false?

@@ +734,5 @@
> +        CodeOffsetJump backedge = masm.jumpWithPatch(&rejoin, cond);
> +        masm.bind(&rejoin);
> +
> +        if (!patchableBackedges_.append(PatchableBackedgeInfo(backedge, mir->lir()->label(), oolEntry)))
> +            MOZ_CRASH();

And here.
(In reply to Jan de Mooij [:jandem] from comment #33)
> if (!mir->compilingAsmJS() && current->mir()->loopHeaderOfBackedge() == mir)

I forgot the current->mir()->isLoopBackedge() check.
loopHeaderOfBackedge() and related functions don't work reliably after critical edge unsplitting --- a given block can have multiple incoming or outgoing backedges without a reliable placement in the predecessors/successors list.  I could add a method on MBasicBlock based on the block ids but it wouldn't work before the graph was in rpo, so I went with the id comparison inline.  I'll add a comment for what's going on.

I added the MOZ_CRASH() calls because jumpToBlock is infallible, and making it fallible makes a bunch of other stuff fallible, and the resulting code is both uglier and more error prone.  Since many parts of the Ion compiler (unchecked new MWhatever()) as well as the entire browser outside of JS will already crash on OOM, I don't think making jumpToBlock fallible has any tangible benefit.

Since your other comments are just nits, if these are OK do you really need to see a new version of the patch?
Comment on attachment 785789 [details] [diff] [review]
patch (f24d9d59a689)

Review of attachment 785789 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Brian Hackett (:bhackett) from comment #35)
> Since your other comments are just nits, if these are OK do you really need
> to see a new version of the patch?

With the other comments addressed this should be good to go.
Attachment #785789 - Flags: review?(jdemooij) → review+
Blocks: 902506
I had to disable this for ARM as a bunch of tests were crashing or timing out and I don't have a way to debug the failures.  Hope to resolve that fairly soon.

https://hg.mozilla.org/integration/mozilla-inbound/rev/9c90bda44992
https://hg.mozilla.org/mozilla-central/rev/9c90bda44992
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Depends on: 906171
Depends on: 916753
Depends on: 944612
Blocks: 964258