Closed Bug 1563335 Opened 5 years ago Closed 5 years ago

Implement switch to execute SharedArrayBuffer threads sequentially rather than in parallel

Categories

(Core :: DOM: Content Processes, task, P2)

task

Tracking


RESOLVED FIXED
mozilla76
Tracking Status
firefox76 --- fixed

People

(Reporter: annevk, Assigned: bas.schouten)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

In the event that SharedArrayBuffer does turn out to be problematic we want to have a failsafe whereby the threads that can use it with postMessage() (see bug 1562663) are no longer executed in parallel.

This will be disabled by default, but we should be able to flip it in the event of an emergency.

Type: defect → task

Apparently, part of the idea here that I forgot about is that we'd enable this on background windows to prevent those from using too many resources and to keep the code tested. Ideally it'd only ever be enabled there, as it'll cause jank if we had to enable it by default.

Type: task → defect

The priority flag is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)

Luke question

Flags: needinfo?(jmathies) → needinfo?(luke)

This is P2, same as all the other "re-enable" SharedArrayBuffer bugs.

Flags: needinfo?(luke)
Priority: -- → P2
Assignee: nobody → ytausky
Type: defect → task
Assignee: ytausky → ttung

Will work on this once I provide a patch in bug 1562667.

I spoke with Luke; let me put my understanding below:

The goal is that when a SAB is alive and is postMessage()'d from one worker/window to another worker/window, we want the threads of these globals to be executed in sequence.

  • To achieve this, we might need the WatchDog thread to coordinate when we start executing tasks on the main thread and worker threads in sequence and when we stop.
  • To avoid potential deadlocks, we might need to use async interrupts (and the control runnable on workers).
  • To avoid starvation on the main thread, we might need to monitor the number of pending tasks on the main thread and prevent it from exceeding a certain number.
  • We might need APIs in JS to let DOM know the number of existing SABs, and maybe some callback to let DOM know when to stop executing in parallel.
  • All of these are protected under a pref.

Luke, would you mind pointing out if there is something missing or maybe some items are not correct? Thanks!

Flags: needinfo?(luke)

I'm guessing this is what you mean by "avoid the potential deadlock", but just to be completely clear: since worker threads can block using Atomics.wait() there needs to be some kind of change to Atomics.wait() to make it perform (the equivalent of) a directed yield back to a scheduler of some sort so that other threads can be allowed to run and thus allow the waiting thread to be notified.

@ttung Thanks, that's a good list. A few more refinements to the bullets (pointwise):

  • It doesn't have to be the WatchDog thread, but since the WatchDog thread is already being notified when JS scripts start and stop and async-interrupting them when they take too long, this seems like a natural point to extend to avoid duplication.
  • In addition to using async interrupt to stop all long-running JS, we probably also need to integrate with nested event loops (e.g., for sync XHR and alert()), which can block an unbounded amount of time waiting for clicks or packets.
  • The "starvation on the main thread" bit I think really just means that if there is a queue of threads waiting to run JS, the main thread probably gets more-highly prioritized (e.g., it always gets to jump to be #2 in line)
  • The APIs in JS are specifically "SpiderMonkey JS APIs used by Gecko code", not functions-exposed-to-JavaScript-code
  • The pref is the "disable high-resolution time" pref and would be enabled only in a zero-day situation in release FF, but (probably) all the time in Tor.

@lth Async interrupts are already integrated with Atomics.wait(). My hope is that, since blocking for an unbounded time is already a problem on both workers (for worker.terminate()) and main thread (for "stop script"), there should be a small number of these cases that are already carefully integrated.

Flags: needinfo?(luke)

(In reply to Luke Wagner [:luke] from comment #8)

@lth Async interrupts are already integrated with Atomics.wait(). My hope is that, since blocking for an unbounded time is already a problem on both workers (for worker.terminate()) and main thread (for "stop script"), there should be a small number of these cases that are already carefully integrated.

I wonder if we're not talking about two different things. I'm thinking not about indefinite waits but about normal uses of wait/notify to manage critical sections, locks, condition variables. A normal use of wait currently entails blocking on a lock deep in the runtime. At that point, for a green thread, control has to pass to some management routine to allow something else to run.

(Now that you mention it, moving everything onto a single thread will have terrible consequences for anything using spinlocks, but I'm not sure what we can do about that.)

Ah hah, I see; great point. So technically, I think the hazard you're talking about would only significantly hurt performance, not completely deadlock (specifically: thread X holds a mutex and then gets interrupted; thread Y Atomics.wait()s on that mutex and wastes the entire scheduling quantum waiting before thread X gets scheduled again and gets to finish and release the mutex).

But of course we should avoid this perf problem. I think a proper solution would be that, in the guts of Atomics.wait() (and also the guts of a nested event loop), we explicitly send the "I'm not running JS" event (the same one sent to the watchdog thread when a JS execution finishes) and then re-request to run JS after the wait is ready to resume (again, using the same request mentioned above for when we want to start executing a JS task).
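Roughly what that could look like in the futex wait, as a sketch (AutoYieldJSExecution, ExecutionMonitor, and gExecutionMonitor are hypothetical names, not anything that exists in the tree):

    // Hypothetical monitor interface; the real coordinator might be the
    // watchdog thread discussed above.
    struct ExecutionMonitor {
      void NotifyStoppedRunningJS();    // "this thread is no longer running JS"
      void RequestPermissionToRunJS();  // blocks until this thread may run JS again
    };

    extern ExecutionMonitor* gExecutionMonitor;

    // RAII guard placed around the blocking part of Atomics.wait(): yield the
    // execution slot on entry, reclaim it on exit (i.e. when the wait is ready
    // to resume).
    class AutoYieldJSExecution {
     public:
      AutoYieldJSExecution() { gExecutionMonitor->NotifyStoppedRunningJS(); }
      ~AutoYieldJSExecution() { gExecutionMonitor->RequestPermissionToRunJS(); }
    };

    // Inside the guts of Atomics.wait():
    //   AutoYieldJSExecution yield;
    //   /* block on the futex condvar as today */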

So long as the interrupt properly breaks the wait and the wait is properly restarted, there should not be a deadlock hazard. Performance on a typical producer-consumer problem would be so poor that it might look like deadlock though ;-)

About correctness and deadlocks, though... One problem we're seeing in the form of intermittents in CI is that programs can be written to be brittle wrt timing, because writing them the "right" way is laborious / hard / impractical. These programs never fail in real situations, because "large enough" quanta are used for timeouts and so on. But they fail on horribly overloaded systems where waits are much longer than normal; this is not unlike what you see stepping through timing-sensitive code in a debugger. On a system that is very slow as a result of the problem described above, I would expect to see an increasing failure rate in practice, and it will appear that our implementation is buggy. (It's another argument to get wait/notify right.)

Spinlocks are related obviously; normally they're an optimization, but in a green-threads case a spinlock will almost never exit early, so anything using a spinlock to speed up a normal lock will suddenly experience a slowdown in its locking (but again, should not deadlock). Not sure how to approach that; we can perhaps recognize a spinlock in the compiler, but it'll be brittle. For a while we had a proposal for Atomics.pause() which would have provided the necessary signal to the compiler, but Filip shot it down.

(Battles I wish would have gone the other way: Atomics.isLockFree is in and Atomics.pause is out.)

Yes, spinlocks will suck for sure. (I think the abovementioned wait tweaks would make producer/consumer do "ok" though.) It seems like the ideal lock impl (for both this case but also any single-core machine) would use a single inline compareAndExchange (to grab the lock quickly in the uncontended case), and then immediately call into Atomics.wait() on failure. A smart lock impl might use navigator.hardwareConcurrency to make this decision (whether to spin a bit before waiting), which suggests that, in "oh crap" mode, we should have navigator.hardwareConcurrency return 1. This would probably be good for a number of other reasons since it reflects reality.
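For concreteness, the lock shape described here might look like the following (a C++ analogue used only to illustrate the pattern; the real thing would be JS or wasm with Atomics.compareExchange/wait/notify, and all names here are made up):

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    class YieldingLock {
      std::atomic<int> mState{0};        // 0 = free, 1 = held
      std::mutex mMutex;                 // backs the condvar; stands in for Atomics.wait
      std::condition_variable mCondVar;  // stands in for Atomics.notify

     public:
      void Lock() {
        int expected = 0;
        // Fast path: a single compare-and-exchange grabs the lock when uncontended.
        if (mState.compare_exchange_strong(expected, 1)) {
          return;
        }
        // Contended path: do not spin; block immediately so a sequentialized
        // scheduler can hand the time slice to the lock holder.
        std::unique_lock<std::mutex> guard(mMutex);
        mCondVar.wait(guard, [this] {
          int expectedFree = 0;
          return mState.compare_exchange_strong(expectedFree, 1);
        });
      }

      void Unlock() {
        mState.store(0);
        std::lock_guard<std::mutex> guard(mMutex);
        mCondVar.notify_one();
      }
    };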

First of all, thanks for the feedback and reminder from Lars and Luke!

Luke, I've listed my plan below; please let me know if anything is confusing or looks wrong to you.
I intend to work on a simple version first, then work on avoiding the potential issues (going through the comments above) and improving performance.

For the simple version:

  • Implement a pref "dom.postMessage.sharedArrayBuffer_disable_high_resolution_time" (off by default; see the sketch after this list)
  • Start "in sequence" timing: when deserialization finds a SAB or a shared wasm memory object, I will ask the monitor thread to start coordinating.
  • End "in sequence" timing: when the SAB object is freed. I'm not sure where that happens in the JS engine, but I will file a bug for it.

Questions:

we probably also need to integrate with nested event loops (e.g., for sync XHR and alert()), which can block an unbounded amount of time waiting for clicks or packets.

  • Could you elaborate more on how to integrate with nested event loops? Does that mean, for instance, while we are doing a sync XHR, we need to send a notification to the monitor?
Flags: needinfo?(luke)

Does the JS engine know which JSContexts are sharing memory? If not, I might need to implement a hashtable on the monitor thread so that the monitor thread knows which JSContexts need to be interrupted.

The simplest option would be for there to simply be a process-wide bool "are there any live, shared SABs?" and, when it was true, all worker/window JSContexts got sequentialized. This is overly conservative, but I think it might be simpler than the alternative of maintaining a more-precise list of JSContexts.
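The overly-conservative option could be as simple as a process-wide counter (a sketch; none of these names exist in the tree):

    #include <atomic>
    #include <cstddef>

    static std::atomic<size_t> gLiveSharedSABs{0};

    // Bumped when a SAB (or shared wasm memory) is successfully deserialized
    // into another agent via postMessage(); decremented when the last
    // reference to it dies.
    void NoteSharedSABCreated() { gLiveSharedSABs++; }
    void NoteSharedSABDestroyed() { gLiveSharedSABs--; }

    // While this is true, the monitor sequentializes *all* worker/window
    // JSContexts rather than tracking exactly which ones share memory.
    bool ShouldSequentializeAllContexts() { return gLiveSharedSABs > 0; }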

I'm not sure how to resume a script after an interruption. Would you mind shedding some light on that? Thanks!

First call JS_AddInterruptCallback() to register the monitor's callback. Then call JS_RequestInterruptCallback() from the monitor thread (when it's time to interrupt), then, before too long, the callback registered earlier will be called and you can now block the thread by waiting on some condvar. The monitor's callback would always return true, to avoid aborting script execution.
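In code, that flow is roughly as follows (JS_AddInterruptCallback and JS_RequestInterruptCallback are the real JSAPI entry points named above; WaitUntilMonitorGrantsExecution and the registration helpers are hypothetical):

    #include "jsapi.h"  // JS_AddInterruptCallback, JS_RequestInterruptCallback

    // Hypothetical: blocks on a condvar until the monitor thread hands this
    // context the execution slot again.
    void WaitUntilMonitorGrantsExecution(JSContext* aCx);

    // Registered once per worker/window JSContext; runs on that context's own
    // thread "before too long" after the interrupt is requested.
    static bool SequentializeInterruptCallback(JSContext* aCx) {
      WaitUntilMonitorGrantsExecution(aCx);
      // Always return true so the interrupt never aborts script execution.
      return true;
    }

    void RegisterWithMonitor(JSContext* aCx) {
      JS_AddInterruptCallback(aCx, SequentializeInterruptCallback);
    }

    // Called from the monitor thread when it's time to pause this context.
    void InterruptContext(JSContext* aCx) {
      JS_RequestInterruptCallback(aCx);
    }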

Could you elaborate more on how to integrate with nested event loops?

I think the basic rule is: before a worker/window thread waits for an unbounded time on a condvar or lock, it should notify the monitor thread that it's stopping JS execution and then, when it's done waiting on the condvar/lock, it should request permission from the monitor thread to resume executing JS. This way, when thread A blocks, the time slice is immediately yielded to some other thread which may be holding the lock that thread A is waiting on, avoiding the perf cliff lth identified above.
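A sketch of what the monitor's side of this could look like: one execution slot, blocked requests parked on a condvar, and the main thread allowed to skip ahead per the starvation bullet in comment 8. Every name here is hypothetical.

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    class SequentialExecutionMonitor {
      std::mutex mMutex;
      std::condition_variable mCondVar;
      bool mSlotTaken = false;
      size_t mWaitingMainThread = 0;  // lets the main thread jump the queue

     public:
      void RequestPermissionToRunJS(bool aIsMainThread) {
        std::unique_lock<std::mutex> lock(mMutex);
        if (aIsMainThread) {
          ++mWaitingMainThread;
        }
        mCondVar.wait(lock, [&] {
          // Non-main threads yield to a waiting main thread.
          return !mSlotTaken && (aIsMainThread || mWaitingMainThread == 0);
        });
        if (aIsMainThread) {
          --mWaitingMainThread;
        }
        mSlotTaken = true;
      }

      void NotifyStoppedRunningJS() {
        {
          std::lock_guard<std::mutex> lock(mMutex);
          mSlotTaken = false;
        }
        // Wake everyone so the predicate above can pick the next runner.
        mCondVar.notify_all();
      }
    };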

Flags: needinfo?(luke)

(In reply to Luke Wagner [:luke] from comment #14)
Thanks! I will update some initial patches tomorrow.

Assignee: ttung → yaron.tausky
Status: NEW → ASSIGNED
Assignee: yaron.tausky → ytausky
Assignee: ytausky → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(bas)
Depends on: 1599659

Looks like I'll be doing this.

Assignee: nobody → bas
Status: NEW → ASSIGNED
Flags: needinfo?(bas)

One outstanding concern may be how we keep this tested in relation to the main thread throttling, since that will only apply to the 'Oh Crap' situation and none of the other use cases. During development, the majority of test failures were related to deadlocks between the main thread and a waiting worker. One 'obvious' choice would be to run our entire test suite with a pref that causes all contexts to be throttled, as I did for my manual testing, but from a testing-resource perspective that's unlikely to be a realistic approach.

What do you think Luke?

Flags: needinfo?(luke)

(In reply to Bas Schouten (:bas.schouten) from comment #18)

One 'obvious' choice would be to run our entire test suite with a pref that causes all contexts to be throttled, as I did for my manual testing, but from a testing resource perspective that's unlikely to be a realistic approach.

Do you need full OS coverage with debug and opt? And what's the entire test suite - is it all tests (e.g. ./mach try -u all or just a few suites)? Adding just a few suites for linux64 debug does not seem prohibitive (although you should double check with jmaher/ahal...)

Are there any test runs that we do less frequently than every commit? (I'm way out of date on this aspect of testing.) I expect we don't need each OS or all tests, so the cost for doing a linux64 debug run less frequently than every commit would seem like a reasonable cost.

(Later, if we can use this mechanism for a performance throttling feature, then perhaps we could justify something more often/exhaustive.)

Flags: needinfo?(luke)

(In reply to Luke Wagner [:luke] from comment #20)

Are there any test runs that we do less frequently than every commit? (I'm way out of date on this aspect of testing.) I expect we don't need each OS or all tests, so the cost for doing a linux64 debug run less frequently than every commit would seem like a reasonable cost.

(Later, if we can use this mechanism for a performance throttling feature, then perhaps we could justify something more often/exhaustive.)

That's in the same direction as I was thinking.

Depends on: 1609916

Sorry about the delay with the review. I'll try to find time tomorrow.

One thing that would be good to add is an AutoRelinquishClearance inside FutexThread::wait() so that, when a thread goes to sleep waiting on a condvar, we allow other threads to run. Otherwise, we'll fall into the pathological slow behavior Lars mentions above (comment 9).

Blocks: 1617872

(In reply to Luke Wagner [:luke] from comment #23)

One thing that would be good to add is an AutoRelinquishClearance inside FutexThread::wait() so that, when a thread goes to sleep waiting on a condvar, we allow other threads to run. Otherwise, we'll fall into the pathological slow behavior Lars mentions above (comment 9).

I looked at this... We'd need a callback mechanism into Gecko if we wanted to do this. I wonder if we want to do that at a later time or make that a blocker for this work?

For any pthreaded wasm app with any contention at all, I expect this FutexThread::wait() support will be the difference between decent as-if-single-core performance and something much worse. I also think it would be reasonable for Tor to enable SAB by having single-threaded mode always set (thus improving their webcompat w/o ever providing a high-res timer), so there is even a reason to care in the non-oh-crap scenario.

I don't think this technically blocks landing, since there's a clear path to implementing this. We should get it landed "before too long" though, given the above. But it also seems like this change wouldn't take long -- mostly just adding a pair of runtime callbacks and a JS_SetBlockingCallbacks(JSContext*, JSBeforeBlockCallback, JSAfterBlockCallback) to set them -- would it be possible to just do that instead of filing a follow-up bug and worrying about getting it fixed before too long?
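To make that proposal concrete, here is roughly what the pair of callbacks could look like. Note that JS_SetBlockingCallbacks and the callback typedefs do not exist yet; they are exactly the JSAPI additions being suggested here, and the Gecko-side bodies are placeholders.

    struct JSContext;  // provided by the real JSAPI headers

    using JSBeforeBlockCallback = void (*)(JSContext* aCx);
    using JSAfterBlockCallback = void (*)(JSContext* aCx);

    // Proposed JSAPI addition, called by FutexThread::wait() around its condvar
    // wait so the embedding can release and later re-acquire its execution slot.
    void JS_SetBlockingCallbacks(JSContext* aCx,
                                 JSBeforeBlockCallback aBeforeBlock,
                                 JSAfterBlockCallback aAfterBlock);

    // Gecko side (placeholder bodies), mirroring the relinquish / re-request
    // behavior used for nested event loops:
    static void BeforeBlock(JSContext* aCx) {
      // Tell the throttling machinery this thread is about to sleep in
      // Atomics.wait(), so another thread may be allowed to run.
    }

    static void AfterBlock(JSContext* aCx) {
      // Block until the throttling machinery grants this thread permission to
      // resume running JS.
    }

    void InstallBlockingCallbacks(JSContext* aCx) {
      JS_SetBlockingCallbacks(aCx, BeforeBlock, AfterBlock);
    }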

I would be really reluctant to ship the single-threading feature without the necessary work to switch threads when we are about to block in a lock, otherwise performance can be horrible and my gut feeling is that this will affect effectively all the programs that cause the single-threading feature to be enabled. As Luke says, this is only an issue when there is contention, but we should expect contention in most multi-threaded programs.

I'm happy to help with the infrastructure for the callbacks, of course.

Depends on: 1619641

I've tested things locally and everything seems to be working. However, the problem of CI testing is obviously not yet solved.

Pushed by bschouten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/94ab655aae4f Part 1: Implement mechanism to throttle JS execution. r=smaug,asuth
Pushed by bschouten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/07d5ad6378e0 Part 2: Add a pref for serializing all JS threads that access shared SABs. r=smaug
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76