Closed Bug 384412 (js-unwind) Opened 17 years ago Closed 12 years ago

Allow suspending JavaScript interpreter by removing use of C stack

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
blocking2.0 --- -
status2.0 --- wanted

People

(Reporter: joachim.kuebart, Assigned: joachim.kuebart)

References

Details

Attachments

(3 files, 18 obsolete files)

(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en) AppleWebKit/522.11.3 (KHTML, like Gecko) Version/3.0 Safari/522.11.3
Build Identifier: SpiderMonkey engine (1.5 - present)

A new API is proposed to allow an embedding application to suspend the
JavaScript interpreter from any native callback. This process is referred
to as "blocking" in the following. When a native callbacks blocks, the
JavaScript interpreter returns to the application immediately and without
losing the current execution state. The application can then perform any
processing of its own and may resume the interpreter at a later time. To
the JavaScript code, blocking native functions are indistinguishible from
normal ones.


Reproducible: Always

Steps to Reproduce:
SpiderMonkey embeddings typically run entire blocks of JavaScript code,
waiting for each to return before continuing. Native calls that are
provided to the script for communication need to complete all their
processing before returning to the JavaScript interpreter. Returning to the
interpreter from a callback cannot be avoided and the application cannot
continue processing before the JavaScript code has finished even if it does
not wish to run the rest of the JavaScript code. This sometimes requires
both the JavaScript blocks as well as the native functions to be expressed
as short-running callbacks.




Please see the attached README for a detailed explanation of the proposed API.

PLEASE NOTE: The attached patch is against the JavaScript 1.5 release source.
If this proposal is accepted, we will produce patches for the latest CVS version
and release them under the same conditions as the SpiderMonkey engine.
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
PLEASE NOTE:

  These patches are against the JavaScript 1.5 release source. If this
  proposal is accepted, we will produce patches for the latest CVS version
  and release them under the same conditions as the SpiderMonkey engine.
(In reply to comment #1)
> Created an attachment (id=268341) [details]
> Document describing new API, giving use example and detailing code changes
> 

How this is supposed to deal with native C frames on the stack?  I.e. many if not most native C functions can trigger interpreter nesting. So how this proposal is supposed to deal with this case?
(In reply to comment #3)
> How this is supposed to deal with native C frames on the stack?  I.e. many if
> not most native C functions can trigger interpreter nesting. So how this
> proposal is supposed to deal with this case?

There is no native C frame handling.

Native C frames have been removed where easily possible from the interpreter itself,
but currently remain in several places such as constructors and regexp replace
expressions. In these contexts, suspension is not possible and will result in an
exception. More work can be done in this area to further reduce interpreter
nesting.

An embedding application that uses stack unwinding is expected not to introduce
interpreter nesting within its callbacks. It should instead use new API calls to handle
nested calls itself.

(In reply to comment #3)
> (In reply to comment #1)
> > Created an attachment (id=268341) [details] [details]
> > Document describing new API, giving use example and detailing code changes
> > 
> 
> How this is supposed to deal with native C frames on the stack?  I.e. many if
> not most native C functions can trigger interpreter nesting. So how this
> proposal is supposed to deal with this case?

Note also that I am not arguing against the proposal. Rather I think it should be extended to specify exactly when the script state can be captured and to provide a way for a native function to capture its state so at least the standard library in SpiderMonkey can be made continuation-friendly.

Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #5)
>
> Note also that I am not arguing against the proposal. Rather I think it should
> be extended to specify exactly when the script state can be captured and to
> provide a way for a native function to capture its state so at least the
> standard library in SpiderMonkey can be made continuation-friendly.

You are right, there should be an API for a native call to find out whether it was called from a context that allows blocking (I will add this), or to deal more gracefully if blocking occurs inside a nested interpreter call (I need to think about this).

You can have a look at the changes to fun_apply() and fun_call() to see how standard library functions can be made continuation-aware.

(In reply to comment #4)
> Native C frames have been removed where easily possible from the interpreter
> itself,
> but currently remain in several places such as constructors and regexp replace
> expressions. 

The problem is that a script can define a getter or setter or toString and friends conversion function. Thus any native function that query for a property or convert an argument to string can cause interpreter nesting.

> In these contexts, suspension is not possible and will result in
> an
> exception.

I think a better solution would be to have a flag that state that from now on the interpreter can not be nested so an embedding that relies on continuations can detect the problematic JS code ASAP and not when it is written, grown and some time later is suspended.

> More work can be done in this area to further reduce interpreter
> nesting.

In cases like eval and Function.prototype.call the solution is to allow tail calls so they can delegate the invocation to the interpreter. In other cases when the argument is converted to string or other primitive value a way to resolve this would be to mark the function somehow that it requires string as argument so the interpreter can call the conversion code itself. 

But in generic case a mechanism to allow to capture native is required.
Joachim, some comments:

* It's great to have a patch (and design doc), even if a bit of a "bolt from the blue" (sky, if you know what I mean ;-).

* Opera uses explicit state saving and avoids nesting event loops, while yielding to the main event loop, from its JS engine (or did, my information may be out of date). This is an interesting approach, and I'm glad to look into it with you, Igor, and others (we need to take account of the coming Tamarin / SpiderMonkey integration, so I am cc'ing Ed Smith here).

* I would recommend not using JS_HAS_UNWIND or jsconfig.h. If we need an #ifdef it should be a separate macro, a la JS_THREADSAFE. But I would prefer to simply add new APIs and make old ones work as much as possible. Where impossible, it would be better to have assertions and exceptions (not silent null or false returns without an error report).

* We absolutely need a patch against current source. If you are willing to use Mercurial (hg, see http://www.selenic.com/mercurial/), then you could work from the new http://hg.mozilla.org/ server, the cvs-trunk-mirror (updated every six hours or so from cvs-mirror.mozilla.org) repository, or the mozilla-central (updated less often by hand, currently around the state of cvs.mozilla.org's trunk at 1.9a4 but due to be updated).

If you are ok using CVS, then cvs-mirror.m.o is ok too. I mention hg.mozilla.org because we will be doing advanced "ActionMonkey" (Tamarin + SpiderMonkey) work there, and hg has better branching and merging than CVS.

Mozilla is more collaborative than formal about accepting proposals, but since Igor and I have expressed interest in this approach, I hope you will take the next step and update to current SpiderMonkey. We simply cannot work on an old, unmaintained version.

/be
Alias: js-unwind
Cc'ing Edwin. My mention of Opera was an indirect way of acknowledging that they can service events better than we can in the presence of scripts that do not complete "soon". This is one aspect of responsiveness that we Mozillans want to do better at.

Also, I should mention that we want to implement proper tail calls soon, for JS2 (ES4, http://ecmascript-lang.org/, http://developer.mozilla.org/es4/) if not for JS1.8 (see bug js1.8) even sooner.

This implications of new API entry points and a new unwinding mode of dispatching native methods will have far-reaching effects on code such as the mozilla/caps code that walks the stack in order to make security judgments. Cc'ing mrbkap as well.

/be
Blocks: 384323
Added JS_SetNativeTail() call and section on Limitations.
Attachment #268341 - Attachment is obsolete: true
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
Added JS_SetNativeTail()
Attachment #268342 - Attachment is obsolete: true
(In reply to comment #7)
> (In reply to comment #4)
>
> The problem is that a script can define a getter or setter or toString and
> friends conversion function. Thus any native function that query for a property
> or convert an argument to string can cause interpreter nesting.

I think these cases can be documented as limitations, particularly as they can only arise in certain well-defined situations. I think avoiding calls that may potentially block in property getters/setters, toString() methods and the like is not a drastic constraint.

> > In these contexts, suspension is not possible and will result in
> > an
> > exception.
>
> I think a better solution would be to have a flag that state that from now on
> the interpreter can not be nested so an embedding that relies on continuations
> can detect the problematic JS code ASAP and not when it is written, grown and
> some time later is suspended.

I'm not sure I understand what you mean by this. For example, if native functions were flagged as potentially blocking, this could still only be detected at run-time after the called function has been resolved. On the other hand, disallowing interpreter nesting completely would be unnecessarily limiting as it would also prevent trivial getters/setters etc. I agree that an Exception does not feel like the ideal solution, but I can't think of another solution that would allow similar flexibility.

> But in generic case a mechanism to allow to capture native is required.

I have now added a facility for native calls to designate a "tail function" to run when a nested frame returns. Therefore, even native calls with non-trivial tails can now avoid nesting the interpreter (demonstrated currently by obj_eval()/obj_eval_tail()), without the need for fixed flags and special. I think this allows a great deal of flexibility, without the considerable complexity introduced by handling native C frames.
(In reply to comment #8)
> Joachim, some comments:
> * I would recommend not using JS_HAS_UNWIND or jsconfig.h. If we need an #ifdef
> it should be a separate macro, a la JS_THREADSAFE. But I would prefer to simply
> add new APIs and make old ones work as much as possible. Where impossible, it
> would be better to have assertions and exceptions (not silent null or false
> returns without an error report).

Agreed, JS_HAS_UNWIND was a bit of an afterthought and it didn't work very well for the reasons you mention. I have removed it for now from the current patch as existing API is not modified, so it might not be needed in the end.

> * We absolutely need a patch against current source.

I'm working on this now and will upload it here as soon as I get it done.
(In reply to comment #12)
...
> > I think a better solution would be to have a flag that state that from now on
> > the interpreter can not be nested so an embedding that relies on continuations
> > can detect the problematic JS code ASAP and not when it is written, grown and
> > some time later is suspended.
> 
> I'm not sure I understand what you mean by this. For example, if native
> functions were flagged as potentially blocking, this could still only be
> detected at run-time after the called function has been resolved. On the other
> hand, disallowing interpreter nesting completely would be unnecessarily
> limiting as it would also prevent trivial getters/setters etc.

Suppose one wants to implement alert() in a browser using a suspension to avoid nested event loops. Since the implementation must be compatible with the current web, all its native function should either never call API that can potentially  trigger interpreter nesting or be restartable.

To archive that one needs a tool that can tell if a particular function can ever invoke the interpreter and, if so, must be made restartable. Ideally a static code analyzer would be nice, but if it is not available, then at least a runtime check would be nice. To be effective, the runtime checker must rise an error as soon as any API that can potentially nest interpreter is called.

I agree that such checks would limit the functionality if a particular script does not trigger interpreter nesting, but for suspension to be usable in the browser one can not assume that scripts are nice. 

> I have now added a facility for native calls to designate a "tail function" to
> run when a nested frame returns.

That is nice, but for the browser a generic way to code restartable native functions should be supported. And if that mechanism would allow to make, say, Array.prototype.sort restartable without a significant extra code bloat, that would be really good.
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
A new version of the patch against the mozilla-central Mercurial tree as of last night. This version now also supports XML predicate filters, generators and (rudimentarily) blocking inside native methods such as Array.prototype.sort.
Attachment #268808 - Attachment is obsolete: true
API and code description updated to cover the extended patch 272755.
Attachment #268806 - Attachment is obsolete: true
Cool -- only initial reaction (haven't looked at code) is: mozilla-central is out of date, which was why I recommended the cvs trunk (cvs-mirror.mozilla.org via CVS or http://hg.mozilla.org/cvs-trunk-mirror). We'll see how the merge algs do with your change; you might try fetching from cvs-trunk-mirror and updating to merge, and see whether automerge succeeds.

/be
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
Patch updated against today's CVS from cvs-mirror.mozilla.org.
Attachment #272755 - Attachment is obsolete: true
Hi Joachim,

Please see bug 385393 and feel free to comment there or (if it seems more like this is the place) here. I'm probably making merge work for myself more than you, but we can get through it. Swamped right now getting ready for a conference tomorrow (Ajax Experience SF) where I'm speaking, but I'll look at this after tomorrow when I catch a break at the show.

/be
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
New version of the patch (against current CVS) that fixes a few problems found in the previous version:

XMLFilters are pushed to the tempPool instead of the stackPool since the latter prevented proper script-depth enlargement for calls with missing parameters. Also, jumpPool has been abandoned in favour of tempPool.

Improved argument handling in js_PushFrame and js_CloneStack. The local root stack wasn't cloned properly by js_CloneContext. Fixed problem cloning XMLFilters.

Fixed a few gcc warnings.
Attachment #272946 - Attachment is obsolete: true
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
New patch against this morning's current CVS.
Attachment #274770 - Attachment is obsolete: true
Attached file Script to exercise the frame handling code (deleted) —
This script allows to quickly run much of the frame handling code. It might be a useful addition to the test suite, even though most of these are probably also tested elsewhere.
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
Slightly improved version of yesterday's patch to use the inline returning code in more cases to prevent leaving and re-entering js_Interpret() also for generator frames and XML predicate filters.

Also updated to today's CVS head.
Attachment #275444 - Attachment is obsolete: true
Flags: in-testsuite?
I'm late again (new baby arrived on 8/30 but that doesn't excuse earlier silence). Only one naming suggestion at this point: s/NATIVE_BLOCK/NATIVE_UNWIND/ to avoid overloading "block" (lexical scope unit, block statement containing 'let' decls). 

Joachim, have you a tree to update and generate a fresh patch? Conflicts should not be too bad, I hope -- but Igor has a patch pending for always-frameless fast natives.

Igor, any comments?

/be
Assignee: general → jkuebart
Most of the functionality that the patch adds can only be used when one controls scripts. Getter/setters, watch points, toString/valueOf implementations, callback passed to Array's methods, generators etc. introduce native stack in the picture. That can not be eliminated unless one significantly alters SpiderMonkey's public API. So browser embeddings can not take advantage of the continuations support added by the patch. 

On the other hand, when one does control scripts, continuations are useful. For example, one can use them not only to suspend/resume scripts, but also to migrate running scripts to an interpreter running on a different computer.

Thus I suggest to split the patch into 2 parts, the native tail code elimination part and the continuation support itself.

That is, the part of the patch that allows for eval/fun_apply/fun_call to delegate execution of the last js_Invoke to the caller, is useful even for the browser as it eliminates unnecessary native stack consumption.

The rest of the code should go to the second part protected with the necessary #ifdef statements so it would not add extra unused code/struct fields to browser embeddings. 
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
First of all, congratulations to Brendan for the baby!

New version of patch against current CVS. The patch has become considerably smaller due to recent changes by Igor (in particular, bug 394551).

As per Brendan's suggestion, the option for native stack handling is now called JS_HAS_NATIVE_UNWIND.

As pointed out by Igor, there is now a JS_HAS_CONTINUATIONS option which disables context cloning and related features which are unlikely to be used by the browser embedding in the near future. Without both these options defined, the code size and memory footprint impact should be minimal, while still allowing tail call elimination in fun_call, fun_apply, obj_eval, NoSuchMethod Generators and XML filters and in the interpreter itself.
Attachment #275592 - Attachment is obsolete: true
Attachment #281845 - Flags: review?(brendan)
Attachment #281845 - Flags: review?(brendan)
Attached patch Support JavaScript interpreter stack unwinding (obsolete) (deleted) — — Splinter Review
One more try, trimmed version of patch. Sorry about the noise.
Attachment #281845 - Attachment is obsolete: true
Attachment #281846 - Flags: review?(brendan)
Attachment #281846 - Attachment is patch: true
Attachment #281846 - Flags: review?(igor)
(In reply to comment #27)
> Created an attachment (id=281846) [details]
> Support JavaScript interpreter stack unwinding
> 
> One more try, trimmed version of patch. Sorry about the noise.
> 

Can you split the patch in 2 parts, the tail-code elimination part for natives and the rest of changes?
(In reply to comment #25)
> Most of the functionality that the patch adds can only be used when one
> controls scripts. Getter/setters, watch points, toString/valueOf
> implementations, callback passed to Array's methods, generators etc. introduce
> native stack in the picture. That can not be eliminated unless one
> significantly alters SpiderMonkey's public API.

We can do that, over time. New API would be "opt in", runtime if not compile time (code footprint is probably decisive there).

> So browser embeddings can not
> take advantage of the continuations support added by the patch. 

With enough work, I'm pretty sure even browser embeddings could take advantage of this. IIUC Opera already does something like this in its JS engine and web browser.

Still in favor of patch, hoping for split patch as Igor suggests.

/be
(In reply to comment #29)
> > So browser embeddings can not
> > take advantage of the continuations support added by the patch. 
> 
> With enough work, I'm pretty sure even browser embeddings could take advantage
> of this. IIUC Opera already does something like this in its JS engine and web
> browser.

One do not need continuations to allow suspension/resumption of JS code with native frames. One can just use cooperative threading with the main thread waiting until a JS thread yields control to it. Then there is no problems with native frames, one can easily pause JS and implement cancellation, alert and similar methods would not require modal dialog windows blockin the rest of GUI etc.

My wild guess is that Opera uses something like that.
No threads, cooperative or preemptive, in Opera (my sources say).

/be
(In reply to comment #31)
> No threads, cooperative or preemptive, in Opera (my sources say).

Then the guess is wrong, but is does not mean the main point is not valid: to suspend/resume script execution one does not the power of continuations and native stack avoidance. The same can be done with a cooperative threading implemented via either native threads or setjmp/lomgjmp and a bit of assembler to initialize the stack.
My understanding is that Opera avoids native stack usage; painful though that may be on desktops, it can be good on low-end phones.

Using green/user-level threads as NSPR 1 did (I was co-implementor of NSPR 1) and which still exist in NSPR 2 (mozilla/nsprpub in cvs.mozilla.org) is a bit shaky on modern OSes, since you lose the safety of the stack segment managed by the OS, with a redzone at its low (growing downward assumed) end, etc.

But I see setjmp/longjmp in the patch in this bug, so there we go :-).

/be
Attachment #281846 - Attachment is obsolete: true
Attachment #281846 - Flags: review?(igor)
Attachment #281846 - Flags: review?(brendan)
This is the first part of the patch as requested by Igor and Brendan. Any code to handle stopping/restarting scripts has been removed, the only change to the public API is JS_SetNativeTail().

Interpreter recursion is removed from fun_call, fun_apply, obj_eval, NoSuchMethod, Generators, XML list filters, and the interpreter itself.
Blocks: 397490
I filed the second half of the patch that deals with stopping/restarting scripts and context cloning as bug 397490 comment 1.
New version of the reduced patch, updated to today's CVS.
Attachment #282262 - Attachment is obsolete: true
Attachment #293116 - Flags: review?(brendan)
Version: unspecified → Trunk
This sounds cool! Brendan, any chance of reviewing?
Yes, it's on my queue but I keep having higher priority interrupts (had some to do with new baby last fall). It will get attention, and I'll help unbitrot if needed.

/be
I guess there is no chance this will get in 1.9 anymore, right? I was hoping this would allow fixing bug 61098.
Martijn: this is a high post-1.9/firefox3 priority for me. Joachim has been very patient and I will help unbitrot. The biggest challenge there will be the move to jsinvoke.c of all but js_Interpret from jsinterp.c. But with line adjustments (I hope, big enough fuzz option to patch) the patches should still be applicable.

/be
The test was modified to pass on the trunk. The only "major change" was to remove InternalError: yield not yet supported from filtering predicate as an expected error, although it still occurs on branch.

/cvsroot/mozilla/js/tests/js1_8/regress/regress-384412.js,v  <--  regress-384412.js
initial revision: 1.1
Flags: in-testsuite? → in-testsuite+
Flags: wanted1.9.1?
My priorities changed due to TraceMonkey, and I have to apologize again for not helping keep this patch up to date. Joachim, have you been tracking mozilla-central's js/src? If not, I will help merge the patch forward starting next week.

/be
Due to other work I haven't been tracking mozilla-central, the most recent sync was on 14 December 2007. Since then I have done very few modifications/fixes which I am currently in the process of collecting. I will post this revised set later today, but unfortunately it is still against the 14 December sources.

Help with bringing this up to date would be greatly appreciated as I won't be able to spend significant time on this before November/December.

Joachim
A new version of the patch with a few fixes, but still against trunk from 14 December 2007.
Attachment #293116 - Attachment is obsolete: true
Attachment #293116 - Flags: review?(brendan)
Blocks: alertloops
Flags: wanted1.9.2?
Flags: blocking1.9.2?
Old patch.  Needs a refesh.  Not blocking 1.9.2.
Flags: wanted1.9.2?
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
blocking2.0: --- → ?
I'm going to update this patch to current TraceMonkey.

This time, I will manage the patches using MQ on top of hg.m.o/tracemonkey. As a result, they will be easier to digest and hopefully discuss. I'm planning to update them every say couple of weeks (counting on support from hg rebase).

Alternatively, by popular demand I can also publish my hg repo. Obviously this will mean additional merge records, and the end result won't be a set of separate patches against some future version of TraceMonkey. It will make it easier for people to build it and contribute...

Let me know which would be more useful, and also whether I should work off of /mozilla-central or /tracemonkey.
This patch is relative to today's hg.m.o/tracemonkey.
Attached patch Move hookData member from JSInlineFrame to JSStackFrame (obsolete) (deleted) — — Splinter Review
This patch is relative to today's hg.m.o/tracemonkey.
Attached patch Move mark member from JSInlineFrame to JSStackFrame (obsolete) (deleted) — — Splinter Review
This patch is relative to today's hg.m.o/tracemonkey.
Attached patch Split js_Invoke into two halves, js_PushFrame and js_PopFrame. (obsolete) (deleted) — — Splinter Review
This patch is relative to today's hg.m.o/tracemonkey.
Joachim, fyi, there is currently work separate underway to move JSStackFrames onto the heap (bug 536275).

A separate question:
It seems like the goal of this patch is to allow natives to avoid blocking the JS engine.  I assume the goal is so that the JS engine can share a thread with something latency sensitive, like UI event handling.  However, even with the proposed change, the JS engine can still block (as mentioned in comment 6 and related), and this blocking is dependent on JS content.  Thus, if the goal is to avoid blocking, it seems the embedder would still need to use alternative techniques.  That is, this change doesn't really seem to solve the problem, but perhaps I misunderstand the problem.
Hi Luke,

that's great news that JSStackFrames are moving onto the heap! Thanks for the pointer, I'll look at the patches that are in the queue and make sure mine don't conflict.

As for you question:
I my case, to service a blocking JavaScript call involves returning from the function that started the JS engine. This is because the embedding application essentially consists of co-routines yielding control to each other via a rather "high up" dispatcher. Therefore, the JS engine also has to unwind and be able to restart when control is yielded back to it.

Since the processing of the "blocking" JavaScript calls cannot be done without stack unwinding, such calls aren't allowed when JavaScript cannot unwind (such as valueOf(), toString(), __getter__() etc) and cause an exception. For me this is acceptable behaviour, for a browser's alert() it obviously isn't. In this case, I suggested "native" unwinding using setjmp()/longjmp() which has obvious major drawbacks...

In my case, sometimes the entire JavaScript state has to be captured, stored and potentially restarted at a later point. This patch also prepares this continuation facility, see bug 397490.
No longer blocks: alertloops
blocking2.0: ? → -
status2.0: --- → wanted
I've updated my patch to the latest gecko-1.9.2 version from <hg.mozilla.org/releases/mozilla-1.9.2>. Specifically this patch is based on d7c24aa07c300a3eeb1610e627ad71ce294663ca.

There are still some things missing compared to the old version, like unwinding from inside generator frames.

I've organised the individual changes in a patch queue which I'm happy to share, or I can also publish my repository.

I'm also updating bug 397490 with a patch that implements a class Continuation similar to Rhino's, even though there are some limitations how it can be used (no jumping into or out of property getters/setters, for example).
Attachment #336352 - Attachment is obsolete: true
Attachment #422113 - Attachment is obsolete: true
Attachment #422114 - Attachment is obsolete: true
Attachment #422115 - Attachment is obsolete: true
Attachment #422116 - Attachment is obsolete: true
You're almost certainly going to want to base your patches off the tracemonkey branch (hg.mozilla.org/tracemonkey) where all current js development is happening rather than gecko-1.9.2 which is only taking security and stability updates at this point.
(In reply to comment #54)
> You're almost certainly going to want to base your patches off the tracemonkey
> branch (hg.mozilla.org/tracemonkey) where all current js development is
> happening rather than gecko-1.9.2 which is only taking security and stability
> updates at this point.

I'm aware that gecko-1.9.2 is "old" code -- I was working on the tracemonkey branch until about Q3/2010 when it became too difficult for me to keep up with the changes (particularly due to JägerMonkey). As it was important for me to get the required functionality working, and less important to have it for the bleeding edge SpiderMonkey, I was forced to switch to the 1.9.2 branch.

I haven't looked in a few months, but I expect that it will be a fair amount of work to update this patch to tracemonkey. I don't have enough time to do this myself, especially given how fast that repository is moving. I'd like to be involved if I can interest a group of people in working together on this. The first effort would need to be an agreement on the right design of this for JägerMonkey and which features and limitations are expected and acceptable.
Luke,
Is this at all feasible in the current spidermonkey world? Would be nice to do alert() properly and be able to turn blocking js apis inside out(ie bug 701880).
Sounds like opera does this.
Whiteboard: [Snappy]
(In reply to Taras Glek (:taras) from comment #56)
> Luke, Is this at all feasible in the current spidermonkey world?

As a Spidermonkey feature, I think 'no'.  What makes it particularly hard is when JS calls into C++ which calls into JS... and then we want to suspend.  For this to work, every C++ JSNative needs to participate in some suspend/resume protocol.  That's a huge JSAPI change, huge amount of work (think of how many natives we have in SM, XPConnext, DOM, etc), performance hazard (have to maintain local state in some form of interruptible structure and use interruptible algorithms), and would surely be a wellspring of bugs.  I'm not in favor.

OTOH, I think it would be possible to run JS on a separate green thread that only executes on the (HW) main thread.  Then, at well-defined points, we could suspend the JS thread, effectively popping it off the stack without killing execution, and resume on the "real" main thread.  That would be implemented at a browser level and SM would be mostly oblivious.  Maybe I'm missing an obvious difficulty with this approach.
Luke, thanks. This does not sound like something we can do in the near term, so I'm going to remove this from [Snappy] list. 

Makes me wonder if we can atleast suspend JS that does not involve C++. Would that make it easier? It would make it possible for while(1); loops to not jank us.

Boris, is the green thread idea feasible?
Whiteboard: [Snappy]
(In reply to Taras Glek (:taras) from comment #58)
> Makes me wonder if we can atleast suspend JS that does not involve C++.
> Would that make it easier? It would make it possible for while(1); loops to
> not jank us.

That's a lot more feasible but still not easy.  I'm not terribly excited to do a lot of work for a partial fix, so I'd be a lot more inclined toward the green thread solution.

Again, I don't know all the issues, but I would guess that the browser changes to get suspension working via spidermonkey vs. green-threads would be similar in magnitude.  One challenge for green threads would be getting the right API on all platforms; a quick Googling shows makecontext/swapcontext in POSIX, Linux, and OS X docs and that Win32 has fibers.
But really, for anywhere where we would ask suspend (via JSAPI or green thread), couldn't we just spin up a nested event loop?  I hear they are evil and all, but if we want the quick fix...
(In reply to Luke Wagner [:luke] from comment #60)
> But really, for anywhere where we would ask suspend (via JSAPI or green
> thread), couldn't we just spin up a nested event loop?  I hear they are evil
> and all, but if we want the quick fix...

That does not scale. Ie with dom storage you could be using blocking function calls to write into it in a loop... which would then blow our little stack.
(In reply to Taras Glek (:taras) from comment #61)
> That does not scale. Ie with dom storage you could be using blocking
> function calls to write into it in a loop... which would then blow our
> little stack.

Ah, I see, thanks.  So the goal is to, effectively, spin a nested event loop without chewing through the C stack.  Yeah, green threads sound much better to me.
The green thread thing has one minor-ish issue (need to fix all the assertions about the DOM only being touched on the main thread) and one major-ish issue: When JS is suspended, then what?  After all, the caller on the main thread expected run to completion semantics....
(In reply to Boris Zbarsky (:bz) from comment #63)
> The green thread thing has one minor-ish issue (need to fix all the
> assertions about the DOM only being touched on the main thread) and one
> major-ish issue: When JS is suspended, then what?  After all, the caller on
> the main thread expected run to completion semantics....

I think this means that we would only be able to suspend from callers that are aware of green threads and can cope with them...ie save state and resume later.
(In reply to Boris Zbarsky (:bz) from comment #63)
> When JS is suspended, then what?  After all, the caller on
> the main thread expected run to completion semantics....

I thought this is the tricky issue that nested event loops already have to deal with.  I was naively thinking that the green threads would use the same sort of event filtering.
The event filtering we have right now is imperfect at best.  And if we're planning to reuse it, what's the benefit over spinning a nested event loop?
(In reply to Boris Zbarsky (:bz) from comment #66)
I asked this in comment 60, Taras answered in comment 61.  Maybe we can just increase our C stack limit? (IonMonkey would appreciate this as well.)
Hmm.  I don't understand comment 61.  If the call is "blocking", that means the next call won't start (and in particular the loop won't continue) until the call finishes.  So what's the scenario we're worried about?
Can't nest events arbitrarily without creating data races. JS has a "run to completion" or "non-preemptive shared memory" execution model since forever and for good cause. Imperfect event filtering in various browsers, definitely including Gecko-based ones, constitute bugs to fix, not multiply.

Netscape 4 had JS in a separate thread. The browser UI (native widgets) shrugged off JS iloops.

Green threads are possible, provided we can get OS stack redzoning for safety. Chrome JS would not run in the green thread, only content JS (chrome JS might face some novel data races, but arguably it could cope).

One could have a green thread per "window group" of content windows/frames reachable via name targeting / window.open. Mike McCool was working on this for the ill-fated Netscape 5 when Mozilla reset around Gecko.

/be
Before we do any major rearchitecture, though, we need evidence that content JS janks users often enough to matter. Taras, whaddya got?

/be
The only JS stuff I noticed in the hang detector (30+ second hangs) was the GC and array_indexOf.  The former will be addressed by incremental GC, and Luke thought the latter was due to the hanging script popup not working in some cases.  Microjank may be a different matter, of course.
(In reply to Brendan Eich [:brendan] from comment #70)
> Before we do any major rearchitecture, though, we need evidence that content
> JS janks users often enough to matter. Taras, whaddya got?

bug 707391 aims to do this right In meantime: bug 705695 is evidence that js loops can jank us. Dom storage is a blocking api that is called from js that can jank us(bug 701880)...and janks us a lot.
Blocks: 709217
No longer blocks: 709217
Blocks: 709217
(In reply to Boris Zbarsky (:bz) from comment #63)
> The green thread thing has one minor-ish issue (need to fix all the
> assertions about the DOM only being touched on the main thread) and one
> major-ish issue: When JS is suspended, then what?  After all, the caller on
> the main thread expected run to completion semantics....

It seems to me that in order for interruptible JS to be useful, we need to split our current single main thread task queue to one task queue per each constellation of mutually-reachable docshells. That is, we need separate task queues for where Chrome has separate processes.

Then, if JS belonging to a page is suspended, the task queue for the docshell constellation that the page belongs to should be marked as being on hold so that no tasks are dequeued from it. Meanwhile, tasks from the queues belonging to other docshell constellations could be run.
What do nested event loops do currently (for alert() and the slow script dialog)?
alert just spins the gecko event loop (and freezes some, but not all, events fired to the relevant window).

Not sure how the slow script dialog works.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #75)
So does that mean comment 73 is how we should have done alert() in the first place?
Yes. But note that because of the way our chrome and extensions currently work, the main (chrome) event loop can enter content code at pretty much any time, so in order to preserve sane run-to-completion you'd still need to block and clear out the pending content event queue before running that code. Isn't that still the same basic problem that e10s had, perhaps with fewer complications because all the green threads live on the "main thread"? I don't really see how this gets us usable wins without also inverting our control flow a-la message manager.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #77)
> Yes. But note that because of the way our chrome and extensions currently
> work, the main (chrome) event loop can enter content code at pretty much any
> time,

Interesting.  Since calling from chrome to content necessarily goes through a wrapper, we could have that wrapper transition resume the content's green thread, run it to completion, then resume the call from chrome to content.  This breaks the LIFO-ness of the C++ stack which is somewhat terrifying...

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #77)
> Isn't that still the same basic problem that e10s had, perhaps with
> fewer complications because all the green threads live on the "main thread"?

Yeah, I much preferred the e10s plan for dealing with content hangs; we're just wondering if there is anything relatively easy that can be done in the meantime.
Benjamin: is this something that *can* happen (nothing prevents it) or something we expect to happen regularly, for some meaningful reason?  The more I think about it, the more weird it seems that chrome JS would call content JS outside of any message to that content (in general calling content seems like something that shouldn't just happen willy nilly).  If there are only a few cases, perhaps we could change them to dispatch messages that would be blocked properly.  Once we have the cases in mozilla fixed, we could catch addons by just throwing an exception at the chrome-to-content wrapper boundary.
(In reply to Luke Wagner [:luke] from comment #79)
> Benjamin: is this something that *can* happen (nothing prevents it) or
> something we expect to happen regularly, for some meaningful reason?  The
> more I think about it, the more weird it seems that chrome JS would call
> content JS outside of any message to that content (in general calling
> content seems like something that shouldn't just happen willy nilly).  If
> there are only a few cases, perhaps we could change them to dispatch
> messages that would be blocked properly.  Once we have the cases in mozilla
> fixed, we could catch addons by just throwing an exception at the
> chrome-to-content wrapper boundary.

Or just have the chrome and addons block in that case, perhaps? This would maintain backwards compatibility at the expense of performance (but note, no worse than the performance we have today). Then we could start to convert chrome and addons to use the message manager, maintaining compatibility all the way. Once that is complete, we're basically e10s-compatible, so we could move to e10s.
That really depends on how much surface area you want to guarantee is run-to-completion. If we only need to block chrome on content script *running*, then I expect that in normal (non-weird-extension) cases there isn't a lot of access.

However, I don't think that is sufficient. Content scripts don't expect their objects or the DOM to mutate under them, so you effectively have to block on any DOM mutation. We can probably block the parser separately, so we're basically worried about chrome script that modifies the DOM. This isn't all that uncommon for things like Flashblock, as well as greasemonkey and greasemonkey-like things. We also have internal DOM modifications for form and password fill, the code which sets up and implements window.console, and other similar things.

There's a fair bit of readonly access to the DOM for things like session restore, browser-find, and maybe things like spellcheck (I'm not sure exactly how that works). Can we safely assume that readonly DOM operations don't have to block on hung content?

None of these should be considered impediments if this will be a fairly small project to undertake, and it would certainly provide some great wins. But if this is a biggish project that will involve a bunch of engineering time, I'm skeptical that we should work on it now. We dropped the e10s plan precisely because concerns that inverting control flow in our code and in extensions was not the right decision now, so anything which assumes that we will invert control flow in order to get large benefits is a risky bet.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #81)
> None of these should be considered impediments if this will be a fairly
> small project to undertake, and it would certainly provide some great wins.
> But if this is a biggish project that will involve a bunch of engineering
> time, I'm skeptical that we should work on it now. We dropped the e10s plan
> precisely because concerns that inverting control flow in our code and in
> extensions was not the right decision now, so anything which assumes that we
> will invert control flow in order to get large benefits is a risky bet.

I agree with this -- we need more information as to whether this is a large project or a small one. The thing that seems appealing about this approach is that it provides a migration path to e10s that might provide some large wins in responsiveness right now, while maintaining full backward compatibility for chrome JavaScript and addons. But you are right that this is gated on this being a project that could be completed in a relatively short timeframe.
> Can we safely assume that readonly DOM operations don't have to block on hung content?

Even readonly DOM operations can end up lazily adding properties to various DOM JS objects due to resolve hooks.  It's not a problem in terms of run-to-completion (because resolve hooks are transparent to content, modulo XBL); not clear whether it's a problem for the JS engine with the proposals here.
(In reply to Patrick Walton (:pcwalton) from comment #80)
> Or just have the chrome and addons block in that case, perhaps?

Perhaps you have a different meaning of "block", but if you mean "resume running content and then resume chrome after content is finished" then you would get the unholy non-LIFO C++ stack problem mentioned in comment 78.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #81)
> None of these should be considered impediments if this will be a fairly
> small project to undertake,

Well, IIUC, the project is big or small depending on how hard/easy it is to fix the issues you've described (as well as those in comment 73).  Right now, it is sounding "big".
(In reply to Luke Wagner [:luke] from comment #76)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #75)
> So does that mean comment 73 is how we should have done alert() in the first
> place?

Yes. alert() breaks the run-to-completion semantics. See bug 637264, for example.
Blocks: 735246
This will be implemented when?
& windows & Linux will also be beneficial?
This is a huge architectural change that we have no plans to invest in. Marking as WONTFIX, since people are coming across this bug and wondering what the status is.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: