Closed Bug 496240 Opened 16 years ago Closed 15 years ago

Trace JSOP_NAME for general closures

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: dmandelin)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 2 obsolete files)

This is a bit easier than bug 495329, and probably more important, although the latest test case in bug 460050 depends on SETNAME/BINDNAME.

Specifically, we want to trace the case in TraceRecorder::name that resolves a name to an out-of-reach (not part of the current trace) outer-scope variable. The dumb (i.e., obviously correct in general) way to evaluate reading such a variable is to search the scope chain each time. But we want to do a little better than this.

As Brendan said, there are about 9 dimensions to this problem, so I'm going to start with a simplified version.

First consider the case where there is no dynamic scoping in effect (no with statements and no eval). In that case, for a given inner function |g|, the scope chain is mostly invariant for different activations of |g|. Specifically, the number of elements is the same and the functions on the scope chain are the same. But the activations on the scope chain (i.e., Call objects) may differ. Thus, if during recording the variable is found |k| steps up the scope chain, it will be properly found |k| steps up the scope chain always. Also, if during recording the variable is found in a scope chain Call object for function |f|, then the variable will always be found on the scope chain Call object for function |f|. Either search method seems suitable and efficient for tracing.

The generated LIR for the access must handle several cases:

 (1) Target call object represents an active function (i.e., has a JSStackFrame)

     (1.1) Target call object is not on trace. This is the case observed
           during recording. The LIR should search the scope chain. It will find
           a Call object with an active JSStackFrame that has the up-to-date
           value and can be read out directly from argv or slots and then
           unboxed.

     (1.2) Target call object is now on trace. This can happen if the original
           trace is called as an inner trace. In this case the up-to-date value
           is in an outer trace's native stack area. The LIR should again search
           the scope chain for the target Call object. The call object's private
           member is the callee function object. Now, the trace code can search
           the trace native stack frames (as in js_GetUpvarOnTrace) for a frame
           with the same callee. Finally, the code can read out the value from
           the native stack frame. 

           There is an inefficiency here: we do two searches. This probably
           won't matter that much. If it does, we could try to create a link
           from the call object directly to the native stack frame.

 (2) Target call object represents a function that has returned. We don't have
     to handle this case right away. Case (1) is more important and represents
     the usual case of calling map-like functions. But I'll still describe
     briefly how this might work.

     In this case, it appears the target variable cannot be part of a native
     trace stack, because the native stack holds only locals of active 
     functions. So the only possibility is that the variable is a property of
     a call object that uses a slot. Thus, the generated code can search the
     scope chain as before. Then, it should read the value out of the call
     object as is done for any other property.

I already prototyped (1.1) and it gave a 10x speedup on a microbenchmark.

All of the above relies on the shapes of the call objects being more or less unchanged for active functions. This is not true when there is eval or with. For that case, we need to guard that the shape of the target call object is the same as what it was during recording.

Another item to note is that if the callee object is unchanged since recording time, we could skip the searching of the call stack and instead use the same target object we had during recording. If we used this optimization, then when the callee object does change, we would have to invalidate and do the search. This is just another instance of inline caching. For now, I won't do it, but we should remember that we can for later. It would definitely speed up applications of map-like functions to long lists with closures.
Attached patch WIP (obsolete) (deleted) β€” β€” Splinter Review
This version works for simple programs, with a good speedup on microbenchmarks.

Notable:

- I used a template function parameterized on a trait type to make one function that can handle both the args case and the vars case. I decided to try this as an alternative to giant macros. It seems to work. The syntax is a little heavy but I like the explicitness and type checking.

Major TODOs:

- put in the shape guard on the found call object
- handle the case where the upvar was defined on a non-entry native tracing frame. This isn't hard; I just don't want to code it until I have a test case for it, and generating a test case for it is not easy.

The native stack frame search really is unfortunate. Hopefully we can get a bright idea for eliminating it.
Longer-term thought: the shape guard is kind of unfortunate too. But I realized this is a good application for invalidation. By "invalidation", I mean throwing away a trace when the assumptions used in its generation are falsified. If we invalidate on a condition C, then we don't need to guard on C.

In this case, I think the only way we go wrong is when a PurgeScopeChain changes the shape of the call target. So if we could arrange for the trace to be invalidated when the shape of a call target of one of these traces is changed, we don't need the guard. And the invalidation shouldn't happen often: only an eval that defines a variable or a with should do it.
http://off.net/~mark/upvar_test.js is a good test case for case (2) in comment 0.
(In reply to comment #3)
> http://off.net/~mark/upvar_test.js is a good test case for case (2) in comment
> 0.

Blake asked why saddFields is not a flat closure, and I thought I'd comment in case anyone else wonders. The reason is because saddFields is a function created by a function definition, not a function expression. Therefore its binding and construction are hoisted to the top of the outer function, before var size is initialzed.

In this particular case, there's no way for saddFields to escape before the size var's initialization, though, so we should have a bug on optimizing harder.

/be
Attached patch Patch (obsolete) (deleted) β€” β€” Splinter Review
Attachment #381456 - Attachment is obsolete: true
Attachment #387719 - Flags: review?(gal)
Attached patch Patch 2 (deleted) β€” β€” Splinter Review
This fixes a bug in the previous version, which did not handle the case where the function that defined the upvar has returned. For this bug, I would like to simply exit trace in that case, and handle it as a follow-on bug.
Attachment #387719 - Attachment is obsolete: true
Attachment #387974 - Flags: review?(gal)
Attachment #387719 - Flags: review?(gal)
Attached file Microbenchmark (deleted) β€”
The last 2 tests don't speed up yet because they use returned closures. The first one speeds up by about 3.7x. The second one is fast even before the patch, because k is part of the trace.
Comment on why doing returned closures is hard:

I think the right thing to do is add a case to the builtin that reads the value from the Call object as a normal property. If I do this directly with js_GetPropertyHelper or JS_GetPropertyById, then it crashes. The problem is that when the property doesn't exist, js_GetPropertyHelper tries to read the PC (js_GetCurrentByteCodePC), in order to see if it should do special handling, but there's isn't a proper PC on trace when we haven't taken a bailExit.

It seems pretty evil to me that js_GetPropertyHelper relies on being able to read a PC. Furthermore, when the op is JSOP_NAME, there is never any special PC-specific stuff to do. So, it seems to me that we should just refactor a bit so that stuff can get skipped entirely when we don't need it.

A related issue is that Call objects with a null fp (private) are pretty much just standard objects, so it shouldn't be necessary to call resolvers and the like for them. It would be nice to have faster paths through property access code in cases where we know the object is "normal".
Comment on attachment 387974 [details] [diff] [review]
Patch 2

Awesome. This will minimally conflict with Jason's getter/setter work, so ccing him.
Attachment #387974 - Flags: review?(gal) → review+
Pushed to TM as b75efc9ee5b1.
Blocks: 505591
Depends on: 506018
http://hg.mozilla.org/mozilla-central/rev/b75efc9ee5b1
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Blocks: 552375
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: