Closed
Bug 531250
Opened 15 years ago
Closed 6 years ago
Tune ScriptObject initialization
Categories
(Tamarin Graveyard :: Virtual Machine, defect)
Tamarin Graveyard
Virtual Machine
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: lhansen, Unassigned)
References
Details
(Whiteboard: PACMAN Tracking)
Attachments
(3 files, 14 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/plain
|
Details |
We really want to enable rapidly-allocating programs to function well. (Rapidly-allocating is not an absolute, but it should never be an issue about whether it's right to reuse an event handler object or allocate a new one - it should always be affordable to allocate a new one.)
There's probably some low-hanging fruit in ScriptObject initialization; I've just started looking. Consider ScriptObject::initHashtable:
iht->initialize(this->gc(), capacity);
In turn InlineHashtable::initialize:
capacity = MathUtils::nextPowerOfTwo(capacity);
setCapacity(capacity*2);
setAtoms((Atom *)gc->Calloc(getCapacity(),
sizeof(Atom),
GC::kContainsPointers|GC::kZero));
nextPowerOfTwo is a proper function and that function has a loop (which probably terminates reasonably quickly since most arguments are small, but still). In turn setCapacity() calls FindOneBit to compute the log of capacity; getCapacity() undoes that computation to produce the original value, etc - small potatoes to be sure, but if the allocation rate is high enough this all adds up. Most of the values could presumably be cached with the ScriptObject metadata in this case, ie, we're looking for specializing / parameterizing fast factories for ScriptObjects.
Reporter | ||
Comment 1•15 years ago
|
||
This is a simple allocating loop that just creates and discards a lot of class instances. Everything is strongly typed and I expect this is a typical "best case" for both the GC and the VM; the GC does not run, everything is handled by DRC as we want to. The idea is just to get an idea of where the cost goes in instance creation and destruction. Will post data when I have them.
Reporter | ||
Comment 2•15 years ago
|
||
Gross findings from this simple benchmark:
- Overall running time for 10 million allocations is 1523ms on my Mac Pro,
a cost of 0.15us / object or ~450 clocks / object, including all overhead
from the benchmark loop etc.
- 48.5% of the time spent is in creating and destroying Point objects (not
including memory management):
ScriptObject::createInstance 20.1
Traits::destroyInstance 9.5
ScriptObject::~ScriptObject 5.9
ScriptObject::ScriptObject 5.4
AvmCore::newObject 5.1
ClassClosure::newInstance 2.5
- 31% of the time spent in MMgc:
Allocation, deallocation 17.5
ZCT reaping 9.0
Write barrier 4.4
- 20% executing the code:
Jitted code 14.4
finddef_cache 5.6
This is with the allocator/deallocator optimizations from bug #528338 applied. Note that the GC basically does not run, everything is handled by reference counting.
General observations and questions:
- The best case for initializing an eight-word AS3 object (three 'int' fields,
AS3 vtable, delegate pointer, RC overhead, C++ vtable, rounded up) is more
than 8 simple stores, but 2.5 times the allocation+deallocation cost is
much more than expected.
- Why is finddef_cache showing up so much? I tried my best to make
everything be early-bindable, but maybe I failed.
- 14.4% in jitted code is more than expected - it's just a loop calling a
constructor - so presumably some of this time is actually for jitted
initialization code?
Reporter | ||
Comment 3•15 years ago
|
||
Another finding is that __bzero accounts for fully 3.8% of the running time, and almost all that time is accountable to destroyInstance (so it accounts for 40% of the time in destroyInstance). __bzero is only called if the object has no pointer or atom slots (as is true here). Its setup cost can be substantial however; I shaved 2.5% off the overall running time by open-coding special cases for objects up to 32 bytes inside destroyInstance.
Reporter | ||
Comment 4•15 years ago
|
||
Reporter | ||
Comment 5•15 years ago
|
||
AvmCore::newObject is just a trampoline but accounts for 5% of the running time, making it REALLY_INLINE removes about half of that.
Reporter | ||
Comment 6•15 years ago
|
||
Reporter | ||
Comment 7•15 years ago
|
||
A significant fraction of the time in ScriptObject::ScriptObject and ScriptObject::~ScriptObject - say 20% - is for adding the pointer of the new reference to the ZCT and removing it again.
Reporter | ||
Comment 8•15 years ago
|
||
It would seem that ScriptObject::createInstance could be specialized in various ways: once the first instance of a type has been created, some of the checking in createInstance appears not necessary; once it's been established that a type does not extend a class with a custom createInstance then the recursive virtual calls can be performed as a loop. The latter may be important because the prologue of createInstance is hot; there are many dependent loads. If refactored as a loop these loads could be hoisted out of the loop, possibly (remains to be seen). In general, perhaps the indirection could be flattened a bit.
Reporter | ||
Comment 9•15 years ago
|
||
... and of course that "loop" can be completely unrolled.
Reporter | ||
Comment 10•15 years ago
|
||
Actually the "loop" disappears completely, if we assume the error checking is redundant; all other data are loop-invariant.
The proof-of-concept code here specializes the case where only ScriptObjects are on the path, custom createInstance don't get in the way. This provides another healthy speedup - the three patches together have lopped off about 300ms (20%) of the running time. Now jitted code consumes about 21% and MMgc nearly 40% of the remaining running time.
Reporter | ||
Comment 11•15 years ago
|
||
Annoyingly GCC does not tail-call-optimize calls through a function pointer, so newInstance is more expensive than it needs to be - it calls the now-static createInstance helper indirect; that function then returns only to return again.
newInstance also contains a check for whether 'prototype' is NULL, which is presumably almost never true - it's wasted work to an arbitrary number of decimal places. It's particularly annoying since the indirection does not serve to create anything lazily, just to fetch a constant value (using a non-constant computation.)
Part of the problem is that the prototype can be set by user code, so even if it wasn't NULL it can become NULL. That setting is always via an accessor so it can be intercepted, but there are other complications: lots of native code access the 'prototype' property on the C++ side directly and will need to read NULL. A splitting into two separate properties /might/ fix it but the protocol vis-a-vis host code is unclear right now.
Reporter | ||
Comment 12•15 years ago
|
||
The 'gcInterface' member is not used either in the shell or in the Player, but it adds one callback object to the GC's list of callback objects. Callbacks are evil because of the prereap(obj) callback; that facility needs to be removed, but in the mean time we should only use it when necessary, not willy-nilly.
(Drops the time from 1230 to 1204 ms pretty consistently - 2%. Two percent here and two percent there... maybe it'll add up.)
It might be useful to split the prereap(obj) callback out as a separate object that the Player can register only when it has objects that require it - almost none I think (some SharedObject thing). It may be the best we can do in the short term.
Reporter | ||
Comment 13•15 years ago
|
||
One more note about newInstance. The code for OP_construct has two parts, the one part invoking newInstance and the other part calling the object constructor. The jitter does this for
LIns* vtable = loadVTable(objDisp);
LIns* ivtable = loadIns(LIR_ldcp, offsetof(VTable, ivtable), vtable);
method = loadIns(LIR_ldcp, offsetof(VTable, init), ivtable);
LIns* inst = callIns(FUNCTIONID(newInstance),1, localGetp(objDisp));
localSet(dest, inst, result);
If we avoid for the moment the song and dance around 'prototype' the body of newInstance is this:
VTable* ivtable = this->ivtable();
return ivtable->createInstance(this, ivtable, prototype);
where ClassClosure::ivtable() is
return vtable->ivtable;
We see that if we can get rid of the song and dance around 'prototype' then newInstance is trivial and can be removed. The savings from the jitter calling ivtable->createInstance directly will not just be the removed function call, but also removing redundant chained loads to load ivtable (the jitted code does it, and newInstance does it too).
Reporter | ||
Comment 14•15 years ago
|
||
Anyway after all this dinking we're in a situation where coerceUnboxEnter and finddef_cache together take about 30% of the running time. Yet the benchmark is a simple loop calling 'new' on a known class that has a trivial, generated constructor, inheriting directly from Object. In turn Object has a trivial, generated constructor. Where is all the time going?
(Lest this be thought an academic exercise, the Point class in the player is very similar to this example: http://help.adobe.com/en_US/AS3LCR/Flash_10.0/flash/geom/Point.html. It only has x and y fields of type Number (not int), it has some methods, and it has a constructor whose default argument values are 0 and 0 - basically a trivial constructor though it's harder to see that.)
The jitted code apparently calls finddef_cache to get the class, then newInstance to create an instance, then calls the constructor, which apparently calls the base class constructor.
This is the ABC for the Point constructor by asc -optimize:
function foo::Point():* /* disp_id -1*/
{
// local_count=1 max_scope=1 max_stack=1 code_len=6
0 getlocal0
1 pushscope
2 getlocal0
3 constructsuper (0)
5 returnvoid
}
Judging from disassembly (I'm still a novice) this is compiled pretty straight:
push ebp
mov ebp,esp
sub esp,56
mov -28(ebp),ebx
mov -44(ebp),esi
mov ecx,8(ebp)
mov eax,16(ebp)
lea edx,-40(ebp)
mov esi,0(513494)
mov -36(ebp),ecx
mov -40(ebp),esi
mov 5321876(0),edx
mov -32(ebp),-55903873
mov esi,0(513488)
cmp edx,esi
mov esi,-44(ebp)
jnb 0x618f0a
mov ebx,8(ebp)
mov ecx,ebx
call handleStackOverfl
mov ecx,ebx
mov ebx,-28(ebp)
mov eax,16(ebp)
0000618f0a
mov eax,0(eax)
or eax,1
mov ecx,8(ecx)
mov ecx,0(ecx)
mov ecx,8(ecx)
mov ecx,4(ecx)
lea edx,-28(ebp)
mov -28(ebp),eax
mov eax,0(ecx)
sub esp,4
push edx
push 0
push ecx
call eax
add esp,16
mov eax,4
mov ecx,-40(ebp)
mov 5321876(0),ecx
mov esp,ebp
pop ebp
ret
That is, 38 dynamic instructions, one of them a call, none of which are technically needed since the constructor is trivial.
This is the constructor for 'Object':
function Object():* /* disp_id -1*/
{
// local_count=1 max_scope=1 max_stack=1 code_len=3
0 getlocal0
1 pushscope
2 returnvoid
}
I haven't looked at the disassembly for this but I assume there's at least some boilerplate stack overflow checking and so on (TBD).
Comment 15•15 years ago
|
||
One would think that a supercall to Object() could *always* be safely eliminated...
Reporter | ||
Comment 16•15 years ago
|
||
(In reply to comment #12)
> Created an attachment (id=414861) [details]
> Remove the 'gcInterface' member from AvmCore
>
> The 'gcInterface' member is not used either in the shell or in the Player, but
> it adds one callback object to the GC's list of callback objects.
... and that callback is actually used in the vm core for pool management, so we can't delete gcInterface that easily.
Makes it more attractive to split the single callback into multiple callbacks.
Reporter | ||
Comment 17•15 years ago
|
||
This is another sketch; there may be corner cases here I haven't thought much about (when 'super' passes arguments, for example.) This doesn't make a lot of difference (< 1% speedup) but it's consistenly measurable.
Reporter | ||
Comment 18•15 years ago
|
||
Data point: the time in stock TR (ie without any allocator optimizations) is 1700ms on the same machine.
Reporter | ||
Comment 19•15 years ago
|
||
(In reply to comment #11)
>
> newInstance also contains a check for whether 'prototype' is NULL, which is
> presumably almost never true - it's wasted work to an arbitrary number of
> decimal places. It's particularly annoying since the indirection does not
> serve to create anything lazily, just to fetch a constant value (using a
> non-constant computation.)
>
> Part of the problem is that the prototype can be set by user code, so even if
> it wasn't NULL it can become NULL. That setting is always via an accessor so
> it can be intercepted, but there are other complications: ...
The most credible fix for this problem is as follows:
- Make ClassClosure::prototype private, hide it behind a getter/setter pair
prototypePtr / setPrototypePtr. This causes a fair amount of pain as every
native subclass uses the prototype field during construction. Strike one
for C++ which does not have getter/setter syntax. But the pain is not deep.
(Be sure to change even the update from the set_prototype setter to go
through setPrototypePtr.)
- Move the handling of a NULL prototype value to the ivtable->createInstance
function. The generic case can always check whether the prototype needs to
be updated from NULL to something more reasonable. The fast case never needs
to check provided setPrototypePtr snaps the createInstance pointer back to
the generic case again when a NULL value is stored.
The benefit is that there is more or less provably no change in semantics: the change filters out redundant NULL checks but every non-redundant NULL check will happen exactly at the time it currently happens, and will have exactly the same effects.
Reporter | ||
Comment 20•15 years ago
|
||
Comment on attachment 415107 [details] [diff] [review]
Don't call the Object constructor from a subclass's constructor
Testing shows that this introduces regressions eg in as3/Definitions/Classes/ClassDef/Bug118272Package.abc
Reporter | ||
Comment 21•15 years ago
|
||
Fix embarrassing bug.
Attachment #414834 -
Attachment is obsolete: true
Reporter | ||
Comment 22•15 years ago
|
||
Comment on attachment 414861 [details] [diff] [review]
Remove the 'gcInterface' member from AvmCore
Unsound idea.
Attachment #414861 -
Attachment is obsolete: true
Comment 23•15 years ago
|
||
The jit is only able to early bind and invoke a constructor method when we know that the class doesn't require custom logic prior to the constructor invocation. the flag that controls this has been found to be flaky (bug 530222).
If, while exploring, we see an opportunity to restructure the object construction protocol to avoid the need for the flag, and let the jit unconditionally invoke the constructor, then we'll get a nice win for those classes that require customized ScriptObject::construct() logic.
Reporter | ||
Comment 24•15 years ago
|
||
(In reply to comment #19)
>
> The most credible fix for this problem is as follows:
To my relative astonishment this appears to work and yields about 2% speedup on this benchmark. Patches coming.
Reporter | ||
Comment 25•15 years ago
|
||
Obviously a biggish patch for the avmglue will also be needed before this one can land, but this is good API discipline in any case.
Reporter | ||
Comment 26•15 years ago
|
||
Drive-by tuning, needs further vetting but the FreeNotNull tweak was a visible win.
Reporter | ||
Comment 27•15 years ago
|
||
In general I also question the need for the 'composite == 0' invariant in the reference counting code. It's of dubious value now that the object index is stored in that word when the object is on the free list - and no asserts seem to have fired anyway - and there are tests in the RC code that check it yet no ill effects have some far come from the tests failing, that I have seen. In general it just inserts work everywhere. We should revisit it.
Reporter | ||
Comment 28•15 years ago
|
||
A bit rough this one - contains reworkings of parts of earlier patches - but passes acceptance tests. The trick is as outlined earlier: whenever the prototype property is updated we snap the createInstance pointer back to the general case, which will then compute a new prototype value as appropriate.
Reporter | ||
Comment 29•15 years ago
|
||
This refactors GC::FreeNotNull by moving most of the logic into GCAlloc::Free and GCLargeAlloc::Free; the way it does that is to create a GCAllocBase class that the two allocator types derive from, with a virtual Free() method. FreeNotNull dispatches by grabbing the allocator out of the block header and making a virtual call.
This yields a healthy 7% speedup on the Mac Pro. The savings comes from several sources. FreeNotNull is now inlinable so we avoid the call overhead. The inlined virtual call is correctly predicted pretty much always. But most importantly, gcc did not generate good code for FreeNotNull because it had several uncommon-case branches - as a result the procedure prologue was very heavyweight, for something that was supposed to be a quick dispatch. Now it /is/ a quick dispatch, the complexity has been absorbed into GCAlloc::Free and GCLargeAlloc::Free, where it does much less damage. Finally, the really-uncommon cases were factored out as a separate bailout function in GC to reduce compiler confusion further.
Reporter | ||
Comment 30•15 years ago
|
||
With all those changes we've seen a 36% reduction in running time (relative to TR) and the most-wanted list now looks like this:
MethodEnv::coerceUnboxEnter 19.2%
ScriptObject::fastCreateInstance 14.4%
GCAlloc::Alloc 13.4%
ZCT::Reap 10.2%
ScriptObject::~ScriptObject 8.5%
GCAlloc::Free 8.2%
Traits::destroyInstance 7.9%
finddef_cache 7.7%
GC::WriteBarrierRC_ctor 3.8%
GC::WriteBarrierRC_dtor 2.7%
ZCT::PinStackObjects 2.0%
GCCallback::prepreap(obj) 1.3%
fastCreateInstance inlines newObject which appears to have inlined ScriptObject::ScriptObject, which in turn inlines RCObject::RCObject, which inlines logic to add the object to the ZCT - so the time spent here is not surprising, and some should be billed to the GC.
ScriptObject::~ScriptObject inlines RCObject::~RCObject, which inlines logic to remove the object from the ZCT - again, some of this time should be billed to the GC.
At this point the profile shows mutator/GC ~60/~40 though in truth it's probably closer to 50/50 when all the inlining has been accounted for.
Reporter | ||
Comment 31•15 years ago
|
||
This program is similar to the one in attachment #414742 [details], except that a pointer to each new object is stored in an instance variable, not a local variable. Ergo each object's reference count is incremented and then decremented when the next object is stored. The profile is similar to the first program, but the RC write barrier has taken the number two spot with 15% of the running time. This program runs in 1345ms compared to 1090ms for the first program.
Comment 32•15 years ago
|
||
Recently it occurred to me that all instances of ordinary classes share the same value of ScriptObject::delegate, and therefore that value could be stashed somewhere else (VTable). The only way SO::delegate can vary from instance to instance is instances of Object created with a Function constructor.
Saving one word per object will be interesting, eventually, but this could also reduce work at object instantiation time.
Reporter | ||
Comment 33•15 years ago
|
||
Edwin notes elsewhere about how the JIT code is distributed (this is TR):
"With an AS3-symbol-instrumented shark run, I get 6.5% self time in Point(),
5.9% in allocate(), 1.3% in Object(), negligible everywhere else. If I
flatten, it all attributes to allocate(), except for a few stray samples
attributed to Point but without allocate on the stack. (i blame shark for
fuzzy noise like that)"
Comparatively that's a lot of time in allocate(), but if it has to call
finddef_cache, then call newInstance, then call the constructor, I can see it,
depending on how well those calls are done.
Ed again: "[Calls] are not superb, so the relative time in Point vs Allocate
doesn't surprise me. Improvements to finddef, the as3 jit ABI, and general jit
code quality should uniformly help."
Reporter | ||
Comment 34•15 years ago
|
||
In reference to comment #30, can we optimize the ZCT further?
Some observations:
- It's always valid to evict members of the ZCT. The ZCT helps to make memory available sooner, but it's just an optimization (though an important one).
- Objects must be inserted in the ZCT when they are created (with reference count 0), and must be removed when they become referenced by other objects (and their reference counts go to 1)
- Objects in the ZCT have a "ZCT index" that indicate their place in the ZCT, so that an explicit deletion of an object in the ZCT can remove it quickly.
- Every RCObject is allocated via GCAlloc::Alloc or GCLargeAlloc::Alloc, so anything that goes on in the RCObject constructor can move into the allocators if we so desire.
There is a fair amount of logic in the RCObject constructor to insert the object into the ZCT, call a function on overflow, and compute and insert the ZCT index. The code is large and branchy in practice - it's been tuned, but it's still complex.
Two optimizations suggest themselves:
- One optimization is that the ZCT does not have to be precise; it can be a hash table indexed by the object pointer. Collisions would be handled by simple overwriting. Whether an object is in the ZCT or not becomes a lookup in the hash table rather than a check of the object's header. The ZCT index field - and its maintenance - goes away. Insertions into the ZCT and removals from the ZCT are simple pointer writes, no decision points, no overflow checks.
- Another optimization is that CodegenLIR generates better code when an object is constructed that is likely to become referenced by another object; this completely removes the ZCT insertion/removal cost. Remember, it's always correct not to insert an object in the ZCT, so it's OK if the JIT guesses wrong from time to time, or if exceptions or similar effects invalidate a correct analysis.
In a pattern like
construct
...
putproperty
the 'construct' call would create the object without adding it to the ZCT but setting its reference count to 1, and putproperty would store it using a cheaper write barrier, like WriteBarrierRC_dtor (because only the existing content in the slot is affected by the barrier, the new object has a correct reference count already).
What are the dangers of these optimizations? We risk that large object structures that would have been freed by RC will no longer be so, they will require GC. This is so because pointers to their roots (or significant elements) will have been evicted from the ZCT by an overwrite or not entered into the ZCT by an incorrect analysis. We can combat these dangers by using a larger hash table, a better hash function, and a better analysis in the JIT.
Reporter | ||
Comment 35•15 years ago
|
||
(In reply to comment #34)
> In reference to comment #30, can we optimize the ZCT further?
The Smalltalk "Green Book" notes that RC logic becomes much simpler if the RC field is so large that it won't overflow. That's a trivial observation perhaps, but it applies to us: if we don't need a ZCT index because the ZCT is a hash table indexed by the pointer value, then the ZCT index field's space as well as the in-ZCT bit now in 'composite' can be used for the reference count. Since we only need one bit to denote pinning, we can devote 31 bits to the reference count. Since the smallest object is 8 bytes, we only need 29 bits for a reference count that's guaranteed never to overflow. We currently have code that depends on being able to 'stick' an RCObject, but (a) I suspect that code is dodgy and (b) if we really need it there's no problem using a reference count that can never reach zero - it has the same effect.
Reporter | ||
Comment 36•15 years ago
|
||
Comment on attachment 415359 [details] [diff] [review]
Refactor GC::FreeNotNull
Moved to bug #528338.
Attachment #415359 -
Attachment is obsolete: true
Reporter | ||
Comment 37•15 years ago
|
||
Comment on attachment 415162 [details] [diff] [review]
Some tweaks to the ZCT code
The FreeNotNull tweak landed with bug #528338, the removal of the local variables needs investigation on multiple systems before it's trustworthy.
Reporter | ||
Updated•15 years ago
|
Priority: -- → P3
Target Milestone: --- → flash10.2
Reporter | ||
Comment 38•15 years ago
|
||
Rebased to redux 3530.
Attachment #414835 -
Attachment is obsolete: true
Attachment #421770 -
Flags: review?(edwsmith)
Reporter | ||
Comment 39•15 years ago
|
||
Comment on attachment 415162 [details] [diff] [review]
Some tweaks to the ZCT code
The FreeNotNull change landed elsewhere; the rest are probably machine-specific and not worth changing.
Attachment #415162 -
Attachment is obsolete: true
Reporter | ||
Comment 40•15 years ago
|
||
(In reply to comment #10)
> Created an attachment (id=414845) [details]
> Short-circuit createInstance
The method Traits::hasCustomConstruct is the canonical way to determine whether the base case constructor is overridden, and we should probably just use that. See bug #530222 for a discussion of problems with that method, however.
Reporter | ||
Comment 41•15 years ago
|
||
Rebased to redux 3530; more documentation. FP builtins vetted to ensure they obey the restriction now documented in ScriptObject.h.
The condition I'm computing here appears to be different from the condition implied by hasCustomConstructor, which is why I'm not reusing that.
Attachment #414845 -
Attachment is obsolete: true
Attachment #421797 -
Flags: review?(stejohns)
Attachment #421797 -
Flags: review?(edwsmith)
Reporter | ||
Comment 42•15 years ago
|
||
Rebased to redux 3530. This is a non-functional change creating controlled access to the 'prototype' field. The patch sets us up for the next one, where setPrototypePtr will do more work (off the hot path of object creation).
Attachment #415159 -
Attachment is obsolete: true
Attachment #421800 -
Flags: review?(stejohns)
Attachment #421800 -
Flags: review?(edwsmith)
Updated•15 years ago
|
Attachment #421770 -
Flags: review?(edwsmith) → review+
Comment 43•15 years ago
|
||
Comment on attachment 421797 [details] [diff] [review]
Short-circuit createInstance
Assuming the invariant is true that the base case is always hit or never hit, this looks right.
It might be possible to use the function pointer alone and no baseCase flag, if we didn't mind genericCreateInstance() being called repeatedly for the non-optimized case. (fast path gets faster, slow path gets slower). meh.
Attachment #421797 -
Flags: review?(edwsmith) → review+
Reporter | ||
Comment 44•15 years ago
|
||
(In reply to comment #43)
> It might be possible to use the function pointer alone and no baseCase flag, if
> we didn't mind genericCreateInstance() being called repeatedly for the
> non-optimized case. (fast path gets faster, slow path gets slower). meh.
I don't understand how that would make the fast path faster.
Updated•15 years ago
|
Attachment #421800 -
Flags: review?(edwsmith) → review+
Comment 45•15 years ago
|
||
eliminating the baseCase flag doesn't make the fast path faster, i was referring to the overall change. moot point since the new bool uses padding space anyway.
Reporter | ||
Comment 46•15 years ago
|
||
Rebased to redux 3530 + the above three patches.
Avoids calling newInstance at all by moving the check for the validity of the prototype off the hot path. When the prototype is set to NULL by user code the original createInstance trampoline is snapped back in and computes a proper value.
Attachment #415163 -
Attachment is obsolete: true
Attachment #421812 -
Flags: review?(stejohns)
Attachment #421812 -
Flags: review?(edwsmith)
Reporter | ||
Updated•15 years ago
|
Whiteboard: Has patch
Comment 47•15 years ago
|
||
Comment on attachment 421812 [details] [diff] [review]
Make the jit call ivtable->createInstance directly
Looks right.
Attachment #421812 -
Flags: review?(edwsmith) → review+
Reporter | ||
Comment 48•15 years ago
|
||
newInstance is no longer needed by the JIT, so remove the defs.
newInstance and ivtable are small enough to inline, and there are other inline functions in ClassClosure, so introduce ClassClosure-inlines.h.
Attachment #421815 -
Flags: review?(edwsmith)
Updated•15 years ago
|
Attachment #421815 -
Flags: review?(edwsmith) → review+
Comment 49•15 years ago
|
||
Comment on attachment 421797 [details] [diff] [review]
Short-circuit createInstance
This is a nice change, but I suspect we could do even better, as a given class shouldn't ever be able to switch this behavior at runtime: ie, we should (in theory) be able to decide at compiletime which to use for a given class, shouldn't we?
Attachment #421797 -
Flags: review?(stejohns) → review+
Updated•15 years ago
|
Attachment #421800 -
Flags: review?(stejohns) → review+
Comment 50•15 years ago
|
||
Comment on attachment 421812 [details] [diff] [review]
Make the jit call ivtable->createInstance directly
r+, but IMHO having both "genericCreateInstance" and "generalCreateInstance" is... confusing. Surely better names are available?
Attachment #421812 -
Flags: review?(stejohns) → review+
Reporter | ||
Comment 51•15 years ago
|
||
Comment on attachment 421770 [details] [diff] [review]
Make AvmCore::newObject REALLY_INLINE
redux changeset: 3574:8390bcf5f5ea
Attachment #421770 -
Attachment is obsolete: true
Reporter | ||
Comment 52•15 years ago
|
||
Comment on attachment 421797 [details] [diff] [review]
Short-circuit createInstance
redux changeset: 3574:8390bcf5f5ea
Reporter | ||
Updated•15 years ago
|
Attachment #421797 -
Attachment is obsolete: true
Reporter | ||
Comment 53•15 years ago
|
||
Comment on attachment 421800 [details] [diff] [review]
Hide the 'prototype' field
redux changeset: 3574:8390bcf5f5ea
Attachment #421800 -
Attachment is obsolete: true
Reporter | ||
Comment 54•15 years ago
|
||
Comment on attachment 421812 [details] [diff] [review]
Make the jit call ivtable->createInstance directly
redux changeset: 3574:8390bcf5f5ea
Attachment #421812 -
Attachment is obsolete: true
Reporter | ||
Comment 55•15 years ago
|
||
Comment on attachment 421815 [details] [diff] [review]
Clean up newInstance
redux changeset: 3574:8390bcf5f5ea
Attachment #421815 -
Attachment is obsolete: true
Reporter | ||
Comment 56•15 years ago
|
||
(In reply to comment #50)
> (From update of attachment 421812 [details] [diff] [review])
> r+, but IMHO having both "genericCreateInstance" and "generalCreateInstance"
> is... confusing. Surely better names are available?
Agreed. Renamed the latter as "slowCreateInstance" to contrast with "fastCreateInstance". redux changeset: 3574:8390bcf5f5ea.
Comment 57•15 years ago
|
||
(In reply to comment #1)
> nextPowerOfTwo is a proper function and that function has a loop (which
> probably terminates reasonably quickly since most arguments are small, but
> still).
In particular, that loop might be made faster; it currently executes n left-shifts by 1; it could instead do log(n) shifts of varying magnitude (wikipedia has the pseudocode for this). If most arguments are small as you said, then that might be a bad trade-off. . . but its sounds like a pretty simple thing to try out.
Reporter | ||
Updated•15 years ago
|
Summary: [Speed] Tune ScriptObject initialization → Tune ScriptObject initialization
Whiteboard: Has patch → PACMAN, Has patch
Reporter | ||
Updated•15 years ago
|
Whiteboard: PACMAN, Has patch → PACMAN
Comment 58•14 years ago
|
||
This bug is still open, and marked for PACMAN, but it's not clear which (specific) items are being targeted, nor when we'll consider it "done". I'm guessing the items in comments 34 and 35 are worthy of exploring; are there other items on the try-it-out list?
Reporter | ||
Comment 59•14 years ago
|
||
(In reply to comment #58)
> This bug is still open, and marked for PACMAN, but it's not clear which
> (specific) items are being targeted, nor when we'll consider it "done". I'm
> guessing the items in comments 34 and 35 are worthy of exploring; are there
> other items on the try-it-out list?
Useless calls to useless constructors - we want to optimize that if we can, as it accounts for 50% of the running time of the benchmark, function calls to empty functions being way expensive.
See comment #14 and comment #17 (at least).
Reference counting overhead is being addressed by some MMgc projects, probably not worth focusing on those here.
Feel free to open more targeted bugs for items in this bug, so that we can close it - 60 comments is more than enough.
Comment 60•14 years ago
|
||
Continued into https://bugzilla.mozilla.org/show_bug.cgi?id=561140
Closing this one
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 61•14 years ago
|
||
Reopening this as a tracking bug
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: PACMAN → PACMAN Tracking
Reporter | ||
Updated•14 years ago
|
Comment 62•14 years ago
|
||
(In reply to comment #7)
> A significant fraction of the time in ScriptObject::ScriptObject and
> ScriptObject::~ScriptObject - say 20% - is for adding the pointer of the new
> reference to the ZCT and removing it again.
A lot of times the ZCT operates as a LIFO: object is born into it and more or less immediately removed.
So I'm wondering if in Remove() when you're removing the last thing you could just decrement top. That might be a tiny bit faster in itself, but more importantly the ZCT will grow a lot more slowly since the top entry will tend to be used over and over again.
(Yeah, it might be a bit fiddly since "top==limit" operates as a flag.)
Reporter | ||
Comment 63•14 years ago
|
||
Comment on attachment 415107 [details] [diff] [review]
Don't call the Object constructor from a subclass's constructor
This has been handled in bug #561140.
Attachment #415107 -
Attachment is obsolete: true
Reporter | ||
Updated•13 years ago
|
Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Comment 64•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 14 years ago → 6 years ago
Resolution: --- → WONTFIX
Comment 65•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•