Closed Bug 621961 Opened 14 years ago Closed 14 years ago

TI+JM: inline some native functions

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed-in-jaegermonkey)

Attachments

(1 file, 7 obsolete files)

WIP v1 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review
WIP v2 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review
WIP v3 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review
WIP v4 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review
WIP v5 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review
WIP v6 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review
Patch v7 14 years ago Jan de Mooij [:jandem] (deleted), patch	bhackett1024 : review+	Details \| Diff \| Splinter Review
Patch v8 14 years ago Jan de Mooij [:jandem] (deleted), patch		Details \| Diff \| Splinter Review

Jan de Mooij [:jandem]

Assignee

Description

•

14 years ago

Kraken's imaging-gaussian-blur has *many* calls to Math.abs. For these calls TI knows it calls the builtin Math.abs and the argument is an integer. We can probably inline this in the methodjit like the tracer does. Maybe we can also call traceable natives? This is related to bug 579574. The difference is that the patch there does this in the MIC and needs more guards, memory stores and type checks. IIUC, with type inference we can do this directly in JM and use the FrameState machinery, eg. for Math.abs(x) we can keep x in registers. Combined with bug 612204 this could make these functions very fast. Brian, does this makes some sense or am I missing something? I can work on a prototype for Math.abs and see how much it helps Kraken/V8.

Jan de Mooij [:jandem]

Assignee

Updated

•

14 years ago

Assignee: general → jandemooij

Status: NEW → ASSIGNED

Brian Hackett [Laid off!]

Comment 1

•

14 years ago

This is definitely the right idea. From numbers in bug 579574 and my own experiments, most of the gain in optimizing these builtins is avoiding spill code --- Math.abs(int) should not spill anything, and just test on the register containing x. It would probably be easiest to look at doing bug 612204 after putting together a prototype. Then the JSOP_CALL could pop a constant-valued frame entry and specializing the call path would fall naturally from there. That would require being able to identify singleton TypeObjects and go to the corresponding JSObject (probably just a 'JSObject *singleton' in TypeObject; eventually we will I think want to avoid constructing singleton TypeObjects and manipulate the JSObjects directly in type sets, but that's longer term).

Jan de Mooij [:jandem]

Assignee

Comment 2

•

14 years ago

Attached patch WIP v1 (obsolete) (deleted) — Details — Splinter Review

First stab. This inlines Math.abs(int) in the methodjit. Also triggers a recompile for Math.abs(MIN_INT) which results in a double. This also does constant folding for e.g. Math.abs(-10) -> 10. Easy to do; maybe not very relevant for abs, but it could help Math.sqrt, Math.PI etc. Some numbers: --- function f() { var abs = Math.abs; var t0 = new Date; var x = 0; for(var i=0; i<10000000; i++) { x = abs(i); } print(new Date - t0); } f(); --- V8: 116 ms TM: 37 ms JM (tip): 231 ms JM (infer old): 218 ms JM (infer new): 21 ms Generated code looks like this: -- movl %esi, %edi cmpl $0x0, %edi jge <end> negl %edi cmpl $0x0, %edi jl <stub path> -- GCC does this without jumps, but we need to check for MIN_INT anyway and GCC requires more registers. So I don't think this is too bad. Next, I will do some cleanup, extend this to JSOP_CALLPROP and add other builtins like floor.

Jan de Mooij [:jandem]

Assignee

Comment 3

•

14 years ago

I had to change Math.abs to a local abs variable to test Kraken's imaging-gaussian-blur (until we optimize JSOP_CALLPROP; but it will be as fast). Also verified the output is still correct. TM: 875 ms JM+TI: 1243 ms V8: 2610 ms JM: 4719 ms So, with this patch we may beat Crankshaft on another test. Can't wait to see TM+TI, should be impressive :)

Brian Hackett [Laid off!]

Comment 4

•

14 years ago

Nice! One issue here is that if MathAbsMinIntHelper is called and triggers an overflow, you can't count on the recompiled caller still knowing the called function is Math.abs or that the argument is an int. In such cases the recompiled code will not have a call to MathAbsMinIntHelper, the recompiler will not be able to patch the stack and will crash safely. The compiler can test the 'recompiling' flag to see whether it might need to emit extra OOL call paths for rejoining from recompiled calls. Since most inlinable natives will have cases where they need to stub out and can trigger recompilation, we don't want to end up with many different stubs for the compiler to call when recompiling. Hopefully what will work is to have all inline natives use the same stub, probably ic::NativeCall/NativeNew (looking, I think there's a bug where we don't call these when recompiling to an uncached call), and have the Compiler make sure to emit a call to that function whenever recompiling.

Jan de Mooij [:jandem]

Assignee

Comment 5

•

14 years ago

Thanks for your feedback. I tested the recompiling path and it seemed to work, but that was probably luck. I will try to understand the recompiler better and fix it the Right Way. (fortunately this should not affect performance numbers in comment 2 and 3). We also have to think about Math.abs(float,int) etc. Limiting the number of stubs will be good and make it easier to compile more builtins. (At some point we should probably add some instrumentation to find out which functions and argument types will benefit the most from inlining.)

Tom S [:evilpie]

Comment 6

•

14 years ago

Cool stuff, will do |String()| and |Number()|.

Jan de Mooij [:jandem]

Assignee

Comment 7

•

14 years ago

Attached patch WIP v2 (obsolete) (deleted) — Details — Splinter Review

Refactoring, added a new file FastBuiltins.cpp for inlined natives, added tests. Static overflow (Math.abs(-2147483648)) is now handled correctly. JSOP_CALLPROP is now optimized like JSOP_CALLLOCAL, only difference is that we still do the "getgname Math" but I think we can optimize that too. Slow path calls stubs::SlowCall instead of a special Math.abs helper, like Brian suggested.

Attachment #500404 - Attachment is obsolete: true

Jan de Mooij [:jandem]

Assignee

Comment 8

•

14 years ago

Attached patch WIP v3 (obsolete) (deleted) — Details — Splinter Review

Added more Math functions. This patch inlines: Math.abs(int) : int Math.abs(double) : double Math.sqrt(double) : double Math.sqrt(int) : double Math.floor(double) : int Math.round(double) : int Math.round helps Kraken audio-oscillator: V8: 326 ms TM: 595 ms JM+TI old: 673 ms JM+TI new: 344 ms Smaller wins on v8-raytrace (Math.sqrt) and some other tests. Next are Math.{sin,cos,log,..}. These are more complex (and need the math cache), I think I will try to call the traceable natives or call js_math_sin etc directly. Maybe also Math.{pow,ceil,max,min}, don't know how common these are.

Attachment #500612 - Attachment is obsolete: true

Jan de Mooij [:jandem]

Assignee

Comment 9

•

14 years ago

Attached patch WIP v4 (obsolete) (deleted) — Details — Splinter Review

Inline charAt and charCodeAt with integer argument and non-rope string. For the typical micro-benchmark (doing c = s.charCodeAt(5); 1000000 times) we have: TM: 6 ms JM (FF4): 23 ms JM+TI (new): 3 ms d8: 10 ms This helps SS string-base64 (11 ms to 7 ms) and some other SS tests.

Attachment #501026 - Attachment is obsolete: true

Jan de Mooij [:jandem]

Assignee

Comment 11

•

14 years ago

Attached patch WIP v5 (obsolete) (deleted) — Details — Splinter Review

Compile Math.pow(x, 0.5) and Math.pow(x, -0.5) as sqrt(x) and 1/sqrt(x). Still seeing about a 4-5% win on SS, and ~40% on Kraken (hard to measure because some tests fail on the TI branch)

Attachment #506908 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Updated

•

14 years ago

Blocks: 638794

Jan de Mooij [:jandem]

Assignee

Comment 12

•

14 years ago

Attached patch WIP v6 (obsolete) (deleted) — Details — Splinter Review

Rebased and fixed some issues. JM branch can run Kraken now, for -m -n: ** TOTAL **: 1.51x as fast 14975.2ms +/- 0.2% 9902.2ms +/- oscillator: 2.10x as fast 622.4ms +/- 0.3% 295.8ms gaussian-blur: 4.13x as fast 6349.4ms +/- 0.1% 1536.6ms

Attachment #510628 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Comment 13

•

14 years ago

Nice! Are these numbers comparing vs. JM tip or TM tip?

Jan de Mooij [:jandem]

Assignee

Comment 14

•

14 years ago

Comparing JM tip vs. JM tip + patch. TM tip (-m only) runs Kraken in 12247 ms and gaussian-blur in 4968 ms, so some tests are slower with TI enabled (probably bug 619592 and others). Others like audio-fft and some crypto tests are already ~2x faster, pretty cool (TM tip -m vs JM tip -m -n)

Jeff Walden [:Waldo]

Comment 15

•

14 years ago

(In reply to comment #11) > Compile Math.pow(x, 0.5) and Math.pow(x, -0.5) as sqrt(x) and 1/sqrt(x). Note that 1 / Math.pow(-0, 0.5) !== 1 / Math.sqrt(-0).

OS: Mac OS X → Windows XP

Jeff Walden [:Waldo]

Updated

•

14 years ago

OS: Windows XP → All

Hardware: x86 → All

Jan de Mooij [:jandem]

Assignee

Comment 16

•

14 years ago

(In reply to comment #15) > Note that 1 / Math.pow(-0, 0.5) !== 1 / Math.sqrt(-0). Yes for pow(x, 0.5) we take a slow path for x === -Infinity and otherwise add 0.0 to convert -0 to +0. These edge cases can be very tricky.

Jan de Mooij [:jandem]

Assignee

Comment 17

•

14 years ago

Attached patch Patch v7 (obsolete) (deleted) — Details — Splinter Review

Rebased, added tests, refactoring. If there's anything you would have done differently, let me know and I'll happily change it. This passes jit-tests and jstests with -m -n -a.

Attachment #518509 - Attachment is obsolete: true

Attachment #520671 - Flags: review?(bhackett1024)

Jan de Mooij [:jandem]

Assignee

Updated

•

14 years ago

Summary: TypeInference: JM: optimize calls to simple builtins → TI+JM: inline some native functions

Brian Hackett [Laid off!]

Comment 18

•

14 years ago

Comment on attachment 520671 [details] [diff] [review] Patch v7 High level: This patch looks great! However, there are some correctness problems with the interface with the type information, which I've described below (and will clean up and add to the code as comments). I'm happy to take the patch as is and fix these myself if you want, let me know your preference. Would be good to get this landed quickly. > /* Whether any objects this represents have an equality hook. */ > bool hasSpecialEquality; > >+ JSObject *singleton; >+ A comment here would be good. Also, the singleton should be marked by TypeObject::trace. Normally type objects get marked when one of the objects using them as the type gets marked (which could only be the singleton), but that is not always the case --- if a GC occurs during an inference operation or during compilation then all type objects in the compartment are marked, which could leave a dangling pointer here. > BEGIN_CASE(JSOP_CALLGLOBAL) >- jsop_getglobal(GET_SLOTNO(PC)); >- if (op == JSOP_CALLGLOBAL) >- frame.push(UndefinedValue()); >+ if (JSObject *singleton = pushedSingleton(0)) >+ frame.push(ObjectValue(*singleton)); >+ else >+ jsop_getglobal(GET_SLOTNO(PC)); >+ frame.push(UndefinedValue()); > END_CASE(JSOP_GETGLOBAL) > bool > mjit::Compiler::jsop_callprop(JSAtom *atom) > { > FrameEntry *top = frame.peek(-1); > >+ JSObject *singleton = pushedSingleton(0); >+ if (singleton && singleton->isFunction()) { >+ // THIS >+ >+ frame.dup(); >+ // THIS THIS >+ >+ frame.push(ObjectValue(*singleton)); >+ // THIS THIS FUN >+ >+ frame.shift(-2); >+ // FUN THIS >+ >+ return true; >+ } >+ /* Optimize singletons like Math for JSOP_CALLPROP. */ >+ JSObject *obj = pushedSingleton(0); >+ if (obj && *(PC+JSOP_GETGNAME_LENGTH) == JSOP_CALLPROP) { >+ frame.push(ObjectValue(*obj)); >+ return; >+ } It's important for property and global variable accesses that we don't optimize away any type monitoring for undefined values. The type set for a property only describes the possible types for the property * if it exists *, it does not guarantee that the property is actually on the object. There is an exception to this: global variables defined with 'var x' do not have undefined added to their type set even though they are properties of the global object. So these ops can only be nop'ed if we can also determine that an undefined value cannot be pushed. That can be done by establishing a new invariant: For a TypeObject objtype with singleton JSObject obj and property id, if obj has a non-undefined value for id then any lookup on id in the future cannot produce undefined unless 'undefined' is also added to the type set for id in objtype. Adding undefined in this way would trigger recompilation of the script and cause the pushedSingleton calls to return NULL. We already almost have this invariant, we just need to make sure that deleting a property from an object marks that property as undefined in the type (we currently don't do anything with 'delete' in inference), must call cx->addTypeProperty in obj->deleteProperty. Then, after calling pushedSingleton, CALLGLOBAL and CALLGNAME should check the global object (these opcodes only appear in compileAndGo code) to make sure the property is there. The optimization for CALLPROP should only apply if we can test a JSObject which is definitely either the operand or is on the operand's prototype chain. For Math.foo the operand should be a constant FrameEntry (may need to make sure for the NAME accesses on Math, whose IC and monitoring checks can't be optimized but whose pushed value can still be marked as a singleton). For ('foo').charAt(0), if the operand is definitely a primitive string and the code is compileAndGo, you can lookup the String.prototype for the global and check there. The above invariant might need some tuning for handling properties with custom getter hooks, etc. This stuff gets pretty hairy pretty fast. Also, why the '*(PC+JSOP_GETGNAME_LENGTH) == JSOP_CALLPROP' test? >+JSObject * >+mjit::Compiler::pushedSingleton(unsigned pushed) >+{ >+ if (!cx->typeInferenceEnabled()) >+ return NULL; >+ >+ types::TypeSet *types = script->types->pushed(PC - script->code, pushed); >+ if (types && types->getKnownTypeTag(cx, script) == JSVAL_TYPE_OBJECT && >+ types->objectCount == 1) { >+ types::TypeObject *obj = (types::TypeObject *)types->objectSet; >+ return obj->singleton; >+ } >+ return NULL; >+} It's also important to make sure that all information the Compiler requests from type sets is duly recorded as constraints on those type sets, so that if the information changes in the future a recompilation will be triggered. The 'getKnownTypeTag' test here adds such constraints, but the 'types->objectCount == 1' doesn't. If a different object is added to the type set, recompilation is not guaranteed. The existing recompilation constraints don't really capture this use case, I think this needs a 'JSObject *getKnownObject()' that works similarly to 'knownNonEmpty'.

Brian Hackett [Laid off!]

Updated

•

14 years ago

Attachment #520671 - Flags: review?(bhackett1024) → review+

Jan de Mooij [:jandem]

Assignee

Comment 19

•

14 years ago

Attached patch Patch v8 (deleted) — Details — Splinter Review

Add comments and trace singleton. Brian, as discussed on IRC, I'd appreciate it if you could fix the type inference property part.

Attachment #520671 - Attachment is obsolete: true

Brian Hackett [Laid off!]

Comment 20

•

14 years ago

http://hg.mozilla.org/projects/jaegermonkey/rev/0e427e383bfd I added a stopgap fix for opcodes which don't expect to pop a constant object (just stuff the object into a register), so we can use singleton info wherever we want and be more robust. http://hg.mozilla.org/projects/jaegermonkey/rev/99a3fe34ccc6

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Whiteboard: fixed-in-jaegermonkey

You need to log in before you can comment on or make changes to this bug.