Closed Bug 121414 Opened 23 years ago Closed 16 years ago

JS indirect threaded interpreter

Categories

(Core :: JavaScript Engine, defect)

Not set
normal

Tracking


RESOLVED FIXED

People

(Reporter: boullet.marc, Assigned: brendan)

References

Details

(Keywords: perf, testcase)

Attachments

(13 files, 7 obsolete files)

(deleted), text/html
Details
(deleted), text/html
Details
(deleted), text/plain
Details
(deleted), text/plain
Details
(deleted), patch
rginda
: review+
jband_mozilla
: superreview+
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), application/x-javascript
Details
(deleted), patch
brendan
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
mrbkap
: review+
Details | Diff | Splinter Review
(deleted), patch
igor
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
I used a relatively slow PC (300 MHz) under NT4. The JS tests (attached) don't involve any DOM stuff. I just re-coded in JS some C programs I wrote a long time ago to find the best way to get the power of two greater than or equal to a given number. The tests involve only a for loop, some assignments, a while loop, and some logical and shift operations. The comparison with IE5.5 is shown in the following table:

+---------------------+-------------------+-------------------+--------+
| Test                | Mozilla (20020121)| IE 5.5            | Ratio  |
|---------------------|-------------------|-------------------|--------|
| Test 1              | 4516 ms           | 2814 ms           | 1.60   |
|---------------------|-------------------|-------------------|--------|
| Test 2              | 4086 ms           | 2594 ms           | 1.57   |
|---------------------|-------------------|-------------------|--------|
| Test 3              | 4766 ms           | 3876 ms           | 1.22   |
+---------------------+-------------------+-------------------+--------+

The ratio of 1.6 is what triggered this bug. I decided to separate this bug from bug #117611 since that one is already huge and now focused on sort and concat. Note that IE5.5 doesn't seem to perform as well at shifting (test 3), though it is still faster than Mozilla. Hope it can help.
Attached file page with the tests (deleted) —
Keywords: perf
Confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Here are my timings on WinNT 4.0 (SP6), 500 MHz CPU, 128M RAM. This was Mozilla vs. IE6. The ratios are the same as Marc's above:

+---------------------+-------------------+-------------------+--------+
| Test                | Mozilla (20020116)| IE 6              | Ratio  |
|---------------------|-------------------|-------------------|--------|
| Test 1              | 2640 ms           | 1625 ms           | 1.62   |
|---------------------|-------------------|-------------------|--------|
| Test 2              | 2390 ms           | 1484 ms           | 1.61   |
|---------------------|-------------------|-------------------|--------|
| Test 3              | 2781 ms           | 2265 ms           | 1.23   |
+---------------------+-------------------+-------------------+--------+
Here are Marc's tests:

var max = Math.pow(2,17);

function test1()
{
    var i, powerOfTwo, j;
    var start, end;

    document.forms[0].testResult1.value = '';
    start = new Date();
    for (i = 1; i <= max; i++) {
        powerOfTwo = i;
        j = powerOfTwo;
        while (j &= j ^ -j)
            powerOfTwo = j << 1;
    }
    end = new Date();
    document.forms[0].testResult1.value = end - start + ' ms';
}

function test2()
{
    var i, powerOfTwo, j, k;
    var start, end;

    document.forms[0].testResult2.value = '';
    start = new Date();
    for (i = 1; i <= max; i++) {
        powerOfTwo = i;
        j = powerOfTwo;
        do
            k = j;
        while (j &= j ^ -j);
        if (k != powerOfTwo)
            powerOfTwo = k << 1;
    }
    end = new Date();
    document.forms[0].testResult2.value = end - start + ' ms';
}

function test3()
{
    var i, powerOfTwo, k;
    var start, end;

    document.forms[0].testResult3.value = '';
    start = new Date();
    for (i = 1; i <= max; i++) {
        powerOfTwo = i;
        k = 0;
        while (powerOfTwo >>= 1)
            k++;
        powerOfTwo = 1 << k;
        if (powerOfTwo != i)
            powerOfTwo <<= 1;
    }
    end = new Date();
    document.forms[0].testResult3.value = end - start + ' ms';
}
Assignee: rogerl → khanson
Blocks: js-perf
Hi, first of all, this bug should probably be OS => All and perhaps also Platform => All. And here are my test results:

Machine: Athlon/500, 416 Megs of RAM, Windows XP Pro
Mozilla: 2002-01-22-09 Win32, SVG/MathML-enabled
IE:      6.0 XP-bundled release

+---------------------+-------------------+-------------------+--------+
| Test                | Mozilla           | IE 6.0            | Ratio  |
|---------------------|-------------------|-------------------|--------|
| Test 1              | 2703 ms           | 2153 ms           | 1.25   |
|---------------------|-------------------|-------------------|--------|
| Test 2              | 2414 ms           | 1933 ms           | 1.25   |
|---------------------|-------------------|-------------------|--------|
| Test 3              | 2884 ms           | 2954 ms           | 0.98   | <== !
+---------------------+-------------------+-------------------+--------+
As suggested, changing OS: WinNT ---> All, Platform: WinNT --> All Also, tentatively changing summary from "Yet another benchmark comparing JS engines speeds" to "Bitwise operators should be faster"
OS: Windows NT → All
Hardware: PC → All
Summary: Yet another benchmark comparing JS engines speeds → Bitwise operators should be faster
The claim is that these tests don't use the DOM, and that's true, almost. The loop termination variable "max" is a variable with global scope, meaning that the DOM has to search through its nodes each time the variable is accessed, to see if some HTML element named 'max' has changed. (Or so I assume, anyway, not having looked too deeply at how the DOM is connected to JS.) A better test would be to pass 'max' in as the argument to each of the functions: that way, the loop termination condition is outside of that global object. (Any reference to a global object is slow, when JS is connected to a browser...)
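The suggested fix can be sketched as follows: a version of the power-of-two kernel where `max` is an argument, so the loop bound lives in a fast argument slot rather than on the global object. The function names here are hypothetical, not from the attached testcase, and this assumes a DOM-free environment such as the JS shell:

```javascript
// Hypothetical rework of Marc's test1: `max` is a parameter, so the
// engine can use its fast arg/local accesses instead of a global name
// lookup on every loop iteration.
function nextPowerOfTwo(n) {
  var p = n, j = n;
  // j &= j ^ -j clears the lowest set bit each pass; the last nonzero j
  // is the highest set bit of n, so p ends up as the power of two >= n.
  while (j &= j ^ -j)
    p = j << 1;
  return p;
}

function test1(max) {
  var start = new Date();
  for (var i = 1; i <= max; i++)
    nextPowerOfTwo(i);
  return new Date() - start;  // elapsed milliseconds
}
```

For example, `nextPowerOfTwo(5)` yields 8 and `nextPowerOfTwo(8)` stays 8, matching the original algorithm.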
I modified Marc's test to run in the standalone JS shell, where there is no DOM. The global object is that of the JS shell itself. The timings I got are roughly comparable to those in Comment #3: Test1: 2750 ms Test2: 2485 ms Test3: 2750 ms
Concerning timings in Comment #5: Mozilla numbers (with a similarly clocked machine to Phil's) are roughly comparable to those in comment #3. It seems that IE6 under XP is much slower than under NT4, but that's MS's problem, not Mozilla's. Also making the changes suggested by Phil in comment #6.
No longer blocks: js-perf
Though I don't see what the DOM has to do with global variables, I re-ran the modified tests passing |max| as a parameter to the test functions. Here are the results (same machine as in comment #2):

+---------------------+-------------------+-------------------+--------+
|Test w/ max as a par.| Mozilla (20020121)| IE 5.5            | Ratio  |
|---------------------|-------------------|-------------------|--------|
| Test 1              | 4276 ms           | 2754 ms           | 1.55   |
|---------------------|-------------------|-------------------|--------|
| Test 2              | 3836 ms           | 2484 ms           | 1.54   |
|---------------------|-------------------|-------------------|--------|
| Test 3              | 4506 ms           | 3705 ms           | 1.22   |
+---------------------+-------------------+-------------------+--------+

The first two tests show a 5% decrease in the ratio. This means that Mozilla doesn't perform so well with global variables (although it could also mean that Mozilla performs better than IE on function parameters, which is unlikely).
Another thought occurs to me: Can we run a loop with precisely the same number of iterations and loop logic, but with no body? Then subtract that time from the other times to determine whether this slowdown is because bitwise ops are slow or loop ops are slow. Looking at the tests, this may be somewhat difficult. But that sort of just says that the tests ought to be simplified to directly test what they mean to test. (I.e.: we ought to have a "left-shift" test, a "xor" test, an "and" test, and so on---not the "interesting" algorithms we have now that combine all the ops together.) Just a thought... -scole
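A minimal sketch of such a decomposed benchmark, with the empty-loop baseline subtracted out (a hypothetical harness, not part of the attached testcase; the per-call closure overhead here is its own confounder, so a serious version would inline the bodies, but the subtraction idea is the same):

```javascript
// Time one operator in isolation and subtract the empty-loop baseline,
// so loop overhead is not charged to the operator being measured.
function timeLoop(body, iterations) {
  var x = 1, start = new Date();
  for (var i = 0; i < iterations; i++)
    x = body(x, i);
  return new Date() - start;
}

// Returns per-operator times (ms) with the baseline removed; values can
// be small or even negative at low iteration counts due to timer noise.
function benchOps(iterations) {
  var baseline = timeLoop(function (x) { return x; }, iterations);
  return {
    xor: timeLoop(function (x, i) { return x ^ i; }, iterations) - baseline,
    and: timeLoop(function (x, i) { return x & i; }, iterations) - baseline,
    lsh: timeLoop(function (x) { return x << 1; }, iterations) - baseline
  };
}
```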
Blocks: js-perf
Note: since bug 117611 is "[META] JavaScript Performance Issues", I have added the current bug to the dependency list of bug 117611. We hope to solve each individual performance problem in individual bugs, and track them all from the [META] bug.
Target Milestone: --- → Future
My test results show that the problem may not be just the bitwise operators. Even empty loops take nearly twice as long as they do in IE. I modified the test script to compare empty loops with loops containing some assignment statements. The modified test1() is shown below; it tests the empty loop. Remove the comments in the function to test loops with some assignments.

function test1()
{
    var i, powerOfTwo, j;
    var start, end;

    document.forms[0].testResult1.value = '';
    start = new Date();
    for (i = 1; i <= max; i++) {
        //junk = i;
        //junk2 = i;
        //junk3 = i;
    }
    end = new Date();
    document.forms[0].testResult1.value = end - start + ' ms';
}

tests (ms)            Mozilla   IE5   Ratio
empty loop:               79     46    1.71
loop with 1 assign:      140     78    1.79
loop with 2 assign:      203     93    2.18

And my system configuration is: P4 1.8GHz CPU, 256M memory.

C:\>uname -a
Windows_NT SETPOINT 5 00 586
I am exploring a technique called direct threaded code. The idea is to use an array of code addresses as the program to execute. The motivation is that fetching an opcode and switching on it is very inefficient: 'switch' is very costly, as it translates to about 10 machine instructions. Below is the general way to interpret code:

void engine()
{
    static Inst program[] = { inst1 /* ... */ };
    Inst *ip;

    for (;;)
        switch (*ip++) {
          case inst1:
            ...
            break;
          ...
        }
}

Direct threaded code uses the form below:

void engine()
{
    static Inst program[] = { &&inst1 /* ... */ };  /* gcc labels-as-values */
    Inst *ip;

    goto *ip++;
  inst1:
    ...
    goto *ip++;
    ...
}

The details of direct threaded code can be found at http://www.complang.tuwien.ac.at/forth/threaded-code.html

The good news is that it did show about a 10% improvement for this case (SPARC 500MHz, Solaris 8). The bad news is (:-():
1. The modification is vast, and I am still debugging it.
2. I do not know whether it will cause regressions. Although good for this case, I do not know if it is OK for the general case.
3. The technique needs a special C language extension called 'labels as values'. The extension is supported by gcc. This may make the method unportable. See http://www.freebsd.org/info/gcc/gcc.info.Labels_as_Values.htm

Any suggestion or criticism is welcome.

York
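The dispatch tradeoff being discussed can be illustrated without gcc's computed goto: replace a switch on opcode numbers with a table of handler references, the closest portable analogue of threaded dispatch. This is a toy stack machine with invented opcodes, not SpiderMonkey's interpreter:

```javascript
// Toy threaded-dispatch VM: the "program" is an array of handler
// functions (plus inline operands), so dispatch is an indirect call
// instead of a range-checked switch on an opcode.
var PUSH = function (vm) { vm.stack.push(vm.code[vm.pc++]); };  // operand follows
var ADD  = function (vm) { vm.stack.push(vm.stack.pop() + vm.stack.pop()); };
var HALT = null;  // sentinel that ends the run loop

function run(code) {
  var vm = { code: code, pc: 0, stack: [] };
  var op;
  while ((op = vm.code[vm.pc++]) !== HALT)
    op(vm);  // indirect dispatch; handlers consume their own operands
  return vm.stack.pop();
}
```

For example, `run([PUSH, 2, PUSH, 3, ADD, HALT])` evaluates 2 + 3 and returns 5.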
After 2 weeks of struggle, I have to abandon the direct threaded code approach. After fixing some bugs in my code, it finally does not show any performance improvement. :-(
Attached patch Patch for review (obsolete) (deleted) — Splinter Review
With the patch, there is a large improvement on Solaris and a small improvement on Windows. The patch tries to improve the interpretation of JSOP_NAME and JSOP_NAMEINC. When the name is a simple native variable, like an int variable 'i', the processing is inefficient. Because this case obviously has a high usage frequency, it should be optimized. I will describe the parts of the patch for the interpretation of JSOP_NAME and JSOP_NAMEINC respectively below. At the end I give the test data for Solaris and Windows.

1) Interpretation of JSOP_NAME

To get the value of the name, the code first stores the value from obj->slot into rval. If the name has a getter, it calls the getter, stores the returned value in rval, and stores rval back into obj->slot. That store of rval back into obj->slot is needless if the name does not have a getter at all, as with an int variable 'i'. So we can just skip that part when it is a native object without a getter. See the code below:

    case JSOP_NAME:
        ...
        ok = js_FindProperty(cx, id, &obj, &obj2, &prop);
        ...
        rval = (slot != SPROP_INVALID_SLOT)
               ? LOCKED_OBJ_GET_SLOT(obj2, slot)
               : JSVAL_VOID;
        JS_UNLOCK_OBJ(cx, obj2);
        ok = SPROP_GET(cx, sprop, obj, obj2, &rval);
        JS_LOCK_OBJ(cx, obj2);
        ...
        if (SPROP_HAS_VALID_SLOT(sprop, OBJ_SCOPE(obj2)))
            LOCKED_OBJ_SET_SLOT(obj2, slot, rval);
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ useless for an int variable 'i'
        ...

    #define SPROP_GET(cx,sprop,obj,obj2,vp)                                       \
        (((sprop)->attrs & JSPROP_GETTER)                                         \
         ? js_InternalCall(cx, obj, OBJECT_TO_JSVAL((sprop)->getter), 0, 0, vp)   \
         : SPROP_CALL_GETTER(cx, sprop, (sprop)->getter, obj, obj2, vp))

Thus if

    (!(sprop->attrs & JSPROP_GETTER) && !(sprop->getter))

is true, we know that we would just uselessly re-assign rval to the slot.

2) JSOP_NAMEINC

When processing JSOP_NAMEINC, the code first gets the property of obj[id] via js_FindProperty(), then goes to do_incop, and in do_incop it calls CACHED_GET(OBJ_GET_PROPERTY()) to get the value of the property.
In js_FindProperty(), the engine searches the cache on rt->propertyCache to find the corresponding scope property. Later it calls CACHED_GET(), which does the cache test again. Quite inefficient. Another inefficient point is that it first calls OBJECT_TO_JSVAL(obj) and later calls VALUE_TO_OBJECT(), which is just a kind of waste. Actually, if the name is a native object and sprop is returned by js_FindProperty, we can just get the value of the property as in the processing of JSOP_NAME: that is, first get the scope property, and then get the value by calling SPROP_GET(). See the code below:

    case JSOP_INCNAME:
        ...
    case JSOP_NAMEINC:
    case JSOP_NAMEDEC:
        atom = GET_ATOM(cx, script, pc);
        id = (jsid)atom;
        SAVE_SP(fp);
        ok = js_FindProperty(cx, id, &obj, &obj2, &prop);
        OBJ_DROP_PROPERTY(cx, obj2, prop);
        ...
        lval = OBJECT_TO_JSVAL(obj);
        goto do_incop;
        ...
    do_incop:
        VALUE_TO_OBJECT(cx, lval, obj);

        /* The operand must contain a number. */
        SAVE_SP(fp);
        CACHED_GET(OBJ_GET_PROPERTY(cx, obj, id, &rval));

    #define CACHED_GET(call)                                                  \
        JS_BEGIN_MACRO                                                        \
            if (!OBJ_IS_NATIVE(obj)) {                                        \
                ok = call;                                                    \
            } else {                                                          \
                JS_LOCK_OBJ(cx, obj);                                         \
                PROPERTY_CACHE_TEST(&rt->propertyCache, obj, id, sprop);      \
                if (sprop) {                                                  \
                    ...                                                       \
                } else {                                                      \
                    JS_UNLOCK_OBJ(cx, obj);                                   \
                    ok = call;                                                \
                    /* No fill here: js_GetProperty fills the cache. */       \
                }                                                             \
            }                                                                 \
        JS_END_MACRO

3) Test data

Using the attached testcase and ibench to test the patch on Solaris 8, 500MHz CPU (average of 3 test runs):

                   test1(ms)  test2(ms)  test3(ms)  ibench(s)
without the patch     2837       2574       3589      28.02
with the patch        2591       2467       3025      27.88
ratio                 1.09       1.04       1.18      1.005

On the Windows platform, data from 5 repeated runs:

                   test1  test2  test3
without the patch   1946   1853   1981
with the patch      1931   1821   1949
Keywords: patch, review
cc'ing more reviewers for this patch -
Comment on attachment 92205 [details] [diff] [review] Patch for review First of all, I don't think this increase in source complexity and compiled code space is necessarily worth the slight performance gain in a handful of synthetic benchmarks. I'm glad the "threaded" interpreter approach was dropped after it proved not to matter; I hope you didn't spend too much time on it. It seems to me best to *propose and discuss* approaches before hacking patches. I'm sorry if I missed some e-mail on this, and failed to say "don't go there". This bug blamed bitwise operators, but they probably weren't dominating the critical path any more than name operations or increment operations were. Can someone jprof or quantify the benchmarks used here and show hierarchical cycle profile results? One thing disturbs me: I don't see how JSOP_NAME and JSOP_NAMEINC matter here, because the test cases use local variables, except for the max variable tested in the loop condition which, as scole points out, is global. Local variables are already specialized to avoid name gets and sets. I think it is incumbent on anyone writing combinatorial code that must perform well to use local variables in functions. Please fix the testcases to make max an *argument* to the functions, so its accesses are as optimized as are accesses of local variables. (Phil, can you fix the tests and attach them to this bug? Thanks.) While I'm here, let me point out some bad bugs and pick some nits, to help you work on a final patch if we agree that's the best way to spend effort on this bug -- or just for your information when you write another patch for js/src. 
/be

>diff -u -r3.106 jsinterp.c
>--- jsinterp.c  12 Jun 2002 08:31:43 -0000  3.106
>+++ jsinterp.c  22 Jul 2002 06:32:41 -0000
>@@ -2445,6 +2445,28 @@
>         }
>
>         OBJ_DROP_PROPERTY(cx, obj2, prop);
>+        /*optimized for native object*/

Please put an extra newline before such comments, put a space after "/*" and before "*/", and use a proper sentence (capitalized, with a period at the end).

>+        if (prop && OBJ_IS_NATIVE(obj) && OBJ_IS_NATIVE(obj2)) {
>+            JSScope *scope_ = OBJ_SCOPE(obj);

Non-nits, i.e., bad bugs, here: you are not worrying about thread-safety via object scope locking, and you are using prop after it has been dropped by the call to OBJ_DROP_PROPERTY just above. You must call JS_LOCK_OBJ(cx, obj) before fetching OBJ_SCOPE(obj), and you must call js_GetMutableScope and re-lookup sprop when setting (as ++ does) as opposed to getting, in case the property is inherited from a prototype. See js_SetProperty in jsobj.c.

Nit: this block-local is not a variable declared in a macro, so no need to name it scope_ -- scope will do.

>+            sprop = (JSScopeProperty *)prop;
>+            slot = (uintN)sprop->slot;
>+            rval = (slot != SPROP_INVALID_SLOT)
>+                   ? LOCKED_OBJ_GET_SLOT(obj, slot)

Non-nit: Sorry, sprop is not in obj's scope -- it is in obj2's scope, and no object is locked at this point, so you can't use LOCKED_OBJ_GET_SLOT.

>+                   : JSVAL_VOID;
>+
>+            if (!(sprop->attrs & JSPROP_GETTER) && !(sprop->getter))

Nit: don't overparenthesize the !sprop->getter clause.

>+                goto inc_rval;
>+
>+            JS_UNLOCK_SCOPE(cx, scope_);

Non-nit: scope_ wasn't locked, see above.

>+            ok = SPROP_GET(cx, sprop, obj, obj, &rval);
>+            JS_LOCK_SCOPE(cx, scope_);
>+            if (ok && SPROP_HAS_VALID_SLOT(sprop, scope_))
>+                LOCKED_OBJ_SET_SLOT(obj, slot, rval);
>+            JS_UNLOCK_SCOPE(cx, scope_);
>+            if (!ok)
>+                goto out;
>+            goto inc_rval;
>+        }
>         lval = OBJECT_TO_JSVAL(obj);
>         goto do_incop;
>
>@@ -2473,6 +2495,7 @@
>         if (!ok)
>             goto out;
>
>+    inc_rval:
>         /* The expression result goes in rtmp, the updated value in rval.
         */
>         if (JSVAL_IS_INT(rval) &&
>             rval != INT_TO_JSVAL(JSVAL_INT_MIN) &&
>@@ -2832,6 +2855,10 @@
>             rval = (slot != SPROP_INVALID_SLOT)
>                    ? LOCKED_OBJ_GET_SLOT(obj2, slot)
>                    : JSVAL_VOID;
>+            /*optimized for the common case*/

Nit: see above.

>+            if (!(sprop->attrs & JSPROP_GETTER) && !(sprop->getter))

Nit: see above.

>+                goto save_the_opnd;
>+
>             JS_UNLOCK_OBJ(cx, obj2);
>             ok = SPROP_GET(cx, sprop, obj, obj2, &rval);
>             JS_LOCK_OBJ(cx, obj2);
>@@ -2841,6 +2868,7 @@
>             }
>             if (SPROP_HAS_VALID_SLOT(sprop, OBJ_SCOPE(obj2)))
>                 LOCKED_OBJ_SET_SLOT(obj2, slot, rval);
>+        save_the_opnd:
>             OBJ_DROP_PROPERTY(cx, obj2, prop);
>             PUSH_OPND(rval);
>             break;

At this point, I really think more study is required before attempting so low-level a patch. Object locking, prototype vs. object-owned scope (js_GetMutableScope), and the fundamental lack of a need(*) to optimize JSOP_NAME, JSOP_NAMEINC, etc. given the fact that no sound benchmark or real JS function should use global variables in combinatorially intensive code, make me say: let's not hack more without lots of measurement, followed by analysis, followed by design of more sound optimizations. /be

(* It may be that real-world JS abuses global variables too much, but I believe that we should measure before optimizing. This bug claims that bitwise operators are too slow, but provides no evidence; its tests are not reduced, inviting confounders. We need to rehabilitate this bug with good profiling data, or close it as invalid.)
Attachment #92205 - Flags: needs-work+
Results with testcase in comment #19, build 2002072304:

+---------------------+-------------------+-------------------+--------+
| Test                | Mozilla (20020121)| IE 5.5            | Ratio  |
|---------------------|-------------------|-------------------|--------|
| Test 1              | 4567 ms           | 2814 ms           | 1.62   |
|---------------------|-------------------|-------------------|--------|
| Test 2              | 4146 ms           | 2594 ms           | 1.60   |
|---------------------|-------------------|-------------------|--------|
| Test 3              | 4847 ms           | 3876 ms           | 1.25   |
+---------------------+-------------------+-------------------+--------+
Thanks -- one thing you might do to reduce noise, at the price of using an ECMA extension that's only in Mozilla: change those new Date constructions to use Date.now() instead. This may not matter much; it would be interesting to know the savings. With or without that change, can someone run the JS profiler that's in venkman, if not a machine cycle (hierarchical, we need to spread blame along callers of common subroutines, a la gprof) profiler? Maybe rjesup is able to help. /be
Another thing to try: run the tests, modified so they don't depend on the DOM, in the JS shell (built on Unix from js/src/Makefile.ref, on Windows from js.mak, and on Mac from macbuild/JSRef.mcp). Also perhaps try the xpcshell, built when you build Mozilla and installed into dist/bin. If the numbers get dramatically better, then the DOM is slowing down some (few) global accesses too much, still or again. /be
Attached file venkman profile (deleted) —
Here is the result of profiling a version of the script with the DOM and timing calls removed, and |max| passed as a parameter. I ran this on my 850Mhz laptop, running RH Linux 7.1. Of course, these numbers can't really be compared to any others, so they're probably not very useful.
I've tried Date.now(). Times are basically the same as those in comment #20
Here are my timings on WinNT 4.0 (SP6), 500 MHz CPU, 128M RAM. Using an optimized JS shell built today, and Mozilla trunk 2002072308. Using the latest tests, with |max| passed as a parameter. All times in milliseconds:

        JS Shell   XPCShell   Moz    IE6
TEST1:    2625       2625     2703   1578
TEST2:    2360       2360     2453   1422
TEST3:    2672       2703     2875   2203
I have quantified test1 on Solaris, and it shows:

    71.59%  js_Interpret
     3.30%  js_ValueBoolean
     1.84%  nsJSContext::DOMBranchCallback
     1.07%  JS_GetParent

The JS engine consumed most of the time. Maybe we need a finer-granularity timing method, so we can tell which part of the JS engine consumes the most CPU.
We need line level or basic block level profiling, not function level -- quantify should be able to do that, at least on Windows. /be
Are you sure? I collected the data with granularity=line, but cannot find an option to view the line-level data.
Here are the build options:

> quantify_what_options ./mozilla-bin.quantify
Build time options: "-collection-granularity=line"
Command line options (run time defaults): "-collection-granularity=line"
-quantify-home=/export/home/yddu/rational/releases/quantify.sol.2002a.06.00
-threads=yes -use-internal-locks=yes

But in the exported data file, I cannot find any line-level data, just function data. Am I missing something?
cc'ing jband in case he knows the answer to these Quantify questions -
Attached file quantify result on solaris (deleted) —
quantified on solaris
Quantified with the new testcase that removes the global variable max. CPU is mostly consumed by js_Interpret(), so consider only the statistics inside that function:

    main loop, reinit of local vars   29.4%
    JS debugger support               19.22%
    JSOP_GETVAR                        8.77%
    JSOP_BITXOR                        6.88%
    JSOP_LSH                           6.26%
    JSOP_POPV                          5.7%
    JSOP_IFEQ                          5.5%
    JSOP_NEG                           5.4%
    JSOP_GOTO                          4.53%
    JSOP_SETVAR                        4%

The JS debugger support seems to be a pain. I don't know why it eats so much CPU.
YueDong Du: I think (but I'm not sure, since it's the first time I have looked at such a Quantify "beast") that the 12.28% is for the switch statement just below the "}". If that is true, JS debugger support should be decreased by this value and the 12.28% should be added to the loop time.
Thanks for the qfy output. Can you try putting #if 0 and #endif around that entire debugger rt->interruptHandler block statement, and re-quantify, so we can tell what it is contributing? I have some ideas on how to get rid of all that overhead; more in a bit. /be
This patch loads rt->interruptHandler into a local before the interpreter loop, and reloads it only after returns from js_Invoke (which might call a native function). It may not help, if optimizing compilers and target architectures don't lead to the interruptHandler local being kept in a register. /be
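The same hoisting pattern can be sketched at the JavaScript level (illustrative only, not the C patch itself; `rt` and `interpret` are hypothetical names):

```javascript
// Analogue of the patch: instead of reading rt.interruptHandler on
// every iteration, copy it into a local once before the loop.
function interpret(rt, steps) {
  var handler = rt.interruptHandler;  // hoisted load, read once
  var count = 0;
  for (var i = 0; i < steps; i++) {
    if (handler)
      handler(i);
    count++;
    // After anything that can run arbitrary code (js_Invoke in the C
    // patch), the cached value would have to be refreshed:
    //   handler = rt.interruptHandler;
  }
  return count;
}
```

The design risk the comment mentions applies here too: the hoist only pays off if the local actually stays in a register, and correctness requires reloading after any call that could replace the handler.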
The patch is valuable, at least on Solaris. With testcase test1 (Sun Blade 100, 500MHz, Solaris 8, average of 3 runs):

with the patch:    2251 ms
without the patch: 2595 ms
quantified as /be requested.
So 13.19% of js_Interpret execution time is spent in the single statement:

    switch (op)

Since "threaded code" does not seem to give any improvement, I have no idea how we could reduce the execution time here. Maybe disassemble and look at what the compiler generates.
And Brendan's patch is obviously beneficial, so can we let it go through the review process? We have shown that this bug has nothing to do with the bitwise operators. Maybe switch is too heavy, but there is no obvious way to reduce that overhead, so the patch is probably the only fruit of this bug.
Attachment #92205 - Attachment is obsolete: true
Taking. /be
Assignee: khanson → brendan
Keywords: mozilla1.2
Target Milestone: Future → mozilla1.2alpha
I'll try to find some time to look at the function and what it compiles to (I used to alpha- and beta-test compilers, mostly from an optimization point-of-view).
Randell, as far as I know, a switch consists of a range check, a table lookup, and a branch to the handler. All are necessary. Even if we implemented a table-driven interpreter ourselves, we would need to do all 3 steps, needless to say that the table-driven method may need some unportable language feature. Even if we find that switch is bad, what should we do? I have spent a lot of time on it already, but it is up to you.
I'm resummarizing and possibly morphing (but I don't think I am -- I'm going to make the testcase benchmarks here scream) based on a partial evaluation design I'm pursuing that should improve JS performance quite a bit. /be
Status: NEW → ASSIGNED
Summary: Bitwise operators should be faster → JS needs partial evaluation
Comment on attachment 92768 [details] [diff] [review] patch to try that localizes rt->interruptHandler sr=jband. Sure, why not? I'm curious if it really measures as a win.
Attachment #92768 - Flags: superreview+
Comment on attachment 92768 [details] [diff] [review] patch to try that localizes rt->interruptHandler r=rginda Doesn't seem to cause any debugger regressions, afaict.
Attachment #92768 - Flags: review+
jband: see comment #36 for YueDong Du's measurement with the patch vs. without. I'll check the patch in when the 1.2alpha trunk opens, but keep the bug open for bigger improvements. /be
The latest patch passes the JS testsuite on WinNT in both the debug and optimized shells. No test regressions occurred -
I checked attachment 92768 [details] [diff] [review] into the 1.2alpha trunk. /be
Whiteboard: branchOEM
Whiteboard: branchOEM → branchOEM+
In OEM branch.
Whiteboard: branchOEM+ → branchOEM+, fixedOEM
Moving out, some of these may move to 1.3alpha shortly. /be
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Not beta material, not making this beta. /be
Target Milestone: mozilla1.2beta → mozilla1.3alpha
Whiteboard: branchOEM+, fixedOEM → [92768:fixedOEM]
Whiteboard: [92768:fixedOEM]
Next alpha. /be
Target Milestone: mozilla1.3alpha → mozilla1.4alpha
After a running start, my grand plans for jit'ing and partial evaluation stalled. I don't know when I'll get to this. I think the memory optimization bugs on my JS list are more important, along with some correctness ones on my and khanson's lists. So I'm going to future this for now. /be
Target Milestone: mozilla1.4alpha → Future
Keywords: perf, testcase
Attached file Testcase for js shell (deleted) —
Attached patch Making test run faster by 10% (deleted) — Splinter Review
The main idea behind the patch is that the results of <<, >>, ^, etc. are always integers, while the current code in jsinterp.c uses the STORE_NUMBER macro, which first calls JSDOUBLE_IS_INT. Thus the patch adds STORE_INT and STORE_UINT macros that are essentially STORE_NUMBER without JSDOUBLE_IS_INT, and adjusts the code accordingly. The patch also removes an unnecessary check for false ok in JSDOUBLE_IS_INT, as this check is already done in FETCH_INT. It not only makes SM run faster on the test cases but also shrinks the code size.
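A toy model of the tagging logic may make this clearer. This is not SpiderMonkey's real macro set, just a sketch under the assumption that jsvals tag 31-bit ints inline: STORE_NUMBER must test whether a double result happens to be such an int, whereas a bitwise result is an int32 by construction, so only the range check remains.

```javascript
// Toy jsval tagging: tagged ints are (i << 1) | 1, mirroring the
// JSVAL_INT tag bit. Real SpiderMonkey macros differ in detail.
function INT_FITS_IN_JSVAL(i) {
  // One bit is lost to the tag, so only 31-bit ints fit.
  return i >= -(1 << 30) && i < (1 << 30);
}
function INT_TO_JSVAL(i) { return (i << 1) | 1; }
function JSVAL_TO_INT(v) { return v >> 1; }

// STORE_NUMBER path: must first check that the double is an integer.
function storeNumber(d) {
  var isInt = Math.floor(d) === d && INT_FITS_IN_JSVAL(d);  // JSDOUBLE_IS_INT
  return isInt ? INT_TO_JSVAL(d) : { double: d };  // boxed-double fallback
}

// STORE_INT path for bitwise results: x ^ y, x << y, etc. are int32 by
// definition, so the is-it-an-integer test can be skipped entirely.
function storeInt(i) {
  return INT_FITS_IN_JSVAL(i) ? INT_TO_JSVAL(i) : { double: i };
}
```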
Data on my Linux box (Athlon 1.2GHz / Fedora Core 1 / GCC 3.3.2) for an optimized build of the js shell:

Before the patch:
test 1: 795 ms
test 2: 731 ms
test 3: 827 ms

After the patch:
test 1: 660 ms
test 2: 627 ms
test 3: 729 ms

which indicates a speedup of at least 15%. For comparison, here are the results for Rhino with JDK 1.4.2:

test 1: 760 ms
test 2: 719 ms
test 3: 602 ms

which also shows that the JDK JIT can do an amazing job with JVM bytecode even for a typeless language.
Comment on attachment 145602 [details] [diff] [review] Making test run faster by 10% Great! Thanks, good catch on the bogus extra ok test. I'll get this into 1.8a. I doubt this is the end of the story, though. Maybe I should unmorph this bug and let it be about shift ops again. /be
Attachment #145602 - Flags: review+
Re: comment 57, second paragraph: Yes, JITs are the way to go. .NET's IL is a typeless stack bytecode, precisely so that JITs, which model operand types based on declarations, inferences, or runtime code that must be generated in cases where the type wasn't declared and can't be inferred, don't have to deal with a bunch of load-int, load-double, etc. opcodes. The interesting thing about the testcase is that types can be inferred, and in fact partial evaluation would make the test go even faster. So I do want a bug on that topic, but maybe it shouldn't be this one. /be
If I extend the patch to treat JSVAL_IS_INT specially for binary/unary +/-, since JSDOUBLE_IS_INT always holds for the result (and for unary minus the result always fits in a jsval), then the total speedup reaches almost 25%. It is interesting that in the Rhino case, a patch to optimize double-to-int32 applied a couple of years ago made the test run 2.5 times faster in both interpreted and compiled modes, and when I switched off the patch for the compiled mode, it started to run slower than interpreted mode. This tells us that in Rhino's case, the type inference that the compiled mode uses to deduce purely numerical variables benefited the test case less than a simple 3-line optimization in frequently executed code. So I would not be surprised if SM with partial evaluation would benefit the test case less than numerical tricks.
The patch extends attachment 145602 [details] [diff] [review] with the special treatment of tagged ints in JSOP_NEG and JSOP_POS. Again, it not only makes things faster but also shrinks code size.
Updated results for the new patch, for the configuration from comment #57:

Before the patch:
test 1: 795 ms
test 2: 731 ms
test 3: 827 ms
size of js executable: 374660

After the patch:
test 1: 599 ms
test 2: 567 ms
test 3: 726 ms
size of js executable: 373700
Attached patch cleaned up version (deleted) — Splinter Review
Main substantive change is to use js_NewNumberValue instead of its inline expansion in JSOP_POS. That should save even more code size. Neither unary + nor - is all that common in real world scripts, so I'm not concerned about the extra call overhead in the JSVAL_IS_INT case. I don't want to tune for this bug's benchmark only. Although it may hurt with some compilers, most interpreter switch cases use the common operand temporaries, rval, etc., so I made JSOP_NEG and JSOP_POS use rval too, instead of a new block-local v. Also compressed DEBUG ifdef into JS_ASSERT and used interpreter temporary i. Thanks again, Igor. /be
Attachment #145738 - Attachment is obsolete: true
Re: comment 60, second paragraph: The idea with partial evaluation is not simply to deduce numerical type when compiling (really, when specializing compiled code based on constant arguments and other invariants). It's to deduce integer type, loop upper bounds, degenerate tests, etc. It's true that using the ALU instead of the FPU will get you something like a 3X speedup on most architectures. Even better would be to get rid of the runtime tests that have to check whether JSDOUBLE_IS_INT && INT_FITS_IN_JSVAL. Thanks to Igor's work here, I think I will just keep this bug open. Encore! /be
This version extends the previous patch with an explicit optimization for the addition of two tagged ints:

case JSOP_ADD:
      ...
+     /* Optimize for two int-tagged operands as the sum fits in int32. */
+     if ((lval & rval) & JSVAL_INT) {
+         if (lval != JSVAL_VOID && rval != JSVAL_VOID) {
+             i = JSVAL_TO_INT(lval) + JSVAL_TO_INT(rval);
+             JS_ASSERT((jsdouble)i == (jsdouble)JSVAL_TO_INT(lval) +
+                                      (jsdouble)JSVAL_TO_INT(rval));
+             if (INT_FITS_IN_JSVAL(i)) {
+                 rval = INT_TO_JSVAL(i);
+                 sp--;
+                 STORE_OPND(-1, rval);
+                 break;
+             }
+         }
+     }

and similar code for subtraction:

case JSOP_SUB:
      ...
+     /* Optimize for two int-tagged operands as the difference fits in int32. */
+     if ((lval & rval) & JSVAL_INT) {
+         if (lval != JSVAL_VOID && rval != JSVAL_VOID) {
+             i = JSVAL_TO_INT(lval) - JSVAL_TO_INT(rval);
+             JS_ASSERT((jsdouble)i == (jsdouble)JSVAL_TO_INT(lval) -
+                                      (jsdouble)JSVAL_TO_INT(rval));
+             if (INT_FITS_IN_JSVAL(i)) {
+                 rval = INT_TO_JSVAL(i);
+                 sp--;
+                 STORE_OPND(-1, rval);
+                 break;
+             }
+         }
+     }

Compared with the previous cases, the new optimizations add more code and the small overhead of the "if ((lval & rval) & JSVAL_INT)" test when the operands of binary +/- are not tagged ints, but they reduce by 15% the running time of code like:

function NestedForLoopBenchmark_run() {
    var sum, i, j;
    sum = 0;
    for (i = 1000; i > 0; i--) {
        for (j = 100; j > 0; j--) {
            sum = sum + 1;
        }
    }
    if (sum != 100000) {
        this.error("NestedForLoopBenchmark");
    }
}

which is used in benchmarks/BenchPress.js. The total time of benchmarks/BenchPress.js is reduced by 4%.
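The JS_ASSERT in the snippet above rests on an overflow argument worth making explicit: jsval ints are 31-bit, so the int32 sum or difference of two in-range values can never overflow, and only the INT_FITS_IN_JSVAL recheck is needed to decide between the fast path and the generic double path. A minimal sketch of that reasoning, using the historical 31-bit range (the constants mirror old SpiderMonkey; the helper function is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* 31-bit tagged-int range of old SpiderMonkey jsvals; -2^30 itself is
 * reserved for JSVAL_VOID, hence the != JSVAL_VOID tests in the patch. */
#define JSVAL_INT_MIN         (1 - (1 << 30))
#define JSVAL_INT_MAX         ((1 << 30) - 1)
#define INT_FITS_IN_JSVAL(i)  ((i) >= JSVAL_INT_MIN && (i) <= JSVAL_INT_MAX)

/* Worst case is 2 * (2^30 - 1) < 2^31 - 1, so the int32 addition itself
 * can never overflow; only the jsval range check can fail, sending us
 * to the slow path that boxes the result as a double. */
static int sum_fits(int32_t a, int32_t b, int32_t *out)
{
    assert(INT_FITS_IN_JSVAL(a) && INT_FITS_IN_JSVAL(b));
    *out = a + b;
    return INT_FITS_IN_JSVAL(*out);
}
```

The same bound holds for subtraction, which is why both cases in the patch share the shape "compute in int32, assert exactness, recheck the jsval range".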
Bogus benchmark alarms go off about now: sum = sum + 1; is not something to optimize. ++sum or sum++ is already optimized for the int case. But ok, it could have been sum += 42 or sum += SOME_INT_CONST. Still, I'd rather see the compiler notice, without having to build an SSA graph, that variables initialized to 0 or other consts are subject only to certain operations, and constrained by loop control, so that they can be optimized to use int-only JSOPs. I'll patch that way soon. /be
(In reply to comment #66)
> Still, I'd rather see the compiler notice, without having to build an SSA graph,
> that variables initialized to 0 or other consts are subject only to certain
> operations, and constrained by loop control, so that they can be optimized to
> use int-only JSOPs.

But this is only applicable to numerical loops without any function calls in between, right? Since SM supports indirect eval, any function can potentially change a local variable's value, since any function can be an alias for eval.

BTW, is eval a reserved keyword in JS2?
> But this can only be applicable to numerical loops without any function
> calls in between, right? Since SM supports indirect eval any function can
> potentially change local variable value since any function can be an alias
> to eval.

Not so. First, per ECMA, we are free to start throwing EvalError. Second, the only thing per ECMA-262 that eval (indirect or direct) can do to change the set of variables visible to the compiler of the function calling eval is to add to that set. The only thing an eval can do to the set of variables visible to the compiler of eval code, where eval is called from a function, is to delete only those members of the set that were added via an earlier eval (they lack DontDelete, per 10.2.2).

An ECMA-262-based compiler can therefore optimize all permanent properties of the activation object that have slots in the JSStackFrame.vars array.

If the function is lightweight (no direct eval call), and the implementation throws EvalError for indirect calls, then the compiler can optimize all declared vars. I'd like to enable this mode in SpiderMonkey, but I bet there are web pages that call eval via obj.eval for some object obj. Still, the name by which eval is called in that case is 'eval', so we can probably make an exemption.

If the function is heavyweight, or if there is an indirect eval call that can't be detected at compile time, the compiler of the function can still optimize declared vars, unless the var name collides with a closure (conditional function declaration) name. SpiderMonkey does not work properly for this hard case in the indirect-eval case today. I'll fix that while I'm patching this bug.

> BTW, is eval a reserved keyword in JS2?

No. See http://www.mozilla.org/js/language/js20/core/lexer.html#reserved-word. /be
(In reply to comment #68)
> Not so. First, per ECMA, we are free to start throwing EvalError.
...
> If the function is lightweight (no direct eval call), and the implementation
> throws EvalError for indirect calls, then the compiler can optimize all declared
> vars. I'd like to enable this mode in SpiderMonkey, but I bet there are web
> pages that call eval via obj.eval for some object obj. Still, the name by which
> eval is called in that case is 'eval', so we can probably make an exemption.

Rhino throws EvalError for indirect eval calls exactly to enable such optimization, and around the year 2000 that bit a colleague of mine at a previous job, when he had to support the window.eval construction in a Java browser. The quick fix was to add a kind of prescan to literally replace window.eval with eval (and do a few other quirks like replacing --> with //, which Rhino did not support at that time). We expected that at some point this would have to be extended, but that was never necessary. So indeed, the indirect calls are still spelled eval.

But my point in the previous comment was that with support for indirect eval the compiler cannot assume that any variable keeps its original value after an arbitrary function call, so an optimization to track integer-only operations cannot be sustained across function calls unless indirect eval is disabled.
...
> > BTW, is eval a reserved keyword in JS2?
>
> No. See http://www.mozilla.org/js/language/js20/core/lexer.html#reserved-word.

Is it possible to submit a suggestion to change that, or at least to require that any indirect eval must throw an exception? Or is it too late?
> But my point in the previous comment was that with support for indirect eval
> the compiler cannot assume that any variable keeps its original value after
> an arbitrary function call, so an optimization to track integer-only
> operations cannot be sustained across function calls unless indirect eval
> is disabled.

You can still get a win by retesting int-ness only after the function that might be eval returns, and bailing out of the specialized code if the variable has morphed into a non-int. This is complex, but it has been done (Anamorphic, acquired by Sun for their HotSpot runtime, which was originally for Self). But you're right, it's much better to forbid indirect eval. I'm all in favor of that, provided we can treat foo.eval like eval in web-ish contexts and say we're done. De-optimizing such old web-JS scripts will not matter.

> Is it possible to submit a suggestion to change that or at least to require
> that any indirect eval must throw an exception? Or is it too late?

Probably too late, in part because ECMA has made compatibility important, to avoid diverging too much and designing-by-committee. Also to ease migration. I'm trying to get the Mozilla Foundation to join ECMA as an NFP organization, so we can contribute. ECMA Edition 4 is stalled at the moment, and I would not want to try to unstall it by suggesting eval become a keyword. We need compatibility and short time-to-market here, more than polishing. /be
Priority: -- → P5
Flags: testcase?
Patch to thread (in the Forth sense, not in the concurrency sense) js_Interpret as a prolog to work here coming next. /be
Priority: P5 → P1
Summary: JS needs partial evaluation → JS could use some focused partial evaluation
Target Milestone: Future → mozilla1.9alpha
Attached patch indirect threading, passes testsuite (obsolete) (deleted) — Splinter Review
Hmm, I need to force threading off and verify I didn't break the #else clauses. Anyway, this is the first step. /be
Attachment #203044 - Flags: superreview?(shaver)
Attachment #203044 - Flags: review?(mrbkap)
And to fuse the common op = *pc; DO_OP(). It would be cool if someone would test Venkman with this patch applied on top of the big patch. /be
Comment on attachment 203044 [details] [diff] [review] indirect threading, passes testsuite Oops, broke the non-threaded case. New complete patch in a bit. /be
Attachment #203044 - Attachment is obsolete: true
Attachment #203044 - Flags: superreview?(shaver)
Attachment #203044 - Flags: review?(mrbkap)
So I tested that patch on the testcases at http://www.24fun.com/downloadcenter/benchjs/benchjs.html and http://www.speich.net/computer/3d.php (big cube in separate window). Test 6 on the former is particularly interesting to me, in general.

Looks like about a 4% win on test 6, and about neutral elsewhere, including the wireframe cube. For comparison, Opera is about 30-50% faster than we are on test 6 and the wireframe... But a lot of those testcases are not in the JS engine, of course.
Attached patch patch, v2 (obsolete) (deleted) — Splinter Review
This is more like it. I'd like to land this to move on to the next stage. Docs on the grand plan coming soon; I'll hack on those while waiting for review ;-). /be
Attachment #203073 - Attachment is obsolete: true
Attachment #203082 - Flags: superreview?
Attachment #203082 - Flags: review?(mrbkap)
Attached patch interdiff against last patch to compile on MSVC (obsolete) (deleted) — Splinter Review
GCC let me get away with uintN flags in a statement context! Oh well, I took the opportunity to fix a comment glitch too. Reviewers take note. /be
Comment on attachment 203082 [details] [diff] [review] patch, v2 Bob found a case (JSOP_LITOPX) that needs to be #ifdef JS_THREADED_INTERP. Roll-up patch next. /be
Attachment #203082 - Attachment is obsolete: true
Attachment #203082 - Flags: superreview?
Attachment #203082 - Flags: review?(mrbkap)
Attached patch patch, v3 (obsolete) (deleted) — Splinter Review
JSOP_LITOPX case must advance pc by its length (5) minus the length hardcoded into the DO_NEXT_OP code at the end of the case it jumps into, but only when threading. /be
Attachment #203085 - Attachment is obsolete: true
Attachment #203087 - Flags: superreview?(shaver)
Attachment #203087 - Flags: review?(mrbkap)
(In reply to comment #79) No regressions with this latest version of the patch in opt/dbg shell tests on winxp.
Branch prediction hardware indexes by instruction address to memoize branch targets, so the single indirect jump of the switch in the old interpreter loop almost never predicts correctly, while distributed jumps at the end of each case do better, depending on how well one bytecode correlates with the next in compiled scripts. The indirect threading in the current patch is a win, but not huge for real-world DHTML benchmarks because (as usual) JS is not on the critical path enough. But this is just the first step.

Inspired by http://citeseer.ist.psu.edu/berndl05context.html, OCaml (which I'd looked at a while ago, and then again lately after keynoting the 2005 ACM ICFP), and great stuff from the past such as Self and Synthesis, I'm finally hot to optimize good old SpiderMonkey to pieces.

It will be tricky retaining API compatibility, including debugger API compat, but that's the real win here. SpiderMonkey is > 9 years old and has a ton of active embeddings. We should bear the price of compatibility, even as we add new and entirely different, better APIs, until clients can afford to migrate. We should not break the world, and I believe we won't have to, just to optimize performance. Sure, it's harder than clean-slating, but that's nothing new to Mozilla or the Web. Compatibility is king. /be
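The dispatch difference described above can be shown with a toy VM. This is an illustrative sketch, not SpiderMonkey code: the opcode names and the jumpTable/DO_NEXT_OP shape are invented for the toy, though they echo the patch's JS_THREADED_INTERP mode. It relies on the GCC/Clang labels-as-values extension; on compilers without computed goto, the patch falls back to the classic loop-around-a-switch.

```c
#include <stdint.h>

enum { OP_PUSH1, OP_ADD, OP_HALT };

/* Indirect-threaded dispatch: each opcode body ends with its own
 * computed goto, giving the branch predictor one indirect jump per
 * opcode site instead of a single shared, hard-to-predict switch. */
static int32_t interpret(const uint8_t *pc)
{
    static void *jumpTable[] = { &&do_push1, &&do_add, &&do_halt };
    int32_t stack[16], *sp = stack;

#define DO_NEXT_OP() goto *jumpTable[*pc++]

    DO_NEXT_OP();

  do_push1:
    *sp++ = 1;
    DO_NEXT_OP();             /* distributed jump, one per opcode */

  do_add:
    sp--;
    sp[-1] += sp[0];
    DO_NEXT_OP();

  do_halt:
#undef DO_NEXT_OP
    return sp[-1];
}
```

A bytecode sequence like {OP_PUSH1, OP_PUSH1, OP_ADD, OP_HALT} then executes with no central dispatch loop at all.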
sr=shaver (couldn't put it in the bug, because bugzilla pukes on the inactive flag type, or something -- whatever)
Comment on attachment 203087 [details] [diff] [review] patch, v3 Trying to sr=shaver again.
Attachment #203087 - Flags: superreview?(shaver) → superreview+
Attachment #203087 - Attachment is obsolete: true
Attachment #203087 - Flags: review?(mrbkap)
Attached patch patch I checked in (deleted) — Splinter Review
Blake will review, I have no doubt ;-). /be
Attachment #203610 - Flags: review?(mrbkap)
On the first testcase, here's how my scores changed on my Athlon64 X2 system:

+---------+----------+----------+---------+
| Test    | Before   | After    | Speedup |
|---------|----------|----------|---------|
| Test 1  | 344 ms   | 312 ms   | 9.3%    |
|---------|----------|----------|---------|
| Test 2  | 312 ms   | 297 ms   | 4.8%    |
|---------|----------|----------|---------|
| Test 3  | 390 ms   | 375 ms   | 3.8%    |
+---------+----------+----------+---------+

Not a bad speed win at all :-)
This patch appears to have increased Tp on btek from 918ms to 924ms, and on luna from 863ms to 870ms.
Blocks: 316879
(In reply to comment #86) > This patch appears to have increased Tp on btek from 918ms to 924ms, and on > luna from 863ms to 870ms. What is the value of __GNUC__ in the old egcs version used on btek? How about on luna? I don't believe the threaded interpreter code was enabled there. We need new benches (machines) to mark, those old ones are not representative. /be
Results from running the tests in an optimized build of the JS shell on a 6-year-old Pentium II 360 MHz laptop with Fedora Core 4 and GCC 4.0.1:

+---------+----------+----------+---------+
| Test    | Before   | After    | Speedup |
|---------|----------|----------|---------|
| Test 1  | 3452 ms  | 2946 ms  | 17%     |
|---------|----------|----------|---------|
| Test 2  | 3099 ms  | 2598 ms  | 19%     |
|---------|----------|----------|---------|
| Test 3  | 3509 ms  | 2890 ms  | 21%     |
+---------+----------+----------+---------+

This is a really nice speedup for a single-threaded embedding, which shows the benefits of the patch on old hardware with a good compiler.
(In reply to comment #88)
> does gcc 2 not support computed gotos?
> http://gcc.gnu.org/onlinedocs/gcc-2.95.3/gcc_4.html#SEC64
> http://computing.ee.ethz.ch/sepp/egcs-1.1.2-to/gcc.html#SEC61

It does, but when I first checked in (and broke Windows due to #else brain damage), btek slowed down. So I tried __GNUC__ >= 3 instead, hoping to put btek and its ilk in the (fixed) #else clause that used a loop around a switch, as before. But if btek slowed down with that test, then I don't know why -- either it is still getting the threaded interpreter code, or somehow the #else clause code is slower than before. Clues welcome.

Igor's data is important: it shows that we should not be catering to old compilers, only to old computers. We have the ability to recompile, since this is a new patch and no one supports the broken old gcc/egcs compiler (AFAIK). /be
luna is gcc 3.2 and it too slowed down a bit, which makes me wonder if this change is a real world win (assuming our Tp test is still representative of the web world out there). Btw, I'm all for giving btek a new compiler (and maybe OS upgrade?) to play with.
For the record, I do my builds with Visual C++ 2003.
(In reply to comment #91) > luna is gcc 3.2 and it too slowed down a bit, which makes me wonder if this > change is a real world win (assuming our Tp test is still representative of the > web world out there). What gcc version has the tree-ssa stuff? Could it be that is required to see the gain? Or is this not a compiler issue so much as a machine (icache, e.g.) issue? /be
(In reply to comment #93)
> Or is this not a compiler issue so much as a machine (icache, e.g.)
> issue?

The last form of the patch does not remove the switch statement for the threaded case. It remains in the code with all the related case labels but is no longer used. Perhaps this is what confuses older versions of GCC, which cannot eliminate the dead code?
(In reply to comment #95)
> (In reply to comment #93)
> > Or is this not a compiler issue so much as a machine (icache, e.g.)
> > issue?
>
> The last form of the patch does not remove the switch statement for the
> threaded cases. It remains in the code with all related case labels but is no
> longer used. Perhaps this is what confuses older versions of GCC where it can
> not eliminate the dead code?

It can't eliminate the dead code, but it shouldn't execute anything to do with the switch. The slightly larger icache footprint from any such switch code should not be enough to hurt perf. Something else is going wrong.

But first, who here sees a perf *win* with the current patch using GCC? Igor does. What is different about his machine and GCC compared to btek's or luna's? /be
(In reply to comment #96)
> The slightly larger icache footprint from any such switch code
> should not be enough to hurt perf.

But why is the dead code there? I.e., why is it necessary to keep the switch/case for the threaded case?
(In reply to comment #97)
> But why is the dead code there? I.e., why is it necessary to keep the
> switch/case for the threaded case?

You're right, it's not. At one point I was going to leave the while/switch loop for the debugger API, but then I thought of the interruptJumpTable hack. Patch to remove any switch (and any doubt that this is causing some perf problem) in a trice! /be
Attached patch no switch in JS_THREADED_INTERP (deleted) — Splinter Review
Attachment #203867 - Flags: review?(igor.bukanov)
Attachment #203867 - Flags: review?(igor.bukanov) → review+
To whoever's interested, I've taken a copy of the jsinterp.c file before these checkins, applied both patches from this bug, and manually removed the threaded code and replaced the macros to make it easier to see what changes were made for the non-threaded case. Hopefully this will allow someone to see where things slowed down.
Comment on attachment 203610 [details] [diff] [review] patch I checked in >Index: jsinterp.c >+#define SAVE_SP_AND_PC(fp) (SAVE_SP(fp), (fp)->pc = pc) Does this want a more general name (such as UPDATE_FP_STATE) so if there is more deferred storage to fp, we don't need to change the macro name each time? r=mrbkap, either way.
Attachment #203610 - Flags: review?(mrbkap) → review+
(In reply to comment #101) > (From update of attachment 203610 [details] [diff] [review] [edit]) > >Index: jsinterp.c > >+#define SAVE_SP_AND_PC(fp) (SAVE_SP(fp), (fp)->pc = pc) > > Does this want a more general name (such as UPDATE_FP_STATE) so if there is > more deferred storage to fp, we don't need to change the macro name each time? FP sounds like an acronym, Floating Point. UPDATE connotes other things than what is going on here. This is low-level code, and it should not use much abstraction. The abstraction-leaks will only hurt worse, the more abstract it gets. But more to the point, I don't expect or want any additional mutable-over-script-execution VM registers being homed in a stack frame. Jag: do you know of other machines than btek and luna for which this patch seems to have regressed performance? Cc'ing dbaron. /be
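The SAVE_SP_AND_PC macro discussed above reflects a standard interpreter pattern: the VM registers sp and pc live in C locals (ideally machine registers) on the fast path, and are flushed to the heap-visible frame only at points where other code might inspect it. A minimal sketch under that assumption follows; ToyFrame and sync_demo are invented names, not the real JSStackFrame or jsinterp.c code.

```c
#include <stdint.h>

/* Stand-in for the frame slots the debugger and callees can see. */
typedef struct ToyFrame {
    int32_t *sp;
    const uint8_t *pc;
} ToyFrame;

/* Same shape as the patch's macros: flush cached locals to the frame
 * only when deferred state must become observable. */
#define SAVE_SP(fp)         ((fp)->sp = sp)
#define SAVE_SP_AND_PC(fp)  (SAVE_SP(fp), (fp)->pc = pc)
#define RESTORE_SP(fp)      (sp = (fp)->sp)

static void sync_demo(ToyFrame *fp, int32_t *stackBase, const uint8_t *code)
{
    int32_t *sp = stackBase;        /* cached VM registers */
    const uint8_t *pc = code;

    *sp++ = 42;                     /* fast path touches locals only */
    pc += 1;

    SAVE_SP_AND_PC(fp);             /* flush before anything observable:
                                       a call, a debugger hook, a throw */
}
```

The design tension in the comments above is exactly about this macro's scope: a narrow name ties it to the two registers it homes, while a general name would invite homing more mutable VM state in the frame, which the fast path wants to avoid.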
The other tinderboxen I looked at didn't show a (clear) slow-down. On btek there's a peak covering the first time you checked in and consequently backed out, and it goes up again the second time you checked in. luna shows a similar peak, but it could also just be noise. The slow-down is < 1%, so not a big deal (IMO), but interesting, since it shouldn't be there, and ideally there'd be a speed-up.
Flags: testcase? → testcase-
QA Contact: pschwartau → general
Priority: P1 → --
Target Milestone: mozilla1.9alpha1 → ---
Summary: JS could use some focused partial evaluation → JS indirect threaded interpreter
This bug is patched and I should not morph it any more. The summary was grandiose before I fixed it just now to reflect what landed. I've learned my lesson about one patch per bug (several times over). /be
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED