Closed Bug 679710 Opened 13 years ago Closed 5 years ago

FF6 is 5x slower than Chromium 15 on this JS benchmark

Categories: Core :: JavaScript Engine (defect)
Hardware: All; OS: Linux
Type: defect; Priority: Not set; Severity: normal
Status: RESOLVED WORKSFORME
People: Reporter: trigrou; Assignee: Unassigned
Keywords: perf, testcase
Whiteboard: js-triage-done
Attachments: 2 files

User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.1 (KHTML, like Gecko) Ubuntu/11.04 Chromium/15.0.849.0 Chrome/15.0.849.0 Safari/535.1

Steps to reproduce:
I ran this page: http://osgjs.org/osgjs/sandbox/key_implementation.html

Actual results:
It's 5 times slower on Firefox 6 than on Chromium 15.

Expected results:
Ideally, the same speed.
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Attached file raw perf data, xz-compressed (deleted)
Here's the raw data for the perf profiler. This includes Firefox warm startup, but that took a lot less time than running the benchmark, so it should still be meaningful. Below is a summary; you can get full details by doing:

$ xz -d perf.data.xz
$ perf report

Here you go:

55.93%  perf-6053.map       0x7f127313ff18
 8.83%  libxul.so           js_GetProperty(JSContext*, JSObject*, JSObject*
 6.23%  libxul.so           js::mjit::stubs::GetElem(js::VMFrame&)
 5.62%  libxul.so           js::PropertyTable::search(long, bool)
 3.35%  libxul.so           js::NewBuiltinClassInstance(JSContext*, js::Cla
 2.97%  libxul.so           RunTracer(js::VMFrame&, js::mjit::ic::TraceICIn
 2.08%  libxul.so           void js::gc::FinalizeArenas<JSObject_Slots2>(JS
 1.55%  libxul.so           js::MonitorTracePoint(JSContext*, bool*, void**
 1.13%  libxul.so           js_CheckForStringIndex(long)
 1.07%  libxul.so           js::mjit::stubs::GreaterEqual(js::VMFrame&)
 0.75%  libxul.so           js_ValueToNonNullObject(JSContext*, js::Value c
 0.66%  libxul.so           js_ValueToBoolean(js::Value const&)
 0.48%  libxul.so           DisabledGetElem(js::VMFrame&, js::mjit::ic::Get
 0.39%  libxul.so           js::ToNumberSlow(JSContext*, js::Value, double*
 0.33%  libxul.so           js::mjit::stubs::ValueToBoolean(js::VMFrame&)
 0.30%  libxul.so           js::gc::RefillFinalizableFreeList(JSContext*, u
 0.28%  libxul.so           JSObject::getGlobal() const
 0.25%  libxul.so           js_GetCurrentBytecodePC(JSContext*)
 0.19%  libxul.so           js::mjit::stubs::InvokeTracer(js::VMFrame&, js:
 0.18%  [nvidia]            0x698e0c
 0.16%  libxul.so           MOZ_Z_inflate_fast
 0.16%  [kernel.kallsyms]   csd_lock_wait.clone.1
 0.14%  libpthread-2.13.so  pthread_mutex_lock
 0.13%  ld-2.13.so          do_lookup_x
 0.13%  [kernel.kallsyms]   hpet_next_event.clone.3
 0.12%  libxul.so           SearchTable(PLDHashTable*, void const*, unsigne
 0.12%  [nvidia]            cache_flush
 0.11%  libpthread-2.13.so  __pthread_mutex_unlock_usercnt
 0.11%  [kernel.kallsyms]   clear_page_c
 0.10%  [kernel.kallsyms]   put_mems_allowed
 0.09%  libxul.so           JS_PropertyStub
 0.08%  firefox             arena_dalloc
 0.08%  [kernel.kallsyms]   handle_mm_fault
 0.07%  libc-2.13.so        __memset_sse2
 0.07%  [kernel.kallsyms]   page_fault
 0.06%  ld-2.13.so          _dl_fixup
 0.06%  libxul.so           CheckScript(JSScript*, JSScript*)
 0.06%  libxul.so           MOZ_Z_crc32
 0.06%  libxul.so           PickChunk(JSContext*)
Note: I got this in Nightly from August 15 on Linux x86-64.
And I confirm that Chrome 13 is a lot faster on it.
Version: 6 Branch → Trunk
Hardware: x86 → All
The top entry, 55.93% perf-6053.map 0x7f127313ff18, is methodjit code according to this call tree:

perf-6053.map 0x7f127313ff18
0x7f1273105b9d
js::mjit::EnterMethodJIT(JSContext*, js::StackFrame*, void*, js::Value*)
js::mjit::JaegerShotAtSafePoint(JSContext*, void*)
js::Interpret(JSContext*, js::StackFrame*, js::InterpMode)
js::Invoke(JSContext*, js::CallArgs const&, js::MaybeConstruct)
js::ExternalInvoke(JSContext*, js::Value const&, js::Value const&, unsig
JS_CallFunctionValue
nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMeth
nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, n
PrepareAndDispatch
SharedStub
nsEventListenerManager::HandleEventSubType(nsListenerStruct*, nsIDOMEven
nsEventListenerManager::HandleEventInternal(nsPresContext*, nsEvent*, ns
nsEventTargetChainItem::HandleEvent(nsEventChainPostVisitor&, unsigned i
nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor&,
nsEventDispatcher::Dispatch(nsISupports*, nsPresContext*, nsEvent*, nsID
DocumentViewerImpl::LoadComplete(unsigned int)
_ZN10nsDocShell11EndPageLoadEP14nsIWebProgressP10nsIChannelj.part.109
nsDocShell::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, un
nsDocLoader::FireOnStateChange(nsIWebProgress*, nsIRequest*, int, unsign
nsDocLoader::doStopDocumentLoad(nsIRequest*, unsigned int)
nsDocLoader::DocLoaderIsEmpty(int)
nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, unsigned int)
nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, unsigned int)
nsDocument::DoUnblockOnload()
nsDocument::DispatchContentLoadedEvents()
nsRunnableMethodImpl<void (nsHTMLStyleElement::*)(), true>::Run()
nsThread::ProcessNextEvent(int, int*)
NS_ProcessNextEvent_P(nsIThread*, int)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppStartup::Run()
XRE_main
main
__libc_start_main

0x7f1273138835
js::mjit::EnterMethodJIT(JSContext*, js::StackFrame*, void*, js::Value*)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: js-triage-needed
Summary: javascript performance issue → FF6 is 5x slower than Chromium 15 on this JS benchmark
Attached file Shell testcase (deleted)
Some numbers:

              Test 0   Test 1
d8            1251     2306
js -m -n      7686     3618
js -m         6377     4056
Assignee: general → jandemooij
Status: NEW → ASSIGNED
The main problem here is GetElem stub calls. The script defines a Key function which inherits from Array. A Key object is just an array with length 3, some extra functions and a named property. So this is yet another bug depending on bug 586842...
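For readers without the (now deleted) testcase, here is a minimal sketch of the kind of object the comment describes: a real Array of length 3 that also carries a named property and inherits a few extra methods. The names `Key`, `t`, `getTime` and `lerp` are hypothetical and this is not the osgjs code; per the comment, element reads such as `key[0]` on objects shaped like this were what went through GetElem stub calls. The sketch only shows the object shape, not the JIT behavior.

```javascript
// Hypothetical reconstruction: Key, t, getTime and lerp are illustrative
// names, not taken from the osgjs source or the deleted shell testcase.
function Key(time, values) {
  // A real Array with three indexed elements ...
  const key = [values[0], values[1], values[2]];
  // ... plus a named (non-index) property ...
  key.t = time;
  // ... and a prototype chain Key.prototype -> Array.prototype.
  Object.setPrototypeOf(key, Key.prototype);
  return key; // returning an object makes `new Key(...)` yield this array
}
Key.prototype = Object.create(Array.prototype);
Key.prototype.constructor = Key;

// "Some extra functions" living on the prototype.
Key.prototype.getTime = function () {
  return this.t;
};
Key.prototype.lerp = function (other, alpha) {
  // Indexed element reads on both keys: the GetElem operations in question.
  return [0, 1, 2].map((i) => this[i] + (other[i] - this[i]) * alpha);
};

const a = new Key(0, [0, 0, 0]);
const b = new Key(1, [2, 4, 6]);
console.log(a.length, Array.isArray(a), a.lerp(b, 0.5));
```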
Depends on: 586842
Keywords: perf, testcase
Whiteboard: js-triage-needed → js-triage-done
On closer inspection, only Test 0 uses arrays with named properties. That explains why Test 0 is faster than Test 1 in d8. Test 1 uses a single array to store everything. We are 1.5x slower there; I will try to find out why.
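A sketch of what "a single array to store everything" can look like, assuming (hypothetically) that each key occupies four consecutive slots: a time followed by three values. The real layout of Test 1 is not visible here since the attachment was deleted; `STRIDE`, `packKeys`, `keyTime` and `keyComponent` are illustrative names. The point is that every read is a plain indexed access into one dense array of numbers, with no named properties involved.

```javascript
// Hypothetical flat layout: each key uses STRIDE consecutive slots,
// [time, x, y, z, time, x, y, z, ...]. Not taken from the testcase.
const STRIDE = 4;

// Pack [{ time, value: [x, y, z] }, ...] into one dense array of numbers.
function packKeys(keyList) {
  const out = [];
  for (const { time, value } of keyList) {
    out.push(time, value[0], value[1], value[2]);
  }
  return out;
}

// Only plain indexed reads; the array never gets a named property.
function keyTime(flat, i) {
  return flat[i * STRIDE];
}
function keyComponent(flat, i, c) {
  return flat[i * STRIDE + 1 + c];
}

const packed = packKeys([
  { time: 0, value: [1, 2, 3] },
  { time: 0.5, value: [4, 5, 6] },
]);
console.log(packed.length / STRIDE, keyTime(packed, 1), keyComponent(packed, 1, 2));
```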
Argh, the difference for Test 1 is due to a bug in the script. This line:

var endTime = keyEnd[keyEnd];

should be:

var endTime = keys[keyEnd];

The problem was that we were taking stub calls for the GetElem, and later on for >=, because endTime was undefined. With this fixed, for Test 1:

d8:        1168
js -m -n:  1264
js -m:     1912
d8 no/cs:  2739

Interestingly, Test 1 is now faster than Test 0 in both SM and V8.
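Why the buggy line yields undefined, and why the later `>=` never succeeds, can be shown in a few lines. Only the names `keys`, `keyEnd` and `endTime` come from the comment; the data values and `currentTime` are made up for illustration. Indexing a number primitive looks up a property on `Number.prototype`, which has no property "2", and a relational comparison converts undefined to NaN, so it is always false.

```javascript
// Made-up data; only keys, keyEnd and endTime are names from the comment.
const keys = [0.0, 0.5, 1.0, 1.5]; // key times
const keyEnd = 2;                  // index of the end key
const currentTime = 1.2;

// Buggy: reads property "2" of the Number 2, which does not exist.
const endTimeBuggy = keyEnd[keyEnd];
// Fixed: reads element 2 of the keys array.
const endTimeFixed = keys[keyEnd];

// Comparing against undefined converts it to NaN, so the result is always false.
console.log(endTimeBuggy, currentTime >= endTimeBuggy); // undefined false
console.log(endTimeFixed, currentTime >= endTimeFixed); // 1 true
```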
Cedric, the -n switch enables the type inference engine in the JavaScript shell. TI will be available in Firefox 9 if no (major) problems are found. In case you want to try it out, you can download a nightly build from http://nightly.mozilla.org/. TI is enabled by default in the browser. Thanks for the bug report and please let us know if you find other performance problems.
To beat V8 on Test 1 we have to make this fast:

--
function f() {
    var t0 = new Date;
    var x;
    for (var i = 0; i < 10000000; i++) {
        if (x) {};
        x = 1;
    }
    print(new Date - t0);
}
f();
--

The problem is that we call stubs::ValueToBoolean for "if (x)" because x is either undefined or int32. booleanJumpScript only supports boolean or known-int32. With TI, booleanJumpScript should look at the possible types of x. Ideally it should support things like undefined-or-object, null-or-object, undefined-or-int32, etc. Generating inline code for 2 or 3 types would probably cover most cases.
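The suggestion amounts to specializing the truthiness test on the set of types observed for `x`. The following is a toy model of that idea in plain JavaScript, not SpiderMonkey code: `typeTag`, `FAST_TESTS` and `specializeTruthiness` are hypothetical names, the "inline code" is modeled as a small per-type test, and anything outside the observed set (or a set larger than 3 types) falls back to a generic `Boolean(v)`, standing in for the stub call.

```javascript
// Toy model only; none of these names exist in SpiderMonkey.

// Classify a value roughly the way a type set would.
function typeTag(v) {
  if (v === undefined) return "undefined";
  if (v === null) return "null";
  if (typeof v === "boolean") return "boolean";
  if (typeof v === "number" && (v | 0) === v) return "int32";
  if (typeof v === "object" || typeof v === "function") return "object";
  return "other"; // doubles, strings, symbols, ...
}

// Cheap truthiness test per type tag that the toy "compiler" can inline.
const FAST_TESTS = {
  undefined: () => false,
  null: () => false,
  boolean: (v) => v,
  int32: (v) => v !== 0,
  object: () => true,
};

// Build a truthiness test specialized to a small observed type set.
function specializeTruthiness(observedTags) {
  const inlinable = observedTags.every((t) =>
    Object.prototype.hasOwnProperty.call(FAST_TESTS, t));
  if (!inlinable || observedTags.length > 3) {
    return (v) => Boolean(v); // generic path, analogous to a stub call
  }
  const tests = observedTags.map((t) => [t, FAST_TESTS[t]]);
  return (v) => {
    const tag = typeTag(v);
    for (const [t, test] of tests) {
      if (t === tag) return test(v);
    }
    return Boolean(v); // value outside the observed set: fall back
  };
}

// In the loop above, x is undefined on the first iteration and int32 afterwards.
const isTruthy = specializeTruthiness(["undefined", "int32"]);
console.log(isTruthy(undefined), isTruthy(1), isTruthy(0)); // false true false
```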
Depends on: 670493
Bug 827490 just landed. It might help.
Thanks to the shell test case this was pretty easy to test.

Before: Test 0: 5024, Test 1: 2076
After:  Test 0: 1614, Test 1: 2118

Looking pretty good.
Depends on: 827490
So just to check, it's expected that we're still 3x slower than V8 on this?
Well, no. There's a shell testcase; it would be good to add it to awfy-assorted so we can track perf here. Taking a glance, the testcase is wrapped in a big closure, and closures are pretty rotten for (SpiderMonkey) perf. I wrote a patch for this last month in bug 821361, but it's just been sitting there waiting for review. What happens if you remove the closure?
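To make "wrapped in a big closure" concrete, here is a sketch of the two shapes being compared, with hypothetical names and a trivial workload. In the first, the hot loop reads `data` as a variable closed over from an enclosing function; in the second, the same code lives at the top level of the script. The sketch only shows the transformation; how much it matters depends on the engine and version, and the numbers in the following comments are what was actually measured on this testcase.

```javascript
// Shape 1: everything wrapped in a closure (hypothetical names).
// The inner run() reads `data` as a closed-over variable.
var wrapped = (function () {
  var data = [];
  for (var i = 0; i < 1000; i++) data.push(i);
  function run() {
    var sum = 0;
    for (var j = 0; j < data.length; j++) sum += data[j];
    return sum;
  }
  return run;
})();

// Shape 2: the same code with the closure removed; topData is a
// top-level variable of the script.
var topData = [];
for (var k = 0; k < 1000; k++) topData.push(k);
function runTopLevel() {
  var sum = 0;
  for (var j = 0; j < topData.length; j++) sum += topData[j];
  return sum;
}

console.log(wrapped(), runTopLevel());
```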
With closure:
SpiderMonkey: Test 0: 1767, Test 1: 2257
d8: Test 0: 429, Test 1: 794

Without closure:
SpiderMonkey: Test 0: 1405, Test 1: 2012
d8: Test 0: 611, Test 1: 812
Er, wait. There are more nested closures. If I take those out too, I get:

SpiderMonkey: Test 0: 1371, Test 1: 2028
d8: Test 0: 646, Test 1: 840

Comment 10 might cover this.
Now I get:

SpiderMonkey: Test 0: 279, Test 1: 2130
d8: Test 0: 245, Test 1: 938

So Test 0 is close; Test 1 is still a lot slower. According to Instruments, we spend 76% under js::GetElement; I will take a look.
(In reply to Jan de Mooij [:jandem] from comment #17)
> So test 0 is close, test 1 is still a lot slower. According to Instruments,
> we spend 76% under js::GetElement, will take a look.

Ah, this is due to the problem I mentioned in comment 8: keyEnd[keyEnd] should be keys[keyEnd]. With that fixed:

SpiderMonkey: Test 0: 279, Test 1: 186
d8: Test 0: 243, Test 1: 214

So Test 1 is a bit faster than d8, test 0 is about 10% slower.
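The relative timings for the last set of numbers can be computed directly. The figures below are copied from the comment, in whatever unit the testcase prints; `relativeTime` is just an illustrative helper. On these numbers SpiderMonkey takes about 1.15x d8's time on Test 0 (roughly 15% slower) and about 0.87x on Test 1.

```javascript
// Timings copied from the comment above (units as printed by the testcase).
const timings = {
  "Test 0": { spiderMonkey: 279, d8: 243 },
  "Test 1": { spiderMonkey: 186, d8: 214 },
};

// Ratio > 1: SpiderMonkey took longer than d8; ratio < 1: it was faster.
function relativeTime(spiderMonkey, d8) {
  return spiderMonkey / d8;
}

for (const [name, t] of Object.entries(timings)) {
  console.log(name, relativeTime(t.spiderMonkey, t.d8).toFixed(2));
}
// Test 0 1.15
// Test 1 0.87
```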
Depends on: 1073587
Test 0 is still a bit slower than V8 due to bug 1073587. Unassigning myself as I'm not working on this atm and it's fixed for the most part.
Assignee: jdemooij → nobody
Status: ASSIGNED → NEW

The shell testcase (with the fix from comment #8 applied) is now slightly faster than V8 on Test 1 and noticeably faster on Test 0. Therefore, resolving this issue as WFM.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME