Closed Bug 407216 Opened 17 years ago Closed 16 years ago

Investigate fast-methods (or faster-methods) for top N DOM methods

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: brendan, Assigned: jorendorff)

References

Details

(Keywords: perf)

Attachments

(8 files, 18 obsolete files)

(deleted), text/html
Details
(deleted), text/html
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), text/plain
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
jorendorff
: review+
jorendorff
: superreview+
Details | Diff | Splinter Review
First task: identify the top N for real-world Firefox usage and Ajax apps, N = 20 would be a good start. Count calls to rank, assume high-frequency ones need lower call overhead. Could measure time in call to rank by call-frequency / time-in-method-guts to avoid speeding up invocation of methods that take a long time in their guts, but if call frequency correlate with fast-in-guts methods, no need. Help wanted on measurement. Vlad said dtrace might be quickest way. SpiderMonkey has slightly rusty DUMP_CALL_TABLE code too. /be
JSFastNative has a faster calling convention, can't have local GC roots, and still requires the callee to check argument and result types and convert if needed. XPConnect has detailed arg and result type info. If we manually write stubs for the top N methods, we'll have to write arg and result conversion too. Of course, we can't skip security checks until they are optimized to wrappers. But there ought to be a fast path that prefigures the wrapper work, where for same-origin caller and callee, we can do whatever faster-native path we want. One thought that does not require offline analysis of a fixed (prior to release) set of "representative benchmarks" is to enhance XPConnect to measure call frequency and optimize the top N over some period. In this approach, table-driven code generation for specialized arg and result type checking/conversion could be done based on method type info. I'm going to investigate this approach for faster joy. /be
Status: NEW → ASSIGNED
Flags: wanted1.9+
Priority: -- → P2
Attached patch instrument XPCWrappedNative::CallMethod (obsolete) (deleted) — Splinter Review
This patch adds a fixed-size LRU cache indexed by methodInfo and argc, plus a complete signature table to hash (argtype1, ... argtypeN) -> rvaltype and count occurrences. Results next. /be
Here's a dump of the XPCCallMethodLRUCache for a GC far into that gmail+bugzilla session: GC 228: 1: 69 ID(0) 2: 10541 setTimeout(2) 3: 9736 href(0) 4: 29211 offsetWidth(0) 5: 98 getPrefType(1) 6: 98 getService(1) 7: 98 getItemsWithAnnotation(2) 8: 819 innerHTML(1) 9: 1159 getElementById(1) 10: 627 readyState(0) 11: 129 responseText(0) 12: 138 status(0) 13: 198 clearTimeout(1) 14: 73 cookie(0) 15: 85 onreadystatechange(1) 16: 32 abort(0) 17: 41 send(1) 18: 41 setRequestHeader(2) 19: 41 open(3) 20: 9 appendChild(1) 21: 900 removeAttribute(1) 22: 900 collapsed(1) 23: 1125 parentNode(0) 24: 943 getAttribute(1) 25: 900 setAttribute(2) 26: 3600 interfaces(0) 27: 9 createElement(1) 28: 1269 top(0) 29: 10 getIntPref(1) 30: 2 display(1) 31: 36 style(0) 32: 2 className(1) 33: 5 className(0) 34: 2 removeEventListener(3) 35: 2 title(1) 36: 2 getElementsByTagName(1) 37: 1 cookie(1) 38: 1 firstChild(0) 39: 1 direction(1) 40: 664 nodeType(0) 41: 43 getPropertyValue(1) 42: 332 getComputedStyle(2) 43: 660 defaultView(0) 44: 495 ownerDocument(0) 45: 43 getAttributeNS(2) 46: 1 dir(0) 47: 1 namespaceURI(0) 48: 1 tooltipNode(0) 49: 177 metaKey(0) 50: 177 shiftKey(0) 51: 177 altKey(0) 52: 177 ctrlKey(0) 53: 109 button(0) 54: 109 screenY(0) 55: 109 screenX(0) 56: 218 clientY(0) 57: 218 clientX(0) 58: 244 layerY(0) 59: 244 layerX(0) 60: 193 relatedTarget(0) 61: 245 target(0) 62: 313 type(0) 63: 16 metaKey(0) 64: 14 shiftKey(0) Here is the sorted signature table for that GC: 106459 length() -> I32 34243 ID() -> INTERFACE 30674 localName() -> DOMSTRING 6111 exists() -> BOOL 5575 id(DOMSTRING) 4007 getElementById(DOMSTRING) -> INTERFACE 3425 getAttribute(DOMSTRING) -> DOMSTRING 2586 nodeType() -> U16 2320 setAttribute(DOMSTRING, DOMSTRING) 2069 length() -> U32 1694 appendChild(INTERFACE) -> INTERFACE 1344 followLinks(BOOL) 923 getComputedStyle(INTERFACE, DOMSTRING) -> INTERFACE 774 equals(INTERFACE) -> BOOL 735 close() 634 getIntPref(CHAR_STR) -> I32 609 getAttributeNS(DOMSTRING, DOMSTRING) -> DOMSTRING 546 callback(INTERFACE) 467 textZoom() -> FLOAT 462 leafName() -> ASTRING 447 OS() -> UTF8STRING 424 getBoolPref(CHAR_STR) -> BOOL 340 enumerateCategory(CHAR_STR) -> INTERFACE 263 ID() -> CSTRING 249 open(UTF8STRING, UTF8STRING) 248 getItemsWithAnnotation(UTF8STRING, U32) -> ARRAY 211 newURI(UTF8STRING, CHAR_STR, INTERFACE) -> INTERFACE 209 removeEventListener(DOMSTRING, INTERFACE, BOOL) 206 data() -> WCHAR_STR 200 getProperty(WCHAR_STR) -> WCHAR_STR 193 getInterface(IID) -> INTERFACE_IS 175 hasAttribute(DOMSTRING) -> BOOL 116 setScrollPosition(I32, I32) 112 name() -> CHAR_STR 98 delay(U32) 97 item(U32) -> INTERFACE 90 initEvent(DOMSTRING, BOOL, BOOL) 86 registerFactoryLocation(IID, CHAR_STR, CHAR_STR, INTERFACE, CHAR_STR, CHAR_STR) 74 append(ASTRING) 72 getAnonymousElementByAttribute(INTERFACE, DOMSTRING, DOMSTRING) -> INTERFACE 70 createElementNS(DOMSTRING, DOMSTRING) -> INTERFACE 58 addObserver(INTERFACE, CHAR_STR, BOOL) 51 get(CHAR_STR, IID) -> INTERFACE_IS 51 GetChildAt(I32) -> INTERFACE 45 insertBefore(INTERFACE, INTERFACE) -> INTERFACE 44 lastModifiedTime() -> I64 33 setAttributeNS(DOMSTRING, DOMSTRING, DOMSTRING) 31 import(UTF8STRING) 30 getEntryAtIndex(I32, BOOL) -> INTERFACE 30 getCharPref(CHAR_STR) -> CHAR_STR 29 addProgressListener(INTERFACE, U32) 27 addCategoryEntry(CHAR_STR, CHAR_STR, CHAR_STR, BOOL, BOOL) -> CHAR_STR 25 textZoom(FLOAT) 25 newlineHandling(I32) 25 addObserver(CHAR_STR, INTERFACE, BOOL) 24 annotateCrashReport(CSTRING, CSTRING) 20 AddChild(INTERFACE, I32) 19 hookupTo(INTERFACE, INTERFACE) 19 getPluginTags(U32) -> ARRAY 19 addObserver(INTERFACE, BOOL) 18 writeCompoundObject(INTERFACE, IID, BOOL) 18 setUpdateUrl(CSTRING) 18 init(BOOL, BOOL, U32, U32, INTERFACE) 17 setBoolPref(CHAR_STR, I32) 17 notifyObservers(INTERFACE, CHAR_STR, WCHAR_STR) 17 getPref(INTERFACE, ASTRING) -> INTERFACE 16 init(INTERFACE, I32, I32, I32) 16 group(INTERFACE) -> ASTRING 16 getItemAnnotation(I64, UTF8STRING) -> INTERFACE 16 getBookmarkIdsForURI(INTERFACE, U32) -> ARRAY 16 cloneNode(BOOL) -> INTERFACE 14 readLine(CSTRING) -> BOOL 13 handleFlagWithParam(ASTRING, BOOL) -> ASTRING 12 initWithCallback(INTERFACE, U32, U32) 12 getMostRecentWindow(WCHAR_STR) -> INTERFACE 11 safeLookup(CSTRING, INTERFACE) 11 lookup(CSTRING, INTERFACE, BOOL) 10 readString(U32, ASTRING) -> U32 10 getChildList(CHAR_STR, U32) -> ARRAY 8 loadURI(WCHAR_STR, U32, INTERFACE, INTERFACE, INTERFACE) 7 setAndLoadFaviconForPage(INTERFACE, INTERFACE, BOOL) 7 removeObserver(INTERFACE, CHAR_STR) 7 parseFromStream(INTERFACE, CHAR_STR, I32, CHAR_STR) -> INTERFACE 6 write(CHAR_STR, U32) -> U32 6 formatStringFromName(WCHAR_STR, ARRAY, U32) -> WCHAR_STR 6 charset(CHAR_STR) 6 GetResource(UTF8STRING) -> INTERFACE 6 ConvertFromUnicode(ASTRING) -> CSTRING 5 shouldLoad(U32, INTERFACE, INTERFACE, INTERFACE, CSTRING, INTERFACE) -> I16 5 registerTimer(ASTRING, INTERFACE, U32) 5 handleFlag(ASTRING, BOOL) -> BOOL 5 checkLoadURIWithPrincipal(INTERFACE, INTERFACE, U32) 4 loadSubScript(WCHAR_STR) 4 add(UTF8STRING, UTF8STRING, CSTRING, CSTRING, BOOL, BOOL, BOOL, I64) 3 itemHasAnnotation(I64, UTF8STRING) -> BOOL 3 getProperty(ASTRING) -> INTERFACE 3 countLogins(ASTRING, ASTRING, ASTRING) -> U32 3 checkLoadURIStrWithPrincipal(INTERFACE, UTF8STRING, U32) 2 setRelativeDescriptor(INTERFACE, CSTRING) 2 registerTable(CSTRING, BOOL) -> BOOL 2 isSuccessCode(U32) -> BOOL 2 findFlag(ASTRING, BOOL) -> I32 2 downloadUpdates(CSTRING, INTERFACE, INTERFACE, INTERFACE) -> BOOL 2 create(U32, U32) 1 setFolders(ARRAY, U32) 1 queryStringToQueries(UTF8STRING, ARRAY, U32, INTERFACE) 1 queriesToQueryString(ARRAY, U32, INTERFACE) -> UTF8STRING 1 openWindow(INTERFACE, CHAR_STR, CHAR_STR, CHAR_STR, INTERFACE) -> INTERFACE 1 isDefaultBrowser(BOOL) -> BOOL 1 insertControllerAt(U32, INTERFACE) 1 init(INTERFACE, CHAR_STR, I32, U16) 1 importNode(INTERFACE, BOOL) -> INTERFACE 1 getLoginSavingEnabled(ASTRING) -> BOOL 1 getCategory(ASTRING) -> ASTRING 1 executeQueries(ARRAY, U32, INTERFACE) -> INTERFACE 1 createTable(CHAR_STR, CHAR_STR) 1 addObserver(ASTRING, INTERFACE) Interesting to note that names are uniquely associated with signatures here. Selective JSFastNative definition when caller and callee receivers (|this| objs) are same-origin, with custom arg/result conversion, should win. Want to avoid hardcoding for one benchmark/session, but could handcode for popular by static count methods that are well-represented in the top 20 above. /be
Target Milestone: mozilla1.9beta3 → mozilla1.9beta4
Target Milestone: mozilla1.9beta4 → ---
bz and I were chatting and it seems we really want to bypass as much of XPConnect as we can, for DOM calls. The DOM is pretty stable as APIs go, and we actually would rather reflect concrete classes into JS so that their prototype chains work as people expect (IIRC this is a w3c webapi topic, and webkit has it more right). I'm focusing on pure-JS issues with tracemonkey and other work, so I'd love for someone to take this bug off my hands. bz, jst, shaver: feel free! /be
I'd add Bkap or Igor to that list - or one of our new hotshots if we can get a volunteer to point them in the right direction...
I'd like to take this.
Jason: great -- bz and jst may well be fine with you grabbing this too (I don't know, but jst may be looking into this bug already), so I leave it to you guys to agree on who takes assignment. Main thing is to get xpconnect bypassed as much as possible. Thanks, /be
Jason unless you hear otherwise please take this bug and let's get rockin - love to get something in asap esp for the hot paths.
Jason taking this is great as far as I'm concerned.
Jason: go strong.
Assignee: brendan → jorendorff
Status: ASSIGNED → NEW
Attached patch Proof of concept v1 (obsolete) (deleted) — Splinter Review
A little proof-of-concept code. In nsNodeSH::PostCreate, I'm adding JS properties "nodeType" and "nodeName" (with JSPropertyOp getters) to each JS object that serves as a prototype of a node (these are XPCWrappedNativeProto objects). (Also some work-in-progress on "nextSibling", but it's not finished yet.) These getters are per-interface-member, not per-class, so they have to do a QI. One of the things I eliminated was the call into the SecurityManager. I don't know if that's needed or not. Haven't tried building this in DEBUG. In an optimized browser build, Mochitests pass, and the patch is measurably faster on a bogus benchmark I threw together (forthcoming).
Attached file bogus DOM benchmark (deleted) —
Before writing the above patch, I did some profiling. I wrote a (bogus, contrived) DOM benchmark and got the numbers below. The benchmark is attached. Sorry it's 300KB--it contains a copy of Hamlet which in hindsight probably wasn't necessary. :-) All numbers are "total" (i.e. including all children) except where labeled "itself", in which case the children are shown separately. 0.9% XPC_WN_GetterSetter itself (75.4% total) 5.2% XPCCallContext ctor 2.3% XPCCallContext dtor 3.1% XPCNativeMember::GetCallInfo 0.6% XPCCallContext::SetCallInfo 0.4% nsXPConnect::Release 7.8% XPCWrappedNative::CallMethod itself (61.4% total) 12.9% nsScriptSecurityManager::CanAccess 1.9% xptiInterfaceEntry::GetMethodInfo 1.7% XPCCallContext::CanCallNow 1.3% GetInterfaceTypeFromParam 0.6% nsExceptionManager::SetCurrentException 1.0% NS_INvokeByIndex_P itself (10.4% total) 9.4% actual time spent in the DOM 1.6% XPCConvert::NativeData2JS itself (23.2% total) 4.4% XPCStringConvert::ReadableToJSString 0.7% nsCOMPtr_base dtor 3.1% XPCConvert::NativeInterface2JSObject itself (15.9% total) 8.0% XPCWrappedNative::GetNewOrUsed 4.2% XPCNativeInterface::GetNewOrUsed 0.6% other 0.6% other 1.6% other 1.5% other Summary: 9.4% in the DOM 12.9% doing security checks 24.6% in JS 53.1% in XPConnect
Jason this is awesome - particularly the breakdown of times in zones. Questions: 1) What's the speedup you are seeing and the relative weighting of times? 2) Any ideas on "lightspeed here" e.g. the theoretical max if we had hand-rolled call paths for every dom method called in your benchmark?
I'd be grateful to have some help on these points before I go much further: - Are these per-interface-feature "quick stubs" the right approach? I think they look OK. Would per-class-feature stubs be better? (My take: they might be a touch faster at the margin--I can replace the QI with something faster--but there are a lot more classes than interfaces.) - Right now the quick stubs do no security checking at all. How broken is that? Should I just keep the security check, in the interest of separation of concerns? (The performance cost is highish. The check itself is expensive; plus it requires me to have an XPCCallContext, which I might otherwise avoid.) - Is there anything else XPConnect is doing that I'm not doing, that I need to worry about? - Which unit tests should I be running? Btw, the plan is to auto-generate the quick stubs using Python. See bsmedberg's Python IDL parser: http://hg.mozilla.org/users/bsmedberg_mozilla.com/idl-parser/ . (This bit of metaprogramming, or parts of it, could be done with C++ templates instead. I like Python because you get explicit generated code that way, something to look at, consider, and step through in gdb.)
Status: NEW → ASSIGNED
(In reply to comment #13) I really need to re-profile to answer this, but just to get something up: here are the benchmark times (in milliseconds) on my MacBook. Both these builds are optimized, but with -g. In both cases Firefox is starting "warm". # baseline (without the patch) A = [233, 236, 155, 173, 156, 152, 158, 218, 135, 136] #avg==175.2 # with the patch B = [214, 117, 116, 114, 115, 125, 221, 117, 117, 127] #avg==138.3 21% faster with just these 2 properties.
In terms of security, I think that the classinfo should (does?) indicate whether a particular method requires a security check. We should simply avoid the fast-methods for any operation that requires a security check. The QI should be cheap... that is certainly no reason to switch to class-based fast methods. It might be interesting to see if we can implement these methods in terms of internal interfaces instead of nsIDOMNode. e.g. Node_GetFirstChild could use nsINode::GetChildAt(0). This would avoid several addref/qi/release pairs. We'd have to be careful about making sure subclasses don't do funny implementations of GetFirstChild. The mochitests are probably the most important DOM tests to run, though the full testsuite is probably pretty important.
(In reply to comment #16) > In terms of security, I think that the classinfo should (does?) indicate > whether a particular method requires a security check. We should simply avoid > the fast-methods for any operation that requires a security check. We really shouldn't need to do any security checks here, XOWs (cross origin wrappers) do that for us. > The QI should be cheap... that is certainly no reason to switch to class-based > fast methods. It should be, but in reality it's not that cheap on DOM nodes due to the cycle collector doing it's purple marking thing :( Probably not expensive enough to write per-class hooks etc. For certain classes (nodes are an example), we rely on nsINode being the canonical nsISupports pointer in other case too, so in that case it's actually safe to cast the nsISupports pointer straight to nsINode (see the top of nsNodeSH::PreCreate()) w/o a QI. In other cases we're not as lucky, but nsINode things are what we primarily care about here. > It might be interesting to see if we can implement these methods in terms of > internal interfaces instead of nsIDOMNode. e.g. Node_GetFirstChild could use > nsINode::GetChildAt(0). This would avoid several addref/qi/release pairs. We'd > have to be careful about making sure subclasses don't do funny implementations > of GetFirstChild. Yeah, that should be doable. It'd take some testing (which mochitest would do for us to great lenghts) to verify that there's no funny implementations out there, but none come to mind here... > The mochitests are probably the most important DOM tests to run, though the > full testsuite is probably pretty important. Yes, absolutely.
I'm working to get my suite of DOM perf tests up and running for Jason to work off of (running on Dromaeo, naturally). These should provide some better benchmarks as to how much these improvements are helping.
It's OK to skip the security checks if we're dropping the CAPS infrastructure that allows blocking access per-property. I'm fine with us doing that, but we should announce it somewhere. Also, we might still need address_of() when passing around an nsCOMPtr<>*. Or we could use nsCOMPtr<>& instead to be safe. What happens if this prototype is set as the prototype of some random JS object that QIs to nsINode? Will things work like they did before when the relevant DOM methods are called on this random JS object, or in a different way? If the latter, is the different way safe?
To answer my own question r.e. perf - this simple testcase runs for me in 125ms on trunk and around 40ms on Safari 3.1.2. So that implies on the order of a 3x speedup is possible here.
Comment 12's profile data indicates that we spend 62% in XPConnect and secmgr combined, so yeah, if you lose that 62% off our current 125ms you end up around 48ms. Getting the remaining bit probably requires changes on the DOM side.
(In reply to comment #19) > Also, we might still need address_of() when passing around an nsCOMPtr<>*. Or > we could use nsCOMPtr<>& instead to be safe. OK, I'll change that to just take a T**. > What happens if this prototype is set as the prototype of some random JS object > that QIs to nsINode? Will things work like they did before when the relevant > DOM methods are called on this random JS object, or in a different way? If the > latter, is the different way safe? Attach a test? I'll be happy to run scripts in both builds and see if the behavior differs. I expect a few differences. For example, suppose node1 and node2 are two DOM nodes. node2.__proto__ = node1; node1.appendChild.apply(node2, [x]); As it stands, this does node2.appendChild(x) if node1 and node2 are the same type (that is, they have the same prototype), and node1.appendChild(x) otherwise, due to some funny stuff in XPCWrappedNative::GetWrappedNativeOfJSObject. But if appendChild were implemented as a quick stub along the lines of what's in the patch, this would always do node2.appendChild(x). The new code does a QI and finds that node2 is capable of appendChild-ing. I think it's "safe" in the sense that it doesn't open up new security holes. Boris, I don't think I can win big on speed while keeping the exact behavior of XPConnect in every corner case--it's too much of a monster--and the feedback I'm hearing from other quarters is "Don't worry about that right now." New patch tomorrow morning.
By no means do we need to keep the behavior in every corner case. Like I said in comment 19, if the new behavior differs in corner cases we just need to make sure the new behavior is safe (both in terms of web compat and security).
Attached file Testcase using appendChild (deleted) —
Let me know if I should look for some other method to test?
Attached patch Proof of concept v2 (obsolete) (deleted) — Splinter Review
Changes: * The patch now contains quick stubs for everything the benchmark hits. (I did this to get profile numbers that clearly show the remaining slowness.) * I implemented quick stubs for: a method; and some getters that return XPCOM objects. (I did this mostly to make sure I could.) * The quick stubs, which live in nsDOMClassInfo.cpp, now need xpcprivate.h (for XPCCallContext). So this patch will only work in a libxul build. * Miscellaneous fixes. Benchmark numbers now (same setup as before): # baseline - no patch A = [224, 145, 133, 139, 138, 138, 136, 144, 135, 140] # avg: 138.67 # with the PoC v2 patch B = [142, 68, 68, 63, 70, 68, 65, 69, 68, 68] # avg: 67.44 So, 2.05x as fast. Why are we not at "light speed" here? The next comment will address this. Another question: The first time you push the button, and every time you push it after a pause of a second or so, the test takes twice as long to run. Anyone know why? A quick profile suggested GC is only a small part of the answer. Next: an updated profile.
Attachment #328515 - Attachment is obsolete: true
Here's a profile of the bogus benchmark with the POC-v2 patch. Summary: 13.2% DOM 15.0% JS 36.9% Overhead: Wrapping COM objects 10.4% Overhead: AddRef/Release/nsCOMPtr 9.7% Overhead: QueryInterface 8.9% Overhead: xpc_qsStringToJsval 5.9% Other, including the quick stubs themselves Why does the patch reduce the percentage of time spent in JS? I dunno. Maybe JSPropertyOp getters and setters are immensely faster than JSNative getters and setters. (?) Why so much time wrapping COM objects? In short, I haven't succeeded at making a specialized fast path through XPCConvert::NativeInterface2JSObject. It's hairy. Any methods and getters that return XPCOM objects hit this code. Profile: 15.0% JS 1.0% nsIDOMNode_GetNextSibling quick stub itself (38.1% total) 25.5% xpc_qsXPCOMObjectToJsval, XPCCallContext ctor/dtor 4.7% DOM 3.3% AddRef/Release/nsCOMPtr 0.3% xpc_qsUnwrapImpl itself 3.1% QI 0.2% other 0.3% nsIDOMNode_GetFirstChild quick stub itself (15.5% total) 11.3% xpc_qsXPCOMObjectToJsval, XPCCallContext ctor/dtor 1.2% DOM 1.2% AddRef/Release/nsCOMPtr 0.1% xpc_qsUnwrapImpl itself 1.1% QI 0.3% other 0.2% nsIDOMNode_GetNodeName quick stub itself (8.8% total) 3.7% DOM 3.1% xpc_qsStringToJsval (it's copying! why?) 0.1% xpc_qsUnwrapImpl itself 0.6% QI 0.8% AddRef/Release/nsCOMPtr 0.3% other 0.6% nsIDOMNode_GetNodeType quick stub itself (8.5% total) 3.5% AddRef/Release/nsCOMPtr 0.5% xpc_qsUnwrapImpl itself 3.5% QI 0.2% DOM 0.2% other 0.2% nsIDOMHTMLElement_GetClassName quick stub itself (7.4% total) 2.9% xpc_qsStringToJsval (again with the copying!) 2.2% DOM 0.9% AddRef/Release/nsCOMPtr 0.1% xpc_qsUnwrapImpl itself 0.9% QI 0.2% other 0.2% nsIDOMNode_GetNodeValue quick stub itself (5.9% total) 2.9% xpc_qsStringToJsval 1.2% DOM 0.0% xpc_qsUnwrapImpl itself 0.5% QI 0.7% AddRef/Release/nsCOMPtr 0.2% nsAString overhead 0.2% other 0.8% other (misc DOM, QI, and string overhead) My loose threads: 1. Autogenerate quick stubs; work toward something land-able. 2. Reduce object-wrapping overhead. 3. Reduce other overhead. 4. Figure out why the first run is so much slower. I will do #1 unless encouraged otherwise.
Jason - great stuff - a 2x win is nothing to sneeze at so I would agree that getting something landed with room for further optimization would be the right first step. IS there anything hot in NativeInterface2JSObject that is an easy candidate for optimization?
> Another question: The first time you push the button, and every > time you push it after a pause of a second or so, the test takes > twice as long to run. Anyone know why? A quick profile suggested > GC is only a small part of the answer. Do we spend more time under NativeInterface2JSObject in those cases? I'd expect that if we GC all the XPCWrappedNatives we have to rewrap all the nodes if we touch them again... XPCWrappedNative::GetNewOrUsed is a known issue perf-wise. I think it's worth adressing, but not necessarily as part of this bug. Longer-term, perhaps we can just stop using XPCWrappedNative entirely for DOM nodes. One other question. With these changes, does XPCNativeWrapper automation still work right?
Oh, and one more thing. When designing stub-autogeneration, we may want to keep in mind the Null= and Undefined= stuff people are introducing in web idl: that is, that different methods might need to convert JS null and undefined to different things, based on IDL annotations. Not sure whether it affects the autogeneration code at all, but worth remembering.
Eventually we probably want to be able to call DOM methods from within JIT-compiled JavaScript code. Since such code uses its own native frame format instead of the usual cx->fp->regs layout, the DOM code can't safely reach back into the VM and inspect these data structures directly. We can probably create some fast paths for certain information that can be extracted from the native frame layout (i.e. what scripts called me, whats argv[0], etc). But all this information should be retrieved through an explicit API and JSContext* and similar data structures should be just black box void* pointers as far as the DOM is concerned.
(In reply to comment #24) > Created an attachment (id=328808) [details] > Testcase using appendChild > > Let me know if I should look for some other method to test? I wrote a quick stub for appendChild, very similar to the other stuff in the patch, to run this test. The quick stub behaves the same as the Real Thing in these three cases. The first test appends a text node to document.body. The second and third both throw. I haven't implemented XPCWrappedJS auto-wrapping yet (XPCConvert::JSObject2NativeInterface), so it is perhaps coincidence that tests 2 and 3 behave the same. But no nasty surprises. (In reply to comment #28) > Do we spend more time under NativeInterface2JSObject in those cases? I'd > expect that if we GC all the XPCWrappedNatives we have to rewrap all the nodes > if we touch them again... Yes, of course. I feel silly now. :) > One other question. With these changes, does XPCNativeWrapper > automation still work right? I don't know what that is, so... good question. Tests are welcome! (In reply to comment #29) > Oh, and one more thing. When designing stub-autogeneration, we may want to > keep in mind the Null= and Undefined= stuff people are introducing in web idl: > that is, that different methods might need to convert JS null and undefined to > different things, based on IDL annotations. Not sure whether it affects the > autogeneration code at all, but worth remembering. I plan to implement it the way it is in XPCConvert today. I'll be sure to make the autogeneration code choke on any annotations it doesn't understand.
> I don't know what that is, so... good question. Tests are welcome! I wonder whether we can get some general XPCNativeWrapper regression tests into the tree so we don't have to worry too much about this.... What exceptions did the second and third tests end up throwing?
(In reply to comment #27) > IS there anything hot in NativeInterface2JSObject that is an easy > candidate for optimization? All the following numbers are percentages of the overall end-to-end total in the profile. Keep in mind this is from an bogus benchmark that mainly exercises one path, the quickest success path through this code. 28.3% is spent under NativeInterface2JSObject, including: 3.4% in XPC_XOW_ClassNeedsXOW, which is silly. 7.5% in locking, even though there's no lock contention and the objects in question happen to be main-thread-only. The fastest path through it locks and unlocks rt->GetMapLock twice. 5.9% in XPCNativeInterface::GetNewOrUsed. In the fast path (Used, not New) the only reason this needs to be called is because XPCWrappedNative::FindTearOff needs it. I wonder if we can ever avoid calling FindTearOff, maybe by asking the native's proto something about the class. 3.2% in cycle-collector suspicion, sigh. 2.9% in hash tables. I bet those could be faster but I'm not sure.
(In reply to comment #32) > > I don't know what that is, so... good question. Tests are welcome! > > I wonder whether we can get some general XPCNativeWrapper regression tests into > the tree so we don't have to worry too much about this.... > > What exceptions did the second and third tests end up throwing? Test 2 throws: NS_ERROR_DOM_HIERARCHY_REQUEST_ERR Test 3 throws: NS_ERROR_XPC_BAD_OP_ON_WN_PROTO On test 2, my quick stub differs because I don't have auto-wrapping working yet. It throws NS_ERROR_XPC_BAD_OP_ON_WN_PROTO instead (unable to unwrap the object). But I'll get there.
Sounds great.
Attached patch Proof of concept v3 (obsolete) (deleted) — Splinter Review
This adds: - Benjamin Smedberg's XPIDL parser in Python (xpidl.py) - A new script to autogenerate quick stubs (qsgen.py) - A little build glue. It's not any faster than v2. Passes Mochitests. I also did a pretty thorough code read-through to find places where a quick stub can differ from XPConnect. I'll comment on this in a sec.
Attachment #328945 - Attachment is obsolete: true
Note: yacc.py and lex.py are LGPL, so you're going to have to put them in other-licenses/ply. Because they are build-only tools it's not a problem to have them in the tree.
There's a long comment in qsgen.py about incompatibilities, which I'll paste here. The complete list of known differences, as of this writing, after an assiduous search: - Quick stub getters and setters are JSPropertyOps-- that is, they do not use JSPROP_GETTER or JSPROP_SETTER. This means __lookupGetter__ does not work on them. This change is visible to scripts. - Many quick stubs don't create an XPCCallContext. In those cases, no entry is added to the XPCCallContext stack. So native implementations of quick-stubbed methods must avoid nsXPConnect::GetCurrentNativeCallContext. (Even when a quick stub does have an XPCCallContext, it never pushes it all the way to READY_TO_CALL state, so a lot of its members are garbage. But this doesn't endanger native implementations of non-quick-stubbed methods that use GetCurrentNativeCallContext and are called indirectly from quick-stubbed methods, because only the current top XPCCallContext is exposed--nsAXPCNativeCallContext does not expose XPCCallContext::GetPrevCallContext.) - There are a few differences in how the "this" JSObject is unwrapped. Ordinarily, XPConnect searches the prototype chain of the "this" JSObject for an XPCOM object of the desired "proto". For details, see the parts of XPCWrappedNative::GetWrappedNativeOfJSObject that use "proto". Some quick stubs (methods, not getters or setters, that have XPCCallContexts) do this, but most instead look for an XPCOM object that supports the desired *interface*. This is more lenient. The difference is observable in some cases where a getter/setter/method is taken from one object and applied to another object. Another notable difference in this area: Quick stubs don't support split objects. - Quick stubs don't call XPCContext::SetLastResult. This is visible on the Components object. (It would be easy to fix this if anyone cares.) - Quick stubs skip a security check that XPConnect does in XPCWrappedNative::CallMethod. This means the security manager doesn't have an opportunity to veto accesses to members for which quick stubs exist. - Error messages in certain cases aren't as good. For example, document.body.appendChild(31) says this in trunk: Could not convert JavaScript argument arg 0 [nsIDOMHTMLBodyElement.appendChild] With a quick stub for nsIDOMNode.appendChild, you don't get the last bit. I think we must retain at least *some* kind of interface or class name (not necessarily nsIDOMHTMLBodyElement) *and* the member name (appendChild), so this will have to change. - There are many features of IDL that XPConnect supports but xpcqsgen does not, including dependent types and out parameters.
If you care what's fast and what's not, *please* help me choose which methods and attributes should have quick stubs! Unless I hear otherwise, I'm going with the list in comment 3. Judging by the output of nm dom_quickstubs.o, it looks like quick stubs weigh about 382 bytes each on Mac (Intel). The overhead in xpc_quickstubs.o looks like 2KB. I expect that to go up somewhat. Roughly: fifty for 22KB, a hundred for 40KB.
I'd think that .style.left and .style.top should have quickstubs (so a quickstub for .style and two for left/top).
(In reply to comment #38) > - There are a few differences in how the "this" JSObject is > unwrapped. Ordinarily, XPConnect searches the prototype chain of > the "this" JSObject for an XPCOM object of the desired "proto". > For details, see the parts of > XPCWrappedNative::GetWrappedNativeOfJSObject that use "proto". You might be wary of Firebug here (ie, test with it); in the past it's been sensitive to changes in this stuff.
I've pulled together my DOM tests and brought them in to the updated Dromaeo suite. You can run them from your browser here: http://v2.dromaeo.com/?dom If you wish to run them locally (but still in a browser) you can get it from the Git repository here (to build a copy just run 'make web'): http://github.com/jeresig/dromaeo/tree/master The tests are pretty raw right now but they have solid coverage of the DOM (attributes, modification, querying, traversal, and events) and coverage of all the major libraries (extensive coverage of jQuery and Prototype). Let me know if you think any of the tests need tweaking (or if some things need to be tested more) and I can make it happen.
I took Brendan's patch and made a few changes. Browsing google and yahoo mail and other randomly chosen web apps, I got these numbers: 167847 nsIDOMNode.childNodes 68438 nsIDOMElement.tagName 47460 nsIDOMNode.parentNode 45434 nsIDOMElementCSSInlineStyle.style 35978 nsIDOMElement.getAttribute 34445 nsIDOMElement.attributes 33701 nsIDOMNamedNodeMap.getNamedItem 33237 nsIDOMAttr.value 26206 nsIDOMEvent.type 25650 nsIDOMElement.setAttribute 24981 nsIDOMJSWindow.setTimeout 24948 nsIXPCComponents.interfaces 24418 nsISimpleEnumerator.hasMoreElements 24121 nsISimpleEnumerator.getNext 23968 nsICookie2.isSession 23609 nsIDOMMouseEvent.clientY 23609 nsIDOMMouseEvent.clientX 23485 nsIDOMNSUIEvent.layerY 23485 nsIDOMNSUIEvent.layerX 19535 nsIDOMEvent.target 16701 nsIDOMNodeList.length 15005 nsIDOMNSCSS2Properties.opacity 14258 nsIDOMElement.getElementsByTagName 13932 nsIDOMNode.ownerDocument 13825 nsIDOMHTMLAnchorElement.href 13350 nsIDOMMouseEvent.relatedTarget 13205 nsIDOMDocumentView.defaultView 12397 nsIDOMMouseEvent.ctrlKey 12396 nsIDOMMouseEvent.metaKey 12395 nsIDOMMouseEvent.shiftKey 12377 nsIDOMMouseEvent.altKey 11957 nsIDOMMouseEvent.button 11223 nsIXMLHttpRequest.readyState 10993 nsIDOMMouseEvent.screenY 10993 nsIDOMMouseEvent.screenX 10803 nsIDOMLocation.href 10226 nsIDOMKeyEvent.keyCode 10043 nsIDOMNSHTMLElement.innerHTML 9492 nsIDOMXULDocument.commandDispatcher 9456 nsIDOMCSS2Properties.visibility 9366 nsIDOMWindow.document 8696 {nsIPrefBranch,nsIPrefBranch2}.getIntPref 8439 nsIDOMDocument.documentElement 8391 nsIDOMViewCSS.getComputedStyle 8327 nsIDOMElement.removeAttribute 8255 nsIDOMNode.nodeType 8058 nsIJSCID.getService 8006 nsIDOMElement.getAttributeNS 7624 nsIDOMWindow.top 7550 nsIDOMElement.id and so on. This is slightly edited for readability. I'll attach the new patch and the raw data.
This is not so much a "v2" as a sequel.
Attachment #293060 - Attachment is obsolete: true
Dunno if there is a coverage vs code size tradeoff here - if so based on those counts you get coverage like so: Top100: 86.64% Top150: 92.68% Top200: 95.95%
Yeah, I think we should generate until it hurts. :) Some of the things that are a little low on that page (.left) are used in perf-sensitive contexts if they're used at all, so I'm all about turning the dial to 11 (for HTML/CSS DOM) to start and then backing off if we discover that it costs us a meg of codesize. It'd be interesting to get a census for just startup, too, to see how much we might win there from making XUL manipulation faster.
Here are the stats from a Dromaeo DOM run. 878061 nsIDOMNode.previousSibling 211132 nsIDOMElement.tagName 151535 nsIDOMNodeList.length 145340 nsIDOMXPathResult.snapshotItem 142386 nsIDOMHTMLElement.id 107142 nsIDOMNode.parentNode 81888 nsIDOMElement.getAttribute 62420 nwIDOMElement.getElementsByTagName 47961 nsIDOMElementCSSInlineStyle.style 44821 nsIDOMHTMLDocument.body 40961 nsIDOMHTMLElement.id 38870 nsIDOMElement.setAttribute 32540 nsIDOMDocumentView.defaultView 31494 nsIDOMNode.cloneNode 19512 nsIDOMElement.childNodes 18767 nsIDOMCSS2Properties.display 17047 nsIDOMNode.insertBefore 13425 nsIDOMViewCSS.getComputedStyle 12777 nsIDOMNode.removeChild 11509 nsIDOMNSHTMLElement.innerHTML 10169 nsIDOMJSWindow.setTimeout 10024 nsIDOMNode.ownerDocument 9765 nsIDOMCSS2Properties.display 7869 nsIDOMDocument.createElement 6212 nsIDOMCSS2Properties.visibility 5071 nsIDOMNode.lastChild 4650 nsIDOMCSS2Properties.visibility 4650 nsIDOMCSS2Properties.position 4006 nsIDOMHTMLDocument.createTextNode 3107 nsIDOMCSS2Properties.color 2605 nsIDOMHTMLDocument.getElementsByTagName 2325 nsIDOMCSS2Properties.position 2325 nsIDOMNSHTMLElement.clientWidth 2325 nsIDOMNSHTMLElement.clientHeight 2265 nsIJSID.equals 2221 nsIDOMXPathResult.iterateNext 1883 {nsIPropertyBag2,nsIWritablePropertyBag2}.getProperty 1718 nsIDOMNode.removeAttribute 1550 nsIDOMCSSStyleDeclaration.cssText 1550 nsIDOMCSSStyleDeclaration.cssText 1550 nsIDOMCSS2Properties.color 1525 nsIDOMCSS2Properties.width 1506 nsIDOMCSS2Properties.width 1416 nsIDOMEventTarget.dispatchEvent 1416 nsIDOMDocumentEvent.createEvent 1416 nsIDOMEvent.initEvent 799 nsIDOMXPathEvaluator.evaluate 736 nsIXPCComponents.interfaces 732 nsIFeedResult.version 672 nsIDOMXPathResult.snapshotLength I think when a member appears twice, it's because it's a property, and both the getter and the setter are hot. Next I'll add Schrep's requested stat and I'll measure browser startup.
(In reply to comment #48) To shaver's point why not codegen for 100%, build, and see what that does to perf?
(In reply to comment #49) > (In reply to comment #48) > To shaver's point why not codegen for 100%, build, and see what that does to > perf? OK, I'll try taking the DOM to 11 and see what happens. It's a lot of methods, though, so I suspect we'll have to pick and choose in the end. We can't codegen for *all* scriptable XPCOM objects because qsgen.py isn't, like, a complete reimplementation of XPConnect. There are *lots* of corner cases, and even not-so-corner cases, like out parameters and optional parameters, where it currently bails out.
Here are stats for browser startup. top 50: 82.4% top 100: 90.9% (by this point you're down to methods called 5 times or less) top 150: 94.6% top 200: 96.6% Browser startup only, top 50: 592 11.3 nsIJSID.equals 464 20.2 {nsIDOMElement,nsIDOMXULElement}.get childNodes 365 27.2 nsIXPCComponents.get interfaces 320 33.4 nsIJSCID.getService 286 38.8 nsIDOMNodeList.get length 196 42.6 {nsIDOMElement,nsIDOMHTMLHtmlElement,nsIDOMXULElement}.getAttribute 176 46.0 nsISimpleEnumerator.hasMoreElements 164 49.1 nsISimpleEnumerator.getNext 162 52.2 nsICookie2.get isSession 153 55.1 {nsIDOMHTMLInputElement,nsIDOMXULElement}.setAttribute 113 57.3 nsIXPCComponents.get classes 85 58.9 nsIPrefService.getBranch 82 60.5 nsIJSID.get name 62 61.7 {nsIDOMDocument,nsIDOMXULElement}.get parentNode 55 62.7 {nsIDOMElement,nsIDOMXULElement}.hasAttribute 54 63.8 nsIObserverService.addObserver 53 64.8 {nsIPrefBranch,nsIPrefBranch2}.getBoolPref 50 65.7 nsIDOMNodeList.item 46 66.6 {nsIFile,nsILocalFile}.exists 46 67.5 nsIJSCID.createInstance 44 68.4 nsIDOMXULElement.get id 43 69.2 nsIPrefBranch.getPrefType 42 70.0 nsIDOM3Node.get textContent 34 70.6 nsIXPCComponents.get ID 32 71.2 {nsIPrefBranch,nsIPrefBranch2}.getIntPref 32 71.9 {nsIPrefBranch,nsIPrefBranch2}.getCharPref 32 72.5 nsIDOMXULElement.removeAttribute 29 73.0 {nsIFile,nsILocalFile}.append 29 73.6 nsIProperties.get 28 74.1 nsIDocShell.get securityUI 28 74.7 nsIDOMDocumentXBL.getAnonymousElementByAttribute 27 75.2 nsIDOMXULElement.get boxObject 27 75.7 nsIXPCComponents.get results 27 76.2 nsIDOMXULElement.set collapsed 25 76.7 {nsIPrefBranch,nsIPrefBranch2}.getComplexValue 24 77.1 nsIXPCComponents_Utils.import 24 77.6 {nsIIOService,nsIIOService2}.newURI 23 78.0 {nsIPrefBranch2,nsIPrefBranchInternal}.addObserver 22 78.5 nsIURI.get scheme 21 78.9 nsIStringBundle.GetStringFromName 21 79.3 nsIXPCComponents.get utils 20 79.7 nsIURI.get spec 19 80.0 nsIDOMXULDocument.get commandDispatcher 18 80.4 nsIDOMCSSStyleDeclaration.getPropertyValue 18 80.7 nsIObserverService.notifyObservers 18 81.1 nsIDOMWindow.get top 18 81.4 nsIDOMXULCommandDispatcher.getControllerForCommand 18 81.7 nsIBoxObject.get width 17 82.1 nsIDOMJSWindow.setTimeout 16 82.4 nsIDOMWindowInternal.get content The second column a cumulative percentage of the total number of calls logged. In this sample, the top 50 methods together account for 82.4% of all calls; the top 10, 55.1%.
(In reply to comment #50) > (In reply to comment #49) > > (In reply to comment #48) > > To shaver's point why not codegen for 100%, build, and see what that does to > > perf? > > OK, I'll try taking the DOM to 11 and see what happens. It's a lot of methods, > though, so I suspect we'll have to pick and choose in the end. > > We can't codegen for *all* scriptable XPCOM objects because qsgen.py isn't, > like, a complete reimplementation of XPConnect. There are *lots* of corner > cases, and even not-so-corner cases, like out parameters and optional > parameters, where it currently bails out. > If it is easily tunable - based on the stats I'd try 100% (or whatever is near 100%) and top150. We run drameo on both and see what happens. I'll test if you give me try server builds!
That's a great idea. I should run this through the try server anyway. The reasons I can't do this in the next hour are: * qsgen.py currently lacks every single XPCOM/XPConnect feature it hasn't needed yet, which is a lot (out parameters, dependent types, arrays, non-DOM strings, floats, etc. etc.). * Quick stubs are only for main-thread-only objects that get prototypes, and I'm not sure which interfaces comply; I'm just going to ask jst about this as I go. * The magic by which quick stubs are attached to XPConnect prototype JSObjects currently involves some hand coding per interface. I should eliminate that hassle. * The Makefile changes have been giving me some grief; I'll ping ted or bsmedberg about it early tomorrow. Soon though!
Keywords: perf
Attached patch Proof of concept v4 (obsolete) (deleted) — Splinter Review
This patch wins 28% on Dromaeo DOM: m-c tip: http://v2.dromaeo.com/?id=17055 (8678.2 msec) m-c+wip4: http://v2.dromaeo.com/?id=17054 (6779.4 msec) I encourage folks to take a look at this one. The four bullet points in my previous comment are pretty much taken care of. I'll throw this at the try server now. To do before review: * Due to licensing issues, the Python lex and yacc stuff needs to go in a third-party directory (trivial); * Make less quick stubs. This patch stubs out almost every property and method under dom/public/idl; as a result XUL is 1MB larger (on Mac). * Fix a bug I found today: quick stubs are always enumerable, regardless of DONT_ENUM_STATIC_PROPS. I think the answer is probably "don't do that"; i.e. don't use quick stubs on any interfaces implemented by classes that have DONT_ENUM_STATIC_PROPS, and enforce with assertions.
Attachment #329850 - Attachment is obsolete: true
You could make those quickstubs non-enumerable, right? Also, you'll need to rev IIDs before final commit.
make[6]: *** No rule to make target `/Users/jruderman/central/mozilla/dom/src/base/dom_quickstubs.cfg', needed by `dom_quickstubs.h'. Stop.
Attached patch Proof of concept v5 (obsolete) (deleted) — Splinter Review
Yep, sorry about that. This one should have less FAIL in it, I hope.
Attachment #331722 - Attachment is obsolete: true
This one's broken too. Better patch on Monday.
Attached patch Proof of concept v5a (obsolete) (deleted) — Splinter Review
OK, Jesse, this one will build on Mac or Linux. Apply it on top of revision 2d91c01ea27b. On Windows, dom_quickstubs.cpp won't compile. (This is the auto-generated source file containing quick stub implementations. It #includes 331 header files, and there is a conflict somewhere: somebody #defines ERROR and later nsIDOMNSEvent.h tries to `enum { ERROR = 8388608 };` But it used to work. It broke when I moved the file from $objdir/dom/src/base to $objdir/js/src/xpconnect/src, which was necessary to avoid breaking non-libxul linkage. Investigating.)
Attachment #332033 - Attachment is obsolete: true
Attached patch Proof of concept v6 (obsolete) (deleted) — Splinter Review
Snapshot taken while sprinting toward a reviewable state. I'm running this through the try server tonight. I'm not sure it will build on Windows; details below. Changes in this version: * Changed #undef ERROR workaround in nsIDOMNSEvent.idl. For some reason this only applies #ifdef WINCE; I changed it to #ifdef WIN32. * Moved Python Lex-Yacc to /other-licenses/ply. * Support quick stubs on split objects. * Support quick stubs for methods with optional params. * Add more interfaces to dom_quickstubs.qsconf; also comment out quite a few interfaces that contain methods whose implementations may call GetCurrentNativeCallContext, as such methods mustn't be stubbed. * Fix a bug in quick stubs for getters and methods that return nsIVariant * Fix bug: make qsgen.py refuse to generate a stub for a non-readonly attribute of type nsIVariant (previously it would generate a setter that would do the wrong thing at run time). * Fix bug: quick stub properties for readonly attributes were JSPROP_READONLY, which seems sensible, but produces the wrong behavior. Now they instead have a setter that always fails. It now seems to work properly when setting or redefining a property of the (split!) window object, as in |var parent=3;| which shadows window.__proto__.parent--although I still don't thoroughly understand how or why that works.
Attachment #332213 - Attachment is obsolete: true
Forgot to mention: This is failing a mochitest that examines Error.stack: /tests/js/src/xpconnect/tests/mochitest/test_bug390488.html not ok - Stack from walking caller chain should be correct got " 1. checkForStacks 2. onclick 3. simulateClick", expected " 1. checkForStacks 2. onclick 3. dispatchEvent 4. simulateClick" ok - Stack from |new Error().stack| should include simulateClick Mozilla Bug 390488 Here dispatchEvent doesn't generate a JS stack frame because it has a quick stub (which is a JSFastNative). brendan/mrbkap: Should I downgrade to JSNative? Or loosen up the test?
You should loosen the test, I think. I only wrote it that way because that let me easily use is(). At least assuming JSFastNative is faster than JSNative. ;)
Attached patch Proof of concept v7 (obsolete) (deleted) — Splinter Review
Based on revision 253a65668b31. And this is pretty much it. I expect to post a reviewable version later today, with the sole difference that it'll make fewer quick stubs (hundreds instead of thousands). I'm posting this version just because it seems like fuzzing it should find crashes 10x faster (due to 10x more quick stubs). *shrug*
Attachment #332614 - Attachment is obsolete: true
Mind kicking off a try server build?
Kicked one just now. I ran out of time today. Reviewable patch on Monday. :-P
Attached patch v1 (obsolete) (deleted) — Splinter Review
Try it out. You should see a win of around 29% on the Dromaeo DOM tests: http://v2.dromaeo.com/?dom This version creates 382 quick stubs. The list is pretty arbitrary but includes all the properties that were hot in either of the two big test runs I did--plus a lot more stuff from canvas and events, as those are places where there could definitely be thousands of calls in quick succession.
Attachment #332982 - Attachment is obsolete: true
Attachment #333301 - Flags: review?(jst)
Comment on attachment 333301 [details] [diff] [review] v1 r?bsmedberg for build system changes r?mrbkap for security and a +JS_FRIEND_API change in jscntxt.h
Attachment #333301 - Flags: review?(mrbkap)
Attachment #333301 - Flags: review?(benjamin)
On Mac, this makes XUL 294KB larger. I see three obvious ways to try and reduce this: 1. make fewer quick stubs 2. un-inline some of the helper functions in xpcquickstubs.h 3. build dom_quickstubs.cpp with "optimize for code size" compiler flags I'll try #2 and #3, and ignore #1 unless there are helpful comments.
I thought -Os was already the default.... What's the performance effect of un-inlining those functions? What's the codesize effect? Depending on how big the functions are and how commonly they're used, un-inlining them can go anywhere from actually increasing codesize to actually improving performance (due to better cache locality).
Attached file crash steps and stack (obsolete) (deleted) —
I'm crashing a lot with infinite recursion through XPCThrower::Throw after applying 'proof of concept v7'. See the attachment for details.
Jesse, I can't reproduce that. The stack looks a little funky too. Are you in a debug build? Windows?
I was using a debug build on Mac. The stack trace was from the system crash reporter; I can try again in gdb if you want.
Attached file gdb stack trace for the crash in comment 70 (obsolete) (deleted) —
OK, I'm getting it too now that I'm testing the right thing. More in a sec.
Comment on attachment 333301 [details] [diff] [review] v1 > %{C++ >-#ifdef WINCE >+#ifdef WIN32 > #undef ERROR > #endif > %} Might as well make this #ifdef ERROR #undef ERROR >diff --git a/js/src/xpconnect/idl/nsIXPCScriptable.idl b/js/src/xpconnect/idl/nsIXPCScriptable.idl >+ >+ void postCreatePrototype(in JSContextPtr cx, in JSObjectPtr proto); You should probably document that this callback is controlled by WANT_POSTCREATE (I assume we're not adding a new flag because we don't have any more bits?) >diff --git a/js/src/xpconnect/src/Makefile.in b/js/src/xpconnect/src/Makefile.in > LOCAL_INCLUDES = \ > -I$(srcdir)/../loader \ >+ -I$(DEPTH)/dist/include/content \ >+ -I$(DEPTH)/dist/include/dom \ >+ -I$(DEPTH)/dist/include/editor \ >+ -I$(DEPTH)/dist/include/layout \ >+ -I$(DEPTH)/dist/include/rdf \ >+ -I$(DEPTH)/dist/include/svg \ >+ -I$(DEPTH)/dist/include/xuldoc \ >+ -I$(DEPTH)/dist/include/xultmpl \ This is incorrect, you want REQUIRES += content dom editor etc... >+export:: dom_quickstubs.h Is there a reason you have to do this during the export phase? If you do this during the libs phase, you can just set the searchdir to dist/idl and skip the includePath in dom_quickstubs.qsconf. I think you can do this during libs by a rule nsXPConnect.$(OBJ_SUFFIX): dom_quickstubs.h >+dom_quickstubs.h: $(srcdir)/dom_quickstubs.qsconf \ >+ $(topsrcdir)/xpcom/idl-parser/qsgen.py \ >+ $(topsrcdir)/xpcom/idl-parser/header.py \ >+ $(topsrcdir)/xpcom/idl-parser/xpidl.py >+ $(PYTHON) $(topsrcdir)/xpcom/idl-parser/qsgen.py \ >+ --root=$(topsrcdir) \ >+ --cachedir=$(DEPTH)/xpcom/idl-parser \ >+ --header-output dom_quickstubs.h \ >+ --stub-output dom_quickstubs.cpp \ >+ --makedepend-output dom_quickstubs.depends \ >+ $(srcdir)/dom_quickstubs.qsconf nit, indent the continuation lines "\t " >diff --git a/js/src/xpconnect/src/dom_quickstubs.qsconf b/js/src/xpconnect/src/dom_quickstubs.qsconf >+# The Initial Developer of the Original Code is >+# Mozilla Corporation. Foundation owns all our copyrights. >+# Portions created by the Initial Developer are Copyright (C) 1999 1999 doesn't sound correct >+# A quick warning: >+# >+# Attributes or methods that call GetCurrentNativeCallContext must not be >+# quick-stubbed, because quick stubs don't generate a native call context. >+# qsgen.py has no way of knowing which attributes and methods do this, as it >+# looks at interfaces, not implementations. The symptoms, if you quick-stub >+# one of those, can be really weird, because GetCurrentNativeCallContext >+# doesn't crash--it may in fact return a plausible wrong answer. Is it possible, in debug builds, to insert a dummy native call context as part of the quickstub, which causes asserts if you try to get it? >diff --git a/js/src/xpconnect/src/xpcquickstubs.cpp b/js/src/xpconnect/src/xpcquickstubs.cpp >+static const xpc_qsHashEntry * >+LookupEntry(PRUint32 tableSize, const xpc_qsHashEntry *table, const nsID &iid) This hashtable code looks a whole lot like plhash, which is a chained hashtable. Did you consider using that instead of inventing a new one? http://mxr.mozilla.org/mozilla-central/source/nsprpub/lib/ds/plhash.h I did not review the rest of this file in any depth. >diff --git a/js/src/xpconnect/src/xpcquickstubs.h b/js/src/xpconnect/src/xpcquickstubs.h >+ * The Original Code is Mozilla Communicator client code, released >+ * March 31, 1998. again, check details in this license block >diff --git a/xpcom/idl-parser/qsgen.py b/xpcom/idl-parser/qsgen.py Because this file is xpconnect-specific, I tend to think it should live in xpconnect, not in xpcom. And, check the license block ;-) I kinda wish we had a better templating language for qsgen, but since this works we'll use it. r=me with nits and comments addressed
Attachment #333301 - Flags: review?(benjamin) → review+
The REQUIRES changes (and maybe all of the quickstubs) should only get built ifndef XPCONNECT_STANDALONE, right?
xpconnect-standalone is not maintained and should be removed.
(In reply to comment #75) > >+# A quick warning: > >+# > >+# Attributes or methods that call GetCurrentNativeCallContext must not be > >+# quick-stubbed, because quick stubs don't generate a native call context. > >+# qsgen.py has no way of knowing which attributes and methods do this, as it > >+# looks at interfaces, not implementations. The symptoms, if you quick-stub > >+# one of those, can be really weird, because GetCurrentNativeCallContext > >+# doesn't crash--it may in fact return a plausible wrong answer. > > Is it possible, in debug builds, to insert a dummy native call context as part > of the quickstub, which causes asserts if you try to get it? Is this something we could have a static analysis check for?
The sequence of events in Jesse's crash seems to be: - Start transitioning between pages. This happens in nsGlobalWindow::SetNewDocument. For a moment, there is no inner window object. - One thing we do during that time is try to wrap the existing navigator object. http://mxr.mozilla.org/mozilla-central/source/dom/src/base/nsGlobalWindow.cpp#1764 Note that mInnerWindow won't be set until line 1816. - Now we try to create the navigator XPCWNProto object; we try to define quick stubs on it; which results in a call to js_NewFunction; which ends up calling XPC_WN_InnerObject; which fails. So this is bug #1; calling XPC_WN_InnerObject at this point is bad. Bug #2 is that XPConnect blows the stack trying to report this error. If the error can be reported safely, nsDOMClassInfo will catch it and ignore it, so fixing bug #2 is the easiest fix.
To clarify comment 79, this crash does not happen in "v1" because v1 doesn't have quick stubs for any interfaces that the navigator object implements. jst suggests fixing bug #1 by wrapping the navigator object before clearing mInnerWindow. It still might be a good idea to have XPC_WN_InnerObject not call Throw().
Attached patch incremental patch from v1 to v2 (obsolete) (deleted) — Splinter Review
Attached patch v2 - addressing bsmedberg comments mostly (obsolete) (deleted) — Splinter Review
Two things aren't addressed: the desired for dummy XPCCallContexts, which I will look at on Monday; and the comment "You should probably document that this callback is controlled by WANT_POSTCREATE", just a silly oversight which I'll fix with a comment in v3.
Attachment #333301 - Attachment is obsolete: true
Attachment #333301 - Flags: review?(mrbkap)
Attachment #333301 - Flags: review?(jst)
Attached patch fuzzer bonus patch for v2 (deleted) — Splinter Review
Apply v2, then apply this to maximize opportunities for quickstub-related crashiness. This contains: (1) a huge number of quick stubs, the main feature; (2) a fix for the navigator-related crashiness of comment 70.
Attachment #333350 - Attachment is obsolete: true
Attachment #333634 - Attachment is obsolete: true
Attachment #334004 - Flags: review?(jst)
Attachment #334004 - Flags: review?(mrbkap)
I'm seeing mochitest failures in v2. Diagnosing now.
Comment on attachment 334005 [details] [diff] [review] v2 - addressing bsmedberg comments mostly I started looking at this, and so far it looks good but I didn't get through all that much of it. The one thing that caught my eye so far was this: - In xpcprivate.h: void SuspendRequest() { - if (mCX && mCX->thread != XPCPerThreadData::sMainJSThread) + if (mCX && XPCPerThreadData::IsMainThread(mCX)) That should be !XPCPerThreadData::IsMainThread(mCX), right? More later...
(In reply to comment #85) > That should be !XPCPerThreadData::IsMainThread(mCX), right? Yes. (In reply to comment #84) > I'm seeing mochitest failures in v2. Diagnosing now. This bug was pretty crazy. The specific failure is like this: var ctx = canvas.getContext('2d'); try { ctx.globalAlpha = Infinity; // should and does throw } catch (e) { } assert(DOMException !== undefined); // fails The badness happens in the setter. After SetGlobalAlpha returns failure, in XPConnect, when creating the DOMException, XPConnect calls nsWindowSH::PostCreate. It does a JS_LookupProperty(cx, window, "DOMException") for complicated reasons (see the source; there's a comment). Without the quick stub, we're in an XPConnect getter here, so there's a native JSStackFrame, and this JS_LookupProperty results in a js_LookupPropertyWithFlags() call with flags==0. With the quick stub, the top JSStackFrame is the one for the script, so the current opcode is an assigning opcode (assigning to ctx.globalAlpha), so flags==JSRESOLVE_QUALIFIED|JSRESOLVE_ASSIGNING. It gets even more ridiculous from there, but that's the root of the trouble, fixed by changing that JS_LookupProperty call to JS_LookupPropertyWithFlags(... JSRESOLVE_CLASSNAME). Patch coming.
Attached patch incremental patch from v2 to v3 (obsolete) (deleted) — Splinter Review
Attached patch v3 (obsolete) (deleted) — Splinter Review
Attachment #334004 - Attachment is obsolete: true
Attachment #334005 - Attachment is obsolete: true
Attachment #334549 - Flags: review?(jst)
Attachment #334004 - Flags: review?(mrbkap)
Attachment #334004 - Flags: review?(jst)
Attachment #334549 - Flags: review?(jst) → review?(mrbkap)
Blocks: 451279
Comment on attachment 334549 [details] [diff] [review] v3 Nit: - In xpc_qsDOMString::xpc_qsDOMString(): + else if(JSVAL_IS_NULL(v)) + { + (new(mBuf) implementation_type( + traits::sEmptyBuffer, PRUint32(0)))->SetIsVoid(PR_TRUE); + mValid = JS_TRUE; + return; + } + else else-after-return. Same thing in the other string ctors. Other than that this looks good! sr=jst We should have a discussion about what to actually include quick stubs for once this is in the tree, for instance I can't imagine how UI event getters could be performance critical enough to warrant generating quickstubs for, but we can tweak all that later if we decide there's a need to do so.
Attachment #334549 - Flags: superreview+
Comment on attachment 334549 [details] [diff] [review] v3 r=me with the nits I mentioned on IRC and a followup bug to investigate JSFUN_STUB_GSOPS.
Attachment #334549 - Flags: review?(mrbkap) → review+
Attached patch v4 - for check-in (obsolete) (deleted) — Splinter Review
Tree is currently burning, but otherwise this is ready to push.
Attachment #334546 - Attachment is obsolete: true
Attachment #334549 - Attachment is obsolete: true
Attachment #334610 - Flags: superreview+
Attachment #334610 - Flags: review+
Blocks: 451303
(In reply to comment #92) > Created an attachment (id=334610) [details] > v4 - for check-in I don't know if this is okay, but patch v4 still doesn't seem to address the first point in comment #86: void SuspendRequest() { - if (mCX && mCX->thread != XPCPerThreadData::sMainJSThread) + if (mCX && XPCPerThreadData::IsMainThread(mCX)) mDepth = JS_SuspendRequest(mCX); else mCX = nsnull; }
Good eye. That's not ok, it's a bad oversight. Fixing...
Hmm. Even with this one-byte fix, the patch is flunking the exciting new mochitests under dom/src/threads/test (threads+gczeal where available!). Run those 4 tests alone in a debug build: they pass. Run the *entire* test suite in a debug build: test_longThread.html times out, eventually 4 more tests failed and the browser became unresponsive. In addition, while I was waiting for mochitests, I did a little static analysis and found something interesting. In nsScriptSecurityManager::ReportError: http://mxr.mozilla.org/mozilla-central/source/caps/src/nsScriptSecurityManager.cpp#1435 there's this code: // Tell XPConnect that an exception was thrown, if appropriate if (sXPConnect) { nsAXPCNativeCallContext* xpcCallContext = nsnull; sXPConnect->GetCurrentNativeCallContext(&xpcCallContext); if (xpcCallContext) xpcCallContext->SetExceptionWasThrown(PR_TRUE); } This could be called from any security check, right? If so, I think this means quick stubs must do *something* to the XPCCallContext stack. Otherwise this would end up telling the wrong context that an exception was thrown.
(In reply to comment #19) > It's OK to skip the security checks if we're dropping the CAPS infrastructure > that allows blocking access per-property. I thought Thunderbird used this but I must have been dreaming...
Attached patch requires-ccx.js static analysis (deleted) — Splinter Review
Here's the static analysis code I mentioned in comment 95. I found three more methods with code similar to what ReportError does: nsScriptSecurityManager::CheckPropertyAccessImpl nsContentUtils::NotifyXPCIfExceptionPending nsScriptLoader::EvaluateScript Still not sure what to do about that.
> I thought Thunderbird used this but I must have been dreaming... It does. There was discussion about removing that at the summit, last I checked, and moving thunderbird to a different setup, if it still needs something. Jason, the link in comment 95 is off (bonsai-or-hgweb-with-revision would work better than lxr for links like that, since lxr changes all the time), but code similar to what you cite appears in nsScriptSecurityManager::ReportError and nsScriptSecurityManager::CheckPropertyAccessImpl, as you noticed. The ReportError case will only be hit from calls to nsScriptSecurityManager::CheckSameOrigin (the JSContext + URI version). The CheckPropertyAccessImpl case will also only be hit if someone calls into that code. The nsContentUtils I'm not sure, and nsScriptLoader could be a problem indeed. :( In particular, what happens if an appendChild() of a <script> with a textnode child is done? We really really want to avoid the XPCCallContext overhead if we can avoid it... maybe we can move to a better setup for these SetExceptionWasThrown things?
(In reply to comment #98) > The ReportError case will only be hit from calls to > nsScriptSecurityManager::CheckSameOrigin (the JSContext + URI version). > > The CheckPropertyAccessImpl case will also only be hit if someone calls into > that code. Ah, that's true. My script wasn't smart enough to see that cx is always null from the other call sites (and neither was I). > The nsContentUtils I'm not sure, and nsScriptLoader could be a problem indeed. > :( In particular, what happens if an appendChild() of a <script> with a > textnode child is done? > > We really really want to avoid the XPCCallContext overhead if we can avoid > it... maybe we can move to a better setup for these SetExceptionWasThrown > things? Worst case, I add a new XPCCallContext constructor that doesn't initialize anything except mState=STUB. GetCurrentNativeCallContext would check for that case and return null. This is pretty bad for perf. Here's another plan. Add a new member variable to XPCCallContext: JSStackFrame *mJSFrame; In the XPCCallContext constructor, if callerLanguage is JS_CALLER, mJSFrame is initialized to the incoming cx->fp. Now I claim that GetCurrentNativeCallContext knows the top ccx is bogus, because we're in a quick stub, if and only if (ccx.mJSFrame != ccx.mJSContext->fp). In that case it can just return null. My assumptions are that whenever JS calls into C++, there's a JStackFrame, and whenever we call C++ -> JS -> C++, that adds at least one JSStackFrame. I think this is true.
I think I'll try just deleting this SetExceptionWasThrown stuff. JS_IsExceptionPending seems to be enough.
That should probably happen in a separate bug as a prereq to this one, right? And land on different days so we can tell the regressions apart...
Depends on: 451571
Agreed. Filed bug 451571 for deleting SetExceptionWasThrown.
Attached patch incremental patch from v4 to v5 (tiny) (obsolete) (deleted) — Splinter Review
Attached patch v5 (deleted) — Splinter Review
(See also the separate patch in bug 451571. You can apply the patches in either order.)
Attachment #334610 - Attachment is obsolete: true
Attachment #334940 - Attachment is obsolete: true
Attachment #334941 - Flags: superreview+
Attachment #334941 - Flags: review+
This appears to have helped performance of the ACID3 test as far a smoothness of animation, and less reporting of test that took too long to run.
Could this landing be causing these reftest failures ? REFTEST TEST-UNEXPECTED-FAIL | file:///builds/moz2slave/trunk_linux-3/build/layout/reftests/bugs/273681-1.html | REFTEST IMAGE 1 (TEST):  REFTEST IMAGE 2 (REFERENCE):  REFTEST number of differing pixels: -1 If you look at the images, it seems the <script> isn't executing. We're seeing it across all five unittest machines since they came back online (bug 453071), and there were only four commits http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=10%3A40+PDT+August+30+2008&enddate=20%3A00+PDT+August+31+2008
I also filed bug 453105, which has the same regression range.
Depends on: 453105
The reftest failure is just like bug 453105 but for nsIDOMHTMLOptionElement/nsIDOMNSHTMLOptionElement.text. I have to ask... Given that this landed yesterday and the tree has been orange since, why is this still in the tree? Or the issue not fixed? I've commented out the relevant line in dom_quickstubs.qsconf for now, to get the tree green.
(In reply to comment #108) > The reftest failure is just like bug 453105 but for > nsIDOMHTMLOptionElement/nsIDOMNSHTMLOptionElement.text. > > I have to ask... Given that this landed yesterday and the tree has been orange > since, why is this still in the tree? Or the issue not fixed? I've commented > out the relevant line in dom_quickstubs.qsconf for now, to get the tree green. the comment out fixed Bug 453171
Depends on: 453171
Thanks for patching that, BZ, and I apologize for the sloppiness. I'm now working on finding all other such cases.
No longer depends on: 453171
Depends on: 453171
Depends on: 453331
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Could we get some verification here, with some nightly builds?
Blocks: 453705
(In reply to comment #112) > Could we get some verification here, with some nightly builds? Checked, and got a ~9500ms -> ~7500ms speedup on dromaeo dom
ACCEPTABLE. PLZ SEND MORE SPEEDUPS NOW KTHXBYE. (You guys rock.)
(In reply to comment #98) > > I thought Thunderbird used this but I must have been dreaming... > > It does. There was discussion about removing that at the summit, last I > checked, and moving thunderbird to a different setup, if it still needs > something. Did that talk make it to a bug, and is that bug where we've got Tb blocking flags? Whatever we're doing with it isn't obvious enough for me to even find it, so I can't evaluate whether we want to not ship a beta while we're naked, or only want to make sure we tuck in our slip before we ship final.
Depends on: 453825
Depends on: 453649
Depends on: 454343
Flags: in-testsuite-
Flags: in-litmus-
Depends on: 462428
No longer depends on: 463639
Depends on: 584960
Depends on: 578478
No longer depends on: 584960
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: