Closed Bug 516832 Opened 15 years ago Closed 14 years ago

TM: Conservative Stack Scanning

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta2+

People

(Reporter: wagnerg, Assigned: igor)

References

(Depends on 2 open bugs)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 64 obsolete files)

(deleted), image/png
Details
(deleted), patch
brendan
: review+
Details | Diff | Splinter Review
For the new garbage collector we want a conservative stack scanner. The idea is to start with a stack scan between the mark and sweep phase in the debug mode. A warning is issued whenever a pointer to a not marked GC Object is found.
Attached patch preview hack version. runs on osX and windows (obsolete) (deleted) — Splinter Review
Attached patch new version (obsolete) (deleted) — Splinter Review
This version includes the patch for bug 508707 since it is backed out currently.
Attachment #400933 - Attachment is obsolete: true
Attached patch new version (obsolete) (deleted) — Splinter Review
removed code from bug 508707. Added set for JSGCArenaInfos and minimum/maximum Heap address.
Attachment #401509 - Attachment is obsolete: true
This is some awesome prep work gregor. I will de-std::-ify this a bit.
Blocks: 518942
Attached patch patch (obsolete) (deleted) — Splinter Review
This patch is losely based on Gregor's work and scans the native stack of every thread when GC-ing. All valid pointers (tagged and raw) are marked.
Assignee: wagnerg → gal
Attachment #401573 - Attachment is obsolete: true
Blocks: 508140
Attachment #404009 - Flags: review?(igor)
Blocks: 519947
Are you sure that "dummy" always comes after the registers in the memory? Some variable reordering might break this assumption. + /* + * Non-volatile registers might still hold references. Force them onto the + * stack so we will scan them when we walk the native stack. + */ + jmp_buf registers; +#if defined(_MSC_VER) + #pragma warning(push) + #pragma warning(disable: 4611) +#endif + setjmp(registers); +#if defined(_MSC_VER) + #pragma warning(pop) +#endif + + /* + * Snapshot the top of the stack for this thread. + */ + jsuword dummy; + JS_THREAD_DATA(cx)->stackTop = &dummy;
What is the performance of the patch with deep native stacks? I remember especially during parsing of some machine-generated scripts JS may hit 100K or more bytes on the native stack. So why not to put the GC arenas into a hashtable for now?
Gregor did some benchmarks on this. The stack is usually reasonably shallow. If once in a blue moon we have to scan 100k, we will live. The tricky is to encourage GCs with MaybeGC when we know the stack is shallow.
Deep stacks was are not that rare. An event handler called during DOM building that triggers a load of a complex script may result in a heavy recursion in JS parser together with a burst of atom allocation and a last-ditch GC with over 10K of data on the stack. Using a hashtable (or a bitmap on 32-bit platforms) would address this for now.
Yeah, sure, #8 wasn't meant to say we don't want a hash map. I am just saying deep stacks are the exception. I see a last-ditch GC as a failure to GC in time. Its basically a separate bug (that exists and should be fixed).
Attached image stackslots js_GC enter. (deleted) —
I was running some experiments in the browser. Opening a lot (20+) of tabs with popular websites for about 10 minutes and running chrome experiments for about 1 hour... I never saw more than 10000 stack slots. Y-axis: stack slots X-Axis: time
#6: you think I should snapshot the stack water level inside a function that can't be inlined?
(In reply to comment #12) > #6: you think I should snapshot the stack water level inside a function that > can't be inlined? Yeah thats how I (and our webkit friends) ended up implementing it. This shouldn't matter performance-wise.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #404009 - Attachment is obsolete: true
Attachment #404009 - Flags: review?(igor)
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #407635 - Attachment is obsolete: true
Attachment #407642 - Flags: review?(igor)
Attachment #407642 - Flags: review?(anygregor)
There is still WINCE missing for the currentThreadStackBase function. I found a solution but I am still waiting for the tryserver access to test it.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #407642 - Attachment is obsolete: true
Attachment #407642 - Flags: review?(igor)
Attachment #407642 - Flags: review?(anygregor)
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #407672 - Attachment is obsolete: true
Attachment #407673 - Flags: review?(igor)
gregor, can you post the wince code? I can tryserver it
Blocks: 523765
Attached patch added wince getStackBase (obsolete) (deleted) — Splinter Review
try if this works.
patch crashes this test case: Makes this crash with -j: for (let j = 0; j < 3; ++j) try { for(let y=0;y<3;++y) { } with(null) { } } catch(e) { } for each (let z in [false, false, false]) { } for (let m = 0; m < 4; ++m) { gc(); }
Marking unallocated objects with a null map isn't working as expected for two reasons. First, ->map is used bu JSGCThing for the link pointer in the freelist, so it gets overwritten with non-NULL junk, and also in debug mode we fill the GCThing with 0xda. I will have to come up with a better way to tell allocated from unallocated objects. Strings should be fine since mBase is the 2nd word. Same for JSXML where the indicator is not in the first word either. Still, this approach isn't super pretty. Igor, any ideas?
(In reply to comment #23) > Marking unallocated objects with a null map isn't working as expected for two > reasons. First, ->map is used bu JSGCThing for the link pointer in the > freelist, so it gets overwritten with non-NULL junk, and also in debug mode we > fill the GCThing with 0xda. I will have to come up with a better way to tell > allocated from unallocated objects. Strings should be fine since mBase is the > 2nd word. Same for JSXML where the indicator is not in the first word either. > Still, this approach isn't super pretty. Such nulling implies that we will have an extra check in the finalizer to distinguish dead form non-finalized things. This is effectively an extra check per object allocation. So I think we should invent a better way for fast checking of thing presence on the free list. For example, for new arenas the check is trivial - the code can check if a pointer is less than the free list head. If not, then it is on the free list. So the real issue is how to deal with non-new arenas. But then perhaps we can just scan those free lists and cache the result using a bitmap? An arena that survived the GC probably means that it contains long-terms GC things that stay alive few more GC cycles. As such these things will be scanned during the GC so an extra single walk over the free list should not affect the performance in a significant way.
Here is one way to check for GC thing presence on the free list that uses in the essential way the fact that the free list are strictly in ascending order. Suppose a conservative scanner finds a thing that points to the arena. If thing's address is less than the list head, then we know that the thing was allocated either recently from the free list or before previous GC cycle. So we can just mark the thing. If the address is greater or equal than the free list head, then the thing cannot be allocated after the last GC. But then we can just check if the last GC has marked the thing. If not, then it should be on the free list. This schema requires that we will bring back something like GCF_FINAL. The GC will set the bit for things on the free list but it will be checked only by the conservative scanner for things with addresses that at least as great as the free list head.
(In reply to comment #24) > Such nulling implies that we will have an extra check in the finalizer to > distinguish dead form non-finalized things. This is not true - the finalizer will work as before explicitly checking for things on the free list and avoiding running the finalizer on them. The only performance hit with this schemma would be an extra write into a hot cache per finalizer to set an unique tag that would indicate finalized thing. That tag even not necessary should be null - any unique combination of bits that cannot happen in live GC things will work.
Attached patch patch (obsolete) (deleted) — Splinter Review
Properly excluded dead objects. Enable GCMETER stuff in DEBUG builds but only dump it if JS_GCMETER is set. It was completely stale and this seems the only way to avoid that.
Attachment #407673 - Attachment is obsolete: true
Attachment #407673 - Flags: review?(igor)
This passes the testcase above. Here are some statistics: CONSERVATIVE STACK SCANNING: number of words scanned: 14097 excluded, low bit set: 9744 not withing arena range: 3936 excluded, wrong tag: 230 excluded, not live: 3 things marked: 184 raw pointers marked: 159 Only around 200 pointers make it past the initial filter (low bit, arena range), so this should be really fast.
Attachment #407886 - Flags: review?(igor)
Attached patch again with WinCE support (obsolete) (deleted) — Splinter Review
Attachment #407689 - Attachment is obsolete: true
Attachment #407943 - Flags: review?(igor)
Comment on attachment 407943 [details] [diff] [review] again with WinCE support >+void* g_stackBase = 0; >+ If you initialize 2 runtimes on 2 different threads this is not going to work.
Attachment #407943 - Flags: review?(igor) → review-
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #407943 - Attachment is obsolete: true
Blocks: 524051
Comment on attachment 407944 [details] [diff] [review] patch >diff -r 34be4f52a1df js/src/jscntxt.h >@@ -317,27 +317,35 @@ struct JSThreadData { ... >+ /* >+ * Begin and top of the native stack of this thread. >+ */ >+ jsuword *stackBase; >+ jsuword *stackTop; I do not see code that initializes stackTop for other threads. GetStackTop is only called for the current thread in js_GC, not for any other thread that may wait for the GC. > #ifdef JS_THREADSAFE > > /* > * Structure uniquely representing a thread. It holds thread-private data > * that can be accessed without a global lock. >@@ -462,16 +470,18 @@ struct JSRuntime { > uint32 gcEmptyArenaPoolLifespan; > uint32 gcLevel; > uint32 gcNumber; > JSTracer *gcMarkingTracer; > uint32 gcTriggerFactor; > size_t gcTriggerBytes; > volatile JSBool gcIsNeeded; > volatile JSBool gcFlushCodeCaches; >+ jsuword gcMinArenaAddress; >+ jsuword gcMaxArenaAddress; I do not think that we can get away from just scanning the arena list. If using a hashtable to hash GC chunks would be too complex for now (but we need a followup bug to address this), the code should at least check against GC chunks, not arenas. With 64K chunks and typical gcMaxBytes on the scale of 64M we would have at worst 1K of chunks. This is not that nice either but at least this would bound the worst case scenario to scanning of 512/4 or 128K words (we restrict the stack to 512 KB) on 1000 list entries or about 150e6 operations. >diff -r 34be4f52a1df js/src/jsgc.cpp > >+static bool >+IsGCThing(JSRuntime *rt, jsuword thing, JSFinalizeGCThingKind *kindp = NULL, JSGCArenaInfo **arenap = NULL) >+{ >+ JS_ASSERT(FINALIZE_OBJECT == 0); >+ >+ for (unsigned i = 0; i != FINALIZE_LIMIT; ++i) { >+ JSGCArenaList *arenaList = &rt->gcArenaList[i]; >+ size_t thingSize = arenaList->thingSize; >+ size_t limit = THINGS_PER_ARENA(thingSize) * thingSize; >+ for (JSGCArenaInfo *a = arenaList->head; a; a = a->prev) { >+ jsuword start = ARENA_INFO_TO_START(a); >+ jsuword offset = thing - start; >+ if (offset < limit) { >+ if (kindp) >+ *kindp = JSFinalizeGCThingKind(i); >+ if (arenap) >+ *arenap = a; >+ return !(offset % thingSize); The extra check that offset devides thingSize should be removed. What if compiler would store &JSObject.fslots[2] and not JSObject on the stack for optimal performance because the code just accesses that slot after a function call that can trigger GC? >+static void >+js_MarkConservatively(JSTracer *trc, jsuword w) >+{ ... >+ JSFinalizeGCThingKind kind; >+ JSGCArenaInfo *arena; >+ if ((tag != JSVAL_DOUBLE) && IsGCThing(rt, p, &kind, &arena)) { >+ JS_ASSERT(tag == JSVAL_OBJECT || tag == JSVAL_STRING); >+ if ((tag == JSVAL_OBJECT) != ((kind == FINALIZE_OBJECT) || (kind == FINALIZE_XML))) >+ RETURN(wrongtag); As above, the tag matching is too optimistic. If the compiler stores JSObject.classword address on the stack, that would effectively mean a pointer with a tag of 4 on 32 bit CPU or JSVAL_STRING. Due to such optimizations I suggest to be truly conservative here and ignore any tag mismatch. >+ /* Make sure the thing is not on the freelist of the arena. */ >+ JSGCThing *thing = (JSGCThing *)p; >+ JSGCThing *cursor = arena->finalizable.freeList; >+ while (cursor) { >+ /* If the cursor moves past the thing, its not in the freelist. */ >+ if (thing < cursor) >+ break; >+ /* If we find it on the freelist, its dead. */ >+ if (thing == cursor) >+ RETURN(notlive); >+ cursor = cursor->link; This does not implements a special optimization to find dead things avoiding the free list scan.
Attachment #407944 - Flags: review-
Comment on attachment 407886 [details] [diff] [review] patch Is this patch an obsolete one? If not, please request for a review again.
Attachment #407886 - Flags: review?(igor)
Comment on attachment 407944 [details] [diff] [review] patch >+void* CurrentThreadStackBase() Nit: this function finally addresses the bug 237006. Thus I suggest to make it if not a part of public API then at least a friend API so the code that calls JS_SetThreadStackLimit can use it as is.
(In reply to comment #32) > >+ jsuword *stackTop; > > I do not see code that initializes stackTop for other threads. GetStackTop is > only called for the current thread in js_GC, not for any other thread that may > wait for the GC. The place to initialize stackTop would be before JS_AWAIT_GC_DONE. Currently it is called in 3 places where js_WaitForGC is the most hard case since it is also called when JSThread is not yet created. But in that case there should be no unrooted jsvals on the stack.
#35: JS_AWAIT_GC_DONE is not enough IMO. I think we would have to flush the register state with a setjmp and snapshot the top of the stack in ever JS_LOCK_GC, which might be too expensive. Instead we can use the OS to get the thread's registers and SP but thats quite tricky. I will discuss this with gregor and hand this over to him.
Gregor: - I think Igor is right. We should be more conservative and don't check for the right tag. I do think the low-bit-must-not-bit-set check is ok though. - I don't see a need for a better algorithm to exclude dead objects just yet. The numbers above show we only deal with very few objects that get to that point (200-ish). I am worried about average case performance, not worst case. We have to measure again once we are more conservative. If more pointers get through, we might have to speed up this test. But lets measure that. - The StackTop issue has to be fixed. Thats going to be more gross OS-dependent stuff.
(In reply to comment #36) > #35: JS_AWAIT_GC_DONE is not enough IMO. The GC marking and finalization is running outside JS_LOCK_GC to prevent nesting of any locks code may need to take during marking and finalization inside the GC lock. As such any other thread which stack may take part in the GC marking must be inside JS_AWAIT_GC_DONE.
(In reply to comment #37) > I do think the low-bit-must-not-bit-set check is ok though. IIRC the conservative scanner in ActionScript MMGC always clear the tag bits without. I do not know whether it was done based on the simplicity of code or of fears that smarter filter may miss a live GC thing. In general I think the answer depends on how much paranoia one has about compilers. For example, what if a compiler decides to optimize jsval & ~7 as (jsval >> 3) << 3 and store jsval >> 3, not jsval, on the stack across a function call for whatever misguided optimization reasons? With a conservative scanner we should just assume that this will never happen. Still I wish we would have working process isolation and low privilege like in Chrome at this point so we could be more flexible about tolerating such compiler bugs.
I know of no such compiler code generation. Let's focus on what we know and get this going for 3.7 -- we'll have electrolysis for process isolation before too long (no point coupling these yet to make this depend on electrolysis, IMO). /be
only works for OS X in multithreaded environment.
Attached patch updated version (obsolete) (deleted) — Splinter Review
stable version for mac.
Attachment #408291 - Attachment is obsolete: true
Comment on attachment 408441 [details] [diff] [review] updated version The code needs to be spidermonkeified. GetStackBase and jsuword instead of DWORD i.e.
(In reply to comment #41) > Created an attachment (id=408291) [details] > work in progress. get registers of other threads > > only works for OS X in multithreaded environment. I do not understand why any extra platform-specific code is necessary. A thread that waits for the GC to finish on another thread calls JS_AWAIT_GC_DONE. So any such thread can get its stack top with registers saved in the same way that is done for js_GC just before it calls JS_AWAIT_GC_DONE.
Gregor is exploring the platform-dependent code. I will hack up the JS_AWAIT_GC_DONE approach.
Igor's right, Andreas and I talked last night. Not sure we need too much platform code if setjmp has sufficiently low overhead. Could be we do want a tuned get_sp primitive. It would be great to avoid ifdef hell. /be
Depends on: 524841
Attached patch patch (obsolete) (deleted) — Splinter Review
Patch with proper top-of-stack capturing as threads wait for the GC to finish.
Attachment #407886 - Attachment is obsolete: true
Attachment #407944 - Attachment is obsolete: true
Attachment #408441 - Attachment is obsolete: true
> I do not see code that initializes stackTop for other threads. GetStackTop is > only called for the current thread in js_GC, not for any other thread that may > wait for the GC. Thanks, you were right. The attached patch should fix this. > >+ jsuword gcMinArenaAddress; > >+ jsuword gcMaxArenaAddress; > > I do not think that we can get away from just scanning the arena list. If using > a hashtable to hash GC chunks would be too complex for now (but we need a > followup bug to address this), the code should at least check against GC > chunks, not arenas. With 64K chunks and typical gcMaxBytes on the scale of 64M > we would have at worst 1K of chunks. This is not that nice either but at least > this would bound the worst case scenario to scanning of 512/4 or 128K words (we > restrict the stack to 512 KB) on 1000 list entries or about 150e6 operations. Gregor attached data showing the actual amount of work being done. I attached some numbers showing how many pointers end up requiring arena classification. We have a tendency to heavily optimize non-problems with complex code, introduce bugs, and then spend a lot of time fixing those bugs, creating a lot of code that runs fast for the case that doesn't happen and slow when it really matters. I would like to avoid that. If we see MarkConservatively pop up in profiles, we can optimize it further in follow-up patches. For now this seems to work pretty well from what I can measure and shark. > The extra check that offset devides thingSize should be removed. What if > compiler would store &JSObject.fslots[2] and not JSObject on the stack for > optimal performance because the code just accesses that slot after a function > call that can trigger GC? Right. I will remove that. > As above, the tag matching is too optimistic. If the compiler stores > JSObject.classword address on the stack, that would effectively mean a pointer > with a tag of 4 on 32 bit CPU or JSVAL_STRING. Due to such optimizations I > suggest to be truly conservative here and ignore any tag mismatch. I think we can check for doubles since they use a 2-bit tag and we don't do unaligned accesses. I will remove the string/object distinction. > > This does not implements a special optimization to find dead things avoiding > the free list scan. Yes, it doesn't. See rationale above. We run this code for 13 pointers or so per GC, which basically translates to no measurable time spent in the code.
Attached patch patch (obsolete) (deleted) — Splinter Review
Consider all pointers inside a GCThing as valid references.
Attachment #408960 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Whitespace fix.
Attachment #408966 - Attachment is obsolete: true
I talked today to someone from Adobe's Flash group and they have observed problems like igor pointed out. The compiler keeps a reference to &foo->bar[2] and such. We should have that case covered now. MMgc only considers pointers into the object header (base and base + 8). We consider any pointer into the object now, so we should be safer. Adobe sometimes had to make some local pointers volatile to work around this issue and force the base address to be present on the stack. I don't think we will ever need that.
CONSERVATIVE STACK SCANNING: number of words scanned: 1446536 excluded, low bit set: 359606 not withing arena range: 932339 excluded, wrong tag: 53363 excluded, not live: 2 things marked: 101226 raw pointers marked: 80884 Stats for a trace-test run. We end up marking about 7% of pointers now, due to the more relaxed classification rules. These are all last-ditch GCs, which is basically the worst case for conservative scanning. Filing a dependent bug to properly do preventive GCs from the event loop, which has a shallow stack and should not have much or any roots on the stack.
Attached patch patch (obsolete) (deleted) — Splinter Review
Fix thread-safe builds.
Attachment #408968 - Attachment is obsolete: true
Depends on: 525117
Blocks: 525121
No longer depends on: 525117
(In reply to comment #53) > Created an attachment (id=408976) [details] > patch > > Fix thread-safe builds. try { Function("function(){}") } catch(e) {} for each(let a in [0, 0, 0, 0]) { gc() } crashes js opt builds with -j when passed in as a CLI argument. I tested this doesn't occur without the patch. Exception Type: EXC_BAD_ACCESS (SIGBUS) Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000 Crashed Thread: 0 Thread 0 Crashed: 0 ??? 0000000000 0 + 0 1 js-opt-tm-darwin 0x0004bffc js_MarkConservatively + 380 2 js-opt-tm-darwin 0x0001fbd0 js_TraceThreads(JSRuntime*, JSTracer*) + 64 3 js-opt-tm-darwin 0x0004c8a6 js_TraceRuntime + 182 4 js-opt-tm-darwin 0x0004ce16 js_GC + 1062 5 js-opt-tm-darwin 0x0000fa38 JS_GC + 72 6 js-opt-tm-darwin 0x00005db2 __ZL2GCP9JSContextjPl + 50 7 ??? 0x007e2f2b 0 + 8269611 8 js-opt-tm-darwin 0x000fdb89 __ZL11ExecuteTreeP9JSContextPN7nanojit8FragmentERjPP10VMSideExit + 825 9 js-opt-tm-darwin 0x0011cd9b js_MonitorLoopEdge(JSContext*, unsigned int&, RecordReason) + 1531 10 js-opt-tm-darwin 0x0005c8e2 js_Interpret + 54738 11 js-opt-tm-darwin 0x0005f20c js_Execute + 444 12 js-opt-tm-darwin 0x0000dc6c JS_ExecuteScript + 60 13 js-opt-tm-darwin 0x000048c5 __ZL7ProcessP9JSContextP8JSObjectPci + 1605 14 js-opt-tm-darwin 0x000088b4 main + 2212 15 js-opt-tm-darwin 0x0000274b _start + 209 16 js-opt-tm-darwin 0x00002679 start + 41 I also hit this assertion when testing this patch but wasn't able to get a testcase: Assertion failure: !*flagp, at ../jsgc.cpp:2982
Cool. Thanks Gary.
(In reply to comment #48) > > >+ jsuword gcMinArenaAddress; > > >+ jsuword gcMaxArenaAddress; > > > > I do not think that we can get away from just scanning the arena list. If using > > a hashtable to hash GC chunks would be too complex for now (but we need a > > followup bug to address this), the code should at least check against GC > > chunks, not arenas. With 64K chunks and typical gcMaxBytes on the scale of 64M > > we would have at worst 1K of chunks. This is not that nice either but at least > > this would bound the worst case scenario to scanning of 512/4 or 128K words (we > > restrict the stack to 512 KB) on 1000 list entries or about 150e6 operations. > > Gregor attached data showing the actual amount of work being done. Much appreciated! Don't want premature optimization here, even though I defend us below against that charge in the main, in the past. > I attached > some numbers showing how many pointers end up requiring arena classification. > We have a tendency to heavily optimize non-problems with complex code, > introduce bugs, and then spend a lot of time fixing those bugs, creating a lot > of code that runs fast for the case that doesn't happen and slow when it really > matters. This sounds like a serious charge (premature optimization) but it's generally not true. SpiderMonkey has been optimized only on demand until around Firefox 3 era, based on bug reports about O(n^2) growth being evident on the web, etc. Maybe our on-demand optimizations suck, but they weren't premature. Of course we had the usual "Jenga" game of balancing pieces on one another in a big pile. Digging out from under the jumble when the game collapses is hard. We need to dig harder (I'm working on single-thread objects and scopes/sprops). I agree we should find the best path forward from any given point, with maximum safe leverage, so I'm all in favor of conservative GC stack scanning. I wanted to defend against the premature optimization charge seemingly leveled at jsgc. We tried switching to MMgc (sorry, jorendorff!) and it was way slower than SpiderMonkey's GC, esp. at double allocation. That fast path for doubles was not "premature optimization" and whatever bugs it had were low (not par) for the course, IMHO -- thanks to Igor. The rooting API messes are a different story and my fault, and we should ditch those ASAP (the wiki says #ifdef JS_NAKED_GC_ROOTS for the embeddings to use as a crutch, but we could just break everyone; let's decide on this after a fresh newsgroup thread to alert embedders, and then update the wiki). /be
The comment aims narrowly at our habit to optimize aggressively issues we perceive to be a problem, instead of measuring aggressively what the problems really are. And sometimes I am just as guilty of that than the next guy. We need a change in attitude how we go about reviewing and designing patches. Next time someone nits a patch of mine claiming that by manually strength reducing an expression at the source level we would be faster, I will want to see a profile. "Show me you measured it, and I believe you." should be our new mantra. Architecture is hard. Very hard. Shark surprises me every day. I looked at the loop igor was worried about, and the modulo was the most expensive part (hence the optimization of that in the latest patch), not the pointer chasing. We have a bad habit of looking for performance problems by looking at source code. That doesn't seem to work well for modern processors. We have to let go of that and start looking at profiles instead.
Instead of optimizing the live/dead classification, we can probably easily optimize for repeated classification of values we already decided over. This is the population count for a randomly chosen stack scan: 120 5e8f000 54 5ec12e0 24 5e8f0e0 20 5e8f060 18 5e8d930 14 5e8f100 11 5e8f080 8 5ebe02c 8 5e916a8 8 5e9156c 8 5e9123c 7 5ebe020 7 5ebe010 7 5e8f040 6 5ec1300 6 5ebe028 6 5ebe014 6 5e91694 6 5e9168c 6 5e9106c 6 5e8f160 6 5e8f0a0 6 5e8eb98 6 5e8d888 5 5ebe004 4 5ec0000 4 5ebe000 4 5e91e18 3 5e91e10 2 5ec0160 2 5ebe018 2 5ebe008 2 5e918e4 2 5e8e230 2 5e852e2 1 5ec01a0 1 5ec0040 1 5ec0020 1 5ebffec 1 5ebf004 1 5ebe01c 1 5e91e1c 1 5e90002 We seem to find the same values on the stack over and over. We could maintain a hashtable of values we already classified and grow the able as we scan the stack. If we run out of memory to grow the table further, we just stop adding values and slow down a bit. After the stack scan the hashtable is deallocated. If we don't want to allocate during that phase we can probably also just look out for the last N values classified since we seem to hit identical values in fairly close proximity: (again, totally random and not necessarily representative sample:) xxx 5e8f000 xxx 5e8f040 xxx 5e8f060 xxx 5ebe008 xxx 5e916a8 xxx 5ebe020 xxx 5ebe020 xxx 5ebe020 xxx 5ebe020 xxx 5ebe020 xxx 5ebe020 xxx 5ebe018 xxx 5e852e2 xxx 5ebe014
Re: comment 57 -- agree 100% that superscalar modern CPUs + better compilers (even GCC) confound source-only reading. Reading doesn't really help much for a lot of things (another example: for readability we often break, continue, or return early on error or abnormal case but depending on size of basic blocks and lack of PGO or just good profiling data, this can deoptimize prefetching badly). Re: comment 58 -- good idea on the hash table. Would it be purged by address when finalizing? Anything in the literature on this idea? /be
58: I would just purge it right after stack-scanning and make this super short-lived.
I printed the stack at the beginning of a gc in bug 508401. https://bug508401.bugzilla.mozilla.org/attachment.cgi?id=392824 It seems for Mac that the first 150-200 slots are always 0. It might be safe to skip these entries. I also remember from my measurements that the stack slots become very stable towards the bottom entries.
Attached patch windows fix (obsolete) (deleted) — Splinter Review
add WinCE support and fix windows compilation. I have no idea how to get the stackBase for Maemo. I guess we need some help from mobile for this.
(In reply to comment #54) > (In reply to comment #53) > > Created an attachment (id=408976) [details] [details] > > patch > > > > Fix thread-safe builds. > > try { > Function("function(){}") > } catch(e) {} > for each(let a in [0, 0, 0, 0]) { > gc() > } > > crashes js opt builds with -j when passed in as a CLI argument. I tested this > doesn't occur without the patch. > > Exception Type: EXC_BAD_ACCESS (SIGBUS) > Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000000 > Crashed Thread: 0 Something very strange happens here. It crashes in JS_PUBLIC_API(void) JS_TraceChildren(JSTracer *trc, void *thing, uint32 kind) at this line: obj->map->ops->trace(trc, obj); But the bug disappears when I insert a printf to get the address of the object right before this call. When it crashes, the address of ->trace is 0x0.
(In reply to comment #48) > Yes, it doesn't. See rationale above. We run this code for 13 pointers or so > per GC, which basically translates to no measurable time spent in the code. I am not convinced that no extra work is necessary regarding the performance of the scanner. The current GC contains code that makes sure that the GC can run without recursion and overusing the native stack. The original reports IIRC came not from crashing web sites but rather from web developers that reported that simple mistakes made the browser unstable and hard to use for debugging the problems. The idea is to make sure that Firefox can run bad and buggy code without crashing or long freezes so one can use it for debugging and fixing the code. For this reason I do not think that testing against web sites that contain working polished code is enough. Still I agree that optimizing the scanner can wait especially since it is known how to fix this.
I take that back. My version crashes too, but only in OPT. This is with OPT -g3: (gdb) bt #0 0x00000000 in ?? () #1 0x0004b2bd in JS_TraceChildren [inlined] () at /Users/gal/workspace/tracemonkey-repository/js/src/jsgc.cpp:2065 #2 0x0004b2bd in JS_CallTracer (trc=0xbfff6514, thing=0x5d8d658, kind=0) at ../jsgc.cpp:2325 #3 0x0004b92c in js_MarkConservatively (trc=0xbfff6514, begin=0xbfffe7d8, end=0xc0000000) at ../jsgc.cpp:1037 #4 0x0001faa0 in js_TraceThreads (rt=0x6027200, trc=0xbfff6514) at jscntxt.h:343 #5 0x0004c1c6 in js_TraceRuntime (trc=0xbfff6514, allAtoms=0) at ../jsgc.cpp:2743 #6 0x0004c736 in js_GC (cx=0x5c0cce0, gckind=GC_NORMAL) at ../jsgc.cpp:3319 #7 0x0000fa28 in JS_GC (cx=0x5c0cce0) at ../jsapi.cpp:2445 #8 0x00005d62 in GC (cx=0x5c0cce0, argc=0, vp=0xbfff66c8) at ../../shell/js.cpp:1120 #9 0x05da1f2b in ?? () #10 0x000fbf1d in ExecuteTree (cx=0x5c0cce0, f=0x60040d4, inlineCallCount=@0xbffff60c, innermostNestedGuardp=0xbffff294) at ../jstracer.cpp:6479 #11 0x00118eaa in js_MonitorLoopEdge (cx=0x5c0cce0, inlineCallCount=@0xbffff60c, reason=Record_Branch) at ../jstracer.cpp:6966 #12 0x0005c142 in js_Interpret (cx=0x5c0cce0) at jsops.cpp:360 #13 0x0005eaac in js_Execute (cx=0x5c0cce0, chain=0x5d8f000, script=0x5c0e850, down=0x0, flags=0, result=0x0) at jsinterp.cpp:1616 #14 0x0000dc1c in JS_ExecuteScript (cx=0x5c0cce0, obj=0x5d8f000, script=0x5c0e850, rval=0x0) at ../jsapi.cpp:4965 #15 0x00004895 in Process (cx=0x5c0cce0, obj=0x5d8f000, filename=0xbffff9dc "x.js", forceTTY=0) at ../../shell/js.cpp:438 #16 0x00008854 in main (argc=2, argv=0xbffff8e4, envp=0xbffff8f0) at ../../shell/js.cpp:847 (gdb)
Attached patch patch (obsolete) (deleted) — Splinter Review
Add a hash set that keeps track of words we already classified. This basically completely eliminates MarkConservatively() from the GC profile, which should make further optimization of the miss case (when the hash set doesn't bypass the pointer classification) unnecessary.
Attachment #409026 - Attachment is obsolete: true
Attachment #409208 - Attachment is obsolete: true
Attached patch silence gcc warnings, include wince fixes (obsolete) (deleted) — Splinter Review
Attachment #409253 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Removed some obsolete code.
Attachment #409255 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Removed dead code, for real.
Attachment #409257 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Fix use of jsdhash. The hashset was using jsdhash wrong. The hit rate is now lower than before so we might still have to do some extra optimization on top of this patch. I will re-shark. This is for mandelbrot: CONSERVATIVE STACK SCANNING: number of cache hits: 9225 number of words scanned: 8059 So we avoid 9225 reclassifications but still have to classify 8059. excluded, low bit set: 3238 Out of those 8059, 3238 have the low bit set. We exclude them. The compiler is super unlikely to make that kind of stupid optimizations. We scan the arenas for about 4700 words. not withing arena range: 3509 Most are not within the arenas. excluded, wrong tag: 1023 We assume double tagging to be stable so something must be a raw type or tagged as double if its a double. Compilers will not randomly set the 2-lsb, so this is safe (note that 3-lsb checking is not safe, so we don't distinguish strings and objects at this level, since &classword would possibly lead to such optimizations). excluded, not live: 0 Nothing excluded because it was on the free list. things marked: 289 We ended up marking 289 things so thats basically the number of items we had to check the freelist against. raw pointers marked: 191
Attachment #409258 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #409260 - Attachment is obsolete: true
I made a new version of the patch that uses a hash map to find the arena associated with a pointer. CONSERVATIVE STACK SCANNING: number of cache hits: 9203 number of words scanned: 8081 excluded, low bit set: 3241 not withing arena range: 3476 not withing an arena: 474 excluded, wrong tag: 600 excluded, not live: 0 things marked: 290 raw pointers marked: 192 I will measure whether this is any faster.
Ok, this nailed it I think. Of a full GC with actual arenas to walk through, this reduces the contribution of conservative marking to total GC time 0.7% or less. The worst numbers I have seen are for short GCs with barely anything to mark (i.e. shutdown GC), in which case contribution goes up to 10% or so. Thats probably acceptable.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #409261 - Attachment is obsolete: true
Gary, could you give this a bit more fuzzer love? Thanks.
Attached patch patch (obsolete) (deleted) — Splinter Review
Restore an assert that was accidentally lost.
Attachment #409274 - Attachment is obsolete: true
Attached patch restoring an assert that was accidentally lost (obsolete) (deleted) — Splinter Review
Attachment #409276 - Attachment is obsolete: true
This should be ready to review, minus any fuzzer bugs gary might turn up.
You could add a fast path for the value 0 since there are about 150 - 200 zeros at the bottom of the stack.
The min/max check eliminates 0.
I get an assertion in the windows browser: ASSERTION: Non-global object has the wrong flags: '!(jsclazz->flags & JSCLASS_IS_GLOBAL)', file .... spconnect/src/xpcwrappednative.cpp, line 1114
(In reply to comment #79) > This should be ready to review, minus any fuzzer bugs gary might turn up. This seems to pass preliminary fuzzing (at least for ~ 5-10 minutes).
Comment on attachment 409277 [details] [diff] [review] restoring an assert that was accidentally lost >diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp >--- a/js/src/jsapi.cpp >+++ b/js/src/jsapi.cpp >@@ -932,16 +932,19 @@ JS_SetRuntimePrivate(JSRuntime *rt, void > > JS_PUBLIC_API(void) > JS_BeginRequest(JSContext *cx) > { > #ifdef JS_THREADSAFE > JS_ASSERT(CURRENT_THREAD_IS_ME(cx->thread)); > if (!cx->requestDepth) { > JSRuntime *rt = cx->runtime; >+ >+ GC_SAFE_POINT(JS_THREAD_DATA(cx)); >+ > JS_LOCK_GC(rt); > > /* Wait until the GC is finished. */ > if (rt->gcThread != cx->thread) { > while (rt->gcLevel > 0) GC_SAFE_POINT should be moved right before the while loop so it will called only when the thread is really waits for GC. Also, can we avoid macro and just declare the variable with a constructor doing all the necessary job. To prevent usage like JSGCSafePoint(); written instead of JSGCSafePoint pnt(); we can use JSGuardObjectNotifier and friends from jsutil.h. Also the destructor for the variable can set stackTop to NULL indicating that the thread is no longer serialized with the GC. >diff --git a/js/src/jscntxt.cpp b/js/src/jscntxt.cpp >--- a/js/src/jscntxt.cpp >+++ b/js/src/jscntxt.cpp >@@ -66,31 +66,149 @@ > #include "jspubtd.h" > #include "jsscan.h" > #include "jsscope.h" > #include "jsscript.h" > #include "jsstaticcheck.h" > #include "jsstr.h" > #include "jstracer.h" > >-static void >-FreeContext(JSContext *cx); >+#if defined(XP_UNIX) >+#if defined(SOLARIS) >+#include <thread.h> >+#else >+#include <pthread.h> >+#endif >+#endif Nit: use #ifdef and add blanks between # and if to indicate indentation level: >+#ifdef XP_UNIX >+# ifdef SOLARIS >+# include <thread.h> >+# else >+# include <pthread.h> >+# endif >+#endif >+ >+#if defined(_WIN32) >+#include <windows.h> >+#endif Nit: add a blank between # and include as above. >+#if defined(WINCE) Nit: use #ifdef WINCE >+inline bool >+isPageWritable(void *page) >+{ >+ MEMORY_BASIC_INFORMATION memoryInformation; >+ jsuword result = VirtualQuery(page, &memoryInformation, sizeof(memoryInformation)); >+ >+ // return false on error, including ptr outside memory >+ if (result != sizeof(memoryInformation)) >+ return false; >+ >+ jsuword protect = memoryInformation.Protect & ~(PAGE_GUARD | PAGE_NOCACHE); >+ return protect == PAGE_READWRITE >+ || protect == PAGE_WRITECOPY >+ || protect == PAGE_EXECUTE_READWRITE >+ || protect == PAGE_EXECUTE_WRITECOPY; Nit: put || at the end of line, not at the start and make sure that protect is aligned in the same column. >+} >+ >+static void * >+getStackBase(void *previousFrame) >+{ >+ // find the address of this stack frame by taking the address of a local variable >+ bool isGrowingDownward; >+ void *thisFrame = (void *)(&isGrowingDownward); >+ >+ isGrowingDownward = previousFrame < &thisFrame; We do not need the runtime stack growth detection as JS_STACK_GROWTH_DIRECTION compile-time constant from jscpucfg.h provides the answer. >+ static jsuword pageSize = 0; >+ if (!pageSize) { >+ SYSTEM_INFO systemInfo; >+ GetSystemInfo(&systemInfo); >+ pageSize = systemInfo.dwPageSize; >+ } >+ >+ // scan all of memory starting from this frame, and return the last writeable page found ... >+ // guaranteed to complete because isPageWritable returns false at end of memory Nit: use /* */ comments here and everywhere. > > static void > InitThreadData(JSThreadData *data) > { > #ifdef DEBUG > /* The data must be already zeroed. */ > for (size_t i = 0; i != sizeof(*data); ++i) > JS_ASSERT(reinterpret_cast<uint8*>(data)[i] == 0); > #endif > #ifdef JS_TRACER > js_InitJIT(&data->traceMonitor); > #endif > js_InitRandom(data); >+ data->stackBase = (jsuword *)CurrentThreadStackBase(); Nit: blank after the cast () operator. >+static void >+FreeContext(JSContext *cx); Why is this pre-declarais here? > void > js_WaitForGC(JSRuntime *rt) > { > JS_ASSERT_IF(rt->gcRunning, rt->gcLevel > 0); > if (rt->gcRunning && rt->gcThread->id != js_CurrentThreadId()) { >+ GC_SAFE_POINT(js_CurrentThreadData(rt)); This is wrong as js_WaitForGC is called from js_CurrentThread when the JSThread for the current thread is not yet initialized. >@@ -933,16 +1055,18 @@ js_DiscountRequestsForGC(JSContext *cx) > JS_NOTIFY_REQUEST_DONE(rt); > } > return requestDebit; > } > > void > js_RecountRequestsAfterGC(JSRuntime *rt, uint32 requestDebit) > { >+ GC_SAFE_POINT(js_CurrentThreadData(rt)); >+ > while (rt->gcLevel > 0) { To avoid useless GC_SAFE_POINT write this as if (rt->gcLevel > 0) { GC_SAFE_POINT(js_CurrentThreadData(rt)); do {.. } while } >@@ -679,16 +679,28 @@ NewGCArena(JSContext *cx) >+ /* >+ * Remember the lowest and highest address in use for the heap. This >+ * is used to quickly disqualify pointers during stack walking. >+ */ >+ jsuword start = (jsuword)ARENA_INFO_TO_START(a); Nit: blank after (jsuword). >+template <typename Key, typename Value> >+class HashMap >+{ >+ struct Entry : public JSDHashEntryHdr >+ { >+ Key key; >+ Value value; >+ }; >+ >+ static JSBool >+ Match(JSDHashTable *table, const JSDHashEntryHdr *entry, const void *key) { >+ return ((const Entry*)entry)->key == *(Key *)key; >+ } >+ >+ static JSDHashNumber >+ Hash(JSDHashTable *table, const void *key) >+ { >+ return JSDHashNumber(*(Key *)key); >+ } >+ >+ static const JSDHashTableOps DHashOps; >+ >+ JSDHashTable *table; >+ >+ public: >+ HashMap<Key, Value>() { >+ init(); >+ } >+ >+ ~HashMap<Key, Value>() { >+ clear(); >+ } >+ >+ bool empty() { >+ return !table || !table->entryCount; >+ } >+ >+ void clear() { >+ if (table) { >+ JS_DHashTableDestroy(table); >+ table = NULL; >+ } >+ } >+ >+ bool reset() { >+ clear(); >+ return init(); >+ } >+ >+ bool init() { >+ table = JS_NewDHashTable(&DHashOps, NULL, sizeof(Entry), >+ JS_DHASH_DEFAULT_CAPACITY(32)); >+ return table != NULL; >+ } >+ >+ bool add(Key key, Value value) { >+ Entry *entry = (Entry *)JS_DHashTableOperate(table, &key, JS_DHASH_ADD); >+ if (!entry) >+ return false; >+ entry->key = key; >+ entry->value = value; >+ return true; >+ } >+ >+ void remove(Key key) { >+ (void) JS_DHashTableOperate(table, &key, JS_DHASH_REMOVE); >+ } >+ >+ bool contains(Key key) { >+ return JS_DHASH_ENTRY_IS_BUSY(JS_DHashTableOperate(table, &key, JS_DHASH_LOOKUP)); >+ } >+ >+ bool lookup(Key key, Value& value) { >+ Entry *entry = (Entry*)JS_DHashTableOperate(table, &key, JS_DHASH_LOOKUP); >+ if (entry->key != key) >+ return false; >+ value = entry->value; >+ return true; >+ } >+}; >+ >+template <typename Key, typename Value> >+const JSDHashTableOps HashMap<Key, Value>::DHashOps = >+{ >+ JS_DHashAllocTable, >+ JS_DHashFreeTable, >+ HashMap<Key, Value>::Hash, >+ HashMap<Key, Value>::Match, >+ JS_DHashMoveEntryStub, >+ JS_DHashClearEntryStub, >+ JS_DHashFinalizeStub, >+ NULL >+}; >+ >+template <typename Key> >+class HashSet : public HashMap<Key, int> >+{ >+ public: >+ bool add(Key key) { >+ return HashMap<Key, int>::add(key, 0); >+ } >+}; Nit: move all hash templates to jshash.h. >+ >+#define RETURN(x) do { METER(rt->gcStats.conservative.x++); return; } while (0) >+ >+static bool >+ConfirmArena(JSRuntime *rt, JSGCArenaInfo *arena, int& kind) >+{ >+ for (unsigned i = 0; i != FINALIZE_LIMIT; ++i) { >+ JSGCArenaList *arenaList = &rt->gcArenaList[i]; >+ for (JSGCArenaInfo *a = arenaList->head; a; a = a->prev) { >+ if (a == arena) { >+ kind = i; >+ return true; >+ } >+ } >+ } >+ for (JSGCArenaInfo *a = rt->gcDoubleArenaList.head; a; a = a->prev) { >+ if (arena == a) { >+ kind = -1; >+ return true; >+ } >+ } >+ return false; >+} >+ >+static void >+js_MarkConservatively(JSTracer *trc, HashMap<JSGCArenaInfo *, int>& arenas, jsuword w) >+{ >+ JSRuntime *rt = trc->context->runtime; >+ >+ METER(rt->gcStats.conservative.scanned++); >+ >+ /* If this is an odd addresses or a tagged integer, skip it. */ >+ if (w & 1) >+ RETURN(oddaddress); >+ >+ /* Strip off the tag bits. */ >+ jsuword tag = w & JSVAL_TAGMASK; >+ jsuword p = w & ~(JSVAL_TAGMASK); >+ >+ /* Also ignore tagged special values (they never contain pointers). */ >+ if (tag == JSVAL_SPECIAL) >+ RETURN(wrongtag); We should add DEBUG warnings checking those assumptions. But that can wait. >+ >+ JSGCThing *thing; >+ jsuword start = ARENA_INFO_TO_START(arena); >+ if (kind == -1) { /* doubles */ >+ if (tag && tag != JSVAL_DOUBLE) >+ RETURN(wrongtag); >+ JS_ASSERT(!(p & 7)); Nit: replace this assert a static assert that JSVAL_TAGMASK == 7 == sizeof(jsdouble) - 1. >+ if (p - start >= DOUBLES_PER_ARENA * sizeof(jsdouble)) >+ RETURN(notarena); >+ thing = (JSGCThing *)p; >+ } else { >+ if (tag == JSVAL_DOUBLE) >+ RETURN(wrongtag); JSXML contains few uint16 fields. Thus in principle the compiler can store 2-byte aligned pointer to that field. Thus this filter is should be applied after we know that this is not JSXML. >+ jsuword offset = p - start; >+ size_t thingSize = arena->list->thingSize; >+ p = (start + offset - (offset % thingSize)); Nit: the last line is just p -= offset % thingSize; >+ size_t limit = THINGS_PER_ARENA(thingSize) * thingSize; >+ if (offset >= limit) >+ RETURN(notarena); >+ thing = (JSGCThing *)p; >+ /* Make sure the thing is not on the freelist of the arena. */ Nit: blank after the cast and a blank line before the comment. >+ METER(rt->gcStats.conservative.marked++); >+ if (!tag) >+ METER(rt->gcStats.conservative.raw++); Nit: use METER_IF(!tag, rt->gcStats.conservative.raw++) >+ >+ /* We have now a valid pointer, that is either raw or tagged properly. */ >+ uint32 traceKind = js_GetGCThingTraceKind(thing); >+ JS_CALL_TRACER(trc, thing, traceKind, "machine stack"); >+} >+ >+void >+js_MarkConservatively(JSTracer *trc, jsuword *begin, jsuword *end) { >+ JSRuntime *rt = trc->context->runtime; >+ HashSet<jsuword> history; >+ HashMap<JSGCArenaInfo *, int> arenas; >+ >+ bool error = false; >+ for (unsigned i = 0; i != FINALIZE_LIMIT; ++i) { >+ JSGCArenaList *arenaList = &rt->gcArenaList[i]; >+ for (JSGCArenaInfo *a = arenaList->head; a; a = a->prev) >+ if (!arenas.add(a, i)) >+ error = true; >+ } >+ for (JSGCArenaInfo *a = rt->gcDoubleArenaList.head; a; a = a->prev) >+ if (!arenas.add(a, -1)) >+ error = true; >+ if (error) >+ arenas.clear(); >+ >+ while (begin < end) { >+ jsuword p = *begin++; >+ if (history.contains(p)) { >+ METER(trc->context->runtime->gcStats.conservative.cached++); >+ continue; >+ } >+ js_MarkConservatively(trc, arenas, p); >+ /* >+ * If adding the address to the hash table fails because we are out of >+ * memory, stack scanning slows down but is otherwise unaffected. >+ */ Nit: a blank line before comments. >diff --git a/js/src/jsgc.h b/js/src/jsgc.h >+/* >+ * Force all non-volatile registers onto the stack. As long the RegisterBarrier >+ * object is on the stack, no values will be invisible if the stack is scanned >+ * for pointers. >+ */ >+class RegisterBarrier >+{ Nit: add JS prefix for the class.
Thanks for the review. Gregor saw a regression, I will fix the nits and then put you up for review once we fixed that last regression. > GC_SAFE_POINT should be moved right before the while loop so it will called > only when the thread is really waits for GC. Also, can we avoid macro and just > declare the variable with a constructor doing all the necessary job. To RegisterBarrier must be scoped into the calling scope, so I don't think we can avoid a macro here without defining the RegisterBarrier manually by end at every call site. > Nit: move all hash templates to jshash.h. I think Luke is going to provide us a proper fast hash map/set implementation soon. I will check with him. This is meant as a stop-gap measure only. > JSXML contains few uint16 fields. Thus in principle the compiler can store > 2-byte aligned pointer to that field. Thus this filter is should be applied > after we know that this is not JSXML. I will just make the 16-bit fields 32-bit. > Nit: add JS prefix for the class. Why? Thanks for the review. A new version will be ready soon.
Depends on: 525527
(In reply to comment #86) > > GC_SAFE_POINT should be moved right before the while loop so it will called > > only when the thread is really waits for GC. Also, can we avoid macro and just > > declare the variable with a constructor doing all the necessary job. To > > RegisterBarrier must be scoped into the calling scope, so I don't think we can > avoid a macro here without defining the RegisterBarrier manually by end at > every call site. I prefer an explicit GCSafePoint point(threadData); versus a macro that pollutes the scope. Plus there is an extra bug in the last patch - it should not scan threads that are not waiting. One way to ensure that is to zero stackTop in the GCSafePoint destructor. But the care must be done so stackTop is set/cleared inside the GC lock so its access is always serialized. > > > Nit: move all hash templates to jshash.h. > > I think Luke is going to provide us a proper fast hash map/set implementation > soon. I will check with him. This is meant as a stop-gap measure only. > > > JSXML contains few uint16 fields. Thus in principle the compiler can store > > 2-byte aligned pointer to that field. Thus this filter is should be applied > > after we know that this is not JSXML. > > I will just make the 16-bit fields 32-bit. > > > Nit: add JS prefix for the class. > > Why? > > Thanks for the review. A new version will be ready soon.
(In reply to comment #87) > I prefer an explicit GCSafePoint point(threadData); versus a macro that > pollutes the scope. > > Plus there is an extra bug in the last patch - it should not scan threads that > are not waiting. One way to ensure that is to zero stackTop in the GCSafePoint > destructor. But the care must be done so stackTop is set/cleared inside the GC > lock so its access is always serialized. I meant here to scan only threads that have non-zero stackTop and then make sure that transition from zero to non-zero and back is done inside the GC lock. Another observation is that few calls to js_WaitForGC are done from places where the caller is outside JS request when the native stack must not contain any GC things, but that can be optimized later.
Attached patch patch (obsolete) (deleted) — Splinter Review
Nits addressed. Still have to identify the source of the win32 crash.
Attachment #409277 - Attachment is obsolete: true
The stack trace from visual studio for the windows bug: > xpcom_core.dll!Break(const char * aMsg=0x001ddf88) Line 489 C++ xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=1, const char * aStr=0x66125298, const char * aExpr=0x66125290, const char * aFile=0x66125240, int aLine=786) Line 354 + 0xc bytes C++ xpc3250.dll!DEBUG_CheckForComponentsInScope(JSContext * cx=0x0a2d6010, JSObject * obj=0x05da45a0, int OKIfNotInitialized=0, XPCJSRuntime * runtime=0x00476e88) Line 786 + 0x1c bytes C++ xpc3250.dll!XPCWrappedNativeScope::FindInJSObjectScope(JSContext * cx=0x0a2d6010, JSObject * obj=0x05da45a0, int OKIfNotInitialized=0, XPCJSRuntime * runtime=0x00476e88) Line 842 + 0x15 bytes C++ xpc3250.dll!XPCConvert::NativeInterface2JSObject(XPCLazyCallContext & lccx={...}, int * d=0x001de810, nsIXPConnectJSObjectHolder * * dest=0x00000000, nsISupports * src=0x0b261034, const nsID * iid=0x00000000, XPCNativeInterface * * Interface=0x00000000, nsWrapperCache * cache=0x00000000, JSObject * scope=0x02947ba0, int allowNativeWrapper=0, int isGlobal=0, unsigned int * pErr=0x00000000) Line 1111 + 0x11 bytes C++ xpc3250.dll!XPCConvert::NativeInterface2JSObject(XPCCallContext & ccx={...}, int * d=0x001de810, nsIXPConnectJSObjectHolder * * dest=0x00000000, nsISupports * src=0x0b261034, const nsID * iid=0x00000000, XPCNativeInterface * * Interface=0x00000000, nsWrapperCache * cache=0x00000000, JSObject * scope=0x02947ba0, int allowNativeWrapper=0, int isGlobal=0, unsigned int * pErr=0x00000000) Line 3037 + 0x34 bytes C++ xpc3250.dll!nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS * wrapper=0x06387f40, unsigned short methodIndex=3, const XPTMethodDescriptor * info=0x0338e9a8, nsXPTCMiniVariant * nativeParams=0x001de9dc) Line 1420 + 0x39 bytes C++ xpc3250.dll!nsXPCWrappedJS::CallMethod(unsigned short methodIndex=3, const XPTMethodDescriptor * info=0x0338e9a8, nsXPTCMiniVariant * params=0x001de9dc) Line 571 C++ xpcom_core.dll!PrepareAndDispatch(nsXPTCStubBase * self=0x06ec66d8, unsigned int methodIndex=3, unsigned int * args=0x001dea9c, unsigned int * stackBytesToPop=0x001dea8c) Line 114 + 0x21 bytes C++ xpcom_core.dll!SharedStub() Line 142 C++ gklayout.dll!nsDOMEventListenerWrapper::HandleEvent(nsIDOMEvent * aEvent=0x04405028) Line 66 C++ gklayout.dll!nsEventListenerManager::HandleEventSubType(nsListenerStruct * aListenerStruct=0x0a7b6a68, nsIDOMEventListener * aListener=0x09e07dc0, nsIDOMEvent * aDOMEvent=0x04405028, nsPIDOMEventTarget * aCurrentTarget=0x0b261030, unsigned int aPhaseFlags=6, nsCxPusher * aPusher=0x001deddc) Line 1059 + 0x12 bytes C++ gklayout.dll!nsEventListenerManager::HandleEvent(nsPresContext * aPresContext=0x00000000, nsEvent * aEvent=0x04e9a088, nsIDOMEvent * * aDOMEvent=0x001dedcc, nsPIDOMEventTarget * aCurrentTarget=0x0b261030, unsigned int aFlags=6, nsEventStatus * aEventStatus=0x001dedd0, nsCxPusher * aPusher=0x001deddc) Line 1177 C++ gklayout.dll!nsEventTargetChainItem::HandleEvent(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=6, int aMayHaveNewListenerManagers=1, nsCxPusher * aPusher=0x001deddc) Line 252 C++ gklayout.dll!nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=6, nsDispatchingCallback * aCallback=0x00000000, int aMayHaveNewListenerManagers=1, nsCxPusher * aPusher=0x001deddc) Line 319 C++ gklayout.dll!nsEventDispatcher::Dispatch(nsISupports * aTarget=0x0b261030, nsPresContext * aPresContext=0x00000000, nsEvent * aEvent=0x04e9a088, nsIDOMEvent * aDOMEvent=0x04405028, nsEventStatus * aEventStatus=0x00000000, nsDispatchingCallback * aCallback=0x00000000, nsCOMArray<nsPIDOMEventTarget> * aTargets=0x00000000) Line 584 + 0x1e bytes C++ gklayout.dll!nsEventDispatcher::DispatchDOMEvent(nsISupports * aTarget=0x0b261030, nsEvent * aEvent=0x00000000, nsIDOMEvent * aDOMEvent=0x04405028, nsPresContext * aPresContext=0x00000000, nsEventStatus * aEventStatus=0x00000000) Line 647 + 0x1d bytes C++ gklayout.dll!nsDOMEventTargetHelper::DispatchDOMEvent(nsEvent * aEvent=0x00000000, nsIDOMEvent * aDOMEvent=0x04405028, nsPresContext * aPresContext=0x00000000, nsEventStatus * aEventStatus=0x00000000) Line 218 + 0x19 bytes C++ gklayout.dll!nsXMLHttpRequest::ChangeState(unsigned int aState=4, int aBroadcast=1) Line 2937 C++ gklayout.dll!nsXMLHttpRequest::OnStartRequest(nsIRequest * request=0x08d6f1a0, nsISupports * ctxt=0x00000000) Line 1912 C++ gklayout.dll!nsCrossSiteListenerProxy::OnStartRequest(nsIRequest * aRequest=0x08d6f1a0, nsISupports * aContext=0x00000000) Line 177 C++ necko.dll!nsHttpChannel::CallOnStartRequest() Line 839 + 0x47 bytes C++ necko.dll!nsHttpChannel::OnStartRequest(nsIRequest * request=0x09196bf8, nsISupports * ctxt=0x00000000) Line 5165 C++ necko.dll!nsInputStreamPump::OnStateStart() Line 439 + 0x2c bytes C++ necko.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream=0x0b503808) Line 395 + 0xb bytes C++ xpcom_core.dll!nsInputStreamReadyEvent::Run() Line 113 C++ xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x001df228) Line 527 + 0x19 bytes C++ xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x003d5860, int mayWait=1) Line 239 + 0x16 bytes C++ gkwidget.dll!nsBaseAppShell::Run() Line 170 + 0xc bytes C++ toolkitcomps.dll!nsAppStartup::Run() Line 182 + 0x1c bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x003d17d0, const nsXREAppData * aAppData=0x003d1e80) Line 3471 + 0x25 bytes C++ firefox.exe!NS_internal_main(int argc=1, char * * argv=0x003d17d0) Line 156 + 0x12 bytes C++ firefox.exe!wmain(int argc=1, wchar_t * * argv=0x01f0a738) Line 110 + 0xd bytes C++ firefox.exe!__tmainCRTStartup() Line 579 + 0x19 bytes C firefox.exe!wmainCRTStartup() Line 399 C kernel32.dll!75323677() [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] ntdll.dll!776c9d72() ntdll.dll!776c9d45()
Nasty. No clue here. Blake? Help?
a lot more visual studio output from the Method: xpc3250.dll!XPCWrappedNativeScope::FindInJSObjectScope(JSContext * cx=0x0a2d6010, JSObject * obj=0x05da45a0, int OKIfNotInitialized=0, XPCJSRuntime * runtime=0x00476e88) Line 842 + 0x15 bytes C++ OKIfNotInitialized 0 int - cx 0x0a2d6010 {operationCallbackFlag=0 link={...} xmlSettingFlags=0 ...} JSContext * operationCallbackFlag 0 volatile long + link {next=0x085c44dc prev=0x09ec023c } JSCListStr xmlSettingFlags 0 unsigned char padding 0 unsigned char + display 0x0a2d6020 JSStackFrame * [16] version 8192 unsigned short options 3080 unsigned long + localeCallbacks 0x654df540 localeCallbacks {localeToUpperCase=0x64b40400 localeToLowerCase=0x64b404f0 localeCompare=0x64b40510 ...} JSLocaleCallbacks * + resolvingTable 0x044d5638 {ops=0x66540ef8 data=0x00000000 hashShift=28 ...} JSDHashTable * generatingError 0 unsigned char insideGCMarkCallback 0 unsigned char throwing 0 unsigned char exception 22 int stackLimit 1436896 unsigned int scriptStackQuota 104837122 unsigned int + runtime 0x00478c70 {state=JSRTS_UP cxCallback=0x660167f0 protoHazardShape=340526 ...} JSRuntime * const + stackPool {first={...} current=0x0915b310 arenasize=8192 ...} JSArenaPool + fp 0x00000000 {regs=??? imacpc=??? slots=??? ...} JSStackFrame * + tempPool {first={...} current=0x0a2d60a8 arenasize=1024 ...} JSArenaPool + globalObject 0x05dbdb20 {map=0x04d2bf08 classword=81085661 fslots=0x05dbdb28 ...} JSObject * + weakRoots {finalizableNewborns=0x0a2d60cc newbornDouble=0x00000000 lastAtom=0 ...} JSWeakRoots + regExpStatics {input=0x00000000 multiline=0 parenCount=0 ...} JSRegExpStatics + sharpObjectMap {depth=0 sharpgen=0 table=0x00000000 } JSSharpObjectMap + busyArrayTable 0x0852e458 {buckets=0x083f5048 nentries=0 shift=28 ...} JSHashTable * + argumentFormatMap 0x0aa0d7b8 {format=0x660fce6c "%ip" length=3 formatter=0x65fd1749 ...} JSArgumentFormatMap * + lastMessage 0x0874b300 "TypeError: this.ht is null" char * tracefp 0x00000000 void * + tracePrevPc 0x00000000 <Bad Ptr> unsigned char * errorReporter 0x64b3df50 NS_ScriptErrorReporter(JSContext *, const char *, JSErrorReport *) void (JSContext *, const char *, JSErrorReport *)* operationCallback 0x64b3eb70 nsJSContext::DOMOperationCallback(JSContext *) int (JSContext *)* interpLevel 0 unsigned int data 0x083f4fc8 void * data2 0x0aa0d750 void * + dormantFrameChain 0x00000000 {regs=??? imacpc=??? slots=??? ...} JSStackFrame * + thread 0x0043d8a0 {contextList={...} id=32515400 titleToShare=0x00000000 ...} JSThread * requestDepth 2 long outstandingRequests 2 long + lockedSealedTitle 0x00000000 {ownercx=??? lock={...} u={...} } JSTitle * + threadLinks {next=0x085c4690 prev=0x09ec03f0 } JSCListStr + stackHeaders 0x00000000 {nslots=??? down=??? } JSStackHeader * + localRootStack 0x00000000 {scopeMark=??? rootCount=??? topChunk=??? ...} JSLocalRootStack * + tempValueRooters 0x00000000 {down=??? count=??? u={...} } JSTempValueRooter * + debugHooks 0x00478e08 {interruptHandler=0x00000000 interruptHandlerData=0x00000000 newScriptHook=0x00000000 ...} JSDebugHooks * + securityCallbacks 0x00000000 {checkObjectAccess=??? principalsTranscoder=??? findObjectPrincipals=??? } JSSecurityCallbacks * + regexpPool {first={...} current=0x09cd2740 arenasize=12248 ...} JSArenaPool resolveFlags 0 unsigned int interpState 0x00000000 InterpState * bailExit 0x00000000 VMSideExit * - found 0x06384178 {mRuntime=0x00476e88 mWrappedNativeMap=0x04e960b8 mWrappedNativeProtoMap=0x043d89b8 ...} XPCWrappedNativeScope * + PRCListStr {next=0x06efe7e4 prev=0x0aa0d770 } PRCListStr + __vfptr 0x661259dc const XPCWrappedNativeScope::`vftable' * + gScopes 0x089cf2b0 {mRuntime=0x00476e88 mWrappedNativeMap=0x0891f1e8 mWrappedNativeProtoMap=0x08c924b0 ...} XPCWrappedNativeScope * + gDyingScopes 0x00000000 {mRuntime=??? mWrappedNativeMap=??? mWrappedNativeProtoMap=??? ...} XPCWrappedNativeScope * + mRuntime 0x00476e88 {mStrIDs=0x00476e88 mStrJSVals=0x00476ed0 mXPConnect=0x004750f8 ...} XPCJSRuntime * + mWrappedNativeMap 0x04e960b8 {mTable=0x063841f8 } Native2WrappedNativeMap * + mWrappedNativeProtoMap 0x043d89b8 {mTable=0x06384258 } ClassInfo2WrappedNativeProtoMap * + mMainThreadWrappedNativeProtoMap 0x0875e028 {mTable=0x0848db98 } ClassInfo2WrappedNativeProtoMap * + mWrapperMap 0x063282d0 {mTable=0x0848dbf8 } WrappedNative2WrapperMap * + mComponents 0x06139a38 {mRefCnt={...} _mOwningThread={...} mInterfaces=0x00000000 ...} nsXPCComponents * + mNext 0x06438db8 {mRuntime=0x00476e88 mWrappedNativeMap=0x04789e60 mWrappedNativeProtoMap=0x06310270 ...} XPCWrappedNativeScope * + mGlobalJSObject 0x05da45a0 {map=0x0a91f380 classword=81085661 fslots=0x05da45a8 ...} JSObject * + mPrototypeJSObject 0x05da45c0 {map=0x087d1c00 classword=1717397425 fslots=0x05da45c8 ...} JSObject * + mPrototypeJSFunction 0x05f9b1c0 {map=0x04210820 classword=1717394737 fslots=0x05f9b1c8 ...} JSObject * + mPrototypeNoHelper 0x00000000 {map=??? classword=??? fslots=0x00000008 ...} JSObject * + mContext 0x0aa0d750 {mRuntime=0x00476e88 mJSContext=0x0a2d6010 mLastResult=0 ...} XPCContext * + mScriptObjectPrincipal 0x08859028 {mRefCnt={...} _mOwningThread={...} mIsFrozen=0 ...} nsIScriptObjectPrincipal * - obj 0x05da45a0 {map=0x0a91f380 classword=81085661 fslots=0x05da45a8 ...} JSObject * - map 0x0a91f380 {ops=0x661523a0 shape=340549 } JSObjectMap * - ops 0x661523a0 XPC_WN_NoCall_JSOps {objectMap=0x00000000 lookupProperty=0x66362d60 defineProperty=0x66361802 ...} const JSObjectOps * const + objectMap 0x00000000 {ops=??? shape=??? } const JSObjectMap * lookupProperty 0x66362d60 _js_LookupProperty int (JSContext *, JSObject *, int, JSObject * *, JSProperty * *)* defineProperty 0x66361802 _js_DefineProperty int (JSContext *, JSObject *, int, int, int (JSContext *, JSObject *, int, int *)*, int (JSContext *, JSObject *, int, int *)*, unsigned int)* getProperty 0x66361645 _js_GetProperty int (JSContext *, JSObject *, int, int *)* setProperty 0x66362d6f _js_SetProperty int (JSContext *, JSObject *, int, int *)* getAttributes 0x663642eb _js_GetAttributes int (JSContext *, JSObject *, int, JSProperty *, unsigned int *)* setAttributes 0x66361037 _js_SetAttributes int (JSContext *, JSObject *, int, JSProperty *, unsigned int *)* deleteProperty 0x6636152d _js_DeleteProperty int (JSContext *, JSObject *, int, int *)* defaultValue 0x66362338 _js_DefaultValue int (JSContext *, JSObject *, JSType, int *)* enumerate 0x6603e2f0 XPC_WN_JSOp_Enumerate(JSContext *, JSObject *, JSIterateOp, int *, int *) int (JSContext *, JSObject *, JSIterateOp, int *, int *)* checkAccess 0x66362c9d _js_CheckAccess int (JSContext *, JSObject *, int, JSAccessMode, int *, unsigned int *)* thisObject 0x6603efa0 XPC_WN_JSOp_ThisObject(JSContext *, JSObject *) JSObject * (JSContext *, JSObject *)* dropProperty 0x6636326f js_DropProperty(struct JSContext *,struct JSObject *,struct JSProperty *) void (JSContext *, JSObject *, JSProperty *)* call 0x00000000 int (JSContext *, JSObject *, unsigned int, int *, int *)* construct 0x00000000 int (JSContext *, JSObject *, unsigned int, int *, int *)* hasInstance 0x66362013 _js_HasInstance int (JSContext *, JSObject *, int, int *)* trace 0x66361564 _js_TraceObject void (JSTracer *, JSObject *)* clear 0x6603ef00 XPC_WN_JSOp_Clear(JSContext *, JSObject *) void (JSContext *, JSObject *)* shape 340549 unsigned long classword 81085661 unsigned int - fslots 0x05da45a8 int [5] [0] 98190944 int [1] 0 int [2] 103353944 int [3] 22 int [4] 22 int - dslots 0x0b9b8e14 int * 22 int + runtime 0x00476e88 {mStrIDs=0x00476e88 mStrJSVals=0x00476ed0 mXPConnect=0x004750f8 ...} XPCJSRuntime *
the crash happens (not every time) when I log out of gmail.
Oh and the assertion: + aMsg 0x001ddf88 "###!!! ASSERTION: XPConnect is being called on a scope without a 'Components' property!: 'Error', file c:/Users/Gregor/code/tracemonkey/js/src/xpconnect/src/xpcwrappednativescope.cpp, line 786" const char *
Gary, can you fuzz a windows shell?
Can we get a stack traces of the last GC that happens before the crash or stack traces of the each GC?
I was wrong that scanning only threads that calls JS_AWAIT_GC_DONE is enough. The problem comes from threads that use JS_SuspendRequest. Such threads are effectively outside their requests and can run freely even if their stack must be scanned.
The crash could be due to interference of the conservative scanner with the cycle collector or refrence counting. IIRC there are few XPCOM things that require at least 2 cycles of JS GC to be released as their finalization releases pointer to GC things that were counted as alive leading to the need in the second cycle. The conservative scanner may accidentally mark otherwise unreachable things that the code assumes should be only subject of finalization not marking.
Include previous gc stack traces for the crash... > mozjs.dll!js_MarkConservatively(JSTracer * trc=0x003fb2f8, unsigned int * begin=0x003fb240, unsigned int * end=0x00400000) Line 1124 C++ mozjs.dll!JSThreadData::mark(JSTracer * trc=0x003fb2f8) Line 343 + 0x1d bytes C++ mozjs.dll!thread_tracer(JSDHashTable * table=0x00cd90f8, JSDHashEntryHdr * hdr=0x00c9c5a0, unsigned long __formal=0, void * arg=0x003fb2f8) Line 417 C++ mozjs.dll!JS_DHashTableEnumerate(JSDHashTable * table=0x00cd90f8, JSDHashOperator (JSDHashTable *, JSDHashEntryHdr *, unsigned long, void *)* etor=0x6f4f73a0, void * arg=0x003fb2f8) Line 743 + 0x19 bytes C++ mozjs.dll!js_TraceThreads(JSRuntime * rt=0x00cd8ed0, JSTracer * trc=0x003fb2f8) Line 479 + 0x18 bytes C++ mozjs.dll!js_TraceRuntime(JSTracer * trc=0x003fb2f8, int allAtoms=0) Line 2856 + 0xd bytes C++ mozjs.dll!js_GC(JSContext * cx=0x046d2c00, JSGCInvocationKind gckind=GC_NORMAL) Line 3433 + 0xd bytes C++ mozjs.dll!JS_GC(JSContext * cx=0x046d2c00) Line 2451 + 0xb bytes C++ xpc3250.dll!nsXPConnect::Collect() Line 477 + 0xa bytes C++ xpcom_core.dll!nsCycleCollector::Collect(unsigned int aTryCollections=1) Line 2507 + 0x19 bytes C++ xpcom_core.dll!nsCycleCollector_collect() Line 3202 + 0x16 bytes C++ gklayout.dll!nsJSContext::CC() Line 3514 + 0x6 bytes C++ gklayout.dll!nsJSContext::IntervalCC() Line 3603 C++ gklayout.dll!nsJSContext::MaybeCC(int aHigherProbability=1) Line 3580 + 0x5 bytes C++ gklayout.dll!nsJSContext::CCIfUserInactive() Line 3590 + 0x7 bytes C++ gklayout.dll!GCTimerFired(nsITimer * aTimer=0x0961cc38, void * aClosure=0x00000000) Line 3629 C++ xpcom_core.dll!nsTimerImpl::Fire() Line 427 + 0xe bytes C++ xpcom_core.dll!nsTimerEvent::Run() Line 521 C++ xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x003ff3b8) Line 527 + 0x19 bytes C++ xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x00c35860, int mayWait=1) Line 239 + 0x16 bytes C++ gkwidget.dll!nsBaseAppShell::Run() Line 170 + 0xc bytes C++ toolkitcomps.dll!nsAppStartup::Run() Line 182 + 0x1c bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x00c317d0, const nsXREAppData * aAppData=0x00c31e80) Line 3471 + 0x25 bytes C++ firefox.exe!NS_internal_main(int argc=1, char * * argv=0x00c317d0) Line 156 + 0x12 bytes C++ firefox.exe!wmain(int argc=1, wchar_t * * argv=0x004fa738) Line 110 + 0xd bytes C++ firefox.exe!__tmainCRTStartup() Line 579 + 0x19 bytes C firefox.exe!wmainCRTStartup() Line 399 C > mozjs.dll!js_MarkConservatively(JSTracer * trc=0x003fb2f8, unsigned int * begin=0x003fb240, unsigned int * end=0x00400000) Line 1124 C++ mozjs.dll!JSThreadData::mark(JSTracer * trc=0x003fb2f8) Line 343 + 0x1d bytes C++ mozjs.dll!thread_tracer(JSDHashTable * table=0x00cd90f8, JSDHashEntryHdr * hdr=0x00c9c5a0, unsigned long __formal=0, void * arg=0x003fb2f8) Line 417 C++ mozjs.dll!JS_DHashTableEnumerate(JSDHashTable * table=0x00cd90f8, JSDHashOperator (JSDHashTable *, JSDHashEntryHdr *, unsigned long, void *)* etor=0x6f4f73a0, void * arg=0x003fb2f8) Line 743 + 0x19 bytes C++ mozjs.dll!js_TraceThreads(JSRuntime * rt=0x00cd8ed0, JSTracer * trc=0x003fb2f8) Line 479 + 0x18 bytes C++ mozjs.dll!js_TraceRuntime(JSTracer * trc=0x003fb2f8, int allAtoms=0) Line 2856 + 0xd bytes C++ mozjs.dll!js_GC(JSContext * cx=0x046d2c00, JSGCInvocationKind gckind=GC_NORMAL) Line 3433 + 0xd bytes C++ mozjs.dll!JS_GC(JSContext * cx=0x046d2c00) Line 2451 + 0xb bytes C++ xpc3250.dll!nsXPConnect::Collect() Line 477 + 0xa bytes C++ xpcom_core.dll!nsCycleCollector::Collect(unsigned int aTryCollections=1) Line 2507 + 0x19 bytes C++ xpcom_core.dll!nsCycleCollector_collect() Line 3202 + 0x16 bytes C++ gklayout.dll!nsJSContext::CC() Line 3514 + 0x6 bytes C++ gklayout.dll!nsJSContext::IntervalCC() Line 3603 C++ gklayout.dll!nsJSContext::MaybeCC(int aHigherProbability=1) Line 3580 + 0x5 bytes C++ gklayout.dll!nsJSContext::CCIfUserInactive() Line 3590 + 0x7 bytes C++ gklayout.dll!GCTimerFired(nsITimer * aTimer=0x0961cc38, void * aClosure=0x00000000) Line 3629 C++ xpcom_core.dll!nsTimerImpl::Fire() Line 427 + 0xe bytes C++ xpcom_core.dll!nsTimerEvent::Run() Line 521 C++ xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x003ff3b8) Line 527 + 0x19 bytes C++ xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x00c35860, int mayWait=1) Line 239 + 0x16 bytes C++ gkwidget.dll!nsBaseAppShell::Run() Line 170 + 0xc bytes C++ toolkitcomps.dll!nsAppStartup::Run() Line 182 + 0x1c bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x00c317d0, const nsXREAppData * aAppData=0x00c31e80) Line 3471 + 0x25 bytes C++ firefox.exe!NS_internal_main(int argc=1, char * * argv=0x00c317d0) Line 156 + 0x12 bytes C++ firefox.exe!wmain(int argc=1, wchar_t * * argv=0x004fa738) Line 110 + 0xd bytes C++ firefox.exe!__tmainCRTStartup() Line 579 + 0x19 bytes C firefox.exe!wmainCRTStartup() Line 399 C + aMsg 0x217400fc <Bad Ptr> const char * > xpcom_core.dll!Break(const char * aMsg=0x003f9244) Line 489 C++ xpcom_core.dll!NS_DebugBreak_P(unsigned int aSeverity=1, const char * aStr=0x6f35b00c, const char * aExpr=0x6f35afe4, const char * aFile=0x6f35af98, int aLine=1114) Line 354 + 0xc bytes C++ xpc3250.dll!XPCWrappedNative::Init(XPCCallContext & ccx={...}, JSObject * parent=0x00783ca0, int isGlobal=0, const XPCNativeScriptableCreateInfo * sci=0x003f97b0) Line 1114 + 0x2a bytes C++ xpc3250.dll!XPCWrappedNative::GetNewOrUsed(XPCCallContext & ccx={...}, nsISupports * Object=0x090fe2d8, XPCWrappedNativeScope * Scope=0x046543b8, XPCNativeInterface * Interface=0x00000000, nsWrapperCache * cache=0x00000000, int isGlobal=0, XPCWrappedNative * * resultWrapper=0x003f9928) Line 571 + 0x1f bytes C++ xpc3250.dll!XPCConvert::NativeInterface2JSObject(XPCLazyCallContext & lccx={...}, int * d=0x003f9b80, nsIXPConnectJSObjectHolder * * dest=0x003f9b78, nsISupports * src=0x090fe2d8, const nsID * iid=0x00000000, XPCNativeInterface * * Interface=0x00000000, nsWrapperCache * cache=0x00000000, JSObject * scope=0x00783ca0, int allowNativeWrapper=0, int isGlobal=0, unsigned int * pErr=0x003f9980) Line 1202 + 0x3d bytes C++ xpc3250.dll!NativeInterface2JSObject(XPCLazyCallContext & lccx={...}, JSObject * aScope=0x00783ca0, nsISupports * aCOMObj=0x090fe2d8, const nsID * aIID=0x00000000, int aAllowWrapping=0, int * aVal=0x003f9b80, nsIXPConnectJSObjectHolder * * aHolder=0x003f9b78) Line 1226 + 0x2b bytes C++ xpc3250.dll!nsXPConnect::WrapNativeToJSVal(JSContext * aJSContext=0x04648708, JSObject * aScope=0x00783ca0, nsISupports * aCOMObj=0x090fe2d8, const nsID * aIID=0x00000000, int aAllowWrapping=0, int * aVal=0x003f9b80, nsIXPConnectJSObjectHolder * * aHolder=0x003f9b78) Line 1280 + 0x24 bytes C++ gklayout.dll!nsContentUtils::WrapNative(JSContext * cx=0x04648708, JSObject * scope=0x00783ca0, nsISupports * native=0x090fe2d8, const nsID * aIID=0x00000000, int * vp=0x003f9b80, nsIXPConnectJSObjectHolder * * aHolder=0x003f9b78, int aAllowWrapping=0) Line 5168 + 0x2f bytes C++ gklayout.dll!nsDOMClassInfo::WrapNative(JSContext * cx=0x04648708, JSObject * scope=0x00783ca0, nsISupports * native=0x090fe2d8, const nsID * aIID=0x00000000, int aAllowWrapping=0, int * vp=0x003f9b80, nsIXPConnectJSObjectHolder * * aHolder=0x003f9b78) Line 144 + 0x21 bytes C++ gklayout.dll!nsDOMClassInfo::WrapNative(JSContext * cx=0x04648708, JSObject * scope=0x00783ca0, nsISupports * native=0x090fe2d8, int aAllowWrapping=0, int * vp=0x003f9b80, nsIXPConnectJSObjectHolder * * aHolder=0x003f9b78) Line 164 + 0x1f bytes C++ gklayout.dll!nsNodeSH::PreCreate(nsISupports * nativeObj=0x090f2fa8, JSContext * cx=0x04648708, JSObject * globalObj=0x00783ca0, JSObject * * parentObj=0x003f9c04) Line 7216 + 0x2f bytes C++ xpc3250.dll!ConstructSlimWrapper(XPCCallContext & ccx={...}, nsISupports * p=0x090f3074, nsWrapperCache * cache=0x090f2fac, XPCWrappedNativeScope * xpcScope=0x046543b8, int * rval=0x003f9d10) Line 3815 + 0x3c bytes C++ xpc3250.dll!XPCConvert::NativeInterface2JSObject(XPCLazyCallContext & lccx={...}, int * d=0x003fa548, nsIXPConnectJSObjectHolder * * dest=0x00000000, nsISupports * src=0x090f3074, const nsID * iid=0x6f379fd0, XPCNativeInterface * * Interface=0x6f392484, nsWrapperCache * cache=0x090f2fac, JSObject * scope=0x05f16440, int allowNativeWrapper=1, int isGlobal=0, unsigned int * pErr=0x003f9d84) Line 1152 + 0x19 bytes C++ xpc3250.dll!xpc_qsXPCOMObjectToJsval(XPCLazyCallContext & lccx={...}, nsISupports * p=0x090f3074, nsWrapperCache * cache=0x00000000, const nsID * iid=0x6f379fd0, XPCNativeInterface * * iface=0x6f392484, int * rval=0x003fa548) Line 1082 + 0x30 bytes C++ xpc3250.dll!nsIDOMEvent_GetTarget(JSContext * cx=0x04648708, JSObject * obj=0x05f16440, int id=2422052, int * vp=0x003fa548) Line 10038 + 0x3b bytes C++ mozjs.dll!JSScopeProperty::get(JSContext * cx=0x04648708, JSObject * obj=0x05f16440, JSObject * pobj=0x033b15c0, int * vp=0x003fa548) Line 681 + 0x41 bytes C++ mozjs.dll!js_NativeGet(JSContext * cx=0x04648708, JSObject * obj=0x05f16440, JSObject * pobj=0x033b15c0, JSScopeProperty * sprop=0x06035cc8, unsigned int getHow=1, int * vp=0x003fa548) Line 4306 + 0x18 bytes C++ mozjs.dll!js_GetPropertyHelper(JSContext * cx=0x04648708, JSObject * obj=0x05f16440, int id=2422052, unsigned int getHow=1, int * vp=0x003fa548) Line 4487 + 0x1d bytes C++ mozjs.dll!js_Interpret(JSContext * cx=0x04648708) Line 1554 + 0x32 bytes C++ mozjs.dll!js_Invoke(JSContext * cx=0x04648708, unsigned int argc=1, int * vp=0x0a592bb8, unsigned int flags=0) Line 1383 + 0x9 bytes C++ xpc3250.dll!nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS * wrapper=0x060b4c18, unsigned short methodIndex=3, const XPTMethodDescriptor * info=0x0376edc8, nsXPTCMiniVariant * nativeParams=0x003faa14) Line 1696 + 0x1b bytes C++ xpc3250.dll!nsXPCWrappedJS::CallMethod(unsigned short methodIndex=3, const XPTMethodDescriptor * info=0x0376edc8, nsXPTCMiniVariant * params=0x003faa14) Line 571 C++ xpcom_core.dll!PrepareAndDispatch(nsXPTCStubBase * self=0x060b4b18, unsigned int methodIndex=3, unsigned int * args=0x003faad4, unsigned int * stackBytesToPop=0x003faac4) Line 114 + 0x21 bytes C++ xpcom_core.dll!SharedStub() Line 142 C++ gklayout.dll!nsEventListenerManager::HandleEventSubType(nsListenerStruct * aListenerStruct=0x0607c2b0, nsIDOMEventListener * aListener=0x060b4b18, nsIDOMEvent * aDOMEvent=0x062de108, nsPIDOMEventTarget * aCurrentTarget=0x04bdd9c0, unsigned int aPhaseFlags=6, nsCxPusher * aPusher=0x003fae00) Line 1059 + 0x12 bytes C++ gklayout.dll!nsEventListenerManager::HandleEvent(nsPresContext * aPresContext=0x00000000, nsEvent * aEvent=0x090de1e0, nsIDOMEvent * * aDOMEvent=0x003fadf0, nsPIDOMEventTarget * aCurrentTarget=0x04bdd9c0, unsigned int aFlags=6, nsEventStatus * aEventStatus=0x003fadf4, nsCxPusher * aPusher=0x003fae00) Line 1177 C++ gklayout.dll!nsEventTargetChainItem::HandleEvent(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=6, int aMayHaveNewListenerManagers=1, nsCxPusher * aPusher=0x003fae00) Line 252 C++ gklayout.dll!nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor & aVisitor={...}, unsigned int aFlags=6, nsDispatchingCallback * aCallback=0x00000000, int aMayHaveNewListenerManagers=1, nsCxPusher * aPusher=0x003fae00) Line 319 C++ gklayout.dll!nsEventDispatcher::Dispatch(nsISupports * aTarget=0x04bdd9c0, nsPresContext * aPresContext=0x00000000, nsEvent * aEvent=0x090de1e0, nsIDOMEvent * aDOMEvent=0x062de108, nsEventStatus * aEventStatus=0x003faf38, nsDispatchingCallback * aCallback=0x00000000, nsCOMArray<nsPIDOMEventTarget> * aTargets=0x00000000) Line 584 + 0x1e bytes C++ gklayout.dll!nsEventDispatcher::DispatchDOMEvent(nsISupports * aTarget=0x04bdd9c0, nsEvent * aEvent=0x00000000, nsIDOMEvent * aDOMEvent=0x062de108, nsPresContext * aPresContext=0x00000000, nsEventStatus * aEventStatus=0x003faf38) Line 647 + 0x1d bytes C++ gklayout.dll!nsGenericElement::DispatchDOMEvent(nsEvent * aEvent=0x00000000, nsIDOMEvent * aDOMEvent=0x062de108, nsPresContext * aPresContext=0x00000000, nsEventStatus * aEventStatus=0x003faf38) Line 2894 + 0x19 bytes C++ gklayout.dll!nsContentUtils::DispatchChromeEvent(nsIDocument * aDoc=0x090f2fa8, nsISupports * aTarget=0x090f2fa8, const nsAString_internal & aEventName={...}, int aCanBubble=1, int aCancelable=1, int * aDefaultAction=0x00000000) Line 3255 + 0x38 bytes C++ gklayout.dll!nsDocument::DoNotifyPossibleTitleChange() Line 5091 + 0x2c bytes C++ gklayout.dll!nsNonOwningRunnableMethod<nsDocument,void>::Run() Line 356 C++ xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x003fb0a8) Line 527 + 0x19 bytes C++ xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x00c35860, int mayWait=1) Line 239 + 0x16 bytes C++ xpcom_core.dll!nsThread::Shutdown() Line 468 + 0xb bytes C++ gkwidget.dll!nsSound::PurgeLastSound() Line 140 C++ gkwidget.dll!nsSound::~nsSound() Line 135 C++ gkwidget.dll!nsSound::`scalar deleting destructor'() + 0xf bytes C++ gkwidget.dll!nsSound::Release() Line 124 + 0xd3 bytes C++ xpc3250.dll!DoDeferredRelease<nsISupports *>(nsTArray<nsISupports *> & array={...}) Line 488 + 0xe bytes C++ xpc3250.dll!XPCJSRuntime::GCCallback(JSContext * cx=0x046d2c00, JSGCStatus status=JSGC_END) Line 759 + 0xf bytes C++ gklayout.dll!DOMGCCallback(JSContext * cx=0x046d2c00, JSGCStatus status=JSGC_END) Line 3700 + 0x17 bytes C++ xpc3250.dll!XPCCycleCollectGCCallback(JSContext * cx=0x046d2c00, JSGCStatus status=JSGC_END) Line 411 + 0x17 bytes C++ mozjs.dll!js_GC(JSContext * cx=0x046d2c00, JSGCInvocationKind gckind=GC_NORMAL) Line 3646 + 0x9 bytes C++ mozjs.dll!JS_GC(JSContext * cx=0x046d2c00) Line 2451 + 0xb bytes C++ xpc3250.dll!nsXPConnect::Collect() Line 477 + 0xa bytes C++ xpcom_core.dll!nsCycleCollector::Collect(unsigned int aTryCollections=1) Line 2507 + 0x19 bytes C++ xpcom_core.dll!nsCycleCollector_collect() Line 3202 + 0x16 bytes C++ gklayout.dll!nsJSContext::CC() Line 3514 + 0x6 bytes C++ gklayout.dll!nsJSContext::IntervalCC() Line 3603 C++ gklayout.dll!nsJSContext::MaybeCC(int aHigherProbability=1) Line 3580 + 0x5 bytes C++ gklayout.dll!nsJSContext::CCIfUserInactive() Line 3590 + 0x7 bytes C++ gklayout.dll!GCTimerFired(nsITimer * aTimer=0x0961cc38, void * aClosure=0x00000000) Line 3629 C++ xpcom_core.dll!nsTimerImpl::Fire() Line 427 + 0xe bytes C++ xpcom_core.dll!nsTimerEvent::Run() Line 521 C++ xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x003ff3b8) Line 527 + 0x19 bytes C++ xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x00c35860, int mayWait=1) Line 239 + 0x16 bytes C++ gkwidget.dll!nsBaseAppShell::Run() Line 170 + 0xc bytes C++ toolkitcomps.dll!nsAppStartup::Run() Line 182 + 0x1c bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x00c317d0, const nsXREAppData * aAppData=0x00c31e80) Line 3471 + 0x25 bytes C++ firefox.exe!NS_internal_main(int argc=1, char * * argv=0x00c317d0) Line 156 + 0x12 bytes C++ firefox.exe!wmain(int argc=1, wchar_t * * argv=0x004fa738) Line 110 + 0xd bytes C++ firefox.exe!__tmainCRTStartup() Line 579 + 0x19 bytes C firefox.exe!wmainCRTStartup() Line 399 C ###!!! ASSERTION: Non-global object has the wrong flags: '!(jsclazz->flags & JSCLASS_IS_GLOBAL)', file c:/Users/Gregor/code/tracemonkey/js/src/xpconnect/src/xpcwrappednative.cpp, line 1114
How about this: If a thread calls JS_SuspendRequest, we snapshot the stack and flush registers. The thread is not allowed to create any new roots, but that should be ok since SuspendRequests implies that no JS activity occurs.
Attached patch patch (obsolete) (deleted) — Splinter Review
This patch tries to capture and snapshot the current stack height when a context is suspended. Further nesting is ignored. The last resuming context resets the thread state.
Attachment #409420 - Attachment is obsolete: true
Testing on mac and windows both fails. I think its time to call it a day and go back to OS-specific methods to stop threads and read the register state. The GC protocol is too lax to be of any use here.
I get an assertion now when I start the browser in windows: So this happens during startup mozjs.dll!JS_Assert(const char * s=0x667c7bcc, const char * file=0x667c7b98, int ln=925) Line 67 C++ > mozjs.dll!GCBarrier::set(JSThreadData * data=0x0054f920) Line 925 + 0x22 bytes C++ mozjs.dll!GCBarrier::GCBarrier(JSThreadData * data=0x0054f920, const JSGuardObjectNotifier & _notifier={...}) Line 100 + 0x9 bytes C++ mozjs.dll!JS_BeginRequest(JSContext * cx=0x0057dd40) Line 945 + 0x1e bytes C++ xpc3250.dll!XPCNativeMember::Resolve(XPCCallContext & ccx={...}, XPCNativeInterface * iface=0x00581d60) Line 210 + 0xa bytes C++ xpc3250.dll!XPCNativeMember::NewFunctionObject(XPCCallContext & ccx={...}, XPCNativeInterface * iface=0x00581d60, JSObject * parent=0x00a0d1e0, int * pval=0x0039e2ac) Line 119 + 0x1c bytes C++ xpc3250.dll!DefinePropertyIfFound(XPCCallContext & ccx={...}, JSObject * obj=0x00a0d1e0, int idval=10548092, XPCNativeSet * set=0x00584af8, XPCNativeInterface * iface=0x00581d60, XPCNativeMember * member=0x00581dcc, XPCWrappedNativeScope * scope=0x03404d18, int reflectToStringAndToSource=1, XPCWrappedNative * wrapperToReflectInterfaceNames=0x00000000, XPCWrappedNative * wrapperToReflectDoubleWrap=0x00000000, XPCNativeScriptableInfo * scriptableInfo=0x00000000, unsigned int propFlags=7, int * resolved=0x00000000) Line 485 + 0x18 bytes C++ xpc3250.dll!XPC_WN_NoMods_Proto_Resolve(JSContext * cx=0x0054f6c8, JSObject * obj=0x00a0d1e0, int idval=10548092) Line 2046 + 0x46 bytes C++ mozjs.dll!js_LookupPropertyWithFlags(JSContext * cx=0x0054f6c8, JSObject * obj=0x00a0d1e0, int id=10548092, unsigned int flags=1, JSObject * * objp=0x0039e470, JSProperty * * propp=0x0039e464) Line 4046 + 0xf bytes C++ mozjs.dll!js_GetPropertyHelper(JSContext * cx=0x0054f6c8, JSObject * obj=0x00a0d200, int id=10548092, unsigned int getHow=1, int * vp=0x0039ea80) Line 4411 + 0x23 bytes C++ mozjs.dll!js_Interpret(JSContext * cx=0x0054f6c8) Line 1554 + 0x32 bytes C++ mozjs.dll!js_Execute(JSContext * cx=0x0054f6c8, JSObject * chain=0x00a0d1a0, JSScript * script=0x034666b0, JSStackFrame * down=0x00000000, unsigned int flags=0, int * result=0x0039ed38) Line 1616 + 0x9 bytes C++ mozjs.dll!JS_ExecuteScript(JSContext * cx=0x0054f6c8, JSObject * obj=0x00a0d1a0, JSScript * script=0x034666b0, int * rval=0x0039ed38) Line 4980 + 0x19 bytes C++ xpc3250.dll!mozJSComponentLoader::GlobalForLocation(nsILocalFile * aComponent=0x004dd008, JSObject * * aGlobal=0x034049c4, char * * aLocation=0x034049c8, int * exception=0x00000000) Line 1378 + 0x27 bytes C++ xpc3250.dll!mozJSComponentLoader::LoadModule(nsILocalFile * aComponentFile=0x004dd008, nsIModule * * aResult=0x0039eff8) Line 703 + 0x2c bytes C++ xpcom_core.dll!nsComponentManagerImpl::AutoRegisterComponent(nsILocalFile * aComponentFile=0x004dd008, nsTArray<DeferredModule> & aDeferred={...}, int minLoader=0) Line 3048 + 0x33 bytes C++ xpcom_core.dll!nsComponentManagerImpl::LoadLeftoverComponents(nsCOMArray<nsILocalFile> & aLeftovers={...}, nsTArray<DeferredModule> & aDeferred={...}, int minLoader=0) Line 3103 + 0x1d bytes C++ xpcom_core.dll!nsComponentManagerImpl::AutoRegister(nsIFile * aSpec=0x00000000) Line 3367 C++ xpcom_core.dll!NS_InitXPCOM3_P(nsIServiceManager * * result=0x0039f540, nsIFile * binDirectory=0x0016c928, nsIDirectoryServiceProvider * appFileLocationProvider=0x0039f658, const nsStaticModuleInfo * staticComponents=0x66b2a7d0, unsigned int componentCount=1) Line 620 C++ xul.dll!ScopedXPCOMStartup::Initialize() Line 1073 + 0x26 bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x004917d0, const nsXREAppData * aAppData=0x00491e80) Line 3278 + 0xb bytes C++ firefox.exe!NS_internal_main(int argc=1, char * * argv=0x004917d0) Line 156 + 0x12 bytes C++ firefox.exe!wmain(int argc=1, wchar_t * * argv=0x0016a738) Line 110 + 0xd bytes C++ firefox.exe!__tmainCRTStartup() Line 579 + 0x19 bytes C firefox.exe!wmainCRTStartup() Line 399 C
Re: comment 102: before giving up we need evidence for the hypothesis that the GC protocol allows stack roots above the top of stack at suspend time -- which on its face is a request model violation therefore a GC safety bug. The alternative hypothesis is a bug in the patch. It may well be we want OS-specific conservatism but why? We need to understand the requirement. /be
I don't think there is a GC safety bug here. The suspend protocol is per context, not per thread. So a thread can suspend one of its contexts and then proceed to much around with another context and enter the GC protocol (overwriting the local register state). GC protocol ends, thread some out of the GC barrier and now still has a context live and goes on to behave as if the request is suspended and doesn't participate in the GC protocol. We GC. We die (register state was cleared by previous barrier).
That's a bug in the patch, then -- the thread state has to track the maximum stack in use by active and suspended requests. Are you gonna make me look? :-P /be
That doesn't work. There are 2 protocols in play here. The GC protocol is performed by threads. The resume/suspend protocol by contexts. See #105. We track the stack correctly, but the GC protocol can still happen and destroys the stack balance game.
Yes, requests nest not only on one context but among multiple -- so what? The patch is too simplistic in using set and clear instead of push and pop. /be
To be clearer, if the protocol would take a thread and calling "suspend" means that this thread will not perform any JS activities until it does a "resume" again, that would work. But thats not the case here.
Only one thread may use a context at a time. A thread may use more than one context. A context may nest more than one active request. A request cannot be suspended on one cx and resumed on another (see cx->outstandingRequests updates). Therefore there is no "destroys the stack balance game". There's a need for the patch to maintain its own stack of stack extents. Comment 109 says: > To be clearer, if the protocol would take a thread and calling "suspend" means > that this thread will not perform any JS activities until it does a "resume" > again, that would work. But thats not the case here. Why is this necessary? The API is what it is. The patch needs to keep a stack or other add and subtract (union and set-difference) stack extents. It could even just stack its GCBarrier state. /be
> or other add and subtract (union and set-difference) stack extents. It could "or other data structure with which to add and subtract ...." /be
Its not just the stack extent. I also have to store the registers somewhere in a stack-y non-destructive manner. To store it on a stack I need to allocate. To allocate means I can fail. Failure is not an option here. The API doesn't report an error, just a refcount.
We have some OS-dependent code already. I would rather stick with code we know and understand and can copy from WebKit than trying to run circles around our request model. I just don't see how in the presents of resume/suspend _per context_ there is any guarantees the GC protocl gives you _per thread_ in this context.
The thread stack itself can hold the allocations unless I'm missing something. (In reply to comment #113) > We have some OS-dependent code already. I would rather stick with code we know > and understand and can copy from WebKit I'm with you so far. > than trying to run circles around our request model. It was not clear there wasn't a bug being worked around here. Or, misunderstanding of the LIFO nature of requests but maybe I am missing a break in that. We need to have mutual understanding and not one side throwing up of hands, esp. without clarity about what the bug was. > I just don't see how in the presents of resume/suspend _per > context_ there is any guarantees the GC protocl gives you _per thread_ in this > context. A given context can be used by only one thread at a time. Suspend and resume must be matched on the same cx. Are you worried about a cx moving from one thread to another thread? /be
No, I am mostly worried about #113. I have no way to track the stack and register state as a stack.
Maybe I can add a struct to every context and use that for the stack? Its all not very pretty though.
I still do not understand comment 113. Per-cx API calls must balance per thread unless a cx migrates from one thread to another between matched calls. Begin cx1 ... Begin cx2 ... End cx2 ... End cx1 is one possible order: nesting on another cx. You can nest requests on one cx too: Begin cx ... Begin cx ... End cx ... End cx And you can suspend and resume with either of the above or more suspend/resume (yield) in the middle: Begin cx ... Suspend cx ... <any of these sequences> ... Resume cx ... End cx However you do it, though, we don't abuse the API by unwinding the stack in the middle and restoring it before Resume, AFAIK. The API could be used in that way. But my point is that you can monitor API calls and construct thread suspend resume using a threadData-homed counter, as the patch did. The patch needs to keep more state, to cover all used stack, though. /be
Ok, I hacked up a patch that uses a per-context JSThreadState that is enqueued into a per-thread list of pushed register/stackTop states. The approach dies here: void js_WaitForGC(JSRuntime *rt) { JS_ASSERT_IF(rt->gcRunning, rt->gcLevel > 0); if (rt->gcRunning && rt->gcThread->id != js_CurrentThreadId()) { GCBarrier barrier(js_CurrentThreadData(rt)); do { JS_AWAIT_GC_DONE(rt); } while (rt->gcRunning); } } Some of our API functions don't take cx, they take rt. I have a hack to get the current ThreadData, but I can't get to a specific cx. So this approach doesn't work either, because - some APIs don't report an error value (I can't allocate memory) - some APIs don't take a cx (I can't stash it in cx) I am fresh out of bright ideas on this topic. WebKit OS-dependent code here we go. Alternative patches welcome, but no more wild goose chases :)
Agreed, we don't want to change the API. JS_Save/RestoreExceptionState pointed the way (it mallocs a two-word API-opaque struct! overkill but extensible) but if we break API compat all we'll have is pain for embedders and an awkward, bloat-y or leaky API. /be
Attachment #409873 - Attachment is obsolete: true
Sorry to keep playing devil's advocate, but why can't you keep the top cx's threadstate pointer in the JSThread? /be
I need a cx to put the current state in and then put that on top of the stack. If all I get is an rt, I am hosed.
The fact that several context may enter the request on single thread does not matter. What matter is that due to JS_SuspendRequest we do not have a good point to capture the stack and the registers for a thread that calls the function. Such thread will have part of its native stack populated with GC things so it must be scanned. What is not known is what part of the stack to scan. A starightforward aproach would be just to record the stack pointer and registers whenever JS_SuspendRequest that really suspend the request (due to multiple contexts that may not be the case for each cal to the function) is executed and scan the stack from the base until that point. Then the question is how to ensure that we really capture all the stack that can contain the GC things. AFAICS that should not be a problem in principle since the caller of JS_SuspendRequest that really suspends must assume that from that point the GC may run freely and the caller cannot rely on weak roots for GC rooting and everything should be strongly rooted. Still for real guarantee we may scan the whole CPU page containing the stack pointer of JS_SuspendRequest. Even better would be to deprecate JS_SuspendRequest and replace it with a version that takes a callback that would be executed outside a request. This way it should be easy to capture all the necessary parts of the stack.
I spent some more time refactoring the code and use a stack-allocated JSThreadState object when no cx is around, but this is really getting out of hand. I will ask Gregor to go back to work on the OS-supported thread state capturing. We can do #123 as an optimization later on, though, I think the OS-supported code will be shorter and faster (no constant setjmps, just a pthread register read when we GC), so I am not sure its even worth poking at this problem too much.
(In reply to comment #124) > I spent some more time refactoring the code and use a stack-allocated > JSThreadState object when no cx is around, Don't do that! You have an rt. From it and the current thread's id you get the current JSThread. From that you get the cx most recently used in a request API call. /be
Attached patch patch (obsolete) (deleted) — Splinter Review
back to the os specific stuff. work in progress...
Attachment #409842 - Attachment is obsolete: true
Attachment #410042 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
stable version for mac. more changes needed for win and linux
Attached patch patch with mac support (obsolete) (deleted) — Splinter Review
Attachment #410075 - Attachment is obsolete: true
Attachment #410090 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Fix mach insanity (register structure size is in ints, not words ... bleh).
Attachment #410100 - Attachment is obsolete: true
Attached patch windows support (obsolete) (deleted) — Splinter Review
Attachment #410101 - Attachment is obsolete: true
Attached patch windows support, for real (obsolete) (deleted) — Splinter Review
Attachment #410105 - Attachment is obsolete: true
Attached patch windows fix (obsolete) (deleted) — Splinter Review
Attachment #410106 - Attachment is obsolete: true
Attachment #410111 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #410114 - Attachment is obsolete: true
Attached patch working windows support (obsolete) (deleted) — Splinter Review
ping for peterv/mrbkap to look at the cycle collector issue
The cause for the strange windows assert is suspend/resume. If I comment those out, stuff works. Maybe resuming a sleeping thread is a bad idea?
Attached patch patch (obsolete) (deleted) — Splinter Review
fix mac __DARWIN_UNIX03 registers
Attachment #410117 - Attachment is obsolete: true
the patch crashes the jsapi-tests for the optimized mac shell. It happens during the cleanup for the testDefineGetterSetterNonEnumerable test. TEST-PASS | testDefineGetterSetterNonEnumerable | ok Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00000fec 0x00045c20 in JS_CallTracer () (gdb) bt #0 0x00045c20 in JS_CallTracer () #1 0x000bc627 in JSScopeProperty::trace () #2 0x0006403c in js_TraceObject () #3 0x00045dfd in JS_CallTracer () #4 0x0004653c in js_MarkConservatively () #5 0x0001b950 in js_TraceThreads () #6 0x00046e46 in js_TraceRuntime () #7 0x000473c2 in js_GC () #8 0x0001b3f3 in js_DestroyContext () #9 0x0000a8f9 in JS_DestroyContext () #10 0x000036e4 in JSAPITest::uninit () #11 0x000032d3 in main ()
and now with debug symbols looks like we trace a 0x0. Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00000fec 0x00045ba0 in GetGCThingFlags [inlined] () at /Users/mozilla/code/gTM/js/src/jsgc.cpp:860 860 return THING_FLAGP(a, index); (gdb) bt #0 0x00045ba0 in GetGCThingFlags [inlined] () at /Users/mozilla/code/gTM/js/src/jsgc.cpp:860 #1 0x00045ba0 in JS_CallTracer (trc=0xbffff6cc, thing=0x0, kind=0) at ../jsgc.cpp:2325 #2 0x000bc627 in JSScopeProperty::trace (this=0x820cd0, trc=0xbffff6cc) at ../jsscope.cpp:2325 #3 0x00063fbc in JSScope::trace () at /Users/mozilla/code/gTM/js/src/jsscopeinlines.h:156 #4 0x00063fbc in js_TraceObject (trc=0xbffff6cc, obj=0x1db3a0) at jsscopeinlines.h:2325 #5 0x00045d7d in JS_TraceChildren [inlined] () at /Users/mozilla/code/gTM/js/src/jsgc.cpp:2085 #6 0x00045d7d in JS_CallTracer (trc=0xbffff6cc, thing=0x1db3a0, kind=0) at ../jsgc.cpp:2325 #7 0x000464bc in js_MarkConservatively (trc=0xbffff6cc, begin=0xbffff628, end=0xc0000000) at ../jsgc.cpp:2325 #8 0x0001b8d0 in js_TraceThreads (rt=0x1ab000, trc=0xbffff6cc) at ../jscntxt.cpp:2325 #9 0x00046dc6 in js_TraceRuntime (trc=0xbffff6cc, allAtoms=0) at ../jsgc.cpp:2325 #10 0x00047342 in js_GC (cx=0x20d890, gckind=GC_LAST_CONTEXT) at ../jsgc.cpp:2325 #11 0x0001b373 in js_DestroyContext (cx=0x20d890, mode=JSDCM_FORCE_GC) at ../jscntxt.cpp:2325 #12 0x0000a879 in JS_DestroyContext (cx=0x20d890) at ../jsapi.cpp:2325 #13 0x00003604 in JSAPITest::uninit (this=0x148b9c) at tests.h:2325 #14 0x000031f3 in std::string::_M_rep () at /usr/include/c++/4.2.1/bits/basic_string.h:70 #15 0x000031f3 in ~basic_string [inlined] () at /usr/include/c++/4.2.1/bits/basic_string.h:493 #16 0x000031f3 in main () at ../../jsapi-tests/tests.cpp:2325
I can reproduce this.
(gdb) p this $1 = (JSScopeProperty * const) 0x60274d0 (gdb) p *this $2 = { id = 96352716, getter = 0, setter = 0, slot = 2, attrs = 52 '4', flags = 1 '\001', shortid = 0, parent = 0x0, kids = 0x0, shape = 487 } We have a property that indicates in the attributes that is has a getter but it doesn't.
What's js_DumpId($2.id) disclose? How about js_DumpObject on the relevant obj? Can you get this only with the conservative scanning patch? That's scary if so. /be
Ok, so we are tracing obj/vobj here: #4 0x0006400c in js_TraceObject (trc=0xbffff6ec, obj=0x5be13a0) at jsscopeinlines.h:2325 153 do { (gdb) p/x vobj $8 = { cx = 0x5c0fb40, v = 0x5be13a0 } Its still on the stack, and the conservative stack scanner finds it.
Breakpoint 1, cls_testDefineGetterSetterNonEnumerable::run (this=0x148b9c) at ../../jsapi-tests/testDefineGetterSetterNonEnumerable.cpp:15 15 JSObject *obj = JS_NewObject(cx, NULL, NULL, NULL); (gdb) n 16 CHECK(obj); (gdb) n 17 vobj = OBJECT_TO_JSVAL(obj); (gdb) n cls_testDefineGetterSetterNonEnumerable::run (this=0x148b9c) at ../../jsapi-tests/testDefineGetterSetterNonEnumerable.cpp:19 19 jsvalRoot vget(cx); (gdb) p/x vobj $15 = { cx = 0x5c0fb40, v = 0x5be13a0 } (gdb) p vobj.v $16 = 96342944 (gdb) p/x vobj.v $17 = 0x5be13a0 (gdb) p/x *(JSObject*)vobj.v $18 = { map = 0x5c10240, classword = 0x145d40, fslots = {0x5be1020, 0x5be1000, 0x16, 0x16, 0x16}, dslots = 0x680 } (gdb) dslots = 0x680 looks a bit disconcerting.
Ignore #146. Thats dmandelin's special null values for debugging purposes.
Attached patch patch (obsolete) (deleted) — Splinter Review
Don't do conservative scanning when gckind==GC_LAST_CONTEXT since we are shutting down anyway and some parts of the GC are skipped. This probably also fixed the shutdown leak on the try server.
Attachment #410115 - Attachment is obsolete: true
Attachment #410162 - Attachment is obsolete: true
Attached patch fix WinCE build (obsolete) (deleted) — Splinter Review
Attached patch fix __DARWIN_UNIX03 build (obsolete) (deleted) — Splinter Review
Attachment #410450 - Attachment is obsolete: true
Attached patch initial OS/2 impl (obsolete) (deleted) — Splinter Review
This is an initial OS/2 implementation of the platform specific bits. It's untested so I didn't integrate it with your large patch.
Great. I am glad to hear that this is at least in principle possible on OS/2. I will try to have someone with Solaris experience look at the patch too.
Attached patch WiP Linux implementation (obsolete) (deleted) — Splinter Review
Attachment #410616 - Attachment is obsolete: true
The Linux implementation breaks on weak references (test_js_weak_references.js): function run_test() { var obj = { num: 5, str: 'foo' }; var weak = Components.utils.getWeakReference(obj); do_check_true(weak.get() === obj); do_check_true(weak.get().num == 5); do_check_true(weak.get().str == 'foo'); // Force garbage collection Components.utils.forceGC(); // obj still references the object, so it should still be accessible via weak do_check_true(weak.get() === obj); do_check_true(weak.get().num == 5); do_check_true(weak.get().str == 'foo'); // Clear obj's reference to the object and force garbage collection obj = null; Components.utils.forceGC(); // The object should have been garbage collected and so should no longer be // accessible via weak do_check_true(weak.get() === null); }
In JSThreadData::mark() I see this assert: JS_ASSERT(stackTop && stackTop < stackBase); In the OS/2 impl I tried to use stackBase provided by the system (ptib->tib_pstack), but then this assert always fails, because stackTop will always be larger than stackBase, as the stack grows downwards. I would have adjusted the OS/2 code to use the end of the stack instead (ptib->tib_pstacklimit) but I see that at least Windows (and now Linux) also uses stackBase. So how come that this assert passes on Windows/Linux but not on OS/2?
(In reply to comment #137) > ping for peterv/mrbkap to look at the cycle collector issue I wasn't cc'ed. Do we still need to look at something? We tried looking into comment 93, but I could never get it to assert or crash.
(In reply to comment #157) > I wasn't cc'ed. Do we still need to look at something? We tried looking into > comment 93, but I could never get it to assert or crash. See comment 155. Note that currently patch does not remove any roots that exist. As such crashes can only come from the GC things that live too long due to the conservative scanner marking them.
No longer blocks: 508140
Attached patch refresh and fix maemo build (obsolete) (deleted) — Splinter Review
Attachment #411580 - Attachment is obsolete: true
What's the current status/progress rate on this? This could be really valuable in decreasing the number of GC-related crashes.
Right now we have several problems. We fail the "shutdown during GC" testcase on windows. I guess we suspend a thread during GC and the browser freezes during shutdown. We also have to find a solution for weak references as mentioned in comment 155. The thread suspend/resume mechanism for linux is not tested for all corner cases.
(In reply to comment #161) > Right now we have several problems. > We fail the "shutdown during GC" testcase on windows. I guess we suspend a > thread during GC and the browser freezes during shutdown. Do you know more details on the exact cause of the freeze? > We also have to find a solution for weak references as mentioned in comment > 155. In what way exactly does that test case fail? > The thread suspend/resume mechanism for linux is not tested for all corner > cases. Do you need any help writing test cases?
Sorry for the delay but I have been in Europe the last 2 weeks. I will start working on it again. My plan is to come up to MV in 2 weeks for a few days. We should talk about it then.
There is now a separate bug 538702 for the thread synchronization stuff.
Another problem with conservative GC is that we can't test for garbage collected objects any more. Tests like testIsAboutToBeFinalized_bug528645 check if an object got collected after a GC. Since we might have a reference on the stack, we can't Assert on a collection. I also get a lot of leaks within xpcshell tests. I guess they are caused by the same problem.
How about we run leak checks in a special mode where we don't scan the stack? (And ensure they run at a time where there actually are no roots on the stack)
Attached patch refresh (obsolete) (deleted) — Splinter Review
Refresh because of recent GC changes. First attempt to add a marking queue to avoid deadlocks as mentioned by Igor in bug 538702. This patch is also based on the patch of bug 538702.
Attachment #412311 - Attachment is obsolete: true
(In reply to comment #167) > First attempt to add a marking queue to avoid deadlocks as mentioned by Igor in > bug 538702. Why a marking queue is necessary? It should be possible to use DelayMarkingChildren as it serves exactly the same purpose and avoids any surprises like OOM during the marking phase. That is, the conservative marker should call the marker for strings and doubles directly and for the rest of things it should just call DelayMarkingChildren.
Attached patch refresh (obsolete) (deleted) — Splinter Review
Removed the MarkStack and changed it to DelayMarkingChildren. Removed custom HashTable and changed to new HashMap and HashSet.
Attachment #421863 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #428355 - Attachment is obsolete: true
Attached patch update (obsolete) (deleted) — Splinter Review
Attachment #428361 - Attachment is obsolete: true
Attached patch update (obsolete) (deleted) — Splinter Review
add delayMarkingChildren flag to js_CallGCMarker.
Attachment #428518 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
leakage fix
Attachment #429235 - Attachment is obsolete: true
Tryserver results: green on linux, same error as on tracemonkey tree on windows, http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1267285772.1267295179.28854.gz js_weak_reference.js fail and another unrelated? fail on OSX http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1267285772.1267297874.2696.gz
The patch also disables testIsAboutToBeFinalized. Can we fix that test? Maybe clear the stack or registers or something like that?
I think its fine to land this with the two tests disabled since both explicitly test the wrong thing once this patch lands.
(In reply to comment #177) > I think its fine to land this with the two tests disabled since both explicitly > test the wrong thing once this patch lands. It would be bad to loose the only test that verifies that JS_IsAboutToBeFinalized works. We should fix it instead to work with the conservative scanner.
If JS_IsAboutToBeFinalized doesn't work, I doubt you survive try server. So that's most definitely not the only test we have that verifies that JS_IsAboutToBeFinalized works. Its a quick and nice shell test. If there is a reasonable way to force msvc and gcc to purge the value from the stack and registers (and spill slots), we can potentially make it work again. I don't see that as a blocker for landing though. This patch has been in the making for 4 month now. There is an arbitrary number of awesome features and improvements we can always tack on, but as long we get the fundamentals right I really would like to land it before my grand kids grow up ;) PS: Gregor, do we have any performance feedback from tryserver. Did we ding any of the the talos numbers?
Gregor, want to try to move all the code that allocates gcthings in the test case into a separate method and then flag that as no-inline? The only left-over references should be in the heap then. That _might_ work. Note: there is no way to reliably test what the test is trying to test in the presence of conservative stack walking. We can always misread an integer as a pointer. We might be able to make the test "work most of the time" if we get lucky, but that's about all we can do. If the system is doing address space randomization, the test might randomly fail because some constant happens to overlap that run's address of some object or string.
Back when I was working on mmGC I had this exact problem. I employed disgusting hacks to make the test go. It seemed worth the gore to have at least one test that showed the GC was doing something. ;) A saner alternative is to weaken the test to accept either correct outcome -- either both the object IsAboutToBeFinalized and subsequently the finalizer is called, or neither. If one happens without the other, that's a regression.
Actually the test case might be easy to fix by inserting a JS_CommenceShutdown() prior to JS_GC(), which will disable stack scanning.
(In reply to comment #182) > Actually the test case might be easy to fix by inserting a > JS_CommenceShutdown() prior to JS_GC(), which will disable stack scanning. Is there a new runtime created after the test? We can't reactivate the stack scanning right now.
Gregor, the patch is full of trailing white-space. You should remove those, too.
Right, its all compiled into one file, so that wouldn't work. I guess moving it into its own function and hoping the stack will be unwound before we scan it seems the only way to go.
Attached patch update (obsolete) (deleted) — Splinter Review
separate platform specific code.
Attachment #429342 - Attachment is obsolete: true
Depends on: 551680
I should update the patch with workaround for suspended requests before the weekend.
Depends on: 561657
Assigning this to Igor, given comment 187. Igor, this is something you're working on now, right? If not, feel free to throw it back and someone will pick it up.
Assignee: gal → igor
(In reply to comment #188) > Assigning this to Igor, given comment 187. > > Igor, this is something you're working on now, right? Yes. I would like first to land the scanner in pure check-for-leaks mode. That is, it will run after the GC has done marking of all other GC things. At that point any unmarked thing would mean a leak. If the leaks would be acceptable, we can run the scanner first and replace various auto-classes with asserts that the things should be marked. That would ensure that we have not missed anything in the stack scanner.
Depends on: 563345
I'm going to block 1.9.3 on this. We've been bitten by enough GC hazards on the stack that. We should make the systemic fix. Holler if you disagree, of course. Igor, Gregor: please take the steps necessary to land this patch as soon as possible. Igor's plan for a gradual introduction seems good.
blocking2.0: --- → beta1+
I don't disagree, and I'd like to take this path, but (not the first time you have heard this) static analysis could help us do without a fix for this bug. Why don't we invest in it? We'll be supporting branches for a while. /be
(In reply to comment #190) > I'm going to block 1.9.3 on this. We've been bitten by enough GC hazards on the > stack that. We should make the systemic fix. I have an updated patch but it depends on bug 561657 and bug 563345. I hope to get those bugs landed soon.
Depends on: 237006
Attached patch update (obsolete) (deleted) — Splinter Review
To deal with threads calling JS_SuspendRequest the patch snapshots the stack and the registers in the function using setjmp. The patch also runs the conservative scanner after all other marking is done. This way any unmarked GC thing that it finds means either a false positive or a GC hazard. I suppose we need a better reporting about these leaks that would be available even in optimized builds to collect some stats about the leaks before starting to rely on the scanner. Note that the patch depends on other patches waiting for landing/review and untested at this point.
Attachment #410380 - Attachment is obsolete: true
Attachment #430194 - Attachment is obsolete: true
We can also add it to the GCTIMER options to see how big the overhead is. Igor does it work now with the 2 patches you mentioned in comment 192?
Attached patch patch with api test (obsolete) (deleted) — Splinter Review
The new patch adds a basic api test to verify that the conservative scanner works and adds an option to dump the leaked conservative roots and other stats. The leaks are defined as anything that the conservative scanner marks after marking all non-conservatibe roots. The option is always on currently and is activated at runtime via setting an environment variable JS_CONSERVATIVE_GC_ROOTS. The value of that should a file name to dump the info or stdout. A typical output from a browser session looks like: CONSERVATIVE STACK SCANNING: number of stack words: 5549 number of unique words: 2445 excluded, low bit set: 520 not withing chunk range: 524 not withing a chunk: 1250 not withing an arena: 0 excluded, wrong tag: 120 excluded, not live: 1 things marked: 30 raw pointers marked: 24 conservative roots: 5 0x7f0c33e76380: string "3.7a5pre" 0x7f0c258287c0: object XPCWrappedNative_NoHelper 0x7f0c25830b80: string "^3.0" 0x7f0c2580d000: string "3.7" 0x7f0c25828840: object Array The patch depends on the latest patches from the following bugs: bug 561364 bug 564414 bug 237006
Attachment #444640 - Attachment is obsolete: true
Attached patch update (obsolete) (deleted) — Splinter Review
Here is a patch with no external dependencies as all relevant patches have landed.
Attachment #445193 - Attachment is obsolete: true
Attached patch update (obsolete) (deleted) — Splinter Review
The new update adds another workaround to testIsAboutToBeFinalized to make sure that the stack stays clean from the GC things. Apparently on 64 bit Linux it could be rather hard.
Attachment #446775 - Attachment is obsolete: true
I plan to land this temporary on TM tip around 6:00 AM PST to collect the test stats as TryServer is not that helpful in that respect.
I see leaks with this patch. Don't you have to disable conservative stack scanning once JS_CommenceRuntimeShutDown is called?
(In reply to comment #199) > I see leaks with this patch. Don't you have to disable conservative stack > scanning once JS_CommenceRuntimeShutDown is called? That should not be necessary since at that moment there should not be any JS requests running. The patch just have a bug in that area.
Attached patch update (obsolete) (deleted) — Splinter Review
The new version does not do the stack scan when js_GC is called outside a request. This should fix the shutdown leak.
Attachment #447381 - Attachment is obsolete: true
Attached patch update (obsolete) (deleted) — Splinter Review
The new patch disables the conservative scanning during the cycle collection if the CC is called initially outside a JS request. This helps to promptly collect XPCOM garbage. What is remaining AFAICS is to fix js/src/xpconnect/tests/unit/test_js_weak_references.js so it will work in presence of the conservative GC and to figure out why on try server there more random failures with the patch.
Attachment #447839 - Attachment is obsolete: true
(In reply to comment #202) > What is remaining AFAICS is to fix > js/src/xpconnect/tests/unit/test_js_weak_references.js so it will work in > presence of the conservative GC and to figure out why on try server there more > random failures with the patch. Are the crashes platform related?
(In reply to comment #203) > > Are the crashes platform related? On the try server it looks like more orange than typical with some tests time outing and other failing. But I will look for crashes after fixing test_js_weak_references.js.
Attached patch update (obsolete) (deleted) — Splinter Review
The new patch updates tests to workaround conservative scanning. During the last few days the tryserver has started to show unconditional success so this is ready. The most visible change compared with the initial design is that the patch scans the stack only for threads inside js_GC or with suspended requests. To support the latter the patch records the stack pointer during the suspend and scans the stack between the base and that place. This is inaccurate since in FF in many cases JS_SuspendRequest is called via a chain of utility helpers and the recorded stack position includes few corresponding stack frames that should not include any roots. But it is not worse than scanning native frames between the base of the stack and the frame that enters the request. The complication with that is multiple JS_SuspendRequest calls for multiple contexts. That required to use the max stack depth among multiple GC calls. Ideally the bug 477999 should allow to simplify that but that is for the later.
Attachment #410653 - Attachment is obsolete: true
Attachment #448497 - Attachment is obsolete: true
Attachment #449458 - Flags: review?(brendan)
Attachment #449458 - Flags: review?(gal)
(In reply to comment #205) > Created an attachment (id=449458) [details] > update > > The new patch updates tests to workaround conservative scanning. During the > last few days the tryserver has started to show unconditional success so this > is ready. > > The most visible change compared with the initial design is that the patch > scans the stack only for threads inside js_GC or with suspended requests. To > support the latter the patch records the stack pointer during the suspend and > scans the stack between the base and that place. This is inaccurate since in FF > in many cases JS_SuspendRequest is called via a chain of utility helpers and > the recorded stack position includes few corresponding stack frames that should > not include any roots. But it is not worse than scanning native frames between > the base of the stack and the frame that enters the request. > > The complication with that is multiple JS_SuspendRequest calls for multiple > contexts. That required to use the max stack depth among multiple GC calls. > Ideally the bug 477999 should allow to simplify that but that is for the later. There were a lot of "should not include roots" in all the comments of this bug. Did you actually try to scan all parts where we assume that we wouldn't find any roots?
(In reply to comment #206) > > There were a lot of "should not include roots" in all the comments of this bug. > Did you actually try to scan all parts where we assume that we wouldn't find > any roots? I am not sure what do you mean. Is it about having a mode for a conservative scanning that try to scan everything including threads outside the GC request to find bugs related to incorrect usage of the request API?
Yes I meant all threads (including threads outside of a request). I don't think we need a mode for this but have you checked that the assumption that there are no roots on such stacks is correct for the try-server for example?
(In reply to comment #208) > Yes I meant all threads (including threads outside of a request). > I don't think we need a mode for this but have you checked that the assumption > that there are no roots on such stacks is correct for the try-server for > example? There is no good way to scan a thread outside a request. We cannot suspend such thread. For the scan to be reliable the thread has to be suspended during the full GC marking cycle. If it is suspended just briefly to scan its stack then there is no guarantee that when it would be resumed it would not create more roots on the stack that GC would miss (I have missed that when doing initial reviews of the patches). And if it would be suspended for the whole GC then we are going to have deadlocks due to various XPConnect things taking locks during the GC. Precisely for this reason any changes of the root set should happen inside a request or during a GC. The engine tries to enforce this using debug checks for API that have GC things as in-out parameters. We may have issues in this area, but this is for another bug. Now, as a tool we may still try to do some suspend to capture the violation of the assumptions to see if some code leaks jsvals on the stack outside a request, but how would we distinguish the false positives (that happens regularly, see comment 195) from the real bugs?
(In reply to comment #209) > Now, as a tool we may still try to do some suspend to capture the violation of > the assumptions to see if some code leaks jsvals on the stack outside a > request, but how would we distinguish the false positives (that happens > regularly, see comment 195) from the real bugs? It would be easier to create such tool if the entering/suspending JS requests would be done using callbacks like: JS_DoRequest(cx, callback) or JS_PauseRequest(cx, callback). It would be straightforward in JS_DoRequest and JS_PauseRequest to record the precise native stack boundaries where the GC roots are allowed. Also in a special tool mode the functions can try to clean the registers and stack, so any false positive can only happen due to random coincidence of some numbers with GC things. Those are rarer so we should be able to capture bugs in the request usage. Unfortunately it would take time to change FF codebase to use API like that.
Comment on attachment 449458 [details] [diff] [review] update Nice tests. >+ FILE *fp = !strcmp(dumpFileName, "stdout") ? stdout >+ : !strcmp(dumpFileName, "stderr") ? stderr >+ : fopen(dumpFileName, "aw"); Can you write this as if-statements. This is pretty unreadable. >+/* static */ >+JS_NEVER_INLINE JS_FRIEND_API(void) >+ConservativeGCThreadData::enable(bool knownStackBoundary) >+{ >+ ++enableCount; >+ if (enableCount <= 0) >+ return; >+ >+ /* Update the native stack pointer if it points to a bigger stack. */ >+#if JS_STACK_GROWTH_DIRECTION > 0 >+# define CMP > >+#else >+# define CMP < >+#endif >+ jsuword dummy; >+ if (knownStackBoundary || enableCount == 1 || &dummy CMP nativeStackTop) >+ nativeStackTop = &dummy; >+#undef CMP >+ >+ /* Update the register snapshot with the latest values. */ >+#if defined(_MSC_VER) >+# pragma warning(push) >+# pragma warning(disable: 4611) >+#endif >+ setjmp(registerSnapshot.jmpbuf); >+#if defined(_MSC_VER) >+# pragma warning(pop) >+#endif >+ >+} What about roots that are created while the request is suspended? Is that generally disallowed as per API contract? >+ /* >+ * Do not scan the current thread on the shutdown or when the GC is called >+ * outside a request. >+ */ >+ bool scanGCThreadStack = >+#ifdef JS_THREADSAFE >+ (cx->thread->contextsInRequests != 0) && >+#endif >+ (rt->state != JSRTS_LANDING); Is this safe? Roots might go away while we do our shutdown dance. >+ >+ // We want to scan the current thread for GC roots only if it was in a >+ // request prior to the Collect call to avoid false positives during the >+ // cycle collection. So to compensate for JS_BeginRequest in >+ // XPCCallContext::Init we disable the conservative scanner if that call >+ // has started the only request on this thread. Is there no better way of detecting this? This sounds rather subtle. >+ JS_ASSERT(cx->requestDepth >= 1); >+ JS_ASSERT(cx->thread->contextsInRequests >= 1); >+ if(cx->requestDepth >= 2 || cx->thread->contextsInRequests >= 2) >+ { >+ JS_GC(cx); >+ } >+ else >+ { >+ JS_THREAD_DATA(cx)->conservativeGC.disable(); >+ JS_GC(cx); >+ JS_THREAD_DATA(cx)->conservativeGC.enable(); >+ } > JS_SetGCCallback(cx, gOldJSGCCallback); > gOldJSGCCallback = nsnull;
(In reply to comment #211) > What about roots that are created while the request is suspended? Is that > generally disallowed as per API contract? Yes. Any rooting for a thread with a suspended request has the same problems as a thread outside a request, see the comment 209 for details. > >+ bool scanGCThreadStack = > >+#ifdef JS_THREADSAFE > >+ (cx->thread->contextsInRequests != 0) && > >+#endif > >+ (rt->state != JSRTS_LANDING); > > Is this safe? Roots might go away while we do our shutdown dance. rt->state == JSRTS_LANDING means that we have the last context that is been destroyed. This operation finishes any running request on it before running the GC and sort-of starts the GC shutdown. So no more roots. > > >+ > >+ // We want to scan the current thread for GC roots only if it was in a > >+ // request prior to the Collect call to avoid false positives during the > >+ // cycle collection. So to compensate for JS_BeginRequest in > >+ // XPCCallContext::Init we disable the conservative scanner if that call > >+ // has started the only request on this thread. > > Is there no better way of detecting this? This sounds rather subtle. I can add some internal method to check for this, but fortunately this is the only place that needs this. Also I hope to get something working for the bug 477999 soon that would allow to have much simpler checks here.
Comment on attachment 449458 [details] [diff] [review] update Sounds good to me. Lets get it landed.
Attachment #449458 - Flags: review?(gal) → review+
The new patch adds asserts to ensure that the request is not ended when there are native frames on the stack with API calls insisting on been in a request. That ensures that the conservative GC cannot be disabled via triggering some code that inadvertently calls JS_EndRequest. It passes the try server.
Attachment #449458 - Attachment is obsolete: true
Attachment #450829 - Flags: review?(brendan)
Attachment #449458 - Flags: review?(brendan)
Brendan, could y(In reply to comment #214) > Created an attachment (id=450829) [details] Brendan, do you have a chance to review this before gal starts to change every line of jsgc.cpp?
Comment on attachment 450829 [details] [diff] [review] extra asserts to ensure proper begin/end request usage >+static JS_NEVER_INLINE size_t >+NativeFrameCleaner() >+{ >+ char buffer[1 << 16]; >+ memset(buffer, 0, sizeof buffer); >+ size_t count = 0; >+ for (size_t i = 0; i != sizeof buffer; ++i) { >+ if (buffer[i]) >+ count++; How can buffer[i] be non-zero here? >+ConservativeGCStackMarker::ConservativeGCStackMarker(JSTracer *trc) >+ : trc(trc) Nit: half-indent (two spaces) to the : in the superclass declaration. >+ if ((p & GC_CHUNK_MASK) >= GC_MARK_BITMAP_ARRAY_OFFSET) >+ RETURN(outside); Worth a separate stats counter for this "inside-outside" case? >+ while (cursor) { >+ /* If the cursor moves past the thing, its not in the freelist. */ s/its/it's/ >+ if (thing < cursor) >+ break; >+ /* If we find it on the freelist, its dead. */ Ditto. Could use a blank line before the comment. >+ /* >+ * We have now a valid pointer, that is either raw or tagged properly. >+ * Since we do not rely on the conservative scanning yet and assume that >+ * all the roots are precisely reported, any unmarked GC things here mean >+ * a leak. s/a leak/those things leaked/ >+void >+ConservativeGCStackMarker::markRange(jsuword *begin, jsuword *end) >+{ >+ JS_ASSERT(begin <= end); >+ if (history.initialized()) { >+ for (jsuword *i = begin; i != end; ++i) { >+ CONSERVATIVE_METER(stats.words++); >+ jsuword p = *i; >+ if (history.has(p)) >+ continue; >+ markWord(p); >+ >+ /* >+ * If adding the address to the hash table fails because we are >+ * out of memory, stack scanning slows down but is otherwise >+ * unaffected. >+ */ >+ history.put(p); Mild pref to test !history.has(p) and indent the prefetch-predicted-taken path instead of using continue. Unless the history.has(p) true result is the common case? JS_LIKELY could be helpful here, based on measurements. I see no stats, but it seems worth metering when history saves a redundant conservative mark. >+ /* >+ * For now we use the conservative stack scanner only in the check mode s/the check mode/as a sanity check,/ >+ * and mark conservatively after marking all other roots to detect s/and mark/so we mark/ Great to see this arrive. r=me with above dealt with. /be
Attachment #450829 - Flags: review?(brendan) → review+
(In reply to comment #216) > Mild pref to test !history.has(p) and indent the prefetch-predicted-taken path > instead of using continue. Unless the history.has(p) true result is the common > case? JS_LIKELY could be helpful here, based on measurements. I see no stats, > but it seems worth metering when history saves a redundant conservative mark. From the stats from comment 195 that are typical: number of stack words: 5549 number of unique words: 2445 This tells that history.has(p) in the loop would be more frequent. On the other hand JS_LIKELY should be avoided as history.has(p) would be true just slightly more often. > > >+ /* > >+ * For now we use the conservative stack scanner only in the check mode > > s/the check mode/as a sanity check,/ > > >+ * and mark conservatively after marking all other roots to detect > > s/and mark/so we mark/ > > Great to see this arrive. r=me with above dealt with. > > /be
Whiteboard: fixed-in-tracemonkey
Need a separate bug to enable this and let people start counting on it? /be
Blocks: 572057
Blocks: 572061
Blocks: 572063
(In reply to comment #219) > Need a separate bug to enable this and let people start counting on it? > > /be Lets wait some weeks to know for sure that we do not have some hard ti fix leaks.
We don't have the luxury of weeks. Gotta be fast now =)
http://hg.mozilla.org/tracemonkey/rev/210562a030b7 - followup to fix the compiler warning (and the bug ;) ) in testIsAboutToBeFinalized.
(In reply to comment #221) > We don't have the luxury of weeks. Gotta be fast now =) We can always remove the code. But for now I want to be sure that there are no request model violations. Those could disable the conservative GC. Now, the extra checks that the conservative GC has added are only implemented as asserts. But perhaps we should enable the corresponding asserts in the optimized builds as well to get better test coverage with nighties. They should not affect the performance.
Good idea enabling the asserts in product build nightlies. I am all for it. Extra coverage + feedback.
Depends on: 572077
http://hg.mozilla.org/tracemonkey/rev/43a1993e3369 - follow up to add few more asserts
Depends on: 572678
Depends on: 572839
Blocks: 574313
blocking2.0: beta1+ → beta2+
I am making bug 552266 to block this. With the changes there it is much easy to track and fix the request model violations.
Depends on: 552266
Depends on: 578233
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 580578
Depends on: 581919
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: