Closed Bug 469044 Opened 16 years ago Closed 16 years ago

TM: Support code with type-unstable globals

Categories

(Core :: JavaScript Engine, defect, P1)

x86
macOS
defect

Tracking


VERIFIED FIXED

People

(Reporter: gal, Assigned: gal)

References

Details

(Keywords: verified1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 26 obsolete files)

(deleted), patch, gal: review+
No description provided.
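The bug title refers to loops whose global variables change type while the code is hot. A minimal illustrative sketch of the pattern (my own example, not taken from the bug or its test suite):

var g = 1;                        // global slot starts out as an integer
for (var i = 0; i < 1000; ++i) {
    if (i == 500)
        g = 0.5;                  // the same global slot now holds a double
    g = g + 1;
}

Before this patch a global type map mismatch like this flushed the whole JIT cache (the global type map lived on the trace monitor); the patch moves the global types into each tree's own type map so such code can keep its compiled traces.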
Attached patch for review, untested (obsolete) (deleted) — Splinter Review
Assignee: general → gal
Attached patch v2, refreshed + fixes (obsolete) (deleted) — Splinter Review
runs trace.js now
Attachment #352454 - Attachment is obsolete: true
Attached patch v3, more fixes (obsolete) (deleted) — Splinter Review
Fixed a bunch of allocation and typemap-offset bugs and cleaned up some code. We pass a lot more trace-tests now, but still hit an assertion about halfway through.
Attachment #352607 - Attachment is obsolete: true
Attached patch v3 for real (obsolete) (deleted) — Splinter Review
helps to attach the right file.
Attachment #352644 - Attachment is obsolete: true
Blocks: 465915
Assignee: gal → danderson
Attached patch v4, more tests pass (obsolete) (deleted) — Splinter Review
More tests pass now, but we hit an assertion after the andTests:

> Assertion failure: JSVAL_TAG(v) == JSVAL_STRING, at ../jstracer.cpp:1417

This patch also cleans up TreeInfo to have wrappers around the type map members.
Attachment #352645 - Attachment is obsolete: true
Attached patch v5, works (obsolete) (deleted) — Splinter Review
Passes trace-tests after fixing a few remaining bugs. I will do performance testing shortly; a cursory run of fannkuch (my arch nemesis) shows that the number of side exits grew, which might be explained by the change in demotion logic.
Attachment #354872 - Attachment is obsolete: true
Comment on attachment 354911 [details] [diff] [review] v5, works >diff -r d0e8862aa513 js/src/jscntxt.h >--- a/js/src/jscntxt.h Mon Dec 29 23:22:23 2008 -0800 >+++ b/js/src/jscntxt.h Tue Dec 30 22:42:23 2008 -0500 >@@ -101,7 +101,6 @@ > class TraceRecorder; > extern "C++" { template<typename T> class Queue; } > typedef Queue<uint16> SlotList; >-class TypeMap; > > # define CLS(T) T* > #else >@@ -125,7 +124,6 @@ > CLS(TraceRecorder) recorder; > uint32 globalShape; > CLS(SlotList) globalSlots; >- CLS(TypeMap) globalTypeMap; > jsval *recoveryDoublePool; > jsval *recoveryDoublePoolPtr; > >diff -r d0e8862aa513 js/src/jstracer.cpp >--- a/js/src/jstracer.cpp Mon Dec 29 23:22:23 2008 -0800 >+++ b/js/src/jstracer.cpp Tue Dec 30 22:42:23 2008 -0500 >@@ -113,12 +113,6 @@ > > /* Max number of branches per tree. */ > #define MAX_BRANCHES 16 >- >-/* Macros for demote slot lists */ >-#define ALLOCA_UNDEMOTE_SLOTLIST(num) (unsigned*)alloca(((num) + 1) * sizeof(unsigned)) >-#define ADD_UNDEMOTE_SLOT(list, slot) list[++list[0]] = slot >-#define NUM_UNDEMOTE_SLOTS(list) list[0] >-#define CLEAR_UNDEMOTE_SLOTLIST(list) list[0] = 0 > > #ifdef JS_JIT_SPEW > #define ABORT_TRACE(msg) do { debug_only_v(fprintf(stdout, "abort: %d: %s\n", __LINE__, msg);) return false; } while (0) >@@ -906,8 +900,8 @@ > > #define FORALL_SLOTS(cx, ngslots, gslots, callDepth, code) \ > JS_BEGIN_MACRO \ >+ FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, code); \ > FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, code); \ >- FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, code); \ > JS_END_MACRO > > /* Calculate the total number of native frame slots we need from this frame >@@ -944,14 +938,49 @@ > JS_NOT_REACHED("js_NativeStackSlots"); > } > >-/* Capture the type map for the selected slots of the global object. */ >-JS_REQUIRES_STACK void >-TypeMap::captureGlobalTypes(JSContext* cx, SlotList& slots) >+/* >+ * Capture the type map for the selected slots of the global object and currently pending >+ * stack frames. 
>+ */ >+JS_REQUIRES_STACK void >+TypeMap::captureTypes(JSContext* cx, SlotList& slots, unsigned callDepth) > { > unsigned ngslots = slots.length(); > uint16* gslots = slots.data(); >- setLength(ngslots); >+ setLength(js_NativeStackSlots(cx, callDepth) + ngslots); > uint8* map = data(); >+ uint8* m = map; >+ FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, >+ uint8 type = getCoercedType(*vp); >+ if ((type == JSVAL_INT) && oracle.isStackSlotUndemotable(cx, unsigned(m - map))) >+ type = JSVAL_DOUBLE; >+ JS_ASSERT(type != JSVAL_BOXED); >+ debug_only_v(printf("capture stack type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);) >+ JS_ASSERT(uintptr_t(m - map) < length()); >+ *m++ = type; >+ ); >+ FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, >+ uint8 type = getCoercedType(*vp); >+ if ((type == JSVAL_INT) && oracle.isGlobalSlotUndemotable(cx, gslots[n])) >+ type = JSVAL_DOUBLE; >+ JS_ASSERT(type != JSVAL_BOXED); >+ debug_only_v(printf("capture global type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);) >+ JS_ASSERT(uintptr_t(m - map) < length()); >+ *m++ = type; >+ ); >+ JS_ASSERT(uintptr_t(m - map) == length()); >+} >+ >+JS_REQUIRES_STACK void >+TypeMap::captureMissingGlobalTypes(JSContext* cx, SlotList& slots, unsigned stackSlots) >+{ >+ unsigned oldSlots = length() - stackSlots; >+ int diff = slots.length() - oldSlots; >+ JS_ASSERT(diff >= 0); >+ unsigned ngslots = slots.length(); >+ uint16* gslots = slots.data(); >+ setLength(length() + diff); >+ uint8* map = data() + stackSlots; > uint8* m = map; > FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, > uint8 type = getCoercedType(*vp); >@@ -959,25 +988,9 @@ > type = JSVAL_DOUBLE; > JS_ASSERT(type != JSVAL_BOXED); > debug_only_v(printf("capture global type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);) >- *m++ = type; >- ); >-} >- >-/* Capture the type map for the currently pending stack frames. 
*/ >-JS_REQUIRES_STACK void >-TypeMap::captureStackTypes(JSContext* cx, unsigned callDepth) >-{ >- setLength(js_NativeStackSlots(cx, callDepth)); >- uint8* map = data(); >- uint8* m = map; >- FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, >- uint8 type = getCoercedType(*vp); >- if ((type == JSVAL_INT) && >- oracle.isStackSlotUndemotable(cx, unsigned(m - map))) { >- type = JSVAL_DOUBLE; >- } >- debug_only_v(printf("capture stack type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);) >- *m++ = type; >+ *m = type; >+ JS_ASSERT((m > map + oldSlots) || (*m == type)); >+ m++; > ); > } > >@@ -1008,7 +1021,7 @@ > > JS_REQUIRES_STACK > TraceRecorder::TraceRecorder(JSContext* cx, VMSideExit* _anchor, Fragment* _fragment, >- TreeInfo* ti, unsigned ngslots, uint8* globalTypeMap, uint8* stackTypeMap, >+ TreeInfo* ti, unsigned stackSlots, unsigned ngslots, uint8* typeMap, > VMSideExit* innermostNestedGuard, Fragment* outerToBlacklist) > { > JS_ASSERT(!_fragment->vmprivate && ti); >@@ -1061,7 +1074,7 @@ > eor_ins = addName(lir->insLoad(LIR_ldp, lirbuf->state, offsetof(InterpState, eor)), "eor"); > > /* read into registers all values on the stack and all globals we know so far */ >- import(treeInfo, lirbuf->sp, ngslots, callDepth, globalTypeMap, stackTypeMap); >+ import(treeInfo, lirbuf->sp, stackSlots, ngslots, callDepth, typeMap); > > #if defined(JS_HAS_OPERATION_COUNT) && !JS_HAS_OPERATION_COUNT > if (fragment == fragment->root) { >@@ -1588,8 +1601,8 @@ > } > > JS_REQUIRES_STACK void >-TraceRecorder::import(TreeInfo* treeInfo, LIns* sp, unsigned ngslots, unsigned callDepth, >- uint8* globalTypeMap, uint8* stackTypeMap) >+TraceRecorder::import(TreeInfo* treeInfo, LIns* sp, unsigned stackSlots, unsigned ngslots, >+ unsigned callDepth, uint8* typeMap) > { > /* If we get a partial list that doesn't have all the types (i.e. recording from a side > exit that was recorded but we added more global slots later), merge the missing types >@@ -1602,11 +1615,14 @@ > map. Since thats exactly what we used to fill in the types our current side exit > didn't provide, this is always safe to do. */ > unsigned length; >- if (ngslots < (length = traceMonitor->globalTypeMap->length())) >- mergeTypeMaps(&globalTypeMap, &ngslots, >- traceMonitor->globalTypeMap->data(), length, >+ /* This is potentially the typemap of the side exit and thus shorter than the tree's >+ global type map. 
*/ >+ uint8* globalTypeMap = typeMap + stackSlots; >+ if (ngslots < (length = treeInfo->globalSlots())) >+ mergeTypeMaps(&globalTypeMap/*out param*/, &ngslots/*out param*/, >+ treeInfo->globalTypeMap(), length, > (uint8*)alloca(sizeof(uint8) * length)); >- JS_ASSERT(ngslots == traceMonitor->globalTypeMap->length()); >+ JS_ASSERT(ngslots == treeInfo->globalSlots()); > > /* the first time we compile a tree this will be empty as we add entries lazily */ > uint16* gslots = traceMonitor->globalSlots->data(); >@@ -1616,7 +1632,7 @@ > m++; > ); > ptrdiff_t offset = -treeInfo->nativeStackBase; >- m = stackTypeMap; >+ m = typeMap; > FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, > import(sp, offset, vp, *m, vpname, vpnum, fp); > m++; offset += sizeof(double); >@@ -1641,7 +1657,7 @@ > uint8 type = getCoercedType(*vp); > if ((type == JSVAL_INT) && oracle.isGlobalSlotUndemotable(cx, slot)) > type = JSVAL_DOUBLE; >- traceMonitor->globalTypeMap->add(type); >+ treeInfo->typeMap.add(type); > import(gp_ins, slot*sizeof(double), vp, type, "global", index, NULL); > return true; > } >@@ -1769,13 +1785,14 @@ > /* Promote slots if necessary to match the called tree' type map and report error if thats > impossible. */ > JS_REQUIRES_STACK bool >-TraceRecorder::adjustCallerTypes(Fragment* f, unsigned* demote_slots, bool& trash) >+TraceRecorder::adjustCallerTypes(Fragment* f, bool& trash) > { > JSTraceMonitor* tm = traceMonitor; >- uint8* m = tm->globalTypeMap->data(); >- uint16* gslots = traceMonitor->globalSlots->data(); >- unsigned ngslots = traceMonitor->globalSlots->length(); >- uint8* map = ((TreeInfo*)f->vmprivate)->stackTypeMap.data(); >+ uint8* m = treeInfo->globalTypeMap(); >+ uint16* gslots = tm->globalSlots->data(); >+ unsigned ngslots = tm->globalSlots->length(); >+ JS_ASSERT(ngslots == treeInfo->globalSlots()); >+ uint8* map = ((TreeInfo*)f->vmprivate)->typeMap.data(); > bool ok = true; > trash = false; > FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, >@@ -1798,22 +1815,17 @@ > lir->insStorei(get(vp), lirbuf->sp, > -treeInfo->nativeStackBase + nativeStackOffset(vp)); > /* Aggressively undo speculation so the inner tree will compile if this fails. */ >- ADD_UNDEMOTE_SLOT(demote_slots, unsigned(m - map)); >+ oracle.markStackSlotUndemotable(cx, unsigned(m - map)); > } else if (!isPromote && *m == JSVAL_INT) { > debug_only_v(printf("adjusting will fail, %s%d, slot %d\n", vpname, vpnum, m - map);) > ok = false; >- ADD_UNDEMOTE_SLOT(demote_slots, unsigned(m - map)); >+ oracle.markStackSlotUndemotable(cx, unsigned(m - map)); > } else if (JSVAL_IS_INT(*vp) && *m == JSVAL_DOUBLE) { > /* Aggressively undo speculation so the inner tree will compile if this fails. */ >- ADD_UNDEMOTE_SLOT(demote_slots, unsigned(m - map)); >+ oracle.markStackSlotUndemotable(cx, unsigned(m - map)); > } > ++m; > ); >- /* If this isn't okay, tell the oracle. */ >- if (!ok) { >- for (unsigned i = 1; i <= NUM_UNDEMOTE_SLOTS(demote_slots); i++) >- oracle.markStackSlotUndemotable(cx, demote_slots[i]); >- } > JS_ASSERT(f == f->root); > return ok; > } >@@ -1925,7 +1937,7 @@ > the top of the stack is a boxed value. Either pc[-cs.length] is JSOP_NEXTITER and we > want one below top of stack, or else it's JSOP_CALL and we want top of stack. */ > if (resumeAfter) { >- m[(pc[-cs.length] == JSOP_NEXTITER) ? -2 : -1] = JSVAL_BOXED; >+ typemap[stackSlots + ((pc[-cs.length] == JSOP_NEXTITER) ? -2 : -1)] = JSVAL_BOXED; > > /* Now restore the the original pc (after which early returns are ok). 
*/ > MUST_FLOW_LABEL(restore_pc); >@@ -1949,7 +1961,7 @@ > for (unsigned n = 0; n < nexits; ++n) { > VMSideExit* e = exits[n]; > if (e->ip_adj == ip_adj && >- !memcmp(getTypeMap(exits[n]), typemap, typemap_size)) { >+ !memcmp(getFullTypeMap(exits[n]), typemap, typemap_size)) { > LIns* data = lir_buf_writer->skip(sizeof(GuardRecord)); > GuardRecord* rec = (GuardRecord*)data->payload(); > /* setup guard record structure with shared side exit */ >@@ -1987,7 +1999,7 @@ > exit->ip_adj = ip_adj; > exit->sp_adj = (stackSlots * sizeof(double)) - treeInfo->nativeStackBase; > exit->rp_adj = exit->calldepth * sizeof(FrameInfo); >- memcpy(getTypeMap(exit), typemap, typemap_size); >+ memcpy(getFullTypeMap(exit), typemap, typemap_size); > > /* BIG FAT WARNING: If compilation fails, we currently don't reset the lirbuf so its safe > to keep references to the side exits here. If we ever start rewinding those lirbufs, >@@ -2088,85 +2100,80 @@ > * @param root_peer First fragment in peer list. > * @param stable_peer Outparam for first type stable peer. > * @param trash Whether to trash the tree (demotion). >- * @param demotes Array to store demotable stack slots. >+ * @param demote Tree if stability was achieved through demotion. True not tree. > * @return True if type stable, false otherwise. > */ > JS_REQUIRES_STACK bool >-TraceRecorder::deduceTypeStability(Fragment* root_peer, Fragment** stable_peer, unsigned* demotes) >+TraceRecorder::deduceTypeStability(Fragment* root_peer, Fragment** stable_peer, bool& demote) > { > uint8* m; > uint8* typemap; > unsigned ngslots = traceMonitor->globalSlots->length(); > uint16* gslots = traceMonitor->globalSlots->data(); >- JS_ASSERT(traceMonitor->globalTypeMap->length() == ngslots); >+ JS_ASSERT(ngslots == treeInfo->globalSlots()); > > if (stable_peer) > *stable_peer = NULL; > >- CLEAR_UNDEMOTE_SLOTLIST(demotes); >- >+ demote = false; >+ > /* > * Rather than calculate all of this stuff twice, it gets cached locally. The "stage" buffers > * are for calls to set() that will change the exit types. > */ > bool success; >- bool unstable_from_undemotes; > unsigned stage_count; >- jsval** stage_vals = (jsval**)alloca(sizeof(jsval*) * (ngslots + treeInfo->stackTypeMap.length())); >- LIns** stage_ins = (LIns**)alloca(sizeof(LIns*) * (ngslots + treeInfo->stackTypeMap.length())); >+ jsval** stage_vals = (jsval**)alloca(sizeof(jsval*) * (treeInfo->typeMap.length())); >+ LIns** stage_ins = (LIns**)alloca(sizeof(LIns*) * (treeInfo->typeMap.length())); > > /* First run through and see if we can close ourselves - best case! */ > stage_count = 0; > success = false; >- unstable_from_undemotes = false; > > debug_only_v(printf("Checking type stability against self=%p\n", fragment);) > >- m = typemap = traceMonitor->globalTypeMap->data(); >+ m = typemap = treeInfo->globalTypeMap(); > FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, > debug_only_v(printf("%s%d ", vpname, vpnum);) > if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) { > /* If the failure was an int->double, tell the oracle. 
*/ >- if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp))) >+ if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp))) { > oracle.markGlobalSlotUndemotable(cx, gslots[n]); >- trashSelf = true; >- goto checktype_fail_1; >+ demote = true; >+ } else { >+ goto checktype_fail_1; >+ } > } > ++m; > ); >- m = typemap = treeInfo->stackTypeMap.data(); >+ m = typemap = treeInfo->stackTypeMap(); > FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0, > debug_only_v(printf("%s%d ", vpname, vpnum);) > if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) { >- if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp))) >- ADD_UNDEMOTE_SLOT(demotes, unsigned(m - typemap)); >- else >+ if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp))) { >+ oracle.markStackSlotUndemotable(cx, unsigned(m - typemap)); >+ demote = true; >+ } else { > goto checktype_fail_1; >+ } > } > ++m; > ); > >- /* >- * If there's an exit that's unstable because of undemotable slots, we want to search for >- * peers just in case we can make a connection. >- */ >- if (NUM_UNDEMOTE_SLOTS(demotes)) >- unstable_from_undemotes = true; >- else >- success = true; >+ success = true; > > checktype_fail_1: > /* If we got a success and we don't need to recompile, we should just close here. */ >- if (success) { >+ if (success && !demote) { > for (unsigned i = 0; i < stage_count; i++) > set(stage_vals[i], stage_ins[i]); > return true; > /* If we need to trash, don't bother checking peers. */ > } else if (trashSelf) { > return false; >- } else { >- CLEAR_UNDEMOTE_SLOTLIST(demotes); >- } >+ } >+ >+ demote = false; > > /* At this point the tree is about to be incomplete, so let's see if we can connect to any > * peer fragment that is type stable. >@@ -2179,17 +2186,25 @@ > continue; > ti = (TreeInfo*)f->vmprivate; > /* Don't allow varying stack depths */ >- if (ti->stackTypeMap.length() != treeInfo->stackTypeMap.length()) >+ if ((ti->stackSlots != treeInfo->stackSlots) || >+ (ti->typeMap.length() != treeInfo->typeMap.length())) > continue; > stage_count = 0; > success = false; >- m = ti->stackTypeMap.data(); >- >+ >+ m = ti->globalTypeMap(); >+ FORALL_GLOBAL_SLOTS(cx, traceMonitor->globalSlots->length(), traceMonitor->globalSlots->data(), >+ if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) >+ goto checktype_fail_2; >+ ++m; >+ ); >+ >+ m = ti->stackTypeMap(); > FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0, >- if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) >- goto checktype_fail_2; >- ++m; >- ); >+ if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) >+ goto checktype_fail_2; >+ ++m; >+ ); > > success = true; > >@@ -2203,33 +2218,49 @@ > set(stage_vals[i], stage_ins[i]); > if (stable_peer) > *stable_peer = f; >- return false; >- } >- } >- >- JS_ASSERT(NUM_UNDEMOTE_SLOTS(demotes) == 0); >+ demote = false; >+ return false; >+ } >+ } > > /* > * If this is a loop trace and it would be stable with demotions, build an undemote list > * and return true. Our caller should sniff this and trash the tree, recording a new one > * that will assumedly stabilize. 
> */ >- if (unstable_from_undemotes && fragment->kind == LoopTrace) { >- typemap = m = treeInfo->stackTypeMap.data(); >+ if (demote && fragment->kind == LoopTrace) { >+ typemap = m = treeInfo->globalTypeMap(); >+ FORALL_GLOBAL_SLOTS(cx, traceMonitor->globalSlots->length(), traceMonitor->globalSlots->data(), >+ if (*m == JSVAL_INT) { >+ JS_ASSERT(isNumber(*vp)); >+ if (!isPromoteInt(get(vp))) >+ oracle.markGlobalSlotUndemotable(cx, gslots[n]); >+ } else if (*m == JSVAL_DOUBLE) { >+ JS_ASSERT(isNumber(*vp)); >+ oracle.markGlobalSlotUndemotable(cx, gslots[n]); >+ } else { >+ JS_ASSERT(*m == JSVAL_TAG(*vp)); >+ } >+ m++; >+ ); >+ >+ typemap = m = treeInfo->stackTypeMap(); > FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0, > if (*m == JSVAL_INT) { > JS_ASSERT(isNumber(*vp)); > if (!isPromoteInt(get(vp))) >- ADD_UNDEMOTE_SLOT(demotes, unsigned(m - typemap)); >+ oracle.markStackSlotUndemotable(cx, unsigned(m - typemap)); > } else if (*m == JSVAL_DOUBLE) { > JS_ASSERT(isNumber(*vp)); >- ADD_UNDEMOTE_SLOT(demotes, unsigned(m - typemap)); >+ oracle.markStackSlotUndemotable(cx, unsigned(m - typemap)); > } else { > JS_ASSERT((*m == JSVAL_TNULL) ? JSVAL_IS_NULL(*vp) : *m == JSVAL_TAG(*vp)); > } > m++; > ); > return true; >+ } else { >+ demote = false; > } > > return false; >@@ -2283,11 +2314,11 @@ > js_JoinPeersIfCompatible(Fragmento* frago, Fragment* stableFrag, TreeInfo* stableTree, > VMSideExit* exit) > { >- JS_ASSERT(exit->numStackSlots == stableTree->stackTypeMap.length()); >+ JS_ASSERT(exit->numStackSlots == stableTree->stackSlots); >+ > /* Must have a matching type unstable exit. */ >- if (memcmp(getTypeMap(exit) + exit->numGlobalSlots, >- stableTree->stackTypeMap.data(), >- stableTree->stackTypeMap.length()) != 0) { >+ if ((exit->numGlobalSlots + exit->numStackSlots != stableTree->typeMap.length()) || >+ memcmp(getFullTypeMap(exit), stableTree->typeMap.data(), stableTree->typeMap.length())) { > return false; > } > >@@ -2301,7 +2332,7 @@ > > /* Complete and compile a trace and link it to the existing tree if appropriate. */ > JS_REQUIRES_STACK bool >-TraceRecorder::closeLoop(Fragmento* fragmento, bool& demote, unsigned *demotes) >+TraceRecorder::closeLoop(Fragmento* fragmento, bool& demote) > { > bool stable; > LIns* exitIns; >@@ -2309,8 +2340,6 @@ > VMSideExit* exit; > Fragment* peer_root; > >- demote = false; >- > exitIns = snapshot(UNSTABLE_LOOP_EXIT); > exit = (VMSideExit*)((GuardRecord*)exitIns->payload())->exit; > >@@ -2321,14 +2350,14 @@ > return false; > } > >- JS_ASSERT(exit->numStackSlots == treeInfo->stackTypeMap.length()); >+ JS_ASSERT(exit->numStackSlots == treeInfo->stackSlots); > > peer_root = fragmento->getLoop(fragment->root->ip); > JS_ASSERT(peer_root != NULL); >- stable = deduceTypeStability(peer_root, &peer, demotes); >+ stable = deduceTypeStability(peer_root, &peer, demote); > > #if DEBUG >- if (!stable || NUM_UNDEMOTE_SLOTS(demotes)) >+ if (!stable) > AUDIT(unstableLoopVariable); > #endif > >@@ -2337,9 +2366,8 @@ > return false; > } > >- if (stable && NUM_UNDEMOTE_SLOTS(demotes)) { >+ if (stable && demote) { > JS_ASSERT(fragment->kind == LoopTrace); >- demote = true; > return false; > } > >@@ -2370,10 +2398,6 @@ > */ > if (walkedOutOfLoop()) > exit->ip_adj = terminate_ip_adj; >- >- /* If we were trying to stabilize a promotable tree, trash it. 
*/ >- if (promotedPeer) >- js_TrashTree(cx, promotedPeer); > } else { > JS_ASSERT(peer->code()); > exit->target = peer; >@@ -2405,12 +2429,14 @@ > TraceRecorder::joinEdgesToEntry(Fragmento* fragmento, Fragment* peer_root) > { > if (fragment->kind == LoopTrace) { >+ JSTraceMonitor* tm = traceMonitor; > TreeInfo* ti; > Fragment* peer; > uint8* t1, *t2; > UnstableExit* uexit, **unext; >- >- unsigned* demotes = (unsigned*)alloca(treeInfo->stackTypeMap.length() * sizeof(unsigned)); >+ uint32* stackDemotes = (uint32*)alloca(sizeof(uint32) * treeInfo->stackSlots); >+ uint32* globalDemotes = (uint32*)alloca(sizeof(uint32) * treeInfo->globalSlots()); >+ > for (peer = peer_root; peer != NULL; peer = peer->peer) { > if (!peer->code()) > continue; >@@ -2428,20 +2454,34 @@ > This is actually faster than trashing the original tree as soon as the > instability is detected, since we could have compiled a fairly stable > tree that ran faster with integers. */ >- unsigned count = 0; >- t1 = treeInfo->stackTypeMap.data(); >- t2 = getTypeMap(uexit->exit) + uexit->exit->numGlobalSlots; >+ unsigned stackCount = 0; >+ unsigned globalCount = 0; >+ t1 = treeInfo->stackTypeMap(); >+ t2 = getStackTypeMap(uexit->exit); > for (unsigned i = 0; i < uexit->exit->numStackSlots; i++) { > if (t2[i] == JSVAL_INT && t1[i] == JSVAL_DOUBLE) { >- demotes[count++] = i; >+ stackDemotes[stackCount++] = i; > } else if (t2[i] != t1[i]) { >- count = 0; >+ stackCount = 0; > break; > } > } >- if (count) { >- for (unsigned i = 0; i < count; i++) >- oracle.markStackSlotUndemotable(cx, demotes[i]); >+ t1 = treeInfo->globalTypeMap(); >+ t2 = getGlobalTypeMap(uexit->exit); >+ for (unsigned i = 0; i < uexit->exit->numGlobalSlots; i++) { >+ if (t2[i] == JSVAL_INT && t1[i] == JSVAL_DOUBLE) { >+ globalDemotes[globalCount++] = i; >+ } else if (t2[i] != t1[i]) { >+ globalCount = 0; >+ stackCount = 0; >+ break; >+ } >+ } >+ if (stackCount || globalCount) { >+ for (unsigned i = 0; i < stackCount; i++) >+ oracle.markStackSlotUndemotable(cx, stackDemotes[i]); >+ for (unsigned i = 0; i < globalCount; i++) >+ oracle.markGlobalSlotUndemotable(cx, tm->globalSlots->data()[globalDemotes[i]]); > JS_ASSERT(peer == uexit->fragment->root); > if (fragment == peer) > trashSelf = true; >@@ -2541,8 +2581,8 @@ > LIns* args[] = { INS_CONSTPTR(inner), lirbuf->state }; /* reverse order */ > LIns* ret = lir->insCall(&js_CallTree_ci, args); > /* Read back all registers, in case the called tree changed any of them. */ >- import(ti, inner_sp_ins, exit->numGlobalSlots, exit->calldepth, >- getTypeMap(exit), getTypeMap(exit) + exit->numGlobalSlots); >+ import(ti, inner_sp_ins, exit->numStackSlots, exit->numGlobalSlots, >+ exit->calldepth, getFullTypeMap(exit)); > /* Restore sp and rp to their original values (we still have them in a register). 
*/ > if (callDepth > 0) { > lir->insStorei(lirbuf->sp, lirbuf->state, offsetof(InterpState, sp)); >@@ -2736,7 +2776,7 @@ > > static JS_REQUIRES_STACK bool > js_StartRecorder(JSContext* cx, VMSideExit* anchor, Fragment* f, TreeInfo* ti, >- unsigned ngslots, uint8* globalTypeMap, uint8* stackTypeMap, >+ unsigned stackSlots, unsigned ngslots, uint8* typeMap, > VMSideExit* expectedInnerExit, Fragment* outer) > { > JSTraceMonitor* tm = &JS_TRACE_MONITOR(cx); >@@ -2752,7 +2792,7 @@ > > /* start recording if no exception during construction */ > tm->recorder = new (&gc) TraceRecorder(cx, anchor, f, ti, >- ngslots, globalTypeMap, stackTypeMap, >+ stackSlots, ngslots, typeMap, > expectedInnerExit, outer); > if (cx->throwing) { > js_AbortRecording(cx, "setting up recorder failed"); >@@ -2973,7 +3013,7 @@ > #endif > > JS_REQUIRES_STACK bool >-js_RecordTree(JSContext* cx, JSTraceMonitor* tm, Fragment* f, Fragment* outer, unsigned* demotes) >+js_RecordTree(JSContext* cx, JSTraceMonitor* tm, Fragment* f, Fragment* outer) > { > JS_ASSERT(f->root == f); > >@@ -2986,17 +3026,6 @@ > /* Make sure the global type map didn't change on us. */ > JSObject* globalObj = JS_GetGlobalForObject(cx, cx->fp->scopeChain); > if (!js_CheckGlobalObjectShape(cx, tm, globalObj)) { >- js_FlushJITCache(cx); >- return false; >- } >- TypeMap current; >- current.captureGlobalTypes(cx, *tm->globalSlots); >- if (!current.matches(*tm->globalTypeMap)) { >- debug_only_v(printf("Global type map mismatch in RecordTree, flushing cache.\n");) >- debug_only_v(printf("Current global type map:\n");) >- debug_only_v(js_dumpMap(current)); >- debug_only_v(printf("Cached global type map:\n");) >- debug_only_v(js_dumpMap(*tm->globalTypeMap)); > js_FlushJITCache(cx); > return false; > } >@@ -3030,18 +3059,10 @@ > /* setup the VM-private treeInfo structure for this fragment */ > TreeInfo* ti = new (&gc) TreeInfo(f); > >- /* capture the coerced type of each active slot in the stack type map */ >- ti->stackTypeMap.captureStackTypes(cx, 0/*callDepth*/); >- >- if (demotes) { >- /* If we get a list of demotions, an outer tree is telling us our types are not callable. */ >- uint8* typeMap = ti->stackTypeMap.data(); >- for (unsigned i = 1; i <= NUM_UNDEMOTE_SLOTS(demotes); i++) { >- JS_ASSERT(demotes[i] < ti->stackTypeMap.length()); >- if (typeMap[demotes[i]] == JSVAL_INT) >- typeMap[demotes[i]] = JSVAL_DOUBLE; >- } >- } >+ /* capture the coerced type of each active slot in the type map */ >+ SlotList& globalSlots = *tm->globalSlots; >+ ti->typeMap.captureTypes(cx, globalSlots, 0/*callDepth*/); >+ ti->stackSlots = ti->typeMap.length() - globalSlots.length(); > > /* Check for duplicate entry type maps. This is always wrong and hints at trace explosion > since we are trying to stabilize something without properly connecting peer edges. 
*/ >@@ -3052,12 +3073,12 @@ > continue; > ti_other = (TreeInfo*)peer->vmprivate; > JS_ASSERT(ti_other); >- JS_ASSERT(!ti->stackTypeMap.matches(ti_other->stackTypeMap)); >+ JS_ASSERT(!ti->typeMap.matches(ti_other->typeMap)); > } > #endif > > /* determine the native frame layout at the entry point */ >- unsigned entryNativeStackSlots = ti->stackTypeMap.length(); >+ unsigned entryNativeStackSlots = ti->stackSlots; > JS_ASSERT(entryNativeStackSlots == js_NativeStackSlots(cx, 0/*callDepth*/)); > ti->nativeStackBase = (entryNativeStackSlots - > (cx->fp->regs->sp - StackBase(cx->fp))) * sizeof(double); >@@ -3067,8 +3088,9 @@ > > /* recording primary trace */ > if (!js_StartRecorder(cx, NULL, f, ti, >- tm->globalSlots->length(), tm->globalTypeMap->data(), >- ti->stackTypeMap.data(), NULL, outer)) { >+ ti->stackSlots, >+ tm->globalSlots->length(), >+ ti->typeMap.data(), NULL, outer)) { > return false; > } > >@@ -3080,28 +3102,22 @@ > { > JSTraceMonitor* tm = &JS_TRACE_MONITOR(cx); > Fragment* from = exit->from->root; >- unsigned* demotes; > > JS_ASSERT(exit->from->root->code()); >- >- demotes = ALLOCA_UNDEMOTE_SLOTLIST(exit->numStackSlots); >- CLEAR_UNDEMOTE_SLOTLIST(demotes); >- >- uint8* t2 = getTypeMap(exit) + exit->numGlobalSlots; >+ >+ /* Make sure any doubles are not accidentally undemoted */ >+ uint8* m = getStackTypeMap(exit); > for (unsigned i = 0; i < exit->numStackSlots; i++) { >- if (t2[i] == JSVAL_DOUBLE) >- ADD_UNDEMOTE_SLOT(demotes, i); >- } >- >- if (!NUM_UNDEMOTE_SLOTS(demotes)) >- demotes = NULL; >- >- if (!js_RecordTree(cx, tm, from->first, outer, demotes)) >- return false; >- >- tm->recorder->setPromotedPeer(demotes ? from : NULL); >- >- return true; >+ if (m[i] == JSVAL_DOUBLE) >+ oracle.markStackSlotUndemotable(cx, i); >+ } >+ m = getGlobalTypeMap(exit); >+ for (unsigned i = 0; i < exit->numGlobalSlots; i++) { >+ if (m[i] == JSVAL_DOUBLE) >+ oracle.markGlobalSlotUndemotable(cx, i); >+ } >+ >+ return js_RecordTree(cx, tm, from->first, outer); > } > > static JS_REQUIRES_STACK bool >@@ -3129,16 +3145,16 @@ > if (++c->hits() >= HOTEXIT) { > /* start tracing secondary trace from this point */ > c->lirbuf = f->lirbuf; >+ unsigned stackSlots; > unsigned ngslots; >- uint8* globalTypeMap; >- uint8* stackTypeMap; >+ uint8* typeMap; > TypeMap fullMap; > if (exitedFrom == NULL) { > /* If we are coming straight from a simple side exit, just use that exit's type map > as starting point. */ > ngslots = anchor->numGlobalSlots; >- globalTypeMap = getTypeMap(anchor); >- stackTypeMap = globalTypeMap + ngslots; >+ stackSlots = anchor->numStackSlots; >+ typeMap = getFullTypeMap(anchor); > } else { > /* If we side-exited on a loop exit and continue on a nesting guard, the nesting > guard (anchor) has the type information for everything below the current scope, >@@ -3146,14 +3162,15 @@ > scope (and whatever it inlined). We have to merge those maps here. 
*/ > VMSideExit* e1 = anchor; > VMSideExit* e2 = exitedFrom; >- fullMap.add(getTypeMap(e1) + e1->numGlobalSlots, e1->numStackSlotsBelowCurrentFrame); >- fullMap.add(getTypeMap(e2) + e2->numGlobalSlots, e2->numStackSlots); >+ fullMap.add(getStackTypeMap(e1), e1->numStackSlotsBelowCurrentFrame); >+ fullMap.add(getStackTypeMap(e2), e2->numStackSlots); >+ stackSlots = fullMap.length(); >+ fullMap.add(getGlobalTypeMap(e2), e2->numGlobalSlots); > ngslots = e2->numGlobalSlots; >- globalTypeMap = getTypeMap(e2); >- stackTypeMap = fullMap.data(); >+ typeMap = fullMap.data(); > } >- return js_StartRecorder(cx, anchor, c, (TreeInfo*)f->vmprivate, >- ngslots, globalTypeMap, stackTypeMap, exitedFrom, outer); >+ return js_StartRecorder(cx, anchor, c, (TreeInfo*)f->vmprivate, stackSlots, >+ ngslots, typeMap, exitedFrom, outer); > } > return false; > } >@@ -3185,10 +3202,7 @@ > > bool demote; > Fragment* f = r->getFragment(); >- TreeInfo* ti = r->getTreeInfo(); >- unsigned* demotes = ALLOCA_UNDEMOTE_SLOTLIST(ti->stackTypeMap.length()); >- r->closeLoop(fragmento, demote, demotes); >- JS_ASSERT(!demote || NUM_UNDEMOTE_SLOTS(demotes)); >+ r->closeLoop(fragmento, demote); > js_DeleteRecorder(cx); > > /* >@@ -3196,7 +3210,7 @@ > * compiler again here since we didn't return to the loop header. > */ > if (demote && !walkedOutOfLoop) >- return js_RecordTree(cx, tm, f, NULL, demotes); >+ return js_RecordTree(cx, tm, f, NULL); > return false; > } > >@@ -3248,16 +3262,10 @@ > Fragment* empty; > bool trash = false; > bool success = false; >- unsigned* demotes = NULL; > > f = r->findNestedCompatiblePeer(f, &empty); >- if (f && f->code()) { >- TreeInfo* ti = (TreeInfo*)f->vmprivate; >- /* alloca docs says it lasts out of scopes. */ >- demotes = ALLOCA_UNDEMOTE_SLOTLIST(ti->stackTypeMap.length()); >- CLEAR_UNDEMOTE_SLOTLIST(demotes); >- success = r->adjustCallerTypes(f, demotes, trash); >- } >+ if (f && f->code()) >+ success = r->adjustCallerTypes(f, trash); > > if (!success) { > AUDIT(noCompatInnerTrees); >@@ -3277,7 +3285,7 @@ > if (old->recordAttempts < MAX_MISMATCH) > old->resetHits(); > f = empty ? empty : tm->fragmento->getAnchor(cx->fp->regs->pc); >- return js_RecordTree(cx, tm, f, old, demotes); >+ return js_RecordTree(cx, tm, f, old); > } > > r->prepareTreeCall(f); >@@ -3383,16 +3391,6 @@ > tm = &JS_TRACE_MONITOR(cx); > unsigned int ngslots = tm->globalSlots->length(); > uint16* gslots = tm->globalSlots->data(); >- uint8* m = tm->globalTypeMap->data(); >- >- if (ngslots) { >- FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, >- debug_only_v(printf("%s%d=", vpname, vpnum);) >- if (!js_IsEntryTypeCompatible(vp, m)) >- return NULL; >- m++; >- ); >- } > > /* We keep a maximum tally - we want to select the peer most likely to work so we don't keep > * recording. 
>@@ -3409,11 +3407,15 @@ > > unsigned demotes = 0; > ti = (TreeInfo*)f->vmprivate; >- m = ti->stackTypeMap.data(); > > debug_only_v(printf("checking nested types %p: ", f);) > >- FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0, >+ if (ngslots > ti->globalSlots()) >+ ti->typeMap.captureMissingGlobalTypes(cx, *tm->globalSlots, ti->stackSlots); >+ >+ uint8* m = ti->typeMap.data(); >+ >+ FORALL_SLOTS(cx, ngslots, gslots, 0, > debug_only_v(printf("%s%d=", vpname, vpnum);) > if (!js_IsEntryTypeCompatible(vp, m)) > goto check_fail; >@@ -3423,7 +3425,7 @@ > demotes++; > m++; > ); >- JS_ASSERT(unsigned(m - ti->stackTypeMap.data()) == ti->stackTypeMap.length()); >+ JS_ASSERT(unsigned(m - ti->typeMap.data()) == ti->typeMap.length()); > > debug_only_v(printf(" (demotes %d)\n", demotes);) > >@@ -3463,25 +3465,25 @@ > tm = &JS_TRACE_MONITOR(cx); > unsigned int ngslots = tm->globalSlots->length(); > uint16* gslots = tm->globalSlots->data(); >- uint8* m = tm->globalTypeMap->data(); >- >- if (ngslots) { >- FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, >- debug_only_v(printf("%s%d=", vpname, vpnum);) >- if (!js_IsEntryTypeCompatible(vp, m)) >- goto check_fail; >- m++; >- ); >- } >- >- m = ti->stackTypeMap.data(); >- FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0, >+ >+ JS_ASSERT(ti->stackSlots == js_NativeStackSlots(cx, 0)); >+ >+ if (ngslots > ti->globalSlots()) >+ ti->typeMap.captureMissingGlobalTypes(cx, *tm->globalSlots, ti->stackSlots); >+ >+ uint8* m = ti->typeMap.data(); >+ >+ JS_ASSERT(ti->typeMap.length() == js_NativeStackSlots(cx, 0) + ngslots); >+ JS_ASSERT(ti->typeMap.length() == ti->stackSlots + ngslots); >+ JS_ASSERT(ti->globalSlots() == ngslots); >+ FORALL_SLOTS(cx, ngslots, gslots, 0, > debug_only_v(printf("%s%d=", vpname, vpnum);) >+ JS_ASSERT(*m != 0xCD); > if (!js_IsEntryTypeCompatible(vp, m)) > goto check_fail; > m++; > ); >- JS_ASSERT(unsigned(m - ti->stackTypeMap.data()) == ti->stackTypeMap.length()); >+ JS_ASSERT(unsigned(m - ti->typeMap.data()) == ti->typeMap.length()); > > debug_only_v(printf("\n");) > return true; >@@ -3548,9 +3550,11 @@ > ti->maxNativeStackSlots, > f->code());) > >- if (ngslots) >- BuildNativeGlobalFrame(cx, ngslots, gslots, tm->globalTypeMap->data(), global); >- BuildNativeStackFrame(cx, 0/*callDepth*/, ti->stackTypeMap.data(), stack); >+ JS_ASSERT(ti->globalSlots() == ngslots); >+ >+ if (ngslots) >+ BuildNativeGlobalFrame(cx, ngslots, gslots, ti->globalTypeMap(), global); >+ BuildNativeStackFrame(cx, 0/*callDepth*/, ti->typeMap.data(), stack); > > double* entry_sp = &stack[ti->nativeStackBase/sizeof(double)]; > FrameInfo callstack_buffer[MAX_CALL_STACK_ENTRIES]; >@@ -3717,17 +3721,17 @@ > cycles)); > > /* If this trace is part of a tree, later branches might have added additional globals for >- with we don't have any type information available in the side exit. We merge in this >+ which we don't have any type information available in the side exit. We merge in this > information from the entry type-map. See also comment in the constructor of TraceRecorder > why this is always safe to do. 
*/ > unsigned exit_gslots = innermost->numGlobalSlots; >- JS_ASSERT(ngslots == tm->globalTypeMap->length()); >+ JS_ASSERT(ngslots == ti->globalSlots()); > JS_ASSERT(ngslots >= exit_gslots); >- uint8* globalTypeMap = getTypeMap(innermost); >+ uint8* globalTypeMap = getGlobalTypeMap(innermost); > if (exit_gslots < ngslots) >- mergeTypeMaps(&globalTypeMap, &exit_gslots, tm->globalTypeMap->data(), ngslots, >+ mergeTypeMaps(&globalTypeMap, &exit_gslots, ti->globalTypeMap(), ngslots, > (uint8*)alloca(sizeof(uint8) * ngslots)); >- JS_ASSERT(exit_gslots == tm->globalTypeMap->length()); >+ JS_ASSERT(exit_gslots == ti->globalSlots()); > > /* write back interned globals */ > int slots = FlushNativeGlobalFrame(cx, exit_gslots, gslots, globalTypeMap, global); >@@ -3738,7 +3742,7 @@ > > /* write back native stack frame */ > slots = FlushNativeStackFrame(cx, innermost->calldepth, >- getTypeMap(innermost) + innermost->numGlobalSlots, >+ getStackTypeMap(innermost), > stack, NULL); > if (slots < 0) > return NULL; >@@ -3796,7 +3800,7 @@ > if (++f->hits() >= HOTLOOP) { > /* We can give RecordTree the root peer. If that peer is already taken, it will > walk the peer list and find us a free slot or allocate a new tree if needed. */ >- return js_RecordTree(cx, tm, f->first, NULL, NULL); >+ return js_RecordTree(cx, tm, f->first, NULL); > } > /* Threshold not reached yet. */ > return false; >@@ -4022,12 +4026,11 @@ > } > #endif > if (!tm->fragmento) { >- JS_ASSERT(!tm->globalSlots && !tm->globalTypeMap && !tm->recoveryDoublePool); >+ JS_ASSERT(!tm->globalSlots && !tm->recoveryDoublePool); > Fragmento* fragmento = new (&gc) Fragmento(core, 24); > verbose_only(fragmento->labels = new (&gc) LabelMap(core, NULL);) > tm->fragmento = fragmento; > tm->globalSlots = new (&gc) SlotList(); >- tm->globalTypeMap = new (&gc) TypeMap(); > tm->recoveryDoublePoolPtr = tm->recoveryDoublePool = new jsval[MAX_NATIVE_STACK_SLOTS]; > } > if (!tm->reFragmento) { >@@ -4059,14 +4062,12 @@ > } > #endif > if (tm->fragmento != NULL) { >- JS_ASSERT(tm->globalSlots && tm->globalTypeMap && tm->recoveryDoublePool); >+ JS_ASSERT(tm->globalSlots && tm->recoveryDoublePool); > verbose_only(delete tm->fragmento->labels;) > delete tm->fragmento; > tm->fragmento = NULL; > delete tm->globalSlots; > tm->globalSlots = NULL; >- delete tm->globalTypeMap; >- tm->globalTypeMap = NULL; > delete[] tm->recoveryDoublePool; > tm->recoveryDoublePool = tm->recoveryDoublePoolPtr = NULL; > } >@@ -4133,7 +4134,6 @@ > if (cx->fp) { > tm->globalShape = OBJ_SHAPE(JS_GetGlobalForObject(cx, cx->fp->scopeChain)); > tm->globalSlots->clear(); >- tm->globalTypeMap->clear(); > } > } > >@@ -8612,20 +8612,24 @@ > printf("fragment %p:\nENTRY: ", f); > ti = (TreeInfo*)f->vmprivate; > if (looped) >- JS_ASSERT(ti->stackTypeMap.length() == length); >- for (unsigned i = 0; i < ti->stackTypeMap.length(); i++) >- printf("%d ", ti->stackTypeMap.data()[i]); >+ JS_ASSERT(ti->typeMap.length() == length); >+ for (unsigned i = 0; i < ti->stackSlots; i++) >+ printf("S%d ", ti->stackTypeMap()[i]); >+ for (unsigned i = 0; i < ti->globalSlots(); i++) >+ printf("G%d ", ti->globalTypeMap()[i]); > printf("\n"); > UnstableExit* uexit = ti->unstableExits; > while (uexit != NULL) { > printf("EXIT: "); >- uint8* m = getTypeMap(uexit->exit) + uexit->exit->numGlobalSlots; >+ uint8* m = getFullTypeMap(uexit->exit); > for (unsigned i = 0; i < uexit->exit->numStackSlots; i++) >- printf("%d ", m[i]); >+ printf("S%d ", m[i]); >+ for (unsigned i = 0; i < uexit->exit->numGlobalSlots; i++) >+ printf("G%d ", 
m[uexit->exit->numStackSlots + i]); > printf("\n"); > uexit = uexit->next; > } >- length = ti->stackTypeMap.length(); >+ length = ti->typeMap.length(); > looped = true; > } > } >diff -r d0e8862aa513 js/src/jstracer.h >--- a/js/src/jstracer.h Mon Dec 29 23:22:23 2008 -0800 >+++ b/js/src/jstracer.h Tue Dec 30 22:42:23 2008 -0500 >@@ -66,6 +66,9 @@ > while (_max < size) > _max <<= 1; > _data = (T*)realloc(_data, _max * sizeof(T)); >+#if defined(DEBUG) >+ memset(&_data[_len], 0xcd, _max - _len); >+#endif > } > public: > Queue(unsigned max = 16) { >@@ -173,8 +176,10 @@ > > class TypeMap : public Queue<uint8> { > public: >- JS_REQUIRES_STACK void captureGlobalTypes(JSContext* cx, SlotList& slots); >- JS_REQUIRES_STACK void captureStackTypes(JSContext* cx, unsigned callDepth); >+ JS_REQUIRES_STACK void captureTypes(JSContext* cx, SlotList& slots, unsigned callDepth); >+ JS_REQUIRES_STACK void captureMissingGlobalTypes(JSContext* cx, >+ SlotList& slots, >+ unsigned stackSlots); Could be fewer lines too. > bool matches(TypeMap& other) const; > }; > >@@ -202,9 +207,19 @@ > ExitType exitType; > }; > >-static inline uint8* getTypeMap(nanojit::SideExit* exit) >+static inline uint8* getStackTypeMap(nanojit::SideExit* exit) > { > return (uint8*)(((VMSideExit*)exit) + 1); >+} >+ >+static inline uint8* getGlobalTypeMap(nanojit::SideExit* exit) >+{ >+ return getStackTypeMap(exit) + ((VMSideExit*)exit)->numStackSlots; >+} >+ >+static inline uint8* getFullTypeMap(nanojit::SideExit* exit) >+{ >+ return getStackTypeMap(exit); > } > > struct InterpState >@@ -235,7 +250,8 @@ > unsigned maxNativeStackSlots; > ptrdiff_t nativeStackBase; > unsigned maxCallDepth; >- TypeMap stackTypeMap; >+ TypeMap typeMap; >+ unsigned stackSlots; > Queue<nanojit::Fragment*> dependentTrees; > unsigned branchCount; > Queue<VMSideExit*> sideExits; >@@ -245,6 +261,16 @@ > fragment = _fragment; > } > ~TreeInfo(); >+ >+ inline unsigned globalSlots() { >+ return typeMap.length() - stackSlots; >+ } >+ inline uint8* globalTypeMap() { >+ return typeMap.data() + stackSlots; >+ } >+ inline uint8* stackTypeMap() { >+ return typeMap.data(); >+ } > }; > > struct FrameInfo { >@@ -304,7 +330,6 @@ > bool terminate; > intptr_t terminate_ip_adj; > nanojit::Fragment* outerToBlacklist; >- nanojit::Fragment* promotedPeer; > TraceRecorder* nextRecorderToAbort; > bool wasRootFragment; > >@@ -313,8 +338,8 @@ > JS_REQUIRES_STACK ptrdiff_t nativeStackOffset(jsval* p) const; > JS_REQUIRES_STACK void import(nanojit::LIns* base, ptrdiff_t offset, jsval* p, uint8& t, > const char *prefix, uintN index, JSStackFrame *fp); >- JS_REQUIRES_STACK void import(TreeInfo* treeInfo, nanojit::LIns* sp, unsigned ngslots, >- unsigned callDepth, uint8* globalTypeMap, uint8* stackTypeMap); >+ JS_REQUIRES_STACK void import(TreeInfo* treeInfo, nanojit::LIns* sp, unsigned stackSlots, >+ unsigned callDepth, unsigned ngslots, uint8* typeMap); > void trackNativeStackUse(unsigned slots); > > JS_REQUIRES_STACK bool lazilyImportGlobalSlot(unsigned slot); >@@ -332,7 +357,8 @@ > JS_REQUIRES_STACK bool checkType(jsval& v, uint8 t, jsval*& stage_val, > nanojit::LIns*& stage_ins, unsigned& stage_count); > JS_REQUIRES_STACK bool deduceTypeStability(nanojit::Fragment* root_peer, >- nanojit::Fragment** stable_peer, unsigned* demotes); >+ nanojit::Fragment** stable_peer, >+ bool& demote); Fits into the same line. 
> > JS_REQUIRES_STACK jsval& argval(unsigned n) const; > JS_REQUIRES_STACK jsval& varval(unsigned n) const; >@@ -437,7 +463,7 @@ > public: > JS_REQUIRES_STACK > TraceRecorder(JSContext* cx, VMSideExit*, nanojit::Fragment*, TreeInfo*, >- unsigned ngslots, uint8* globalTypeMap, uint8* stackTypeMap, >+ unsigned stackSlots, unsigned ngslots, uint8* typeMap, > VMSideExit* expectedInnerExit, nanojit::Fragment* outerToBlacklist); > ~TraceRecorder(); > >@@ -448,14 +474,12 @@ > nanojit::Fragment* getFragment() const { return fragment; } > JS_REQUIRES_STACK bool isLoopHeader(JSContext* cx) const; > JS_REQUIRES_STACK void compile(nanojit::Fragmento* fragmento); >- JS_REQUIRES_STACK bool closeLoop(nanojit::Fragmento* fragmento, bool& demote, >- unsigned *demotes); >+ JS_REQUIRES_STACK bool closeLoop(nanojit::Fragmento* fragmento, bool& demote); > JS_REQUIRES_STACK void endLoop(nanojit::Fragmento* fragmento); > JS_REQUIRES_STACK void joinEdgesToEntry(nanojit::Fragmento* fragmento, > nanojit::Fragment* peer_root); > void blacklist() { fragment->blacklist(); } >- JS_REQUIRES_STACK bool adjustCallerTypes(nanojit::Fragment* f, unsigned* demote_slots, >- bool& trash); >+ JS_REQUIRES_STACK bool adjustCallerTypes(nanojit::Fragment* f, bool& trash); > JS_REQUIRES_STACK nanojit::Fragment* findNestedCompatiblePeer(nanojit::Fragment* f, > nanojit::Fragment** empty); > JS_REQUIRES_STACK void prepareTreeCall(nanojit::Fragment* inner); >@@ -477,7 +501,6 @@ > void deepAbort() { deepAborted = true; } > bool wasDeepAborted() { return deepAborted; } > bool walkedOutOfLoop() { return terminate; } >- void setPromotedPeer(nanojit::Fragment* peer) { promotedPeer = peer; } > TreeInfo* getTreeInfo() { return treeInfo; } > > #define OPDEF(op,val,name,token,length,nuses,ndefs,prec,format) \
Attachment #354911 - Flags: review+
Attached patch refreshed, updated trace-test.js (obsolete) (deleted) — Splinter Review
fasta gets 10ms faster. fannkuch gets 10ms slower. bench gets 100ms slower on my machine, but time.sh doesn't show any differences that add up to such a big penalty. Someone else might want to give SunSpider a run with this patch; I don't have any other computers to play with for a few days.
Attachment #354911 - Attachment is obsolete: true
When run consecutively in one test run, I get the following results without the patch:

207:src gal$ ./bench2.sh
t/3d-cube.js 44
t/3d-morph.js 31
t/3d-raytrace.js 83
t/access-binary-trees.js 40
t/access-fannkuch.js 57
t/access-nbody.js 30
t/access-nsieve.js 13
t/bitops-3bit-bits-in-byte.js 2
t/bitops-bits-in-byte.js 8
t/bitops-bitwise-and.js 2
t/bitops-nsieve-bits.js 26
t/controlflow-recursive.js 33
t/crypto-aes.js 39
t/crypto-md5.js 20
t/crypto-sha1.js 6
t/date-format-tofte.js 119
t/date-format-xparb.js 103
t/math-cordic.js 19
t/math-partial-sums.js 15
t/math-spectral-norm.js 8
t/regexp-dna.js 77
t/string-base64.js 17
t/string-fasta.js 77
t/string-tagcloud.js 119
t/string-unpack-code.js 135
t/string-validate-input.js 37

And this is with:

207:src gal$ ./bench2.sh
t/3d-cube.js 48
t/3d-morph.js 31
t/3d-raytrace.js 84
t/access-binary-trees.js 40
t/access-fannkuch.js 56
t/access-nbody.js 24
t/access-nsieve.js 12
t/bitops-3bit-bits-in-byte.js 2
t/bitops-bits-in-byte.js 13
t/bitops-bitwise-and.js 3
t/bitops-nsieve-bits.js 25
t/controlflow-recursive.js 35
t/crypto-aes.js 40
t/crypto-md5.js 19
t/crypto-sha1.js 9
t/date-format-tofte.js 121
t/date-format-xparb.js 102
t/math-cordic.js 24
t/math-partial-sums.js 15
t/math-spectral-norm.js 7
t/regexp-dna.js 76
t/string-base64.js 16
t/string-fasta.js 141
t/string-tagcloud.js 118
t/string-unpack-code.js 129
t/string-validate-input.js 38

fasta gets 70ms slower. It doesn't happen when fasta is run separately.
This is the modified bench script I used:

#!/bin/bash
X="var d = Date.now();";
for i in t/*.js; do
    X="$X load(\"$i\"); var d2 = Date.now(); print(\"$i \" + (d2 - d)); d = d2; ";
done
# X="$X print(Date.now() - d);"
echo $X | (./Darwin_OPT.OBJ/js -j || ./Linux_All_OPT.OBJ/js -j)
This is not a gc issue. Triggering a gc in between each load does not fix the performance regression observed in fasta.

t/3d-cube.js 46 before 1167360, after 45056, break 01800000
t/3d-morph.js 39 before 1740800, after 49152, break 01800000
t/3d-raytrace.js 87 before 1376256, after 65536, break 01800000
t/access-binary-trees.js 43 before 1454080, after 77824, break 01800000
t/access-fannkuch.js 65 before 81920, after 77824, break 01800000
t/access-nbody.js 32 before 2830336, after 81920, break 01800000
t/access-nsieve.js 14 before 81920, after 81920, break 01800000
t/bitops-3bit-bits-in-byte.js 2 before 81920, after 81920, break 01800000
t/bitops-bits-in-byte.js 8 before 81920, after 81920, break 01800000
t/bitops-bitwise-and.js 3 before 81920, after 81920, break 01800000
t/bitops-nsieve-bits.js 25 before 1302528, after 86016, break 01800000
t/controlflow-recursive.js 42 before 86016, after 86016, break 01800000
t/crypto-aes.js 46 before 815104, after 90112, break 01800000
t/crypto-md5.js 29 before 118784, after 98304, break 01800000
t/crypto-sha1.js 7 before 196608, after 102400, break 01800000
t/date-format-tofte.js 117 before 1613824, after 106496, break 01800000
t/date-format-xparb.js 107 before 1970176, after 110592, break 01800000
t/math-cordic.js 32 before 110592, after 106496, break 01800000
t/math-partial-sums.js 17 before 106496, after 106496, break 01800000
t/math-spectral-norm.js 7 before 122880, after 110592, break 01800000
t/regexp-dna.js 80 before 110652, after 110592, break 01800000
t/string-base64.js 18 before 692224, after 114688, break 01800000
t/string-fasta.js 143 before 3666028, after 118784, break 01800000
t/string-tagcloud.js 126 before 1097792, after 126976, break 01800000
t/string-unpack-code.js 146 before 1159168, after 139264, break 01800000
t/string-validate-input.js 47 before 929792, after 143360, break 01800000
Patch is out of date again. Will hunt down danderson and re-test this with the most recent ebp perf issue fixed.
I tried to refresh but the patch also fails trace-tests.js now.
Flags: blocking1.9.1?
Attached patch v7, refresh + fix (obsolete) (deleted) — Splinter Review
Refreshed again. Also fixed a bug where, when extending a branch, the TreeInfo didn't get its type map extended for missing globals. I still get about a 100ms slowdown on my laptop when running SunSpider through bench.sh or in the browser, but not when running tests individually. This might suggest that globals are poisoning subsequent tests somehow. Also, it might be a good idea to introduce a peer cap alongside (or after) this patch. Somewhere in trace-tests (mandelbrot, probably) I noticed a huge amount of spew from just trying to find compatible peers in lists containing 10+ fragments. Yuck.
Attachment #355141 - Attachment is obsolete: true
I narrowed the regression down to string-fasta running after regexp-dna. The number of side exits increases from 79 to around 87,000. fasta has a main work loop (actually, a nested loop) that bangs on a table each iteration. This main loop is run twice, each time with a different input table. The tables have different shapes. Before this patch, the second run of the main loop would hit a global type map mismatch, flush the cache, and the loop would be recompiled with the new shape. This patch removes the cache flushing, so the loop never gets recompiled. Every single iteration hits a shape mismatch guard, causing the perf regression. Changing the shape guards to BRANCH_EXIT instead of MISMATCH_EXIT brings the perf difference down to a ~20ms loss.
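A rough reduction of the pattern described above (a hypothetical sketch written from this description, not copied from string-fasta.js): the same traced loop runs over two tables whose shapes differ, so every iteration of the second run trips the shape guard of the trace built for the first table.

function pick(table, r) {
    for (var c in table) {                 // hot loop banging on the table
        if (r < table[c])
            return c;
    }
    return c;
}
var tableA = { a: 0.25, c: 0.25, g: 0.25, t: 0.25 };
var tableB = { A: 0.2, B: 0.3, C: 0.5 };   // different shape than tableA
for (var i = 0; i < 100000; ++i) pick(tableA, Math.random());
for (var i = 0; i < 100000; ++i) pick(tableB, Math.random());  // hits the shape guard every time

With the old cache flush the second loop simply recompiled against tableB's shape; without it, the mismatch guard fires on every iteration. The BRANCH_EXIT change presumably helps because a branch exit can record a new branch trace at the failing guard instead of repeatedly falling back to the interpreter.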
Attachment #356388 - Flags: review?(gal) → review+
blocks a blocker
Flags: blocking1.9.1? → blocking1.9.1+
This patch is almost ready. We hit a chain of new bugs along the way. Those are fixed. Currently being tested. As long as the other stuff goes green, this will go in next.
Attached patch v8, fixes multitrees regression (obsolete) (deleted) — Splinter Review
Running mochitests exposed a bug where unstable exits were not joined to peers if the exit had an incomplete global typemap. This patch extends js_AttemptToStabilizeTree to check for and resolve this condition. Mochitests pass now. I'll re-run SunSpider in the browser tomorrow to see what the current damage is.
Attachment #356388 - Attachment is obsolete: true
Attachment #356693 - Flags: review?(gal)
Comment on attachment 356693 [details] [diff] [review] v8, fixes multitrees regression

I see a 20ms perf loss.
Attachment #356693 - Flags: review?(gal) → review+
(function() {
    for (var j = 0; j < 4; ++j) {
        for each (e in ['A', 1, 'A']) {
        }
    }
})();

Assertion failure: !ti->typeMap.matches(ti_other->typeMap), at ../jstracer.cpp:3075
I think this interacts with bug 472528 in bad ways.
var g;
(function() {
    for (let j = 0; j < 4; ++j) {
        g = 64;
        for each (let a in [.3, .3]) {
            g /= 2;
            print("" + (g));
        }
    }
})();

On the last iteration of the outer loop, this outputs "0" twice instead of "32" and "16". I can't figure out how to create a testcase that doesn't use print() :(
Nice, looks like there's more work to do here.
Attached patch v9, fixes comment #21 (obsolete) (deleted) — Splinter Review
This was another multitrees regression, this time from oracle misuse. Still working on the correctness bug in comment #23.
Attachment #356693 - Attachment is obsolete: true
David is working on the fuzzer bugs; I am taking a look at the perf loss.
TEST                  COMPARISON            FROM                 TO             DETAILS
=============================================================================
** TOTAL **: *1.016x as slow* 1063.5ms +/- 0.3% 1080.0ms +/- 0.2% significant
=============================================================================

3d: *1.045x as slow* 150.4ms +/- 0.7% 157.2ms +/- 0.6% significant
    cube: ?? 40.4ms +/- 2.8% 41.7ms +/- 2.3% not conclusive: might be *1.032x as slow*
    morph: - 30.4ms +/- 1.2% 30.2ms +/- 1.0%
    raytrace: *1.072x as slow* 79.6ms +/- 0.6% 85.3ms +/- 0.4% significant

access: ?? 132.1ms +/- 0.4% 132.6ms +/- 0.6% not conclusive: might be *1.004x as slow*
    binary-trees: - 40.5ms +/- 0.9% 40.2ms +/- 0.7%
    fannkuch: *1.009x as slow* 55.4ms +/- 0.7% 55.9ms +/- 0.4% significant
    nbody: ?? 23.7ms +/- 1.5% 24.1ms +/- 0.9% not conclusive: might be *1.017x as slow*
    nsieve: - 12.5ms +/- 3.0% 12.4ms +/- 4.0%

bitops: *1.033x as slow* 36.9ms +/- 1.4% 38.1ms +/- 1.9% significant
    3bit-bits-in-byte: ?? 1.4ms +/- 26.4% 1.6ms +/- 23.1% not conclusive: might be *1.143x as slow*
    bits-in-byte: *1.135x as slow* 9.6ms +/- 3.8% 10.9ms +/- 2.1% significant
    bitwise-and: - 2.0ms +/- 0.0% 2.0ms +/- 0.0%
    nsieve-bits: - 23.9ms +/- 1.7% 23.6ms +/- 1.6%

controlflow: ?? 32.6ms +/- 1.1% 32.9ms +/- 0.7% not conclusive: might be *1.009x as slow*
    recursive: ?? 32.6ms +/- 1.1% 32.9ms +/- 0.7% not conclusive: might be *1.009x as slow*

crypto: *1.015x as slow* 60.4ms +/- 0.8% 61.3ms +/- 0.6% significant
    aes: *1.020x as slow* 34.6ms +/- 1.1% 35.3ms +/- 1.0% significant
    md5: ?? 19.1ms +/- 1.2% 19.4ms +/- 1.9% not conclusive: might be *1.016x as slow*
    sha1: - 6.7ms +/- 5.2% 6.6ms +/- 5.6%

date: ?? 202.4ms +/- 0.9% 204.2ms +/- 0.3% not conclusive: might be *1.009x as slow*
    format-tofte: *1.019x as slow* 99.5ms +/- 0.5% 101.4ms +/- 0.4% significant
    format-xparb: - 102.9ms +/- 1.5% 102.8ms +/- 0.3%

math: - 39.8ms +/- 1.1% 39.5ms +/- 1.3%
    cordic: *1.021x as slow* 19.1ms +/- 1.2% 19.5ms +/- 1.9% significant
    partial-sums: - 13.8ms +/- 2.2% 13.8ms +/- 2.2%
    spectral-norm: 1.113x as fast 6.9ms +/- 3.3% 6.2ms +/- 4.9% significant

regexp: ?? 55.2ms +/- 0.5% 55.4ms +/- 0.7% not conclusive: might be *1.004x as slow*
    dna: ?? 55.2ms +/- 0.5% 55.4ms +/- 0.7% not conclusive: might be *1.004x as slow*

string: *1.014x as slow* 353.7ms +/- 0.3% 358.8ms +/- 0.3% significant
    base64: ?? 15.5ms +/- 2.4% 15.7ms +/- 2.2% not conclusive: might be *1.013x as slow*
    fasta: *1.023x as slow* 70.9ms +/- 0.3% 72.5ms +/- 0.5% significant
    tagcloud: ?? 105.8ms +/- 1.1% 106.4ms +/- 0.3% not conclusive: might be *1.006x as slow*
    unpack-code: *1.014x as slow* 130.4ms +/- 0.4% 132.2ms +/- 0.3% significant
    validate-input: *1.029x as slow* 31.1ms +/- 0.7% 32.0ms +/- 0.0% significant
Raytrace is the only concrete slowdown I can observe after repeated runs. The rest is pretty noisy. I will look into this one.

raytrace: *1.073x as slow* 79.6ms +/- 0.6% 85.4ms +/- 0.6% significant
Well this probably explains it:

whale:src gal$ ./Darwin_DBG.OBJ/js -j t/3d-raytrace.js
Assertion failure: !ti->typeMap.matches(ti_other->typeMap), at ../jstracer.cpp:3075
Attached patch v10, fixes comment #23 and comment #29 (obsolete) (deleted) — Splinter Review
Comment #23 exposed that adjustCallerTypes was reading from the wrong typemap for globals. Comment #29 was caused by an overly aggressive check in js_AttemptToStabilizeTree that turned out to be wrong.
Attachment #357028 - Attachment is obsolete: true
Attached patch v11, fast on fasta again (obsolete) (deleted) — Splinter Review
The code to build nested trees was busted for cases where a global integer slot needed to be a double. I don't get any regressions using time.sh now, but bench.sh is once again not liking this patch, so I will reduce the suite to see what's causing problems.
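For reference, a hypothetical reduction (mine, not from the test suite) of the nested-tree case described here: the inner loop is recorded as its own tree while the global is still an integer, and the outer tree later needs that same global slot as a double when calling the inner tree.

var g = 0;                            // global, starts out as an integer
for (var i = 0; i < 100; ++i) {
    for (var j = 0; j < 100; ++j)     // inner tree is recorded with g typed as int
        g += 1;
    g += 0.5;                         // outer trace now carries g as a double
}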
Attachment #357085 - Attachment is obsolete: true
Priority: -- → P1
fasta is the culprit.

fasta: *3.55x as slow* 70.9ms +/- 0.3% 251.6ms +/- 0.4% significant
fasta went from:

recorder: started(12), aborted(2), completed(17), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(2), breaks(3), returns(0), unstableInnerCalls(1)
monitor: triggered(77), exits(77), type mismatch(0), global mismatch(3)

to:

recorder: started(16), aborted(7), completed(23), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(5), returns(0), unstableInnerCalls(5)
monitor: triggered(28), exits(28), type mismatch(0), global mismatch(3)

Looks like we abort a bunch of traces and execute less in native code.
All the other scores look good. It's just fasta that's broken at the moment, as far as I can tell.
(In reply to comment #34)
> All the other scores look good. Its just fasta thats broken atm as far as I can
> tell.

I'm confused. Attachment 357175 [details] [diff] (v11) claims to be fast on fasta again. Is this not the case?
for #35: We fixed a mild regression in fasta (20ms), exposing a hefty regression (300ms). One step ahead, one mile back. Working on it ...
This is the minimum code to trigger the slowdown:

load("t/regexp-dna.js");
load("t/string-fasta.js");

Commenting out loading regexp makes fasta run in 70ms. With regexp it's 260ms.
Attached file test case for the slowdown (obsolete) (deleted) —
var seqs = [1,2,3,4,5];
for (j in seqs)
    ;

This code in front of fasta triggers the slowdown.
This seems to be sufficient as well:

for (x in [])
    ;
The debug build doesn't seem to show the slowdown, but we compile more with the for-in loop than without it (despite the loop being empty).

whale:src gal$ ./Darwin_DBG.OBJ/js -j x.js
128
recorder: started(16), aborted(7), completed(23), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(5), returns(0), unstableInnerCalls(5)
monitor: triggered(28), exits(28), type mismatch(0), global mismatch(3)
whale:src gal$ ./Darwin_DBG.OBJ/js -j x.js
131
recorder: started(17), aborted(7), completed(24), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(6), returns(0), unstableInnerCalls(5)
monitor: triggered(31), exits(31), type mismatch(0), global mismatch(2)
The following line in front of fasta is sufficient:

StopIteration;
Attached file simplified testcase (obsolete) (deleted) —
Attachment #357876 - Attachment is obsolete: true
Scary stuff. In OPT we hit js_ExecuteTree 56000 times with the 'StopIteration' in front of fasta. In DBG this doesn't happen.

whale:Darwin_OPT.OBJ gal$ ./js -j -e 'StopIteration;' ../t/string-fasta.js
execute trees: 56000
whale:Darwin_OPT.OBJ gal$ ./js -j ../t/string-fasta.js
execute trees: 28
whale:Darwin_DBG.OBJ gal$ ./js -j -e 'StopIteration;' ../t/string-fasta.js
execute trees: 31
recorder: started(17), aborted(7), completed(24), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(6), returns(0), unstableInnerCalls(5)
monitor: triggered(31), exits(31), type mismatch(0), global mismatch(2)
whale:Darwin_DBG.OBJ gal$ ./js -j ../t/string-fasta.js
execute trees: 28
recorder: started(16), aborted(7), completed(23), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(5), returns(0), unstableInnerCalls(5)
monitor: triggered(28), exits(28), type mismatch(0), global mismatch(3)
./js -j -e 'StopIteration;' ../t/string-fasta.js && ./js -j ../t/string-fasta.js

Counting # of times deduceTypeStability sets demote to false (debug): 18 18
Same for opt: 36 18

Very sketchy.
This is the case that slows down run on OPT vs DBG:

--- /tmp/2	2009-01-20 19:24:38.000000000 -0800
+++ /tmp/1	2009-01-20 19:24:34.000000000 -0800
@@ -1,64 +1,178 @@
-recording starting from ../t/string-fasta.js:47@14, global shape=188
+recording starting from ../t/string-fasta.js:47@14, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:48@42, global shape=188
+recording starting from ../t/string-fasta.js:48@42, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:47@14, global shape=188
+recording starting from ../t/string-fasta.js:47@14, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:48@42, global shape=188
+recording starting from ../t/string-fasta.js:48@42, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:47@20, global shape=188
+recording starting from ../t/string-fasta.js:47@20, global shape=179
 Abort.
-recording starting from ../t/string-fasta.js:38@13, global shape=188
+recording starting from ../t/string-fasta.js:38@13, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:67@76, global shape=188
+recording starting from ../t/string-fasta.js:67@76, global shape=179
 EndLoop
-recording starting from ../t/string-fasta.js:66@57, global shape=188
+recording starting from ../t/string-fasta.js:66@57, global shape=179
 Abort.
-recording starting from ../t/string-fasta.js:67@76, global shape=188
+recording starting from ../t/string-fasta.js:67@76, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:68@89, global shape=188
+recording starting from ../t/string-fasta.js:68@89, global shape=179
 closeLoop demote=0
-recording starting from ../t/string-fasta.js:66@57, global shape=188
+recording starting from ../t/string-fasta.js:66@57, global shape=179
 Abort.

Here the two go out of sync. One records line 68, the other line 67.

-recording starting from ../t/string-fasta.js:68@89, global shape=188
+recording starting from ../t/string-fasta.js:67@76, global shape=179
+closeLoop demote=0
+recording starting from ../t/string-fasta.js:66@57, global shape=179
+Abort.
+recording starting from ../t/string-fasta.js:67@76, global shape=179
 EndLoop

67:     for (var c in table) {
68:        if (r < table[c]) { line[i] = c; break; } }

It seems the OPT build exits on a branch exit, whereas DEBUG compiles a loop.
entering trace at ../t/string-fasta.js:47@14, native stack slots: 14 code: 0x24aea0
leaving trace at ../t/string-fasta.js:48@42, op=, lr=0x24920c, exitType=0, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:47@14, native stack slots: 14 code: 0x24aba0
leaving trace at ../t/string-fasta.js:48@42, op=, lr=0x249df0, exitType=0, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:47@14, native stack slots: 14 code: 0x24aba0
leaving trace at ../t/string-fasta.js:47@20, op=, lr=0x249cdc, exitType=0, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:38@13, native stack slots: 13 code: 0x24a7c0
leaving trace at ../t/string-fasta.js:40@48, op=, lr=0x47e664, exitType=3, sp=3, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a6b0
leaving trace at ../t/string-fasta.js:68@89, op=, lr=0x47ea24, exitType=0, sp=4, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a6b0
leaving trace at ../t/string-fasta.js:70@111, op=, lr=0x47eb5c, exitType=1, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a6b0
leaving trace at ../t/string-fasta.js:70@111, op=, lr=0x47eb5c, exitType=1, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a520
leaving trace at ../t/string-fasta.js:68@89, op=, lr=0x47d31c, exitType=0, sp=4, calldepth=0

The OPT build exits with this side exit instead:

leaving trace at ../t/string-fasta.js:70@111, op=, lr=0x1d8b5c, exitType=1, sp=2, calldepth=0
00000:  defvar "last"
00003:  defvar "A"
00006:  defvar "C"
00009:  defvar "M"
00012:  deffun function rand(max) {last = (last * A + C) % M;return max * last / M;}
00015:  defvar "ALU"
00018:  defvar "IUB"
00021:  defvar "HomoSap"
00024:  deffun function makeCumulative(table) {var last = null;for (var c in table) {if (last) {table[c] += table[last];}last = c;}}
00027:  deffun function fastaRepeat(n, seq) {var seqi = 0, lenOut = 60;while (n > 0) {if (n < lenOut) {lenOut = n;}if (seqi + lenOut < seq.length) {ret = seq.substring(seqi, seqi + lenOut);seqi += lenOut;} else {var s = seq.substring(seqi);seqi = lenOut - s.length;ret = s + seq.substring(0, seqi);}n -= lenOut;}}
00030:  deffun function fastaRandom(n, table) {var line = new Array(60);makeCumulative(table);while (n > 0) {if (n < line.length) {line = new Array(n);}for (var i = 0; i < line.length; i++) {var r = rand(1);for (var c in table) {if (r < table[c]) {line[i] = c;break;}}}ret = line.join("");n -= line.length;}}
00033:  defvar "ret"
00036:  defvar "count"
main:
00039:  int8 42
00041:  setgvar "last"
00044:  pop
00045:  uint16 3877
00048:  setgvar "A"
00051:  pop
00052:  uint16 29573
00055:  setgvar "C"
00058:  pop
00059:  uint24 139968
00063:  setgvar "M"
00066:  pop
00067:  nop
00068:  string "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"
00071:  setgvar "ALU"
00074:  pop
00075:  newinit 1
00077:  double 0.27
00080:  initprop "a"
00083:  double 0.12
00086:  initprop "c"
00089:  double 0.12
00092:  initprop "g"
00095:  double 0.27
00098:  initprop "t"
00101:  double 0.02
00104:  initprop "B"
00107:  double 0.02
00110:  initprop "D"
00113:  double 0.02
00116:  initprop "H"
00119:  double 0.02
00122:  initprop "K"
00125:  double 0.02
00128:  initprop "M"
00131:  double 0.02
00134:  initprop "N"
00137:  double 0.02
00140:  initprop "R"
00143:  double 0.02
00146:  initprop "S"
00149:  double 0.02
00152:  initprop "V"
00155:  double 0.02
00158:  initprop "W"
00161:  double 0.02
00164:  initprop "Y"
00167:  endinit
00168:  setgvar "IUB"
00171:  pop
00172:  newinit 1
00174:  double 0.302954942668
00177:  initprop "a"
00180:  double 0.1979883004921
00183:  initprop "c"
00186:  double 0.1975473066391
00189:  initprop "g"
00192:  double 0.3015094502008
00195:  initprop "t"
00198:  endinit
00199:  setgvar "HomoSap"
00202:  pop
00203:  nop
00204:  nop
00205:  nop
00206:  getgvar "ret"
00209:  pop
00210:  int8 7
00212:  setgvar "count"
00215:  pop
00216:  callgvar "fastaRepeat"
00219:  int8 2
00221:  getgvar "count"
00224:  mul
00225:  uint24 100000
00229:  mul
00230:  getgvar "ALU"
00233:  call 2
00236:  setgvar "ret"
00239:  pop
00240:  callgvar "fastaRandom"
00243:  int8 3
00245:  getgvar "count"
00248:  mul
00249:  uint16 1000
00252:  mul
00253:  getgvar "IUB"
00256:  call 2
00259:  setgvar "ret"
00262:  pop
00263:  callgvar "fastaRandom"
00266:  int8 5
00268:  getgvar "count"
00271:  mul
00272:  uint16 1000
00275:  mul
00276:  getgvar "HomoSap"
00279:  call 2
00282:  setgvar "ret"
00285:  pop
00286:  stop

Source notes:
  0:     0 [   0] setline  lineno 6
  2:    15 [  15] xdelta
  3:    15 [   0] setline  lineno 13
  5:    18 [   3] setline  lineno 22
  7:    21 [   3] setline  lineno 29
  9:    33 [  12] xdelta
 10:    33 [   0] setline  lineno 79
 12:    36 [   3] setline  lineno 81
 14:    39 [   3] setline  lineno 1
 16:    39 [   0] setline  lineno 6
 18:    41 [   2] decl     offset 0
 20:    44 [   3] pcdelta  offset 7
 22:    51 [   7] pcdelta  offset 7
 24:    58 [   7] pcdelta  offset 8
 26:    67 [   9] xdelta
 27:    67 [   0] setline  lineno 8
 29:    67 [   0] funcdef  function 0 (function rand(max) {last = (last * A + C) % M;return max * last / M;})
 31:    68 [   1] setline  lineno 13
 33:    68 [   0] newline
 34:    71 [   3] decl     offset 0
 36:    75 [   4] setline  lineno 22
 38:    77 [   2] newline
 39:   101 [  24] xdelta
 40:   101 [   0] newline
 41:   125 [  24] xdelta
 42:   125 [   0] newline
 43:   149 [  24] xdelta
 44:   149 [   0] newline
 45:   168 [  19] xdelta
 46:   168 [   0] decl     offset 0
 48:   172 [   4] setline  lineno 29
 50:   174 [   2] newline
 51:   180 [   6] newline
 52:   186 [   6] newline
 53:   192 [   6] newline
 54:   199 [   7] decl     offset 0
 56:   203 [   4] setline  lineno 36
 58:   203 [   0] funcdef  function 1 (function makeCumulative(table) {var last = null;for (var c in table) {if (last) {table[c] += table[last];}last = c;}})
 60:   204 [   1] setline  lineno 44
 62:   204 [   0] funcdef  function 2 (function fastaRepeat(n, seq) {var seqi = 0, lenOut = 60;while (n > 0) {if (n < lenOut) {lenOut = n;}if (seqi + lenOut < seq.length) {ret = seq.substring(seqi, seqi + lenOut);seqi += lenOut;} else {var s = seq.substring(seqi);seqi = lenOut - s.length;ret = s + seq.substring(0, seqi);}n -= lenOut;}})
 64:   205 [   1] setline  lineno 60
 66:   205 [   0] funcdef  function 3 (function fastaRandom(n, table) {var line = new Array(60);makeCumulative(table);while (n > 0) {if (n < line.length) {line = new Array(n);}for (var i = 0; i < line.length; i++) {var r = rand(1);for (var c in table) {if (r < table[c]) {line[i] = c;break;}}}ret = line.join("");n -= line.length;}})
 68:   206 [   1] setline  lineno 79
 70:   206 [   0] decl     offset 0
 72:   210 [   4] setline  lineno 81
 74:   212 [   2] decl     offset 0
 76:   216 [   4] newline
 77:   233 [  17] xdelta
 78:   233 [   0] pcbase   offset 17
 80:   240 [   7] newline
 81:   256 [  16] xdelta
 82:   256 [   0] pcbase   offset 16
 84:   263 [   7] newline
 85:   279 [  16] xdelta
 86:   279 [   0] pcbase   offset 16
Offsets are totally out of whack. I will try to catch both versions in gdb.
Attachment #357882 - Attachment is obsolete: true
Attachment #358035 - Attachment is obsolete: true
Attached file another reduction (obsolete) (deleted) —
Attachment #358040 - Attachment is obsolete: true
Attached patch patch with debug output for graydon (obsolete) (deleted) — Splinter Review
Attached file more reduction (obsolete) (deleted) —
Attachment #358043 - Attachment is obsolete: true
Attached patch only 1 global variable left (obsolete) (deleted) — Splinter Review
Attachment #358060 - Attachment is obsolete: true
DBG:

recording completed at fasta.js:15@170 via closeLoop
O/O S/S S/S D/D O/O I/I
entering trace at fasta.js:15@170, native stack slots: 6 code: 0x24af20
leaving trace at fasta.js:17@194, op=, lr=0x24b2d0, exitType=1, sp=2, calldepth=0
O/O S/S S/S D/D O/O I/I
entering trace at fasta.js:15@170, native stack slots: 6 code: 0x24af20
leaving trace at fasta.js:17@194, op=, lr=0x24b2d0, exitType=1, sp=2, calldepth=0
capture global type global0: 4=S
capture global type global1: 2=D
capture global type global2: 0=O
capture global type global3: 2=D
recording starting from fasta.js:13@128

OPT:

recording completed at fasta.js:15@170 via closeLoop
O/O S/S S/S D/D O/O I/I
entering trace at fasta.js:15@170, native stack slots: 6 code: 0x14ef20
leaving trace at fasta.js:17@194, op=, lr=0x14f2d0, exitType=1, sp=2, calldepth=0
O/O S/S S/S D/D O/O I/I
entering trace at fasta.js:15@170, native stack slots: 6 code: 0x14ef20
leaving trace at fasta.js:17@194, op=, lr=0x14f2d0, exitType=1, sp=2, calldepth=0
capture global type 0: 4=S
capture global type 0: 2=D
capture global type 0: 0=O
capture global type 0: 1=I
recording starting from fasta.js:13@128

global3 is D for DBG and I for OPT. This is the first divergence I could make out.
Bug found: vpnum is used instead of n, which is incorrect in an OPT build. Patch to follow.
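To make the failure mode concrete, here is a minimal sketch of the kind of conditional definition involved. The helper names (SET_VPNAME/INC_VPNUM) and the exact definitions are paraphrased from memory of jstracer.cpp and may not match the patch verbatim; the point is only that vpnum has a real value in DEBUG builds but collapses to a constant in OPT builds.

#include <stdint.h>
#include <stdio.h>

/* Sketch only: approximates how the debug-spew helpers are conditionally
   defined. In DEBUG builds vpnum counts loop iterations and so happens to
   equal the loop index n; in OPT builds it collapses to 0. */
#ifdef DEBUG
static unsigned vpnum;
#define SET_VPNAME(name) (vpnum = 0)
#define INC_VPNUM()      (++vpnum)
#else
#define vpnum 0
#define SET_VPNAME(name) ((void)0)
#define INC_VPNUM()      ((void)0)
#endif

/* Indexing with vpnum instead of n is harmless in DEBUG (vpnum == n), but in
   an OPT build every iteration reads gslots[0], so the wrong global slot's
   type gets captured. */
static void
captureGlobalTypes(const uint16_t* gslots, unsigned ngslots)
{
    SET_VPNAME("global");
    for (unsigned n = 0; n < ngslots; ++n) {
        uint16_t slot = gslots[vpnum];   /* bug: should be gslots[n] */
        printf("capture type for global slot %u\n", (unsigned) slot);
        INC_VPNUM();
    }
}

That also matches why the DBG and OPT traces above diverge only once more than one global slot is in play.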
This patch fixes the last perf regression. We are up to parity with tip.
Attachment #357175 - Attachment is obsolete: true
Attachment #358045 - Attachment is obsolete: true
Attachment #358067 - Attachment is obsolete: true
Attached patch v11 rebased for tip (obsolete) (deleted) — Splinter Review
Assignee: danderson → gal
Attachment #358103 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attached patch v12, fixed gslots[vpnum] => gslots[n] (obsolete) (deleted) — Splinter Review
Attachment #358104 - Attachment is obsolete: true
Comment on attachment 358105 [details] [diff] [review] v12, fixed gslots[vpnum] => gslots[n] David isn't around, so asking for review from Brendan for the vpnum->n change only (the rest is reviewed).
Attachment #358105 - Flags: review?(brendan)
Comment on attachment 358105 [details] [diff] [review]
v12, fixed gslots[vpnum] => gslots[n]

>+TraceRecorder::import(TreeInfo* treeInfo, LIns* sp, unsigned stackSlots, unsigned ngslots,
>+                      unsigned callDepth, uint8* typeMap)
> {
>     /* If we get a partial list that doesn't have all the types (i.e. recording from a side
>        exit that was recorded but we added more global slots later), merge the missing types
>        from the entry type map. This is safe because at the loop edge we verify that we
>        have compatible types for all globals (entry type and loop edge type match). While
>        a different trace of the tree might have had a guard with a different type map for
>        these slots we just filled in here (the guard we continue from didn't know about them),
>        since we didn't take that particular guard the only way we could have ended up here
>        is if that other trace had at its end a compatible type distribution with the entry
>        map. Since thats exactly what we used to fill in the types our current side exit
>        didn't provide, this is always safe to do. */
>     unsigned length;
>-    if (ngslots < (length = traceMonitor->globalTypeMap->length()))
>-        mergeTypeMaps(&globalTypeMap, &ngslots,
>-                      traceMonitor->globalTypeMap->data(), length,
>+    /* This is potentially the typemap of the side exit and thus shorter than the tree's
>+       global type map. */
>+    uint8* globalTypeMap = typeMap + stackSlots;
>+    if (ngslots < (length = treeInfo->globalSlots()))

Nit: this is too crunched vertically, a blank line before any comment that consumes one or more lines is better.

Nit: length is declared early and then assigned in a nested assignment on the right of < in an if condition. Much more readable to pull that assignment out and make it the initializer for the length declaration.

>+            UnstableExit* uexit;
>+            UnstableExit** tail = &ti->unstableExits;
>+            for (uexit = ti->unstableExits; uexit != NULL; uexit = uexit->next) {
>+                if (uexit->exit == exit) {
>+                    *tail = uexit->next;
>+                    bound = true;
>+                    break;
>+                }
>+                tail = &uexit->next;
>+            }
>+            JS_ASSERT(bound);
>+        }

No uexit usage after the for loop, so move its decl into the first part of the for-head.

r=me with these, and let's have that followup bug to catch future opt-build vpnum/vpname uses!

Have fun storming the TinderboxCastle!

/be
Attachment #358105 - Flags: review?(brendan) → review+
Attached patch v13, with brendan's nits (obsolete) (deleted) — Splinter Review
Attachment #358105 - Attachment is obsolete: true
Comment on attachment 358109 [details] [diff] [review] v13, with brendan's nits Wahhh. (Andreas got the message in person. ;-) /be
for (var i = 0; i < 2; ++i) {
  for (var e = 0; e < 2; ++e) { }
  var c = void 0;
  print("---");
  for (var a = 0; a < 3; ++a) {
    c <<= c;
    print("" + c);
  }
}

In builds without this patch, this outputs only zeros. But in builds with v12, I get "8 2048 2048" on the second time through the outer loop.
Here's a version of comment 66 that is more likely to work as an automated test:

var A = [];
for (var i = 0; i < 2; ++i) {
  var c = void 0;
  print;
  for (var a = 0; a < 3; ++a) {
    c <<= c;
    A.push(c);
  }
}
print(uneval(A));
This bug involves lazy resolution. If you put print("") on top of the test case the bug goes away:

print("");
for (var i = 0; i < 2; ++i) {
  for (var e = 0; e < 2; ++e) { }
  var c = void 0;
  for (var a = 0; a < 3; ++a) {
    c <<= c;
    print("" + c);
  }
  print(c);
}

Adding Math; right around print triggers it again:

print("");
for (var i = 0; i < 2; ++i) {
  for (var e = 0; e < 2; ++e) { }
  var c = void 0;
  for (var a = 0; a < 3; ++a) {
    c <<= c;
    Math;
    print("" + c);
  }
  print(c);
}
Attached patch v15, fixes comment #66 (deleted) — Splinter Review
captureMissingGlobalTypes was overwriting all types, not just the missing ones. Review on the interdiff requested.
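For clarity, a sketch of what "fill in only the missing types" means here. This is not the patch's actual code: the container types, parameter names, and the standalone function shape below are invented for illustration; only the append-the-tail-only behavior is the point.

#include <vector>
#include <stdint.h>

/* Sketch of the intended behavior: a type map that already covers the first
   'known' global slots should only have types appended for slots added since
   it was captured; entries [0, known) must keep the types recorded earlier. */
static void
captureMissingGlobalTypes(std::vector<uint8_t>& typeMap,
                          const std::vector<uint8_t>& currentGlobalTypes,
                          size_t known)
{
    for (size_t n = known; n < currentGlobalTypes.size(); ++n)
        typeMap.push_back(currentGlobalTypes[n]);   /* append only the missing tail */
}

The v13 bug was the equivalent of starting that loop at 0, clobbering the types the trace was compiled against.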
Attachment #358111 - Attachment is obsolete: true
Attachment #358140 - Flags: review?(gal)
Assignee: gal → danderson
Attachment #358140 - Flags: review?(gal) → review+
Comment on attachment 358140 [details] [diff] [review]
v15, fixes comment #66

Let's land it.
Pushed as changeset: http://hg.mozilla.org/tracemonkey/rev/186e782c623d

Please back out at the slightest sign of tinderbox sadness - I'll be around tomorrow to debug.
Assignee: danderson → gal
Our tree is currently totally busted. Looks like massive tinderbox failure on a patch that previously cycled. Did I ever voice my opinion regarding our build infrastructure?
Looks like the tinderboxes liked the patch. Victory at last!
Whiteboard: fixed-in-tracemonkey
Depends on: 474769
As suggested by shaver, I'm making vpnum a large number to catch future accidental uses of it in OPT builds. Reverting gslots[n] to gslots[vpnum] yields the following result when running the latest testcase:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x8020a140
0x0009b344 in TraceRecorder::adjustCallerTypes (this=0x20c970, f=0x0) at ../jstracer.cpp:1812
1812        FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
(gdb) bt
#0  0x0009b344 in TraceRecorder::adjustCallerTypes (this=0x20c970, f=0x0) at ../jstracer.cpp:1812
#1  0x0009fcd7 in js_RecordLoopEdge (cx=0x20a830, r=0x20c970, inlineCallCount=@0xbffff65c) at ../jstracer.cpp:3341
#2  0x0009feda in js_MonitorLoopEdge (cx=0x20a830, inlineCallCount=@0xbffff65c) at ../jstracer.cpp:3848
#3  0x0003b029 in js_Interpret (cx=0x20a830) at ../jsinterp.cpp:3095
#4  0x0003ed03 in js_Execute (cx=0x20a830, chain=0x153000, script=0x20c3c0, down=0x0, flags=0, result=0x0) at jsinterp.cpp:1564
#5  0x0000c7b8 in JS_ExecuteScript (cx=0x20a830, obj=0x0, script=0x0, rval=0x0) at ../jsapi.cpp:5083
#6  0x00006cc9 in Process ()
#7  0x0000725a in main ()

Pushed to TM: http://hg.mozilla.org/tracemonkey/rev/5dbe936e4f76
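The idea, roughly (the exact constant used in the tree isn't reproduced here; the value below is hypothetical):

/* Sketch: in OPT builds, make vpnum an obviously out-of-range index so any
   stray gslots[vpnum] faults immediately instead of silently reading slot 0. */
#ifdef DEBUG
static unsigned vpnum;
#else
#define vpnum 0x40000000   /* hypothetical poison value */
#endif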
Depends on: 474835
Depends on: 474951
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Depends on: 475658
Whiteboard: fixed-in-tracemonkey → fixed-in-tracemonkey [needs 191 landing]
Keywords: fixed1.9.1
Whiteboard: fixed-in-tracemonkey [needs 191 landing] → fixed-in-tracemonkey
js1_5/Regress/regress-469044.js
js1_8_1/trace/trace-test.js

v 1.9.1, 1.9.2
Status: RESOLVED → VERIFIED
No longer blocks: 476871
Depends on: 476871
Depends on: 557647