Closed Bug 504462 Opened 15 years ago Closed 15 years ago

Merge tamarin's CodeAlloc into tracemonkey

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: graydon, Assigned: graydon)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 10 obsolete files)

Attached patch wip (obsolete) (deleted) — Splinter Review
Port tamarin's bug 490380, and any dependencies I discover for it. (I've learned it currently needs to integrate the patch in bug 457960 as well. Oops.)
Attached patch wip2 (obsolete) (deleted) — Splinter Review
Further work on this, fold in Allocator.{cpp,h}, move closer to the TR tip. Still not compiling, definitely not working.
Attachment #388832 - Attachment is obsolete: true
Attached patch wip3 (obsolete) (deleted) — Splinter Review
Attachment #389237 - Attachment is obsolete: true
Compiling on x86, rebased to today's TM tip, passing most tests; some drift in the trace stats. Will investigate (and tidy, and get ARM working, and update to new 'chunk' naming scheme visible in bug 504607, and and and...)
Depends on: 504607
A couple of small things I noticed:

CodeAlloc.cpp:42 — the commented-out #ifdef DOPROF is a switch to enable/disable the functionality in ../vprof/vprof.h. I suggest leaving it in and commented out, maybe with a comment: "uncomment this to enable vprof macros".

LIR.cpp — generally, this is missing some changes I made in response to njn's comments. I reckon they'll show up after another round of 3-way merging.

avmplus.cpp:190 — need to add a null check and a longjmp to somewhere that invokes ~Allocator() (probably stating the obvious since this is w.i.p.)
Attached patch wip4: closer, closer (obsolete) (deleted) — Splinter Review
This variant:

- Adopts the relevant bits of code from bug 504607 (I think; might be missing some).
- Fixes a bug in Nativei386.cpp underrunProtect, which was not sensitive to whether or not it's _inExit (how could this have worked?).
- Works around a bug I don't understand in CodeAlloc wherein the block->end fields are getting all wrong, so the ::size() calculation is similarly wrong and we can't tell how full our code cache is. The workaround is to wire all blocks to 4096 bytes, which (I think) they are anyway. Total kludge.
- Implements a per-allocator SPI on the tracemonkey side via an ugly but functional static downcast on 'this', in the Allocator::allocChunk and Allocator::freeChunk methods (see the sketch after this comment).
- Passes all trace-tests in debug and opt builds on linux-x86.
- Boots the browser and runs a few nontrivial things (e.g. the animated raytracer).
- Crashes on v8 due to (ta-da!) some CodeAlloc pointer corruption. See above wrt. block->end being wrong. Possibly some other stuff.

Getting slightly annoyed at how long all this is taking.
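As a rough illustration of the per-allocator SPI mentioned above — a sketch only, with a simplified stand-in for nanojit::Allocator and illustrative member names, not the exact patch code:

    #include <stdlib.h>

    // Stand-in for nanojit::Allocator, reduced to the two chunk hooks the
    // embedder supplies (the real class lives in Allocator.h).
    namespace nanojit {
        class Allocator {
        public:
            void* allocChunk(size_t nbytes);   // defined by the embedder
            void  freeChunk(void* p);          // defined by the embedder
        };
    }

    // TraceMonkey-side subclass: carries the embedder state the hooks need.
    class VMAllocator : public nanojit::Allocator {
    public:
        VMAllocator() : mOutOfMemory(false) {}
        bool outOfMemory() const { return mOutOfMemory; }
        bool mOutOfMemory;
    };

    // The hooks recover the embedder type with the "ugly but functional"
    // static downcast on 'this' described in the comment above.
    void* nanojit::Allocator::allocChunk(size_t nbytes)
    {
        VMAllocator* vma = (VMAllocator*)this;
        void* p = malloc(nbytes);
        if (!p)
            vma->mOutOfMemory = true;   // flag OOM instead of aborting
        return p;
    }

    void nanojit::Allocator::freeChunk(void* p)
    {
        free(p);
    }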
Attachment #389592 - Attachment is obsolete: true
Attached patch working on x86 (obsolete) (deleted) — Splinter Review
This version is actually at the point of working on x86 (passing all trace tests, v8 tests, sunspider, large alexa pageload tests, etc.) I am not implying that it's "done" and I haven't done close performance comparisons or tuning yet, but I'd like some preliminary review. It's a big patch and will take a couple times around the block to land. edwsmith, rreitmai, dvander, anyone else familiar with the allocation system(s) in nanojit, please chime in with any comments if you've time. I'll be try-servering this and working on the non-x86 ports in the meantime, looking at perf, and trying to figure out if the crash I saw during a pageload test was real or related to this or what.
Attachment #389825 - Attachment is obsolete: true
Attachment #390126 - Flags: review?(gal)
Attached patch Minor updates to work on win32 (obsolete) (deleted) — Splinter Review
Couple of compilation bugs, no major changes.
Attachment #390126 - Attachment is obsolete: true
Attachment #390263 - Flags: review?(gal)
Attachment #390126 - Flags: review?(gal)
Attached patch performance and correctness updates (obsolete) (deleted) — Splinter Review
This variant adjusts the size calculation on Allocator to be constant time and includes new code from Edwin to fix a bug in handling larger-than-one-page allocations in CodeAlloc (which we do make, for performance). Both of these get us to a (very minor) net win on SunSpider, on Mac, against the tracemonkey branch as of half an hour ago: about 0.3% faster.
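To illustrate what "constant time" means here — a minimal standalone sketch with a hypothetical class name, not the patch's actual Allocator — the size is kept as a running counter updated at chunk-allocation time instead of being recomputed by walking the chunk list:

    #include <cstddef>
    #include <cstdlib>

    // Minimal illustration: account for bytes when a chunk is acquired,
    // so size() is O(1) rather than O(number of chunks).
    class ChunkedAllocator
    {
    public:
        ChunkedAllocator() : mBytesAllocated(0) {}
        size_t size() const { return mBytesAllocated; }   // constant time

        void* newChunk(size_t nbytes) {
            void* p = malloc(nbytes);
            if (p)
                mBytesAllocated += nbytes;                // bump the running total
            return p;
        }

    private:
        size_t mBytesAllocated;
    };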
Attachment #390263 - Attachment is obsolete: true
Attachment #390381 - Flags: review?(gal)
Attachment #390263 - Flags: review?(gal)
Attached patch now with arm fixes (obsolete) (deleted) — Splinter Review
This variant at least builds and runs a little ways on ARM, but I'm clearly making a mistake in the underrunProtect function, as we wind up with too-close _nIns and _nSlot. I'll poke around a bit more after lunch.
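For context — a standalone illustration of my reading of the new ARM layout, not the patch code itself: the literal pool (_nSlot) grows upward from the start of the code block while instructions (_nIns) are emitted downward from its end, so underrunProtect has to move to a fresh block before the two cursors meet.

    #include <cstdint>

    // Two-cursor layout inside one code block: literal-pool words are
    // written upward from nSlot, instructions are emitted downward from
    // nIns. Before emitting 'bytes' more code we must check the cursors
    // will not collide.
    struct CodeBlock
    {
        uint32_t  words[1024];   // stand-in for a 4 KB code block
        uint32_t* nSlot;         // next free literal slot (grows up)
        uint32_t* nIns;          // instructions are emitted below this (grows down)

        CodeBlock() : nSlot(words), nIns(words + 1024) {}

        bool roomFor(int bytes) const {
            return uintptr_t(nIns) - bytes >= uintptr_t(nSlot);
        }
    };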
Attachment #390381 - Attachment is obsolete: true
Attachment #390381 - Flags: review?(gal)
Attached patch This variant passes tests on ARM and x86 (obsolete) (deleted) — Splinter Review
I believe I'm in the home stretch for this patch now. Try-servering as we speak. Could actually use real review this time.
Attachment #390521 - Attachment is obsolete: true
Attachment #390945 - Flags: review?(gal)
Reviewing now.
Comment on attachment 390945 [details] [diff] [review] This variant passes tests on ARM and x86

>+ CLS(VMAllocator) reAllocator;
>+ CLS(nanojit::CodeAlloc) reCodeAlloc;

Do we have a bug already on merging these with the main allocator?

> size_t i;
> for (i = 0; i < node->u.flat.length - 1; i += 2) {
>- if (fragment->lirbuf->outOMem())
>+ if (JS_TRACE_MONITOR(cx).reAllocator->outOfMemory())
> return 0;

Pull a JSTraceMonitor* tm out of the loop? Or even better the VMAllocator*.

> for (size_t i = 0; i < node->u.flat.length; i++) {
>- if (fragment->lirbuf->outOMem())
>+ if (JS_TRACE_MONITOR(cx).reAllocator->outOfMemory())

Ditto.

>- if (fragment->lirbuf->outOMem())
>+ if (JS_TRACE_MONITOR(cx).reAllocator->outOfMemory())

Ditto.

>- if (fragment->lirbuf->outOMem())
>+ if (JS_TRACE_MONITOR(cx).reAllocator->outOfMemory())
> return NULL;

Ditto.

> /*
>- * You might imagine the outOMem flag on the lirbuf is sufficient
>+ * You might imagine the outOfMemory flag on the allocator is sufficient

♥ outOMem->outOfMemory

>- for (uint32 i = 0; i < argc && !lirbuf->outOMem(); i++) {
>+ for (uint32 i = 0; i < argc && !traceMonitor->allocator->outOfMemory(); i++) {

Pointer to allocator please.

>+ uintptr_t mReserve[0x10000];

Why 0x10000? (not saying it's bad, just curious)

> _unused += sizeof(LInsSk);
> _stats.lir++;

We should probably not collect stats in OPT builds.

>+ // the chunk, which would break everything.
>+ if (_unused >= _limit) {
>+ // Check we used exactly the remaining space
>+ NanoAssert(_unused == _limit);
>+ uintptr_t addrOfLastLInsOnChunk = _unused - sizeof(LIns);
>+ moveToNewChunk(addrOfLastLInsOnChunk);

This logic is really weird, having to check explicitly that we used all.

>+ // FIXME: restate these in a useful way.
>+ // NanoAssert(prevLInsAddr >= pageDataStart(prevLInsAddr));
>+ // NanoAssert(samepage(prevLInsAddr, insSk));

Not too worried about this FIXME. skip should go away soon.

>+ /** each chunk is just a raw area of LIns instances, with no header
>+ and no more than 8-byte alignment. The chunk size is somewhat arbitrary
>+ as long as it's well larger than 2*sizeof(LInsSk) */
>+ static const size_t CHUNK_SZB = 8000;
>+
>+ /** the first instruction on a chunk is always a start instruction, or a
>+ * payload-less skip instruction linking to the previous chunk. The biggest
>+ * possible instruction would take up the entire rest of the chunk. */
>+ static const size_t MAX_LINS_SZB = CHUNK_SZB - sizeof(LInsSk);
>+
>+ /** the maximum skip payload size is determined by the maximum instruction
>+ * size. We require that a skip's payload be adjacent to the skip LIns
>+ * itself. */
>+ static const size_t MAX_SKIP_PAYLOAD_SZB = MAX_LINS_SZB - sizeof(LInsSk);
>+

Random rant: Every comment in NJ is formatted differently.
> protected: >- Page* pageAlloc(); >- void moveToNewPage(uintptr_t addrOfLastLInsOnCurrentPage); >+ friend class LirBufWriter; > >- PageList _pages; >- Page* _nextPage; // allocated in preperation of a needing to growing the buffer >- uintptr_t _unused; // next unused instruction slot >- int _noMem; // set if ran out of memory when writing to buffer >+ /** get CHUNK_SZB more memory for LIR instructions */ >+ void chunkAlloc(); >+ void moveToNewChunk(uintptr_t addrOfLastLInsOnCurrentChunk); >+ >+ Allocator& _allocator; >+ uintptr_t _unused; // next unused instruction slot in the current LIR chunk >+ uintptr_t _limit; // one past the last usable byte of the current LIR chunk >+ size_t _bytesAllocated; > }; > > class LirBufWriter : public LirWriter > { > DWB(LirBuffer*) _buf; // underlying buffer housing the instructions > > public: > LirBufWriter(LirBuffer* buf) >@@ -1167,17 +1176,17 @@ namespace nanojit > } > void setpos(LIns *i) { > _i = i; > } > }; > > class Assembler; > >- void compile(Assembler *assm, Fragment *frag); >+ void compile(Fragmento *frago, Assembler *assm, Fragment *frag); > verbose_only(void live(GC *gc, LirBuffer *lirbuf);) > > class StackFilter: public LirFilter > { > GC *gc; > LirBuffer *lirbuf; > LInsp sp; > avmplus::BitSet stk; >diff --git a/js/src/nanojit/NativeARM.cpp b/js/src/nanojit/NativeARM.cpp >--- a/js/src/nanojit/NativeARM.cpp >+++ b/js/src/nanojit/NativeARM.cpp >@@ -531,17 +531,17 @@ Assembler::nFragExit(LInsp guard) > asm_ld_imm(R2, int(gr)); > > // Set the jmp pointer to the start of the sequence so that patched > // branches can skip the LDi sequence. > gr->jmp = _nIns; > } > > #ifdef NJ_VERBOSE >- if (_frago->core()->config.show_stats) { >+ if (config.show_stats) { > // load R1 with Fragment *fromFrag, target fragment > // will make use of this when calling fragenter(). > int fromfrag = int((Fragment*)_thisfrag); > asm_ld_imm(argRegs[1], fromfrag); > } > #endif > > // Pop the stack frame. >@@ -808,42 +808,16 @@ Assembler::asm_call(LInsp ins) > r = UnknownReg; > } > #endif > > asm_arg(sz, arg, r, stkd); > } > } > >-void >-Assembler::nMarkExecute(Page* page, int flags) >-{ >- NanoAssert(sizeof(Page) == NJ_PAGE_SIZE); >-#ifdef UNDER_CE >- static const DWORD kProtFlags[4] = { >- PAGE_READONLY, // 0 >- PAGE_READWRITE, // PAGE_WRITE >- PAGE_EXECUTE_READ, // PAGE_EXEC >- PAGE_EXECUTE_READWRITE // PAGE_EXEC|PAGE_WRITE >- }; >- DWORD prot = kProtFlags[flags & (PAGE_WRITE|PAGE_EXEC)]; >- DWORD dwOld; >- BOOL res = VirtualProtect(page, NJ_PAGE_SIZE, prot, &dwOld); >- if (!res) >- { >- // todo: we can't abort or assert here, we have to fail gracefully. >- NanoAssertMsg(false, "FATAL ERROR: VirtualProtect() failed\n"); >- } >-#endif >-#ifdef AVMPLUS_PORTING_API >- NanoJIT_PortAPI_MarkExecutable(page, (void*)((char*)page+NJ_PAGE_SIZE), flags); >- // todo, must add error-handling to the portapi >-#endif >-} >- > Register > Assembler::nRegisterAllocFromSet(int set) > { > NanoAssert(set != 0); > > // The CountLeadingZeroes function will use the CLZ instruction where > // available. In other cases, it will fall back to a (slower) C > // implementation. 
>@@ -1329,31 +1303,27 @@ Assembler::nativePageReset() > _nSlot = 0; > _startingSlot = 0; > _nExitSlot = 0; > } > > void > Assembler::nativePageSetup() > { >- if (!_nIns) _nIns = pageAlloc(); >- if (!_nExitIns) _nExitIns = pageAlloc(true); >- //nj_dprintf("assemble onto %x exits into %x\n", (int)_nIns, (int)_nExitIns); >+ if (!_nIns) >+ codeAlloc(codeStart, codeEnd, _nIns); >+ if (!_nExitIns) >+ codeAlloc(exitStart, exitEnd, _nExitIns); > >+ // constpool starts at top of page and goes down, >+ // code starts at bottom of page and moves up > if (!_nSlot) >- { >- // This needs to be done or the samepage macro gets confused; pageAlloc >- // gives us a pointer to just past the end of the page. >- _nIns--; >- _nExitIns--; >- >- // constpool starts at top of page and goes down, >- // code starts at bottom of page and moves up >- _nSlot = (int*)pageDataStart(_nIns); >- } >+ _nSlot = codeStart; >+ if (!_nExitSlot) >+ _nExitSlot = exitStart; > } > > // Record the starting value of _nIns. On ARM, it is also necessary to record > // the starting value of the literal pool pointer, _nSlot. > void > Assembler::recordStartingInstructionPointer() > { > _startingIns = _nIns; >@@ -1366,52 +1336,38 @@ Assembler::recordStartingInstructionPoin > void > Assembler::resetInstructionPointer() > { > _nIns = _startingIns; > _nSlot = _startingSlot; > NanoAssert(samepage(_nIns,_nSlot)); > } > >-// Note: underrunProtect should not touch any registers, even IP; it >-// might need to allocate a new page in the middle of an IP-using >-// sequence. > void > Assembler::underrunProtect(int bytes) > { > NanoAssertMsg(bytes<=LARGEST_UNDERRUN_PROT, "constant LARGEST_UNDERRUN_PROT is too small"); >- intptr_t u = bytes + sizeof(PageHeader)/sizeof(NIns) + 8; >- if ( (samepage(_nIns,_nSlot) && (((intptr_t)_nIns-u) <= intptr_t(_nSlot+1))) || >- (!samepage((intptr_t)_nIns-u,_nIns)) ) >+ NanoAssert(_nSlot != 0 && int(_nIns)-int(_nSlot) <= 4096); >+ uintptr_t top = uintptr_t(_nSlot); >+ uintptr_t pc = uintptr_t(_nIns); >+ if (pc - bytes < top) > { >+ verbose_only(verbose_outputf(" %p:", _nIns);) > NIns* target = _nIns; >+ if (_inExit) >+ codeAlloc(exitStart, exitEnd, _nIns); >+ else >+ codeAlloc(codeStart, codeEnd, _nIns); > >- _nIns = pageAlloc(_inExit); >+ _nSlot = _inExit ? exitStart : codeStart; > >- // XXX _nIns at this point points to one past the end of >- // the page, intended to be written into using *(--_nIns). >- // However, (guess) something seems to be storing the value >- // of _nIns as is, and then later generating a jump to a bogus >- // address. So pre-decrement to ensure that it's always >- // valid; we end up skipping using the last instruction this >- // way. >- _nIns--; >- >- // Update slot, either to _nIns (if decremented above), or >- // _nIns-1 once the above bug is fixed/found. >- _nSlot = (int*)pageDataStart(_nIns); >- >- // If samepage() is used on _nIns and _nSlot, it'll fail, since _nIns >- // points to one past the end of the page right now. Assume that >- // JMP_nochk won't ever try to write to _nSlot, and so won't ever >- // check samepage(). See B_cond_chk macro. >- JMP_nochk(target); >- } else if (!_nSlot) { >- // make sure that there's always a slot pointer >- _nSlot = (int*)pageDataStart(_nIns); >+ // _nSlot points to the first empty position in the new code block >+ // _nIns points just past the last empty position. >+ // Assume B_nochk won't ever try to write to _nSlot. See B_cond_chk macro. 
>+ B_nochk(target); > } > } > > void > Assembler::JMP_far(NIns* addr) > { > // Even if a simple branch is all that is required, this function must emit > // two words so that the branch can be arbitrarily patched later on. >@@ -1608,29 +1564,32 @@ Assembler::asm_ld_imm(Register d, int32_ > // literal space. > > // We must do the underrunProtect before PC_OFFSET_FROM as underrunProtect > // can move the PC if there isn't enough space on the current page! > if (chk) { > underrunProtect(LD32_size); > } > >- int offset = PC_OFFSET_FROM(_nSlot+1, _nIns-1); >+ int offset = PC_OFFSET_FROM(_nSlot, _nIns-1); > // If the offset is out of range, waste literal space until it is in range. > while (offset <= -4096) { > ++_nSlot; > offset += sizeof(_nSlot); > } >- NanoAssert(isS12(offset) && (offset < -8)); >+ NanoAssert(isS12(offset) && (offset < 0)); > > // Write the literal. >- *(++_nSlot) = imm; >+ *(_nSlot++) = imm; >+ asm_output("## imm= 0x%x", imm); > > // Load the literal. > LDR_nochk(d,PC,offset); >+ NanoAssert(uintptr_t(_nIns) + 8 + offset == uintptr_t(_nSlot-1)); >+ NanoAssert(*((int32_t*)_nSlot-1) == imm); > } > > // Branch to target address _t with condition _c, doing underrun > // checks (_chk == 1) or skipping them (_chk == 0). > // > // Set the target address (_t) to 0 if the target is not yet known and the > // branch will be patched up later. > // >diff --git a/js/src/nanojit/NativeARM.h b/js/src/nanojit/NativeARM.h >--- a/js/src/nanojit/NativeARM.h >+++ b/js/src/nanojit/NativeARM.h >@@ -671,16 +671,19 @@ enum { > // offsets, use PC+8. > #define PC_OFFSET_FROM(target,frompc) ((intptr_t)(target) - ((intptr_t)(frompc) + 8)) > #define isS12(offs) ((-(1<<12)) <= (offs) && (offs) < (1<<12)) > #define isU12(offs) (((offs) & 0xfff) == (offs)) > > #define B_cond(_c,_t) \ > B_cond_chk(_c,_t,1) > >+#define B_nochk(_t) \ >+ B_cond_chk(AL,_t,0) >+ > // NB: don't use COND_AL here, we shift the condition into place! > #define JMP(_t) \ > B_cond_chk(AL,_t,1) > > #define JMP_nochk(_t) \ > B_cond_chk(AL,_t,0) > > #define JA(t) B_cond(HI,t) >diff --git a/js/src/nanojit/Nativei386.cpp b/js/src/nanojit/Nativei386.cpp >--- a/js/src/nanojit/Nativei386.cpp >+++ b/js/src/nanojit/Nativei386.cpp >@@ -118,17 +118,17 @@ namespace nanojit > } > > return fragEntry; > } > > void Assembler::nFragExit(LInsp guard) > { > SideExit *exit = guard->record()->exit; >- bool trees = _frago->core()->config.tree_opt; >+ bool trees = config.tree_opt; > Fragment *frag = exit->target; > GuardRecord *lr = 0; > bool destKnown = (frag && frag->fragEntry); > // Generate jump to epilog and initialize lr. > // If the guard is LIR_xtbl, use a jump table with epilog in every entry > if (guard->isop(LIR_xtbl)) { > lr = guard->record(); > Register r = EBX; >@@ -226,62 +226,16 @@ namespace nanojit > } > asm_arg(sz, ins->arg(j), r); > } > > if (extra > 0) > SUBi(SP, extra); > } > >- void Assembler::nMarkExecute(Page* page, int flags) >- { >- NanoAssert(sizeof(Page) == NJ_PAGE_SIZE); >- #if defined WIN32 || defined WIN64 >- DWORD dwIgnore; >- static const DWORD kProtFlags[4] = >- { >- PAGE_READONLY, // 0 >- PAGE_READWRITE, // PAGE_WRITE >- PAGE_EXECUTE_READ, // PAGE_EXEC >- PAGE_EXECUTE_READWRITE // PAGE_EXEC|PAGE_WRITE >- }; >- DWORD prot = kProtFlags[flags & (PAGE_WRITE|PAGE_EXEC)]; >- BOOL res = VirtualProtect(page, NJ_PAGE_SIZE, prot, &dwIgnore); >- if (!res) >- { >- // todo: we can't abort or assert here, we have to fail gracefully. 
>- NanoAssertMsg(false, "FATAL ERROR: VirtualProtect() failed\n"); >- } >- #elif defined AVMPLUS_UNIX || defined AVMPLUS_MAC >- static const int kProtFlags[4] = >- { >- PROT_READ, // 0 >- PROT_READ|PROT_WRITE, // PAGE_WRITE >- PROT_READ|PROT_EXEC, // PAGE_EXEC >- PROT_READ|PROT_WRITE|PROT_EXEC // PAGE_EXEC|PAGE_WRITE >- }; >- int prot = kProtFlags[flags & (PAGE_WRITE|PAGE_EXEC)]; >- intptr_t addr = (intptr_t)page; >- addr &= ~((uintptr_t)NJ_PAGE_SIZE - 1); >- NanoAssert(addr == (intptr_t)page); >- #if defined SOLARIS >- if (mprotect((char *)addr, NJ_PAGE_SIZE, prot) == -1) >- #else >- if (mprotect((void *)addr, NJ_PAGE_SIZE, prot) == -1) >- #endif >- { >- // todo: we can't abort or assert here, we have to fail gracefully. >- NanoAssertMsg(false, "FATAL ERROR: mprotect(PROT_EXEC) failed\n"); >- abort(); >- } >- #else >- (void)page; >- #endif >- } >- > Register Assembler::nRegisterAllocFromSet(int set) > { > Register r; > RegAlloc &regs = _allocator; > #ifdef WIN32 > _asm > { > mov ecx, regs >@@ -1699,18 +1653,18 @@ namespace nanojit > } > > void Assembler::nativePageReset() > { > } > > void Assembler::nativePageSetup() > { >- if (!_nIns) _nIns = pageAlloc(); >- if (!_nExitIns) _nExitIns = pageAlloc(true); >+ if (!_nIns) codeAlloc(codeStart, codeEnd, _nIns); >+ if (!_nExitIns) codeAlloc(exitStart, exitEnd, _nExitIns); > } > > // Reset the _nIns pointer to the starting value. This can be used to roll > // back the instruction pointer in case an error occurred during the code > // generation. > void Assembler::resetInstructionPointer() > { > _nIns = _startingIns; >@@ -1720,23 +1674,23 @@ namespace nanojit > void Assembler::recordStartingInstructionPointer() > { > _startingIns = _nIns; > } > > // enough room for n bytes > void Assembler::underrunProtect(int n) > { >+ NIns *eip = _nIns; > NanoAssertMsg(n<=LARGEST_UNDERRUN_PROT, "constant LARGEST_UNDERRUN_PROT is too small"); >- NIns *eip = this->_nIns; >- Page *p = (Page*)pageTop(eip-1); >- NIns *top = (NIns*) &p->code[0]; >- if (eip - n < top) { >+ if (eip - n < (_inExit ? exitStart : codeStart)) { > // We are done with the current page. Tell Valgrind that new code > // has been generated. >- VALGRIND_DISCARD_TRANSLATIONS(pageTop(p), NJ_PAGE_SIZE); >- _nIns = pageAlloc(_inExit); >+ if (_inExit) >+ codeAlloc(exitStart, exitEnd, _nIns); >+ else >+ codeAlloc(codeStart, codeEnd, _nIns); > JMP(eip); > } > } > > #endif /* FEATURE_NANOJIT */ > } >diff --git a/js/src/nanojit/avmplus.cpp b/js/src/nanojit/avmplus.cpp >--- a/js/src/nanojit/avmplus.cpp >+++ b/js/src/nanojit/avmplus.cpp >@@ -27,32 +27,159 @@ > * file only under the terms of either the GPL or the LGPL, and not to allow others to use your > * version of this file under the terms of the MPL, indicate your decision by deleting provisions > * above and replace them with the notice and other provisions required by the GPL or the > * LGPL. If you do not delete the provisions above, a recipient may use your version of this file > * under the terms of any one of the MPL, the GPL or the LGPL. 
> * > ***** END LICENSE BLOCK ***** */ > >-#include "avmplus.h" >+#include "nanojit.h" >+ >+#ifdef SOLARIS >+ #include <ucontext.h> >+ #include <dlfcn.h> >+ #include <procfs.h> >+ #include <sys/stat.h> >+ extern "C" caddr_t _getfp(void); >+ typedef caddr_t maddr_ptr; >+#else >+ typedef void *maddr_ptr; >+#endif > > using namespace avmplus; > > Config AvmCore::config; > static GC _gc; > GC* AvmCore::gc = &_gc; > GCHeap GC::heap; > String* AvmCore::k_str[] = { (String*)"" }; > >+void >+avmplus::AvmLog(char const *msg, ...) { >+} >+ > #ifdef _DEBUG > // NanoAssertFail matches JS_Assert in jsutil.cpp. > void NanoAssertFail() > { > #if defined(WIN32) > DebugBreak(); > exit(3); > #elif defined(XP_OS2) || (defined(__GNUC__) && defined(__i386)) > asm("int $3"); > #endif > > abort(); > } > #endif >+ >+#ifdef WIN32 >+void >+VMPI_setPageProtection(void *address, >+ size_t size, >+ bool executableFlag, >+ bool writeableFlag) >+{ >+ DWORD oldProtectFlags = 0; >+ DWORD newProtectFlags = 0; >+ if ( executableFlag && writeableFlag ) { >+ newProtectFlags = PAGE_EXECUTE_READWRITE; >+ } else if ( executableFlag ) { >+ newProtectFlags = PAGE_EXECUTE_READ; >+ } else if ( writeableFlag ) { >+ newProtectFlags = PAGE_READWRITE; >+ } else { >+ newProtectFlags = PAGE_READONLY; >+ } >+ >+ BOOL retval; >+ MEMORY_BASIC_INFORMATION mbi; >+ do { >+ VirtualQuery(address, &mbi, sizeof(MEMORY_BASIC_INFORMATION)); >+ size_t markSize = size > mbi.RegionSize ? mbi.RegionSize : size; >+ >+ retval = VirtualProtect(address, markSize, newProtectFlags, &oldProtectFlags); >+ NanoAssert(retval); >+ >+ address = (char*) address + markSize; >+ size -= markSize; >+ } while(size > 0 && retval); >+ >+ // We should not be clobbering PAGE_GUARD protections >+ NanoAssert((oldProtectFlags & PAGE_GUARD) == 0); >+} >+ >+#else // !WIN32 >+ >+void VMPI_setPageProtection(void *address, >+ size_t size, >+ bool executableFlag, >+ bool writeableFlag) >+{ >+ int bitmask = sysconf(_SC_PAGESIZE) - 1; >+ // mprotect requires that the addresses be aligned on page boundaries >+ void *endAddress = (void*) ((char*)address + size); >+ void *beginPage = (void*) ((size_t)address & ~bitmask); >+ void *endPage = (void*) (((size_t)endAddress + bitmask) & ~bitmask); >+ size_t sizePaged = (size_t)endPage - (size_t)beginPage; >+ >+ int flags = PROT_READ; >+ if (executableFlag) { >+ flags |= PROT_EXEC; >+ } >+ if (writeableFlag) { >+ flags |= PROT_WRITE; >+ } >+ int retval = mprotect((maddr_ptr)beginPage, (unsigned int)sizePaged, flags); >+ AvmAssert(retval == 0); >+ (void)retval; >+} >+ >+#endif // WIN32 >+ >+ >+ >+#ifdef WIN32 >+void* >+nanojit::CodeAlloc::allocCodeChunk(size_t nbytes) { >+ return VirtualAlloc(NULL, >+ nbytes, >+ MEM_COMMIT | MEM_RESERVE, >+ PAGE_EXECUTE_READWRITE); >+} >+ >+void >+nanojit::CodeAlloc::freeCodeChunk(void *p, size_t nbytes) { >+ VirtualFree(p, 0, MEM_RELEASE); >+} >+ >+#elif defined(AVMPLUS_UNIX) >+ >+void* >+nanojit::CodeAlloc::allocCodeChunk(size_t nbytes) { >+ return mmap(NULL, >+ nbytes, >+ PROT_READ | PROT_WRITE | PROT_EXEC, >+ MAP_PRIVATE | MAP_ANON, >+ -1, >+ 0); >+} >+ >+void >+nanojit::CodeAlloc::freeCodeChunk(void *p, size_t nbytes) { >+ munmap(p, nbytes); >+} >+ >+#else // !WIN32 && !AVMPLUS_UNIX >+ >+void* >+nanojit::CodeAlloc::allocCodeChunk(size_t nbytes) { >+ return valloc(nbytes); >+} >+ >+void >+nanojit::CodeAlloc::freeCodeChunk(void *p, size_t nbytes) { >+ free(p); >+} >+ >+#endif // WIN32 >+ >diff --git a/js/src/nanojit/avmplus.h b/js/src/nanojit/avmplus.h >--- a/js/src/nanojit/avmplus.h >+++ 
b/js/src/nanojit/avmplus.h >@@ -283,22 +283,29 @@ namespace MMgc { > > #define DWB(x) x > #define DRCWB(x) x > #define WB(gc, container, addr, value) do { *(addr) = (value); } while(0) > #define WBRC(gc, container, addr, value) do { *(addr) = (value); } while(0) > > #define MMGC_MEM_TYPE(x) > >+extern void VMPI_setPageProtection(void *address, >+ size_t size, >+ bool executableFlag, >+ bool writeableFlag); >+ > namespace avmplus { > > using namespace MMgc; > > typedef int FunctionID; > >+ extern void AvmLog(char const *msg, ...); >+ > class String > { > }; > > typedef class String AvmString; > > class StringNullTerminatedUTF8 > { >diff --git a/js/src/nanojit/nanojit.h b/js/src/nanojit/nanojit.h >--- a/js/src/nanojit/nanojit.h >+++ b/js/src/nanojit/nanojit.h >@@ -115,24 +115,22 @@ namespace nanojit > * ------------------------------------------- > * START AVM bridging definitions > * ------------------------------------------- > */ > class Fragment; > class LIns; > struct SideExit; > class RegAlloc; >- struct Page; > typedef avmplus::AvmCore AvmCore; > typedef avmplus::OSDep OSDep; > typedef avmplus::GCSortedMap<const void*,Fragment*,avmplus::LIST_GCObjects> FragmentMap; > typedef avmplus::SortedMap<SideExit*,RegAlloc*,avmplus::LIST_GCObjects> RegAllocMap; > typedef avmplus::List<LIns*,avmplus::LIST_NonGCObjects> InsList; > typedef avmplus::List<char*, avmplus::LIST_GCObjects> StringList; >- typedef avmplus::List<Page*,avmplus::LIST_NonGCObjects> PageList; > > const uint32_t MAXARGS = 8; > > #ifdef MOZ_NO_VARADIC_MACROS > static void NanoAssertMsgf(bool a,const char *f,...) {} > static void NanoAssertMsg(bool a,const char *m) {} > static void NanoAssert(bool a) {} > #elif defined(_DEBUG) >@@ -290,17 +288,18 @@ namespace nanojit { > > } > > // ------------------------------------------------------------------- > // END debug-logging definitions > // ------------------------------------------------------------------- > > >- >+#include "Allocator.h" > #include "Native.h" >+#include "CodeAlloc.h" > #include "LIR.h" > #include "RegAlloc.h" > #include "Fragmento.h" > #include "Assembler.h" > > #endif // FEATURE_NANOJIT > #endif // __nanojit_h__ >diff --git a/js/src/vprof/vprof.h b/js/src/vprof/vprof.h >--- a/js/src/vprof/vprof.h >+++ b/js/src/vprof/vprof.h >@@ -81,26 +81,24 @@ > // > // If your application is not threaded, define THREAD_SAFE 0, > // otherwise, you have the option of setting THREAD_SAFE to 1 which results in exact counts or to 0 which results in a much more efficient but non-exact counts > // > #define THREADED 0 > #define THREAD_SAFE 0 > > #ifdef _MSC_VER >-typedef unsigned char uint8_t; >-typedef unsigned short uint16_t; >-typedef signed char int8_t; >-typedef short int16_t; >-typedef unsigned int uint32_t; >-typedef signed int int32_t; >+typedef __int8 int8_t; >+typedef __int16 int16_t; >+typedef __int32 int32_t; > typedef __int64 int64_t; >+typedef unsigned __int8 uint8_t; >+typedef unsigned __int16 uint16_t; >+typedef unsigned __int32 uint32_t; > typedef unsigned __int64 uint64_t; >-typedef long long int64_t; >-typedef unsigned long long uint64_t; > #else > #include <inttypes.h> > #endif > > // portable align macro > #if defined(_MSC_VER) > #define vprof_align8(t) __declspec(align(8)) t > #elif defined(__GNUC__) >@@ -115,21 +113,21 @@ int profileValue (void** id, char* file, > int _profileEntryValue (void* id, int64_t value); > int histValue(void** id, char* file, int line, int64_t value, int nbins, ...); > int _histEntryValue (void* id, int64_t value); > > #ifdef __cplusplus 
> } > #endif > >-#define DOPROF >- > #ifndef DOPROF > #define _vprof(v) >+#define _nvprof(n,v) > #define _hprof(h) >+#define _nhprof(n,h) > #else > > #define _vprof(v,...) \ > { \ > static void* id = 0; \ > (id != 0) ? \ > _profileEntryValue (id, (int64_t) (v)) \ > : \
Attachment #390945 - Flags: review?(gal) → review+
Sorry, I forgot to cut out the last chunk, but I did shorten most of it :)
(In reply to comment #12)
> >+ // the chunk, which would break everything.
> >+ if (_unused >= _limit) {
> >+ // Check we used exactly the remaining space
> >+ NanoAssert(_unused == _limit);
> >+ uintptr_t addrOfLastLInsOnChunk = _unused - sizeof(LIns);
> >+ moveToNewChunk(addrOfLastLInsOnChunk);
>
> This logic is really weird, having to check explicitly that we used all.

It makes sense when seen in conjunction with the prior test. In particular, it is (was?) possible to have a skip instruction with a payload so large it takes up an entire page. This code took numerous attempts to get right so should only be changed with great care.
(In reply to comment #12)
> (From update of attachment 390945 [details] [diff] [review])
> >+ CLS(VMAllocator) reAllocator;
> >+ CLS(nanojit::CodeAlloc) reCodeAlloc;
>
> Do we have a bug already on merging these with the main allocator?

No, but all this will shift around significantly once we start teasing apart the (now separable) lifetimes of the code vs. the LIR. This is the "drop new code in, no behavior should change" first-step patch. I wouldn't bother with a specific "merge these two allocators" bug yet. We need to have design discussions about the real lifecycles.

> Pull a JSTraceMonitor* tm out of the loop? Or even better the VMAllocator*.

Will do all these.

> >+ uintptr_t mReserve[0x10000];
>
> Why 0x10000? (not saying it's bad, just curious)

Temporary hack intended to be "no worse, hopefully slightly better, than the reserve-page mechanism". I figured "16 times better" would be a good approximation. I'll give it a symbolic name and a comment to clarify. Again, this will die as soon as we figure out which of the saner forms of OOM handling we want to try (my vote is still with longjmp). This is just the "work the same as currently" stopgap.

> > _unused += sizeof(LInsSk);
> > _stats.lir++;
>
> We should probably not collect stats in OPT builds.

Oops. Will fix.

> >+ // the chunk, which would break everything.
> >+ if (_unused >= _limit) {
> >+ // Check we used exactly the remaining space
> >+ NanoAssert(_unused == _limit);
> >+ uintptr_t addrOfLastLInsOnChunk = _unused - sizeof(LIns);
> >+ moveToNewChunk(addrOfLastLInsOnChunk);
>
> This logic is really weird, having to check explicitly that we used all.

As njn points out, this is sanity-checking based on earlier conditions; I'm loath to weaken any of the asserts in here. If it's wrong, we are scribbling over the assembler. And it's been wrong a lot.

> Random rant: Every comment in NJ is formatted differently.

Fair; but at this stage it's beside the point of attempting to sync bytes with Adobe. We also have a few variable-naming schemes fighting for dominance. We'll work it out later.

Thanks for the patient reading. Now I just need to work out if this is too disruptive to land at the present product-cycle moment :)
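For the record, the longjmp flavour of OOM handling being weighed here would look roughly like the following — purely a hypothetical sketch of the idea under discussion, not anything in the current patch, which keeps the reserve-buffer/outOfMemory-flag stopgap:

    #include <csetjmp>
    #include <cstdlib>

    // Sketch: on chunk-allocation failure, jump straight back to a recovery
    // point set up around compilation. Note longjmp skips C++ destructors,
    // so the Allocator's chunks still have to be torn down explicitly at the
    // recovery point (the ~Allocator() concern raised earlier).
    static jmp_buf oomRecovery;

    static void* allocChunkOrBail(size_t nbytes)
    {
        void* p = malloc(nbytes);
        if (!p)
            longjmp(oomRecovery, 1);   // unwind to the setjmp below
        return p;
    }

    bool compileWithOOMGuard()
    {
        if (setjmp(oomRecovery)) {
            // Landed here via longjmp: abandon the trace, release the
            // allocator's chunks, flush the code cache.
            return false;
        }
        // ... run the recorder/assembler, getting memory via allocChunkOrBail() ...
        return true;
    }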
Attached patch further updates (obsolete) (deleted) — Splinter Review
This variant includes gal's previous nits, rebases to today, incorporates a performance fix in the CodeAlloc from Adobe.
Attachment #390945 - Attachment is obsolete: true
Attachment #391201 - Flags: review?(gal)
Blocks: 507031
Blocks: 507036
Blocks: 507039
Blocks: 507042
Blocks: 507302
Attachment #391201 - Flags: review?(gal)
Comment on attachment 391201 [details] [diff] [review] further updates

Actually, I'm removing the r? here because only one loop in the CodeAlloc changed from obviously-wrong to obviously-right (and equal to upstream); everything else was just addressing review concerns and rebasing. And I've rebased again. No substantive changes. (Sorry if this is too fast-and-loose; I've been trying to land this for 2 weeks.)
Attached patch rebase to tonight (deleted) — Splinter Review
Attachment #391201 - Attachment is obsolete: true
Comment on attachment 391498 [details] [diff] [review] rebase to tonight

Reviewed based on interdiff. Minimal diff.
Attachment #391498 - Flags: review+
Attached patch Some fixes (deleted) — Splinter Review
This patch provides a few key fixes that remove (as far as I can tell) the performance "regression" (it was actually an infinite leak) that prevented the previous landing from sticking.

The key part is hidden down at the bottom of the hunk rewriting Assembler::endAssembly: the removal of a spurious call to Assembler::reset(), which I thought was harmless. What it actually did, sadly, was *forget* the generated CodeList before ::compile managed to transfer ownership of Assembler::codeList over to the owning Fragment. So it leaked forever. Sigh.

Anyway, I also took the opportunity to reduce divergence with Adobe's version of endAssembly, since it's a bit tidier and logically equivalent. Also fixed a couple of crasher bugs in CodeAlloc (one in the block-coalescing algorithm, and one in the implementation of ::sweep), lowered the per-allocation page count in CodeAlloc to 16, and taught the tracer to call sweep when it's really out of space.
Attachment #393227 - Flags: review?(edwsmith)
Attachment #393227 - Flags: review?(tharwood)
Attachment #393227 - Flags: review?(edwsmith)
Attachment #393227 - Flags: review+
Comment on attachment 393227 [details] [diff] [review] Some fixes

The change in CodeAlloc::free looks right, but adding Tom Harwood for another review just to be sure.

sweep() needs to unlink blocks from heapblocks, too, as discussed on #nanojit.
Comment on attachment 393227 [details] [diff] [review] Some fixes

CodeAlloc::free(): oldHigher could be given a name that denotes its destiny, e.g., weedWhackerFodder or perhaps just coalescedBlock.

Does sweep() have an error as it was written? The free() bug is a fat-finger coding error, but sweep() should have worked as it was written. Nothing wrong with the rewrite, though.

Worth commenting that the test "!ab->higher->higher" is safe (and necessary) because anything on availblocks is guaranteed to have a higher sibling, the terminator block for the highest avail block.
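To make that invariant concrete — a simplified standalone sketch, not the real CodeList, which carries more fields: every block carved from one allocation has a terminator block above it, so a block on availblocks always has a non-null higher, and checking higher->higher (plus lower) identifies a block spanning its whole allocation.

    #include <cstddef>

    // Simplified CodeList: blocks are linked by address within one
    // allocation, with a terminator block sitting at the top of the region.
    struct CodeList
    {
        CodeList* higher;   // neighbouring block above (non-null for anything on availblocks)
        CodeList* lower;    // neighbouring block below (NULL for the lowest block)
        bool      isFree;
    };

    // True if 'ab' (taken from availblocks) covers its entire allocation,
    // i.e. its only neighbour above is the terminator; such a block is a
    // candidate for sweep() to hand back to the OS.
    bool spansWholeAllocation(const CodeList* ab)
    {
        return ab->lower == NULL && ab->higher->higher == NULL;
    }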
Attachment #393227 - Flags: review?(tharwood) → review+
(In reply to comment #21)
> sweep() needs to unlink blocks from heapblocks

Right. Disregard my comment above about the correctness of sweep(); it does need work, and first scanning availblocks is a better way to go, because it leaves the terminator block readily to hand to be unlinked from heapblocks.
This appears to be sticking, no evident regressions after 2 talos cycles. Fingers crossed.

http://hg.mozilla.org/tracemonkey/rev/1cf3897d3691

and dvander's followup catch

http://hg.mozilla.org/tracemonkey/rev/b79cb3d86982

(Wound up landing a variant with a couple of minor adjustments to the CodeAlloc debugging code, an extra nanojit::Assembler for the regexp pipeline, and a corrected sweep that works in two passes; otherwise as-reviewed.)
Whiteboard: fixed-in-tracemonkey
I'm not convinced about the modifications to ::asm_ld_imm in NativeARM.cpp. In particular, you've changed an instance of *(++_nSlot) to *(_nSlot++) and modified some assertions. Whilst a post-increment may be more correct, _nSlot is also used in B_cond_chk, and this usage hasn't been updated. Is this really correct?

Also, you've changed the assertion that (offset < -8) to check that (offset < 0). The former is more strict, as the PC on ARM is 8 bytes ahead of the executing instruction. (PC_OFFSET_FROM takes this offset into account.)

Have I misunderstood something here?
(In reply to comment #25)
> I'm not convinced about the modifications to ::asm_ld_imm in NativeARM.cpp. In
> particular, you've changed an instance of *(++_nSlot) to *(_nSlot++) and
> modified some assertions. Whilst a post-increment may be more correct, _nSlot
> is also used in B_cond_chk, and this usage hasn't been updated. Is this really
> correct?
>
> Also, you've changed the assertion that (offset < -8) to check that (offset < 0).
> The former is more strict, as the PC on ARM is 8 bytes ahead of the executing
> instruction. (PC_OFFSET_FROM takes this offset into account.)
>
> Have I misunderstood something here?

I'm not sure. Not even a bit. Here's what I did:

- Try making no changes to ARM at all except API changes to handle the new allocator interface.
- Notice that it crashes all the time.
- Try to understand how _nSlot works, and try to understand why the Tamarin code differs in the ways you outline. Talk to Edwin about it, draw some pictures, scratch my head.
- Come to believe Tamarin might be more correct. Try copying the Tamarin code for handling _nSlot into TM, and find to my delight that it stops the crashing.

IOW, if you feel confident that it's wrong and can make it more-right without causing it to go back to crashing all the time, I'm absolutely happy to accept your patches to do so! Particularly if they cause Tamarin and Tracemonkey to converge more, rather than diverge again. I don't have a firm feeling about the right way to do this area, nor a great deal of confidence in my ability to speak ARM assembly at all.
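For readers following the -8 vs 0 question, the two macros in question (verbatim from NativeARM.h in the patch); the worked example below them is mine, purely illustrative:

    #include <cstdint>

    // ARM reads PC as "address of current instruction + 8" because of the
    // pipeline; PC_OFFSET_FROM folds that in.
    #define PC_OFFSET_FROM(target,frompc) ((intptr_t)(target) - ((intptr_t)(frompc) + 8))
    #define isS12(offs) ((-(1<<12)) <= (offs) && (offs) < (1<<12))

    // Example: a literal placed 16 bytes below the loading instruction gives
    // offset = (pc - 16) - (pc + 8) = -24, which satisfies isS12 as well as
    // the stricter (offset < -8) form of the assertion.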
Depends on: 510657
Comment on attachment 391498 [details] [diff] [review] rebase to tonight

>+class VMAllocator;
...
>+struct VMAllocator : public nanojit::Allocator

Oops ;-)
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Flags: in-testsuite-