Closed Bug 495734 Opened 16 years ago Closed 15 years ago

nanojit: don't store non-LIR data in LIR buffers

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking


RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: graydon)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 5 obsolete files)

Various blobs of non-LIR data are stored in LIR buffers: VMSideExits, GuardRecords, CallInfos, FrameInfos, etc. They are stored using LIR_skip instructions, which have a variable-length payload. This causes various problems:

- It restricts the size of these blobs -- they can't be bigger than (slightly less than) the page size.
- It ties the lifetime of these blobs to the lifetime of the LIR buffers. This makes it hard to deallocate LIR buffers, even after the LIR instructions are no longer needed.
- Skips with payloads are complicated and easy to get wrong, as we've seen with numerous prior bugs.

Attached is a patch that addresses this. (Note that it requires attachment 379644 [details] [diff] [review] from bug 494639 to be applied first.) The patch gets rid of LIR_skip instructions with payloads, which leaves only payload-less LIR_skip instructions that link pages together. This allows insSkip() and payload() to be removed.

However, the patch is only a proof-of-concept and needs more work because:

- All the data blobs that were allocated via LIR_skip payloads are now allocated with malloc(), but no attempt is made to free them. (I don't know what the lifetimes are for the various different blob kinds.)
- A couple of magic numbers have been introduced where previously NJ_MAX_SKIP_PAYLOAD_SZB was used.

These problem points are marked with "XXX WARNING njn".
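For readers new to this code, here is a self-contained sketch of the two allocation styles being contrasted. All the types below are stand-ins (the real LIns, LirBufWriter and GuardRecord live in nanojit/TM); only the shape of the change is what matters:

    #include <cstdlib>

    struct FakeGuardRecord { void* exit; };

    struct FakeLirBuffer {
        // Old style: blobs are carved out of the LIR buffer itself via a skip
        // instruction, so they cannot exceed the buffer page and they die with it.
        char page[4096];
        size_t used;
        void* insSkipPayload(size_t n) { void* p = page + used; used += n; return p; }
    };

    int main() {
        FakeLirBuffer buf = { {0}, 0 };
        FakeGuardRecord* inBuffer =
            (FakeGuardRecord*) buf.insSkipPayload(sizeof(FakeGuardRecord));

        // New style (what the patch moves toward): the blob is allocated out of
        // line, so the LIR buffer can be freed while the metadata lives on with
        // the compiled code.
        FakeGuardRecord* outOfLine =
            (FakeGuardRecord*) malloc(sizeof(FakeGuardRecord));

        (void) inBuffer;
        free(outOfLine);
        return 0;
    }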
Blocks: 497009
Attached patch rebased patch (obsolete) (deleted) — Splinter Review
Rebased patch. One thing worth mentioning is that I'm completely uncertain about the handling of the insGuard() call within TraceRecorder::emitNativeCall() -- I just used 0 for the 2nd arg instead of the GuardRecord (and trace-test.js passes). I suspect the original code is actually buggy, see bug 498807 for more details about that.
Attachment #380777 - Attachment is obsolete: true
Attached file Valgrind errors (obsolete) (deleted) —
Also, with the above patch, Valgrind has found some problems, as shown in the attachment. I think these were latent problems that my change uncovered. All of them involve small overruns of the buffers that used to be skip payloads but now are malloc'd. They were likely not having any effect because skip payload sizes are currently rounded up to a multiple of the word size, thus hiding the fact that the asked-for memory size isn't quite big enough. E.g. the first one is a 2-byte overrun of a 38-byte buffer; with rounding, that goes up to 40, so the rounded-up buffer is big enough.
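To make the rounding arithmetic concrete, here is a small illustrative snippet; the 4-byte word size is an assumption for this 32-bit build, not something taken from the skip-payload code:

    #include <cstddef>
    #include <cstdio>

    int main() {
        size_t requested = 38;                             // asked-for payload size
        size_t rounded   = (requested + 3) & ~(size_t)3;   // rounds up to 40
        printf("requested=%zu rounded=%zu\n", requested, rounded);
        // A 2-byte write at offsets 38..39 overruns the requested size but still
        // lands inside the rounded-up skip payload, so it went unnoticed -- until
        // the payloads became exact-size malloc'd buffers and Memcheck could see it.
        return 0;
    }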
I wonder if you couldn't pick up even more of this kind of stuff by running it under Memcheck on a big-endian platform. Shame NJ doesn't work on ppc32-linux.
Tamarin works on ppc32-linux. At least, there's a backend file called NativePPC.cpp, which is rather suggestive.
NativePPC.cpp has only been tested on mac, and there has been interest in AIX. porting should be straightforward. Happy to consult if anyone has the hardware to test on.
(In reply to comment #5)
> NativePPC.cpp has only been tested on mac, and there has been interest in AIX.
> porting should be straightforward. Happy to consult if anyone has the hardware
> to test on.

... /me fires up qemu-system-ppc and gets to work building a userspace :)
X64 and PPC are missing a few TM bits (e.g. LIR_loop, savedRegs), but that should be easy to add.
(In reply to comment #1)
>
> One thing worth mentioning is that I'm completely uncertain about the handling
> of the insGuard() call within TraceRecorder::emitNativeCall() -- I just used 0
> for the 2nd arg instead of the GuardRecord (and trace-test.js passes). I
> suspect the original code is actually buggy, see bug 498807 for more details
> about that.

I've cleared up this confusion in bug 498807, which I'm marking as a blocker for this bug.
Depends on: 498807
Attached patch patch with simple memory management (obsolete) (deleted) — Splinter Review
This patch adds some very simple management of the malloc'd blocks -- pointers to them are stored in a linked list in TreeInfo and they are deallocated when the TreeInfo is destroyed. I'm not certain about the lifetimes of these, but Andreas said the lifetimes would match that of TreeInfo, and Valgrind doesn't complain about any uses-after-free.

Remaining problems:

- jsregexp.cpp still allocates some memory without freeing it -- there's no TreeInfo in jsregexp.cpp for storing pointers to the blocks.
- The malloc calls aren't checked for failure.
- The magic numbers are still present.

I don't expect this patch to pass review, but any suggestions on how to continue would be welcome.
Attachment #383613 - Attachment is obsolete: true
Attachment #384577 - Flags: review?(gal)
I see this pattern (alloc & register) repeated a lot:

    CallInfo* ci = (CallInfo *) malloc(sizeof(struct CallInfo));
    treeInfo->oddsAndEnds = prependToHetList(treeInfo->oddsAndEnds, ci);

How about replacing this with:

    CallInfo* ci = treeInfo->alloc<CallInfo>();

which will do both, and will be backed up by something like this TreeInfo member:

    template<typename T>
    T* alloc(size_t extra = 0) {
        T* p = (T*) malloc(sizeof(T) + extra);
        oddsAndEnds = prependToHetList(oddsAndEnds, p);
        return p;
    }

For regexps, the first thing that comes to mind is to add an allocated list to the JSRegExp and use it in the same way, freeing the pointers in JS_DestroyRegExp.
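A rough sketch of that regexp-side suggestion, with invented stand-in names (RegExpLike, allocList, reAlloc, reFreeAll); only JSRegExp and JS_DestroyRegExp are the real entry points being referred to:

    #include <cstdlib>

    struct HetList { void* elem; HetList* next; };

    struct RegExpLike {            // stand-in for JSRegExp
        HetList* allocList;        // hypothetical new field
    };

    static void* reAlloc(RegExpLike* re, size_t nbytes) {
        void* p = malloc(nbytes);
        HetList* node = (HetList*) malloc(sizeof(HetList));
        node->elem = p;
        node->next = re->allocList;
        re->allocList = node;
        return p;
    }

    static void reFreeAll(RegExpLike* re) {   // would be called from JS_DestroyRegExp
        for (HetList* n = re->allocList; n; ) {
            HetList* next = n->next;
            free(n->elem);
            free(n);
            n = next;
        }
        re->allocList = NULL;
    }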
Comment on attachment 384577 [details] [diff] [review]
patch with simple memory management

>diff --git a/js/src/jsregexp.cpp b/js/src/jsregexp.cpp
>--- a/js/src/jsregexp.cpp
>+++ b/js/src/jsregexp.cpp
>@@ -2174,26 +2174,27 @@ class RegExpNativeCompiler {
>             return JS_FALSE;
>         /*
>          * If we share generated native code, we need to make a copy
>          * of the bitmap because the original regexp's copy is destroyed
>          * when that regexp is.
>          */
>         RECharSet *charSet = &re->classList[node->u.ucclass.index];
>         size_t bitmapLen = (charSet->length >> 3) + 1;
>-        /* insSkip() can't hold large data blocks. */
>+        // njn: the size restrictions are gone now that blobs of data are no
>+        // longer stored in the LIR with skip instructions.

Yeah, we can drop the restriction here I guess.

>         if (bitmapLen > 1024)
>             return NULL;
>         /* The following line allocates charSet.u.bits if successful. */
>         if (!charSet->converted && !ProcessCharSet(cx, re, charSet))
>             return NULL;
>-        LIns* skip = lirBufWriter->insSkip(bitmapLen);
>         if (fragment->lirbuf->outOMem())
>             return NULL;
>-        void* bitmapData = skip->payload();
>+        // XXX WARNING njn: this memory is never freed
>+        void* bitmapData = malloc(bitmapLen);
>         memcpy(bitmapData, charSet->u.bits, bitmapLen);

dmandelin wrote this code. We should discuss who should own the bitmap memory.

>
>         LIns* to_fail = lir->insBranch(LIR_jf, lir->ins2(LIR_lt, pos, cpend), 0);
>         fails.add(to_fail);
>         LIns* text_ch = lir->insLoad(LIR_ldcs, pos, lir->insImm(0));
>         fails.add(lir->insBranch(LIR_jf, lir->ins2(LIR_le, text_ch, lir->insImm(charSet->length)), 0));
>         LIns* byteIndex = lir->ins2(LIR_rsh, text_ch, lir->insImm(3));
>         LIns* bitmap = lir->insImmPtr(bitmapData);
>@@ -2350,28 +2351,28 @@ class RegExpNativeCompiler {
>
>     /*
>      * Insert the side exit and guard record for a compiled regexp. Most
>      * of the fields are not used. The important part is the regexp source
>      * and flags, which we use as the fragment lookup key.
>      */
>     GuardRecord* insertGuard(const jschar* re_chars, size_t re_length)
>     {
>-        LIns* skip = lirBufWriter->insSkip(sizeof(GuardRecord) +
>-                                           sizeof(RESideExit) +
>-                                           (re_length-1) * sizeof(jschar));
>-        GuardRecord* guard = (GuardRecord *) skip->payload();
>+        // XXX WARNING njn: this memory is never freed
>+        GuardRecord* guard = (GuardRecord*) malloc(sizeof(GuardRecord) +
>+                                                   sizeof(RESideExit) +
>+                                                   (re_length-1) * sizeof(jschar));
>         memset(guard, 0, sizeof(*guard));
>         RESideExit* exit = (RESideExit*)(guard+1);
>         guard->exit = exit;
>         guard->exit->target = fragment;
>         exit->re_flags = re->flags;
>         exit->re_length = re_length;
>         memcpy(exit->re_chars, re_chars, re_length * sizeof(jschar));
>-        fragment->lastIns = lir->insGuard(LIR_loop, lir->insImm(1), skip);
>+        fragment->lastIns = lir->insGuard(LIR_loop, lir->insImm(1), guard);
>         return guard;
>     }
>
> public:
>     RegExpNativeCompiler(JSRegExp* re, CompilerState* cs, Fragment* fragment)
>         : re(re), cs(cs), fragment(fragment), lir(NULL), lirBufWriter(NULL) { }
>
>     JSBool compile(JSContext* cx)
>diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp
>+++ b/js/src/jstracer.cpp
>@@ -274,16 +274,44 @@ static bool did_we_check_processor_featu
>
> #ifdef JS_JIT_SPEW
> bool js_verboseDebug = getenv("TRACEMONKEY") && strstr(getenv("TRACEMONKEY"), "verbose");
> bool js_verboseStats = js_verboseDebug ||
>     (getenv("TRACEMONKEY") && strstr(getenv("TRACEMONKEY"), "stats"));
> bool js_verboseAbort = getenv("TRACEMONKEY") && strstr(getenv("TRACEMONKEY"), "abort");
> #endif
>
>+
>+// A simple linked list type intended just to hold a bunch of malloc'd blocks
>+// that need to be freed at the same time.
>+struct _HetList {
>+    void* elem;
>+    HetList* next;
>+};
>+
>+// njn: what if the allocation of elem failed?
>+static HetList* prependToHetList(HetList* list, void* elem)
>+{
>+    HetList* newNode = (HetList*) malloc(sizeof(HetList));
>+    newNode->elem = elem;
>+    newNode->next = list;
>+    return newNode;
>+}
>+
>+static void destroyHetList(HetList* curr)
>+{
>+    while (curr) {
>+        HetList* tmp = curr;
>+        curr = curr->next;
>+        free(tmp->elem);   // Free the HetList node's element
>+        free(tmp);         // Free the HetList node itself
>+    }
>+}
>+

We have a bunch of list data structures lying around. Not sure whether one of them would work here. I usually try to stick a next pointer into all elements and avoid shrink-wrapping.

Would it be easier to have a per-tree allocation pool? So instead of keeping track of all these little bits, we just have a pool we allocate from? And then dispose of the entire pool when the tree is killed?
(In reply to comment #11)
>
> We have a bunch of list data structures lying around. Not sure whether one of
> them would work here. I usually try to stick a next pointer into all elements
> and avoid shrink-wrapping.

I looked at some of them (List, Queue) but they weren't suitable because AFAICT they move elements around as the data structure fills up. Hmm, I guess I could have just put pointers into the structure. Still, the advantage of this is that it's void* elems, so different types can go in the one list. It's something of a strawman, dead simple just to get the ball rolling, because I figured whatever I came up with first would have some kind of problem :)

> Would it be easier to have a per-tree allocation pool? So instead of keeping
> track of all these little bits, we just have a pool we allocate from? And then
> dispose of the entire pool when the tree is killed?

It's certainly possible. The interface would be similar, i.e. "give me a chunk of memory whose lifetime is that of the tree, please" (that's assuming we bundle up the malloc calls something like in comment 10). It's just a question of the underlying implementation, lists vs. pools. Maybe there are cache benefits to be had by co-locating certain kinds of things together? And then there's the elephant in the room of how to handle OOM...
OOM handling won't be too terrible. We can stop the recording if we OOM.
(In reply to comment #13)
> OOM handling won't be too terrible. We can stop the recording if we OOM.

How does that correspond to the patch? I.e. currently I don't check the malloc() results. How do I check and then stop the recording -- is it possible now? Or will it require some work towards setjmp/longjmp or somesuch? (Factoring out as per comment 10 would at least mean that there's a single place to check rather than several.)
We can already run out of lir space while recording. This would be a similar and additional abort condition (out of meta data storage).
In a separate bug I've been working on cleaning up the allocation patterns and putting in a clean allocator API for nanojit to use everywhere, which brings up the subject of arenas (fast bump-allocator, toss the whole buffer when done). It seems pretty clear that LIR (and associated structures, inline blobs, etc.) is a prime candidate for using an arena allocator. This bug aims to move data out of LirBuffer, but I've always thought of LirBuffer as an allocation arena; it's just not well factored that way. (Putting data inline means it has LIR lifetime and is freed when the LirBuffer is freed, no muss no fuss.)

So if we agree on an OOM handling policy (strawman: arena allocators longjmp when OOM) and we allow for N arenas with different lifetimes, then we get fast bump allocation with clean OOM handling and clean normal-exit cleanup.

Separately, jsteward is advocating to add a statement list into LIR. If we did *that* and we used a lir-lifetime bump-allocate arena, then here's what happens:

- each single LIns is arena-allocated
- and so are lir-lifetime blobs, like struct SideExit
- you don't need LirBuffer anymore, at least not for storing instructions; maybe LirBuffer becomes the new statement list
- you can traverse LIR forwards or backwards without knowing about instruction size
- you can reorder instructions by making new lists

thoughts?
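For concreteness, here is a minimal bump-allocator sketch of the kind being proposed. The chunk size, names, and abort-on-OOM behaviour are assumptions for illustration, not the nanojit Allocator/Arena code that eventually landed:

    #include <cstddef>
    #include <cstdlib>

    // Allocation is a pointer bump; everything is released at once -- no
    // per-object free.
    class BumpArena {
        struct Chunk { Chunk* next; };
        Chunk* chunks;
        char*  cursor;
        char*  limit;
        static const size_t CHUNK_BYTES = 4096;   // assumption, not a nanojit constant

      public:
        BumpArena() : chunks(NULL), cursor(NULL), limit(NULL) {}
        ~BumpArena() { reset(); }

        void* alloc(size_t nbytes) {
            nbytes = (nbytes + 7) & ~(size_t)7;              // keep results aligned
            if (cursor == NULL || nbytes > (size_t)(limit - cursor)) {
                size_t payload = nbytes > CHUNK_BYTES ? nbytes : CHUNK_BYTES;
                Chunk* c = (Chunk*) malloc(sizeof(Chunk) + payload);
                if (!c) abort();                             // stand-in for OOM policy
                c->next = chunks;
                chunks = c;
                cursor = (char*)(c + 1);
                limit  = cursor + payload;
            }
            void* p = cursor;
            cursor += nbytes;
            return p;
        }

        void reset() {                                       // "toss the whole buffer when done"
            while (chunks) {
                Chunk* next = chunks->next;
                free(chunks);
                chunks = next;
            }
            cursor = limit = NULL;
        }
    };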
(In reply to comment #16)
> thoughts?

I like the sound of this in all ways but one: we keep side exits (and some other metadata, notably the TreeInfo objects this bug is mostly concerned with) around post-compilation in order to "knit" fragments together as each trace tree grows. We actually keep the LirBuffer and the code pages alive equally long, and flush them both (including all in-buffer structures) simultaneously when we hit our memory high-water mark.

So while I think it's a good time to try writing such an experiment, we'll need to be just a little careful not to deny clients the ability to allocate such structures out-of-line, if we want to match the lifetime requirements of the client. But that's a qualification rather than an objection. In general I think it is worth trying such a design. Who wants to sketch out the proof-of-concept? :)
The approach in #16 would still work though. There is nothing stopping us from having 2 parallel lirbuffers. One we deallocate right away (instruction stream) if we don't recompile, and one that is permanent and is deallocated when the native code is killed.
(In reply to comment #18)
> The approach in #16 would still work though. There is nothing stopping us from
> having 2 parallel lirbuffers. One we deallocate right away (instruction stream)
> if we don't recompile, and one that is permanent and is deallocated when the
> native code is killed.

Absolutely. Such a solution just requires the ability for the client to choose which buffer gets used for which sort of allocation.
(In reply to comment #19)
> (In reply to comment #18)
> > The approach in #16 would still work though. There is nothing stopping us from
> > having 2 parallel lirbuffers. One we deallocate right away (instruction stream)
> > if we don't recompile, and one that is permanent and is deallocated when the
> > native code is killed.
>
> Absolutely. Such a solution just requires the ability for the client to choose
> which buffer gets used for which sort of allocation.

Exactly. IMHO, Arena should be in nanojit, and get its memory in hunks from a VM-provided Allocator (no 4k rules need apply). (None of this is for native code, although an Arena could be made that has native-code lifetime.)

Strawman:

nanojit::Allocator = VM-provided allocation interface. Never returns null, longjumps on OOM without calling destructors. nj::Allocator is meant to be a new/delete abstraction. Methods:

    alloc(size_t)          // allocate, never return NULL
    dealloc(p)             // free explicitly
    template<> destroy(p)  // replaces delete p

nanojit::Arena = nanojit-implemented fast bump allocator, calls nj::Allocator for hunks of memory. No free provided, this is an arena!

    ::operator new (size_t, nj::Allocator)  // overloaded new (Allocator) T()
    ::operator new (size_t, nj::Arena)      // same idea

In places where lifetimes are clearly joined, an Arena* can be a member field and the class can use it without passing it around, e.g. possibly LirBuffer. In places where much flexibility is needed over lifetime, Arena* is passed into the API, e.g. possibly the SortedMap/BitMap/List<> containers.

- It remains to be seen if any nanojit code other than Arena would ever want to use Allocator. But let's leave that open.
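A compilable sketch of the Allocator half of this strawman, with invented bodies just to show the shape; here OOM simply aborts where the strawman would longjmp without running destructors, and none of this is the nanojit::Allocator that later landed:

    #include <cstddef>
    #include <cstdlib>

    namespace nj {
        class Allocator {
          public:
            void* alloc(size_t nbytes) {
                void* p = malloc(nbytes);
                if (!p) abort();                    // stand-in for longjmp-on-OOM
                return p;
            }
            void dealloc(void* p) { free(p); }
            template<typename T> void destroy(T* p) { p->~T(); dealloc(p); }
        };
    }

    // Overloaded placement new, so clients can write: new (allocator) T(...)
    inline void* operator new(size_t size, nj::Allocator& a) { return a.alloc(size); }
    inline void  operator delete(void* p, nj::Allocator& a) { a.dealloc(p); }

An Arena built on top of this would look much like the bump-allocator sketch after comment 16, taking its hunks from an nj::Allocator instead of calling malloc directly.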
In reply to comments 16-20: I like the proposal in comment 16 too.

On the issue of SideExits and lifetimes and such: it seems to me that we should regard (LIR + LIR SideExit/GuardRecord descriptors) as together making up the intermediate form of a trace: instructions and metadata. And (native code + natively used SideExit buffers + FrameInfos) as together making up the compiled form of a trace: "code and data segments".

So conceptually a LIR SideExit has the same lifetime as LIR, and a native SideExit is a logically distinct object that has the same lifetime as native code. And I would recommend making this explicit in the code. Avoiding a copy by transferring ownership from LIR to native should be regarded as an optimization, and probably not an important one.

WRT arenas: looks good to me, but I hope we will get an API layer such that we can later switch to a different allocation scheme without too much pain. In particular, it seems to me that someday in the further future we might want finer-grained management of fragment memory, or even a GC for fragment memory.
I think the strawman API layer above, combined with the CodeAlloc API (in TR/nanojit), has that flexibility. Either way I agree, though: it should have that flexibility.
(In reply to comment #16)
>
> This bug aims to move data
> out of LirBuffer, but I've always thought of LirBuffer as an allocation arena;
> its just not well factored that way. (putting data in line means it has lir
> lifetime and is freed when lirbuffer is freed, no muss no fuss).

Now I'm really confused. I thought the main motivation for this bug was to disentangle the data's lifetime from the LIR's lifetime, and that is why this blocks bug 497009...

(In reply to comment #21)
>
> I like the proposal in comment 16 too. On the issue of SideExits and lifetimes
> and such: it seems to me that we should regard (LIR + LIR SideExit/GuardRecord
> descriptors) as together making up the intermediate form of a trace:
> instructions and metadata. And (native code + natively used SideExit buffers +
> FrameInfos) as together making up the compiled form of a trace: "code and data
> segments".
>
> So conceptually a LIR SideExit has the same lifetime as LIR, and a native
> SideExit is a logically distinct object that has the same lifetime as native
> code. And I would recommend making this explicit in the code. Avoiding a copy
> by transferring ownership from LIR to native should be regarded as an
> optimization, and probably not an important one.

...although this somewhat lessens my confusion. In my mind, the first step here is to determine the lifetimes of all the relevant pieces of data: VMSideExits, GuardRecords, SwitchInfos, CallInfos, FrameInfos, and bitmap data (for jsregexp.cpp). This is orthogonal to, and something of a prerequisite for, the exact allocation method (arenas or whatever). Despite the fact I wrote the patch, as someone still new to this code the uses and lifetimes of these different objects are mysterious to me, which makes me very nervous. (And if LIR SideExits and native SideExits have the same type but a different lifetime, as Dave says, then I agree they should probably be better distinguished.)

So any explanations or pointers to documentation on this matter would be greatly appreciated.
We want to deallocate the instruction LIR. Currently instructions and meta data (side exits) are intermixed. If we split them into separate LIR buffers, we can deallocate one and keep the other.
(In reply to comment #24)
> We want to deallocate the instruction LIR. Currently instructions and meta data
> (side exits) are intermixed. If we split them into separate LIR buffers, we can
> deallocate one and keep the other.

That's a very simplistic view of it, and one that I've understood from the start. But the real story is that we have six kinds of metadata that are currently stored in the LIR buffer (seven, if VMSideExits can be split between LIR and native code). The lifetimes of these six or seven kinds of metadata are what I don't know.

Also, I think we should avoid using the term "LIR buffer" for a buffer that doesn't hold LIR instructions, because the LirBuffer class has several members (insCount(), byteCount(), _stats.lir, maybe some of the others) that only make sense for instructions.
Maybe you want to list the 6/7 pieces of meta data to make this more concrete and we discuss each type individually?
(In reply to comment #26)
> Maybe you want to list the 6/7 pieces of meta data to make this more concrete
> and we discuss each type individually?

He did, comment 23.
Ah, ok, thanks. Missed that.

1. VMSideExits (bailout info, a point in the interpreter bytecode)
2. GuardRecords (bailout info, a point in the native code/LIR)
3. SwitchInfos (info for switch statement handling)
4. CallInfos (these are dynamically generated CallInfos for fast-native and slow-native calls; we also use static ones that are declared directly in C)
5. FrameInfos (inlining info, used to synthesize frames)
6. Bitmap data (for jsregexp.cpp, character matching bitmaps)

Still missing 7.

Anyway, all this metadata is referenced by the native code, directly or indirectly (native code refers to GuardRecords, for example, and GuardRecords point to VMSideExits). In other words all this metadata must live as long as the _native_ code lives. The LIR instructions themselves can have a shorter lifetime.

As for LirBuffer, the name is maybe misleading, but I think we know what we mean. Do we actually use stats.lir and insCount and byteCount? Last time I was cleaning out in that corner I noticed most of that is dead code.
Note: I tried at some point to stick this meta info into the code buffer (_nIns), but ended up not doing that because:

- the code buffer has to be rewound if we abort recording, which happens with a non-trivial frequency, and it wasn't obvious to me how to do that once we cross a page boundary
- co-allocating code and data seemed bad for code locality
- some of the meta info is not read-only (i.e. VMSideExits, which has a linked list of GuardRecords that grows), and we want the code pages to be read-only eventually

IMO, conceptually, this meta info should live in the code page, just not really.
(In reply to comment #28)
>
> Anyway, all this meta data is referenced by the native code,
> directly or indirectly (native code refers to GuardRecords, for example, and
> GuardRecords point to VMSideExits). In other words all this meta data must live
> as long as the _native_ code lives. The LIR instructions themselves can have a
> shorter life time.

Ah, thanks, that's what I wanted to know.

> As for LirBuffer, the name is maybe misleading, but I think we know what we
> mean. Do we actually use stats.lir and insCount and byteCount? Last time I was
> cleaning out in that corner I noticed most of that is dead code.

TM doesn't use insCount() and byteCount(), but Tamarin-redux does. _stats.lir is used by insCount().

(In reply to comment #29)
> Note: I tried at some point to stick this meta info into the code buffer
> (_nIns), but ended up not doing that because:
>
> - co-allocating code and data seemed bad for code locality

Yeah, doesn't that completely trash caches (i.e. lead to lots of cache flushing)?

> IMO, conceptually, this meta info should live in the code page, just not
> really.

We could make a code page have a header with a pointer to an object holding the metadata.

----

As for moving forward: comments #16 and #20 seem like a good idea, but they greatly expand the scope of this bug. Do people think it's possible that a slightly improved version of my patch could be accepted, and then more advanced stuff be pushed into a follow-up bug? The linked list I used is very simple but enough to get the metadata out of the LIR buffers, which gets rid of skips with payloads, which makes LIR a lot simpler. Also, it would allow progress on bug 497009, which would help Fennec.

If that is good enough for now, the only real question is what to do if the malloc() call for the metadata (which will be encapsulated in a single function) fails. A short-term easy thing would be just to abort, based on the idea that the follow-up bug would address it properly. Or I could try to have some backup static memory and set an OOM flag, much like is done with LirBuffer in makeRoom(), but that's more work and is likely to be replaced soon anyway.
I'd remove the stats tracking if it makes the refactoring easier. We can always adapt. Surely there's a structure in TM that corresponds 1:1 with code (and points to the code)? That might be a good place to track "the data segments" for each code segment.
(In reply to comment #31)
> Surely there's a structure in TM that corresponds 1-1 with code? (and points to
> the code)? that might be a good place to track "the data segments" for each
> code segment.

Oh how I wish. We might be there by the end of the summer though, if we hustle.
> separately, jsteward is advocating to add a statement list into LIR.

I'm only advocating something very simple. Not really a list of LInsns which are the top-level statements (at least not now). That would increase storage requirements and complexity.

What I meant was: split the LIns type into two, one for LIns kinds which compute a value, and one for LIns kinds which don't compute a value, and so which we must be doing for their side effects. At the moment, if you're expecting a LIns that computes a value but I mistakenly give you one which doesn't, then at best we get an assertion failure at run time, at worst weird behaviour or a crash. Whereas if they were separate types then such errors would be detected at compile time.

The criterion for the division is basically as defined by the function hasEffects in comment #5 of https://bugzilla.mozilla.org/show_bug.cgi?id=496693. Obviously the division actually needs to be done on the basis of the LOpcodes, so it is completely compile-time known, but that should be no problem.
(In reply to comment #33)
> split the LIns type into two, one for LIns kinds which

Not least because, at the moment, one can write stuff like

    ins2(LIR_add, insStore(....), insGuard(...))

which is nonsense, and won't be detected until it runs. With split types this would be jumped on by g++ at build time.

Anyway, this is OT. Time for a new bug report perhaps.
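A rough sketch of the compile-time split being suggested; the type and method names below are invented for illustration (the real LIR API uses a single LIns type and LOpcode enum), and only declarations are shown since the point is purely about typing:

    // Two distinct node types so that value-producing and effect-only
    // instructions cannot be mixed up at compile time.
    struct ValueIns  { /* an instruction that computes a value */ };
    struct EffectIns { /* an instruction executed only for its side effects */ };

    class SplitLirWriter {
      public:
        // Arithmetic and the like: operands and result are value-typed.
        ValueIns*  ins2(int op /* LOpcode */, ValueIns* a, ValueIns* b);

        // Stores and guards no longer return a value-typed node...
        EffectIns* insStore(ValueIns* val, ValueIns* base, int disp);
        EffectIns* insGuard(int op /* LOpcode */, ValueIns* cond, void* guardRecord);
    };

    // ...so ins2(LIR_add, insStore(...), insGuard(...)) now fails to compile
    // instead of failing an assertion (or worse) at run time.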
(In reply to comment #30)
>
> As for moving forward: comments #16 and #20 seem like a good idea, but they
> greatly expand the scope of this bug. Do people think it's possible that a
> slightly improved version of my patch could be accepted, and then more advanced
> stuff be pushed into a follow-up bug?

Sounds reasonable to me. Anyone disagree?
Sounds fine to me. #16 and #20 weren't meant to expand the scope of this bug, only to align with allocation work that's in progress.
Attached patch yet another version of the patch (obsolete) (deleted) — Splinter Review
Good things about this patch:

- It handles jsregexp.cpp as well as jstracer.cpp.
- It uses the pre-existing Queue class rather than a new type.
- It commons up the allocation and list-adding steps.

Bad things about this patch:

- Valgrind complains about various use-after-free errors, and I get a bus error on my Mac when I try to free the memory. So the lines in jsregexp.cpp and jstracer.cpp that do free the memory are currently commented out.
- Still some "njn" comments for things that need fixing but I'm not sure how to, e.g. removing magic numbers.
- lirasm now has some memory leaks that need to be addressed. These are marked with 'njn' comments.
- On my Mac I see a ~10ms slowdown on SunSpider. I haven't investigated it; I figure there's not much point so long as it doesn't even run without crashing.
- It breaks lirasm. But that's because lirasm is passing a SideExit as the 2nd arg to insGuard when it should be passing a GuardRecord. So this change has exposed that, which is a good thing.

This bug is blocking bug 497009, which is a 1.9.2 blocker, so assuming that dependency is correct, this is a 1.9.2 blocker. I'm confident about the LIR changes that I've made, but I'm not at all confident about the remaining TM changes -- trying to understand memory lifetimes in large modules I know little about is difficult. Especially when Valgrind is saying that memory is being used longer than even Andreas thought should be the case.

In other words, I won't be able to make the necessary TM fixes on my own, so if anyone wants to help that would be great. Or any other suggestions on how to best proceed would be welcome.
Attachment #384577 - Attachment is obsolete: true
Attachment #384577 - Flags: review?(gal)
> I get a bus error on my Mac when I try to free the memory.

mmap-related problem? POSIX says:

  The system shall always zero-fill any partial page at the end of an object.
  Further, the system shall never write out any modified portions of the last
  page of an object which are beyond its end. [MPR] References within the
  address range starting at pa and continuing for len bytes to whole pages
  following the end of an object shall result in delivery of a SIGBUS signal.

  An implementation may generate SIGBUS signals when a reference would cause an
  error in the mapped object, such as out-of-space condition.
Now that we have nanojit::Allocator, the free protocol for data referenced by LIR is hopefully much simpler. As long as the data is allocated by an Allocator that's at least as long-lived as the one used for the LirBuffer, no extra pointer rooting and freeing should be needed.
I'll take this and rewrite to use the main monitor allocator and/or the recorder allocator landing in bug 517083.
Assignee: nnethercote → graydon
Depends on: 517083
Attached patch update to use allocators (deleted) — Splinter Review
This is a minimal version of the patch that updates the tracer, regexp compiler and lirasm to move all of their existing insSkip() allocations into explicit allocator calls. It omits the portion of the previous patch that actually did away with the insSkip() function in the LirWriter. I'll leave that pleasure for post-merge, as a follow-up bug.
Attachment #383615 - Attachment is obsolete: true
Attachment #390772 - Attachment is obsolete: true
Attachment #402216 - Flags: review?(gal)
Comment on attachment 402216 [details] [diff] [review]
update to use allocators

>+        Allocator &alloc = *JS_TRACE_MONITOR(cx).dataAlloc;
>+
>+        GuardRecord* guard = (GuardRecord *)
>+            alloc.alloc(sizeof(GuardRecord) +
>+                        sizeof(SideExit) +
>+                        (re_length-1) * sizeof(jschar));

Does it still make sense to allocate this stuff with one alloc with the new allocator? Or can we clean this up now?

>-    CallInfo* ci = (CallInfo*) lir->insSkip(sizeof(struct CallInfo))->payload();
>+    CallInfo* ci = (CallInfo*) new (*traceMonitor->dataAlloc) CallInfo();

Do we need the cast?

>-        CallInfo* ci = (CallInfo*) lir->insSkip(sizeof(struct CallInfo))->payload();
>+        CallInfo* ci = (CallInfo*) new (*traceMonitor->dataAlloc) CallInfo();

And that one?

>-    JSTraceType* typemap = (JSTraceType*) lir->insSkip(stackSlots * sizeof(JSTraceType))->payload();
>+    JSTraceType* typemap = (JSTraceType*) traceMonitor->dataAlloc->alloc(stackSlots * sizeof(JSTraceType));

If we use new, maybe we can do without a cast?
Attachment #402216 - Flags: review?(gal) → review+
(In reply to comment #42)
> (From update of attachment 402216 [details] [diff] [review])
>
> >+        Allocator &alloc = *JS_TRACE_MONITOR(cx).dataAlloc;
> >+
> >+        GuardRecord* guard = (GuardRecord *)
> >+            alloc.alloc(sizeof(GuardRecord) +
> >+                        sizeof(SideExit) +
> >+                        (re_length-1) * sizeof(jschar));
>
> Does it still make sense to allocate this stuff with one alloc with the new
> allocator? Or can we clean this up now?

I think it makes sense now. Will move again when introducing lirAlloc (next patch in series).

> Do we need the cast?
> And that one?

Oops, no.

> If we use new, maybe we can do without a cast?

Hm, probably. Will fix both these.
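For reference, a small self-contained sketch of the cast question being discussed here; the Allocator body and the JSTraceType typedef are stand-ins, not the real nanojit/TM declarations, and it assumes the same kind of placement-new overload sketched after comment 20:

    #include <cstddef>
    #include <cstdlib>

    class Allocator {
      public:
        void* alloc(size_t nbytes) { return malloc(nbytes); }   // stand-in body
    };

    inline void* operator new(size_t size, Allocator& a) { return a.alloc(size); }

    struct CallInfo { /* ... */ };
    typedef unsigned char JSTraceType;   // stand-in typedef

    void example(Allocator& alloc, size_t stackSlots) {
        // No cast needed: placement new already yields a CallInfo*.
        CallInfo* ci = new (alloc) CallInfo();

        // Raw alloc() returns void*, so the array case keeps its cast (or gets
        // a small typed helper such as a hypothetical allocArray<T>(n)).
        JSTraceType* typemap =
            (JSTraceType*) alloc.alloc(stackSlots * sizeof(JSTraceType));
        (void) ci; (void) typemap;
    }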
Whiteboard: fixed-in-tracemonkey
Graydon, did you see any performance difference? NJ compilation speed should have improved slightly due to the reduced number of skips inserted and read, but maybe it was negligible.
(In reply to comment #45)
> Graydon, did you see any performance difference? NJ compilation speed should
> have improved slightly due to the reduced number of skips inserted and read,
> but maybe it was negligible.

I did not look; since it's only one level of indirection less and winds up in the same allocator, I figured it'd be lost in the noise. Checking now ... looks like nothing obvious on talos. Possibly measurable if you go hunting / valgrind cycle-counting.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Bug 513596 is a follow-up to this. It deals with removing insSkip().
Flags: in-testsuite-