Closed Bug 280844 Opened 20 years ago Closed 19 years ago

Uncontrolled recursion in js_MarkXML during GC

Categories

(Core :: JavaScript Engine, defect, P1)

x86
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla1.8beta2

People

(Reporter: igor, Assigned: brendan)

References

Details

(Keywords: js1.5, Whiteboard: [have patch])

Attachments

(3 files, 7 obsolete files)

The garbage collection implementation for XML objects reintroduced uncontrolled recursion during mark phase that allows to crash SpiderMonkey with specially crafted scripts. Consider, for example, the following script for js shell that in the current code causes uncontrolled recursion through E4X parent chain: var N = 5 * 1000; var x = <x/>; for (var i = 1; i <= N; ++i) { x.appendChild(<x/>); x = x.x[0]; } print(x.toXMLString()); gc(); When I run it with default stack limit of 10MB on FedoraCore 3 system the script run OK: ~/w/mozilla/js/src> ./Linux_All_DBG.OBJ/js -x ~/s/rec.js <x> <x/> </x> before 2209041, after 633762, break 09cfb000 When I set the stack limit to 20K and pass -S 10000 to the shell not to use more then 10000 bytes of stack space I got: ~/w/mozilla/js/src> ulimit -s 20 ~/w/mozilla/js/src> ./Linux_All_DBG.OBJ/js -x -S 10000 ~/s/rec.js <x> <x/> </x> Segmentation fault
Yes, this was a calculated trade-off for the short run. I'll fix now that there is time in 1.8beta2. /be
Assignee: general → brendan
Priority: -- → P3
Target Milestone: --- → mozilla1.8beta2
Attached patch fix (obsolete) (deleted) — Splinter Review
This is a little easier than the general case (DeutschSchorrWaite in jsgc.c), because we can use xml_kids.cursor to remember where the reversed pointer is. /be
Attachment #175253 - Flags: superreview?(shaver)
Attachment #175253 - Flags: review?(igor)
Status: NEW → ASSIGNED
Keywords: js1.5
Whiteboard: [have patch]
This is easier to read. /be
Attachment #175253 - Attachment is obsolete: true
Attachment #175253 - Flags: superreview?(shaver)
Attachment #175253 - Flags: review?(igor)
Attachment #175270 - Attachment description: diff -w version of last attachment → diff -w version of next attachment
Attachment #175270 - Flags: superreview?(shaver)
Attachment #175270 - Flags: review?(igor)
Just to be sure: the above patch assumes that the only entities that are reachable from XML objects and that can lead to deep recursion are XML objects. But what about __proto__ manipulations like in: var x = <x/>; x.__proto__ = non-xml-object ? In this way it should be possible to assemble arbitrary list of xml/non-xml objects. Or if __proto__ manipulations would be disabled, then what about x.function::toXMLString.a = non-xml-object ?
All slots are marked by MarkGCThing, independent of private-data marking done under JSObjectOps.mark, so ad-hoc properties will be scanned too. Another way of looking at this: XML objects are native objects too (their JSObjectOps.newObjectMap is js_NewObjectMap), so the bug being fixed here is only that their private data mark function could nest deeply. The general mark code in jsgc.c already does D-S-W, so this bug's patch simply needs to specialize D-S-W for XML private data. /be
Ugh, this is slightly misstated: > Another way of looking at this: XML objects are native objects too (their > JSObjectOps.newObjectMap is js_NewObjectMap), .... It doesn't matter to the GC whether OBJ_IS_NATIVE or not. *All* objects, "host" (foreign) or native, have their slots marked by the generic code, including the slot-generic D-S-W code, in jsgc.c. What I was getting at was the need for a specialized D-S-W in jsxml.c. /be
Comment on attachment 175270 [details] [diff] [review] diff -w version of next attachment Boy, those DSW kids sure were onto something! sr=shaver
Attachment #175270 - Flags: superreview?(shaver) → superreview+
(In reply to comment #7) > *All* objects, "host" > (foreign) or native, have their slots marked by the generic code, including the > slot-generic D-S-W code, in jsgc.c. I missed that DSW code in jsgc.c marks the heap without C stack recursion for ALL objects including XML and their kids. But then the question is why not to call it directly from js_MarkXML when stack is low and avoid implementing XML DSW?
Igor: the generic code (slot-generic, I wrote here in my last comment) iterates over obj->slots. A JSXML xml_kids array is of type JSXMLArray. JSXML instances may or may not have objects (JSObjects) associated with them. For a big XML tree where script handles only the root and a few paths through the tree, most nodes will not have an object. So it seems worthwhile to have distinct storage for xml_kids. Also, ECMA-357 strongly distinguishes children from ad-hoc properties, pretty much preventing you from decorating an instance with the latter (although XML.prototype and XMLList.prototype in the spec have method properties, and 'constructor', the latter ironically inaccessible!). /be
> I missed that DSW code in jsgc.c marks the heap without C stack recursion for > ALL objects including XML and their kids. ALL objects' slots, IOW (rephrasing my reply in the last comment) -- but *not* XML non-leaf nodes' kids. Those are explicitly distinguished from properties by E4X, and for efficiency (to avoid an object per XML node) not stored in slots in this implementation. Hope this clears things up. The tradeoff is a second DSW impl, but it's pretty small. Anyway, I'd like to get this patch in for 1.8b2. /be
Attachment #175270 - Flags: review?(igor) → review+
(In reply to comment #11) > > I missed that DSW code in jsgc.c marks the heap without C stack recursion for > > ALL objects including XML and their kids. > > ALL objects' slots, IOW (rephrasing my reply in the last comment) -- but *not* > XML non-leaf nodes' kids. Those are explicitly distinguished from properties by > E4X, and for efficiency (to avoid an object per XML node) not stored in slots in > this implementation. > > Hope this clears things up. But how then DeutschSchorrWaite in jsgc.c calls the XML mark code? Consider the code there: if (JSVAL_IS_OBJECT(v)) { *vp = JSVAL_SETTAG(top, JSVAL_BOOLEAN); top = parent; if (scope) scope->dswIndex = EncodeDSWIndex(vp, obj->slots); goto down; } /* Handle string and double GC-things. */ MARK_GC_THING(cx, thing, flagp, NULL); If v points to XML, then JSVAL_IS_OBJECT(v) would be true and the path MARK_GC_THING would not be taken, so reachable "XML non-leaf nodes' kids" would not be marked. What did I miss this time?
> But how then DeutschSchorrWaite in jsgc.c calls the XML mark code? See near the top of DeutschSchorrWaite, under the down: label after the METER(), the initialization of obj and parent, and the scope setting: end = vp + ((obj->map->ops->mark) ? obj->map->ops->mark(cx, obj, NULL) : JS_MIN(obj->map->freeslot, obj->map->nslots)); Thus JSObjectOps.mark, if non-null, is called for each object visited by DSW. Fixed for 1.8b2. My thanks to reviewers! /be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Here is a test case that causes uncontrolled C recursion even with patch applied: var N = 100 * 1000; function prepare_list(N) { var head = {}; var cursor = head; for (var i = 0; i != N; ++i) { var ns = new Namespace(); var x = <xml/>; x.addNamespace(ns); cursor.next = x; cursor = ns; } return head; } var head = prepare_list(N); gc(); I limit the system C stack to 50K and run: ~/w/js/mozilla/js/src> ./Linux_All_DBG.OBJ/js -x -S 50000 ~/s/x.js Segmentation fault The test case explores the fact that DeutschSchorrWaite in jsgc.c recurses with obj->map->ops->mark != NULL. Note a modification of the test cases causes shell crush, see bug 283439.
Status: RESOLVED → REOPENED
Component: JavaScript Engine → Build Config
Resolution: FIXED → ---
The above test case produces a linked list like: XML ___ XML ___... | | | | ->Namespace.next-| ->Namespace.next-| which explores different paths of C recursion in SM. But in Rhino the linked list would not occure since Rhino copies uri and prefix strings from ns into internal structures on addNamespace etc. and creates a new instance of Namespace on getDeclaredNamespaces() etc. In this way instances of Namespace and QName are not stored in XML objects at all. AFAICS it is allowed by ECMA 357, right?
Component: Build Config → JavaScript Engine
ECMA-357 9.1.1.13 [[AddInScopeNamespaces]], called directly from 13.4.4.2 XML.prototype.addNamespace, ends up changing the [[InScopeNamespaces]] internal property with spec-language of the form: Let x.[[InScopeNamespaces]] = x.[[InScopeNamespaces]] U { N } Taking the set notation and reference semantics at face value, I'd say Rhino has a bug. It should not be cloning namespaces under addNamespace. Rhino also should not be cloning namespaces in XML.prototype.namespaceDeclarations (13.4.4.24), if that's what you meant by "and creates a new instance of Namespace on getDeclaredNamespaces() etc.". Igor, if you agree please file bugs and cc: me. I'll bring this up with ECMA TG1 to confirm. Thanks for finding this hole in the dual DSW approach. There are no doubt more such holes. I'll try to find a universal solution that doesn't force everything into obj->slots. /be
Status: REOPENED → ASSIGNED
(In reply to comment #16) > > Let x.[[InScopeNamespaces]] = x.[[InScopeNamespaces]] U { N } > > Taking the set notation and reference semantics at face value, I'd say Rhino has > a bug. It should not be cloning namespaces under addNamespace. Then it would be quite unfixable bug. It would require to change E4X implementation in Rhino in a very significant way. Rhino internally uses Java XMLBeans library to implement E4X and converts between external presentation of QName and Namespace as JS objects and internal E4X types with value semantics all the time. Clearly all additional properties of Namespace and QName would be lost in transit. > > Rhino also should not be cloning namespaces in > XML.prototype.namespaceDeclarations (13.4.4.24), if that's what you meant by > "and creates a new instance of Namespace on getDeclaredNamespaces() etc.". > > Igor, if you agree please file bugs and cc: me. I'll bring this up with ECMA > TG1 to confirm. I will file the bug after I check for potential hints about the intended semantics in the original sources from AgileDelta folks who made E4X standard to happen.
With two DSW functions that are mutually recursive, we're doomed. To break the cycle so that DSW can call (indirectly) xml_DSW, but not vice versa, requires that we forbid setting ad-hoc properties on Namespace and QName objects. That is the suggestion I made to the ECMA TG1 group, also in light of comment 17 (the separate issue of object identity conservation was addressed by the suggestion that we conserve objects, which will cause Rhino's layering on xmlbeans some pain, requiring a cache of some sort). At least one other TG1 member agreed. So I'll probably make Namespace and QName "closed" objects (not "sealed" in the sense we've used in Rhino and SpiderMonkey, which would cause "sets" as well as "adds" to fail). Comments? /be
Priority: P3 → P1
(In reply to comment #16) > I'll try to find a universal solution that doesn't force everything > into obj->slots. I think replacing the current mark-and-sweep GC by copy one is too drastic, right? But what about using breadth first marking phase instead of the current depth-first? That is, JS_MarkGCThing instead of recursing om unmarked nodes would add them to a special list. Then js_GC would loop over the list until it would be empty calling node-specific mark code. The only problem is how to implement that list without the overhead of an additional pointer per each GC thing. Here is one possibility. Suppose for each N consecutive cells for GC things the allocator also adds memory to hold a bit map of N bits and a pointer to the next bitmap. Then that marking phase list would consist of a linked bitmaps. To add a GC cell with previously unmarked GC thing to that list it would be necessary to do the following: 1. Given the cell address find the address of its bit map and the cell index in that bitmap. 2. If the bitmap is empty, add it to the bitmap list. 3. Set the bit corresponding to the cell in the bitmap. Then the GC marking loop would consider the head of the bitmap list, locate a set bit there, locate the corresponding cell, run cell-specific marking code, clear the set bit. If after that the head bitmap would become empty, it would be removed from the list. What would be the overhead of such schema? For efficient bitmap manipulation it should occupy one machine word. The next pointer would take another word so the overhead is 2 words for each W GC cells where W is number of bits in the word. Thus in the worst case of GC cells that take 2 words the overhead is 2 words per 2*W words or the increase of 1/W in memory usage. For 32-bits processor this worst-case memory overhead is 1/32 or about 3%. AFAICS the price for low memory overhead is increased time to locate the index of a set in the bitmap which is a exercise in bits manipulation but that is a dense code without any memory access which should tun fast on modern processors. What do you think about it?
js/tests/e4x/Regress/regress-280844-1.js checked in for test in comment 0 js/tests/e4x/Regress/regress-280844-2.js checked in for test in comment 14
This is an experimental patch implementing the above idea about recursion less GC. It requires GC_NO_MARK_RECURSION to be defined during compilation to have any effect. With the patch js_MarkGCThing instead of recursion adds GC thing to a bag of unscanned GC things. Then js_GC runs a loop that picks GC thing from bag, removes it and marks its children until the bag is empty. The bag is implemented as a linked list of word-sized bitsets. Each bitset corresponds to JS_BITS_PER_WORD things where each set bit describes thing that is marked but which children are not yet scanned. Thus to put a thing to the bag the code locates the bitset, set thing bit there and ands bitset to the list if it was empty before. To pick and remove a thing from the bug the code locates unset bit in the first bitset of the list, unsets the bit and remove the bitset from the list if it became 0. The linked list data is allocated at the and of arena area which is extended to fit the necessary amount of linked bitsets. The overhead of this is 1/JS_BITS_PER_WORD increase in memory usage or 1/32 on 32 bits platforms. The patch does NOT provide any way to use GC_MARK_DEBUG during marking since the current implementation of that uses C stack for the debugging data. The patch also waists memory with things exceeding JSGCThing in size but it is not clear how that can be avoided: how do I get the size of allocated things based on thing address?
Attached patch Update of non-recursive GC patch (obsolete) (deleted) — Splinter Review
The update fixes the allocation check in js_NewGCThing to take into account additional GC_SCAN_AREA_SIZE when checking if there is space for new gc thing and adds more assert checks. But there are bugs in the patch since it crashes with Assertion failure: (chunk->bits & unscannedBit) == 0, at jsgc.c:1685 on the following script. for (var i = 0; i != 100; ++i) { var x = "A"; x += "BC" + { obj: [1,2,3] }; (function() { for (var j= 0; j != i*i; ++j) { x+={index: j}; } })(); gc(); } I guess I misscalculated the allocation layout somehow :(
Attachment #176918 - Attachment is obsolete: true
Attached patch Update 2 of non-recursive GC patch (obsolete) (deleted) — Splinter Review
Yet another patch update to fix the above assert. I wrongly assumed that arena zeros the allocated pages so this patch version includes memset to zero bitsets area.
Attachment #177008 - Attachment is obsolete: true
This version of the patch removes GC_NO_MARK_RECURSION macro and uses delayed scan approach only when there is C stack overflow. In this way GC_MARK_DEBUG is fully preserved together with tail call optimization so those 3% of extra memory are used only if there is stack overflow.
Attachment #177013 - Attachment is obsolete: true
Since in the above patch the delayed scanning bag is used only if there is C stack overflow the overhead of extra 2 bits per JCGCThing can be reduced farther if the current implementation of the bag as linked list of bitsets will be replaced by the list of bitsets of bitsets. In this way it would be necessary to use one bit per GC thing, one extra bit per each chunk of JS_BITS_PER_WORD of GC thing and one pointer per each JS_BITS_PER_WORD of chunks. If JS_BITS_PER_WORD is 32 it would mean one extra marking bit and 2 words with chunk bitset and next pointer per each arena area. But this is for later.
Attached patch Patch update with faster floorOfLog2 (obsolete) (deleted) — Splinter Review
This is the version of the previous patch with faster implementation of the search for the highest set bit that uses bits manipulations to get branch-free code. It is based on http://aggregate.org/MAGIC/ .
Attachment #177032 - Attachment is obsolete: true
Attachment #177118 - Flags: review?(brendan)
Attached patch Using sets of sets for unscanned things (obsolete) (deleted) — Splinter Review
This version of the patch implements the above idea of using linked lists of sets of sets to implement the bag for marked but not yes scanned GC things. In this way the memory overhead is reduced to one bit per each JCGCThing slot plus one word for bitset of bitsets plus one word for linking or 128 + 2*sizeof(jsuword) bytes per GC arena. The patch is still work in progress due to the following: 1. It lacks comments about new GC arena organization. 2. It adds JS_COMPILE_ASSERT to jsutil.h to assert constant expressions during compilation but I am not sure that it is portable. 3. The patch contains uncommented change: for (i = 0; i < GC_NUM_FREELISTS; i++) { JS_InitArenaPool(&rt->gcArenaPool[i], "gc-arena", GC_ARENA_SIZE, - GC_FREELIST_NBYTES(i)); + sizeof(JSGCThing)); } This is due to the fact that patch increased GC_ARENA_SIZE by those 128 bytes + 2*sizeof(jsuword) so GC_ARENA_SIZE no longer divides 1 << ceil(log_2(GC_FREELIST_NBYTES(i))) for each i. Now I think without the patch that GC_FREELIST_NBYTES(i) should be replaced by 1 since GC would do its own alignment on 1K boundary for things. With the patch that 1 should be replaced by something that allows to align properly the extra data. For now I put sizeof(JSGCThing) but that should be commented or asserted. 4. I discovered that SM already contains an implementation of floor(log2(n)) in the form of JS_FloorLog2. But that takes JSUint32 as argument while the code requires jsuword to support 64 bits case. Plus the implementation of floorOfLog2 asserts that the argument is not 0 so I can not even use JS_FloorLog2 in 32-bits case. So the question is how to avoid code duplication?
Attachment #177118 - Attachment is obsolete: true
Attachment #177118 - Flags: review?(brendan)
The patch explores the fact that in the current JSGCPageInfo the split field is redundant as it can always be calculated from GC thing address and flags. So for the overhead of additional bit manipulations in js_GetGCThingFlags it is possible to get 8 additional words in 32bits case or 16 words for 64 bit CPU. Now in the 64 bits case 16 words give 1024 bits or exactly GC_THINGS_COUNT. So in this case there is enough bits for each GC thing and the previous patch requires very little changes. In the 32 bits case there is only 8 * 32 == 256 bits which gives 4 GC things per bit. But since the bag of unscanned things should only be used for low C stack case, the idea is to scan for children all thing among those 4 that have GC_MARK bit set. The worst case overhead of this is increase of marking phase time by factor of 4 but then the GC graph had to be deliberately constructed. Simple low C stack cases should behave much better. Note that the patched JSGCPageInfo holds arena->base. That can be changed to the offset JSGCPageInfo from arena->base. That offset requires only 2 bytes so there is potentially 2 more bytes per each JSGCPageInfo and the comments about incremental GC write barrier can be modified accordingly.
Attachment #177241 - Attachment is obsolete: true
Flags: testcase+
*** Bug 292167 has been marked as a duplicate of this bug. ***
Nominating for 1.5 blockers - crash bug Brendan would like fixed.
Flags: blocking-aviary1.5-
(In reply to comment #30) > Nominating for 1.5 blockers - crash bug Brendan would like fixed. > did you mean blocking-aviary1.5? instead of blocking-aviary1.5- ?
I get the following stack in the debug shell on winxp for e4x/Regress/regress-280844-2.js i 0x00000000 m 0x00fff628 n 0xd5d5d5d5 + ns2 0x00000000 + xml 0x0102b578 + &xml 0x0013e4b8 AddInScopeNamespace(JSContext * 0x000371b8, JSXML * 0x0102b578, JSXMLNamespace * 0x01025d20) line 3092 + 9 bytes xml_addNamespace(JSContext * 0x000371b8, JSObject * 0x00039048, unsigned int 0x00000001, long * 0x0041da8c, long * 0x0013e574) line 5404 + 17 bytes js_Invoke(JSContext * 0x000371b8, unsigned int 0x00000001, unsigned int 0x00000000) line 1163 + 23 bytes js_Interpret(JSContext * 0x000371b8, unsigned char * 0x0042430e, long * 0x0013ee24) line 3467 + 15 bytes js_Execute(JSContext * 0x000371b8, JSObject * 0x000387c8, JSScript * 0x00424cb0, JSStackFrame * 0x00000000, unsigned int 0x00000000, long * 0x0013fecc) line 1393 + 19 bytes JS_ExecuteScript(JSContext * 0x000371b8, JSObject * 0x000387c8, JSScript * 0x00424cb0, long * 0x0013fecc) line 3842 + 25 bytes Process(JSContext * 0x000371b8, JSObject * 0x000387c8, char * 0x00032d56) line 223 + 22 bytes ProcessArgs(JSContext * 0x000371b8, JSObject * 0x000387c8, char * * 0x00032cc4, int 0x00000006) line 426 + 23 bytes main(int 0x00000006, char * * 0x00032cc4, char * * 0x00033188) line 2552 + 21 bytes JS! mainCRTStartup + 227 bytes KERNEL32! 7c816d4f()
Depends on: 324117
testing with 20060401 builds I don't see any crashes on the trunk for win/mac/linux and only see crashes on 1.8.0.2 and 1.8 for windows and mac. Whatever "fixed" it, occured around 20060320 - 20060324. I don't have time to track the fix down, but if someone else can that would be nice. If not, I'll mark it wfm on my next run.
Fixed by patch for bug 324278. /be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago19 years ago
Resolution: --- → FIXED
verified 1.9a1 win/mac/linux 20060415
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: