Closed
Bug 280844
Opened 20 years ago
Closed 19 years ago
Uncontrolled recursion in js_MarkXML during GC
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla1.8beta2
People
(Reporter: igor, Assigned: brendan)
References
Details
(Keywords: js1.5, Whiteboard: [have patch])
Attachments
(3 files, 7 obsolete files)
(deleted),
patch
|
igor
:
review+
shaver
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
The garbage collection implementation for XML objects reintroduced uncontrolled
recursion during mark phase that allows to crash SpiderMonkey with specially
crafted scripts.
Consider, for example, the following script for js shell that in the current
code causes uncontrolled recursion through E4X parent chain:
var N = 5 * 1000;
var x = <x/>;
for (var i = 1; i <= N; ++i) {
x.appendChild(<x/>);
x = x.x[0];
}
print(x.toXMLString());
gc();
When I run it with default stack limit of 10MB on FedoraCore 3 system the script
run OK:
~/w/mozilla/js/src> ./Linux_All_DBG.OBJ/js -x ~/s/rec.js
<x>
<x/>
</x>
before 2209041, after 633762, break 09cfb000
When I set the stack limit to 20K and pass -S 10000 to the shell not to use more
then 10000 bytes of stack space I got:
~/w/mozilla/js/src> ulimit -s 20
~/w/mozilla/js/src> ./Linux_All_DBG.OBJ/js -x -S 10000 ~/s/rec.js
<x>
<x/>
</x>
Segmentation fault
Assignee | ||
Comment 1•20 years ago
|
||
Yes, this was a calculated trade-off for the short run. I'll fix now that there
is time in 1.8beta2.
/be
Assignee: general → brendan
Priority: -- → P3
Target Milestone: --- → mozilla1.8beta2
Assignee | ||
Comment 2•20 years ago
|
||
This is a little easier than the general case (DeutschSchorrWaite in jsgc.c),
because we can use xml_kids.cursor to remember where the reversed pointer is.
/be
Attachment #175253 -
Flags: superreview?(shaver)
Attachment #175253 -
Flags: review?(igor)
Assignee | ||
Updated•20 years ago
|
Assignee | ||
Comment 3•20 years ago
|
||
This is easier to read.
/be
Assignee | ||
Updated•20 years ago
|
Attachment #175253 -
Attachment is obsolete: true
Attachment #175253 -
Flags: superreview?(shaver)
Attachment #175253 -
Flags: review?(igor)
Assignee | ||
Comment 4•20 years ago
|
||
Assignee | ||
Updated•20 years ago
|
Attachment #175270 -
Attachment description: diff -w version of last attachment → diff -w version of next attachment
Attachment #175270 -
Flags: superreview?(shaver)
Attachment #175270 -
Flags: review?(igor)
Reporter | ||
Comment 5•20 years ago
|
||
Just to be sure: the above patch assumes that the only entities that are
reachable from XML objects and that can lead to deep recursion are XML objects.
But what about __proto__ manipulations like in:
var x = <x/>;
x.__proto__ = non-xml-object ?
In this way it should be possible to assemble arbitrary list of xml/non-xml
objects. Or if __proto__ manipulations would be disabled, then what about
x.function::toXMLString.a = non-xml-object
?
Assignee | ||
Comment 6•20 years ago
|
||
All slots are marked by MarkGCThing, independent of private-data marking done
under JSObjectOps.mark, so ad-hoc properties will be scanned too.
Another way of looking at this: XML objects are native objects too (their
JSObjectOps.newObjectMap is js_NewObjectMap), so the bug being fixed here is
only that their private data mark function could nest deeply. The general mark
code in jsgc.c already does D-S-W, so this bug's patch simply needs to
specialize D-S-W for XML private data.
/be
Assignee | ||
Comment 7•20 years ago
|
||
Ugh, this is slightly misstated:
> Another way of looking at this: XML objects are native objects too (their
> JSObjectOps.newObjectMap is js_NewObjectMap), ....
It doesn't matter to the GC whether OBJ_IS_NATIVE or not. *All* objects, "host"
(foreign) or native, have their slots marked by the generic code, including the
slot-generic D-S-W code, in jsgc.c.
What I was getting at was the need for a specialized D-S-W in jsxml.c.
/be
Comment on attachment 175270 [details] [diff] [review]
diff -w version of next attachment
Boy, those DSW kids sure were onto something!
sr=shaver
Attachment #175270 -
Flags: superreview?(shaver) → superreview+
Reporter | ||
Comment 9•20 years ago
|
||
(In reply to comment #7)
> *All* objects, "host"
> (foreign) or native, have their slots marked by the generic code, including the
> slot-generic D-S-W code, in jsgc.c.
I missed that DSW code in jsgc.c marks the heap without C stack recursion for
ALL objects including XML and their kids. But then the question is why not to
call it directly from js_MarkXML when stack is low and avoid implementing XML DSW?
Assignee | ||
Comment 10•20 years ago
|
||
Igor: the generic code (slot-generic, I wrote here in my last comment) iterates
over obj->slots. A JSXML xml_kids array is of type JSXMLArray.
JSXML instances may or may not have objects (JSObjects) associated with them.
For a big XML tree where script handles only the root and a few paths through
the tree, most nodes will not have an object. So it seems worthwhile to have
distinct storage for xml_kids. Also, ECMA-357 strongly distinguishes children
from ad-hoc properties, pretty much preventing you from decorating an instance
with the latter (although XML.prototype and XMLList.prototype in the spec have
method properties, and 'constructor', the latter ironically inaccessible!).
/be
Assignee | ||
Comment 11•20 years ago
|
||
> I missed that DSW code in jsgc.c marks the heap without C stack recursion for
> ALL objects including XML and their kids.
ALL objects' slots, IOW (rephrasing my reply in the last comment) -- but *not*
XML non-leaf nodes' kids. Those are explicitly distinguished from properties by
E4X, and for efficiency (to avoid an object per XML node) not stored in slots in
this implementation.
Hope this clears things up. The tradeoff is a second DSW impl, but it's pretty
small. Anyway, I'd like to get this patch in for 1.8b2.
/be
Reporter | ||
Updated•20 years ago
|
Attachment #175270 -
Flags: review?(igor) → review+
Reporter | ||
Comment 12•20 years ago
|
||
(In reply to comment #11)
> > I missed that DSW code in jsgc.c marks the heap without C stack recursion for
> > ALL objects including XML and their kids.
>
> ALL objects' slots, IOW (rephrasing my reply in the last comment) -- but *not*
> XML non-leaf nodes' kids. Those are explicitly distinguished from properties by
> E4X, and for efficiency (to avoid an object per XML node) not stored in slots in
> this implementation.
>
> Hope this clears things up.
But how then DeutschSchorrWaite in jsgc.c calls the XML mark code?
Consider the code there:
if (JSVAL_IS_OBJECT(v)) {
*vp = JSVAL_SETTAG(top, JSVAL_BOOLEAN);
top = parent;
if (scope)
scope->dswIndex = EncodeDSWIndex(vp, obj->slots);
goto down;
}
/* Handle string and double GC-things. */
MARK_GC_THING(cx, thing, flagp, NULL);
If v points to XML, then JSVAL_IS_OBJECT(v) would be true and the path
MARK_GC_THING would not be taken, so reachable "XML non-leaf nodes' kids" would
not be marked.
What did I miss this time?
Assignee | ||
Comment 13•20 years ago
|
||
> But how then DeutschSchorrWaite in jsgc.c calls the XML mark code?
See near the top of DeutschSchorrWaite, under the down: label after the METER(),
the initialization of obj and parent, and the scope setting:
end = vp + ((obj->map->ops->mark)
? obj->map->ops->mark(cx, obj, NULL)
: JS_MIN(obj->map->freeslot, obj->map->nslots));
Thus JSObjectOps.mark, if non-null, is called for each object visited by DSW.
Fixed for 1.8b2. My thanks to reviewers!
/be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 14•20 years ago
|
||
Here is a test case that causes uncontrolled C recursion even with patch applied:
var N = 100 * 1000;
function prepare_list(N)
{
var head = {};
var cursor = head;
for (var i = 0; i != N; ++i) {
var ns = new Namespace();
var x = <xml/>;
x.addNamespace(ns);
cursor.next = x;
cursor = ns;
}
return head;
}
var head = prepare_list(N);
gc();
I limit the system C stack to 50K and run:
~/w/js/mozilla/js/src> ./Linux_All_DBG.OBJ/js -x -S 50000 ~/s/x.js
Segmentation fault
The test case explores the fact that DeutschSchorrWaite in jsgc.c recurses with
obj->map->ops->mark != NULL.
Note a modification of the test cases causes shell crush, see bug 283439.
Status: RESOLVED → REOPENED
Component: JavaScript Engine → Build Config
Resolution: FIXED → ---
Reporter | ||
Comment 15•20 years ago
|
||
The above test case produces a linked list like:
XML ___ XML ___...
| | | |
->Namespace.next-| ->Namespace.next-|
which explores different paths of C recursion in SM.
But in Rhino the linked list would not occure since Rhino copies uri and prefix
strings from ns into internal structures on addNamespace etc. and creates a new
instance of Namespace on getDeclaredNamespaces() etc. In this way instances of
Namespace and QName are not stored in XML objects at all. AFAICS it is allowed
by ECMA 357, right?
Component: Build Config → JavaScript Engine
Assignee | ||
Comment 16•20 years ago
|
||
ECMA-357 9.1.1.13 [[AddInScopeNamespaces]], called directly from 13.4.4.2
XML.prototype.addNamespace, ends up changing the [[InScopeNamespaces]] internal
property with spec-language of the form:
Let x.[[InScopeNamespaces]] = x.[[InScopeNamespaces]] U { N }
Taking the set notation and reference semantics at face value, I'd say Rhino has
a bug. It should not be cloning namespaces under addNamespace.
Rhino also should not be cloning namespaces in
XML.prototype.namespaceDeclarations (13.4.4.24), if that's what you meant by
"and creates a new instance of Namespace on getDeclaredNamespaces() etc.".
Igor, if you agree please file bugs and cc: me. I'll bring this up with ECMA
TG1 to confirm.
Thanks for finding this hole in the dual DSW approach. There are no doubt more
such holes. I'll try to find a universal solution that doesn't force everything
into obj->slots.
/be
Status: REOPENED → ASSIGNED
Reporter | ||
Comment 17•20 years ago
|
||
(In reply to comment #16)
>
> Let x.[[InScopeNamespaces]] = x.[[InScopeNamespaces]] U { N }
>
> Taking the set notation and reference semantics at face value, I'd say Rhino has
> a bug. It should not be cloning namespaces under addNamespace.
Then it would be quite unfixable bug. It would require to change E4X
implementation in Rhino in a very significant way. Rhino internally uses Java
XMLBeans library to implement E4X and converts between external presentation of
QName and Namespace as JS objects and internal E4X types with value semantics
all the time. Clearly all additional properties of Namespace and QName would be
lost in transit.
>
> Rhino also should not be cloning namespaces in
> XML.prototype.namespaceDeclarations (13.4.4.24), if that's what you meant by
> "and creates a new instance of Namespace on getDeclaredNamespaces() etc.".
>
> Igor, if you agree please file bugs and cc: me. I'll bring this up with ECMA
> TG1 to confirm.
I will file the bug after I check for potential hints about the intended
semantics in the original sources from AgileDelta folks who made E4X standard to
happen.
Assignee | ||
Comment 18•20 years ago
|
||
With two DSW functions that are mutually recursive, we're doomed. To break the
cycle so that DSW can call (indirectly) xml_DSW, but not vice versa, requires
that we forbid setting ad-hoc properties on Namespace and QName objects.
That is the suggestion I made to the ECMA TG1 group, also in light of comment 17
(the separate issue of object identity conservation was addressed by the
suggestion that we conserve objects, which will cause Rhino's layering on
xmlbeans some pain, requiring a cache of some sort). At least one other TG1
member agreed.
So I'll probably make Namespace and QName "closed" objects (not "sealed" in the
sense we've used in Rhino and SpiderMonkey, which would cause "sets" as well as
"adds" to fail). Comments?
/be
Priority: P3 → P1
Reporter | ||
Comment 19•20 years ago
|
||
(In reply to comment #16)
> I'll try to find a universal solution that doesn't force everything
> into obj->slots.
I think replacing the current mark-and-sweep GC by copy one is too drastic,
right? But what about using breadth first marking phase instead of the current
depth-first? That is, JS_MarkGCThing instead of recursing om unmarked nodes
would add them to a special list. Then js_GC would loop over the list until it
would be empty calling node-specific mark code.
The only problem is how to implement that list without the overhead of an
additional pointer per each GC thing. Here is one possibility.
Suppose for each N consecutive cells for GC things the allocator also adds
memory to hold a bit map of N bits and a pointer to the next bitmap. Then that
marking phase list would consist of a linked bitmaps. To add a GC cell with
previously unmarked GC thing to that list it would be necessary to do the following:
1. Given the cell address find the address of its bit map and the cell index in
that bitmap.
2. If the bitmap is empty, add it to the bitmap list.
3. Set the bit corresponding to the cell in the bitmap.
Then the GC marking loop would consider the head of the bitmap list, locate a
set bit there, locate the corresponding cell, run cell-specific marking code,
clear the set bit. If after that the head bitmap would become empty, it would be
removed from the list.
What would be the overhead of such schema? For efficient bitmap manipulation it
should occupy one machine word. The next pointer would take another word so the
overhead is 2 words for each W GC cells where W is number of bits in the word.
Thus in the worst case of GC cells that take 2 words the overhead is 2 words per
2*W words or the increase of 1/W in memory usage.
For 32-bits processor this worst-case memory overhead is 1/32 or about 3%.
AFAICS the price for low memory overhead is increased time to locate the index
of a set in the bitmap which is a exercise in bits manipulation but that is a
dense code without any memory access which should tun fast on modern processors.
What do you think about it?
Comment 20•20 years ago
|
||
js/tests/e4x/Regress/regress-280844-1.js checked in for test in comment 0
js/tests/e4x/Regress/regress-280844-2.js checked in for test in comment 14
Reporter | ||
Comment 21•20 years ago
|
||
This is an experimental patch implementing the above idea about recursion less
GC. It requires GC_NO_MARK_RECURSION to be defined during compilation to have
any effect.
With the patch js_MarkGCThing instead of recursion adds GC thing to a bag of
unscanned GC things. Then js_GC runs a loop that picks GC thing from bag,
removes it and marks its children until the bag is empty.
The bag is implemented as a linked list of word-sized bitsets. Each bitset
corresponds to JS_BITS_PER_WORD things where each set bit describes thing that
is marked but which children are not yet scanned.
Thus to put a thing to the bag the code locates the bitset, set thing bit there
and ands bitset to the list if it was empty before. To pick and remove a thing
from the bug the code locates unset bit in the first bitset of the list, unsets
the bit and remove the bitset from the list if it became 0.
The linked list data is allocated at the and of arena area which is extended to
fit the necessary amount of linked bitsets. The overhead of this is
1/JS_BITS_PER_WORD increase in memory usage or 1/32 on 32 bits platforms.
The patch does NOT provide any way to use GC_MARK_DEBUG during marking since
the current implementation of that uses C stack for the debugging data.
The patch also waists memory with things exceeding JSGCThing in size but it is
not clear how that can be avoided: how do I get the size of allocated things
based on thing address?
Reporter | ||
Comment 22•20 years ago
|
||
The update fixes the allocation check in js_NewGCThing to take into account
additional GC_SCAN_AREA_SIZE when checking if there is space for new gc thing
and adds more assert checks.
But there are bugs in the patch since it crashes with
Assertion failure: (chunk->bits & unscannedBit) == 0, at jsgc.c:1685
on the following script.
for (var i = 0; i != 100; ++i) {
var x = "A";
x += "BC" + { obj: [1,2,3] };
(function() {
for (var j= 0; j != i*i; ++j) {
x+={index: j};
}
})();
gc();
}
I guess I misscalculated the allocation layout somehow :(
Reporter | ||
Updated•20 years ago
|
Attachment #176918 -
Attachment is obsolete: true
Reporter | ||
Comment 23•20 years ago
|
||
Yet another patch update to fix the above assert. I wrongly assumed that arena
zeros the allocated pages so this patch version includes memset to zero bitsets
area.
Reporter | ||
Updated•20 years ago
|
Attachment #177008 -
Attachment is obsolete: true
Reporter | ||
Comment 24•20 years ago
|
||
This version of the patch removes GC_NO_MARK_RECURSION macro and uses delayed
scan approach only when there is C stack overflow. In this way GC_MARK_DEBUG is
fully preserved together with tail call optimization so those 3% of extra
memory are used only if there is stack overflow.
Attachment #177013 -
Attachment is obsolete: true
Reporter | ||
Comment 25•20 years ago
|
||
Since in the above patch the delayed scanning bag is used only if there is C
stack overflow the overhead of extra 2 bits per JCGCThing can be reduced farther
if the current implementation of the bag as linked list of bitsets will be
replaced by the list of bitsets of bitsets.
In this way it would be necessary to use one bit per GC thing, one extra bit per
each chunk of JS_BITS_PER_WORD of GC thing and one pointer per each
JS_BITS_PER_WORD of chunks. If JS_BITS_PER_WORD is 32 it would mean one extra
marking bit and 2 words with chunk bitset and next pointer per each arena area.
But this is for later.
Reporter | ||
Comment 26•20 years ago
|
||
This is the version of the previous patch with faster implementation of the
search for the highest set bit that uses bits manipulations to get branch-free
code. It is based on http://aggregate.org/MAGIC/ .
Attachment #177032 -
Attachment is obsolete: true
Attachment #177118 -
Flags: review?(brendan)
Reporter | ||
Comment 27•20 years ago
|
||
This version of the patch implements the above idea of using linked lists of
sets of sets to implement the bag for marked but not yes scanned GC things.
In this way the memory overhead is reduced to one bit per each JCGCThing slot
plus one word for bitset of bitsets plus one word for linking or 128 +
2*sizeof(jsuword) bytes per GC arena.
The patch is still work in progress due to the following:
1. It lacks comments about new GC arena organization.
2. It adds JS_COMPILE_ASSERT to jsutil.h to assert constant expressions during
compilation but I am not sure that it is portable.
3. The patch contains uncommented change:
for (i = 0; i < GC_NUM_FREELISTS; i++) {
JS_InitArenaPool(&rt->gcArenaPool[i], "gc-arena", GC_ARENA_SIZE,
- GC_FREELIST_NBYTES(i));
+ sizeof(JSGCThing));
}
This is due to the fact that patch increased GC_ARENA_SIZE by those 128 bytes +
2*sizeof(jsuword) so GC_ARENA_SIZE no longer divides
1 << ceil(log_2(GC_FREELIST_NBYTES(i)))
for each i. Now I think without the patch that GC_FREELIST_NBYTES(i) should be
replaced by 1 since GC would do its own alignment on 1K boundary for things.
With the patch that 1 should be replaced by something that allows to align
properly the extra data. For now I put sizeof(JSGCThing) but that should be
commented or asserted.
4. I discovered that SM already contains an implementation of floor(log2(n)) in
the form of JS_FloorLog2. But that takes JSUint32 as argument while the code
requires jsuword to support 64 bits case. Plus the implementation of
floorOfLog2 asserts that the argument is not 0 so I can not even use
JS_FloorLog2 in 32-bits case. So the question is how to avoid code duplication?
Attachment #177118 -
Attachment is obsolete: true
Reporter | ||
Updated•20 years ago
|
Attachment #177118 -
Flags: review?(brendan)
Reporter | ||
Comment 28•20 years ago
|
||
The patch explores the fact that in the current JSGCPageInfo the split field is
redundant as it can always be calculated from GC thing address and flags. So
for the overhead of additional bit manipulations in js_GetGCThingFlags it is
possible to get 8 additional words in 32bits case or 16 words for 64 bit CPU.
Now in the 64 bits case 16 words give 1024 bits or exactly GC_THINGS_COUNT. So
in this case there is enough bits for each GC thing and the previous patch
requires very little changes.
In the 32 bits case there is only 8 * 32 == 256 bits which gives 4 GC things
per bit. But since the bag of unscanned things should only be used for low C
stack case, the idea is to scan for children all thing among those 4 that have
GC_MARK bit set. The worst case overhead of this is increase of marking phase
time by factor of 4 but then the GC graph had to be deliberately constructed.
Simple low C stack cases should behave much better.
Note that the patched JSGCPageInfo holds arena->base. That can be changed to
the offset JSGCPageInfo from arena->base. That offset requires only 2 bytes so
there is potentially 2 more bytes per each JSGCPageInfo and the comments about
incremental GC write barrier can be modified accordingly.
Attachment #177241 -
Attachment is obsolete: true
Updated•20 years ago
|
Flags: testcase+
Reporter | ||
Comment 29•19 years ago
|
||
*** Bug 292167 has been marked as a duplicate of this bug. ***
Comment 30•19 years ago
|
||
Nominating for 1.5 blockers - crash bug Brendan would like fixed.
Flags: blocking-aviary1.5-
Comment 31•19 years ago
|
||
(In reply to comment #30)
> Nominating for 1.5 blockers - crash bug Brendan would like fixed.
>
did you mean blocking-aviary1.5? instead of blocking-aviary1.5- ?
Comment 32•19 years ago
|
||
I get the following stack in the debug shell on winxp for
e4x/Regress/regress-280844-2.js
i 0x00000000
m 0x00fff628
n 0xd5d5d5d5
+ ns2 0x00000000
+ xml 0x0102b578
+ &xml 0x0013e4b8
AddInScopeNamespace(JSContext * 0x000371b8, JSXML * 0x0102b578, JSXMLNamespace *
0x01025d20) line 3092 + 9 bytes
xml_addNamespace(JSContext * 0x000371b8, JSObject * 0x00039048, unsigned int
0x00000001, long * 0x0041da8c, long * 0x0013e574) line 5404 + 17 bytes
js_Invoke(JSContext * 0x000371b8, unsigned int 0x00000001, unsigned int
0x00000000) line 1163 + 23 bytes
js_Interpret(JSContext * 0x000371b8, unsigned char * 0x0042430e, long *
0x0013ee24) line 3467 + 15 bytes
js_Execute(JSContext * 0x000371b8, JSObject * 0x000387c8, JSScript * 0x00424cb0,
JSStackFrame * 0x00000000, unsigned int 0x00000000, long * 0x0013fecc) line 1393
+ 19 bytes
JS_ExecuteScript(JSContext * 0x000371b8, JSObject * 0x000387c8, JSScript *
0x00424cb0, long * 0x0013fecc) line 3842 + 25 bytes
Process(JSContext * 0x000371b8, JSObject * 0x000387c8, char * 0x00032d56) line
223 + 22 bytes
ProcessArgs(JSContext * 0x000371b8, JSObject * 0x000387c8, char * * 0x00032cc4,
int 0x00000006) line 426 + 23 bytes
main(int 0x00000006, char * * 0x00032cc4, char * * 0x00033188) line 2552 + 21 bytes
JS! mainCRTStartup + 227 bytes
KERNEL32! 7c816d4f()
Comment 33•19 years ago
|
||
testing with 20060401 builds I don't see any crashes on the trunk for win/mac/linux and only see crashes on 1.8.0.2 and 1.8 for windows and mac. Whatever "fixed" it, occured around 20060320 - 20060324. I don't have time to track the fix down, but if someone else can that would be nice. If not, I'll mark it wfm on my next run.
Assignee | ||
Comment 34•19 years ago
|
||
Fixed by patch for bug 324278.
/be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•