Closed Bug 353231 Opened 18 years ago Closed 11 years ago

Static analyzer to enforce GC rooting

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr17 --- wontfix
firefox-esr24 --- wontfix
b2g18 --- wontfix

People

(Reporter: igor, Unassigned)

Details

(Keywords: sec-want, Whiteboard: [sg:want])

This is a spin-off of bug 352846 comment 14. A static source analyzer tool together with source annotations about explicit GC roots should help to find/avoid GC hazards in SpiderMonkey.
Marking as restricted to comment on typical code paths leading to vulnerabilities.
Group: security
Here is some details on the desired features of the static analyzer. First some terminology: 1. GC thing - a thing that is allocated from a memory controlled by the Garabge Collector. For the list of their types see http://lxr.mozilla.org/seamonkey/source/js/src/jsgc.h#51 2. Close hook for a GC thing - native or a scripted code that is executed after GC has found that the thing is no longer reachable. This code can be arbitrary and roughly corresponds to objects with the finalize method in Java or heap-allocated objects with destructors in C#. Note that the term "finalizer" is used in SpiderMonkey to denote native code that is executed right before the corresponding GC thing is returned to the pool of free memory. Such finalizer typically releases a system resource and can use very limited set of SpiderMonkey API. Currently the close hooks is used in SpiderMonkey only to implement the generators. In particular, when GC determines that a generator that yielded inside the try statement with a finally block is unreachable, SpiderMonkey follows Python 2.5 and executes the finally block after the garbage collection stage. 3. Last-ditch GC - a garbage collection that runs when GC heap is full. Compared with a garbage collection that application triggers through an explicit call to JS_GC() or JS_MaybeGC(), the last ditch GC does not collect atoms, never runs the close hooks and treats newborn array (see bellow) as normal roots. 4. Strong root - a normal GC root. If GC thing is stored in the strong root, then it is considered GC-reachable until the application explicitly removes the thing from the root or removes the root itself from the set of roots. 5. Weak root - a root that only protects against the last-ditch GC but not normal GC. There are 3 categories of such roots in SpiderMonkey: a. newborn array or lastInternalResult fields in JSContext structure, http://lxr.mozilla.org/seamonkey/source/js/src/jscntxt.h#609. When GC allocates a new GC thing of a particular type, it places it into JSContext.newborn[type] array. The array is protected against the last ditch GC so code can safely allocate several GC things of different types in a row without the need to root them. JSContext.lastInternalResult is used to hold the result of the last scripted or native function invocation so code can allocate a new GC thing and put the result there without an explicit root. b. Any location that holds an atom (interned property). The last ditch GC does not touch them so code can allocate many atoms in a row without the explicit rooting before placing them into strongly-rooted entity. c. A location that can be manipulated by a script such as object property, array element etc. Normally a GC thing stored in a such location is GC-protected through its parent object. The problem is that a close hook can remove the thing from the location and then trigger GC one more time. Note that even before the introduction of the close hook support such locations were weak roots as almost all SpiderMonkey API that can trigger GC, triggers it through a script execution. In such cases a script does not need to define a close hook to cut the thing from the object graph and can do it directly. Still introduction of the close hooks made this weak rooting property explicit. 6. Weakly rooted GC thing - GC thing that is rooted only via a week root. Such thing is protected against the last-ditch GC, but any normal GC call will collect it. Note that a few if not most of SpiderMonkey API can trigger a full GC as a side affect of calling application supplied callbacks. Thus one must be very careful with weakly rooted things. With this terminology I can classify GC-related bugs into 3 categories. 1. Treating weak roots as strong ones. So far this was the major source of bugs. A trivial example is: str = ApiToAllocateNewThing(); // str is only protected by JSContext.newborn CallAnotherAPIThatCanTriggerFullGC(); Use str. In many cases such pattern happens when code correctly assumed that CallAnotherAPIThatCanTriggerFullGC() could only trigger the last ditch GC but later API changes broke that and allowed for a full GC run. A more subtle example is: JSObject *obj; jsval val; ... OBJ_GET_PROPERTY(.., obj, ..., &val); // val is rooted only as property of obj CallAnotherAPIThatCanTriggerFullGC(); Use val Here to exploit a bug a script can define a close hook that deletes the property of obj and then trigger GC one more time. Note that pattern often exists in a form of chained calls. That is, one function call an API that returns a weakly rooted thing and passes it to the second function as an argument. The second function first calls CallAnotherAPIThatCanTriggerFullGC() before using its argument. This treating weak-as-strong root type of bugs in most cases requires a very specialized script to expose them and so far no testing tool is available to check for them. 2. Displacing a value from a strong root. This significantly less frequent bug but still poses a problem. A pattern to watch would be: void Example(..., GCThingType *pointerToAStronglyRootedLocation) { GCThingType gcthing = *pointerToAStronglyRootedLocation; Use gcthing Store another gcthing in PointerToARootedLocation; Call API that can trigger any kind of GC Use gcthing again. } Again the pattern may exist as a call chain like in: void Good(GCThingType gcthing1, GCThingType *pointerToAStronglyRootedLocation) { GCThingType gcthing2 = AllocateNewGCThing(); *pointerToAStronglyRootedLocation = gcthing2; Call API that can trigger any kind of GC Use gcthing1 } void Bad(GCThingType *pointerToAStronglyRootedLocation) { GCThingType gcthing = *pointerToAStronglyRootedLocation; Bad(gcthing, pointerToAStronglyRootedLocation); } Here the function Good assumes that *pointerToAStronglyRootedLocation is a place where to store the result and which it can use as a temporary root as well and gcthing1 is rooted by some other means while the function Bad violates that assumption. 3. Forgetting to mark things during GC-marking phase. This is a minor source of the problems and compiling SpiderMonkey with WAY_TOO_MUCH_GC (which forces to run GC almost on any allocation) and running various stress-tests was able to reveal most of them. Thus it would be nice if a static analyzer can detect the first kind of bugs: GetWeaklyRootedValue CallAPIThatCanTriggerFullGC UseWeaklyRootedValue where weakly rooted value is not copied to a strong root. Now the past experience tells that many such bugs can be fixed by changing the API to require to pass a pointer to a strong root so GetWeaklyRootedValue would store the value in the strong root immediately. But then the problem 2 can become very real as it is tempting to optimize away an extra rooting and reuse the rooted pointer for other things.
Whiteboard: [sg:investigate]
Here is a typical bugs that shows the GC harzads, their exploits and rooting fixes: Bug 311497 - using unrooted temporary storage for heap sort pivot. This is AFAICS is a hard problem for static analizies. But if some tool can detect it, it would be nice. Bug 352846 - a recent example that demonstrates how fragile the weekly rooted model. Bug 322045 - another good example of too much relience on week roots. Bug 312278 - a good demo for various GC hazards and typical fixes. Note that the patches there use, among other thinks, AllocStack and js_EnterLocalRootScope internal API to root the temporaries. That code was switched to using temp roots, a much better API for strong temporray roots, see bug 357169. Bug 325269 - the bug that lead to the introduction JS_PUSH_TEMP_ROOT/JS_POP_TEMP_ROOT API and shows that a GC thing that is rooted only via an object graph is, in fact, a weakly-rooted GC thing and may not survive innocent-looking API call (old-born problem). Bug 345967 and bug 341956 - the bugs that remind that atoms or interned properties are weakly rooted GC things. Bug 313276 - another good example of old-born problem. Bug 316885 - demonstrates how to root GC thing using interpreter stack. Bug 313952 - bug that lead to the introduction of lastInternalResult.
Some thought on a conservative type checking. I believe if we can check the following 3 rules, then we would avoid GC hazards in future. In the following I will use PGT or Pointer-to-Gc-Thing instead of jsval since the consideration applies equally well to jsval or explicit pointers to GC things like JSObject*, JSString*, JSXML*, JSQName* or JSNamespace*. 1. The first bad pattern to detect is: Foo(...) { PGT x; ... Bar(&x); ... } That should not be allowed. That is, a pointer to a local variable holding PGT must not be passed to any function. Note that currently there are still few places that violates this rule especially in the interpreter but the past experience has shown that it is extremely easy to use "x" after "Bar(&x)" without introducing GC hazard. A typical example of the bug with PGT == jsval and the way it is fixed is array_join_sub function from jsarray.c. Before the fix it contained: static JSBool array_join_sub(JSContext *cx, JSObject *obj, enum ArrayToStringOp op, JSString *sep, jsval *rval) { jsval v; .... for (index = 0; index < length; index++) { ok = GetArrayElement(cx, obj, index, &hole, &v); ... ok = js_ValueToObject(cx, v, ...); ... } This was a GC hazard since js_ValueToObject requires v to be rooted. The fix changed the code into: static JSBool array_join_sub(JSContext *cx, JSObject *obj, enum ArrayToStringOp op, JSString *sep, jsval *rval) { /* Use rval to locally root each element value as we loop and convert. */ #define v (*rval) .... for (index = 0; index < length; index++) { ok = GetArrayElement(cx, obj, index, &hole, &v); ... ok = js_ValueToObject(cx, v, ...); ... } The nice consequence of this approach is that the code can assume that a function argument of type PGT* can always be used for the local roots in the same as the above fix used jsval *rval. This rule removes specialty of the signature of the native function: typedef JSBool (* JS_DLL_CALLBACK JSNative)(JSContext *cx, JSObject *obj, uintN argc, jsval *argv, jsval *rval); Here jsval *argv and jsval *rval are pointers to rooted locations by the definition. But how do we get rooted pointers then? There are several ways. One is to use extern JS_FRIEND_API(jsval *) js_AllocStack(JSContext *cx, uintN nslots, void **markp); which allocates a segment of rooted stack and all pointers into the segment are rooted. Another way is to register the pointer to PGT with GC. The most widely used way is TEMP_ROOT API. For example, the above example can also be fixed via: static JSBool array_join_sub(JSContext *cx, JSObject *obj, enum ArrayToStringOp op, JSString *sep, jsval *rval) { jsval v; JSTempValueRooter tvr; v = JSVAL_NULL; /* initialize the value before registering it with GC */ JS_PUSH_TEMP_ROOT(cx, 1, &v, &tvr); /* register 1 location */ .... for (index = 0; index < length; index++) { ok = GetArrayElement(cx, obj, index, &hole, &v); ... ok = js_ValueToObject(cx, v, ...); ... } ... JS_POP_TEMP_ROOT(cx, &tvr); } That is, between JS_PUSH_TEMP_ROOT(cx, 1, &v, &tvr) and JS_POP_TEMP_ROOT(cx, &tvr) &v is a pointer to a rooted location and can be passed to other functions. 2. The second bad pattern can be demonstrated with the following assert: Foo2(PGT x, PGT* pointer, ...) { assert(x != *pointer); ... } This states that all arguments to a function of type PGT must not be rooted through a copy that exists in a location pointed by an argument of type PGT*. In a sense this is a consequence of the previous rule that states that PGT* arguments can be freely as local roots. Clearly if the caller uses them to root PGT passed to the callee as other argument this can not be the case. For example, consider the following function from jsobj.c: JSBool js_ValueToObject(JSContext *cx, jsval v, JSObject **objp); A bug can happen with a code path like that: JSObject *obj; jsval v; ... obj = NULL; RegisterObjectRoot(&obj); /* there is no such function but you can get the same effect with TEMP_ROOT API */ /* Now &obj is a pointer to a rooted location */ ConstructObject(&obj); v = OBJECT_TO_JSVAL(obj); js_ValueToObject(cx, v, &obj); /* BUG: after js_ValueToObject changes obj v is no longer rooted */ UnregisterObjectRoot(&obj); 3. The third pattern is about a function returning PGT. The idea is to assume that given PGT Foo3(...); the result of Foo3 must be immediately rooted. That is, if the call to Foo3 is not immediately followed by rooting of its result, assume a bug. For example, jsstring.c had many bugs like the following code fragment: static JSBool str_substring(JSContext *cx, JSObject *obj, uintN argc, jsval *argv, jsval *rval) { JSString *str; jsdouble d; jsdouble length, begin, end; str = js_ValueToString(cx, OBJECT_TO_JSVAL(obj)); if (!str) return JS_FALSE; ... use_str } Here PGT is JSString *str. After js_ValueToString its result was not rooted leading to GC hazard in use_str. The fix was to change that into: static JSBool str_substring(JSContext *cx, JSObject *obj, uintN argc, jsval *argv, jsval *rval) { JSString *str; jsdouble d; jsdouble length, begin, end; str = js_ValueToString(cx, OBJECT_TO_JSVAL(obj)); if (!str) return JS_FALSE; argv[-1] = STRING_TO_JSVAL(str); ... use_str } That copied str into a rooted location fixing the bug. It would be nice to convert js_ValueToString from extern JSString * js_ValueToString(JSContext *cx, jsval v); to extern JSBool js_ValueToString(JSContext *cx, jsval v, JSString **pstr); and require that pstr is a pointer to a rooted location by the previous 2 rules. But for compatibility such calls will continue to exist for some time. Thus it would be nice to check each and every caller of js_ValueToString.
> > 1. The first bad pattern to detect is: > > Foo(...) > { > PGT x; > ... > Bar(&x); > ... > } > > That should not be allowed. That is, a pointer to a local variable holding PGT > must not be passed to any function. CQual should be good at detecting #1. The other ones will require flow sensitivity from Oink. Not having flow sensitivity significantly reduces the types of static analysis we can do.
Garbage Collection Safety Analysis of SpiderMonkey Version 2: Result of Mozilla Summit Meetings Daniel Wilkerson, Igor Bukanov, Taras Glek 22 November 2006 A "jsval instance" is a pointer from C++ into the JavaScript [JS] heap. Note that by "a jsval instance" we mean the copy of the jsval in a particular memory location: that is, another copy of the same jsval 32-bit value in another location is not "the same" jsval instance, but a copy. The JS garbage collector has a list of jsval*-s called the "root list". Actually, these jsval*-s are interpreted as arrays and are accompanied by a length; the length 1 is used for the usual simple pointer. A jsval instance is "rooted" if it's address is on the root list. Assume that JS heap objects can only be pointed to by jsval-s (and not, say ints). At a given timestep, call "C++-live" the set of JS heap objects reachable from C++ code and call "JS-live" those reachable from jsvals the addresses of which are on the JS root list. Goal invariant "safe garbage collection": When the garbage collector runs, all C++-live objects are also JS-live. Rules to guarantee Goal invariant are as follows. Rule 1a: allocation, stack: When a jsval v is created on the stack, it must be initialized to JS_NULL. The next line must be a call to register(&v) that puts &v onto the root list. The line after that must set the value of v using say GET_OBJ_PROPERTY(..., &v) or some other function; question: how do we guarantee that this line sets the jsval again? Rule 1b: allocation, heap: When a jsval* is created on the stack, we initialize it from alloc_rooted_ptr(), which roots the pointer that it returns. Rules 1a and 1b guarantee that the act of creating and initializing stack-allocated jsval and jsval* variables is atomic with respect to the garbage collector. Question: What about C++-stack allocated jsval**-s or higher levels of indirection? I think Igor says we do not allow them. Question: What about pointers to objects that contain jsval*-s? I seem to recall Igor saying that the answer is something like the following. The only such objects are JS heap objects, not C/C++ objects. A jsval instance is "transitively-rooted" if it is rooted or its address is a field of an object that is transitively-rooted. Inductively, it seems therefore that all transitively-rooted jsvals must point to JS-live objects; not sure how this definition interacts with cycle garbage collection. The allocation and initialization of the jsval in question in rules 1a and 1b must be atomic with respect to the garbage collector. I think that requires the arguments to alloc_rooted_ptr(), register() and GET_OBJ_PROPERTY() to not call functions that could call the garbage collector and to be safe we will require them to not call any functions at all. The only values left to ensure that they are rooted are parameter-allocated jsvals and jsval*-s (assuming jsval**-s and higher cannot happen), but parameter jsval*-s are already rooted because they can only come from something which is rooted at its creation. The only remaining problem is the parameter jsvals. Question: What about temporary values made during expression computation, such as calling a function by first computing all the arguments: one of the arguments may be a jsval which is saved while another argument computation runs which calls a function which calls the garbage collector or does other things. Solution 1: Make parameter jsvals illegal. This seems to be a fine solution except that it requires changing the current API. Taras suggests that we do that and write wrappers for the public API such that f(jsval v, foo y) becomes the following. int f0(jsval *vp, foo y) { // function body of f() with *vp replacing v } int f(jsval v, foo y) { register(&v); return f0(&v, y); } I suggest we could automate this code change using Oink and the pretty printer. Solution 2: A jsval instance is "copy-of-rooted" if there is another jsval instance that has the same 32-bit value that is rooted or transitively-rooted. Invariant "rootedness": All jsval instances are 1) rooted (once initialization is complete as in rules 1a and 1b above) 2) transitively-rooted, or 3) copy-of-rooted. This invariant implies the goal invariant. Invariant "stack/parameter jsval arguments are un-aliased": All jsval arguments v to a function f() are stack or parameter allocated and there is no alias to v available from any data-structures transitively reachable by f(). This invariant implies invariant "rootedness" as the jsval parameter to f() is copy-of-rooted and its copy cannot be changed during the execution of f(). Rule 3: At the call site to f() above where a jsval argument is passed, we require that the jsval argument be a simple variable expression of a stack allocated jsval v; Igor says that this is the common case. If it is not we can create a temporary immediately before the call site and assign to it; this requires rooting the temporary. We need the following distinction: A pointer is "ephemeral" if whenever it is read by any expression other than the de-reference operators ("*", "[]" and "->") it is being copied to another ephemeral pointer. We can do a jsval*-ephemeral-analysis by 1) allowing the user to annotate some pointers as ephemeral and 2) verifying these as ephemeral. We take the names rval and argv as user annotations that these jsval*-s are ephemeral. Rule 4: For any stack or parameter allocated jsval v: address of v, &v, 1) is ephemeral and 2) is not passed to f() as an argument.
Here is few comments in reply to comment #6: > Garbage Collection Safety Analysis of SpiderMonkey > Version 2: Result of Mozilla Summit Meetings > > Rule 1a: allocation, stack: When a jsval v is created on the stack, it > must be initialized to JS_NULL. The next line must be a call to > register(&v) that puts &v onto the root list. The line after that > must set the value of v using say GET_OBJ_PROPERTY(..., &v) or some > other function; question: how do we guarantee that this line sets the > jsval again? I suggest not to worry about this problem of using v before the real initialization. As far as I can see it is orthogonal to GC-safty. It is nice to address it but this was not a roblem in practice. > > Rule 1b: allocation, heap: When a jsval* is created on the stack, we > initialize it from alloc_rooted_ptr(), which roots the pointer that it > returns. > > Rules 1a and 1b guarantee that the act of creating and initializing > stack-allocated jsval and jsval* variables is atomic with respect to > the garbage collector. > > Question: What about C++-stack allocated jsval**-s or higher levels > of indirection? I think Igor says we do not allow them. A quick grep for jsval ** shows that it only present in few obscure formating API as arguments. > > Question: What about pointers to objects that contain jsval*-s? I > seem to recall Igor saying that the answer is something like the > following. The only such objects are JS heap objects, not C/C++ > objects. This is not the case. jsval can refer to a an JS object that refers to C++ object that contains jsvals. Such values are rooted because the garbage collector knows how to traverse the object graph. That is, as long as the original jsval rooted, the garbage collector during the marking phase would reach the extra jsvals stored in the intermediate C++ object. This is happens since the embedding must register appropriate callbacks to mark all the necessary jsvals. In practice the code can "forget" to teach GC about all the stored jsvals leading to GC hazards. But the past experience has shown that this second type of problems happens less frequently then forgeting to register stack-allocated jsval with the root set. I do not have any suggestions how to addres it. > A jsval instance is "transitively-rooted" if it is rooted or > its address is a field of an object that is transitively-rooted. > Inductively, it seems therefore that all transitively-rooted jsvals > must point to JS-live objects; not sure how this definition interacts > with cycle garbage collection. Does this refr to XPCOM cycle collector that graydon develops, see bug 333078? > > The allocation and initialization of the jsval in question in rules 1a > and 1b must be atomic with respect to the garbage collector. I think > that requires the arguments to alloc_rooted_ptr(), register() and > GET_OBJ_PROPERTY() to not call functions that could call the garbage > collector and to be safe we will require them to not call any > functions at all. > > The only values left to ensure that they are rooted are > parameter-allocated jsvals and jsval*-s (assuming jsval**-s and higher > cannot happen), but parameter jsval*-s are already rooted because they > can only come from something which is rooted at its creation. The > only remaining problem is the parameter jsvals. > > Question: What about temporary values made during expression > computation, such as calling a function by first computing all the > arguments: one of the arguments may be a jsval which is saved while > another argument computation runs which calls a function which calls > the garbage collector or does other things. > > Solution 1: Make parameter jsvals illegal. This seems to be a fine > solution except that it requires changing the current API. Taras > suggests that we do that and write wrappers for the public API such > that f(jsval v, foo y) becomes the following. > > int f0(jsval *vp, foo y) { > // function body of f() with *vp replacing v > } > int f(jsval v, foo y) { > register(&v); > return f0(&v, y); > } To be precise, it will be rewritten like: int f(jsval v, foo y) { int tmp; register(&v); tmp = f0(&v, y); unregister(&v); return tmp; }
Any chance this could be updated with recent developments, or blog/doc URLs could be provided? I'm interested in at least following it.
(In reply to comment #8) > Any chance this could be updated with recent developments, or blog/doc URLs > could be provided? I'm interested in at least following it. The current proposal for a set of rules to enforce in SpiderMonkey is at http://wiki.mozilla.org/GC_SafetySpec .
Whiteboard: [sg:investigate] → [sg:want]
We're using static and dynamic analysis for the new exact rooting work for GGC, so closing this bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Jan, when did we fix this?
Group: core-security
You need to log in before you can comment on or make changes to this bug.