Closed Bug 518230 Opened 15 years ago Closed 2 years ago

XPC.mfasl needs an XDR string table

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: taras.mozilla, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 5 obsolete files)

According to 
strings -e=l
we have almost 1mb of strings in XPC.mfasl. They are 16bit and there is lots of duplication in method names, etc. I think an 8bit string table would result in Ts goodness.
No longer blocks: 412796
Depends on: 412796
Attached patch WIP: Use UTF8 and refer to previous strings (obsolete) (deleted) — Splinter Review
I probably need something to force the use of UTF8 while converting to atoms.
Assignee: nobody → mwu
Severity: normal → enhancement
OS: Linux → All
Hardware: x86 → All
Blocks: 104170
Attached patch WIP: Use UTF8 and refer to previous strings, v2 (obsolete) (deleted) — Splinter Review
There's a couple error paths in js_XDRString that still leak, but it shouldn't be a big deal to fix. The bigger issue is the addition of forceUTF8 - I'm not sure if this is the right approach, but I can't think of anything simpler. It also requires the js script to only have unicode strings in order to be serialized. I suspect that should be ok, but I'm not sure.
Attachment #511576 - Attachment is obsolete: true
Attachment #515779 - Flags: feedback?(igor)
Serialized version of XPIProvider.jsm drops from 264312 bytes to 168428 bytes with this patch. (about 36% smaller)
(In reply to comment #2)
> It also requires the js script to only have unicode strings in order to be
> serialized. I suspect that should be ok, but I'm not sure.

Oh actually, that might be very bad, considering that XDR is used to implement js_CloneScript..
What does "only have unicode strings" mean?

/be
(In reply to comment #5)
> What does "only have unicode strings" mean?
> 
Strings that are serialized need to be convertible between UTF16 and UTF8.
Attached patch Use UTF8 and refer to previous strings, v3 (obsolete) (deleted) — Splinter Review
Attachment #515779 - Attachment is obsolete: true
Attachment #515955 - Flags: review?(igor)
Attachment #515779 - Flags: feedback?(igor)
(In reply to comment #6)
> Strings that are serialized need to be convertible between UTF16 and UTF8.

Why is this necessary? XDR is a private data format and can be serialized in whatever format that is preferable.
Attached patch Use CESU-8 and refer to previous atoms, v4 (obsolete) (deleted) — Splinter Review
Ok. This patch does UTF-16<->CESU-8 conversions so there are no restrictions on the type of strings that can be used now. It also does the conversion for JS_XDRString.
Attachment #515955 - Attachment is obsolete: true
Attachment #519996 - Flags: review?(igor)
Attachment #515955 - Flags: review?(igor)
Attachment #519996 - Attachment description: Use UTF8 and refer to previous atoms, v4 → Use CESU-8 and refer to previous atoms, v4
Comment on attachment 519996 [details] [diff] [review]
Use CESU-8 and refer to previous atoms, v4

>+class XDRAtomsHolder {
>+};
..
>+
> JSBool
> js_XDRScript(JSXDRState *xdr, JSScript **scriptp)
> {

...
>+    XDRAtomsHolder atomHolder(xdr);

For each function in the script this would instantiate a useless XDRAtomsHolder and its vector field as only the top-most XDRAtomsHolder would provide its atom table. To simplify things we should add a function like js_XDRScriptAndSubscripts, make js_XDRScript to call it like:

JSBool
js_XDRScript(JSXDRState *xdr, JSScript **scriptp)
{
    XDRAtomsHolder atomHolder(xdr);
    return js_XDRScriptAndSubscripts(xdr, script);
}

and then call js_XDRScriptAndSubscripts, not js_XDRScript, from js_XDRFunctionObject. This assumes that js_XDRFunctionObject is only called from js_XDRScript.

Currently the code also has a notion of a generic xdr support to serialize an arbitrary object graph that calls js_XDRFunctionObject via the class hook, but that has not used by FF for a few releases, it probably got rotten and we should remove that support. I will file a bug about that. For this bug it would sufficient to set the xdr hook for the function class at http://hg.mozilla.org/tracemonkey/file/40092c96120b/js/src/jsfun.cpp#l2030 to NULL.

With this changes there would be no need in XDRAtomsHolder and xdr->atoms as js_XDRScript can directly pass XDRAtoms to js_XDRScriptAndSubscripts and to js_XDRFunctionObject.
Attachment #519996 - Flags: review?(igor)
Attached patch Use CESU-8 and refer to previous atoms, v5 (obsolete) (deleted) — Splinter Review
Review comments addressed and one incorrect uint32* -> size_t* cast fixed.
Attachment #519996 - Attachment is obsolete: true
Attachment #520084 - Flags: review?(igor)
Comment on attachment 520084 [details] [diff] [review]
Use CESU-8 and refer to previous atoms, v5

>diff --git a/js/src/jsscript.cpp b/js/src/jsscript.cpp
>--- a/js/src/jsscript.cpp
>+++ b/js/src/jsscript.cpp
>@@ -310,6 +310,16 @@ enum ScriptBits {
> JSBool
> js_XDRScript(JSXDRState *xdr, JSScript **scriptp)
> {

Nit: assert that xdr->atoms is null.

>+    XDRAtoms atoms;
>+    xdr->atoms = &atoms;
>+    JSBool ok = js_XDRScriptAndSubscripts(xdr, scriptp);
>+    xdr->atoms = NULL;
>+    return ok;

Nit: pass xdr to XDRAtoms constructor and null xdr->atoms in the destructor.

>diff --git a/js/src/jsstr.h b/js/src/jsstr.h
>--- a/js/src/jsstr.h
>+++ b/js/src/jsstr.h
>@@ -951,7 +951,7 @@ js_SkipWhiteSpace(const jschar *s, const
>  * JS_malloc'ed. length is updated to the length of the new string in jschars.
>  */
> extern jschar *
>-js_InflateString(JSContext *cx, const char *bytes, size_t *length);
>+js_InflateString(JSContext *cx, const char *bytes, size_t *length, bool useCESU8 = false);


Nit: provide detailed comments on the meaning of useCESU8 parameter.

> 
> extern char *
> js_DeflateString(JSContext *cx, const jschar *chars, size_t length);
>@@ -971,7 +971,8 @@ js_InflateStringToBuffer(JSContext *cx, 
>  */
> extern JSBool
> js_InflateUTF8StringToBuffer(JSContext *cx, const char *bytes, size_t length,
>-                             jschar *chars, size_t *charsLength);
>+                             jschar *chars, size_t *charsLength,
>+                             bool useCESU8 = false);

Nit: refer to comment above about the meaning useCESU8.

> 
> /*
>  * Get number of bytes in the deflated sequence of characters. Behavior depends
>@@ -986,7 +987,7 @@ js_GetDeflatedStringLength(JSContext *cx
>  */
> extern size_t
> js_GetDeflatedUTF8StringLength(JSContext *cx, const jschar *chars,
>-                               size_t charsLength);
>+                               size_t charsLength, bool useCESU8 = false);

Nit: add comments that when useCESU8 is true the function is infallible.

>-static JSBool
>-XDRChars(JSXDRState *xdr, jschar *chars, uint32 nchars)
>-{
>-    uint32 i, padlen, nbytes;
>-    jschar *raw;
>-
>-    nbytes = nchars * sizeof(jschar);
>-    padlen = nbytes % JSXDR_ALIGN;
>-    if (padlen) {
>-        padlen = JSXDR_ALIGN - padlen;
>-        nbytes += padlen;
>-    }
>-    if (!(raw = (jschar *) xdr->ops->raw(xdr, nbytes)))
>-        return JS_FALSE;
>-    if (xdr->mode == JSXDR_ENCODE) {
>-        for (i = 0; i != nchars; i++)
>-            raw[i] = JSXDR_SWAB16(chars[i]);
>-        if (padlen)
>-            memset((char *)raw + nbytes - padlen, 0, padlen);
>-    } else if (xdr->mode == JSXDR_DECODE) {
>-        for (i = 0; i != nchars; i++)
>-            chars[i] = JSXDR_SWAB16(raw[i]);
>-    }
>-    return JS_TRUE;
>-}
>-
> /*
>  * Convert between a JS (Unicode) string and the XDR representation.
>  */
> JS_PUBLIC_API(JSBool)
> JS_XDRString(JSXDRState *xdr, JSString **strp)
> {
>-    uint32 nchars;
>-    jschar *chars;
>+    uint32 nchars = 0;
> 
>     if (xdr->mode == JSXDR_ENCODE)
>-        nchars = (*strp)->length();
>-    if (!JS_XDRUint32(xdr, &nchars))
>-        return JS_FALSE;
>+        nchars = js_GetDeflatedUTF8StringLength(xdr->cx,
>+                                                (*strp)->getChars(xdr->cx),
>+                                                (*strp)->length(),
>+                                                true);

nchars is uint32 yet js_GetDeflatedUTF8StringLength return size_t. So add a temporary like size_t length. Then add asserts with comments that length != size_t(-1) as js_GetDeflatedUTF8StringLength cannot fail due to the true parameter. Then add another assert that size_t(uint32(length)) == length together with a static assert that JSString::MAX_LENGTH < uint32(-1) / 3 with comments that GetDeflatedUTF8StringLength cannot exceed uint32. Then cast length to nchars.

>+    if (nchars < 0 || !JS_XDRUint32(xdr, &nchars))
>+        return false;

The check nchars < 0 is always true as nchars is uint32. This should be nchars != uint32(-1). But per comment above this check is not necessary.

> 
>-    if (xdr->mode == JSXDR_DECODE)
>-        chars = (jschar *) xdr->cx->malloc((nchars + 1) * sizeof(jschar));
>-    else
>-        chars = const_cast<jschar *>((*strp)->getChars(xdr->cx));
>-    if (!chars)
>-        return JS_FALSE;
>+    uint32 paddedLen = (nchars + JSXDR_MASK) & ~JSXDR_MASK;
>+    if (xdr->mode == JSXDR_DECODE && paddedLen > JS_XDRMemDataLeft(xdr))
>+        return false;

JS_XDRMemDataLeft is only works with memory-based streams. Since this is the only things that we use add an assert that xdr->ops == &xdrmem_ops with comments.

paddedLen > JS_XDRMemDataLeft(xdr) here means that we got a corrupted data stream. In general we do not verify that stream and assume that it is correct. So replace the if above with an assert with comments.

> 
>-    if (!XDRChars(xdr, chars, nchars))
>-        goto bad;
>-    if (xdr->mode == JSXDR_DECODE) {
>-        chars[nchars] = 0;
>-        *strp = JS_NewUCString(xdr->cx, chars, nchars);
>-        if (!*strp)
>-            goto bad;
>+    char *buf = (char *)xdr->ops->raw(xdr, paddedLen);

Use mem_raw together with an assert that ops->raw is mem_raw per above.

>+    if (!buf)
>+        return false;
>+
>+    size_t len = nchars;
>+    if (xdr->mode == JSXDR_ENCODE) {
>+        if (!js_DeflateStringToUTF8Buffer(xdr->cx,
>+                                          (*strp)->getChars(xdr->cx),
>+                                          (*strp)->length(), buf,
>+                                          &len, true))

js_DeflateStringToUTF8Buffer cannot fail here as the above code should have allocated enough space and we use useCESU8 mode.

>+static uint32
>+XDRAtomIndex(JSXDRState *xdr, JSAtom *atom)
>+{
>+    for (uint32 i = 0; i < xdr->atoms->length(); i++) {
>+        if ((*xdr->atoms)[i] == atom)
>+            return i;
>+    }

This results in O(N^2) behavior where N is number of atoms to serialize. We cannot assume that this code will only be applicable to our internal scripts because we use jsXDRScript to clone scripts across compartment. So when encoding we should use a hashmap to map atoms to indexes. 


>+    return -1;

Nit: use uint32(-1) here.

> extern JSBool
> js_XDRAtom(JSXDRState *xdr, JSAtom **atomp)
> {
>-    JSString *str;
>-    uint32 nchars;
>-    JSAtom *atom;
>-    JSContext *cx;
>-    jschar *chars;
>-    jschar stackChars[256];
>+    uint32 idx;
>+    char *buf;
> 
>     if (xdr->mode == JSXDR_ENCODE) {
>-        str = ATOM_TO_STRING(*atomp);
>+        idx = XDRAtomIndex(xdr, *atomp);
>+        if (idx == -1 && !xdr->atoms->append(*atomp))
>+            return false;

As idx is uint32 use uint32(-1), not plain -1, here.

>+    if (xdr->mode == JSXDR_DECODE) {
>+        if (idx != -1) {

Nit: uint32(-1)

>+            if (idx < xdr->atoms->length())
>+                *atomp = (*xdr->atoms)[idx];

Again: we assume that stream is correct so just add an assert that idx < xdr->atoms->length().

>+        } else {
>+            uint32 len;
>+            if (!JS_XDRUint32(xdr, &len))
>+                return false;
>+
>+            uint32 paddedLen = (len + JSXDR_MASK) & ~JSXDR_MASK;
>+            if (paddedLen > JS_XDRMemDataLeft(xdr))
>+                return false;

As above add assert about memory-only operations and replace the if with an assert.

>+    } else if (idx == -1) {

Nit: use } else { and a separated if not to mix encode/decode selector with idx selector.

>@@ -105,6 +108,8 @@ typedef struct JSXDROps {
>     void        (*finalize)(JSXDRState *);
> } JSXDROps;
> 
>+typedef js::Vector<JSAtom *, 1, js::SystemAllocPolicy> XDRAtoms;

You use SystemAllocPolicy. That means that failed append operation to the vector must be reported via js_ReportOutOfMemory.
Attachment #520084 - Flags: review?(igor) → review-
(In reply to comment #12)
> Comment on attachment 520084 [details] [diff] [review]
> >+    XDRAtoms atoms;
> >+    xdr->atoms = &atoms;
> >+    JSBool ok = js_XDRScriptAndSubscripts(xdr, scriptp);
> >+    xdr->atoms = NULL;
> >+    return ok;
> 
> Nit: pass xdr to XDRAtoms constructor and null xdr->atoms in the destructor.
> 

I went with this to minimize the number of lines this took up. IMHO, adding an extra class here to manage xdr->atoms seems to obscure what the code really does.

> > 
> >-    if (xdr->mode == JSXDR_DECODE)
> >-        chars = (jschar *) xdr->cx->malloc((nchars + 1) * sizeof(jschar));
> >-    else
> >-        chars = const_cast<jschar *>((*strp)->getChars(xdr->cx));
> >-    if (!chars)
> >-        return JS_FALSE;
> >+    uint32 paddedLen = (nchars + JSXDR_MASK) & ~JSXDR_MASK;
> >+    if (xdr->mode == JSXDR_DECODE && paddedLen > JS_XDRMemDataLeft(xdr))
> >+        return false;
> 
> JS_XDRMemDataLeft is only works with memory-based streams. Since this is the
> only things that we use add an assert that xdr->ops == &xdrmem_ops with
> comments.
> 

So do we want to just kill xdr->ops one day?

> paddedLen > JS_XDRMemDataLeft(xdr) here means that we got a corrupted data
> stream. In general we do not verify that stream and assume that it is correct.
> So replace the if above with an assert with comments.
> 

Apparently, mem_raw will give us null if we attempt to read beyond the end of stream while decoding, so this check is redundant.

> >@@ -105,6 +108,8 @@ typedef struct JSXDROps {
> >     void        (*finalize)(JSXDRState *);
> > } JSXDROps;
> > 
> >+typedef js::Vector<JSAtom *, 1, js::SystemAllocPolicy> XDRAtoms;
> 
> You use SystemAllocPolicy. That means that failed append operation to the
> vector must be reported via js_ReportOutOfMemory.

It looks like we can also use ContextAllocPolicy here. Is there a preference?
Most review comments addressed. I left the sanity check for the atom idx in. I feel like the code does tend to have sanity checks for xdr data, and I don't like to necessarily trust the data coming off the disk...
Attachment #520084 - Attachment is obsolete: true
Attachment #520358 - Flags: review?(igor)
Attached patch Changes between v5 and v6 (deleted) — Splinter Review
For reference. It's not exactly an interdiff between v5 and v6.. I made one nit fix to v6 after making this patch to split out the } else if { line. But it's close enough I think.
Comment on attachment 520358 [details] [diff] [review]
Use CESU-8 and refer to previous atoms, v6

>@@ -4034,7 +4037,7 @@ js_GetDeflatedUTF8StringLength(JSContext
>-        if (0xD800 <= c && c <= 0xDFFF) {
>+        if (0xD800 <= c && c <= 0xDFFF && !useCESU8) {

>@@ -4105,9 +4108,9 @@ js_DeflateStringToUTF8Buffer(JSContext *
>-        if ((c >= 0xDC00) && (c <= 0xDFFF))
>+        if ((c >= 0xDC00) && (c <= 0xDFFF) && !useCESU8)
>             goto badSurrogate;
>-        if (c < 0xD800 || c > 0xDBFF) {
>+        if (c < 0xD800 || c > 0xDBFF || useCESU8) {

I like the minimality of the changes here :)

> /*
>  * Convert between a JS (Unicode) string and the XDR representation.
>  */
> JS_PUBLIC_API(JSBool)
> JS_XDRString(JSXDRState *xdr, JSString **strp)
> {
>-    uint32 nchars;
>-    jschar *chars;
>+    uint32 nchars = 0;
>+    size_t len;
> 
>-    if (xdr->mode == JSXDR_ENCODE)
>-        nchars = (*strp)->length();
>+    if (xdr->mode == JSXDR_ENCODE) {
>+        len = js_GetDeflatedUTF8StringLength(xdr->cx,
>+                                             (*strp)->getChars(xdr->cx),
>+                                             (*strp)->length(),
>+                                             true);
>+        JS_ASSERT(len != -1); // js_GetDeflatedUTF8StringLength never fails in CESU8 mode

Nit: always use/* */ comments and make sure that comment text stays within column 78.

>+        JS_ASSERT(size_t(uint32(len)) == len);
>+        JS_STATIC_ASSERT(JSString::MAX_LENGTH < (uint32(-1) / 3));

Nit: Comment that this ensures that CESU8 encoding length always fits uint32.

>+    len = (uint32)nchars;
>+    if (xdr->mode == JSXDR_ENCODE) {
>+#ifdef DEBUG
>+        JSBool ok =
>+#endif
>+            js_DeflateStringToUTF8Buffer(xdr->cx,
>+                                         (*strp)->getChars(xdr->cx),
>+                                         (*strp)->length(), buf,
>+                                         &len, true);
>+        JS_ASSERT(ok);

Nit: write this as JS_ALWAYS_TRUE(js_DeflateStringToUTF8Buffer(...));

>+    if (xdr->mode == JSXDR_DECODE) {
>+        if (idx != uint32(-1)) {
>+            if (size_t(idx) < xdr->atoms->length())
>+                *atomp = (*xdr->atoms)[idx];
>+            else
>+                return false;

Nit: replace this with an assert. We either verify everything or nothing. Since we do not verify the structure of loaded script bytecode, we should skip any verification attempts and save cycles instead of adding  a false sense of security.
Attachment #520358 - Flags: review?(igor) → review+
Comment on attachment 520358 [details] [diff] [review]
Use CESU-8 and refer to previous atoms, v6

>@@ -310,6 +310,25 @@ enum ScriptBits {
> JSBool
> js_XDRScript(JSXDRState *xdr, JSScript **scriptp)
> {
>+    JS_ASSERT(!xdr->atoms);
>+    JS_ASSERT(!xdr->atomsMap);
>+
>+    XDRAtoms atoms;
>+    XDRAtomsHashMap atomsMap;
>+    if (xdr->mode == JSXDR_ENCODE && !atomsMap.init())
>+        return false;

One more nit: You need to add js_ReportOutOfMemory here.
(In reply to comment #13)
> It looks like we can also use ContextAllocPolicy here. Is there a preference?

ContextAllocPolicy also updates the GC pressure counters and should be in general used only to allocate the things that can freed from the GC.

So from utility point of view we are lacking something OOMReportingAllocPolicy, but this is for another bug to fix.
(In reply to comment #19)
> http://hg.mozilla.org/tracemonkey/rev/85d8e5c2c532

thanks for the nice work!
http://hg.mozilla.org/mozilla-central/rev/85d8e5c2c532
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Backed out the xdr string table part for causing startup crashes (bug 648022).

http://hg.mozilla.org/tracemonkey/rev/3acacde59381
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: fixed-in-tracemonkey

The bug assignee didn't login in Bugzilla in the last 7 months.
:nika, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: mwu.code → nobody
Flags: needinfo?(nika)

Closing this bug as the startup cache code has changed a lot by this point, and it's unlikely that this bug is still relevant.

Status: REOPENED → RESOLVED
Closed: 13 years ago2 years ago
Flags: needinfo?(nika)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: