Closed Bug 497618 Opened 15 years ago Closed 15 years ago

Change JSString macros to methods

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jorendorff, Assigned: jorendorff)

References

Details

Attachments

(2 files, 1 obsolete file)

Attached patch v1 (obsolete) (deleted) — Splinter Review
Change all the JS{STRING,FLATSTR,DEPSTR,PREFIX}_* function-like macros to methods of struct JSString. chars() and friends return a pointer-to-const, because generally the buffer should not be modified. In a few places `const` is added to variables in functions that call these methods. In 4 places a const_cast is needed. A few look like this: jschar *ls; ... - JSSTRING_CHARS_AND_LENGTH(left, ls, ln); + left->getCharsAndLength(const_cast<const jschar *&>(ls), ln); ... ... do horrible things to ls ... and if that const_cast is too disgusting I can change it. I like having the const_cast appear near the hack (i.e., in the caller) rather than in a JSString method. Rename JSString::length -> mLength, u.chars -> mChars, u.base -> mBase. (These changes are brendan-approved.) Change many Emacs comments from "Mode: C;" to "Mode: C++;", including all the .cpp files. (I left C-compatible headers alone for now.) Delete a redundant copyright notice line from jscompat.h and jsmath.h. I can put these back if it's not kosher, IANAL etc., but it looks like a simple mistake. Change the shell to call JS_StrictlyEqual instead of js_StrictlyEqual. A few more whitespace changes where something was indented in a silly way. Builds a browser on Linux. Running through the try server now.
Attachment #382752 - Flags: review?(jwalden+bmo)
Attachment #382752 - Flags: review?(jwalden+bmo) → review-
Comment on attachment 382752 [details] [diff] [review] v1 >diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp >@@ -2639,17 +2639,17 @@ JS_NewExternalString(JSContext *cx, jsch >- JSFLATSTR_INIT(str, chars, length); >+ str->flatInit(chars, length); I have trouble thinking of "flat" for this particular method as an adverb, think it's more sensible as initFlat, i.e. an adjective. Same for the other *Init and *Reinit methods. Thoughts? >@@ -5444,38 +5444,38 @@ JS_GetStringChars(JSString *str) >+ n = str->depLength(); >+ memcpy(s, str->depChars(), n * sizeof *s); I wouldn't abbreviate "dependent". I don't think the abbreviation helps much; it's not a common one, and my mind is going to seize up on it a fraction of a second each time I see it. Prefer expanding it, your call tho. As noted on IRC, subclasses for flat/dependent strings may help this by eliminating the prefix, overriding the length method and such, but followup if deemed necessary is certainly fair. >diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp >+ sep->getCharsAndLength(sepstr, seplen); Given perfect compilers this doesn't matter, but maybe return chars so that if this ever *isn't* inlined we won't burn the return-value register? Smart compilers wouldn't care, but do we trust ours enough to never not-inline and be slightly slower? >diff --git a/js/src/jsatom.cpp b/js/src/jsatom.cpp > /* Finish handing off chars to the GC'ed key string. */ >- str->u.chars = NULL; >+ str->mChars = NULL; Yeah, I think you do want to friend js_AtomizeString (and if I read right js_ConcatStrings, and str_split, and whatever other place or two the pixie-dust must fall), seems better than a method anyone can call. >diff --git a/js/src/jsbool.cpp b/js/src/jsbool.cpp >+ return JSVAL_TO_STRING(v)->length() != 0; Given this, at least one or two cases in jstracer.cpp, Boolish and js_FoldConstants in jsparse.cpp, and the macro in jsxml.cpp, it's clear we should have a |bool isEmpty() { return length() != 0; }| helper. >diff --git a/js/src/jsbuiltins.cpp b/js/src/jsbuiltins.cpp >- JSSTRING_CHARS_AND_END(str, bp, end); >+ str->getCharsAndEnd(bp, end); Same note as on getCharsAndLength. >diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp >+ if (str->isDependent()) >+ JS_CALL_STRING_TRACER(trc, str->depBase(), "base"); Same dep/dependent prefix concern. >diff --git a/js/src/jsregexp.cpp b/js/src/jsregexp.cpp >@@ -1999,42 +1999,42 @@ CompileRegExpToAST(JSContext* cx, JSToke > /* Return the cached fragment for the given regexp, or NULL. */ >-static Fragment* >-LookupNativeRegExp(JSContext* cx, void* hash, uint16 re_flags, >- jschar* re_chars, size_t re_length) >-{ >- Fragmento* fragmento = JS_TRACE_MONITOR(cx).reFragmento; >- Fragment* fragment = fragmento->getLoop(hash); >+static Fragment * >+LookupNativeRegExp(JSContext *cx, void *hash, uint16 re_flags, >+ const jschar *re_chars, size_t re_length) >+{ >+ Fragmento *fragmento = JS_TRACE_MONITOR(cx).reFragmento; >+ Fragment *fragment = fragmento->getLoop(hash); * immediately adjacent to type is the new Rome (incrementally), don't do this. >diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp >@@ -77,173 +77,174 @@ >- JS_ASSERT(JSSTRDEP_IS_PREFIX(str)); >- JSPREFIX_SET_BASE(str, base); >- } else if (start <= JSSTRDEP_START_MASK) { >- length = JSSTRDEP_LENGTH(str); >- JSSTRDEP_REINIT(str, base, start, length); >+ JS_ASSERT(str->depIsPrefix()); >+ str->prefixSetBase(base); >+ } else if (start <= JSString::JSSTRDEP_START_MASK) { >+ length = str->depLength(); >+ str->depReinit(base, start, length); The JSSTR prefix there seems superfluous, JSString::DEPENDENT_START_MASK? Likewise-ish for all the other enums. >@@ -2713,17 +2717,17 @@ js_InitStringClass(JSContext *cx, JSObje >- if (length > JSSTRING_LENGTH_MASK) { >+ if (length > JSString::JSSTRING_LENGTH_MASK) { For less redundancy, as before, we should use LENGTH_MASK for less redundancy. >diff --git a/js/src/jsstr.h b/js/src/jsstr.h >+#define JSSTRING_BIT(n) ((size_t)1 << (n)) >+#define JSSTRING_BITMASK(n) (JSSTRING_BIT(n) - 1) C++0x constexpr would be useful here, if only... (as would JS_BIT, except for the type difference) >+ JSSTRING_LENGTH_BITS = (JS_BITS_PER_WORD - 4), >+ JSSTRDEP_LENGTH_BITS = (JSSTRING_LENGTH_BITS / 2), >+ JSSTRDEP_START_BITS = (JSSTRING_LENGTH_BITS-JSSTRDEP_LENGTH_BITS), Parentheses are superfluous now that these are proper constants; also put spaces around - in the last. >+ bool hasFlag(size_t flag) const { >+ return (mLength & flag) != 0; >+ } I haven't seen this in the patch up until this point, and I doubt exposing flags directly is a good idea, rather than adding readable methods that wrap individual bits. Make both the above enum and this method private perhaps? (Friend of TraceRecorder would cover the tracer uses.) >+ bool isMutable() const { >+ return (mLength & (JSSTRFLAG_DEPENDENT | JSSTRFLAG_MUTABLE)) == JSSTRFLAG_MUTABLE; >+ } >+ >+ bool isAtomized() const { >+ return (mLength & (JSSTRFLAG_DEPENDENT | JSSTRFLAG_ATOMIZED)) == JSSTRFLAG_ATOMIZED; >+ } I think this would be clearer as !isDependent() && hasFlag(mutable), mutatis mutandis for the second. >+ jschar * flatChars() const { Prevailing style these days for this is * immediately adjacent to type. >+ /* >+ * Special flat string initializer that preserves the JSSTR_DEFLATED flag. >+ * Use this method when reinitializing an existing string (which may be >+ * hashed to its deflated bytes. Newborn strings must use flatInit. >+ */ Fix parenthesization while you're here. >+ void depInit(JSString *bstr, size_t off, size_t len) { >+ mLength = JSSTRFLAG_DEPENDENT | >+ (off << JSSTRDEP_START_SHIFT) | len; >+ mBase = bstr; >+ } >+ >+ /* See JSString::flatReinit. */ >+ void depReinit(JSString *bstr, size_t off, size_t len) { >+ mLength = JSSTRFLAG_DEPENDENT | >+ (mLength & JSSTRFLAG_DEFLATED) | >+ (off << JSSTRDEP_START_SHIFT) | len; >+ mBase = bstr; >+ } Add assertions to each of these that len is not too large, for the appropriate largeness for each; the flat versions have them, these don't. >+ JSString * depBase() const { >+ return mBase; >+ } Naming suggests this needs an assertion of dependence, either that or make it private. >+ jschar * depChars() const { >+ JS_ASSERT(isDependent()); >+ return (depBase()->isDependent()) Overparenthesization. >+ void prefixInit(JSString *bstr, size_t len) { >+ mLength = JSSTRFLAG_DEPENDENT | JSSTRFLAG_PREFIX | len; >+ mBase = bstr; >+ } >+ >+ /* See JSString::flatReinit. */ >+ void prefixReinit(JSString *bstr, size_t len) { >+ mLength = JSSTRFLAG_DEPENDENT | JSSTRFLAG_PREFIX | >+ (mLength & JSSTRFLAG_DEFLATED) | len; >+ mBase = bstr; >+ } len-check. >diff --git a/js/src/jsxml.cpp b/js/src/jsxml.cpp >+#define IS_STAR(str) ((str)->length() == 1 && *(str)->chars() == '*') This can be an inline method, of course. There are enough fun changes I suggest above that I think I want another look at this, even tho I don't.
Thanks for the fine review. So far, I've taken all your suggestions except these: (In reply to comment #1) > As noted on IRC, subclasses for flat/dependent strings may help this by > eliminating the prefix, overriding the length method and such, but followup if > deemed necessary is certainly fair. Followup. Will you take it? > Given perfect compilers this doesn't matter, but maybe return chars so that if > this ever *isn't* inlined we won't burn the return-value register? Smart > compilers wouldn't care, but do we trust ours enough to never not-inline and be > slightly slower? I still need to look at this--and performance numbers generally. > >+ jschar * flatChars() const { > > Prevailing style these days for this is * immediately adjacent to type. Haven't made this change yet. (One other note - str_split touches JSSubString::chars, not JSString::mChars, so it doesn't need to be a friend.) I also changed the constness of some methods, as discussed on IRC. I'm out of time for now. v2 tomorrow.
Blocks: 497768
(In reply to comment #2) > (In reply to comment #1) > > As noted on IRC, subclasses for flat/dependent strings may help this by > > eliminating the prefix, overriding the length method and such, but > > followup if deemed necessary is certainly fair. > > Followup. Will you take it? Can do, filed bug 497768.
Attached patch interdiff v1 to v2 (deleted) — Splinter Review
Assignee: general → jorendorff
Attached patch v2 - work in progress (deleted) — Splinter Review
Need another minute to look at performance fluctuations. It's probably random -- code size difference with the patch is 3 bytes on linux. (Without JS_ALWAYS_INLINE, the difference is 3977 bytes; apparently g++ -O3 doesn't inline those huge length() and chars() methods unless you insist.)
Attachment #382752 - Attachment is obsolete: true
It was random. (In reply to comment #1) > Given perfect compilers this doesn't matter, but maybe return chars so that if > this ever *isn't* inlined we won't burn the return-value register? Smart > compilers wouldn't care, but do we trust ours enough to never not-inline and be > slightly slower? I'm now JS_ALWAYS_INLINEing these methods, so this isn't necessary--right? > >+ jschar * flatChars() const { > > Prevailing style these days for this is * immediately adjacent to type. So, I didn't make this change, and I want to try and explain my thinking, and then maybe we can come up with a better way out of this style morass we somehow got into. I think the result of this approach is going to be a lot of old-style code living alongside new-style code for the foreseeable future, because many lines of code won't be edited for years. jsregexp.cpp is an example of how things are going to look. And it sucks. Best possible world, we would have a consistent style everywhere. Failing that, we can at least have consistent style within each file. TBH I'd like to see the *old* style prevail and jstracer.cpp changed. But whichever style we decide on, a ten-year campaign to change our coding style line by line sounds really annoying! So, that's why my changes to jsstr.h are old-style. (I reverted the few whitespace changes I had in jsregexp.cpp; that didn't belong in this patch at all.)
Attachment #382976 - Flags: review?(jwalden+bmo)
I'm in favor of T *v, T (*fp)(...), etc. It's the one true way. The Stroustrup style is tolerable (Mozilla uses it, mostly) but error prone and misleading at the margin. Agree on precedent overwhelming interloping style (which I was party to) in jstracer.cpp. We can revise that more easily. Let the wiki style guide be updated, unless we need to parlay more. /be
Reference types OTOH are strange enough (initialization is very different from assignment) that using T& r = lvalue seems better, with of course exactly one declarator per declaration. Thoughts? I'll take this to the wiki, don't want to hijack this bug. /be
Comment on attachment 382976 [details] [diff] [review] v2 - work in progress (In reply to comment #6) > I'm now JS_ALWAYS_INLINEing these methods, so this isn't necessary--right? Ah, yes, should have thought of that myself, best of all worlds. >diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp >+ } else if (start <= JSString::MAX_DEPENDENT_START) { Mm, these enum names are much better than previously, I especially like the mask<->max_length aliases, much more intuitive at a glance. >diff --git a/js/src/jsstr.h b/js/src/jsstr.h >+ friend class ::TraceRecorder; >+ >+ friend JSAtom * >+ ::js_AtomizeString(JSContext *cx, JSString *str, uintN flags); >+ >+ friend JSString * JS_FASTCALL >+ ::js_ConcatStrings(JSContext *cx, JSString *left, JSString *right); Lose the :: as noted on IRC, can't remember us using them anywhere else in SpiderMonkey. (I never understood why we use them in XPConnect and DOM-land, since as far as I can tell those are similarly unnecessary.) >+ /* >+ * Definitions for flags stored in the high order bits of JSString.length. >+ * PREFIX and MUTABLE are two aliases for the same value. >+ * PREFIX should be used only if DEPENDENT is set and >+ * MUTABLE should be used only if the string is flat. >+ * ATOMIZED is used only with the flat immutable strings. This unusual wrapping is weird at first but far more sensible than the normal style, on second glance.
Attachment #382976 - Flags: review?(jwalden+bmo) → review+
I still dislike the old whitespace style for T *foo. At this point I'm getting somewhat weary of the argument, tho, and I think I would take rewriting single-line declarations of multiple values where not all are non-pointers (good: T foo, bar, baz; okay: T *foo; bad: T *foo, bar; T *foo, *bar;) into multiple lines and call that enough to get rid of the worst style error T *foo encourages. This is probably an argument to be taken up elsewhere, and it's not relevant to the patch for now, but the discussion is here for the moment, and I lack the motivation to move it elsewhere right now.
(In reply to comment #9) > >+ /* > >+ * Definitions for flags stored in the high order bits of JSString.length. > >+ * PREFIX and MUTABLE are two aliases for the same value. > >+ * PREFIX should be used only if DEPENDENT is set and > >+ * MUTABLE should be used only if the string is flat. > >+ * ATOMIZED is used only with the flat immutable strings. > > This unusual wrapping is weird at first but far more sensible than the normal > style, on second glance. What usual style? Wrapping sentences freely is used where you don't have an ordered or unordered list. If you see a list wrapped as if it were prose, file a bug (or just fix it). Here, though, we have a list. Looks unordered. How about - "bullets"? /be
(In reply to comment #10) > I still dislike the old whitespace style for T *foo. Why? Likes and dislikes come and go, but the |T *p, oops| argument is objective enough. It's not a trump card -- we could spend significant time trying to switch the bulk of the code to Bjarne style. The trump card is jorendorff's point about being in mixed-style hell for years and KLOCs of code. /be
Pushed with the redundant ::s removed and a few whitespace changes. http://hg.mozilla.org/tracemonkey/rev/761e24450f2b The wiki style guide now describes this style. https://wiki.mozilla.org/JavaScript:SpiderMonkey:C_Coding_Style (In reply to comment #11) > (In reply to comment #9) > > >+ /* > > >+ * Definitions for flags stored in the high order bits of JSString.length. > > >+ * PREFIX and MUTABLE are two aliases for the same value. > > >+ * PREFIX should be used only if DEPENDENT is set and > > >+ * MUTABLE should be used only if the string is flat. > > >+ * ATOMIZED is used only with the flat immutable strings. > > > > This unusual wrapping is weird at first but far more sensible than the normal > > style, on second glance. > > What usual style? Wrapping sentences freely is used where you don't have an > ordered or unordered list. If you see a list wrapped as if it were prose, file > a bug (or just fix it). > > Here, though, we have a list. Looks unordered. How about - "bullets"? Funny story. In v1 and v2 I preserved the unusual wrapping from the original because I thought this was an unordered list. Rereading it, I find that it isn't. So instead of bullets, I just re-wrapped it like this: /* * Definitions for flags stored in the high order bits of mLength. * * PREFIX and MUTABLE are two aliases for the same bit. PREFIX should be * used only if DEPENDENT is set and MUTABLE should be used only if the * string is flat. * * ATOMIZED is used only with flat, immutable strings. */ So that's what I checked in. Thanks, Jeff and Brendan. Bug 498488, someday soon, I hope.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #12) > (In reply to comment #10) > > I still dislike the old whitespace style for T *foo. > > Why? Likes and dislikes come and go, but the |T *p, oops| argument is objective > enough. Because we're declaring a variable |foo| to which the deref operator may (only sometimes!) be applied, not a variable |*foo| -- no more, no less than that. I think that argument's objective enough, too, for what it's worth.
(In reply to comment #14) > (In reply to comment #12) > > (In reply to comment #10) > > > I still dislike the old whitespace style for T *foo. > > > > Why? Likes and dislikes come and go, but the |T *p, oops| argument is objective > > enough. > > Because we're declaring a variable |foo| to which the deref operator may (only > sometimes!) be applied, not a variable |*foo| -- no more, no less than that. "not a variable |*foo|" makes no grammatical sense if variables are named by identifiers. The C style going back to dmr is based on the "use" expression syntax matching the declarator expression syntax. This is detailed at http://cm.bell-labs.com/cm/cs/who/dmr/chist.html and it is self-consistent, albeit confusing to many when composing declarator modes, because declarator modes must be read "inside out" while the types and qualifiers come in from the left. > I think that argument's objective enough, too, for what it's worth. That argument (which does not make sense in terms of the C or C++ grammar) was not made; you only said you still dislike something, not why. Jorendorff, shouldn't the updates affect https://wiki.mozilla.org/JavaScript:SpiderMonkey:C%2B%2B_Coding_Style in addition to the C style guide? My fault for making two, it was an attempt to layer the new on the old and provide continuity without duplication. /be
(In reply to comment #15) > "not a variable |*foo|" makes no grammatical sense if variables are named by > identifiers. > That argument (which does not make sense in terms of the C or C++ grammar) Well, except that we humans don't visualize by the grammar -- we visualize at first pass by grouping and whitespace. I think it sensible to avoid a second pass when possible.
We're way OT, but "we humans" don't all see things the same, and programmers had better pay attention to the grammar. Bjarne's C++ style tries to wish the language into Algol 68 shape, by turning a blind eye to the full grammar. That bites back, sometimes. It's not the end of the world, though. /be
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: