Closed
Bug 497618
Opened 15 years ago
Closed 15 years ago
Change JSString macros to methods
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jorendorff, Assigned: jorendorff)
References
Details
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
Change all the JS{STRING,FLATSTR,DEPSTR,PREFIX}_* function-like macros to methods of struct JSString.
chars() and friends return a pointer-to-const, because generally the buffer should not be modified. In a few places `const` is added to variables in functions that call these methods. In 4 places a const_cast is needed. A few look like this:
jschar *ls;
...
- JSSTRING_CHARS_AND_LENGTH(left, ls, ln);
+ left->getCharsAndLength(const_cast<const jschar *&>(ls), ln);
...
... do horrible things to ls
...
and if that const_cast is too disgusting I can change it. I like having the const_cast appear near the hack (i.e., in the caller) rather than in a JSString method.
Rename JSString::length -> mLength, u.chars -> mChars, u.base -> mBase. (These changes are brendan-approved.)
Change many Emacs comments from "Mode: C;" to "Mode: C++;", including all the .cpp files. (I left C-compatible headers alone for now.)
Delete a redundant copyright notice line from jscompat.h and jsmath.h. I can put these back if it's not kosher, IANAL etc., but it looks like a simple mistake.
Change the shell to call JS_StrictlyEqual instead of js_StrictlyEqual.
A few more whitespace changes where something was indented in a silly way.
Builds a browser on Linux. Running through the try server now.
Attachment #382752 -
Flags: review?(jwalden+bmo)
Updated•15 years ago
|
Attachment #382752 -
Flags: review?(jwalden+bmo) → review-
Comment 1•15 years ago
|
||
Comment on attachment 382752 [details] [diff] [review]
v1
>diff --git a/js/src/jsapi.cpp b/js/src/jsapi.cpp
>@@ -2639,17 +2639,17 @@ JS_NewExternalString(JSContext *cx, jsch
>- JSFLATSTR_INIT(str, chars, length);
>+ str->flatInit(chars, length);
I have trouble thinking of "flat" for this particular method as an adverb, think it's more sensible as initFlat, i.e. an adjective. Same for the other *Init and *Reinit methods. Thoughts?
>@@ -5444,38 +5444,38 @@ JS_GetStringChars(JSString *str)
>+ n = str->depLength();
>+ memcpy(s, str->depChars(), n * sizeof *s);
I wouldn't abbreviate "dependent". I don't think the abbreviation helps much; it's not a common one, and my mind is going to seize up on it a fraction of a second each time I see it. Prefer expanding it, your call tho.
As noted on IRC, subclasses for flat/dependent strings may help this by eliminating the prefix, overriding the length method and such, but followup if deemed necessary is certainly fair.
>diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp
>+ sep->getCharsAndLength(sepstr, seplen);
Given perfect compilers this doesn't matter, but maybe return chars so that if this ever *isn't* inlined we won't burn the return-value register? Smart compilers wouldn't care, but do we trust ours enough to never not-inline and be slightly slower?
>diff --git a/js/src/jsatom.cpp b/js/src/jsatom.cpp
> /* Finish handing off chars to the GC'ed key string. */
>- str->u.chars = NULL;
>+ str->mChars = NULL;
Yeah, I think you do want to friend js_AtomizeString (and if I read right js_ConcatStrings, and str_split, and whatever other place or two the pixie-dust must fall), seems better than a method anyone can call.
>diff --git a/js/src/jsbool.cpp b/js/src/jsbool.cpp
>+ return JSVAL_TO_STRING(v)->length() != 0;
Given this, at least one or two cases in jstracer.cpp, Boolish and js_FoldConstants in jsparse.cpp, and the macro in jsxml.cpp, it's clear we should have a |bool isEmpty() { return length() != 0; }| helper.
>diff --git a/js/src/jsbuiltins.cpp b/js/src/jsbuiltins.cpp
>- JSSTRING_CHARS_AND_END(str, bp, end);
>+ str->getCharsAndEnd(bp, end);
Same note as on getCharsAndLength.
>diff --git a/js/src/jsgc.cpp b/js/src/jsgc.cpp
>+ if (str->isDependent())
>+ JS_CALL_STRING_TRACER(trc, str->depBase(), "base");
Same dep/dependent prefix concern.
>diff --git a/js/src/jsregexp.cpp b/js/src/jsregexp.cpp
>@@ -1999,42 +1999,42 @@ CompileRegExpToAST(JSContext* cx, JSToke
> /* Return the cached fragment for the given regexp, or NULL. */
>-static Fragment*
>-LookupNativeRegExp(JSContext* cx, void* hash, uint16 re_flags,
>- jschar* re_chars, size_t re_length)
>-{
>- Fragmento* fragmento = JS_TRACE_MONITOR(cx).reFragmento;
>- Fragment* fragment = fragmento->getLoop(hash);
>+static Fragment *
>+LookupNativeRegExp(JSContext *cx, void *hash, uint16 re_flags,
>+ const jschar *re_chars, size_t re_length)
>+{
>+ Fragmento *fragmento = JS_TRACE_MONITOR(cx).reFragmento;
>+ Fragment *fragment = fragmento->getLoop(hash);
* immediately adjacent to type is the new Rome (incrementally), don't do this.
>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>@@ -77,173 +77,174 @@
>- JS_ASSERT(JSSTRDEP_IS_PREFIX(str));
>- JSPREFIX_SET_BASE(str, base);
>- } else if (start <= JSSTRDEP_START_MASK) {
>- length = JSSTRDEP_LENGTH(str);
>- JSSTRDEP_REINIT(str, base, start, length);
>+ JS_ASSERT(str->depIsPrefix());
>+ str->prefixSetBase(base);
>+ } else if (start <= JSString::JSSTRDEP_START_MASK) {
>+ length = str->depLength();
>+ str->depReinit(base, start, length);
The JSSTR prefix there seems superfluous, JSString::DEPENDENT_START_MASK? Likewise-ish for all the other enums.
>@@ -2713,17 +2717,17 @@ js_InitStringClass(JSContext *cx, JSObje
>- if (length > JSSTRING_LENGTH_MASK) {
>+ if (length > JSString::JSSTRING_LENGTH_MASK) {
For less redundancy, as before, we should use LENGTH_MASK for less redundancy.
>diff --git a/js/src/jsstr.h b/js/src/jsstr.h
>+#define JSSTRING_BIT(n) ((size_t)1 << (n))
>+#define JSSTRING_BITMASK(n) (JSSTRING_BIT(n) - 1)
C++0x constexpr would be useful here, if only... (as would JS_BIT, except for the type difference)
>+ JSSTRING_LENGTH_BITS = (JS_BITS_PER_WORD - 4),
>+ JSSTRDEP_LENGTH_BITS = (JSSTRING_LENGTH_BITS / 2),
>+ JSSTRDEP_START_BITS = (JSSTRING_LENGTH_BITS-JSSTRDEP_LENGTH_BITS),
Parentheses are superfluous now that these are proper constants; also put spaces around - in the last.
>+ bool hasFlag(size_t flag) const {
>+ return (mLength & flag) != 0;
>+ }
I haven't seen this in the patch up until this point, and I doubt exposing flags directly is a good idea, rather than adding readable methods that wrap individual bits. Make both the above enum and this method private perhaps? (Friend of TraceRecorder would cover the tracer uses.)
>+ bool isMutable() const {
>+ return (mLength & (JSSTRFLAG_DEPENDENT | JSSTRFLAG_MUTABLE)) == JSSTRFLAG_MUTABLE;
>+ }
>+
>+ bool isAtomized() const {
>+ return (mLength & (JSSTRFLAG_DEPENDENT | JSSTRFLAG_ATOMIZED)) == JSSTRFLAG_ATOMIZED;
>+ }
I think this would be clearer as !isDependent() && hasFlag(mutable), mutatis mutandis for the second.
>+ jschar * flatChars() const {
Prevailing style these days for this is * immediately adjacent to type.
>+ /*
>+ * Special flat string initializer that preserves the JSSTR_DEFLATED flag.
>+ * Use this method when reinitializing an existing string (which may be
>+ * hashed to its deflated bytes. Newborn strings must use flatInit.
>+ */
Fix parenthesization while you're here.
>+ void depInit(JSString *bstr, size_t off, size_t len) {
>+ mLength = JSSTRFLAG_DEPENDENT |
>+ (off << JSSTRDEP_START_SHIFT) | len;
>+ mBase = bstr;
>+ }
>+
>+ /* See JSString::flatReinit. */
>+ void depReinit(JSString *bstr, size_t off, size_t len) {
>+ mLength = JSSTRFLAG_DEPENDENT |
>+ (mLength & JSSTRFLAG_DEFLATED) |
>+ (off << JSSTRDEP_START_SHIFT) | len;
>+ mBase = bstr;
>+ }
Add assertions to each of these that len is not too large, for the appropriate largeness for each; the flat versions have them, these don't.
>+ JSString * depBase() const {
>+ return mBase;
>+ }
Naming suggests this needs an assertion of dependence, either that or make it private.
>+ jschar * depChars() const {
>+ JS_ASSERT(isDependent());
>+ return (depBase()->isDependent())
Overparenthesization.
>+ void prefixInit(JSString *bstr, size_t len) {
>+ mLength = JSSTRFLAG_DEPENDENT | JSSTRFLAG_PREFIX | len;
>+ mBase = bstr;
>+ }
>+
>+ /* See JSString::flatReinit. */
>+ void prefixReinit(JSString *bstr, size_t len) {
>+ mLength = JSSTRFLAG_DEPENDENT | JSSTRFLAG_PREFIX |
>+ (mLength & JSSTRFLAG_DEFLATED) | len;
>+ mBase = bstr;
>+ }
len-check.
>diff --git a/js/src/jsxml.cpp b/js/src/jsxml.cpp
>+#define IS_STAR(str) ((str)->length() == 1 && *(str)->chars() == '*')
This can be an inline method, of course.
There are enough fun changes I suggest above that I think I want another look at this, even tho I don't.
Assignee | ||
Comment 2•15 years ago
|
||
Thanks for the fine review. So far, I've taken all your suggestions except these:
(In reply to comment #1)
> As noted on IRC, subclasses for flat/dependent strings may help this by
> eliminating the prefix, overriding the length method and such, but followup if
> deemed necessary is certainly fair.
Followup. Will you take it?
> Given perfect compilers this doesn't matter, but maybe return chars so that if
> this ever *isn't* inlined we won't burn the return-value register? Smart
> compilers wouldn't care, but do we trust ours enough to never not-inline and be
> slightly slower?
I still need to look at this--and performance numbers generally.
> >+ jschar * flatChars() const {
>
> Prevailing style these days for this is * immediately adjacent to type.
Haven't made this change yet.
(One other note - str_split touches JSSubString::chars, not JSString::mChars, so it doesn't need to be a friend.)
I also changed the constness of some methods, as discussed on IRC. I'm out of time for now. v2 tomorrow.
Comment 3•15 years ago
|
||
(In reply to comment #2)
> (In reply to comment #1)
> > As noted on IRC, subclasses for flat/dependent strings may help this by
> > eliminating the prefix, overriding the length method and such, but
> > followup if deemed necessary is certainly fair.
>
> Followup. Will you take it?
Can do, filed bug 497768.
Assignee | ||
Comment 4•15 years ago
|
||
Assignee: general → jorendorff
Assignee | ||
Comment 5•15 years ago
|
||
Need another minute to look at performance fluctuations. It's probably random -- code size difference with the patch is 3 bytes on linux.
(Without JS_ALWAYS_INLINE, the difference is 3977 bytes; apparently g++ -O3 doesn't inline those huge length() and chars() methods unless you insist.)
Attachment #382752 -
Attachment is obsolete: true
Assignee | ||
Comment 6•15 years ago
|
||
It was random.
(In reply to comment #1)
> Given perfect compilers this doesn't matter, but maybe return chars so that if
> this ever *isn't* inlined we won't burn the return-value register? Smart
> compilers wouldn't care, but do we trust ours enough to never not-inline and be
> slightly slower?
I'm now JS_ALWAYS_INLINEing these methods, so this isn't necessary--right?
> >+ jschar * flatChars() const {
>
> Prevailing style these days for this is * immediately adjacent to type.
So, I didn't make this change, and I want to try and explain my thinking, and then maybe we can come up with a better way out of this style morass we somehow got into.
I think the result of this approach is going to be a lot of old-style code living alongside new-style code for the foreseeable future, because many lines of code won't be edited for years. jsregexp.cpp is an example of how things are going to look.
And it sucks. Best possible world, we would have a consistent style everywhere. Failing that, we can at least have consistent style within each file.
TBH I'd like to see the *old* style prevail and jstracer.cpp changed. But whichever style we decide on, a ten-year campaign to change our coding style line by line sounds really annoying!
So, that's why my changes to jsstr.h are old-style. (I reverted the few whitespace changes I had in jsregexp.cpp; that didn't belong in this patch at all.)
Assignee | ||
Updated•15 years ago
|
Attachment #382976 -
Flags: review?(jwalden+bmo)
Comment 7•15 years ago
|
||
I'm in favor of T *v, T (*fp)(...), etc. It's the one true way. The Stroustrup style is tolerable (Mozilla uses it, mostly) but error prone and misleading at the margin. Agree on precedent overwhelming interloping style (which I was party to) in jstracer.cpp. We can revise that more easily.
Let the wiki style guide be updated, unless we need to parlay more.
/be
Comment 8•15 years ago
|
||
Reference types OTOH are strange enough (initialization is very different from assignment) that using T& r = lvalue seems better, with of course exactly one declarator per declaration. Thoughts? I'll take this to the wiki, don't want to hijack this bug.
/be
Comment 9•15 years ago
|
||
Comment on attachment 382976 [details] [diff] [review]
v2 - work in progress
(In reply to comment #6)
> I'm now JS_ALWAYS_INLINEing these methods, so this isn't necessary--right?
Ah, yes, should have thought of that myself, best of all worlds.
>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>+ } else if (start <= JSString::MAX_DEPENDENT_START) {
Mm, these enum names are much better than previously, I especially like the mask<->max_length aliases, much more intuitive at a glance.
>diff --git a/js/src/jsstr.h b/js/src/jsstr.h
>+ friend class ::TraceRecorder;
>+
>+ friend JSAtom *
>+ ::js_AtomizeString(JSContext *cx, JSString *str, uintN flags);
>+
>+ friend JSString * JS_FASTCALL
>+ ::js_ConcatStrings(JSContext *cx, JSString *left, JSString *right);
Lose the :: as noted on IRC, can't remember us using them anywhere else in SpiderMonkey. (I never understood why we use them in XPConnect and DOM-land, since as far as I can tell those are similarly unnecessary.)
>+ /*
>+ * Definitions for flags stored in the high order bits of JSString.length.
>+ * PREFIX and MUTABLE are two aliases for the same value.
>+ * PREFIX should be used only if DEPENDENT is set and
>+ * MUTABLE should be used only if the string is flat.
>+ * ATOMIZED is used only with the flat immutable strings.
This unusual wrapping is weird at first but far more sensible than the normal style, on second glance.
Attachment #382976 -
Flags: review?(jwalden+bmo) → review+
Comment 10•15 years ago
|
||
I still dislike the old whitespace style for T *foo. At this point I'm getting somewhat weary of the argument, tho, and I think I would take rewriting single-line declarations of multiple values where not all are non-pointers (good: T foo, bar, baz; okay: T *foo; bad: T *foo, bar; T *foo, *bar;) into multiple lines and call that enough to get rid of the worst style error T *foo encourages. This is probably an argument to be taken up elsewhere, and it's not relevant to the patch for now, but the discussion is here for the moment, and I lack the motivation to move it elsewhere right now.
Comment 11•15 years ago
|
||
(In reply to comment #9)
> >+ /*
> >+ * Definitions for flags stored in the high order bits of JSString.length.
> >+ * PREFIX and MUTABLE are two aliases for the same value.
> >+ * PREFIX should be used only if DEPENDENT is set and
> >+ * MUTABLE should be used only if the string is flat.
> >+ * ATOMIZED is used only with the flat immutable strings.
>
> This unusual wrapping is weird at first but far more sensible than the normal
> style, on second glance.
What usual style? Wrapping sentences freely is used where you don't have an ordered or unordered list. If you see a list wrapped as if it were prose, file a bug (or just fix it).
Here, though, we have a list. Looks unordered. How about - "bullets"?
/be
Comment 12•15 years ago
|
||
(In reply to comment #10)
> I still dislike the old whitespace style for T *foo.
Why? Likes and dislikes come and go, but the |T *p, oops| argument is objective enough. It's not a trump card -- we could spend significant time trying to switch the bulk of the code to Bjarne style. The trump card is jorendorff's point about being in mixed-style hell for years and KLOCs of code.
/be
Assignee | ||
Comment 13•15 years ago
|
||
Pushed with the redundant ::s removed and a few whitespace changes.
http://hg.mozilla.org/tracemonkey/rev/761e24450f2b
The wiki style guide now describes this style.
https://wiki.mozilla.org/JavaScript:SpiderMonkey:C_Coding_Style
(In reply to comment #11)
> (In reply to comment #9)
> > >+ /*
> > >+ * Definitions for flags stored in the high order bits of JSString.length.
> > >+ * PREFIX and MUTABLE are two aliases for the same value.
> > >+ * PREFIX should be used only if DEPENDENT is set and
> > >+ * MUTABLE should be used only if the string is flat.
> > >+ * ATOMIZED is used only with the flat immutable strings.
> >
> > This unusual wrapping is weird at first but far more sensible than the normal
> > style, on second glance.
>
> What usual style? Wrapping sentences freely is used where you don't have an
> ordered or unordered list. If you see a list wrapped as if it were prose, file
> a bug (or just fix it).
>
> Here, though, we have a list. Looks unordered. How about - "bullets"?
Funny story. In v1 and v2 I preserved the unusual wrapping from the original because I thought this was an unordered list. Rereading it, I find that it isn't. So instead of bullets, I just re-wrapped it like this:
/*
* Definitions for flags stored in the high order bits of mLength.
*
* PREFIX and MUTABLE are two aliases for the same bit. PREFIX should be
* used only if DEPENDENT is set and MUTABLE should be used only if the
* string is flat.
*
* ATOMIZED is used only with flat, immutable strings.
*/
So that's what I checked in.
Thanks, Jeff and Brendan.
Bug 498488, someday soon, I hope.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 14•15 years ago
|
||
(In reply to comment #12)
> (In reply to comment #10)
> > I still dislike the old whitespace style for T *foo.
>
> Why? Likes and dislikes come and go, but the |T *p, oops| argument is objective
> enough.
Because we're declaring a variable |foo| to which the deref operator may (only sometimes!) be applied, not a variable |*foo| -- no more, no less than that. I think that argument's objective enough, too, for what it's worth.
Comment 15•15 years ago
|
||
(In reply to comment #14)
> (In reply to comment #12)
> > (In reply to comment #10)
> > > I still dislike the old whitespace style for T *foo.
> >
> > Why? Likes and dislikes come and go, but the |T *p, oops| argument is objective
> > enough.
>
> Because we're declaring a variable |foo| to which the deref operator may (only
> sometimes!) be applied, not a variable |*foo| -- no more, no less than that.
"not a variable |*foo|" makes no grammatical sense if variables are named by identifiers.
The C style going back to dmr is based on the "use" expression syntax matching the declarator expression syntax. This is detailed at
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
and it is self-consistent, albeit confusing to many when composing declarator modes, because declarator modes must be read "inside out" while the types and qualifiers come in from the left.
> I think that argument's objective enough, too, for what it's worth.
That argument (which does not make sense in terms of the C or C++ grammar) was not made; you only said you still dislike something, not why.
Jorendorff, shouldn't the updates affect
https://wiki.mozilla.org/JavaScript:SpiderMonkey:C%2B%2B_Coding_Style
in addition to the C style guide? My fault for making two, it was an attempt to layer the new on the old and provide continuity without duplication.
/be
Comment 16•15 years ago
|
||
(In reply to comment #15)
> "not a variable |*foo|" makes no grammatical sense if variables are named by
> identifiers.
> That argument (which does not make sense in terms of the C or C++ grammar)
Well, except that we humans don't visualize by the grammar -- we visualize at first pass by grouping and whitespace. I think it sensible to avoid a second pass when possible.
Comment 17•15 years ago
|
||
We're way OT, but "we humans" don't all see things the same, and programmers had better pay attention to the grammar.
Bjarne's C++ style tries to wish the language into Algol 68 shape, by turning a blind eye to the full grammar. That bites back, sometimes. It's not the end of the world, though.
/be
You need to log in
before you can comment on or make changes to this bug.
Description
•