Closed Bug 558723 Opened 14 years ago Closed 13 years ago

clean up JS #includes

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 634839

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

Attachments

(1 file, 1 obsolete file)

SpiderMonkey's #includes have become a bit of a mess, particularly because of the recent introduction of jsfooinlines.h files, which themselves became necessary due to the conversion of various macros into inline functions.

(Nb: This messiness isn't just academic, it has made life difficult for me recently -- when trying to add new inline functions I've had hard-to-decipher warnings about some of them being used-but-undefined, and I'm sick of playing the game of "add a random #include to a file to see if that fixes it", as well as its sibling "try to work out why jsfoo.h gets #included in jsbar.cpp but not in jsbaz.cpp".)

I'll start by enumerating the principles that Brendan wants, as discussed on IRC.  Once we've agreed on those, I'll start fixing the code to satisfy these rules.

- A .h file should be includable by itself, e.g. if you include a.h in a.cpp you shouldn't have to include b.h as well.  This is testable by compiling a C++ module that contains a single #include and nothing else.

- A corollary of this is that any inline function defined in a .h file cannot depend on any names (variables, functions, even macros) defined in another .h file.  Any function with such a dependency must instead be defined in a jsfooinlines.cpp file.

- Includes should be listed at the top of a file.  The exception is for include that must be within a #if/#ifdef;  they should be near the top of the file where possible.

- System includes should be listed first, in alphabetical order.  Then local includes, in alphabetical order.  Then a blank line and any jsfooinlines.h includes, in alphabetical order.  For example:

  #include <a.h>
  #include <q.h>
  #include "jsp.h"
  #include "jsx.h"

  #include "jspinlines.h"
  #include "jsxinlines.h"


Does that sound right?

Something else I've wondered... some #includes have this comment:

  #include "jshash.h" /* Added by JSIFY */

Is that significant?  Judging from jsify.pl these comments were added almost 12 years ago.  Can they be removed?
I guess another corollary of the first principle is that circular dependencies between .h files are not allowed.
Thanks for this bug -- it's a real issue, for sure.

> - A .h file should be includable by itself, e.g. if you include a.h in a.cpp
> you shouldn't have to include b.h as well.  This is testable by compiling a C++
> module that contains a single #include and nothing else.

This is sound. It should be clear that if the x.cpp that includes a.h also has need of interface declarations from b.h, then x.cpp should include b.h directly. No bootlegging via internal/implementation/interface-abstracted .h file nesting.

> - A corollary of this is that any inline function defined in a .h file cannot
> depend on any names (variables, functions, even macros) defined in another .h
> file.

This is slightly too strong. If b.h includes a.h because b.h defines a class subtyped from a class (struct, whatever) in a.h, or has other sizeof/offsetof dependencies that require b.h to include a.h, then inline methods/functions in b.h may depend on a.h contents.

> Any function with such a dependency must instead be defined in a
> jsfooinlines.cpp file.

This is important when there are many, and non-interface (non-signature-type) dependencies, for implementing the inline function/method bodies. It's always clear without inline that implementation != interface, since .cpp != .h. But inline introduces a new kind of .h, wherefore our jsfooinlines.h convention.

> - System includes should be listed first, in alphabetical order.  Then local
> includes, in alphabetical order.  Then a blank line and any jsfooinlines.h
> includes, in alphabetical order.  For example:
> 
>   #include <a.h>
>   #include <q.h>
>   #include "jsp.h"
>   #include "jsx.h"
> 
>   #include "jspinlines.h"
>   #include "jsxinlines.h"
> 
> 
> Does that sound right?

Right, but we have historical NSPR-forked jsprf.h, jsutil.h, jsarena.h, jsbit.h and other files segregated without a blank line, just by being grouped ahead of the "JS not NSPR" .h files. We should probably lose this confusing bit of history and sort all the local non-inline includes.

> Something else I've wondered... some #includes have this comment:
> 
>   #include "jshash.h" /* Added by JSIFY */
> 
> Is that significant?  Judging from jsify.pl these comments were added almost 12
> years ago.  Can they be removed?

Yes, please. That kind of noise mattered for about a month, then became ugly deadwood. It's due to that NSPR forking of files into js/src to make it stand alone. From 1998.

/be
(In reply to comment #2)
> This is important when there are many, and non-interface (non-signature-type)
> dependencies, for implementing the inline function/method bodies. It's always
> clear without inline that implementation != interface, since .cpp != .h.

Of course, it's not "always clear" even excluding inline methods, since C++ and C seem to require types to be visible to declare members, and members (esp. private ones in C++) are implementation-not-interface.

What saves us from over-including due to member types, often, is the member being a pointer-to-T, and the ability to declare opaque pointer types in C via struct tags and struct tag typedefs, hence jsprvtd.h and jspubtd.h; in C++ via class and struct forward declarations (alas nothing for enum).

But implementation chocolate gets into interface peanut butter even without inline -- the primal sin is #include, the pre-processor.

/be
(In reply to comment #2)
> 
> This is sound. It should be clear that if the x.cpp that includes a.h also has
> need of interface declarations from b.h, then x.cpp should include b.h
> directly. No bootlegging via internal/implementation/interface-abstracted .h
> file nesting.

Yes, but it's easy to bootleg unintentionally.  But it's a good aim.


> > - A corollary of this is that any inline function defined in a .h file cannot
> > depend on any names (variables, functions, even macros) defined in another .h
> > file.
> 
> This is slightly too strong. If b.h includes a.h because b.h defines a class
> subtyped from a class (struct, whatever) in a.h, or has other sizeof/offsetof
> dependencies that require b.h to include a.h, then inline methods/functions in
> b.h may depend on a.h contents.

Hmm, actually, if b.h include a.h then it should be ok for any inline function body in b.h to refer to any identifier from a.h as well, right?  That preserves the first principle.


> > - System includes should be listed first, in alphabetical order.  Then local
> > includes, in alphabetical order.  Then a blank line and any jsfooinlines.h
> > includes, in alphabetical order.  For example:
> > 
> >   #include <a.h>
> >   #include <q.h>
> >   #include "jsp.h"
> >   #include "jsx.h"
> > 
> >   #include "jspinlines.h"
> >   #include "jsxinlines.h"
> 
> Right, but we have historical NSPR-forked jsprf.h, jsutil.h, jsarena.h, jsbit.h
> and other files segregated without a blank line, just by being grouped ahead of
> the "JS not NSPR" .h files. We should probably lose this confusing bit of
> history and sort all the local non-inline includes.

That's what I figured, from what you said on IRC.

 
> >   #include "jshash.h" /* Added by JSIFY */
> > 
> > Can they be removed?
> 
> Yes, please. That kind of noise mattered for about a month, then became ugly
> deadwood. It's due to that NSPR forking of files into js/src to make it stand
> alone. From 1998.

Can jsify.pl be removed too?
I did the "compile with a single #include" test on js/src/*.h, the following list shows which ones (34 out of 66) had problems.  Note that not all of these necessarily need fixing, as fixing one broken header may cause others to work again.

jsapi.h             -
jsarena.h           -
jsarray.h           bad
jsatom.h            -
jsatominlines.h     bad
jsbit.h             -
jsbool.h            bad
jsbuiltins.h        -
jsclist.h           -
jscntxt.h           bad
jscntxtinlines.h    bad
jscompat.h          -
jscpucfg.h          -
jsdate.h            bad
jsdbgapi.h          -
jsdhash.h           -
jsdtoa.h            -
jsdtracef.h         bad (?)
jsemit.h            bad
jsexn.h             bad
jsfun.h             bad
jsgc.h              bad
jshash.h            -
jshashtable.h       -
jsinterp.h          bad
jsinttypes.h        -
jsiter.h            bad
jslibmath.h         -
jslock.h            -
jslong.h            -
jsmath.h            bad
jsnum.h             bad
jsobj.h             bad
jsobjinlines.h      bad
json.h              bad
jsopcode.h          -
jsotypes.h          bad
jsparse.h           bad
jsprf.h             -
jspropertycache.h   -
jspropertycacheinlines.h    bad
jspropertytree.h    -
jsprvtd.h           -
jspubtd.h           -
jsregexp.h          bad
jsscan.h            bad
jsscope.h           bad
jsscopeinlines.h    bad
jsscript.h          -
jsscriptinlines.h   bad
jsstaticcheck.h     -
jsstdint.h          -
jsstr.h             bad
jsstrinlines.h      bad
jstask.h            bad
jstl.h              -
jstracer.h          bad
jstypedarray.h      bad
jstypes.h           -
jsutil.h            bad
jsvector.h          bad
jsversion.h         -
jsxdrapi.h          -
jsxml.h             bad
prmjtime.h          -
resource.h          -
BTW, js-confdefs.h is auto-included in by the build system for all compiler invocations, so I think it's safe for a header to implicitly depend on it.
It also pays to use the same compile flags as a normal build, when doing an include-a-single-header-file build.
Why should a user be required to #include both "jsfoo.h" and jsfooinlines.h"?  It seems to me that jsfoo.h should just #include jsfooinlines.h itself; that way, one can think of "jsfoo.h" as "what I need to use foos".  Whether foos use inline functions is an implementation detail.
(In reply to comment #8)
> Why should a user be required to #include both "jsfoo.h" and jsfooinlines.h"? 
> It seems to me that jsfoo.h should just #include jsfooinlines.h itself; that
> way, one can think of "jsfoo.h" as "what I need to use foos".  Whether foos use
> inline functions is an implementation detail.

The problem is that the implementation detail costs everyone including jsfoo.h, even if most do not invoke the inline(s) that drag in the kitchen sink.

We do not want over-including just for an inlining optimization. Over-including breaks abstraction too:

* It makes stuff visible you shouldn't see (leading to more bootlegging by accident or on purpose).

* It makes stuff visible that may collide with local code in the including .cpp that did not expect the implementation-not-interface namespace pollution. This is mostly about macros, yeah.

* It can slow down builds -- we've seen this in spades in Firefox, when someone put a #include "jsnum.h" in "nsContentUtils.h".

/be
Oh, and perhaps the best reason not to force implementation includes on all interface includers just because some inline methods need the implementation .h files:

* It will create cycles in the current codebase. This could be fixed by splitting of headers into pieces that can be topologically sorted. But that sometimes is make-work if the only cause is inline-ing.

The jsfooinlines.h compromise seemed better to us at the time we started doing it for all the reasons given. Maybe it's not, but even if we come up with a minimal, topo-sorted dependency graph, implementation invading interface due to the damn pre-processor seems a poor reason for all the make-work.

The valgrind no-nesting approach (I recall it from SVR-old AT&T Unixes) goes to the other extreme. jsfooinlines.h is in the middle, and at least keeps impl. bits (as opposed to interface) pulled together only in .cpp files, not in .h files.

Bug agree that it's a compromise, so unsatisfying. In particular, forgetting to include the right jsfooinlines.h means you get a fairly obscure error message.

/be
jimb, inlines often use multiple header files, so including jsobjinlines.h in jsobj.h is pointless. We could have put the stuff in jsobj.h directly if that worked.
(In reply to comment #11)
> jimb, inlines often use multiple header files, so including jsobjinlines.h in
> jsobj.h is pointless. We could have put the stuff in jsobj.h directly if that
> worked.

Right, I understand that now.  I didn't realize the distinction was necessary for correctness; I had thought it was for organizational purposes.
(In reply to comment #2)
> 
> It should be clear that if the x.cpp that includes a.h also has
> need of interface declarations from b.h, then x.cpp should include b.h
> directly. No bootlegging via internal/implementation/interface-abstracted .h
> file nesting.

I'm most of the way through a patch, and it's clear to me that this is a nice aim, but almost impossible to attain in practice.  How do you verify you aren't bootlegging in x.cpp?  You'd have to look at every single thing imported by x.cpp, work out which .h file declares it, and make sure you've included that .h file directly.  The transitivity of #include really sucks.

I say "almost impossible" because there is a way to attain it -- use the Valgrind-style headers-don't-include-headers approach.  That way bootlegging is impossible.  More generally, with that approach the compiler will enforce your principles for you.  The downside is that it can be verbose;  it's really annoying if you want to include jscntxt.h and you have to include eight other .h files first.  (This problem has never been so bad in Valgrind, maybe because Valgrind is relatively small, or maybe because the module system was designed under these constraints from the start.)
  
> If b.h includes a.h because b.h defines a class
> subtyped from a class (struct, whatever) in a.h, or has other sizeof/offsetof
> dependencies that require b.h to include a.h, then inline methods/functions in
> b.h may depend on a.h contents.

Again, it's a nice idea but really hard to enforce.

If bootlegging is so difficult to avoid, is it worth outlawing it?  The only real downside I can see is that it can be difficult to know what is being included, but that'll always be true unless we use the Valgrind-style approach.
Bootlegging is the least evil here. Anti-modularity of the valgrind kind, where dependents have to know about a dependency's dependencies, I call worse :-P.

/be
(In reply to comment #14)
> Bootlegging is the least evil here. Anti-modularity of the valgrind kind, where
> dependents have to know about a dependency's dependencies, I call worse :-P.

Sure, there's pros and cons to both approaches and I'm not suggesting we change SpiderMonkey to use Valgrind's approach.  But we shouldn't fool ourselves into thinking that we'll avoid bootlegging, esp. since the typical programmer's approach is to randomly add #include statements until the thing compiles again :)
Yeah, I never thought laws would prevent crime either ;-).

Srsly, the only way to stamp out bootlegging is automation. At SGI many years ago I dreamt of a static analysis framework that would use-count every definition and declaration in a .h file, and identify bootleggers. Sixgill and possibly even our GCC-based static analysis plugins may be up to this task. Low priority!

/be
(In reply to comment #16)
> Srsly, the only way to stamp out bootlegging is automation. At SGI many years
> ago I dreamt of a static analysis framework that would use-count every
> definition and declaration in a .h file, and identify bootleggers.

Also identify useless direct and nested includes, of course.

/be
I've hit one roadblock in the quest to alphabeticize include statements.  On Windows, jsstdint.h must be included before nanojit/nanojit.h, because both files try to define the stdint types, because jsstdint.h #defines _STDINT_H which suppresses nanojit/nanojit.h's attempt.  Hmm.
I mentioned this on irc but forgot to repeat it. There's often one weird rule that makes no sense. Long ago, for Windows 3.1, we had a "jsstddef.h" you *had* to include first (even before system headers).

Let's make jsstdint.h the one weird thing. It'll be ok if we keep it to one.

/be
(In reply to comment #18)
> I've hit one roadblock in the quest to alphabeticize include statements.  On
> Windows, jsstdint.h must be included before nanojit/nanojit.h, because both
> files try to define the stdint types, because jsstdint.h #defines _STDINT_H
> which suppresses nanojit/nanojit.h's attempt.  Hmm.

Yeah, the problem there is the asymmetry: VMPI.h checks for jsstdint.h's definitions, but not vice versa; and jsstdint.h defines _STDINT_H, but VMPI.h doesn't.
How would this do, to decouple jsstdint.h and VMPI.h?  It needs approval for nanojit, too; who should I r? for that?
Attachment #438857 - Flags: review?(nnethercote)
(In reply to comment #21)
> 
> How would this do, to decouple jsstdint.h and VMPI.h?  It needs approval for
> nanojit, too; who should I r? for that?

Not sure if that's ok -- Nanojit is used by Tamarin as well.  I don't know how significant _STDINT_H, eg. if it's generated by configure or something.  And putting MOZ_* things into Nanojit is a bit weird.
jsstdint.h must come first as an extra rule is not terrible, if you can't get Nanojit buy-in (using a non-MOZ_ macro name might help there :-P).

/be
Bug 559132 might lead to the relaxation of the jsstdint.h-first rule, but for this bug I can live with it.
(In reply to comment #22)
> Not sure if that's ok -- Nanojit is used by Tamarin as well.  I don't know how
> significant _STDINT_H, eg. if it's generated by configure or something.  And
> putting MOZ_* things into Nanojit is a bit weird.

Right, I was trying to accomodate Tamarin.  _STDINT_H was chosen arbitrarily by Graydon in bug 519856.  It just happens to be the preprocessor symbol #defined by <stdint.h> on Linux; it would never be #defined if AVMPLUS_WIN32 is.  So its use in Tamarin is already an accomodation of SM.

The idea here would simply be to stop using a symbol outside our namespace.  I've updated the patch to use NANOJIT_STDINT_TYPES_DEFINED instead.
Second cut at patch: use NANOJIT_STDINT_TYPES_DEFINED as symbol name.
Attachment #439284 - Flags: review?(nnethercote)
Attachment #438857 - Attachment is obsolete: true
Attachment #438857 - Flags: review?(nnethercote)
(In reply to comment #26)
> Created an attachment (id=439284) [details]
> Let js/src/jsstdint.h and js/src/nanojit/VMPI.h communicate about <stdint.h>
> emulation.
> 
> Second cut at patch: use NANOJIT_STDINT_TYPES_DEFINED as symbol name.

This allows arbitrary ordering, but I don't think that allowing TM or NJ (whoever gets there first) to define the stdint types is the right way to do it.  I mentioned 559132, I think it should fix this by requiring either NJ or TM/TR to define them, but not both.
Attachment #439284 - Flags: review?(nnethercote) → review-
(In reply to comment #27)
> This allows arbitrary ordering, but I don't think that allowing TM or NJ
> (whoever gets there first) to define the stdint types is the right way to do
> it.  I mentioned 559132, I think it should fix this by requiring either NJ or
> TM/TR to define them, but not both.

I'm not getting how this would work.  If NJ needs them, it is a separable component, so it must define them.  Thus TM should depend on NJ's definitions. Is that what you have in mind?
NJ is designed to be embedded in a bigger program, the "embedder".  In practice, TM and TR are the only two embedders.  NJ provides a bunch of declarations/definitions for the embedder to use.  NJ is not totally stand-alone, however, and so requires the embedder to provide a few declarations/definitions.  (See nanojit/avmplus.cpp, for example, which despite its name, is not a core part of nanojit -- TM uses it, but TR doesn't.)  The interface between NJ and the embedder isn't very clear, hence bug 559132.

That's why TM (the embedder) could provide the definitions of these stdint types for NJ's use.  In this case that is probably best, as TM needs to define those types for many places where NJ isn't used, eg. in the interpreter.  Make sense?
So NJ will not define them, but require its clients to define them somehow?  "Don't include NJ headers until after you have defined the <stdint.h> types, using whatever shenanigans you like"?
(In reply to comment #30)
> So NJ will not define them, but require its clients to define them somehow? 
> "Don't include NJ headers until after you have defined the <stdint.h> types,
> using whatever shenanigans you like"?

Something like that, I guess, it's all rather unclear at the moment, hence bug 559132.  But having whichever gets there first (NJ or the embedder) declare them is not good.
(In reply to comment #16)
> 
> Srsly, the only way to stamp out bootlegging is automation. At SGI many years
> ago I dreamt of a static analysis framework that would use-count every
> definition and declaration in a .h file, and identify bootleggers. Sixgill and
> possibly even our GCC-based static analysis plugins may be up to this task. Low
> priority!

Bug 634839 may be the panacea here!
I will dup this forward to bug 634839;  if this stuff is ever going to be improved, an automated tool is the right way to do it.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: