Closed Bug 503424 Opened 15 years ago Closed 15 years ago

TM/nanojit: add built-in support for compiled-trace and -guard profiling

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

status1.9.2

---

beta1-fixed

People

(Reporter: n.nethercote, Assigned: jseward)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 12 obsolete files)

patch which does profiling, as outlined above 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
updated version of patch, which also profiles frag exits 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
cleaned up & debugged version of previous patch 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
fixes critical bug in all previous versions of the patch 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
rebase previous patch to 31101:bb6086ec04b8 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
rebase previous patch to 31574:12a9bea2d331 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
fix an assertion failure in previous incarnation 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
rebased following deletion of Fragmento 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
Rebase des Tages 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
patch v9 (rebase only) 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
rebased; w/ correct code size measurements, & addresses review comments 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
updated to address all remaining review issues 15 years ago Julian Seward [:jseward] (deleted), patch		Details \| Diff \| Splinter Review
rebased; made thread-safe; conform to house style; minor tidying 15 years ago Julian Seward [:jseward] (deleted), patch	graydon : review+	Details \| Diff \| Splinter Review

Nicholas Nethercote [inactive]

Reporter

Description

•

15 years ago

If we want to improve nanojit's codegen, it makes sense to focus on the areas where it will make a difference.  So it would be useful to know which are the hot traces.  I don't think we have that facility at the moment.

This bug is a proposal to add a facility whereby each compiled trace has a counter that is incremented each time it is executed.  Than at termination it would print out a list (and the code) of the hottest blocks (top 10, top 100, whatever), indicating what proportion of trace executions they account for.  Then we can look at improving the codegen for those hot blocks.  Valgrind has a facility like this and it has proven very useful.

David Mandelin [:dmandelin]

Comment 1

•

15 years ago

Me want. It would also be nice to record at least the total time taken (and thus the average) by each trace.

Julian Seward [:jseward]

Assignee

Comment 2

•

15 years ago

+1 for the block counting.  The timing stuff (I presume you mean rdtsc
adding-up) could be useful too; but it could confuse the issue if we
get tied up in delays due to cache misses, which are independent of
the code generation quality, mostly.

Robert Sayre

Comment 3

•

15 years ago

Looks like the right thing to inspect.

Nicholas Nethercote [inactive]

Reporter

Comment 4

•

15 years ago

Some preliminary design notes:

- We need to add a counter for every trace.

- We need a way of incrementing the counter each time a trace is executed.  This code can be added (conditionally, of course) in Assembler::gen() at the end of the loop.  It's better to increment at the start of a trace rather than the end so that the trace is still counted even if there's an early exit.

- We need a way of iterating through all the traces at the end, to find the hot ones.

- We need a way of printing out LIR and asm for the hot fragments at the end.  So these cannot be disposed of along the way.  Or perhaps the asm can be disposed of -- if the LIR is kept we could just re-assemble it in order to print out the native code.

Because the LIR and asm code pages tend to get destroyed as we go (I think;  I don't really understand the code memory handling very well) it seems best to store all the necessary stuff in a separate table.  For each trace, it might contain:

- the counter
- a copy of the LIR
- a copy of the asm (unless reassembling at the end is easy)
- some info indicating where in the original JavaScript it came from, if possible

Making copies of all that code will be slow, but this doesn't have to be fast, and then we wouldn't have to worry about holding onto LIR/asm code that we would normally dispose of (which would uglify our normal code paths).

David Mandelin [:dmandelin]

Comment 5

•

15 years ago

(In reply to comment #4)
> Some preliminary design notes:
> 
> - We need to add a counter for every trace.
> 
> - We need a way of incrementing the counter each time a trace is executed. 
> This code can be added (conditionally, of course) in Assembler::gen() at the
> end of the loop.  It's better to increment at the start of a trace rather than
> the end so that the trace is still counted even if there's an early exit.

I had assumed the generation of the counter would be in jstracer.cpp, but I think your proposal is better as it will then be available to all nanojit users.

> - We need a way of iterating through all the traces at the end, to find the hot
> ones.
> 
> - We need a way of printing out LIR and asm for the hot fragments at the end. 
> So these cannot be disposed of along the way.  Or perhaps the asm can be
> disposed of -- if the LIR is kept we could just re-assemble it in order to
> print out the native code.

I often don't need to look at the LIR/asm--JS source location info is often enough. What I have sometimes wanted and never had is description of the trace in terms of the JS source lines it goes through. (I see that you already mention that below, too. I believe it is fairly easy because the tracer knows what JS source position it is at when it records each bytecode.) But I'm sure the LIR/asm will be useful sometimes as well.

> Because the LIR and asm code pages tend to get destroyed as we go (I think;  I
> don't really understand the code memory handling very well) it seems best to
> store all the necessary stuff in a separate table.  For each trace, it might
> contain:
> 
> - the counter
> - a copy of the LIR
> - a copy of the asm (unless reassembling at the end is easy)
> - some info indicating where in the original JavaScript it came from, if
> possible
> 
> Making copies of all that code will be slow, but this doesn't have to be fast,
> and then we wouldn't have to worry about holding onto LIR/asm code that we
> would normally dispose of (which would uglify our normal code paths).

I think the copying is entirely reasonable. If it does have an excessive perf cost, which I doubt it will, it can be optional.

Nicholas Nethercote [inactive]

Reporter

Comment 6

•

15 years ago

(In reply to comment #5)
> I often don't need to look at the LIR/asm--JS source location info is often
> enough.

I assume you're mostly looking at tracing things -- for those working on NJ code generation, the LIR/asm is very important :)

Robert Sayre

Updated

•

15 years ago

Blocks: 504587

Julian Seward [:jseward]

Assignee

Comment 7

•

15 years ago

(In reply to comment #4)
Keeping a copy of the LIR and asm around sounds big and complex.
Maybe it's not necessary: all we need to store is an integer key
which uniquely identifies the fragment.  When profiling, print
the code etc out in the normal TMFLAGS=blah way, along with the
key.  When the profiled code is running, inc counters in the usual
way.  When profiled run finished, just print the (key,counter)
pairs.  We can then look in the logfile to find the blocks.  Not
so convenient, but easier to implement.

Julian Seward [:jseward]

Assignee

Comment 8

•

15 years ago

(In reply to comment #7)
Am working on a patch to do this.  (commenting here so as to check
i'm not duplicating work w/ anyone else -- yell if I am).

Nicholas Nethercote [inactive]

Reporter

Comment 9

•

15 years ago

(In reply to comment #8)
>
> Am working on a patch to do this.  (commenting here so as to check
> i'm not duplicating work w/ anyone else -- yell if I am).

That's what the "assigned" field is for :)

Assignee: general → jseward

Julian Seward [:jseward]

Assignee

Comment 10

•

15 years ago

Attached patch patch which does profiling, as outlined above (obsolete) (deleted) — Details — Splinter Review

Julian Seward [:jseward]

Assignee

Comment 11

•

15 years ago

(In reply to comment #10)
(description):

Patch adds a simple profiler, which counts, for each fragment,
every time the fragment passes the loopEntry point (gal told
me this is the right place to put the counter increment; I have
no idea).

At the end of the run, the top 50 blocks are shown, one line for
each block, including the percentage of total block runs, and
the cumulative statistics too.  This makes it very easy to see
which the hot blocks are.

This of course only works for debug builds.  It is integrated
with and controlled by the existing TMFLAGS= mechanism.  To get
profiling, you must do TMFLAGS=fragprofile.

Each printed cost line has a "FragID=XXXXXX" part, where XXXXXX
is a 6-digit number.  This makes it possible to find the fragment
in the normal debug printing, because those FragID numbers are
printed there too.

Hence the minimal useful invokation is TMFLAGS=fragprofile,minimal ...

Code from both tracer and regexps is profiled.

Only x86 is supported in this patch, although other archs should be
trivial.  It's only a matter of implementing Assembler::asm_inc_profctr
to generate the load-inc-store insns.

The implementation is mostly trivial.  class Fragment gets two new
fields, profHash (to identify the block) and profCounts, to hold the
count.  The only real difficulty is to extract those fields at the
time the Fragment is destroyed.  This is difficult because making 
Fragment::~Fragment virtual doesn't work, since there are some Fragments
which are really VMFragments, but others within nanojit which are
vanilla Fragments, and hence whose destruction is not observable
outside nanojit.  My least-worst solution (in the patch) is to 
have, in Fragmento, an optional finalizer callback used when destroying a 
Fragment.  The profile machinery in jstracer.cpp uses this to harvest
counts any time any Fragment is destroyed.

Example output for trace-tests.js follows.

----------------- Per-fragment execution counts ------------------

Total count = 153260112

  0:     (32.63%  50004998)     32.63%  50004998     FragID=000893
  1:     (39.15%  60004996)      6.52%   9999998     FragID=005744
  2:     (45.68%  70004994)      6.52%   9999998     FragID=005746
  3:     (50.25%  77012463)      4.57%   7007469     FragID=007873
  4:     (54.77%  83933739)      4.52%   6921276     FragID=007779
  5:     (59.28%  90855015)      4.52%   6921276     FragID=007795
  6:     (63.80%  97776291)      4.52%   6921276     FragID=007836
  7:     (68.31% 104697567)      4.52%   6921276     FragID=007847
  8:     (72.83% 111618843)      4.52%   6921276     FragID=007859
  9:     (74.14% 113631756)      1.31%   2012913     FragID=007747
 10:     (75.46% 115644552)      1.31%   2012796     FragID=007814
 11:     (76.77% 117657286)      1.31%   2012734     FragID=007816
 12:     (78.08% 119660304)      1.31%   2003018     FragID=007737
 13:     (79.38% 121663210)      1.31%   2002906     FragID=007808
 14:     (80.61% 123536928)      1.22%   1873718     FragID=007727
 15:     (81.83% 125410583)      1.22%   1873655     FragID=007804
 16:     (82.95% 127124279)      1.12%   1713696     FragID=007731
 17:     (84.06% 128837912)      1.12%   1713633     FragID=007806
 18:     (85.15% 130493991)      1.08%   1656079     FragID=007821
 19:     (86.23% 132150062)      1.08%   1656071     FragID=007823
 20:     (87.31% 133805849)      1.08%   1655787     FragID=007757
 21:     (88.39% 135461605)      1.08%   1655756     FragID=007759
 22:     (89.33% 136901602)      0.94%   1439997     FragID=007724
 23:     (90.27% 138341599)      0.94%   1439997     FragID=007768
 24:     (91.21% 139781596)      0.94%   1439997     FragID=007784
 25:     (92.15% 141221593)      0.94%   1439997     FragID=007801
 26:     (93.08% 142661590)      0.94%   1439997     FragID=007828
 27:     (94.02% 144101587)      0.94%   1439997     FragID=007839
 28:     (94.35% 144601585)      0.33%    499998     FragID=001431
 29:     (94.59% 144961583)      0.23%    359998     FragID=007771
 30:     (94.82% 145321581)      0.23%    359998     FragID=007787
 31:     (95.06% 145681579)      0.23%    359998     FragID=007831
 32:     (95.29% 146041577)      0.23%    359998     FragID=007842
 33:     (95.52% 146401575)      0.23%    359998     FragID=007850
 34:     (95.76% 146761573)      0.23%    359998     FragID=007864
 35:     (95.99% 147121570)      0.23%    359997     FragID=007830
 36:     (96.23% 147481567)      0.23%    359997     FragID=007841
 37:     (96.44% 147805522)      0.21%    323955     FragID=007775
 38:     (96.65% 148129477)      0.21%    323955     FragID=007791
 39:     (96.86% 148453432)      0.21%    323955     FragID=007833
 40:     (97.08% 148777387)      0.21%    323955     FragID=007844
 41:     (97.29% 149101342)      0.21%    323955     FragID=007855
 42:     (97.50% 149425297)      0.21%    323955     FragID=007869
 43:     (97.71% 149749250)      0.21%    323953     FragID=007835
 44:     (97.92% 150073203)      0.21%    323953     FragID=007846
 45:     (98.13% 150397147)      0.21%    323944     FragID=007778
 46:     (98.34% 150721091)      0.21%    323944     FragID=007794
 47:     (98.55% 151045035)      0.21%    323944     FragID=007858
 48:     (98.77% 151368939)      0.21%    323904     FragID=007872
 49:     (98.88% 151540897)      0.11%    171958     FragID=007803

Julian Seward [:jseward]

Assignee

Updated

•

15 years ago

Attachment #389800 - Flags: review?(gal)

Nicholas Nethercote [inactive]

Reporter

Comment 12

•

15 years ago

Nice!  This will be extremely useful.  

It gives equal weight to every fragment... that's probably good enough to start with.

Some nits about the output:

>   0:     (32.63%  50004998)     32.63%  50004998     FragID=000893
>   1:     (39.15%  60004996)      6.52%   9999998     FragID=005744
>   2:     (45.68%  70004994)      6.52%   9999998     FragID=005746
>   3:     (50.25%  77012463)      4.57%   7007469     FragID=007873

I find it easier to read with the timing columns swapped, ie. self-time before cumulative time.

Putting a header line giving each column a title would be good.  (Then you wouldn't need "FragID=" on every line.)  

For the FragID, is the number in exactly the same form as that printed for the "minimal" output?  I see stuff like this:

  Recording starting from trace-test.js:299@1610

which doesn't look quite the same as a 6-digit FragId.  If it's in exactly the same form, it makes it much easier for Vim users to bounce between the table and the actual fragment (in Vim you can use '#' and '*' to do forward/backward searches for the exact word that the cursor is currently on).

Also, you put "gal" for review rather than "gal@uci.edu"... not sure if that'll work?

Andreas Gal :gal

Comment 13

•

15 years ago

Yeah, gal@uci.edu is my bugzilla account for historic reasons. :gal is the short-hand.

Random style nit: func(arg, arg), not func( arg, arg ). Also not assert ( this ).

Could we do the ranking with a script, and just output raw data? Or maybe write it to a file? The stats gathering and ranking code looks a bit out of place in jstracer.cpp (IMO).

I don't particularly like the entanglement of tracer logging/verbose bits and nanojit. Can we disentangle this a bit? I wonder if we can just use regular lir to model the counters. It would minimally affect register allocation, but probably not noticably. If we are really worried about that we could add LIR_XINC or something like that.

The finalizer hook is also a bit ugly. I wonder whether its necessary.

In general, I think this functionality should be in nanojit. That would solve the entanglement problem and instead of the finalizer the code can be ifd-defed straight into the destructor.

Julian Seward [:jseward]

Assignee

Comment 14

•

15 years ago

Attached patch updated version of patch, which also profiles frag exits (obsolete) (deleted) — Details — Splinter Review

Major update to patch.  This not only profiles fragments but also
exits, making it possible to see exactly where we're leaving trace.

Also, for each fragment, it now shows the static exit count and the
static instruction byte count, so as to give a bit of a handle on
generated code bloat.

Is WIP.  Do not review.  Works well enough to run v8-richards, but
asserts (for debug builds) on trace-tests.  Further debugging needed.

Attachment #389800 - Attachment is obsolete: true

Attachment #389800 - Flags: review?(gal)

Julian Seward [:jseward]

Assignee

Comment 15

•

15 years ago

Attached patch cleaned up & debugged version of previous patch (obsolete) (deleted) — Details — Splinter Review

Cleaned up version of the patch in comment #14.  Suitable for
general use/testing.  Survives trace-test.js.  Changes:

* don't assert when Fragments are recycled; handle correctly
* also profile exits created by jsregexp (untested)
* fix space leaks in result-printing code
* minor readability changes in output printing

Attachment #392628 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 16

•

15 years ago

Here's an example of what the output now looks like, for perfect.js.
Recommended usage: TMFLAGS=fragprofile,minimal,recorder

First there's a flat profile with one line per frag, containing start
counts for the frag, plus cumulative info, number of static exits and
static instructions for the frag, and the FragID so you can find it
easily in the recorder (or any other phase) output.

Then follows the per-exit count.  They're self explanatory.  Each
exit is labelled with a GuardID, which is printed as part of LIR,
so again the precise exits are findable in the debug output.

This is a pretty boring example.  v8-richards or v8-crypto produce
more interesting results.

----------------- Per-fragment execution counts ------------------

Total count = 2686

  0:     (72.23%   1940)   72.23%   1940   (se  10, sib  405)    FragID=000005
  1:     (90.80%   2439)   18.58%   499    (se   8, sib  322)    FragID=000001
  2:     (100.00%  2686)    9.20%   247    (se  10, sib  405)    FragID=000003
  3:     (100.00%  2686)    0.00%   0      (se   0, sib    0)    FragID=000002
  4:     (100.00%  2686)    0.00%   0      (se   0, sib    0)    FragID=000004

Total static instruction bytes (sib) = 1132, total static exits (se) = 28

Analysis by exit counts

FragID=000005, total count 1940:
   GuardID=012        248 (12.78%)
   Looped (013)      1692 (87.22%)

FragID=000001, total count 499:
   GuardID=009          1 ( 0.20%)
   Looped (010)       498 (99.80%)

FragID=000003, total count 247:
   GuardID=012          1 ( 0.40%)
   Looped (013)       246 (99.60%)

FragID=000002, total count 0:
   Looped (000)         0 ( 0.00%)

FragID=000004, total count 0:
   Looped (008)         0 ( 0.00%)

Julian Seward [:jseward]

Assignee

Updated

•

15 years ago

Summary: TM/nanojit: add counters that count how many times each compiled trace is executed → TM/nanojit: add built-in support for compiled-trace and -guard profiling

David Mandelin [:dmandelin]

Comment 17

•

15 years ago

>----------------- Per-fragment execution counts ------------------
>
>Total count = 2686
>
>  0:     (72.23%   1940)   72.23%   1940   (se  10, sib  405)    FragID=000005
>  1:     (90.80%   2439)   18.58%   499    (se   8, sib  322)    FragID=000001

May I suggest removing the text labels from the data rows and using column headers instead (the spacing is still bad in this version):

  Exec-Count        Cum-Exec-Count  Static-Exits     Bytes    Frag-ID
  (72.23%   1940)   72.23%   1940       10           405      000005
  (90.80%   2439)   18.58%   499         8           322      000001

IMO we currently have a problem with hard-to-read, hard-to-machine-parse debugging output. Your example is better than most of what we have but I want it better yet. :-) I would also submit that the exit count stats are not in fact self-explanatory for someone who doesn't already kind of know what you are doing here.

Julian Seward [:jseward]

Assignee

Comment 18

•

15 years ago

Attached patch fixes critical bug in all previous versions of the patch (obsolete) (deleted) — Details — Splinter Review

All previous versions of patch under certain circumstances fail to harvest
profile info from Fragments that are delete'd, giving nonsense profile data
for some programs (not all).  Found whilst chasing small testcases related to
v8-richards.

This version fixes it.  Also incorporates output formatting suggestions per
dmandelin in comment #16.

Attachment #392693 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 19

•

15 years ago

Attached patch rebase previous patch to 31101:bb6086ec04b8 (obsolete) (deleted) — Details — Splinter Review

Rebased to TM tip of about four hours back.  Expect further
breakage in days to come, given that Fragmento is scheduled
for demolition.

Attachment #392922 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 20

•

15 years ago

Attached patch rebase previous patch to 31574:12a9bea2d331 (obsolete) (deleted) — Details — Splinter Review

Attachment #394128 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 21

•

15 years ago

Attached patch fix an assertion failure in previous incarnation (obsolete) (deleted) — Details — Splinter Review

.. and add an extra assertion checking consistency of counts.

Attachment #394847 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 22

•

15 years ago

Attached patch rebased following deletion of Fragmento (obsolete) (deleted) — Details — Splinter Review

Attachment #394943 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Updated

•

15 years ago

Depends on: 512744

Julian Seward [:jseward]

Assignee

Comment 23

•

15 years ago

Attached patch Rebase des Tages (obsolete) (deleted) — Details — Splinter Review

Attachment #396627 - Attachment is obsolete: true

Graydon Hoare :graydon

Comment 24

•

15 years ago

(In reply to comment #13)

> Could we do the ranking with a script, and just output raw data? Or maybe write
> it to a file? The stats gathering and ranking code looks a bit out of place in
> jstracer.cpp (IMO).

I don't think this case is really doing *so* much formatting (basically collect, sort, and print the top 50) that it warrants depending on an external processing script (that itself is likely to bitrot). If we were doing histograms and graphs, sure; but this is just a tabular listing. We already have plenty of stat-tabulating code over in js/src: gc stats, scope stats, hashtable stats, etc. Some of it even does histograms :)

> I don't particularly like the entanglement of tracer logging/verbose bits and
> nanojit. Can we disentangle this a bit? 

I've filed bug 513339 and bug 513341 to try to pull some of the logging noise out of both jstracer and the rest of nanojit. I agree it's getting a bit overgrown, but gathering up all the chatter and depositing it somewhere is a different task from getting this profiler working in the first place.

> instead of the finalizer the code can be ifd-defed
> straight into the destructor.

Fragments have no destructors anymore; they're arena-allocated and released en-masse. I think it's OK to keep the fragment book-keeping outside as the fragment topology and lifecycle is really only something TM knows about. We're *deleting* that knowledge from NJ presently, as it was insufficient anyway.

Comments on the patch:

- Unless I'm misreading, I think the finalizer function *pointer* stored in the tracemonitor is vestigial from an earlier version of this patch; nobody appears to store into it or call through it, so it should be removed. All finalization calls occur directly to js_FragProfiling_FragFinalizer.

- avmplus::List<> is going away, use nanojit::Seq<>

- Likewise avmplus::SortedMap<>, use nanojit::HashMap<>

- Seems you're incrementally bubble-sorting the final ranked array? We already use std::sort() and qsort() elsewhere in spidermonkey, I'm sure it's OK to here.

- Style: this being C++, you can loop-scope a loop variable.

   for (Foo* bar = ...; ...; ...) { ... }

rather than

   Foo *bar;
   for (...; ...; ...) { ... }

- Style: while (true) rather than while (1)

- Style: void foo() rather than void foo(void)

- Style: as gal said, foo(Bar *b) rather than foo( Bar *bar )

- As of njn's logging work, we're directing output through to js_LogController.printf. Same meaning, but it centralizes control over that stream.

- On line 4823, you're adding a commented-out call to the finalizer. You either want to un-comment this or remove it, I'm not sure which.

- Considering that there's no hash function involved -- they're just sequential identifier-numbers -- rename profHash and guardID in GuardPI, and rename profHash in VMFragment to fragID or fragmentID. Generally try to avoid two structures with same-named fields.

- The use of a static profHashCtr is no good, we're possibly running multithreaded. If you want you can use JS_ATOMIC_INCREMENT, or else store a counter in the (per-thread) trace monitor and copy it into the VMFragment when you construct it.

Otherwise nice work, sorry the lifecycles of everything have been changing underfoot.

Nicholas Nethercote [inactive]

Reporter

Comment 25

•

15 years ago

(In reply to comment #24)
> 
> - As of njn's logging work, we're directing output through to
> js_LogController.printf.

I can't take credit for that, I think it was jseward :)

Julian Seward [:jseward]

Assignee

Comment 26

•

15 years ago

(In reply to comment #24)
Graydon, thanks for the comments.  Should all be no problem to fix,
but one thing I had a query on:

> - The use of a static profHashCtr is no good, we're possibly running
> multithreaded. If you want you can use JS_ATOMIC_INCREMENT, or else store a
> counter in the (per-thread) trace monitor and copy it into the VMFragment when
> you construct it.

I'd love to use JS_ATOMIC_INCREMENT, but that's going to bork for the TR
embedding, no?  Could we just copy JS_ATOMIC_INCREMENT into the NJ tree
so as to have an independent copy?  I'd be happy to do that, and tidy up.

Graydon Hoare :graydon

Comment 27

•

15 years ago

Perhaps then you can have the Fragment constructor take a fragID in verbose mode (default to 0) and move the profHashCtr out to jstracer.cpp; then you can use JS_ATOMIC_INCREMENT in the VMFragment constructor call to the Fragment constructor.

Andreas Gal :gal

Comment 28

•

15 years ago

Atomic operations are very slow. Make sure you know what you are doing. That might skew measurements significantly. Also, we _definitively_ don't want atomic profiling updates in product builds ever (I overheard we might want to run more profiling all the time).

Graydon Hoare :graydon

Comment 29

•

15 years ago

Yeah. The other option -- a per-TraceMonitor counter -- makes more sense to me anyway. We always have a TraceMonitor handy when we're constructing a VMFragment. Just pass one in.

Julian Seward [:jseward]

Assignee

Comment 30

•

15 years ago

Attached patch patch v9 (rebase only) (obsolete) (deleted) — Details — Splinter Review

Attachment #396789 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 31

•

15 years ago

Attached patch rebased; w/ correct code size measurements, & addresses review comments (obsolete) (deleted) — Details — Splinter Review

Attachment #397852 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 32

•

15 years ago

(In reply to comment #31)
> Created an attachment (id=399823) [details]
> rebased; w/ correct code size measurements, & addresses review comments

Rebased w/ much help from Graydon, following deletion of
Fragmento and of distinct loop-top labels.  Also incorporates
Graydon's fix for correct code size measurement as taken from
bug 512744.

Julian Seward [:jseward]

Assignee

Comment 33

•

15 years ago

Attached patch updated to address all remaining review issues (obsolete) (deleted) — Details — Splinter Review

Attachment #399823 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Updated

•

15 years ago

Attachment #400188 - Flags: review?(graydon)

Julian Seward [:jseward]

Assignee

Comment 34

•

15 years ago

(In reply to comment #33)
> updated to address all remaining review issues

Regarding this, from comment #24

> - Seems you're incrementally bubble-sorting the final ranked array?
> We already use std::sort() and qsort() elsewhere in spidermonkey,
> I'm sure it's OK to here.

This is sort-of true.  But the final ranked array is only N_TOP_BLOCKS
in size, regardless of how many Fragments are processed.  This avoids
constructing an array with potentially thousands of entries, one for
each Fragment, sorting it, and throwing away all but the top 50
entries.  Since the array is of fixed size, the cost for processing N
Fragments is O(50 x N), which isn't too bad given that this is
non-production code.

So I've left it as-is.

Julian Seward [:jseward]

Assignee

Comment 35

•

15 years ago

Attached patch rebased; made thread-safe; conform to house style; minor tidying (deleted) — Details — Splinter Review

Attachment #400188 - Attachment is obsolete: true

Attachment #400188 - Flags: review?(graydon)

Graydon Hoare :graydon

Comment 36

•

15 years ago

Comment on attachment 400791 [details] [diff] [review]
rebased; made thread-safe; conform to house style; minor tidying

Thanks for your patience during the review cycle. I'll land from here.

Attachment #400791 - Flags: review+

Andreas Gal :gal

Comment 37

•

15 years ago

"I don't particularly like the entanglement of tracer logging/verbose bits and
nanojit. Can we disentangle this a bit? I wonder if we can just use regular lir
to model the counters. It would minimally affect register allocation, but
probably not noticably. If we are really worried about that we could add
LIR_XINC or something like that."

The comment above was not addressed as far as I can tell. The checkin breaks x86-64 and sparc (dvander and leon will fix those quickly I am sure). Review from adobe for that many NJ changes would have been nice too. This isn't an r- comment. Just saying that the patch could be nicer. We can probably evolve it further as we go along. It seems useful enough to land it now instead of constantly rebasing it.

Julian Seward [:jseward]

Assignee

Comment 38

•

15 years ago

(In reply to comment #37)
> "I don't particularly like the entanglement of tracer
> logging/verbose bits and nanojit. Can we disentangle this a bit?

It's less entangled than it was previously.  Now, Fragment and
GuardRecord have counter fields, and nanojit knows to add code to
increment them if requested.  You have to tell NJ where the loop label
of a trace is, and give it a unique identifier for each new Fragment,
but that's about it.  The problem of extracting and presenting the
results is all on the TM side.

> I wonder if we can just use regular lir to model the counters.

That leads to an inconsistency, in which the fragment-entry counter
inc can be expressed in LIR, but the exit inc, which is part of the
exit block, can't be.  At least, not without generating contorted LIR
which reads the condition just before an guard, and if true,
increments the relevant exit counter.  And that would lead to a
situation where the output of each LIR_ov would be used twice, which
IIUC would break at least the x86 backend.

> The checkin breaks x86-64 and sparc

I don't have a sparc box to test on, and up till now x86-64 hasn't
been runnable from the tree.

> Review from adobe for that many NJ changes would have been nice too.

Yes, my bad there.

Graydon Hoare :graydon

Comment 39

•

15 years ago

Phew!

http://hg.mozilla.org/tracemonkey/rev/ce354a3e3b47 for initial checkin,
http://hg.mozilla.org/tracemonkey/rev/72d4907df250 to fix the windows build (mistakenly breaking *every* browser build, because browser builds have different ... header visibility?), and so finally
http://hg.mozilla.org/tracemonkey/rev/70f5a2e93b94

A little more multi-platform testing would be good in the future.

Whiteboard: fixed-in-tracemonkey

Blake Kaplan (:mrbkap) (inactive)

Comment 40

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/ce354a3e3b47

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Blake Kaplan (:mrbkap) (inactive)

Comment 41

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/72d4907df250

Blake Kaplan (:mrbkap) (inactive)

Comment 42

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/70f5a2e93b94

Robert Sayre

Updated

•

15 years ago

Flags: wanted1.9.2?

Robert Sayre

Comment 43

•

15 years ago

http://hg.mozilla.org/releases/mozilla-1.9.2/rev/f24e05517a62

status1.9.2: --- → beta1-fixed

Robert Sayre

Updated

•

15 years ago

Flags: wanted1.9.2? → wanted1.9.2+

Bob Clary [:bc] (inactive)

Comment 44

•

15 years ago

js/src/trace-test/tests/basic/arith.js
js/src/trace-test/tests/basic/testRegExpTest.js
js/src/trace-test/tests/basic/testShiftLeft.js
js/src/trace-test/tests/basic/testSideExitInConstructor.js

Flags: in-testsuite+

You need to log in before you can comment on or make changes to this bug.