Closed
Bug 598650
Opened 14 years ago
Closed 14 years ago
OperationCallbacks and GCs occurring during SunSpider tests
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta8+ |
People
(Reporter: sayrer, Assigned: gwagner)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(5 files, 1 obsolete file)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/plain
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
This is much better than it was previously, but it's still occurring. We can't afford the impact of the variance introduced by this behavior. I'll attach a patch with my logging enabled.
Reporter | ||
Comment 1•14 years ago
|
||
Reporter | ||
Comment 2•14 years ago
|
||
Reporter | ||
Comment 3•14 years ago
|
||
STR:
Set the pref browser.dom.window.dump.enabled to true.
Apply the printf patch.
Visit http://people.mozilla.com/~sayrer/2010/08/logged-sunspider/results.html and click "run again".
Reporter | ||
Comment 4•14 years ago
|
||
Attachment #477506 -
Attachment is obsolete: true
Reporter | ||
Comment 5•14 years ago
|
||
Reporter | ||
Comment 6•14 years ago
|
||
Comment on attachment 477526 [details]
sunspider log with more logging
Facts gleaned from the log:
We are executing a LastDitchGC on startup.
The GC is never called from the public API during a test, only between them.
LastDitchGC is sometimes called during the tests.
TriggerGC is called from NewGCArena during the tests.
If OperationCallbacks are triggered without a GC occuring, they do not penalize our score.
Reporter | ||
Comment 7•14 years ago
|
||
Run with outliers caused by GC during tests:
http://bit.ly/bLho1V
Same run, but with outliers changed to match the other values:
http://bit.ly/d4EfCw
For those following along at home, here's what this GC behavior is costing our total score:
** TOTAL **: *1.169x as slow* 273.4ms +/- 2.6% 319.6ms +/- 7.9%
Updated•14 years ago
|
Assignee: general → anygregor
Reporter | ||
Comment 8•14 years ago
|
||
The check at the top of NewGCArena is responsible for all of the GCs during tests. We are basically bumping up on that 37.5MB limit a bunch of times.
Updated•14 years ago
|
blocking2.0: --- → ?
Comment 9•14 years ago
|
||
Gregor, we should fix this. Suggestions?
Assignee | ||
Comment 10•14 years ago
|
||
A few points:
The LastDitchGC is only on the very first startup after an update (When the new update site appears). I don't know what "extra" work the browser is doing there.
All the following starups shouldn't have a LastDitchGC.
The number of GCs we are running during the benchmarks should be reduced now after the compartmental GC tuning. We decided to wait for bhacketts patch for fine tuning. I will measure again.
We can change to a more aggressive working set strategy but this makes much more sense if we have a fixed upper limit like proposed in bug 592907.
I am working on bug 593729 right now. That will give us more insight on how we should tune our GC.
So once 584917 and 593729 are landed we should measure again.
Updated•14 years ago
|
Blocks: JaegerSpeed
Comment 12•14 years ago
|
||
(In reply to comment #7)
> For those following along at home, here's what this GC behavior is costing our
> total score:
>
> ** TOTAL **: *1.169x as slow* 273.4ms +/- 2.6% 319.6ms +/- 7.9%
This is very similar to what I'm seeing (following along at home...nice).
To paraphrase something bz said in another bug... "Reporters don't run in shell".
Assignee | ||
Comment 13•14 years ago
|
||
Back to this...
That's the current situation for starting the browser and run the V8 benchmark suite. We call the GC very often from RefillTypedFreeList. We should avoid this.
Assignee | ||
Comment 14•14 years ago
|
||
And for the shell-harness:
call js_GC from js_DestroyContext
enter js_GC for cx: 0x100412b90,
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
call js_GC from RefillTypedFreeList
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
call js_GC from RefillTypedFreeList
enter js_GC for cx: 0x100412b90
enter JS_GC public API
enter js_GC for cx: 0x100412b90
{
"v8-crypto": 229,
"v8-deltablue": 311,
"v8-earley-boyer": 898,
"v8-raytrace": 478,
"v8-regexp": 169,
"v8-richards": 170,
"v8-splay": 374
It seems that we call the API function between each benchmark. Nevertheless we also force a GC twice. We should get rid of it.
Assignee | ||
Comment 15•14 years ago
|
||
So the problem in the shell is not the GC tuning. We are running out of memory because of the heap size that is set for the shell. We either have to increase the heap size or we have to stick with the GC.
This brings us back to bug 592907 where I want to bring the shell settings closer to the browser settings.
Comment 16•14 years ago
|
||
(In reply to comment #15)
> So the problem in the shell is not the GC tuning. We are running out of memory
> because of the heap size that is set for the shell. We either have to increase
> the heap size or we have to stick with the GC.
> This brings us back to bug 592907 where I want to bring the shell settings
> closer to the browser settings.
I agree. We want the shell to be a good model for browser perf.
Reporter | ||
Updated•14 years ago
|
blocking2.0: ? → beta8+
Assignee | ||
Comment 17•14 years ago
|
||
For running SS in the browser (starting the browser, running SS and closing the browser) I get following (note that GC calls might have more than one cause):
Total GC runs: 22
JS_GC API calls: 10
LastDitchGC calls: 1
Call from AllocateArena: 0
Call from Chunk::allocateArena: 0
Call from OnTooMuchMalloc: 9
Calls because of shapeOverflow: 0
Calls from destroyContext: 3
Successful Maybe_GC calls: 0
Calls via operationCallback: 8
Increasing the gcMallocBytes to 128MB instead of 64MB leads to:
Total GC runs: 21
JS_GC API calls: 18
LastDitchGC calls: 0
Call from AllocateArena: 0
Call from Chunk::allocateArena: 0
Call from OnTooMuchMalloc: 0
Calls because of shapeOverflow: 0
Calls from destroyContext: 3
Successful Maybe_GC calls: 0
Calls via operationCallback: 0
So we would move the GC trigger to the browser.
Running SS 6 times in the browser shows:
tip: after:
[ms] +-[%] [ms] +-[%]
216.7 3.0 209.3 0.9
222 5.0 211.2 1.1
221.5 4.5 218.1 5.9
223.6 4.6 209.8 0.6
218.6 3.6 209.4 1.1
222.6 4.9 210.1 1.0
We reduce the overall runtime and the variance but the individual results are all over the place:
TEST COMPARISON FROM TO DETAILS
=============================================================================
** TOTAL **: 1.059x as fast 222.6ms +/- 4.9% 210.1ms +/- 1.0% significant
=============================================================================
3d: ?? 30.0ms +/- 1.6% 30.5ms +/- 3.0% not conclusive: might be *1.017x as slow*
cube: - 11.0ms +/- 0.0% 11.0ms +/- 0.0%
morph: ?? 4.6ms +/- 8.0% 4.7ms +/- 7.3% not conclusive: might be *1.022x as slow*
raytrace: ?? 14.4ms +/- 2.6% 14.8ms +/- 7.5% not conclusive: might be *1.028x as slow*
access: - 30.4ms +/- 1.6% 30.3ms +/- 2.2%
binary-trees: 1.143x as fast 4.0ms +/- 0.0% 3.5ms +/- 10.8% significant
fannkuch: - 15.6ms +/- 2.4% 15.6ms +/- 3.2%
nbody: ?? 4.7ms +/- 7.3% 5.0ms +/- 0.0% not conclusive: might be *1.064x as slow*
nsieve: ?? 6.1ms +/- 3.7% 6.2ms +/- 4.9% not conclusive: might be *1.016x as slow*
bitops: - 13.6ms +/- 5.6% 12.7ms +/- 3.8%
3bit-bits-in-byte: - 0.7ms +/- 49.3% 0.6ms +/- 61.5%
bits-in-byte: - 7.1ms +/- 5.7% 6.8ms +/- 4.4%
bitwise-and: - 1.3ms +/- 26.6% 1.0ms +/- 0.0%
nsieve-bits: - 4.5ms +/- 11.2% 4.3ms +/- 8.0%
controlflow: - 2.9ms +/- 14.0% 2.8ms +/- 10.8%
recursive: - 2.9ms +/- 14.0% 2.8ms +/- 10.8%
crypto: - 16.7ms +/- 4.1% 16.7ms +/- 5.7%
aes: - 9.4ms +/- 5.3% 9.0ms +/- 6.5%
md5: ?? 4.4ms +/- 8.4% 4.5ms +/- 8.4% not conclusive: might be *1.023x as slow*
sha1: ?? 2.9ms +/- 14.0% 3.2ms +/- 14.1% not conclusive: might be *1.103x as slow*
date: - 29.6ms +/- 3.6% 29.5ms +/- 3.5%
format-tofte: ?? 17.0ms +/- 4.0% 17.3ms +/- 4.4% not conclusive: might be *1.018x as slow*
format-xparb: - 12.6ms +/- 4.0% 12.2ms +/- 3.7%
math: - 19.6ms +/- 2.5% 19.2ms +/- 2.4%
cordic: - 4.9ms +/- 8.3% 4.9ms +/- 4.6%
partial-sums: - 9.8ms +/- 3.1% 9.5ms +/- 4.0%
spectral-norm: - 4.9ms +/- 4.6% 4.8ms +/- 6.3%
regexp: - 13.8ms +/- 42.3% 11.5ms +/- 3.3%
dna: - 13.8ms +/- 42.3% 11.5ms +/- 3.3%
string: - 66.0ms +/- 15.3% 56.9ms +/- 1.9%
base64: - 3.4ms +/- 10.9% 3.3ms +/- 10.5%
fasta: - 11.2ms +/- 5.0% 10.8ms +/- 2.8%
tagcloud: - 21.1ms +/- 39.3% 15.4ms +/- 2.4%
unpack-code: - 21.5ms +/- 1.8% 21.4ms +/- 3.2%
validate-input: - 8.8ms +/- 66.3% 6.0ms +/- 0.0%
Comment 18•14 years ago
|
||
> So we would move the GC trigger to the browser.
Which I think tries to trigger at 'idle' times... not sure whether it's managing in this case.
Assignee | ||
Comment 19•14 years ago
|
||
Another possibility to reduce the gcMallocBytes pressure for SS is to modify the code for ropes. If we look at the allocation distribution only for new ropes in js_ConcatStrings we get:
mozilla@alien:~/gwagner/meter4$ uniq -c out2.txt
104193 128
55179 256
18400 512
221 1024
177 2048
177 4096
176 8192
176 16384
176 32768
110 65536
44 131072
44 262144
44 524288
308 1048576
This definitely causes GC runs right now.
Andreas said we could modify the counter when we actually flatten the string.
The numbers are coming from js_concatStrings:
/* Neither child is a rope: need to make a new buffer. */
size_t capacity;
size_t allocSize = RopeAllocSize(length, &capacity);
printf("%d\n", allocSize);
buf = (JSRopeBufferInfo *) cx->malloc(allocSize);
Comment 20•14 years ago
|
||
We should allocate memory without accounting and then accout when and iff only we actually flatten and actually touch that memory we eagerly allocated at rope creation time.
Assignee | ||
Comment 21•14 years ago
|
||
Ups my numbers are from the malloc in ObtainRopeBuffer and not from js_ConcatStrings.
Assignee | ||
Comment 22•14 years ago
|
||
The actual code in ObtainRopeBuffer is
cx->free(sourceBuffer);
buf = (JSRopeBufferInfo *) cx->malloc(allocSize);
We should only account for the difference of the 2 buffers and not for the whole new buffer.
Assignee | ||
Comment 23•14 years ago
|
||
numbers for this patch are in #17.
Attachment #487390 -
Flags: review?(gal)
Updated•14 years ago
|
Attachment #487390 -
Flags: review?(gal) → review+
Assignee | ||
Comment 24•14 years ago
|
||
Updated•14 years ago
|
Whiteboard: fixed-in-tracemonkey
Comment 25•14 years ago
|
||
Shouldn't this be marked as resolved? (as well as the other merged TM bugs).
Reporter | ||
Comment 26•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•