Closed
Bug 611400
Opened 14 years ago
Closed 14 years ago
mjit::Compiler::finishThisUp: pc->ncode map wastes space
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: jseward, Assigned: jseward)
References
Details
(Keywords: memory-footprint, Whiteboard: [jsperf][fixed-in-tracemonkey])
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
mjit::Compiler::finishThisUp callocs space to put a bunch of stuff
in, including a pc->ncode map. The map is mostly empty.
An extreme case is on kraken-1.1/imaging-darkroom, which for a 32-bit build
has a max-heap-liveness of 102MB. Of that, 12.2MB is occupied by
data calloc'd here, and that space is completely unused:
-------------------- 4 of 100 --------------------
max-live: 12,197,056 in 1 blocks
tot-alloc: 12,197,056 in 1 blocks (avg size 12197056.00)
deaths: 1, at avg age 1,432,667,675 (42.98% of prog lifetime)
acc-ratios: 0.00 rd, 0.00 wr (59 b-read, 129 b-written)
at 0x47B44FF: calloc (vg_replace_malloc.c:467)
by 0x81FF24D: js::mjit::Compiler::finishThisUp(js::mjit::JITSc
by 0x8215E47: js::mjit::Compiler::performCompilation(js::mjit:
by 0xC2F402F: ???
In this case a total of 3050540 map slots are allocated, of which
just 18 are non-zero. I guess this is because of the huge tables of
constants in this test (others behave similarly).
------
Problem is more general though. Running a browser with 5 reasonably
complex tabs open, there are in total 350498 slots allocated, of which
7049 are non-zero.
Estimating (from other data) that 3/4 of those slots are
simultaneously live in the browser, that's 3/4 * (350498 - 7049) *
sizeof(void*) = 2.06 MB of zeroes on a 64-bit target. That's more
than 2% of the total C++ heap size in this run (89.2MB).
------
Could we instead use an array of word-pairs, and binary-search as
necessary? This would remove around 95% of the wasted space.
Updated•14 years ago
|
Whiteboard: [jsperf]
Assignee | ||
Comment 1•14 years ago
|
||
Implements non sparse mapping w/ binary search, as per comment #0.
Reduces max live heap running kraken-1.1/imaging-darkroom from 102.5MB
to 90.0MB (32-bit Linux).
When running all of kraken-1.1 on 32-bit linux, there are no obvious
performance penalties. Insn count falls marginally from 35,187
million to 35,172 million. Miss count on a 4MB L3 cache falls from
35.5 million to 35.0 million. There are about 220000 binary searches
performed.
Assignee | ||
Comment 2•14 years ago
|
||
(In reply to comment #1)
> Created attachment 490131 [details] [diff] [review]
SS numbers (ss 0.9.1 from
svn.webkit.org/repository/webkit/trunk/SunSpider + pbiggar's
make-it-deterministic patch in 580532 comment 32)
on 32-bit x86 linux
In short, a 0.0193% increase in instruction count. OTOH it's a
nano-win on kraken in both insn counts and L3 misses, and it does
measurably reduce the heap size of the entire browser.
------------------------------------------------------------------
instructions executed (nb: "on-trace" may overestimate slightly)
------------------------------------------------------------------
3d-cube:
total 80,989,452 81,015,351 --
on-trace 45,103,763 45,103,789 --
3d-morph:
total 40,352,790 40,353,703 --
on-trace 23,498,398 23,498,392 --
3d-raytrace:
total 104,305,853 104,360,555 *1.001x worse*
on-trace 41,235,219 41,235,247 --
access-binary-trees:
total 26,369,799 26,372,751 --
on-trace 12,223,384 12,223,376 --
access-fannkuch:
total 89,080,367 89,101,292 --
on-trace 79,049,944 79,049,964 --
access-nbody:
total 29,204,526 29,219,975 *1.001x worse*
on-trace 15,532,117 15,532,109 --
access-nsieve:
total 36,560,446 36,558,001 --
on-trace 25,245,377 25,245,369 --
bitops-3bit-bits-in-byte:
total 7,463,084 7,463,382 --
on-trace 3,250,323 3,250,315 --
bitops-bits-in-byte:
total 36,858,840 36,861,468 --
on-trace 32,524,253 32,524,245 --
bitops-bitwise-and:
total 15,871,627 15,871,542 --
on-trace 12,016,982 12,016,974 --
bitops-nsieve-bits:
total 38,737,937 38,736,131 --
on-trace 33,392,857 33,392,853 --
controlflow-recursive:
total 17,387,529 17,387,196 --
on-trace 13,169,276 13,169,268 --
crypto-aes:
total 69,908,532 69,964,419 *1.001x worse*
on-trace 27,946,704 27,946,834 --
crypto-md5:
total 32,603,170 32,602,828 --
on-trace 4,787,562 4,787,560 --
crypto-sha1:
total 17,228,155 17,236,699 --
on-trace 6,623,850 6,623,852 --
date-format-tofte:
total 69,431,577 69,439,492 --
on-trace 21,307,289 21,307,281 --
date-format-xparb:
total 69,658,521 69,711,965 *1.001x worse*
on-trace 9,670,251 9,670,243 --
math-cordic:
total 43,487,414 43,487,827 --
on-trace 29,662,897 29,662,889 --
math-partial-sums:
total 22,765,364 22,770,643 --
on-trace 6,270,213 6,270,205 --
math-spectral-norm:
total 22,422,499 22,430,152 --
on-trace 13,634,579 13,634,577 --
regexp-dna:
total 49,520,074 49,523,108 --
on-trace 34,587,322 34,587,314 --
string-base64:
total 30,244,170 30,253,351 --
on-trace 9,282,553 9,282,545 --
string-fasta:
total 85,531,280 85,531,893 --
on-trace 24,495,021 24,495,023 --
string-tagcloud:
total 106,235,981 106,248,739 --
on-trace 17,118,481 17,118,473 --
string-unpack-code:
total 135,209,381 135,166,286 --
on-trace 20,853,993 20,853,985 --
string-validate-input:
total 43,909,353 43,924,497 --
on-trace 8,480,687 8,480,683 --
-------
all:
total 1,321,337,721 1,321,593,246 --
on-trace 570,963,295 570,963,365 --
Assignee | ||
Comment 3•14 years ago
|
||
Perf numbers in comments 1 and 2 are with "-j -m", I should add.
Assignee | ||
Updated•14 years ago
|
Attachment #490131 -
Flags: review?(dvander)
Comment 4•14 years ago
|
||
(In reply to comment #3)
> Perf numbers in comments 1 and 2 are with "-j -m", I should add.
Running with '-j -m -p' is best; that's what the browser effectively does. Not that it'll make a skerrick of difference in this case, I imagine.
Assignee | ||
Comment 5•14 years ago
|
||
(In reply to comment #4)
> Running with '-j -m -p' is best;
I'm not sure what a skerrick is. Perhaps a small boat, a bit like a
coracle?
Anyway, with -j -m -p the increase in instruction count on SS is
reduced, from 0.0193% (-j -m) to 0.0139%, or about 1 part in 7200.
The space savings on imaging-darkroom are unchanged.
------------------------------------------------------------------
instructions executed (nb: "on-trace" may overestimate slightly)
------------------------------------------------------------------
3d-cube:
total 69,741,119 69,786,418 *1.001x worse*
on-trace 44,497,984 44,497,992 --
3d-morph:
total 39,541,147 39,546,921 --
on-trace 24,587,583 24,587,575 --
3d-raytrace:
total 70,410,616 70,432,055 --
on-trace 41,270,450 41,270,442 --
access-binary-trees:
total 25,595,255 25,597,805 --
on-trace 12,242,688 12,242,680 --
access-fannkuch:
total 93,605,442 93,610,511 --
on-trace 88,475,968 88,475,960 --
access-nbody:
total 28,391,512 28,397,098 --
on-trace 16,274,177 16,274,169 --
access-nsieve:
total 36,615,173 36,618,067 --
on-trace 30,207,428 30,207,420 --
bitops-3bit-bits-in-byte:
total 7,489,837 7,490,205 --
on-trace 3,256,081 3,256,073 --
bitops-bits-in-byte:
total 35,331,790 35,332,687 --
on-trace 30,401,150 30,401,142 --
bitops-bitwise-and:
total 15,885,196 15,885,166 --
on-trace 12,019,216 12,019,208 --
bitops-nsieve-bits:
total 38,515,088 38,518,480 --
on-trace 33,394,926 33,394,922 --
controlflow-recursive:
total 17,387,939 17,387,650 --
on-trace 13,169,288 13,169,280 --
crypto-aes:
total 56,004,894 56,017,085 --
on-trace 32,159,499 32,159,491 --
crypto-md5:
total 24,715,625 24,736,064 *1.001x worse*
on-trace 12,371,649 12,371,647 --
crypto-sha1:
total 20,144,832 20,151,979 --
on-trace 8,772,674 8,772,676 --
date-format-tofte:
total 68,374,916 68,381,923 --
on-trace 21,500,778 21,500,770 --
date-format-xparb:
total 68,297,325 68,304,002 --
on-trace 10,329,790 10,329,782 --
math-cordic:
total 43,610,475 43,610,511 --
on-trace 29,515,359 29,515,351 --
math-partial-sums:
total 22,881,300 22,887,020 --
on-trace 6,310,886 6,310,878 --
math-spectral-norm:
total 32,735,683 32,736,644 --
on-trace 27,545,614 27,545,606 --
regexp-dna:
total 49,156,069 49,158,150 --
on-trace 34,581,182 34,581,174 --
string-base64:
total 30,431,623 30,434,918 --
on-trace 9,320,547 9,320,539 --
string-fasta:
total 64,711,451 64,714,231 --
on-trace 32,459,728 32,459,730 --
string-tagcloud:
total 105,565,938 105,569,290 --
on-trace 17,153,732 17,153,724 --
string-unpack-code:
total 133,481,056 133,481,308 --
on-trace 20,864,422 20,864,414 --
string-validate-input:
total 44,023,802 44,031,979 --
on-trace 8,583,556 8,583,552 --
-------
all:
total 1,242,645,103 1,242,818,167 --
on-trace 621,266,355 621,266,197 --
(gdb) p ((1.0 / (1242645103.0 / 1242818167.0))-1.0) * 100
$4 = 0.01392706570702007
Assignee | ||
Comment 6•14 years ago
|
||
1-line change compared to previous version. Reduces instruction
count overhead on sunspider down to 1 part in 7500.
Attachment #490131 -
Attachment is obsolete: true
Attachment #490575 -
Flags: review?(dvander)
Attachment #490131 -
Flags: review?(dvander)
Comment on attachment 490575 [details] [diff] [review]
remove partially redundant load
Sorry for the delay here. Thanks for doing this, I didn't realize nmap was so memory hungry. Patch looks good but would like to see some style changes before r+'ing, I promise faster turnaround :)
>+ if (nNmapLive > 0) {
>+ for (size_t i = 0; i < script->length; i++) {
>+ analyze::Bytecode *opinfo = analysis->maybeCode(i);
>+ if (opinfo && opinfo->safePoint) {
>+ Label L = jumpMap[i];
>+ JS_ASSERT(L.isValid());
>+ nmap[ix++] = (void*)(uintptr_t)i;
>+ nmap[ix++] = (uint8 *)(result + masm.distanceOf(L));
>+ }
Could this be something like:
struct NativeMapEntry {
jsbytecode *pc;
void *ncode;
};
instead?
>+inline void * bsearch_nmap(void** map, size_t nPairs, void* key)
>+{
>+ ssize_t lo = 0, hi = nPairs-1; /* lo, hi, mid must be signed! */
>+ while (1) {
>+ /* current unsearched space is from lo to hi, inclusive. */
>+ if (lo > hi) return NULL; /* not found */
>+ ssize_t mid = (lo + hi) / 2;
>+ void* key_mid = map[2 * mid];
>+ if (key < key_mid) { hi = mid-1; continue; }
>+ if (key > key_mid) { lo = mid+1; continue; }
>+ return map[2 * mid + 1];
>+ }
SpiderMonkey style wants
>+ if (lo > hi) return NULL; /* not found */
To be
> if (low > hi)
> return NULL;
And
>+ if (key > key_mid) { lo = mid+1; continue; }
To be
> if (key > key_mid) {
> lo = mid + 1;
> continue;
> }
Etc
> inline void *
> JSScript::nativeCodeForPC(bool constructing, jsbytecode *pc)
> {
>- void **nmap = nativeMap(constructing);
>+ js::mjit::JITScript *jit = getJIT(constructing);
>+ JS_ASSERT(jit);
You can remove this assert, since it'll null-deref pretty quickly.
>+ bool isCon = f.fp()->isConstructing();
Prefer "ctor" to "isCon" here and elsewhere
Attachment #490575 -
Flags: review?(dvander)
Updated•14 years ago
|
Blocks: JaegerShrink
Assignee | ||
Comment 8•14 years ago
|
||
Addresses all the c7 issues (thx for feedback). Also gets
rid of use of 'ssize_t' in bsearch_nmap as MSVC9 doesn't
take kindly to it.
Attachment #490575 -
Attachment is obsolete: true
Attachment #494730 -
Flags: review?(dvander)
Updated•14 years ago
|
Attachment #494730 -
Flags: review?(dvander) → review+
Assignee | ||
Updated•14 years ago
|
Keywords: checkin-needed
Updated•14 years ago
|
blocking2.0: --- → beta9+
Assignee: general → jseward
Status: NEW → ASSIGNED
Whiteboard: [jsperf] → [jsperf][fixed-in-tracemonkey]
Comment 10•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•14 years ago
|
Keywords: checkin-needed
Comment 11•14 years ago
|
||
As per today's meeting, beta 9 will be a time-based release. Marking these all betaN+. Please move it back to beta9+ if you believe it MUST be in the next beta (ie: trunk is in an unshippable state without this)
blocking2.0: beta9+ → betaN+
You need to log in
before you can comment on or make changes to this bug.
Description
•