Bug 582789 (Closed) — Opened 14 years ago, Closed 14 years ago

TM: ropes read barrier is very expensive

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
Not set
normal

Tracking


RESOLVED INVALID

People

(Reporter: gal, Unassigned)

Details

Unfortunately, the ropes read barrier is extremely expensive. I have a patch that speeds up TM by > 6% by inlining charAt and unit string comparison. In both cases we now have a read barrier, which messes up code generation quite a bit. As a result, much of the speedup is gone. How much did ropes win us? Are we sure ropes really pay for themselves (especially now that we eagerly allocate memory for them and take malloc hits)?
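To make the cost model concrete, here is a minimal sketch of the rope idea being discussed: concatenation becomes O(1), but every character read pays a "read barrier" that must flatten the rope first. This is an illustrative toy, not SpiderMonkey's JSString; all names (Rope, flatten, charAt) are invented.

```javascript
// Toy rope: concat just records the two halves; the cost moves to reads.
class Rope {
  constructor(left, right) {
    this.left = left;    // string or Rope
    this.right = right;  // string or Rope
    this.flat = null;    // cached flat representation
  }
  // The "read barrier": before touching chars we must ensure flatness.
  flatten() {
    if (this.flat === null) {
      const l = this.left instanceof Rope ? this.left.flatten() : this.left;
      const r = this.right instanceof Rope ? this.right.flatten() : this.right;
      this.flat = l + r;
    }
    return this.flat;
  }
  // Every inlined charAt fast path has to include this flatness check,
  // which is exactly what hurts the generated code.
  charAt(i) {
    return this.flatten().charAt(i);
  }
}

const r = new Rope(new Rope("foo", "bar"), "baz");
console.log(r.charAt(4)); // "a" — first access flattens; later reads hit the cache
```

The point of the comment above is that this flatness check lands on the hot path of charAt and unit string comparison, so the inlining win is partly eaten by the barrier.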
How much of a win were ropes for JM? I think what hurts me here is that this read barrier is not rare, and for the same location it's hit or not hit frequently. So I can't just guard on it; it's basically a guaranteed exit.
On http://arewefastyet.com/individual.php : ropes were responsible for the 2.6ms speedup on string-base64 and the 6.1ms speedup on string-validate-input between Jul 12 and Jul 20. They are also necessary for the 12.2ms speedup on regexp-dna on Jul 22. In total, that's 20.9ms, which is about a 3% speedup for SS. The JM speedup on regexp-dna was a little more, so it was about a 3.5% total speedup for JM.

Of course, ropes also avoid a lot of cases where we previously took O(n^2) time, such as repeatedly prepending to a string. If we want to make the rope case rarer, we could have a check in js_ConcatStrings so that we only generate a rope if the resulting string is above a certain size.

It looks like pretty much all of the SunSpider cases of charCodeAt in a loop use a string that's external to the loop, so we flatten in the first iteration and then have a flat string for the remainder of the loop. Maybe it's possible to recognize that a variable will always refer to a particular JSString, flatten that JSString at trace compile time, and emit code that assumes the string is flat? I don't know the tracer well enough to know if that's possible.

Even before the ropes patch, there wasn't any simple general way to get the chars from a string, since we could get arbitrarily long chains of dependent strings (this was fixed in bug 578189, which depended on the ropes patch).
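The size-threshold idea mentioned above can be sketched as follows. This is a hypothetical illustration, not the actual js_ConcatStrings logic: ROPE_THRESHOLD, concatStrings, strLength, and flatten are all invented names, and the threshold value is arbitrary.

```javascript
// Hypothetical threshold: small results are flattened eagerly so hot loops
// over short strings never see the read barrier; only large results get an
// O(1) rope node, which keeps repeated prepending O(n) total instead of
// O(n^2) copying.
const ROPE_THRESHOLD = 256; // illustrative tuning knob

function strLength(s) {
  return typeof s === "string" ? s.length : strLength(s.left) + strLength(s.right);
}

function flatten(s) {
  return typeof s === "string" ? s : flatten(s.left) + flatten(s.right);
}

function concatStrings(left, right) {
  const len = strLength(left) + strLength(right);
  if (len < ROPE_THRESHOLD) {
    // Small result: pay the O(len) copy now, avoid the barrier later.
    return flatten(left) + flatten(right);
  }
  // Large result: defer the copy by returning a rope node.
  return { left, right };
}
```

Under this scheme, the charCodeAt-in-a-loop pattern from SunSpider would mostly see flat strings, while building a large string by repeated concatenation would still get rope-speed appends.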
(In reply to comment #0)
> TM: ropes read barrier is very expensive

Hyperbole much?

(In reply to comment #1)
> I think what hurts me here is that this read barrier is not rare, and for
> the same location it's hit or not hit frequently. So I can't just guard on
> it, it's basically a guaranteed exit.

Yeah, a guard sounds silly for something that is going to fail with logarithmic probability. Have you tried a branch + insAlloc'd phi node (to be replaced by a real phi node one day...)?
Unfortunately, it looks like there probably wasn't ever a legitimate 6% win from bug 577882 and bug 577883. I was curious where the 6% came from, so I updated one of my TM repos to revision 47409, applied the first patch from bug 577882 and then the first patch from bug 577883, and compared the results before and after applying those patches. Here's what I got:

** TOTAL **:           1.065x as fast    454.3ms +/- 0.7%   426.5ms +/- 1.0%
=============================================================================
  3d:                  -                  70.7ms +/- 2.6%    70.5ms +/- 1.9%
    cube:              -                  27.3ms +/- 7.0%    27.2ms +/- 3.9%
    morph:             -                  12.3ms +/- 3.9%    12.2ms +/- 2.5%
    raytrace:          -                  31.1ms +/- 0.7%    31.1ms +/- 1.3%
  access:              -                  59.2ms +/- 0.8%    58.6ms +/- 1.3%
    binary-trees:      -                  12.4ms +/- 3.0%    12.3ms +/- 2.8%
    fannkuch:          -                  26.9ms +/- 0.8%    26.5ms +/- 1.4%
    nbody:             ??                 11.2ms +/- 2.7%    11.3ms +/- 3.1%
    nsieve:            -                   8.7ms +/- 4.0%     8.5ms +/- 4.4%
  bitops:              -                  23.4ms +/- 3.6%    23.3ms +/- 2.1%
    3bit-bits-in-byte: ??                  0.6ms +/- 61.5%    0.7ms +/- 49.3%
    bits-in-byte:      -                   7.5ms +/- 6.7%     7.2ms +/- 4.2%
    bitwise-and:       ??                  4.1ms +/- 5.5%     4.5ms +/- 8.4%
    nsieve-bits:       -                  11.2ms +/- 2.7%    10.9ms +/- 2.1%
  controlflow:         -                   5.0ms +/- 0.0%     5.0ms +/- 0.0%
    recursive:         -                   5.0ms +/- 0.0%     5.0ms +/- 0.0%
  crypto:              1.090x as fast     30.4ms +/- 2.5%    27.9ms +/- 1.9%
    aes:               1.142x as fast     17.7ms +/- 3.3%    15.5ms +/- 2.4%
    md5:               -                   8.3ms +/- 4.2%     8.1ms +/- 2.8%
    sha1:              -                   4.4ms +/- 8.4%     4.3ms +/- 8.0%
  date:                1.77x as fast      57.3ms +/- 1.0%    32.3ms +/- 1.5%
    format-tofte:      3.94x as fast      33.1ms +/- 1.2%     8.4ms +/- 4.4%
    format-xparb:      -                  24.2ms +/- 1.2%    23.9ms +/- 0.9%
  math:                ??                 23.7ms +/- 1.5%    24.7ms +/- 5.8%
    cordic:            ??                 12.8ms +/- 2.4%    13.1ms +/- 4.0%
    partial-sums:      ??                  7.7ms +/- 4.5%     7.9ms +/- 5.1%
    spectral-norm:     ??                  3.2ms +/- 9.4%     3.7ms +/- 18.3%
  regexp:              ??                 27.7ms +/- 1.2%    27.9ms +/- 1.5%
    dna:               ??                 27.7ms +/- 1.2%    27.9ms +/- 1.5%
  string:              -                 156.9ms +/- 0.6%   156.3ms +/- 1.0%
    base64:            1.093x as fast      5.9ms +/- 3.8%     5.4ms +/- 6.8%
    fasta:             ??                 17.3ms +/- 2.0%    17.4ms +/- 2.1%
    tagcloud:          -                  57.3ms +/- 0.8%    57.1ms +/- 1.5%
    unpack-code:       -                  61.4ms +/- 1.1%    61.1ms +/- 0.9%
    validate-input:    ??                 15.0ms +/- 0.0%    15.3ms +/- 2.3%

Notice that 25 of the 28 gained milliseconds are in date-format-tofte. Here's the relevant code from date-format-tofte:

  1 function arrayExists(array, x) {
  2     for (var i = 0; i < array.length; i++) {
  3         if (array[i] == x) return true;
  4     }
  5     return false;
  6 }
...
 26 var switches = ["a", "A", "B", "d", "D", "F", "g", "G", "h", "H",
 27                 "i", "j", "l", "L", "m", "M", "n", "O", "r", "s",
 28                 "S", "t", "U", "w", "W", "y", "Y", "z"];
...
272 var ia = input.split("");
...
279 if (arrayExists(switches, ia[ij])) {
280     ia[ij] = eval(ia[ij] + "()");
281 }

The value ia is an array of dependent strings, so arrayExists is doing a bunch of unit string comparisons where one of the strings is dependent. The original patches didn't correctly handle dependent strings (see bug 577882, comment 4; that particular issue became invalid after the patch for bug 578189, so it's not a bug in any current patches). As a result, with the patches applied, the eval line is called 0 times instead of the 5500 times it should be called (this can be verified by adding prints and enabling/disabling tracing).
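The failure mode described above can be illustrated with a toy model, a sketch only, not the real patch: a unit-string equality fast path that only understands flat strings silently reports "not equal" for a dependent string instead of flattening it first, so the eval branch is never taken. All names here (unitEqBuggy, unitEqFixed, the { base, start } slice representation) are invented.

```javascript
// Model dependent strings as { base, start } one-char slices of a larger
// string, and flat strings as ordinary JS strings.

// Buggy fast path: anything that isn't already flat compares as unequal.
function unitEqBuggy(a, b) {
  if (typeof a !== "string" || typeof b !== "string") return false; // the bug
  return a.length === 1 && b.length === 1 && a === b;
}

// Fixed version: flatten (materialize the one-char slice) before comparing.
function unitEqFixed(a, b) {
  const flat = (s) =>
    typeof s === "string" ? s : s.base.slice(s.start, s.start + 1);
  return unitEqBuggy(flat(a), flat(b));
}

const dependent = { base: "YmFzZTY0", start: 1 }; // slice covering "m"
console.log(unitEqBuggy(dependent, "m")); // false — comparison silently fails
console.log(unitEqFixed(dependent, "m")); // true
```

This mirrors the date-format-tofte situation: ia holds dependent strings, the buggy comparison makes arrayExists return false for every entry, and the loop "speeds up" only because it skips the work it was supposed to do.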
Wow, nice detective work.
Awesome, Alan! Thanks for digging into this. I had been puzzled by the (perceived) massive loss through the slow-path code. We do have a speedup on base64 with the new version of the patch; it's not 9.3%, but something reasonably close (8-ish or so). I didn't see a speedup on aes, so that's maybe still to be had if I make the slow-path reg alloc better.
Any remaining wins should be collected by this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=585095
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID