Closed Bug 526348 Opened 15 years ago Closed 15 years ago

pick higher pattern-length threshold for using BMH in string matching

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: luke, Assigned: luke)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files)

Attached file threshold test (deleted) —
From bug 526173, it is clear that using BMH for length patterns is very expensive. The following data is for the attached test (which tries varying pattern lengths in the range 1-17) with the js shell compiled with BMH threshold pattern-length 1-17. pattern length --> t 267 135 91 68 56 47 41 36 33 31 28 26 23 24 24 19 22 h 28 135 90 68 55 47 41 36 33 30 28 26 23 24 24 19 22 r 29 29 91 69 55 48 41 35 33 31 28 26 23 24 24 20 21 e 29 28 29 68 55 48 40 36 33 30 28 26 23 24 23 20 21 s 30 29 28 29 55 48 40 36 33 31 28 27 23 24 23 19 22 h 28 29 29 28 29 47 41 35 33 31 28 26 23 24 23 20 21 h 30 29 28 29 28 29 41 36 33 30 28 27 23 24 23 20 21 o 29 28 29 29 28 29 28 36 33 31 28 25 23 24 24 19 22 l 30 28 29 29 29 28 29 29 33 30 29 26 23 24 24 20 22 d 28 29 28 29 28 29 29 28 29 30 28 26 23 24 23 19 22 30 28 29 29 29 28 29 29 29 28 28 26 24 23 24 20 21 | 29 28 29 28 29 29 28 29 28 29 29 26 23 24 23 19 22 | 29 29 30 29 29 28 29 29 29 28 29 29 23 24 24 19 22 V 28 29 28 29 29 28 29 28 29 29 28 29 29 24 23 20 21 30 29 28 29 29 29 29 28 29 29 29 28 29 29 23 19 22 29 29 29 29 28 29 30 28 29 29 29 28 29 29 29 19 22 30 29 29 28 29 29 28 30 29 28 29 29 29 29 29 28 22 Although the results will vary highly with the particular string, pattern, and content of both, its clear from the above data that we can still lose a factor of 4 or 3 by using BMH too early.
Attached file best BMH/normal ratio (deleted) —
The previous test shows BMH at its best (needs |textlen / patlen| comparisons) vs. the linear search at its best (needs |textlen| comparisons). This test shows BMH at its best vs. linear search at its worst (needs |textlen*patlen| comparisons): pattern length --> t 270 200 138 108 92 78 62 62 47 47 47 47 32 32 32 32 32 h 28 199 138 108 92 77 62 63 47 47 47 47 32 32 32 32 32 r 29 51 138 108 92 77 62 63 47 47 47 47 32 32 32 32 32 e 29 50 56 107 93 77 62 62 47 47 47 47 32 32 32 32 32 s 28 50 56 60 92 77 63 62 47 47 47 47 32 32 32 32 32 h 32 51 55 60 65 77 62 63 47 47 47 47 32 32 32 32 32 h 31 49 55 60 66 69 62 62 47 47 47 47 32 32 32 32 32 o 29 50 54 60 66 69 72 63 47 47 47 47 32 32 32 32 32 l 29 50 56 60 66 69 72 75 47 47 47 47 32 32 32 32 32 d 31 50 57 60 65 69 73 74 78 47 47 47 32 32 32 32 32 34 49 55 60 65 70 72 74 78 79 47 47 32 32 32 32 32 | 32 49 57 60 65 70 72 75 77 80 83 47 32 32 32 32 32 | 29 50 55 60 65 70 72 74 78 80 87 86 32 32 32 32 32 V 31 49 55 60 65 70 72 75 77 79 85 85 88 33 31 32 32 31 50 54 59 66 69 72 75 77 80 84 86 88 91 32 32 32 29 50 55 60 66 69 72 75 77 80 84 86 88 91 95 32 32 35 49 56 60 65 70 72 75 77 80 84 85 89 91 94 97 32 Here, with the best possible input (if I haven't misunderstood :), BMH only wins for pattern length > 6. The previous result matrix shows BMH only winning after patlen > 11. Conversely, the second experiment shows a speedup of 2x for patlen > 12, while this degree of speedup happens in the first experiment past 20 (now shown). Again, this is only one CPU/compiler. A different micro-architecture or smarter optimization could change the tradeoffs, but it seems fairly uncontroversial to up the patlen threshold to 11. Thoughts?
Attached patch tweaked (deleted) — Splinter Review
So, without further ado: s/2/12. And comment.
Assignee: general → lw
Status: NEW → ASSIGNED
Attachment #411281 - Flags: review?(brendan)
I request that the comment "Pattern length goes to 11 before we switch to BMH" be added here. ;-) <insert a small amount of mumbling about range-checking that compilers will optimize for you, a pattern continues to be inscrutable to me even after having seen its use in SpiderMonkey for years>
(In reply to comment #3) > <insert a small amount of mumbling about range-checking that compilers will > optimize for you, a pattern continues to be inscrutable to me even after having > seen its use in SpiderMonkey for years> According to http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf slide 30, gcc and msvc know the trick. So we could drop the inscrutability.
That mumbling was more whole-source mumbling than about this particular instance of it. Were we to change I would be indifferent to making this one specific case different in the interim while a changeover happened (although I don't see a particular reason to value atomic consistency over incrementally improved readability).
Attachment #411281 - Flags: review?(brendan) → review+
Comment on attachment 411281 [details] [diff] [review] tweaked Yay! /be
http://hg.mozilla.org/tracemonkey/rev/6d59f2b53b1f Oops, I did r=waldo (who reviewed the last BMH patch) instead of r=brendan.
Whiteboard: fixed-in-tracemonkey
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: