Closed
Bug 526348
Opened 15 years ago
Closed 15 years ago
pick higher pattern-length threshold for using BMH in string matching
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: luke, Assigned: luke)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(3 files)
From bug 526173, it is clear that using BMH for length patterns is very expensive. The following data is for the attached test (which tries varying pattern lengths in the range 1-17) with the js shell compiled with BMH threshold pattern-length 1-17.
pattern length -->
t 267 135 91 68 56 47 41 36 33 31 28 26 23 24 24 19 22
h 28 135 90 68 55 47 41 36 33 30 28 26 23 24 24 19 22
r 29 29 91 69 55 48 41 35 33 31 28 26 23 24 24 20 21
e 29 28 29 68 55 48 40 36 33 30 28 26 23 24 23 20 21
s 30 29 28 29 55 48 40 36 33 31 28 27 23 24 23 19 22
h 28 29 29 28 29 47 41 35 33 31 28 26 23 24 23 20 21
h 30 29 28 29 28 29 41 36 33 30 28 27 23 24 23 20 21
o 29 28 29 29 28 29 28 36 33 31 28 25 23 24 24 19 22
l 30 28 29 29 29 28 29 29 33 30 29 26 23 24 24 20 22
d 28 29 28 29 28 29 29 28 29 30 28 26 23 24 23 19 22
30 28 29 29 29 28 29 29 29 28 28 26 24 23 24 20 21
| 29 28 29 28 29 29 28 29 28 29 29 26 23 24 23 19 22
| 29 29 30 29 29 28 29 29 29 28 29 29 23 24 24 19 22
V 28 29 28 29 29 28 29 28 29 29 28 29 29 24 23 20 21
30 29 28 29 29 29 29 28 29 29 29 28 29 29 23 19 22
29 29 29 29 28 29 30 28 29 29 29 28 29 29 29 19 22
30 29 29 28 29 29 28 30 29 28 29 29 29 29 29 28 22
Although the results will vary highly with the particular string, pattern, and content of both, its clear from the above data that we can still lose a factor of 4 or 3 by using BMH too early.
Assignee | ||
Comment 1•15 years ago
|
||
The previous test shows BMH at its best (needs |textlen / patlen| comparisons) vs. the linear search at its best (needs |textlen| comparisons). This test shows BMH at its best vs. linear search at its worst (needs |textlen*patlen| comparisons):
pattern length -->
t 270 200 138 108 92 78 62 62 47 47 47 47 32 32 32 32 32
h 28 199 138 108 92 77 62 63 47 47 47 47 32 32 32 32 32
r 29 51 138 108 92 77 62 63 47 47 47 47 32 32 32 32 32
e 29 50 56 107 93 77 62 62 47 47 47 47 32 32 32 32 32
s 28 50 56 60 92 77 63 62 47 47 47 47 32 32 32 32 32
h 32 51 55 60 65 77 62 63 47 47 47 47 32 32 32 32 32
h 31 49 55 60 66 69 62 62 47 47 47 47 32 32 32 32 32
o 29 50 54 60 66 69 72 63 47 47 47 47 32 32 32 32 32
l 29 50 56 60 66 69 72 75 47 47 47 47 32 32 32 32 32
d 31 50 57 60 65 69 73 74 78 47 47 47 32 32 32 32 32
34 49 55 60 65 70 72 74 78 79 47 47 32 32 32 32 32
| 32 49 57 60 65 70 72 75 77 80 83 47 32 32 32 32 32
| 29 50 55 60 65 70 72 74 78 80 87 86 32 32 32 32 32
V 31 49 55 60 65 70 72 75 77 79 85 85 88 33 31 32 32
31 50 54 59 66 69 72 75 77 80 84 86 88 91 32 32 32
29 50 55 60 66 69 72 75 77 80 84 86 88 91 95 32 32
35 49 56 60 65 70 72 75 77 80 84 85 89 91 94 97 32
Here, with the best possible input (if I haven't misunderstood :), BMH only wins for pattern length > 6. The previous result matrix shows BMH only winning after patlen > 11. Conversely, the second experiment shows a speedup of 2x for patlen > 12, while this degree of speedup happens in the first experiment past 20 (now shown).
Again, this is only one CPU/compiler. A different micro-architecture or smarter optimization could change the tradeoffs, but it seems fairly uncontroversial to up the patlen threshold to 11.
Thoughts?
Assignee | ||
Comment 2•15 years ago
|
||
So, without further ado: s/2/12. And comment.
Comment 3•15 years ago
|
||
I request that the comment "Pattern length goes to 11 before we switch to BMH" be added here. ;-)
<insert a small amount of mumbling about range-checking that compilers will optimize for you, a pattern continues to be inscrutable to me even after having seen its use in SpiderMonkey for years>
Assignee | ||
Comment 4•15 years ago
|
||
(In reply to comment #3)
> <insert a small amount of mumbling about range-checking that compilers will
> optimize for you, a pattern continues to be inscrutable to me even after having
> seen its use in SpiderMonkey for years>
According to
http://www.linux-kongress.org/2009/slides/compiler_survey_felix_von_leitner.pdf
slide 30, gcc and msvc know the trick. So we could drop the inscrutability.
Comment 5•15 years ago
|
||
That mumbling was more whole-source mumbling than about this particular instance of it. Were we to change I would be indifferent to making this one specific case different in the interim while a changeover happened (although I don't see a particular reason to value atomic consistency over incrementally improved readability).
Updated•15 years ago
|
Attachment #411281 -
Flags: review?(brendan) → review+
Comment 6•15 years ago
|
||
Comment on attachment 411281 [details] [diff] [review]
tweaked
Yay!
/be
Assignee | ||
Comment 7•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/6d59f2b53b1f
Oops, I did r=waldo (who reviewed the last BMH patch) instead of r=brendan.
Whiteboard: fixed-in-tracemonkey
Comment 8•15 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•