1436330 - Add a microbenchmarks for XPCOM string encoding conversions

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Description

•

7 years ago

Bench stuff like CopyUTF8toUTF16, etc.

Comment hidden (mozreview-request)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 6

•

7 years ago

Why these languages? * Lorem ipsum is all-ASCII * German is Latin1 (de-edit.txt is strictly Latin1) and the non-ASCII is infrequent enough that there's a good chance to get a run on 16 ASCII characters (that fit in a 128-bit SIMD register) between non-ASCII ones. * Turkish is a Latin-script language where non-ASCII is frequent enough that ASCII runs very often don't reach the length of 16. * Vietnamese is a Latin-script language where about every other character is non-ASCII. * Russian and Arabic are mutually different-script non-Latin-script languages that use ASCII punctuation/spaces and whose non-punctuation takes two bytes in UTF-8. (AFAICT, N'Ko is the only script that uses two bytes per character in UTF-8 and doesn't use ASCII punctuation, but AFAICT, Wikipedia isn't available in this script.) * Korean uses 3 bytes per character in UTF-8 with ASCII punctuation&spaces. * Japanese uses 3 bytes per character in UTF-8 without ASCII punctuation/spaces. * Thai also uses 3 bytes per character in UTF-8 with infrequent ASCII spaces but no ASCII punctuation (i.e. no ASCII following ASCII), though mainly it's included because the data is in the tree, because Thai is more interesting for other benchmarks.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Updated

•

7 years ago

Attachment #8948966 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 7

•

7 years ago

mozreview-review

Comment on attachment 8948966 [details] Bug 1436330 - Add microbenchmarks for XPCOM string encoding conversions. https://reviewboard.mozilla.org/r/218388/#review238020 There are a lot of tests here; how long does it take to run all of these? r=me; I don't have any complaints per se, but I think there are some issues to discuss before this lands. ::: commit-message-90354:1 (Diff revision 5) > +Bug 1436330 - Add a microbenchmarks for XPCOM string encoding conversions. Nit: the "a" here can be dropped. ::: xpcom/tests/gtest/TestStrings.cpp:1866 (Diff revision 5) > + for (int i = 0; i < 500000; i++) { > + nsCString latin1; > + LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); > + } Perhaps we could just provide some sort of function in the test fixture for this style of test, so we could just instantiate that function repeatedly, changing the member we're using, so we have a single place to control loop counts etc. Something like: ``` inline void BenchmarkLossyCopy(nsString& aTestString) { for (size_t i = 0; i < sLoopTripCount; ++i) { nsCString latin1; LossyCopyUTF16toASCII(...); } } ``` The function call overhead here should be minimal, and using `MOZ_ALWAYS_INLINE` is an option if we really want. There would be other functions for benchmarking other `*Copy*` methods, of course. WDYT about doing that instead? ::: xpcom/tests/gtest/TestStrings.cpp:1867 (Diff revision 5) > + nsCString latin1; > + LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); Do you want the overhead of constructing and destructing the string in this loop (and others), along with allocating? Or is the desire to simulate "real" use of the encoding APIs?

Attachment #8948966 - Flags: review?(nfroyd) → review+

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 8

•

7 years ago

mozreview-review-reply

Comment on attachment 8948966 [details] Bug 1436330 - Add microbenchmarks for XPCOM string encoding conversions. https://reviewboard.mozilla.org/r/218388/#review238020 The problem is that I wanted enough repetitions for the single code unit case to register and then have the same number of repetitions for the longer cases in order to see how things scale with string length. So far, it appears that fixed cost of a conversion is pretty high. That is, the 3 and 15 lengths are rather close to 1 and then 100 and 1000 show the effect of the length. This results in some cases taking less than 100 ms and some taking up to 15 seconds on my dev box. The unreasonably long tests are the HasRTL tests, which we should shorten by a factor of 10 or so. > Perhaps we could just provide some sort of function in the test fixture for this style of test, so we could just instantiate that function repeatedly, changing the member we're using, so we have a single place to control loop counts etc. Something like: > > ``` > inline void > BenchmarkLossyCopy(nsString& aTestString) > { > for (size_t i = 0; i < sLoopTripCount; ++i) { > nsCString latin1; > LossyCopyUTF16toASCII(...); > } > } > ``` > > The function call overhead here should be minimal, and using `MOZ_ALWAYS_INLINE` is an option if we really want. > > There would be other functions for benchmarking other `*Copy*` methods, of course. > > WDYT about doing that instead? Since there are four variables (length, unit size of input, unit size of output and the conversion function), I think the abstraction would obscure more than it would help. > Do you want the overhead of constructing and destructing the string in this loop (and others), along with allocating? Or is the desire to simulate "real" use of the encoding APIs? The construction and destruction of the string within the loop is intentional, because I'm trying to change the allocation strategy of the conversions, so if allocation isn't timed, it's possible that the new conversions would look faster than they are when the allocation effects are taken into account.

Comment hidden (mozreview-request)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 10

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #8) > The unreasonably long tests are the HasRTL tests, which we should shorten by > a factor of 10 or so. Filed bug 1450907.

Comment hidden (mozreview-request)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 12

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #8) > This results in some cases taking less than 100 ms and some taking up to 15 > seconds on my dev box. Running all perf tests in TestStrings takes 3 minutes 20 seconds on my dev box. Checking if treeherder is faster or slower.

Comment hidden (mozreview-request)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 14

•

7 years ago

win32 timed out on treeherder. Dividing the number of iterations by 10 and trying again.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 15

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #7) > r=me; I don't have any complaints per se, but I think there are some issues > to discuss before this lands. Now that the iterations have been cut to a tenth, have the issues been discussed sufficiently for this to land?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 16

•

7 years ago

mozreview-review-reply

Comment on attachment 8948966 [details] Bug 1436330 - Add microbenchmarks for XPCOM string encoding conversions. https://reviewboard.mozilla.org/r/218388/#review238020 > Since there are four variables (length, unit size of input, unit size of output and the conversion function), I think the abstraction would obscure more than it would help. I don't understand this objection. Take the first two added tests: ```c++ MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIOne, [this] { for (int i = 0; i < CONVERSION_ITERATIONS; i++) { nsCString latin1; LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); } }); MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIThree, [this] { for (int i = 0; i < CONVERSION_ITERATIONS; i++) { nsCString latin1; LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); } ``` What is wrong with rewriting them thus: ```c++ void TestPerfUTF16ToLatin1ASCII(const nsString& from) { for (size_t i = 0; i < CONVERSION_ITERATIONS; i++) { nsCString latin1; LossyCopyUTF16toASCII(*BlackBox(&from), *BlackBox(&latin1)); } }; MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIOne, [this] { TestPerfUTF16toLatin1ASCII(mAsciiOneUtf16); }); MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIThree, [this] { TestPerfUTF16toLatin1ASCII(mAsciiThreeUtf16); }); ``` You have to write a separate `TestPerf` function for each style of conversion you want to do, but that does not seem onerous. Maybe you want to write a macro so that you can just say: ```c++ BENCH_STRING_CONVERSION(PerfUTF16toLatin1ASCIIThree, TestPerfUTF16toLatin1ASCII, mAsciiThreeUtf16); ``` Or maybe that's too clever. (Either way might have helped prevent the typo from the added tests.) In any other context, I think we'd be suspicious of all this copy-and-pasted code and require the author to separate out some sort of abstraction to make it clearer what's going on. I don't know why benchmark code would be any different in this regard. (Perhaps copy-and-paste makes it more obvious how much stuff you might be changing at a time, whereas changing a single function in a benchmark is less clear about how much code it can affect?)

Nathan Froyd [:froydnj]

Comment 17

•

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #15) > (In reply to Nathan Froyd [:froydnj] from comment #7) > > r=me; I don't have any complaints per se, but I think there are some issues > > to discuss before this lands. > > Now that the iterations have been cut to a tenth, have the issues been > discussed sufficiently for this to land? Thank you for investigating the time issues. I'd still like to understand your position on the copy-and-paste of the iteration loops, etc. from comment 16.

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 18

•

7 years ago

mozreview-review

Comment on attachment 8948966 [details] Bug 1436330 - Add microbenchmarks for XPCOM string encoding conversions. https://reviewboard.mozilla.org/r/218388/#review239402 ::: xpcom/tests/gtest/TestStrings.cpp:1877 (Diff revision 8) > +}); > + > +MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIThree, [this] { > + for (int i = 0; i < CONVERSION_ITERATIONS; i++) { > + nsCString latin1; > + LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); This should be `mAsciiThreeUtf16`, yes? ::: xpcom/tests/gtest/TestStrings.cpp:1961 (Diff revision 8) > +}); > + > +MOZ_GTEST_BENCH_F(Strings, PerfLatin1toUTF16AsciiHundred, [this] { > + for (int i = 0; i < CONVERSION_ITERATIONS; i++) { > + nsString test; > + CopyASCIItoUTF16(*BlackBox(&mAsciiOneUtf8), *BlackBox(&test)); This should be `mAsciiHundredUtf8`? ::: xpcom/tests/gtest/TestStrings.cpp:1968 (Diff revision 8) > +}); > + > +MOZ_GTEST_BENCH_F(Strings, PerfLatin1toUTF16AsciiThousand, [this] { > + for (int i = 0; i < CONVERSION_ITERATIONS; i++) { > + nsString test; > + CopyASCIItoUTF16(*BlackBox(&mAsciiOneUtf8), *BlackBox(&test)); This should be `mAsciiThousandUtf8`? ::: xpcom/tests/gtest/TestStrings.cpp:2009 (Diff revision 8) > + } > +}); > + > +MOZ_GTEST_BENCH_F(Strings, PerfUTF16toUTF8AsciiOne, [this] { > + for (int i = 0; i < CONVERSION_ITERATIONS; i++) { > + nsAutoCString utf8; Why are these tests suddenly using auto strings vs. the previous tests which were using normal strings?

Comment hidden (mozreview-request)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 20

•

7 years ago

mozreview-review-reply

Comment on attachment 8948966 [details] Bug 1436330 - Add microbenchmarks for XPCOM string encoding conversions. https://reviewboard.mozilla.org/r/218388/#review239402 > This should be `mAsciiThreeUtf16`, yes? Yes. Fixed. Thanks. > This should be `mAsciiHundredUtf8`? Yes. Fixed. Thanks. > This should be `mAsciiThousandUtf8`? Yes. Fixed. Thanks. > Why are these tests suddenly using auto strings vs. the previous tests which were using normal strings? They should all be Auto to reflect normal usage. Fixed. Thanks.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Assignee

Comment 21

•

7 years ago

mozreview-review-reply

Comment on attachment 8948966 [details] Bug 1436330 - Add microbenchmarks for XPCOM string encoding conversions. https://reviewboard.mozilla.org/r/218388/#review238020 > I don't understand this objection. Take the first two added tests: > > ```c++ > MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIOne, [this] { > for (int i = 0; i < CONVERSION_ITERATIONS; i++) { > nsCString latin1; > LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); > } > }); > > MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIThree, [this] { > for (int i = 0; i < CONVERSION_ITERATIONS; i++) { > nsCString latin1; > LossyCopyUTF16toASCII(*BlackBox(&mAsciiOneUtf16), *BlackBox(&latin1)); > } > ``` > > What is wrong with rewriting them thus: > > ```c++ > void > TestPerfUTF16ToLatin1ASCII(const nsString& from) > { > for (size_t i = 0; i < CONVERSION_ITERATIONS; i++) { > nsCString latin1; > LossyCopyUTF16toASCII(*BlackBox(&from), *BlackBox(&latin1)); > } > }; > > MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIOne, [this] { > TestPerfUTF16toLatin1ASCII(mAsciiOneUtf16); > }); > > MOZ_GTEST_BENCH_F(Strings, PerfUTF16toLatin1ASCIIThree, [this] { > TestPerfUTF16toLatin1ASCII(mAsciiThreeUtf16); > }); > ``` > > You have to write a separate `TestPerf` function for each style of conversion you want to do, but that does not seem onerous. Maybe you want to write a macro so that you can just say: > > ```c++ > BENCH_STRING_CONVERSION(PerfUTF16toLatin1ASCIIThree, TestPerfUTF16toLatin1ASCII, mAsciiThreeUtf16); > ``` > > Or maybe that's too clever. (Either way might have helped prevent the typo from the added tests.) In any other context, I think we'd be suspicious of all this copy-and-pasted code and require the author to separate out some sort of abstraction to make it clearer what's going on. I don't know why benchmark code would be any different in this regard. (Perhaps copy-and-paste makes it more obvious how much stuff you might be changing at a time, whereas changing a single function in a benchmark is less clear about how much code it can affect?) I macroized these.

Pulsebot

Comment 22

•

7 years ago

Pushed by hsivonen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e0a439f469f3 Add microbenchmarks for XPCOM string encoding conversions. r=froydnj

Noemi Erli[:noemi_erli]

Comment 23

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e0a439f469f3

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox61: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Nobody; OK to take it and work on it

Updated

•

4 years ago

Component: String → XPCOM