Closed Bug 1821362 Opened 2 years ago Closed 1 year ago

Write a fast GCD algorithm

Categories

(Core :: MFBT, defect)

defect

Tracking

()

RESOLVED FIXED
115 Branch
Tracking Status
firefox115 --- fixed

People

(Reporter: padenot, Assigned: padenot)

References

Details

Attachments

(4 files)

I'm going to reduce a bunch of fractions in another patch, and the gcd algorithm we have is very naive, lets get something faster.

Attached file bench-gcd.cpp (deleted) —

Compile with:

clang++ -O3 bench-gcd.cpp -std=c++17

sample run on x86_64:

~::$ ./a.out
reducing 1000000 fractions
binary -- took: 99391810ns (99.3918ns per fraction) 307709210 cycles, 307.709 per fractions
euclid -- took: 1003870058ns (1003.87ns per fraction) 3107962156 cycles, 3107.96 per fractions

That's about a 10x speedup.

On the g++ I have locally, the speedup is less pronounced: the new version is slower than clang (~170ns per fraction), the old version is faster than clang (~656ns per fraction). Weird but also not that problematic.

Could we actually just switch to std::gcd https://en.cppreference.com/w/cpp/numeric/gcd?

It's ~10% slower than my code on the clang we use at the optimization level we use.

The implementation available in https://en.algorithmica.org/hpc/algorithms/gcd/ avoids a few branching and is, in my setup, more than two times faster than the version linked to this patch. It's worth giving it a try :-)

Attached file GCD_ChatGPT.txt (deleted) —

I asked ChatGPT for some answers, and it gave these 4 versions.

(In reply to Paul Adenot (:padenot) from comment #3)

It's ~10% slower than my code on the clang we use at the optimization level we use.

Looking at the generated assembly code, your version has a tighter loop code when compared to std::gcd. Only after switching to -march=x86-64-v3 both versions generate (essentially) the same code.

(In reply to André Bargull [:anba] from comment #8)

(In reply to Paul Adenot (:padenot) from comment #3)

It's ~10% slower than my code on the clang we use at the optimization level we use.

Looking at the generated assembly code, your version has a tighter loop code when compared to std::gcd. Only after switching to -march=x86-64-v3 both versions generate (essentially) the same code.

The current code has a version that's 30% faster than the previous one I had, so even faster.

I think the current version has a bug, because it doesn't handle overflows in T diff = aA - aB;. For example GCD<uint32_t>(3, 7) returns 3, whereas std::gcd<uint32_t>(3, 7) returns the correct result 1.

Pushed by padenot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6dd80575e551 Add a generic CountTrailingZeroes function that lowers to the right intrinsic based the type its called with. r=sergesanspaille https://hg.mozilla.org/integration/autoland/rev/04c60cd83c5f Replace EuclidGCD by a binary gcd algorithm using intrinsics. r=media-playback-reviewers,alwu
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 115 Branch → ---
Pushed by padenot@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/159e2eee7549 Add a generic CountTrailingZeroes function that lowers to the right intrinsic based the type its called with. r=sergesanspaille https://hg.mozilla.org/integration/autoland/rev/dc506e3ad29d Replace EuclidGCD by a binary gcd algorithm using intrinsics. r=media-playback-reviewers,alwu
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch
Regressions: 1840002
Blocks: 1817997
No longer regressions: 1840002
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: