Closed Bug 1821362 Opened 2 years ago Closed 1 year ago

Write a fast GCD algorithm

Status: RESOLVED FIXED

Milestone: 115 Branch

Tracking Flags: firefox115 (tracking: ---, status: fixed)

People

(Reporter: padenot, Assigned: padenot)

Attachments

(4 files)

bench-gcd.cpp 2 years ago Paul Adenot (:padenot) (deleted), text/x-c++src
GCD_ChatGPT.txt 2 years ago Mayank Bansal (deleted), text/plain
Bug 1821362 - Add a generic CountTrailingZeroes function that lowers to the right intrinsic based the type its called with. r?#media-playback-reviewers 2 years ago Paul Adenot (:padenot) (deleted), text/x-phabricator-request
Bug 1821362 - Replace EuclidGCD by a binary gcd algorithm using intrinsics. r?#media-playback-reviewers 2 years ago Paul Adenot (:padenot) (deleted), text/x-phabricator-request

Paul Adenot (:padenot)

Assignee

Description

2 years ago

I'm going to reduce a bunch of fractions in another patch, and the gcd algorithm we have is very naive. Let's get something faster.

Paul Adenot (:padenot)

Assignee

Comment 1

2 years ago

Attached file bench-gcd.cpp (deleted)

Compile with:

clang++ -O3 bench-gcd.cpp -std=c++17

sample run on x86_64:

~::$ ./a.out
reducing 1000000 fractions
binary -- took: 99391810ns (99.3918ns per fraction) 307709210 cycles, 307.709 per fractions
euclid -- took: 1003870058ns (1003.87ns per fraction) 3107962156 cycles, 3107.96 per fractions

That's about a 10x speedup.

With the g++ I have locally, the speedup is less pronounced: the new version is slower than with clang (~170ns per fraction), and the old version is faster than with clang (~656ns per fraction). Weird, but also not that problematic.
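The attached bench-gcd.cpp is no longer available, so the following is only a minimal sketch of how such a measurement could be set up, not the original benchmark. The names EuclidGcd, MakeFractions and NanosecondsPerFraction are hypothetical, and the fixed seed and value range are assumptions chosen for repeatability.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Classic Euclidean algorithm: one integer division per iteration.
inline uint64_t EuclidGcd(uint64_t a, uint64_t b) {
  while (b != 0) {
    const uint64_t r = a % b;
    a = b;
    b = r;
  }
  return a;
}

// Builds `count` pseudo-random (numerator, denominator) pairs.
// The seed is fixed (an assumption) so that repeated runs see the same input.
inline std::vector<std::pair<uint64_t, uint64_t>> MakeFractions(std::size_t count) {
  std::mt19937_64 rng(12345);
  std::uniform_int_distribution<uint64_t> dist(1, UINT64_MAX);
  std::vector<std::pair<uint64_t, uint64_t>> fractions;
  fractions.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    fractions.emplace_back(dist(rng), dist(rng));
  }
  return fractions;
}

// Reduces every fraction with `gcd` and returns the mean wall-clock
// nanoseconds spent per fraction.
template <typename GcdFn>
double NanosecondsPerFraction(
    const std::vector<std::pair<uint64_t, uint64_t>>& fractions, GcdFn gcd) {
  assert(!fractions.empty());
  uint64_t sink = 0;
  const auto start = std::chrono::steady_clock::now();
  for (const auto& [num, den] : fractions) {
    const uint64_t g = gcd(num, den);  // never 0: both inputs are >= 1
    sink += num / g + den / g;
  }
  const auto stop = std::chrono::steady_clock::now();
  // Store the accumulated result in a volatile so the loop is not optimized out.
  volatile uint64_t keep = sink;
  (void)keep;
  const double ns =
      std::chrono::duration<double, std::nano>(stop - start).count();
  return ns / static_cast<double>(fractions.size());
}
```

Calling NanosecondsPerFraction(MakeFractions(1000000), EuclidGcd) and then the same with another gcd function gives two per-fraction figures that can be compared. Cycle counts like those in the sample run would need a platform-specific counter, which this sketch leaves out.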

André Bargull [:anba]

Comment 2

2 years ago

Could we actually just switch to std::gcd (https://en.cppreference.com/w/cpp/numeric/gcd)?
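For reference, std::gcd is declared in <numeric> since C++17. A minimal sketch of reducing a fraction with it might look as follows; the Fraction struct and the Reduce function are hypothetical names used only for illustration, not code from the tree.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical plain fraction type for illustration.
struct Fraction {
  uint64_t num;
  uint64_t den;
};

// Divides numerator and denominator by their greatest common divisor.
inline Fraction Reduce(Fraction f) {
  assert(f.den != 0);
  // With a nonzero denominator the gcd is never 0, so the divisions are safe.
  const uint64_t g = std::gcd(f.num, f.den);
  return {f.num / g, f.den / g};
}
```

With this sketch, Reduce({6, 8}) yields 3/4, and a zero numerator reduces to 0/1 because std::gcd(0, n) is n.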

Paul Adenot (:padenot)

Assignee

Comment 3

2 years ago

It's ~10% slower than my code on the clang we use at the optimization level we use.

[:sergesanspaille]

Comment 4

2 years ago

The implementation available at https://en.algorithmica.org/hpc/algorithms/gcd/ avoids a few branches and is, in my setup, more than twice as fast as the version linked to this patch. It's worth giving it a try :-)
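As a rough illustration of what avoiding branches can mean here, the sketch below replaces the usual "swap if a > b" step of a binary GCD with min/max selections, which compilers can often lower to conditional moves. It is not copied from the linked article or from the patch; GcdFewerBranchesSketch is a hypothetical name, and the code assumes a GCC or Clang compiler for the __builtin_ctzll intrinsic.

```cpp
#include <algorithm>
#include <cstdint>

// Binary GCD on 64-bit unsigned values with a swap-free loop body.
inline uint64_t GcdFewerBranchesSketch(uint64_t a, uint64_t b) {
  if (a == 0) return b;
  if (b == 0) return a;
  // Number of factors of two shared by both inputs.
  const int shift = __builtin_ctzll(a | b);
  // Strip all factors of two so that both values are odd.
  a >>= __builtin_ctzll(a);
  b >>= __builtin_ctzll(b);
  while (a != b) {
    const uint64_t lo = std::min(a, b);
    // Larger minus smaller: never wraps, and is nonzero and even here
    // because a and b are distinct odd numbers.
    const uint64_t diff = std::max(a, b) - lo;
    b = lo;
    a = diff >> __builtin_ctzll(diff);  // odd again
  }
  return a << shift;
}
```

The only data-dependent branch left in the loop is the exit test. Whether this is actually faster than a version with an explicit swap depends on the compiler and target, so it would need to be measured.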

Mayank Bansal

Comment 5

2 years ago

Attached file GCD_ChatGPT.txt (deleted)

I asked ChatGPT for some answers, and it gave these 4 versions.

Paul Adenot (:padenot)

Assignee

Comment 6

2 years ago

Attached file Bug 1821362 - Add a generic CountTrailingZeroes function that lowers to the right intrinsic based the type its called with. r?#media-playback-reviewers (deleted)

Depends on D173313
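The patch itself is not reproduced in this thread. As an illustration of the idea in its title, a generic count-trailing-zeroes helper can pick an intrinsic from the size of the argument type at compile time. The sketch below is an assumption about the general shape, not the landed code: CountTrailingZeroesSketch is a hypothetical name, and only the GCC/Clang builtins are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Returns the number of consecutive zero bits starting at the least
// significant bit. The input must be nonzero: the builtins used here
// have an undefined result for 0.
template <typename T>
inline int CountTrailingZeroesSketch(T value) {
  static_assert(std::is_unsigned_v<T>, "unsigned integer types only");
  assert(value != 0);
  if constexpr (sizeof(T) <= sizeof(unsigned int)) {
    // Narrow types are widened to unsigned int; zero-extension does not
    // change the position of the lowest set bit.
    return __builtin_ctz(static_cast<unsigned int>(value));
  } else {
    static_assert(sizeof(T) <= sizeof(unsigned long long),
                  "type wider than the widest supported intrinsic");
    return __builtin_ctzll(static_cast<unsigned long long>(value));
  }
}
```

Because the selection uses if constexpr, only the branch matching the type is instantiated, so a call on a uint32_t compiles to the 32-bit intrinsic and a call on a uint64_t to the 64-bit one.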

Paul Adenot (:padenot)

Assignee

Comment 7

2 years ago

Attached file Bug 1821362 - Replace EuclidGCD by a binary gcd algorithm using intrinsics. r?#media-playback-reviewers (deleted)

Perf notes:
https://lemire.me/blog/2013/12/26/fastest-way-to-compute-the-greatest-common-divisor/

Depends on D173314
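For readers unfamiliar with the technique, a binary GCD replaces the divisions of Euclid's algorithm with shifts and subtractions, and a count-trailing-zeroes intrinsic removes all factors of two in one step. The following is a textbook-style sketch under stated assumptions, not the code from the patch: BinaryGcdSketch is a hypothetical name, the arithmetic is done in unsigned long long for simplicity, and the __builtin_ctzll intrinsic assumes GCC or Clang.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Binary (Stein) GCD for unsigned integer types.
template <typename T>
inline T BinaryGcdSketch(T u, T v) {
  static_assert(std::is_unsigned_v<T> &&
                    sizeof(T) <= sizeof(unsigned long long),
                "unsigned types up to 64 bits only");
  if (u == 0) return v;
  if (v == 0) return u;
  unsigned long long a = u;
  unsigned long long b = v;
  // Factors of two common to both inputs; restored at the end.
  const int shift = __builtin_ctzll(a | b);
  a >>= __builtin_ctzll(a);  // a is odd from here on
  do {
    b >>= __builtin_ctzll(b);     // make b odd (b is nonzero here)
    if (a > b) std::swap(a, b);   // keep a <= b so b - a cannot wrap
    b -= a;                       // odd minus odd: even, possibly zero
  } while (b != 0);
  return static_cast<T>(a << shift);
}
```

As a usage example, BinaryGcdSketch<uint32_t>(48, 18) strips the common factor 2, reduces the odd parts 3 and 9 to 3, and returns 6.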

André Bargull [:anba]

Comment 8

2 years ago

(In reply to Paul Adenot (:padenot) from comment #3)

> It's ~10% slower than my code on the clang we use at the optimization level we use.

Looking at the generated assembly code, your version has a tighter loop than std::gcd. Only after switching to -march=x86-64-v3 do both versions generate (essentially) the same code.

Paul Adenot (:padenot)

Assignee

Comment 9

2 years ago

(In reply to André Bargull [:anba] from comment #8)

> (In reply to Paul Adenot (:padenot) from comment #3)
>
> > It's ~10% slower than my code on the clang we use at the optimization level we use.
>
> Looking at the generated assembly code, your version has a tighter loop than std::gcd. Only after switching to -march=x86-64-v3 do both versions generate (essentially) the same code.

The current code is 30% faster than the previous version I had, so it is even faster now.

André Bargull [:anba]

Comment 10

2 years ago

I think the current version has a bug: it doesn't handle overflow in the statement T diff = aA - aB; (for an unsigned T, the subtraction wraps around when aA < aB). For example, GCD<uint32_t>(3, 7) returns 3, whereas std::gcd<uint32_t>(3, 7) returns the correct result 1.
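The patch under review is not reproduced in this thread, so the sketch below only illustrates the wraparound itself and one common way to avoid it. WrappedDiff and AbsDiff are hypothetical helper names, not functions from the patch.

```cpp
#include <cstdint>

// Unsigned subtraction is performed modulo 2^32: when a < b the result
// is a large positive value rather than a negative one.
inline uint32_t WrappedDiff(uint32_t a, uint32_t b) {
  return a - b;
}

// Subtracting the smaller operand from the larger one never wraps.
inline uint32_t AbsDiff(uint32_t a, uint32_t b) {
  return a > b ? a - b : b - a;
}
```

With the inputs from the report, WrappedDiff(3, 7) is 4294967292 (that is, 2^32 - 4), while AbsDiff(3, 7) is 4. A binary GCD loop that keeps using the wrapped value works on a number unrelated to the true difference, which is how a wrong result can come out.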

Pulsebot

Comment 11

1 year ago

Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6dd80575e551
Add a generic CountTrailingZeroes function that lowers to the right intrinsic based the type its called with. r=sergesanspaille
https://hg.mozilla.org/integration/autoland/rev/04c60cd83c5f
Replace EuclidGCD by a binary gcd algorithm using intrinsics. r=media-playback-reviewers,alwu

Cristina Horotan [:chorotan]

Comment 12

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6dd80575e551
https://hg.mozilla.org/mozilla-central/rev/04c60cd83c5f

Status: NEW → RESOLVED

Closed: 1 year ago

status-firefox115: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 115 Branch

Cosmin Sabou [:CosminS]

Comment 13

1 year ago

Backed out 80 changesets (bug 1821362, bug 1703812, bug 1817997) for causing media crashes as in Bug 1833890.

Backout link: https://hg.mozilla.org/mozilla-central/rev/225c5ab0d999e743db5298d125893ae0702884af

Status: RESOLVED → REOPENED

status-firefox115: fixed → ---

Resolution: FIXED → ---

Target Milestone: 115 Branch → ---

Pulsebot

Comment 14

1 year ago

Pushed by padenot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/159e2eee7549
Add a generic CountTrailingZeroes function that lowers to the right intrinsic based the type its called with. r=sergesanspaille
https://hg.mozilla.org/integration/autoland/rev/dc506e3ad29d
Replace EuclidGCD by a binary gcd algorithm using intrinsics. r=media-playback-reviewers,alwu

Narcis Beleuzu [:NarcisB]

Comment 15

1 year ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/159e2eee7549
https://hg.mozilla.org/mozilla-central/rev/dc506e3ad29d

Status: REOPENED → RESOLVED

Closed: 1 year ago → 1 year ago

status-firefox115: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 115 Branch

Dianna Smith [:diannaS]

Updated

1 year ago

Regressions: 1840002

Mathew Hodson

Updated

1 year ago

Blocks: 1817997

Mathew Hodson

Updated

1 year ago

No longer regressions: 1840002