Closed
Bug 729952
Opened 13 years ago
Closed 13 years ago
Need a better hash function for atoms
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla13
People
(Reporter: bzbarsky, Assigned: justin.lebar+bug)
References
Details
Attachments
(5 files, 6 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
bzbarsky
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
I just tried measuring our atom hash collisions, and out of 11000 unique atomized strings close to 900 collided. That's sucky in the extreme. It would suck even more for purposes of bug 705877.
The atom hashing code uses nsCRT::HashCode(PRUnichar*) and nsCRT::HashCodeAsUTF8. When fixing it, we need to make the latter keep working as now.
I _think_, that the current behavior where nsCRT::HashCode(PRUnichar*) has the same behavior as nsCRT::HashCode(char*) when called on ASCII input does not need to be preserved if we change the hashing behavior in nsCRT.
Thre is some discussion of hashing functions and their performance and behavior in bug 290032. There is also a good comparative writeup at http://burtleburtle.net/bob/hash/doobs.html (note the links to SpookyHash and lookup3.c). What we use right now is what that writeup calls "Rotating", and as it notes it's clearly crappy.
WebKit's string hashing uses what the writeup above calls "Paul Hsieh's hash", fwiw.
Reporter | ||
Comment 1•13 years ago
|
||
Test methodology: applied this, ran for a bit while logging to a file, then quit the browser. Then pipe the log file through:
grep STR | sort | uniq | sed 's/.*|,[^,]*,//' | sort | uniq -c | sort -n
and cried at all the 5s visible in my terminal.
Comment 2•13 years ago
|
||
In bug 676071, I am importing SpookyHash into ANGLE. SpookyHash seems to be a good 128bit (can of course be truncated to smaller sizes) non-cryptographic public-domain hash function:
http://burtleburtle.net/bob/hash/spooky.html
Aside from that, the only other bit I can contribute to this discussion, is the following formula telling roughly how many hashes one can do with a perfect N-bit hash function while staying under a given probability P of collision:
2^(N/2) * sqrt(2*P)
This an approximation, valid for not-too-large P (say P < 0.2), of the formula given there:
http://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem
It can help decide whether a good 32bit hash function is enough or not, for a given use case. For example, if you are happy with a probability P=1e-4 of collision, a 32bit hash allows to safely do:
2^(32/2) * sqrt(2e-4) ~= 900 hashes
Comment 3•13 years ago
|
||
"keep working as now" means "UTF8 and UTF16 strings need to hash to the same value"? It's not clear to me why we have so many different functions to do the same thing, but can we please move this off of nsCRT while we're at it?
Do you have a set of realworld sample strings we can run tests against?
Reporter | ||
Comment 4•13 years ago
|
||
> "keep working as now" means "UTF8 and UTF16 strings need to hash to the same value"?
Yes, if they represent the same Unicode string. The atom table depends on this behavior. I believe it's the sole consumer of nsCRT::HashCodeAsUTF8, outside of tests.
> but can we please move this off of nsCRT while we're at it?
That would be just fine by me.
> Do you have a set of realworld sample strings we can run tests against?
I can attach the set of strings that came out of the procedure in comment 1. I basically started the browser, loaded gmail, web.mit.edu, some github pages, and a news site or two, then quit the browser to get that string set.
Reporter | ||
Comment 5•13 years ago
|
||
Data is UTF-8. Note that there is one non-ASCII string at the end there.
Assignee | ||
Comment 6•13 years ago
|
||
I'm a bit concerned about using a 128-bit hash in hot code when we need only 32 (or even 64) bits. But maybe it's not a big deal.
Let's run murmur and spooky over bz's data to see if one is obviously better in terms of collisions?
Assignee | ||
Comment 7•13 years ago
|
||
Reporter | ||
Comment 8•13 years ago
|
||
One other note: for atoms specifically, the typical string being hashed is short, not least because people minify their class names and the like. In particular, for the attached list I get numbers like so (first column is count, second column is length):
66 1
1797 2
1383 3
1279 4
1272 5
707 6
525 7
539 8
970 9
647 10
566 11
558 12
473 13
409 14
289 15
346 16
283 17
230 18
207 19
186 20
159 21
122 22
99 23
91 24
63 25
69 26
53 27
42 28
26 29
23 30
20 31
10 32
13 33
8 34
5 35
4 36
6 37
2 38
1 39
46 40
2 42
2 43
So for hashing performance we're more interested in low startup costs than in good behavior on long inputs. For hashing behavior we want functions that do a good job spreading a small number of bytes over a 32-bit int. For example, our current hash when given a 3-byte input produces a 32-bit number which has 0s in the top 16 bits. And if you restrict the set of possible byte values to the expected [a-zA-Z0-9-_] then I bet we end up using even fewer bits than that, which is why collision rates are so high.
Comment 9•13 years ago
|
||
How about: if a string is 4 bytes or less, just use the value of those bytes as the hash value? For longer strings, use one of the "complex" hash algorithms.
e.g.
"A" -> 0x41000000
"AB" -> 0x41420000 (ignore endianness, it doesn't matter)
"ABC" -> 0x41424300
"ABCD" -> 0x41424344
"ABCDE" -> spooky("ABCDE")
I'm pretty sure that the "PRUint32 HashCode(const char* str, PRUint32* resultingStrLen = nsnull); variant isn't important, and the rest pass in a length so we can switch on it without reading the data first...
Reporter | ||
Comment 10•13 years ago
|
||
That's worth testing, at least.
Assignee | ||
Comment 11•13 years ago
|
||
Heh, spookyhash has exactly one collision on bz's whole dataset.
Reporter | ||
Comment 12•13 years ago
|
||
That's a bit more like it! ;)
Assignee | ||
Comment 13•13 years ago
|
||
Actually, none of the three hash functions have any collisions on the test set. I was hashing the empty string twice.
Assignee | ||
Comment 14•13 years ago
|
||
Time to hash bz's atoms list on my mac:
spookyhash 234us
murmurhash3_x86_32 187us
cityhash 133us
The other versions of murmurhash (x86_128 and x64_128) are all slower than spookyhash.
Seeing as none of these has any collisions, I think we have a winner.
Comment 15•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #14)
> Time to hash bz's atoms list on my mac:
>
> spookyhash 234us
> murmurhash3_x86_32 187us
> cityhash 133us
What's the noise level in such ultra-short benchmarks? Are these results consistent across runs? I would run this benchmark 10k times and average (or, perhaps better: keep the best score)
Reporter | ||
Comment 16•13 years ago
|
||
One other question, just so I know how much to cry... What's the timing like for our existing hash function?
Assignee | ||
Comment 17•13 years ago
|
||
I'm running it 1000 times and taking the average. The noise is +/- 3ms or so.
I threw the code online in case anyone else wants to benchmark. You should just need to run |make|, although I make no guarantees.
I'll add the current hash function in the morning; I am, as they say here, le tired.
https://github.com/jlebar/hashtest
Comment 18•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #14)
> cityhash 133us
Keep in mind cityhash relies on the special x86 CRC instruction. If you don't have a processor that supports it (and many of our users do not, especially on mobile), it is nowhere near as fast.
Assignee | ||
Comment 19•13 years ago
|
||
These are the times for the non-CRC variant.
Assignee | ||
Comment 20•13 years ago
|
||
The current nsCRT hash is 64us, so twice as fast as cityhash.
Assignee | ||
Comment 21•13 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #3)
> can we please move this off of nsCRT while we're at it?
Where would you like it to live?
Comment 22•13 years ago
|
||
There are existing declarations in nsHashKeys.h that we could just use, or have a new header/cpp specifically for these hashing functions.
Assignee | ||
Comment 23•13 years ago
|
||
Waldo, how would you feel about checking this in to mfbt? I'd like this hash function to be accessible everywhere.
It's one third-party cpp file, plus a header file. Both have some Mozilla-specific modifications, but I don't want to convert the whole thing into Mozilla/mfbt style, because that will make it harder to take upstream changes.
Assignee | ||
Comment 24•13 years ago
|
||
I tried bsmedberg's suggestion of, if the string's length <= 4, use the chars cast as a uint32_t as the hashcode.
This gets rid of all but 11 of the collisions with the current hashcode. Unfortunately, it's slower (much slower for nscrt), at least in a naive implementation (I used a case statement for the length branch).
The following times are all with length <= 4 wrapping:
spookyhash 219us
cityhash 172us
murmurhash 219us
nscrt 205us
Here are the original (unwrapped) times again:
> spookyhash 234us
> murmurhash 187us
> cityhash 133us
> nscrt 64us
Assignee | ||
Comment 25•13 years ago
|
||
Assignee | ||
Comment 26•13 years ago
|
||
Assignee | ||
Comment 27•13 years ago
|
||
Comment on attachment 600450 [details] [diff] [review]
Part 1, WIPv1: Add cityhash to mfbt
Asking for feedback on the general idea of having third-party code in mfbt. I need to clean up the comments and so on in here before review.
Attachment #600450 -
Flags: feedback?(jwalden+bmo)
Assignee | ||
Comment 28•13 years ago
|
||
Attachment #600725 -
Flags: review?(jwalden+bmo)
Assignee | ||
Updated•13 years ago
|
Attachment #600449 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #600450 -
Attachment is obsolete: true
Attachment #600450 -
Flags: feedback?(jwalden+bmo)
Reporter | ||
Comment 29•13 years ago
|
||
Justin, what's the actual plan for HashCodeAsUTF8 given the APIs in that patch?
Assignee | ||
Comment 30•13 years ago
|
||
I wasn't sure if you needed a hash function that speaks unicode.
If so, I think I can rewrite cityhash to take a data transform function. I'm not sure exactly what you need, but I presume an arbitrary input transform would be sufficient? We'll have to see how fast I can make it.
Assignee | ||
Comment 31•13 years ago
|
||
Comment on attachment 600725 [details] [diff] [review]
Add cityhash
If I just add a multiply to the current nsCRT hashcode function, it has 0 collisions on bz's dataset and runs just as fast as the current code.
That seems like much simpler than rewriting CityHash as streaming code.
Attachment #600725 -
Flags: review?(jwalden+bmo)
Reporter | ||
Comment 32•13 years ago
|
||
Assignee | ||
Comment 33•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #32)
> > I wasn't sure if you needed a hash function that speaks unicode.
>
> See comment 3 and comment 4...
I see; I didn't understand that exchange earlier. :)
Multiplying by golden ratio when adding a char seems to work just fine -- in retrospect, using just rol and xor seems pretty nuts.
I'm working on converting all callers to this better hash, but I'll post a smaller patch asap.
Note that the atom table code currently hash all strings in utf16 space. I.e. we hash nsStrings "as they are", and nsCStrings are hashed using HashCodeAsUTF16 which uses a utf8 iterator to extract individual unicode characters and add their utf16 equivalent to the hash.
http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/nsCRT.cpp#278
Assignee | ||
Comment 35•13 years ago
|
||
I've been trying to reproduce bz's collisions list without much success.
> I basically started the browser, loaded gmail, web.mit.edu, some github pages, and a news site or
> two, then quit the browser to get that string set.
I've instrumented nsCRT's hashcode functions, plus the ones in nsTHashtable and pldhash. I'm printing the input and output of every hash function, and I see just a few (~10) hash collisions in total.
Reporter | ||
Comment 36•13 years ago
|
||
Hmm. I might have loaded the Facebook front page and twitter as well, possibly.....
In any case, the attached string list is definitely the set of strings that were in atoms when those atoms were destroyed in my build.
Assignee | ||
Comment 37•13 years ago
|
||
Oh, d'oh, I was missing a |sort| before one of my |uniq|s. Much better now.
Assignee | ||
Updated•13 years ago
|
Attachment #600725 -
Attachment description: Patch v1 → Add cityhash
Assignee | ||
Comment 38•13 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #37)
> Oh, d'oh, I was missing a |sort| before one of my |uniq|s. Much better now.
Or perhaps much worse? :)
In any case, I can verify that
> #define ADD_TO_HASHVAL(hashval, c) \
> hashval = (GOLDEN_RATIO * PR_ROTATE_LEFT32(hashval, 4)) ^ (c);
is much, much, much better than without the GOLDEN_RATIO call.
In these tables, e.g. "1097 3" means that we had 1097 three-way collisions.
Originally:
39662 1
1956 2
1097 3
891 4
266 5
15 6
1 7
With golden ratio multiply:
47195 1
1 2
(These are from two separate runs, so the strings aren't exactly the same. But I loaded the same sites, via session-restore. And you can see that the number of strings -- sum the first column -- is roughly the same.)
Assignee | ||
Comment 39•13 years ago
|
||
I intend to expand upon these functions in a followup bug, but this is the bare minimum required functionality to get something working.
Assignee | ||
Comment 40•13 years ago
|
||
Attachment #600963 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 41•13 years ago
|
||
I used the following to generate the collisions histogram:
dist/bin/firefox | grep XXX > hashes && cut -f '2-3' hashes | sort | uniq | cut -f 2 | sort | uniq -c | sort -n | sed 's/^ *//' | cut -f 1 -d ' ' | sort | uniq -c
Assignee | ||
Comment 42•13 years ago
|
||
Really get rid of all the debugging code.
Assignee | ||
Comment 43•13 years ago
|
||
Fix comment typo.
Assignee | ||
Updated•13 years ago
|
Attachment #600725 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #600962 -
Attachment is obsolete: true
Assignee | ||
Updated•13 years ago
|
Attachment #600963 -
Attachment is obsolete: true
Attachment #600963 -
Flags: review?(bzbarsky)
Assignee | ||
Updated•13 years ago
|
Attachment #600965 -
Flags: review?(bzbarsky)
Assignee | ||
Updated•13 years ago
|
Attachment #600966 -
Flags: review?(jwalden+bmo)
Assignee | ||
Updated•13 years ago
|
Attachment #600964 -
Attachment description: Debugging printf's. → Part 3 - Debugging printf's.
Assignee | ||
Comment 44•13 years ago
|
||
> but can we please move this off of nsCRT while we're at it?
I'm going to do this in bug 730895, if you don't mind, so bz's work isn't blocked on a large-ish code change.
Reporter | ||
Comment 45•13 years ago
|
||
Comment on attachment 600966 [details] [diff] [review]
Part 1, v1.1: Add a better hash function to mfbt. r?waldo
Should mozilla::GoldenRatioU32 be static? Or does it not matter?
Also, worth documenting why the rotation is by 5 bits, assuming there's a reason.
Reporter | ||
Comment 46•13 years ago
|
||
Comment on attachment 600965 [details] [diff] [review]
Part 2, v1.1: Use a better hash function in nsCRT, nsTHashtable, and pldhash.
r=me
Attachment #600965 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 47•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #45)
> Comment on attachment 600966 [details] [diff] [review]
> Part 1, v1.1: Add a better hash function to mfbt. r?waldo
>
> Should mozilla::GoldenRatioU32 be static? Or does it not matter?
I don't know how |static const| would be different from |const| in a header, actually. I guess it's a matter of the const being exported out of the compilation unit?
> Also, worth documenting why the rotation is by 5 bits, assuming there's a
> reason.
No good reason. An odd shift makes much more sense to me than an even shift; I'll comment that.
Also, I think I should change
(GoldenRatio * rol(hash, 5)) ^ value
to
GoldenRatio * (rol(hash, 5) ^ value).
With the former, the hash of a single char is that char's numeric value (assuming hash is initially 0), whereas with the latter, we get an arbitrary value. This would matter if the bloom filter doesn't multiply hash values by something before using them, since the bloom filter looks at the high 16 and low 16 bits of the hashcode separately.
To get even more entropy you should do:
uint64_t res = GoldenRatio * (rol(hash, 5) ^ value);
return (uint32_t)res ^ ((uint32_t)(res >> 32);
Which *should* compile to something basically as fast on x86, but I'm not sure that's true in practice, and I'm definitely not sure about ARM.
Reporter | ||
Comment 49•13 years ago
|
||
The bloom filter is using the pldhash hashcode, which is already multiplied by golden ratio on entry.
The good thing is that GoldenRation^N != 1 for small N...
Reporter | ||
Comment 50•13 years ago
|
||
Which may be worth documenting too.
Assignee | ||
Comment 51•13 years ago
|
||
I am not an expert on hash functions, but I'm hesitant to do something like that because:
* It's not clearly fast on all platforms
* I've never seen a commercial hash function do that
* The low 32 bits of |res| are random, but the number of non-zero high-order bits is determined by the magnitude of |rol(hash, 5) ^ value|. So if rol(hash, 5) and value happen to be equal in their high-order bits, |rol(hash, 5) ^ value| will be small, which automatically means that many of the top 32 bits of |res| will be 0.
If we wanted more entropy, we could multiply by a second constant. Here's the internal loop of murmurhash3's 32-bit hash (c1, c2, are constants), for example:
k1 *= c1;
k1 = ROTL32(k1,15);
k1 *= c2;
h1 ^= k1;
h1 = ROTL32(h1,13);
h1 = h1*5+0xe6546b64;
Assignee | ||
Comment 52•13 years ago
|
||
> The good thing is that GoldenRation^N != 1 for small N...
To be clear, this is because GoldenRatio is odd. We care about the Carmichael function [1]; in our case, lambda(2^32) == 2^30, so 2^30 is the smallest N such that a^N == 1, for any odd |a|.
I'm not sure how much group theory we want to include in this file... :)
[1] http://en.wikipedia.org/wiki/Carmichael_function
Assignee | ||
Comment 53•13 years ago
|
||
> The good thing is that GoldenRation^N != 1 for small N...
Anyway this isn't particularly relevant because we have that rotate() function in place, which means we're never actually doing GoldenRatio^N.
Comment 54•13 years ago
|
||
(In reply to Jonas Sicking (:sicking) from comment #48)
> Which *should* compile to something basically as fast on x86, but I'm not
> sure that's true in practice, and I'm definitely not sure about ARM.
You have to use a long multiply on ARM, which, e.g., on a Cortex A8 is three cycles instead of two, but has the same latency (4 cycles). But after that it's just one extra instruction for the xor.
(In reply to Justin Lebar [:jlebar] from comment #51)
> |rol(hash, 5) ^ value| will be small, which automatically means that many of
> the top 32 bits of |res| will be 0.
This shouldn't be a serious issue, since you're xor'ing with a bunch of bits that don't have this problem. I've seen it done in, e.g., Marsaglia's classic "MWC" PRNG (which has many of the same requirements as a hash function):
https://groups.google.com/group/sci.math.num-analysis/msg/eb4ddde782b17051
But that also multiplies by two constants. With just a single multiplicand, I'd worry that some repeating pattern in the interior of the product would cancel most of the bits.
However, before we worry about adding more entropy, how does the current approach fare in things like Bob Jenkin's frog-test?
(In reply to Justin Lebar [:jlebar] from comment #52)
> I'm not sure how much group theory we want to include in this file... :)
We want to include all the group theory (someone was telling me just the other week they didn't understand when it ever gets used in real life... and didn't believe me when I said "all the time").
Reporter | ||
Comment 55•13 years ago
|
||
> We care about the Carmichael function [1]; in our case, lambda(2^32) == 2^30, so 2^30 is
> the smallest N such that a^N == 1, for any odd |a|.
2^30 is the smallest N such that a^N = 1 no matter what |a| is. For any particular value of a, the smallest n such that a^n = 1 will be a factor of 2^30, but may be much smaller. In the extreme cases, 1^1 = 1 and (2^32 - 1)^2 = 1, say.
What really matters here is what the order of GoldenRatio is in the multiplicative group Z mod 2^32. A brute-force search suggests this order is 2^29, which is good enough for our purposes.
> I'm not sure how much group theory we want to include in this file... :)
I think saying that 2^29 is the smallest N for which GoldenRatio^N == 1 mod 2^32 is good enough.
Reporter | ||
Comment 56•13 years ago
|
||
> which means we're never actually doing GoldenRatio^N.
Well, we're doing it for N=2 if you do it to produce your hashcode and then pldhash does it on the input hashcode.
Assignee | ||
Comment 57•13 years ago
|
||
Heh, okay, I think we all are having too much fun with this (myself included!). :)
I propose we table this hash bikes--er, analysis until I have bug 730895 in hand. This will push all our hashing (or as much as I can muster) through the new algorithm. We can then analyze using all forms of amphibian tests. As is, this new hash function appears to be a clear win.
Assignee | ||
Comment 58•13 years ago
|
||
Attachment #600966 -
Attachment is obsolete: true
Attachment #601024 -
Flags: review?(jwalden+bmo)
Attachment #600966 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 59•13 years ago
|
||
See also http://isthe.com/chongo/tech/comp/fnv/, which is very similar to what we're using, but has theoretically well-chosen magic numbers.
Comment 60•13 years ago
|
||
(In reply to Boris Zbarsky from comment #55)
> What really matters here is what the order of GoldenRatio is in the
> multiplicative group Z mod 2^32.
For odd numbers, the order depends on the first bit transition after bit 1.
For N > 2, (k×2^N±1)² = (k²×2^2N±k×2^(N+1)+1) = k'×2^(N+1)+1 which is of the previous form for N'=N+1, so that squaring enough times will reach 1 mod 2^32. GoldenRatio is of this form where N = 3 so that 32-3=29 squarings suffice.
For N = 1 or 2, (k×2³±3)² = (k²×2^6±k×2^4+9) which is of the previous form for N'=3, so that 30 squarings are required in the worst case. (Put another way, GoldenRatio has two square roots mod 2^32.) I didn't have to consider odd powers because the lowest two set bits are always the same in any odd power.
Comment 61•13 years ago
|
||
(In reply to neil@parkwaycc.co.uk from comment #60)
> GoldenRatio has two square roots mod 2^32.
Of course it doesn't, it either has zero or four square roots.
Comment 62•13 years ago
|
||
Comment on attachment 601024 [details] [diff] [review]
Part 1, v1.2: Add a better hash function to mfbt. r?waldo
Review of attachment 601024 [details] [diff] [review]:
-----------------------------------------------------------------
::: mfbt/Attributes.h
@@ +320,5 @@
> # define MOZ_FINAL /* no support */
> #endif
>
> +/**
> + * MOZ_WARN_UNUSED_RESULT tells the compiler to emit a warning if a function's
Just curious, what motivated adding this? I can't think of cases where the result of a hash function would accidentally not be used.
@@ +322,5 @@
>
> +/**
> + * MOZ_WARN_UNUSED_RESULT tells the compiler to emit a warning if a function's
> + * return value is not used by the caller. This currently works with GCC and
> + * clang.
I seem to recall most documentation comments don't call out which compilers support various things, so I'd say don't do that. It's kind of don't-repeat-yourself as well, come to think of it.
@@ +337,5 @@
> +#if defined(__GNUC__)
> +#define MOZ_WARN_UNUSED_RESULT __attribute__ ((warn_unused_result))
> +#else
> +#define MOZ_WARN_UNUSED_RESULT
> +#endif
Indent the defines two spaces, like all the others in the file:
#if ...
# define ...
#else
# define ...
#endif
Please check for defined(__clang__) || defined(__GNUC__) to be explicit about clang support here, rather than relying on its pretending to be gcc.
::: mfbt/HashFunctions.h
@@ +1,1 @@
> +/* -*- Mode: C++; tab-width: 4; indent-tabs-mode: nil; c-basic-offset: 4 -*-
s/4/2/g
@@ +35,5 @@
> + * and other provisions required by the GPL or the LGPL. If you do not delete
> + * the provisions above, a recipient may use your version of this file under
> + * the terms of any one of the MPL, the GPL or the LGPL.
> + *
> + * ***** END LICENSE BLOCK ***** */
Whee, MPL2 is actually required for new files now, not that I knew it for all the mfbt headers added recently.
@@ +43,5 @@
> +#ifndef mozilla_HashFunctions_h_
> +#define mozilla_HashFunctions_h_
> +
> +#include "mozilla/Attributes.h"
> +#include "mozilla/Types.h"
If StandardInteger.h supplies everything you need -- I think it does, looks like you're just using uint32_t -- use that instead.
@@ +45,5 @@
> +
> +#include "mozilla/Attributes.h"
> +#include "mozilla/Types.h"
> +
> +#ifdef __cplusplus
Unless SpiderMonkey gets around to dropping C compatibility sooner than I think it will, we may come to regret this. I guess we can cross that bridge when we reach it.
@@ +56,5 @@
> +
> +inline uint32_t
> +RotateLeft32(uint32_t value, uint8_t bits)
> +{
> + return (value << bits) | (value >> (32 - bits));
Might as well MOZ_ASSERT(bits < 32); just to be safe.
@@ +93,5 @@
> + *
> + * evaluates to |value|.
> + *
> + * (Number-theoretic aside: Because any odd number |m| is relatively prime to
> + * our modulus (2^32), the list
Use ** for exponentiation to distinguish it from xor, particularly since these algorithms are mixing both concepts.
I keep thinking that if you're going to have a "number-theoretic aside", you can do better than this. :-P Like saying they form a group under modulo multiplication, or something -- except that's not actually what you're saying (although I think they come to the same thing if you squint a little). I dunno, I've thought about this longer than I should have. :-)
Attachment #601024 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 63•13 years ago
|
||
> Just curious, what motivated adding this? I can't think of cases where the result of a hash
> function would accidentally not be used.
I actually wrote this bug myself. It's
> PRUint32 hash = HashString(...)
> AddToHash(hash, some, other, stuff);
> return hash;
> Whee, MPL2 is actually required for new files now, not that I knew it for all the mfbt headers
> added recently.
You have no idea how happy I am to make this change. :)
Assignee | ||
Comment 64•13 years ago
|
||
> I keep thinking that if you're going to have a "number-theoretic aside", you can do better than
> this. :-P Like saying they form a group under modulo multiplication, or something -- except that's
> not actually what you're saying (although I think they come to the same thing if you squint a
> little).
I think we're saying is that for odd x, the group (x * Z/2^32) (that is, multiply each element in Z mod 2^32 by x) is isomorphic to Z/2^32.
Status: NEW → ASSIGNED
Assignee | ||
Comment 65•13 years ago
|
||
> Use ** for exponentiation to distinguish it from xor, particularly since these algorithms are mixing
> both concepts.
Gah, I didn't change this before pushing. Too many things going through my brain today. I've changed it in my patches to bug 729940.
Assignee | ||
Comment 66•13 years ago
|
||
Comment 67•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/69e8dd5e9201
https://hg.mozilla.org/mozilla-central/rev/73831fbed59f
https://hg.mozilla.org/mozilla-central/rev/5a9bd18c627a
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Comment 68•13 years ago
|
||
Try run for 284fb758b848 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=284fb758b848
Results (out of 279 total builds):
success: 214
warnings: 64
failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-284fb758b848
Comment 69•13 years ago
|
||
Try run for 284fb758b848 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=284fb758b848
Results (out of 283 total builds):
success: 218
warnings: 64
failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jlebar@mozilla.com-284fb758b848
You need to log in
before you can comment on or make changes to this bug.
Description
•