Well... it depends on what you define as enough.
In my case below it was enough. Read how I assessed it.
A hashcode is a 32-bit integer, so there are only 2^32 possible values, and different inputs can clearly map to the same one. But what are the chances of that happening?
I have one million users, each user owns 50 private items. An item is identified by a UUID.
I had these two conflicting goals:
(I) Represent each item as an integer instead of a UUID
(II) Avoid collisions. Any two items owned by the same user should resolve to different hashcodes.
What is enough: I could live with up to 10 users, out of a million, experiencing a collision. Most of these 10 users will never notice the collision. I assume the system will have other bugs with a higher probability than that.
|We're all unique|
Assessing uniqueness
One way to assess this is to compute the statistical probability of such an event. But I preferred a "proof" that any programmer could appreciate, even those without good statistics skills. Therefore I coded a simulation that simply tries it in practice:
Download from Gist
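For reference, a minimal sketch of such a simulation could look like the following. It assumes the items are random UUIDs created with `UUID.randomUUID()` and that the integer is the result of `UUID.hashCode()`. The class and method names are made up for illustration, and the code is not necessarily identical to the Gist.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical names; a sketch, not the original Gist.
class CollisionSimulation {

    // Returns true if any two hashcodes in the array are equal.
    static boolean hasDuplicate(int[] hashes) {
        Set<Integer> seen = new HashSet<>();
        for (int h : hashes) {
            if (!seen.add(h)) {
                return true; // add() returns false when the value is already present
            }
        }
        return false;
    }

    // Counts the users for whom at least two of their items share a hashcode.
    static int countUsersWithCollision(int users, int itemsPerUser) {
        int collided = 0;
        int[] hashes = new int[itemsPerUser];
        for (int u = 0; u < users; u++) {
            for (int i = 0; i < itemsPerUser; i++) {
                hashes[i] = UUID.randomUUID().hashCode();
            }
            if (hasDuplicate(hashes)) {
                collided++;
            }
        }
        return collided;
    }

    public static void main(String[] args) {
        int users = 1_000_000;
        int itemsPerUser = 50;
        for (int iteration = 0; iteration < 5; iteration++) {
            int collisions = countUsersWithCollision(users, itemsPerUser);
            System.out.println("Iteration " + iteration + ": Had " + collisions
                    + " collisions for " + users + " users");
        }
    }
}
```

Collisions are only checked among items of the same user, which matches goal (II): two different users are allowed to share a hashcode.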
The program comes back saying that collisions aren't really something to worry about:
Iteration 0: Had 0 collisions for 1000000 users
Iteration 1: Had 0 collisions for 1000000 users
Iteration 2: Had 0 collisions for 1000000 users
Iteration 3: Had 0 collisions for 1000000 users
Iteration 4: Had 0 collisions for 1000000 users
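For those who do want the statistical estimate as a cross-check, here is a rough sketch of the classic "birthday problem" calculation. It assumes the hashcodes are spread uniformly over all 2^32 values, which is an idealization; the class and method names are hypothetical. Under that assumption the chance that one user with 50 items sees a collision is about 1225 / 2^32, so one would expect roughly 0.29 affected users per million, well below the limit of 10.

```java
// Hypothetical names; assumes hashcodes are uniform over 2^32 values.
class BirthdayEstimate {

    static final double HASH_SPACE = 4294967296.0; // 2^32

    // Probability that at least two of `items` uniformly random
    // 32-bit hashcodes are equal.
    static double collisionProbability(int items) {
        // P(no collision) = product over i of (1 - i / 2^32).
        // Summing logs keeps precision for such tiny probabilities.
        double logNoCollision = 0.0;
        for (int i = 1; i < items; i++) {
            logNoCollision += Math.log1p(-i / HASH_SPACE);
        }
        return -Math.expm1(logNoCollision); // 1 - P(no collision)
    }

    // Expected number of users (out of `users`) with at least one collision.
    static double expectedUsersWithCollision(int users, int itemsPerUser) {
        return users * collisionProbability(itemsPerUser);
    }

    public static void main(String[] args) {
        System.out.println("Per-user collision probability: "
                + collisionProbability(50));
        System.out.println("Expected affected users per million: "
                + expectedUsersWithCollision(1_000_000, 50));
    }
}
```

With an expectation well under one user per million, seeing zero collisions in most runs of the simulation is what this estimate would predict.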