|
[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]
Birthday paradox |  |
- To: [EMAIL PROTECTED]
- Subject: Birthday paradox
- From: Ian Miller <[EMAIL PROTECTED]>
- Date: Mon, 8 Nov 2004 11:21:02 +0000
- In-reply-to: <[EMAIL PROTECTED]>
- References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
- Reply-to: [EMAIL PROTECTED]
- Sender: [EMAIL PROTECTED]
 |
| |
This has been discussed on the list a number of times, both with respect to
"birthday attacks" and various possibilities for false positives due to
value clashes.
It has been noted that the sample size N for a 50% chance [P = 0.5] of a
clash in a domain of size D is very approximately sqrt(D). A better
approximation can be derived as follows:-
If there are N samples, the number of pairs of samples is
(N^2 - N)/2
Each pair has a 1/D probability of being a clash so the expected number of
clashes is :-
(N^2 - N)/2D
Especially for large D, the number of clashes is approximately Poisson [1]
distributed so the probability of zero clashes is approximately:-
1 - P = e^(-(N^2 - N)/2D)
If this probability is to be P, it follows that:
ln(1/(1 - P)) = -(N^2 - N)/2D
or
N^2 - N - 2 * ln(1/(1-P)) * D = 0
This quadratic has the solution (its only positive solution)
N = sqrt(D*2*ln(1/(1-P)) + 0.25) + 0.5
I practice the model is insufficiently precise to put any faith in the 0.25
and 0.5 constants for any values of D where they are significant, so you
might as well simplify it to:-
N = sqrt(D*2*ln(1/(1-P)))
This is a very good approximation. Experimentally it seems that for the (P
= 0.5) case, the following slight modification is more precise:-
N = sqrt(D*2*ln(2)) + 0.269
If rounded up to the nearest integer the value returned is almost always
the smallest number that has a better than 50% chance of a clash.
Ian
--
[1] This diverges slightly from Poisson because the probability of the Nth
value clashing with the previous N-1 values, is not independent of whether
the N-1 values contain a clash. I suspect that this is the reason the
experimental results diverge very slightly from the Poisson distributed
model.
--
Singularis Ltd, 32 Stockwell St, Cambridge, CB1 3ND
Tel: +44 1223 525088 Mobile: +44 777 5536663
Fax: +44 870 0514333 (e-mail preferred to Fax)
 |
| |