Virus.Org  IT Security News and Information Portal. We offer the latest IT security news, updates, product reviews, books, and articles for all you IT security professionals out there. Enter and get the best IT security information on the Internet.

 

. Welcome to the Virus.Org Mailing List Archive  
.
.


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]


Birthday paradox
.

  • To: [EMAIL PROTECTED]
  • Subject: Birthday paradox
  • From: Ian Miller <[EMAIL PROTECTED]>
  • Date: Mon, 8 Nov 2004 11:21:02 +0000
  • In-reply-to: <[EMAIL PROTECTED]>
  • References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
  • Reply-to: [EMAIL PROTECTED]
  • Sender: [EMAIL PROTECTED]
.
 
This has been discussed on the list a number of times, both with respect to
"birthday attacks" and various possibilities for false positives due to
value clashes.

It has been noted that the sample size N for a 50% chance [P = 0.5] of a
clash in a domain of size D is very approximately sqrt(D).  A better
approximation can be derived as follows:-

If there are N samples, the number of pairs of samples is
     (N^2 - N)/2

Each pair has a 1/D probability of being a clash so the expected number of
clashes is :-
     (N^2 - N)/2D

Especially for large D, the number of clashes is approximately Poisson [1]
distributed so the probability of zero clashes is approximately:-
     1 - P = e^(-(N^2 - N)/2D)

If this probability is to be P, it follows that:
     ln(1/(1 - P)) = -(N^2 - N)/2D
  or
     N^2 - N - 2 * ln(1/(1-P)) * D = 0

This quadratic has the solution (its only positive solution)
     N = sqrt(D*2*ln(1/(1-P)) + 0.25) + 0.5
I practice the model is insufficiently precise to put any faith in the 0.25
and 0.5 constants for any values of D where they are significant, so you
might as well simplify it to:-
     N = sqrt(D*2*ln(1/(1-P)))

This is a very good approximation.  Experimentally it seems that for the (P
= 0.5) case, the following slight modification is more precise:-
     N = sqrt(D*2*ln(2)) + 0.269

If rounded up to the nearest integer the value returned is almost always
the smallest number that has a better than 50% chance of a clash.


Ian
--
[1] This diverges slightly from Poisson because the probability of the Nth
value clashing with the previous N-1 values, is not independent of whether
the N-1 values contain a clash.  I suspect that this is the reason the
experimental results diverge very slightly from the Poisson distributed
model.


--
Singularis Ltd, 32 Stockwell St, Cambridge, CB1 3ND
Tel:  +44 1223 525088	            Mobile: +44 777 5536663
Fax:  +44 870 0514333	 (e-mail preferred to Fax)





 
.
.
 
Copyright (c) Virus.Org 1997-2006.
All Trademarks Acknowledged.
Please view our Terms and Conditions and our Privacy Policy.