Why does the normal approximation to the binomial distribution use np > 5 as a condition?

by Durin   Last Updated August 14, 2019 06:20 AM

I was reading about the normal approximation to the binomial distribution, and I don't understand how it works in cases where, for example, p = 0.3, where p is the probability of success.

Most websites say that the normal approximation to the binomial distribution works well if the mean is greater than 5, i.e. np > 5. But I am unable to find where this empirical rule comes from.

If n is quite large and the probability of success is 0.5, then I agree that the normal approximation to the binomial distribution is going to be quite accurate. But what about other cases? How can one say np > 5 is the condition for using the normal approximation?



Answers 5


The condition $np > 5$ is not a strict condition, merely a rough rule of thumb for when the normal approximation is "good enough".

From Wikipedia:

One rule is that both $x=np$ and $n(1 − p)$ must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants.

There you can also find a list of other "rules".
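To get a feel for "how good an approximation one wants", here is a minimal sketch that measures the largest gap between the exact binomial CDF and its normal approximation. It assumes a continuity correction of 0.5, and the helper names `normal_cdf` and `max_cdf_error` are made up for this illustration. It only uses the Python standard library.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def max_cdf_error(n, p):
    """Largest absolute gap between the exact binomial CDF and the normal
    approximation (with continuity correction) over k = 0..n."""
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))
    exact = 0.0   # running value of P(X <= k)
    worst = 0.0
    for k in range(n + 1):
        exact += comb(n, k) * p**k * (1.0 - p)**(n - k)
        approx = normal_cdf(k + 0.5, mu, sigma)
        worst = max(worst, abs(exact - approx))
    return worst

# The p = 0.3 case from the question, for a few sample sizes.
p = 0.3
for n in (10, 17, 50, 200):
    print(f"n={n:4d}  np={n * p:6.1f}  n(1-p)={n * (1 - p):6.1f}  "
          f"max error={max_cdf_error(n, p):.4f}")
```

Whether a given maximum error is acceptable is up to you, which is why the threshold differs from source to source.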

naslundx
April 20, 2014 08:35 AM

So I did some experiments. I think the np > 5 condition is not correct at all. The accuracy depends on the excess kurtosis of the given binomial distribution. If the distribution is mesokurtic (excess kurtosis close to 0), then the approximation will give accurate results.

Check the following table: [image: table of results]

For n = 11 and p = 0.5, the excess kurtosis is about -0.18. That is platykurtic, so I don't think the approximation will give accurate results, even though np = 5.5 > 5. The results in the table support what I am trying to say.
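For reference, the excess kurtosis of a binomial has the standard closed form $(1 - 6p(1-p)) / (np(1-p))$. A small sketch of that formula follows; the function name `binomial_excess_kurtosis` is just a label chosen here.

```python
def binomial_excess_kurtosis(n, p):
    """Excess kurtosis of Binomial(n, p): (1 - 6pq) / (npq), with q = 1 - p."""
    q = 1.0 - p
    return (1.0 - 6.0 * p * q) / (n * p * q)

# The n = 11, p = 0.5 case discussed above, plus a larger n for comparison.
for n, p in [(11, 0.5), (100, 0.5), (11, 0.3), (100, 0.3)]:
    print(f"n={n:3d}  p={p}  excess kurtosis={binomial_excess_kurtosis(n, p):+.4f}")
```

For any fixed p the value shrinks toward 0 in proportion to 1/n, so the distribution becomes closer to mesokurtic as n grows.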

Durin
April 21, 2014 14:06 PM

For fixed $p$, $np$ and $nq$ (where $q = 1 - p$) can only increase if $n$ increases. $n$ is the number of independent trials, so it should be clear that the more independent trials are made, the more accurate the approximation is. The probability histogram approximates a normal curve fairly accurately when $np$ and $nq$ are both greater than or equal to $5$. However, bigger is better! If $np$ and $nq$ were greater than $10$, the probability histogram would approximate the normal curve even more closely.

K. Fields
April 15, 2015 02:43 AM

Here's how I'm thinking of these conditions.

Note that if a random variable is truncated near its mean (i.e. the absolute value of the z score of the truncated value isn't too large) then the random variable's distribution will be skewed away from the truncated value and toward its mean.

That being said, observe that a binomial random variable X~B(n,p) is truncated at 0 and n. The condition np>10 pushes the distribution away from the truncation at 0, while n(1-p)>10 pushes the distribution away from the truncation at n. This will assure us that the distribution of X won't be undesirably skewed in any direction.

Think of np and n(1-p) as the expected numbers of successes and failures in a series of n trials, respectively.
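One way to put a number on "undesirably skewed" is the standard binomial skewness formula $(1 - 2p) / \sqrt{np(1-p)}$. The sketch below uses a function name of my own choosing, `binomial_skewness`, and compares a case where np is small with one where np = 10.

```python
from math import sqrt

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(np(1 - p))."""
    return (1.0 - 2.0 * p) / sqrt(n * p * (1.0 - p))

# Same p = 0.1, but np = 2 in the first case and np = 10 in the second.
for n in (20, 100):
    p = 0.1
    print(f"n={n:3d}  np={n * p:4.1f}  skewness={binomial_skewness(n, p):+.3f}")
```

Positive skewness here means the long tail points away from the boundary at 0, which matches the picture of the distribution being pushed off the nearby bound.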

Hope this helps.

user429040
April 05, 2017 17:36 PM

The mean of a binomial is $\mu = np$. The standard deviation of a binomial is $\sigma = \sqrt{np(1-p)}$.

For the normal approximation to be reasonable, $\mu$ should be at least 3 standard deviations away from both 0 and n.

Therefore:

$\mu - 3\sqrt{np(1-p)} > 0$ and $\mu + 3\sqrt{np(1-p)} < n$

From that starting point, algebraically you can get to the inequalities:

$np > 9(1-p)$ and $n(1-p) > 9p$
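In case the algebra is not obvious, here is one way to spell it out, assuming $0 < p < 1$ so that both sides are positive and squaring is safe. For the lower bound, substitute $\mu = np$:

$$np > 3\sqrt{np(1-p)} \iff n^2p^2 > 9np(1-p) \iff np > 9(1-p)$$

For the upper bound, move $np$ to the right-hand side first:

$$3\sqrt{np(1-p)} < n(1-p) \iff 9np(1-p) < n^2(1-p)^2 \iff 9p < n(1-p)$$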

As n gets larger, a wider range of p satisfies these inequalities. Equivalently, the closer p is to 0.5, the smaller the n you can use.

Using n=10 (for example):

$0.474<p<0.526$

As n gets larger, p does not have to be so close to 0.5. For n = 100,

$0.0826<p<0.9174$

Remarkably, even with p = 0.9, if n > 100 then the mean will be at least 3 standard deviations away from both 0 and n (for p = 0.9 the inequalities only require n > 81).

This relates to checking np and n(1-p): if both are greater than 5, these inequalities are usually satisfied. However, something like n = 15, p = 0.65 does not work (n(1-p) = 5.25, but 9p = 5.85), so some textbooks say np > 9.

This condition does not guarantee that the binomial will fit a normal distribution, only that the mean will not sit too close to 0 or n.
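The numbers above can be checked with a short sketch. The function names are made up for this illustration, and the p range comes from solving the two inequalities for p, which gives $9/(n+9) < p < n/(n+9)$.

```python
def rule_of_five(n, p):
    """The common textbook check: np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5

def three_sigma_rule(n, p):
    """Mean at least 3 standard deviations from 0 and n:
    np > 9(1 - p) and n(1 - p) > 9p."""
    return n * p > 9 * (1 - p) and n * (1 - p) > 9 * p

def three_sigma_p_range(n):
    """Open interval of p satisfying the 3-sigma rule for a given n.
    np > 9(1 - p)  <=>  p > 9 / (n + 9)
    n(1 - p) > 9p  <=>  p < n / (n + 9)"""
    return 9 / (n + 9), n / (n + 9)

for n in (10, 100):
    low, high = three_sigma_p_range(n)
    print(f"n={n:3d}: {low:.4f} < p < {high:.4f}")

# The counterexample from above: passes the rule of five, fails the 3-sigma rule.
print(rule_of_five(15, 0.65), three_sigma_rule(15, 0.65))
```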

Joseph
August 14, 2019 06:09 AM
