by Durin
Last Updated August 14, 2019 06:20 AM

I was reading about the normal approximation to the binomial distribution, and I don't understand how it works in cases where, for example, p = 0.3, where p is the probability of success.

Most websites say that the normal approximation to the binomial distribution works well if the mean is greater than 5, i.e. np > 5. But I am unable to find where this empirical rule comes from.

If n is quite large and the probability of success is 0.5, then I agree that the normal approximation to the binomial distribution is going to be quite accurate. But what about other cases? How can one say np > 5 is the condition for using the normal approximation?

The condition $np > 5$ is not **the** condition, merely a rough estimate of what should be true in order for the normal distribution approximation to be "good enough".

From Wikipedia:

One rule is that both $x=np$ and $n(1 − p)$ must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants.

There you can also find a list of other "rules".
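To see what "good enough" means in numbers, here is a minimal sketch that compares the exact binomial CDF with its normal approximation and reports the largest gap. The function names and the use of a continuity correction (evaluating the normal CDF at k + 0.5) are my own choices for illustration, not something the rule of thumb prescribes.

```python
import math


def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x, mu, sigma):
    """CDF of a Normal(mu, sigma^2) distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_cdf_error(n, p):
    """Largest |exact CDF - normal CDF| over k = 0..n (continuity-corrected)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    exact = 0.0
    worst = 0.0
    for k in range(n + 1):
        exact += binom_pmf(k, n, p)               # running exact CDF
        approx = normal_cdf(k + 0.5, mu, sigma)   # assumption: continuity correction
        worst = max(worst, abs(exact - approx))
    return worst


# A few (n, p) pairs on either side of the np > 5 threshold.
for n, p in [(10, 0.5), (17, 0.3), (50, 0.3), (50, 0.1), (500, 0.01)]:
    print(f"n={n:4d} p={p:.2f} np={n * p:6.2f} max error={max_cdf_error(n, p):.4f}")
```

Whether a given maximum error is acceptable depends on the application, which is why the threshold varies from source to source.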

So I did some experiments. I think the np > 5 condition is not correct at all. It depends on the excess kurtosis of the given binomial distribution. If it is mesokurtic, then the approximation will give accurate results.

Check the following table.

For n=11 and p=0.5 the excess kurtosis will be around -0.18. That is platykurtic, so I don't think the approximation will give accurate results, even though n*p=5.5 > 5. The table shows results that illustrate what I am trying to say.
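For reference, the excess kurtosis of a binomial has the standard closed form $(1 - 6p(1-p)) / (np(1-p))$, and a normal distribution has excess kurtosis 0. A small sketch (the function name is hypothetical) that evaluates it:

```python
def binomial_excess_kurtosis(n, p):
    """Excess kurtosis of Binomial(n, p): (1 - 6pq) / (npq), with q = 1 - p."""
    q = 1 - p
    return (1 - 6 * p * q) / (n * p * q)


# The n=11, p=0.5 case discussed above, plus a larger n for comparison.
for n, p in [(11, 0.5), (100, 0.5), (100, 0.1)]:
    print(f"n={n:3d} p={p:.1f} excess kurtosis={binomial_excess_kurtosis(n, p):+.4f}")
```

For n=11 and p=0.5 this evaluates to about -0.18, and it shrinks toward 0 as n grows. Whether kurtosis alone decides the quality of the approximation is this answer's hypothesis, not an established rule.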

For $np$ and $nq$ (where $q = 1-p$) to increase with $p$ fixed, $n$ must increase. $n$ is the number of independent trials, so it should be clear that the more independent trials are made, the more accurate the approximation is. The probability histogram approximates a normal curve pretty accurately when $np$ and $nq$ are greater than or equal to $5$. However, bigger is better! If $np$ and $nq$ were greater than $10$, the probability histogram would approximate the normal curve even more closely.

Here's how I'm thinking of these conditions.

Note that if a random variable is **truncated** near its mean (i.e. the absolute value of the z score of the truncated value isn't too large) then the random variable's distribution will be skewed away from the truncated value and toward its mean.

That being said, observe that a binomial random variable X~B(*n*,*p*) is truncated at 0 and *n*. The condition *np*>10 pushes the distribution away from the truncation at 0, while *n*(1-*p*)>10 pushes the distribution away from the truncation at *n*. This helps ensure that the distribution of X won't be undesirably skewed in either direction.

Think of *np* and *n*(1-*p*) as the expected number of successes and failures in a series of *n* trials, respectively.
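One way to put a number on "undesirably skewed" is the binomial's skewness, which has the standard closed form $(1 - 2p) / \sqrt{np(1-p)}$ and is 0 for a normal distribution. A rough sketch, with a hypothetical function name:

```python
import math


def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(np(1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))


# Small np sits close to the truncation at 0; larger np moves away from it.
for n, p in [(20, 0.1), (100, 0.1), (20, 0.5), (100, 0.9)]:
    print(f"n={n:3d} p={p:.1f} np={n * p:5.1f} n(1-p)={n * (1 - p):5.1f} "
          f"skewness={binomial_skewness(n, p):+.3f}")
```

The sign shows the direction of the longer tail: positive when the mass is crowded toward 0, negative when it is crowded toward *n*.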

Hope this helps.

The mean of a binomial is $\mu = np$, and its standard deviation is $\sqrt{np(1-p)}$.

For the normal approximation to be reasonable, **$\mu$ should be at least 3 standard deviations away from both 0 and n.**

Therefore:

$\mu - 3\sqrt{np(1-p)} > 0 \hspace{2cm}$ and $\hspace{2cm}\mu + 3\sqrt{np(1-p)} < n$

From that starting point, algebraically you can get to the inequalities:

$np>9(1-p)\hspace{2cm}$ and $\hspace{2cm}n(1-p)>9p$
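The algebra can be spelled out as follows, assuming $n > 0$ and $0 < p < 1$ so that both sides are positive and squaring preserves the inequality. Substituting $\mu = np$ into the first condition:

$$
\begin{aligned}
np - 3\sqrt{np(1-p)} > 0
&\iff np > 3\sqrt{np(1-p)} \\
&\iff n^2p^2 > 9\,np(1-p) \\
&\iff np > 9(1-p)
\end{aligned}
$$

and into the second:

$$
\begin{aligned}
np + 3\sqrt{np(1-p)} < n
&\iff n(1-p) > 3\sqrt{np(1-p)} \\
&\iff n^2(1-p)^2 > 9\,np(1-p) \\
&\iff n(1-p) > 9p
\end{aligned}
$$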

To satisfy these inequalities, as n gets larger, p has a wider range. Or you could also say the closer p is to 0.5, the smaller n you can use.

Using n=10 (for example):

$0.474<p<0.526$

As n gets larger, p does not have to be so close to 0.5. For n = 100,

$0.0826<p<0.9174$
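These ranges follow from solving the two inequalities for p, which gives $9/(n+9) < p < n/(n+9)$. A short sketch that reproduces the numbers (the function name is made up for illustration):

```python
def three_sigma_p_range(n):
    """Open interval of p for which np > 9(1-p) and n(1-p) > 9p both hold."""
    lower = 9 / (n + 9)   # from np > 9(1 - p)
    upper = n / (n + 9)   # from n(1 - p) > 9p
    return lower, upper


for n in (10, 100, 1000):
    lo, hi = three_sigma_p_range(n)
    print(f"n={n:5d}: {lo:.4f} < p < {hi:.4f}")
```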

Remarkably, even with p = 0.9, the mean will be at least 3 standard deviations away from both 0 and n once n > 81, and so certainly for n > 100.

This relates to calculating np and n(1-p): if both are greater than 5, these inequalities are usually satisfied. However, something like n=15, p=0.65 does not work (both products exceed 5, but $n(1-p) = 5.25$ is less than $9p = 5.85$), so some textbooks say np>9.

This condition does not guarantee that the binomial will fit a normal distribution, only that the mean will not sit too close to 0 or n.
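The two checks can be compared side by side. This is only a sketch: both function names and the default threshold of 5 are assumptions taken from the discussion here, not from any library.

```python
import math


def passes_np_rule(n, p, threshold=5):
    """Rule of thumb: both np and n(1-p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def passes_three_sigma_rule(n, p):
    """Mean is more than 3 standard deviations away from both 0 and n."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mu - 3 * sigma > 0 and mu + 3 * sigma < n


for n, p in [(15, 0.65), (100, 0.9), (10, 0.5)]:
    print(f"n={n:3d} p={p:.2f} np-rule={passes_np_rule(n, p)} "
          f"3-sigma={passes_three_sigma_rule(n, p)}")
```

For n=15, p=0.65 the first check passes while the second fails, which is the mismatch described above.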
