Test fit of discrete data distribution to normal distribution

by Douglas Lovell   Last Updated August 14, 2019 00:19 AM

Here are some grades. (They aren't test scores. They aren't a sample.) I want to test whether a normal distribution derived from the mean and variance of the grades provides a good fit for the grades. What other distribution might fit is, for now, an orthogonal issue. The question is whether the normal distribution provides a valid model.

> grades
  [1] NA 75 70 70 80 75 85 85 75 85 75 80 85 80 75 70 70 85 90 90 NA 85 NA 75 80
 [26] 80 85 80 85 85 75 NA 75 70 80 65 80 75 70 80 85 65 85 75 75 85 75 60 85 80
 [51] 85 85 70 70 80 75 80 80 NA 80 75 85 80 80 55 75 60 90 80 75 80 70 85 75 80
 [76] 75 80 75 75 70 85 85 80 80 75 70 NA 75 70 75 70 65 85 70 70 80 80 85 85 70
[101] 75 70 75 75 65 70 80 75 65 65
> t <- table(grades)
> t
grades
55 60 65 70 75 80 85 90
 1  2  6 18 27 25 22  3

The grades take discrete values from 5 to 100 in increments of 5. Because of that, a chi-squared test might be appropriate. First, I compute what I take to be the normal-distribution probability of each observed grade value:

> p <- dnorm(seq(55,90,5), mean(grades, na.rm=TRUE), sd(grades, na.rm=TRUE))
> p
[1] 0.0004433536 0.0031786310 0.0136901824 0.0354207364 0.0550535301
[6] 0.0514034173 0.0288322176 0.0097150119

Why are the values of p so small? Why do they not add up to one?
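One possible explanation, sketched here rather than asserted: dnorm() returns the height of the density curve at a point, not the probability of a value, so the heights have no reason to sum to one. The sketch below rebuilds the non-missing grades from the frequency table above and compares the density heights with bin probabilities taken from pnorm(). The bin edges halfway between adjacent grades, the open-ended outer bins, and the names values, counts, dens, breaks and p_bin are assumptions made for illustration.

```r
# Rebuild the non-missing grades from the frequency table in the question.
values <- seq(55, 90, 5)
counts <- c(1, 2, 6, 18, 27, 25, 22, 3)
grades <- rep(values, counts)

m <- mean(grades)
s <- sd(grades)

# dnorm() gives density heights at each grade, not probabilities.
dens <- dnorm(values, m, s)

# Assumption: each grade g stands for the interval (g - 2.5, g + 2.5].
# The two outermost bins are extended to -Inf and Inf so that the
# bin probabilities cover the whole real line.
breaks <- c(-Inf, seq(57.5, 87.5, 5), Inf)
p_bin <- diff(pnorm(breaks, m, s))

print(sum(dens))      # well below 1
print(sum(dens) * 5)  # roughly 1, because 5 is the bin width
print(sum(p_bin))     # 1, up to floating-point error
```

Under these assumptions, p_bin could be passed to chisq.test() as p without rescale.p, since it already sums to one.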

> tf <- as.data.frame(t)
> tf
  grades Freq
1     55    1
2     60    2
3     65    6
4     70   18
5     75   27
6     80   25
7     85   22
8     90    3
> chisq.test(tf$Freq, p=p, rescale.p=TRUE)

    Chi-squared test for given probabilities

data:  tf$Freq
X-squared = 7.0452, df = 7, p-value = 0.4242

Warning message:
In chisq.test(tf$Freq, p = p, rescale.p = TRUE) :
  Chi-squared approximation may be incorrect

Why does the chisq.test function produce the warning?
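A likely cause, offered as a sketch: chisq.test() issues this warning when any expected count falls below 5, because the chi-squared approximation is unreliable for sparse cells. The code below reproduces the rescaled probabilities from the call above and prints the expected counts. The Monte Carlo p-value at the end is one alternative that does not rely on the approximation; the seed and B = 10000 are arbitrary choices, and the variable names are hypothetical.

```r
values <- seq(55, 90, 5)
counts <- c(1, 2, 6, 18, 27, 25, 22, 3)
grades <- rep(values, counts)

# Same probabilities as in the question; dividing by the sum is what
# rescale.p = TRUE does internally.
p <- dnorm(values, mean(grades), sd(grades))
p <- p / sum(p)

# Expected counts under the hypothesised probabilities.
expected <- sum(counts) * p
print(round(expected, 2))

# Grade values whose expected count is below 5.
print(values[expected < 5])

# Monte Carlo p-value: simulated from the multinomial null instead of
# using the chi-squared approximation.
set.seed(1)
mc <- chisq.test(counts, p = p, simulate.p.value = TRUE, B = 10000)
print(mc$p.value)
```

Merging the sparse tail cells into a neighbouring cell is another common way to raise the expected counts, at the cost of coarser bins.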

I need numbers, because I have a lot of these data sets and want fit results for a large sample of them. Visual inspection isn't proof. The closest relevant question I've found is "Testing Whether a Binomial Distribution Fits Data" (https://stats.stackexchange.com/a/140576/59460).
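For running the same test over many data sets, a minimal sketch might look like the following. The helper normal_fit_p() is hypothetical, not an existing R function. It assumes grades are multiples of width, puts bin edges halfway between adjacent grades, leaves the outer bins open-ended, and uses a simulated p-value. The two data sets are made up purely to show the call.

```r
# Hypothetical helper: p-value for a chi-squared fit of discrete grades
# to a normal distribution with the sample mean and standard deviation.
normal_fit_p <- function(grades, width = 5, B = 2000) {
  g <- grades[!is.na(grades)]
  # Every possible grade between the observed minimum and maximum.
  values <- seq(min(g), max(g), by = width)
  counts <- as.vector(table(factor(g, levels = values)))
  # Bin edges halfway between grades; outer bins are open-ended.
  breaks <- c(-Inf, head(values, -1) + width / 2, Inf)
  p <- diff(pnorm(breaks, mean(g), sd(g)))
  chisq.test(counts, p = p, simulate.p.value = TRUE, B = B)$p.value
}

set.seed(42)
# Made-up data sets, for illustration only.
datasets <- list(
  a = c(NA, 70, 75, 80, 80, 85, 75, 70, 65, 90, 75, 80),
  b = c(60, 65, 70, 70, 75, 75, 75, 80, 80, 85, NA, 90)
)
p_values <- sapply(datasets, normal_fit_p)
print(p_values)
```

The helper needs at least two distinct grade values per data set, since chisq.test() rejects a single cell.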


