Finding statistical differences between a sampled subset and the full data set across two grouping factors in R

by Connor Murray   Last Updated October 10, 2019 04:19 AM

library(data.table)

set.seed(1)
group <- rep(c("A", "B", "C", "D", "E"), 100)
col <- c(rep("non", 250), rep("syn", 250))
data <- sample(500)
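# data.table() keeps 'data' numeric; cbind() would coerce every column to character
df <- data.table(group, col, data)

# Two-way ANOVA with interaction on the full data set
fit <- aov(data ~ group * col, data = df)
summary(fit)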

library(dplyr)

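set.seed(1)
# Randomly draw 10 rows from each group (50 rows in total)
sam <- df %>% group_by(group) %>% sample_n(10) %>% ungroup()

plot(sam$data)  # sampled values
plot(df$data)   # all 500 values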

I want to test whether the sampled subset is statistically different from the full data set. Would this involve an ANOVA? I would appreciate suggestions on how to approach this, since the same problem comes up in a number of analyses.
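A minimal sketch of one possible starting point, assuming the goal is simply to ask whether the sampled values look like the rest of the data. Because the sample is drawn from the full data set, it compares the sampled rows against the rows that were not sampled rather than against the whole set. The swapped-in techniques are a Wilcoxon rank-sum test (location shift) and a two-sample Kolmogorov-Smirnov test (any difference in distribution). The names idx, in_sample, out_sample, w and k are hypothetical, and the base-R sampling line is only a stand-in for group_by(group) %>% sample_n(10), so it will not pick exactly the same rows.

```r
# Rebuild the example data (data.frame used here so the sketch needs no packages)
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "E"), 100),
  col   = c(rep("non", 250), rep("syn", 250)),
  data  = sample(500)
)

# Hypothetical base-R stand-in for group_by(group) %>% sample_n(10):
# draw 10 row indices from each group
idx <- unlist(lapply(split(seq_len(nrow(df)), df$group), sample, size = 10))

in_sample  <- df$data[idx]   # values of the sampled rows
out_sample <- df$data[-idx]  # values of the rows that were not sampled

# Rank-based test of a location shift between the two sets of rows
w <- wilcox.test(in_sample, out_sample)
# Test of any difference between the two empirical distributions
k <- ks.test(in_sample, out_sample)

print(c(wilcoxon_p = w$p.value, ks_p = k$p.value))
```

With purely random data like this, neither test is expected to show a systematic difference; the point is only the shape of the comparison.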

I have experimented with linear discriminant analysis (lda() from the MASS package), but it is limited for this purpose because it does not seem to come with a statistical test that produces a p-value.
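If the appeal of LDA is asking "can the variables tell sampled rows from unsampled rows?", one alternative that does give a p-value is logistic regression with a likelihood-ratio test. This is a swapped-in technique, not LDA itself. The sketch assumes a 0/1 indicator (here called sampled, the name suggested in the question) and compares an intercept-only model against one using data and col. The names idx, fit0, fit1 and lrt are hypothetical, and the base-R sampling line stands in for sample_n(10).

```r
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "E"), 100),
  col   = c(rep("non", 250), rep("syn", 250)),
  data  = sample(500)
)

# Hypothetical stand-in for group_by(group) %>% sample_n(10): 10 row indices per group
idx <- unlist(lapply(split(seq_len(nrow(df)), df$group), sample, size = 10))

# 1 = row was sampled, 0 = row was not sampled
df$sampled <- as.integer(seq_len(nrow(df)) %in% idx)

# Null model (intercept only) vs. a model using the measured value and col.
# group is left out: every group contributes exactly 10 rows, so by
# construction it cannot predict membership in the sample.
fit0 <- glm(sampled ~ 1, family = binomial, data = df)
fit1 <- glm(sampled ~ data + col, family = binomial, data = df)

# Likelihood-ratio (chi-squared) test; the "Pr(>Chi)" column holds the p-value
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)
```

A small p-value would suggest that data and/or col help predict which rows were sampled, i.e. that the sample is not representative on those variables.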

As with the ANOVA above, I would like to test whether the sampled individuals (sam) differ statistically from the full data (df) across group and col. Would this involve adding a column called "sampled", with "T" or "F" in each row indicating whether that row was sampled? Would that make an ANOVA more straightforward?
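A minimal sketch of the indicator-column idea, assuming "T"/"F" is stored as a factor so that aov() treats it as a grouping variable. In the three-way model, the terms involving sampled ask whether sampled rows differ from unsampled rows overall, or differently within levels of group and col. A chi-squared test on the col-by-sampled table is added as a composition check; group is not tested that way because each group contributes exactly 10 rows by design. The names idx, fit, tab and ct are hypothetical, and the base-R sampling line stands in for sample_n(10).

```r
set.seed(1)
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "E"), 100),
  col   = c(rep("non", 250), rep("syn", 250)),
  data  = sample(500)
)

# Hypothetical stand-in for group_by(group) %>% sample_n(10): 10 row indices per group
idx <- unlist(lapply(split(seq_len(nrow(df)), df$group), sample, size = 10))

# The "sampled" indicator from the question, as a factor with levels F and T
df$sampled <- factor(ifelse(seq_len(nrow(df)) %in% idx, "T", "F"), levels = c("F", "T"))

# Three-way ANOVA; look at the rows of the table that involve 'sampled'
fit <- aov(data ~ group * col * sampled, data = df)
tab <- summary(fit)[[1]]
print(tab)

# Composition check: are 'non' and 'syn' rows sampled in similar proportions?
ct <- chisq.test(table(df$col, df$sampled))
print(ct)
```

Comparing sampled against unsampled rows, rather than against df as a whole, keeps the two groups disjoint, which is what the ANOVA and chi-squared tests assume.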

Tags: r, random, sample, anova

