    library(data.table)
    library(dplyr)

    set.seed(1)
    group <- rep(c("A", "B", "C", "D", "E"), 100)
    col <- c(rep("non", 250), rep("syn", 250))
    data <- sample(500)
    # data.table() keeps data numeric; cbind() would coerce every column to character
    df <- data.table(group, col, data)

    # Two-way ANOVA on the full data set
    fit <- aov(data ~ group * col, data = df)
    summary(fit)

    # Draw 10 rows from each group
    set.seed(1)
    sam <- df %>% group_by(group) %>% sample_n(10)

    plot(sam$data)
    plot(df$data)

I want to test whether the sampled subset (sam) is statistically different from the full data set (df). Would this involve an ANOVA? I would appreciate suggestions on how to approach this, because I think it is relevant to many kinds of analyses.
I have tried a linear discriminant analysis (LDA) in R, but this has limitations: as far as I can tell, it does not come with a statistical test in R that produces a p-value.
Similar to the ANOVA above, I would like to show whether the sampled individuals (sam) differ statistically by group and col from the actual data (df). Would this involve adding a column called "sampled", with a "T" or "F" in each row indicating whether or not that row was sampled? Would that make it easier to set up as an ANOVA?
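To make the "sampled" column idea concrete, here is a minimal sketch in base R. It makes several assumptions that are not in my code above: the rows are drawn by index so they can be flagged, the flag is a logical (TRUE/FALSE) rather than "T"/"F", and the names idx, m0, m1 and cmp are only illustrative. It compares a model without the flag to one that includes it, which is one possible way to get a single p-value and not necessarily the right test. Note that it compares the sampled rows with the unsampled remainder, not with the full data set.

```r
set.seed(1)

# Rebuild the example data with a numeric response
df <- data.frame(
  group = rep(c("A", "B", "C", "D", "E"), 100),
  col   = c(rep("non", 250), rep("syn", 250)),
  data  = sample(500)
)

# Draw 10 row indices per group, then flag those rows (assumed approach)
idx <- unlist(lapply(split(seq_len(nrow(df)), df$group), sample, size = 10))
df$sampled <- seq_len(nrow(df)) %in% idx

# Model without the flag vs. model with the flag and all its interactions
m0 <- lm(data ~ group * col, data = df)
m1 <- lm(data ~ group * col * sampled, data = df)

# F-test: do the terms involving 'sampled' explain additional variance?
cmp <- anova(m0, m1)
print(cmp)
```

A small p-value in the second row of cmp would suggest that the sampled rows differ from the rest in mean, either overall or within some group/col combination. It says nothing about differences in spread or distribution shape.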