I have a data frame as such:
Feature  ID  Sub  Value
A        T1  B1    5.87
B        T1  B2    3.99
C        T1  B3   12.57
A        T1  B2    9.22
B        T1  B3    7.89
C        T1  B1    4.76
A        T2  B1    4.56
B        T2  B2    9.26
C        T2  B2    7.44
What I want to do is run a one-factor ANOVA on this dataset, with "Sub" as the factor. I want to loop through each feature and each ID. Basically, I am calculating the variance of each feature within an ID, between levels of "Sub".
I wrote the code below, but it doesn't seem to work.
datalist = list()
for (i in unique(data1$Feature)) {
    for (j in unique(data1$ID)) {
        A1 <- summary(aov(data1$value ~ as.factor(data1$Sub), data = data1))
        datalist[[j]] <- A1
    }
}
big_data = do.call(rbind, datalist)
I end up with big_data as a matrix of 36 lists, and I am unable to access the ANOVA output. The result doesn't necessarily have to be a data frame; even a "write.csv()" call inside the loop that writes out the different outputs would do. Ultimately, I only need the "between" factor row of the ANOVA output to generate a plot, so if this can also be incorporated in the code, that would be a great help.
I am still a beginner in R, so any help is very much appreciated.
Thank you!
Answer:
There are several issues with the current setup:

1. You do not actually use i and j in your aov call, so every iteration of the nested for loops returns exactly the same results, computed on the entire data frame. Quick fix: subset the data frame by the i-th and j-th values.

    aov(value ~ Sub, data = subset(data1, Feature == i & ID == j))

2. You save list elements under the j values only, not under both i and j, so each pass of the outer loop overwrites the previous one and only the results from the last value of i survive. Quick fix: name the elements by both the i-th and j-th values.

    datalist[[paste0(i, "_", j)]] <- A1

3. You are attempting to rbind list objects, not matrices or data frames, because summary.aov returns a list of results. For your use case, calling str shows that the result is a list of 1:

    str(summary(aov(data1$value ~ as.factor(data1$Sub), data = data1)))
    List of 1
     $ :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables:
      ..$ Df     : num [1:2] ...
      ..$ Sum Sq : num [1:2] ...
      ..$ Mean Sq: num [1:2] ...
      ..$ F value: num [1:2] ...
      ..$ Pr(>F) : num [1:2] ...
     - attr(*, "class")= chr [1:2] "summary.aov" "listof"

Quick fix: index the first item.

    summary(aov(...))[[1]]
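Putting the three quick fixes together, the loop might look like the sketch below. The data frame here is made up for illustration, with two replicates per Sub level in every Feature/ID group; several groups in the question's sample contain a single row, which aov would likely fail to fit. The column is assumed to be named value in lower case, as in the question's code.

```r
# Made-up illustration data (NOT the asker's data):
# 2 features x 2 IDs x 3 Sub levels x 2 replicates = 24 rows
set.seed(1)
data1 <- expand.grid(
  Feature   = c("A", "B"),
  ID        = c("T1", "T2"),
  Sub       = c("B1", "B2", "B3"),
  replicate = 1:2,
  stringsAsFactors = FALSE
)
data1$value <- round(runif(nrow(data1), min = 3, max = 13), 2)

datalist <- list()
for (i in unique(data1$Feature)) {
  for (j in unique(data1$ID)) {
    # Fix 1: fit the model on the current Feature/ID subset only
    sub_df <- subset(data1, Feature == i & ID == j)
    # Fix 3: take the first element of the summary, which is a data frame
    A1 <- summary(aov(value ~ Sub, data = sub_df))[[1]]
    # Fix 2: key the result by both i and j so nothing is overwritten
    datalist[[paste0(i, "_", j)]] <- A1
  }
}

# Each element is now a data frame, so rbind stacks the ANOVA tables
big_data <- do.call(rbind, datalist)
```

With this layout, datalist[["A_T1"]] holds the ANOVA table for feature A within ID T1, and big_data stacks all of the tables row-wise.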
However, consider an apply-family solution with by (an object-oriented wrapper for tapply) to avoid the bookkeeping of initializing a list and assigning to it iteratively in nested for loops. Specifically, by splits a data frame by one or more grouping columns, runs a function on each subset, and returns a list-like object with one element per combination of group values. Also, consider a user-defined function that encapsulates all processing for each subset.
# USER-DEFINED FUNCTION
run_anova <- function(sub_df) {
    # RAW RESULTS
    anova_raw <- summary(aov(value ~ Sub, data = sub_df))[[1]]

    # CLEAN UP DATA WITH IDENTIFIERS
    anova_df <- data.frame(
        within(anova_raw, {Feature <- sub_df$Feature[1]; ID <- sub_df$ID[1]}),
        row.names = NULL,
        check.names = FALSE
    )

    return(anova_df)
}

datalist <- by(data1, data1[c("Feature", "ID")], run_anova)
big_data <- do.call(rbind, unname(datalist))
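
Since the question ultimately needs only the "between" row for a plot, one possible extension is sketched below. Because row.names = NULL drops the term labels (Sub, Residuals), this variant stores them in a column first. The function name run_anova_terms, the Term column, and the data are all hypothetical and made up for illustration; the bar plot of the between-Sub mean square is just one way the result could be displayed.

```r
# Made-up illustration data (NOT the asker's data), two replicates per cell
set.seed(1)
data1 <- expand.grid(
  Feature   = c("A", "B"),
  ID        = c("T1", "T2"),
  Sub       = c("B1", "B2", "B3"),
  replicate = 1:2,
  stringsAsFactors = FALSE
)
data1$value <- round(runif(nrow(data1), min = 3, max = 13), 2)

# Hypothetical variant of run_anova that keeps the term label as a column
run_anova_terms <- function(sub_df) {
  anova_raw <- summary(aov(value ~ Sub, data = sub_df))[[1]]
  data.frame(
    Feature = sub_df$Feature[1],
    ID      = sub_df$ID[1],
    Term    = trimws(rownames(anova_raw)),  # row names are space-padded
    anova_raw,
    row.names = NULL,
    check.names = FALSE
  )
}

datalist <- by(data1, data1[c("Feature", "ID")], run_anova_terms)
big_data <- do.call(rbind, unname(datalist))

# Keep only the between-factor (Sub) row of each Feature/ID group
between <- subset(big_data, Term == "Sub")

# One bar per Feature/ID group, showing the between-Sub mean square
barplot(
  between[["Mean Sq"]],
  names.arg = paste(between$Feature, between$ID, sep = "_"),
  ylab = "Between-Sub mean square"
)
```

The between data frame can also be passed to write.csv(between, ...) if a file output is preferred over a plot.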
