Pairwise t test loop through dataframes contained in a list

Time:01-11

I have a very large dataframe, originaldf, which contains an Item column, a Condition column, and a measured Value column.

I want to perform a pairwise t test within item A, comparing the measured value within the condition groups. So I would like to see if for all observations pertaining to item A, there is a difference between the measured values of the control group, test group, and placebo group (Condition).

The first thing I did was to split the dataframe into a list, with one element per item, using the split function.

Listdf <- split(originaldf, originaldf$Item)

This worked and I got a list containing 82 elements with one dataframe corresponding to each item in the original dataframe.

I am now trying to run the pairwise.t.test function on each element of the list. I am relatively new to R and think that writing a loop for this, though inefficient, would help me understand what is going on in the background. I know there is also the option of using the lapply function. I tried this on Listdf with the following code, which I know is most likely much too simple but was worth a try.

lapply(Listdf, pairwise.t.test(Value, Condition))

However, I get the error Error in factor(g) : object 'Condition' not found. Not sure if there is a way to more specifically reference Condition so that it can be found. I've performed an individual pairwise.t.test on one of the items which worked with the following code.

pairwise.t.test(List$ItemA$Value, List$ItemA$Condition, p.adjust.method = "none")

However, I assume this would not work within the lapply function because I want it to perform the t.test for ItemA, ItemB, ItemC etc...

The loop I have tried so far is as follows:

for (i in Listdf) {
 pairwise.t.test(List$i$logAddedConstant, List$i$Condition, p.adjust = "none")
}

For this I get the error "Error in split.default(X, group) : first argument must be a vector". I believe this error relates to the splitting of the original dataframe. However, I don't understand why it would show up this late in the code, because the splitting of the dataframe worked without a problem.

I know I am probably missing something fundamental, but I am quite stumped and have tried multiple options to no avail. If anyone has another idea or suggestion I would be very grateful for the help. Please let me know if I should add some more information.

User response:

I made a very short example data.frame that is structured like your originaldf:

df <- data.frame(Item = c("A", "B", "C", "A", "B", "C"), 
                 Value=runif(6), 
                 Condition=c("Control","Control","Control", "Test", "Test", "Test"))

Listdf <- split(df, df$Item)
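The example above has only one observation per Item and Condition, so the tests run but cannot estimate a variance and will likely return NA p-values. As a minimal sketch for experimenting, the following builds a slightly larger toy dataset with three conditions, as in the question. The names toy and toy_list, the number of replicates, and the random values are assumptions made purely for illustration.

```r
# Hypothetical toy data: 3 items x 3 conditions x 5 replicates
set.seed(1)
toy <- expand.grid(Rep = 1:5,
                   Condition = c("Control", "Test", "Placebo"),
                   Item = c("A", "B", "C"),
                   stringsAsFactors = FALSE)
toy$Value <- rnorm(nrow(toy))

# split() needs the grouping vector itself, hence toy$Item rather than a bare Item
toy_list <- split(toy, toy$Item)

names(toy_list)          # one list element per item
sapply(toy_list, nrow)   # number of rows in each per-item data frame
```

Each element of toy_list is an ordinary data frame, so its columns can be accessed with $ just like those of the original.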

Using a simple for-loop

p <- list()
for (i in seq_along(Listdf)) {
  # run the test on the i-th per-item data frame and store the result
  p[[i]] <- pairwise.t.test(Listdf[[i]]$Value, Listdf[[i]]$Condition, p.adjust.method = "none")
}
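As for why the loop in the question failed: in for (i in Listdf), i is each data frame in turn, not a name or an index, and List$i looks for an element literally called "i". That lookup returns NULL, which is what triggers the "first argument must be a vector" error inside pairwise.t.test. A small sketch of the difference, using hypothetical toy data (toy, toy_list and res are names made up for this illustration):

```r
# Hypothetical toy data: 2 items, 2 conditions, 3 replicates each
set.seed(1)
toy <- data.frame(Item = rep(c("A", "B"), each = 6),
                  Condition = rep(c("Control", "Test"), times = 6),
                  Value = rnorm(12))
toy_list <- split(toy, toy$Item)

# `$i` looks for an element literally named "i", so this is NULL
i <- "A"
is.null(toy_list$i)

# [[ ]] evaluates i first, so this finds the data frame for item A
is.data.frame(toy_list[[i]])

# Looping over the names keeps each result labelled by its item
res <- list()
for (item in names(toy_list)) {
  d <- toy_list[[item]]
  res[[item]] <- pairwise.t.test(d$Value, d$Condition, p.adjust.method = "none")
}
names(res)
```

Looping over names instead of positions is a matter of taste; it just makes it easier to look up a result later, for example res$A.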

Using lapply

p <- lapply(Listdf, function(d) pairwise.t.test(d$Value, d$Condition, p.adjust.method = "none"))
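The lapply attempt in the question failed because its second argument has to be a function. Writing pairwise.t.test(Value, Condition) there makes R evaluate that call straight away, outside of any data frame, where no object called Condition exists. Wrapping the call in an anonymous function fixes this. With 82 items you will probably also want the p-values in one table; the following is one possible sketch of that, again on hypothetical toy data (toy, toy_list, res and pvals are illustrative names, not part of the original question).

```r
# Hypothetical toy data: 2 items, 3 conditions, 3 replicates each
set.seed(1)
toy <- data.frame(Item = rep(c("A", "B"), each = 9),
                  Condition = rep(c("Control", "Test", "Placebo"), times = 6),
                  Value = rnorm(18))
toy_list <- split(toy, toy$Item)

# The anonymous function receives one per-item data frame at a time
res <- lapply(toy_list, function(d) {
  pairwise.t.test(d$Value, d$Condition, p.adjust.method = "none")
})

# Each result stores a lower-triangular matrix of p-values in $p.value
res$A$p.value

# Stack the comparisons of all items into one long data frame
pvals <- do.call(rbind, lapply(names(res), function(item) {
  m <- res[[item]]$p.value
  idx <- which(!is.na(m), arr.ind = TRUE)   # positions of the filled cells
  data.frame(Item = item,
             Group1 = rownames(m)[idx[, "row"]],
             Group2 = colnames(m)[idx[, "col"]],
             p = m[idx])
}))
pvals
```

Note that p.adjust.method = "none" leaves the p-values uncorrected; whether a correction within or across items is appropriate depends on your analysis.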