Calculate the mean of specific rows-CodePudding

I have a data that contains 10 lines (samples) that are either Class1 or Class2 and columns.

I want to calculate the mean of rows Class1 and make the result in line 11 and calculate the mean of rows Class2 and make the result in a line 12.

The data:

Name   ClassType   Col1   Col2   Col3
A      Class1       10     50    12
B      Class2        7     20    12
C      Class1        8     12     8
D      Class1        9     14    17
E      Class2        3     15    14
F      Class2       10     15    16
G      Class2       12     22    15
H      Class1       10     28    10

The result I want:

       Name   ClassType    Col1   Col2   Col3     
        A      Class1       10     50    12        
        B      Class2        7     20    12
        C      Class1        8     12     8
        D      Class1        9     14    17
        E      Class2        3     15    14
        F      Class2       10     15    16
        G      Class2       12     22    15
        H      Class1       10     28    10
 Mean class1    NA          9.25   26    11.75
 Mean class2    NA          8      18    14.25

CodePudding user response：

Try this

library(dplyr)

calc_cols <- your_dataframe %>% group_by(ClassType) %>%
   summarise( Col1 = mean(Col1),
              Col2 = mean(Col2),
              Col3 = mean(Col3)) %>%
   mutate( ClassType = NA,
           Name = c("Mean class1", "Mean class2"))

your_new_dataframe <- rbind(your_dataframe,  calc_cols)

CodePudding user response：

You can summarize across columns. To build on @geometricfreedom's answer, here's how (I created a mini-version of your data set, using just the first four rows):

your_data <- data.frame(
  Name = c("A", "B", "C", "D"), 
  ClassType = c("Class1", "Class2", "Class1", "Class1"), 
  Col1 = c(10, 7, 8, 9), 
  Col2 = c(50, 20, 12, 14),
  Col3 = c(12, 12, 8, 17)
)

calc_cols <- your_data %>% 
  group_by(ClassType) %>% 
  summarize(
    across(Col1:Col3, mean)
  ) %>% 
  mutate(ClassType = NA, Name = c("Mean Class 1", "Mean Class 2"))

your_data <- rbind(your_data, calc_cols)

your_data

CodePudding user response：

Here is another tidyverse option, but doing everything in one pipe. We can first group by ClassType, then summarize over any columns that start with Col. Also, in the summarize statement, we can create the additional names that you want for the bottom rows (i.e., Mean Class 1 and Mean Class 2) and just return NA for ClassType. Then, we can use bind_rows to bind the original input dataframe to the new one (denoted by the .) from the previous step.

df %>%
  group_by(ClassType) %>%
  summarize(Name = paste0("Mean ", unique(ClassType)),
            across(starts_with("Col"), mean),
            ClassType = NA) %>%
  bind_rows(df, .)

Output

          Name ClassType  Col1 Col2  Col3
1            A    Class1 10.00   50 12.00
2            B    Class2  7.00   20 12.00
3            C    Class1  8.00   12  8.00
4            D    Class1  9.00   14 17.00
5            E    Class2  3.00   15 14.00
6            F    Class2 10.00   15 16.00
7            G    Class2 12.00   22 15.00
8            H    Class1 10.00   28 10.00
9  Mean Class1      <NA>  9.25   26 11.75
10 Mean Class2      <NA>  8.00   18 14.25

Data

df <- structure(
  list(
    Name = c("A", "B", "C", "D", "E", "F", "G", "H"),
    ClassType = c(
      "Class1",
      "Class2",
      "Class1",
      "Class1",
      "Class2",
      "Class2",
      "Class2",
      "Class1"
    ),
    Col1 = c(10L, 7L, 8L, 9L, 3L,
             10L, 12L, 10L),
    Col2 = c(50L, 20L, 12L, 14L, 15L, 15L, 22L, 28L),
    Col3 = c(12L, 12L, 8L, 17L, 14L, 16L, 15L, 10L)
  ),
  class = "data.frame",
  row.names = c(NA,-8L)
)