Home > Enterprise >  Summarize the number of non-numeric items using another column
Summarize the number of non-numeric items using another column

Time:01-10

I have a dataframe with several columns which are of type character and then a numeric column the aim is to summarize the character columns using the numeric column.

This is easier to explain using some example data and desired outcome. I have a feeling there's a nice solution in dplyr. I tried ifelse statements but couldn't get that to work.

Example data

df <- data.frame(time = c(1,1,1,1,2,2,3,4,5,6,6,6), Cat1 = c("item1","item1","item1","item1",NA,"item1",NA,NA,"item1",NA,NA,"item1"),cat2 = c(NA,NA,NA,"item2","item2","item2",NA,NA,NA,NA,"item2","item2"))

df

   time  Cat1  cat2
1     1 item1  <NA>
2     1 item1  <NA>
3     1 item1  <NA>
4     1 item1 item2
5     2  <NA> item2
6     2 item1 item2
7     3  <NA>  <NA>
8     4  <NA>  <NA>
9     5 item1  <NA>
10    6  <NA>  <NA>
11    6  <NA> item2
12    6 item1 item2

Desired output:

Category No.of 1s No of 2s No.of others
Cat1 4 1 2
Cat2 1 2 2

CodePudding user response:

Reshape to 'long' format on the 'Cat' columns with pivot_longer, then replace the 'time' column where values are not 1, or 2 to 'others', get the frequency count and reshape to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
  pivot_longer(cols = starts_with("Cat"), names_to = 'Category', 
      values_drop_na = TRUE) %>% 
  mutate(time = str_c('No_of_', replace(time, !time %in% c(1, 2), 'others')) )%>% 
  count(Category, time) %>% 
  pivot_wider(names_from = time, values_from = n)

-output

# A tibble: 2 × 4
  Category No_of_1 No_of_2 No_of_others
  <chr>      <int>   <int>        <int>
1 Cat1           4       1            2
2 cat2           1       2            2
  •  Tags:  
  • Related