I have a dataframe with several columns which are of type character and then a numeric column the aim is to summarize the character columns using the numeric column.
This is easier to explain using some example data and desired outcome. I have a feeling there's a nice solution in dplyr. I tried ifelse statements but couldn't get that to work.
Example data
df <- data.frame(time = c(1,1,1,1,2,2,3,4,5,6,6,6), Cat1 = c("item1","item1","item1","item1",NA,"item1",NA,NA,"item1",NA,NA,"item1"),cat2 = c(NA,NA,NA,"item2","item2","item2",NA,NA,NA,NA,"item2","item2"))
df
time Cat1 cat2
1 1 item1 <NA>
2 1 item1 <NA>
3 1 item1 <NA>
4 1 item1 item2
5 2 <NA> item2
6 2 item1 item2
7 3 <NA> <NA>
8 4 <NA> <NA>
9 5 item1 <NA>
10 6 <NA> <NA>
11 6 <NA> item2
12 6 item1 item2
Desired output:
| Category | No.of 1s | No of 2s | No.of others |
|---|---|---|---|
| Cat1 | 4 | 1 | 2 |
| Cat2 | 1 | 2 | 2 |
CodePudding user response:
Reshape to 'long' format on the 'Cat' columns with pivot_longer, then replace the 'time' column where values are not 1, or 2 to
'others', get the frequency count and reshape to 'wide' with pivot_wider
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = starts_with("Cat"), names_to = 'Category',
values_drop_na = TRUE) %>%
mutate(time = str_c('No_of_', replace(time, !time %in% c(1, 2), 'others')) )%>%
count(Category, time) %>%
pivot_wider(names_from = time, values_from = n)
-output
# A tibble: 2 × 4
Category No_of_1 No_of_2 No_of_others
<chr> <int> <int> <int>
1 Cat1 4 1 2
2 cat2 1 2 2
