I have a counting exercise in which some rows have NA values.
Here is a sample dataset with cat and id variables.
df <- data.frame(cat=c("A","A","A","B","B","C","C","C","C"),
id = c(11,12,13,NA,NA,21,21,23,24))
> df
cat id
1 A 11
2 A 12
3 A 13
4 B NA
5 B NA
6 C 21
7 C 21
8 C 23
9 C 24
I would like to count the number of observations grouped by cat.
I tried this, but it also counts the NA rows for category B.
library(dplyr)
df.1 <- df %>%
  group_by(cat) %>%
  dplyr::summarise(freq = n())
> df.1
# A tibble: 3 x 2
cat freq
<fct> <int>
1 A 3
2 B 2
3 C 4
How can I exclude the NAs from the count and simply put NA in that cell, as below?
> df.1
# A tibble: 3 x 2
cat freq
<fct> <int>
1 A 3
2 B NA
3 C 4
CodePudding user response:
Use sum on a logical vector (!is.na(id) or complete.cases(id)) to count the non-missing ids, then replace the 0 counts with NA using na_if:
library(dplyr)
df %>%
  group_by(cat) %>%
  summarise(freq = na_if(sum(!is.na(id)), 0))
-output
# A tibble: 3 × 2
cat freq
<chr> <int>
1 A 3
2 B NA
3 C 4
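To see how this behaves when only some ids in a group are missing, here is a small sketch. The category "D" and the name df2 are made up for illustration and are not part of the original data. It assumes dplyr is installed, and it uses 0L rather than 0 only to keep the comparison value an integer like the count.

```r
library(dplyr)

# Original data plus a hypothetical category "D" (an assumption for
# illustration) with one missing and one non-missing id
df2 <- data.frame(
  cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D"),
  id  = c(11, 12, 13, NA, NA, 21, 21, 23, 24, NA, 31)
)

res <- df2 %>%
  group_by(cat) %>%
  # count non-missing ids per group, then turn a count of 0 into NA
  summarise(freq = na_if(sum(!is.na(id)), 0L))

res
```

With this approach, a partially missing group such as "D" should be counted by its non-missing ids only, while a fully missing group such as "B" becomes NA.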
CodePudding user response:
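Here is an alternative approach using ifelse. Within each group, rows with a non-missing id get their row number and the rest get NA, so max() returns the group size when no id is missing and NA otherwise. Note that this yields NA for any group containing at least one NA, not only for groups where every id is NA: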
library(dplyr)
df %>%
  group_by(cat) %>%
  mutate(freq = ifelse(!is.na(id), row_number(), NA)) %>%
  summarise(freq = max(freq))
cat freq
<chr> <int>
1 A 3
2 B NA
3 C 4
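This gives the desired result on the sample data because every group is either complete or entirely NA. The sketch below checks what happens with a partially missing group. The category "D" and the names df2, res_ifelse and res_count are made up for illustration. The second pipeline is one possible variant, not the answer's method: it uses a plain if/else inside summarise to return NA only when every id in the group is missing.

```r
library(dplyr)

# Original data plus a hypothetical category "D" (an assumption for
# illustration) with one missing and one non-missing id
df2 <- data.frame(
  cat = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D"),
  id  = c(11, 12, 13, NA, NA, 21, 21, 23, 24, NA, 31)
)

# The ifelse/max approach: max() propagates NA, so any group that
# contains at least one NA id ends up as NA
res_ifelse <- df2 %>%
  group_by(cat) %>%
  mutate(freq = ifelse(!is.na(id), row_number(), NA)) %>%
  summarise(freq = max(freq))

# Variant: NA only when all ids in the group are missing,
# otherwise the number of non-missing ids
res_count <- df2 %>%
  group_by(cat) %>%
  summarise(freq = if (all(is.na(id))) NA_integer_ else sum(!is.na(id)))

res_ifelse
res_count
```

Which behaviour is right depends on whether a partially missing group should be reported as NA or counted by its non-missing rows.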
