Count the number of valid rows in R

I have a counting exercise in which some rows contain NA.

Here is a sample dataset that has cat and id variables.

df <- data.frame(cat=c("A","A","A","B","B","C","C","C","C"),
                 id = c(11,12,13,NA,NA,21,21,23,24))

> df
  cat id
1   A 11
2   A 12
3   A 13
4   B NA
5   B NA
6   C 21
7   C 21
8   C 23
9   C 24

I would like to count the number of observations grouped by cat.

I tried this, but n() counts every row, so the NA rows for category B were counted too.

library(dplyr)

# n() returns the number of rows in each group, NA or not
df.1 <- df %>%
  group_by(cat) %>%
  dplyr::summarise(freq = n())

> df.1
# A tibble: 3 x 2
  cat    freq
  <fct> <int>
1 A         3
2 B         2
3 C         4

How can I leave the NAs out of the count and simply get NA in that cell, as below?

> df.1
# A tibble: 3 x 2
  cat    freq
  <fct> <int>
1 A         3
2 B        NA
3 C         4

CodePudding user response:

Use sum() on a logical vector (!is.na(id) or complete.cases(id)) to count the non-NA rows, then replace the 0 counts with NA using na_if():

library(dplyr)
df %>% 
 group_by(cat) %>% 
 summarise(freq = na_if(sum(!is.na(id)), 0))

-output

# A tibble: 3 × 2
  cat    freq
  <chr> <int>
1 A         3
2 B        NA
3 C         4
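
To see how this behaves when a category has both valid and missing ids, here is a minimal sketch. The data frame df2 and its category "D" are hypothetical and not part of the question. Since TRUE counts as 1 and FALSE as 0, sum(!is.na(id)) is the number of non-NA ids in each group, and na_if() only touches groups whose count is exactly 0.

```r
library(dplyr)

# Hypothetical data (not from the question): "D" has one valid id and one NA
df2 <- data.frame(cat = c("A", "A", "B", "B", "D", "D"),
                  id  = c(11, 12, NA, NA, 31, NA))

res <- df2 %>%
  group_by(cat) %>%
  # count the non-NA ids per group, then turn a count of 0 into NA
  summarise(freq = na_if(sum(!is.na(id)), 0L))

res
```

With this data, A should come out as 2, B as NA (no valid ids), and D as 1, because only the single non-NA row in D is counted.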

CodePudding user response:

Here is an alternative approach using an ifelse statement:

library(dplyr)

df %>% 
  group_by(cat) %>% 
  mutate(freq = ifelse(!is.na(id), row_number(), NA)) %>% 
  summarise(freq=max(freq))

  cat    freq
  <chr> <int>
1 A         3
2 B        NA
3 C         4
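
One caveat about the ifelse version above: it matches the desired output for the sample data because each category is either entirely NA or has no NA at all. If a category mixed the two, max() would return NA for it (the maximum of a vector containing NA is NA unless na.rm = TRUE), and row_number() gives a row position rather than a count of valid rows. A minimal sketch of a variant that handles mixed categories, using hypothetical data (df2 and category "D" are not in the question):

```r
library(dplyr)

# Hypothetical data (not from the question): "D" has an NA followed by a valid id
df2 <- data.frame(cat = c("A", "A", "B", "B", "D", "D"),
                  id  = c(11, 12, NA, NA, NA, 31))

res2 <- df2 %>%
  group_by(cat) %>%
  # NA when every id in the group is missing, otherwise the number of non-NA ids
  summarise(freq = if (all(is.na(id))) NA_integer_ else sum(!is.na(id)))

res2
```

Here a plain if/else is enough because the condition is a single TRUE or FALSE per group. D should come out as 1, whereas the mutate() plus max() version would give NA for it.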