r count the number of nonmissing entries per variable per ID-CodePudding

Suppose this is my dataset

ID      Creatinine   Albumin   Calcium
1       12.2         10.1      NA
1       10.3          8.3      0.33
2       NA           11.12     0.56
2       5.6           9.6      0.14
2       NA            NA       NA
3       4.2          10.16     0.67
3       NA           12.84     0.45
4       NA           NA        0.72

I am interested in creating a summary of count indicating the number of times a given variable was measured per ID, like this

ID      Creatinine   Albumin   Calcium
1       2            2         1    
2       2            2         2  
3       1            2         2
4       NA           NA        1

Any suggestion regarding this is much appreciated.

CodePudding user response：

I think there's a more concise base R way, but here's one with dplyr:

library(dplyr)
df %>%
  group_by(ID) %>%
  summarize(across(everything(), ~sum(!is.na(.))))


# A tibble: 4 × 4
     ID Creatinine Albumin Calcium
  <int>      <int>   <int>   <int>
1     1          2       2       1
2     2          1       2       2
3     3          1       2       2
4     4          0       0       1

CodePudding user response：

Here is a base R solution:

t(sapply(split(df[-1], f = df$ID), function(x) colSums(!is.na(x))))

Output

  Creatinine Albumin Calcium
1          2       2       1
2          1       2       2
3          1       2       2
4          0       0       1

Data

df <-
  structure(
    list(
      ID = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L),
      Creatinine = c(12.2, 10.3, NA, 5.6, NA, 4.2, NA, NA),
      Albumin = c(10.1, 8.3, 11.12,
                  9.6, NA, 10.16, 12.84, NA),
      Calcium = c(NA, 0.33, 0.56, 0.14,
                  NA, 0.67, 0.45, 0.72)
    ),
    class = "data.frame",
    row.names = c(NA,-8L)
  )

CodePudding user response：

data.table solution

library(data.table)
setDT(dt)

dt[ , lapply(.SD, function(x) sum(!is.na(x))), by = ID, .SDcols = !"ID"]

#    ID Creatinine Albumin Calcium
# 1:  1          2       2       1
# 2:  2          1       2       2
# 3:  3          1       2       2
# 4:  4          0       0       1

In case that you really want the zero counts to be displayed as NA

dt[ , lapply(.SD, function(x) { x <- sum(!is.na(x)); if (x == 0) x <- as.integer(NA); x }), by = ID, .SDcols = !"ID"]

#    ID Creatinine Albumin Calcium
# 1:  1          2       2       1
# 2:  2          1       2       2
# 3:  3          1       2       2
# 4:  4         NA      NA       1