Home > Mobile >  How to count string numbers row-wisely and count percentage (only not empty columns taking into acco
How to count string numbers row-wisely and count percentage (only not empty columns taking into acco

Time:01-18

For the following dataframe:

df <- structure(list(test = c("A", "B", "C"), `2019` = c("true", "", 
"false"), `2020` = c("false", "true", "true"), `2021` = c("true", 
"false", "true"), `2022` = c("", "false", "false")), class = "data.frame", row.names = c(NA, 
-3L))

Out:

  test  2019  2020  2021  2022
1    A  true false  true      
2    B        true false false
3    C false  true  true false

I need to count number of trues from column 2019 to 2022, then calculate percentage of number of true from not NA or not empty values:

The expected result could be like this:

  test  2019  2020  2021  2022 true_pct
1    A  true false  true           0.67 # 2/3
2    B        true false false     0.33 # 1/3
3    C false  true  true false     0.50 # 2/4

The code below returns wrong answer since it take all year columns since there are empty string in the data, but I need to filter mask such as !is.na(df[,2:5]) | df[,2:5] != '':

df$count <- rowSums(df[-1] == "true")
df$not_na <- rowSums(!is.na(df[,2:5]))
# df$not_na <- rowSums(!complete.cases(df[,2:5]))
df$true_pct <- df$count/df$not_na

Out:

  test  2019  2020  2021  2022 count not_na true_pct
1    A  true false  true           2      4     0.50
2    B        true false false     1      4     0.25
3    C false  true  true false     2      4     0.50

Please share better solutions, thanks.

Update:

tibble [23 x 16] (S3: tbl_df/tbl/data.frame)
 $ name    : chr [1:23] "A" "B" "C" "D" ...
 $ 2020-11 : chr [1:23] NA NA NA NA ...
 $ 2020-12 : chr [1:23] "true" "true" "true" "true" ...
 $ 2021-01 : chr [1:23] NA NA "true" "false" ...
 $ 2021-02 : chr [1:23] NA NA "true" "true" ...
 $ 2021-03 : chr [1:23] "true" "true" "true" "true" ...

CodePudding user response:

You can use -

#To select only required columns
tmp <- df[-1]
#Total true values divided by total values which are not `NA` or empty.
df$true_pct <- rowSums(tmp == "true")/rowSums(tmp != "" & !is.na(tmp))
df

#  test  2019  2020  2021  2022  true_pct
#1    A  true false  true       0.6666667
#2    B        true false false 0.3333333
#3    C false  true  true false 0.5000000
  •  Tags:  
  • Related