Imagine you have a vector x:
x <- c("C", "A", "B", "B", "A", "D", "B", "B", "A", "A", "A", "A", "A", "D", "C", "A", "C", "A", "A", "C", "A", "A", "D", "A", "D", "A", "D", "A", "A", "D", "D", "B", "B", "A", "A", "C", "A", "A", "B", "B", "B", "B", "B", "B", "B", "A", "C", "A", "C", "B")
You can make a table using:
table(x)
# x
# A B C D
# 22 14 7 7
What if you only want the table to include certain values (e.g. 'A' and 'B'), or you want it to include values that might not exist in x?
This is my attempt:
tab_specific_values <- function(vector, values) `names<-`(rowSums(outer(values, vector, `==`)), values)
For example:
tab_specific_values(vector = x, values = c('A', 'B'))
# A B
# 22 14
Or, if we specify a value that does not exist in x:
tab_specific_values(vector = x, values = c('A', 'B', 'E'))
# A B E
# 22 14 0
Is there an existing dedicated function that does this, or do you have a better approach? I suspect my function tab_specific_values might not be the best approach.
CodePudding user response:
Convert to factor with certain levels, then table:
# values to tabulate
v <- c("A", "B", "E")
table(factor(x, levels = v))
# A B E
# 22 14 0
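If you want this as a reusable function, a minimal sketch could look like the following. The name tab_levels is made up here; it is not an existing base R function.

```r
# Hypothetical wrapper around the factor + table idea
tab_levels <- function(vector, values) {
  # Elements of `vector` that are not in `values` become NA in the factor,
  # and table() drops NA by default, so only the requested levels are counted.
  # Levels that never occur still appear, with a count of 0.
  table(factor(vector, levels = values))
}

x_small <- c("A", "B", "A", "C")  # illustrative data only
tab_levels(x_small, c("A", "B", "E"))
# A B E
# 2 1 0
```

The result is a regular table object, with the levels in the order given in values.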
CodePudding user response:
Benchmarking the three approaches on the x from the question, counting only 'C' and 'D':
library(microbenchmark)
microbenchmark(
  a = table(x, exclude = c('A', 'B')),
  b = table(factor(x, levels = c('C', 'D'))),
  c = tab_specific_values(vector = x, values = c('C', 'D')),
  times = 1000
)
Unit: microseconds
 expr     min       lq      mean  median       uq       max neval
    a 116.401 131.6505 177.20030 145.201 236.8010   604.701  1000
    b  49.302  60.0010  92.33422  66.501 109.4510 10974.101  1000
    c  13.301  20.1005  29.09018  24.201  36.3015   134.901  1000
When x has 1,000,000 elements:
Unit: milliseconds
 expr      min        lq      mean    median        uq      max neval
    a 119.3651 131.24110 142.63383 137.50385 144.07945 233.1265   100
    b  43.9441  48.18640  58.24316  54.75485  59.12390 129.5087   100
    c  48.9598  55.33825  67.03932  62.64145  65.93755 152.9490   100
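The code that built the 1,000,000-element x is not shown above. As a rough sketch, assuming it was something like a random sample of the same four letters (the seed and the names x_big and keep are my own choices), you can set up a similar vector and check that the three approaches agree before timing them:

```r
# Function from the question, repeated so this snippet runs on its own
tab_specific_values <- function(vector, values) `names<-`(rowSums(outer(values, vector, `==`)), values)

set.seed(1)  # arbitrary seed, only for reproducibility
x_big <- sample(c("A", "B", "C", "D"), size = 1e6, replace = TRUE)  # assumed test data
keep <- c("C", "D")

res_a <- table(x_big, exclude = c("A", "B"))
res_b <- table(factor(x_big, levels = keep))
res_c <- tab_specific_values(vector = x_big, values = keep)

# All three should give the same counts for 'C' and 'D'
stopifnot(all(as.vector(res_a) == as.vector(res_b)))
stopifnot(all(as.vector(res_b) == unname(res_c)))
```

Keep in mind that approach a only matches the other two here because x_big contains nothing but 'A' to 'D': exclude removes the listed values and keeps everything else, whereas b and c count only the values you ask for.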
