My data frame looks like this:
location  population
Canada      38067913
China     1444216102
Mexico     130262220
I need to mutate() the population numbers into a new variable in abbreviated form, like this:
location  population  pop_text
Canada      38067913  38.06 millions
China     1444216102  1.44 billions
Mexico     130262220  130.26 millions
CodePudding user response:
My approach would be to throw together some nested ifelse() statements if there are only a handful of alternatives to deal with (as is the case with population size).
ifelse(x>=1e12, sprintf("%.2f trillion", x/1e12),
  ifelse(x>=1e9, sprintf("%.2f billion", x/1e9),
    ifelse(x>=1e6, sprintf("%.2f million", x/1e6),
      format(x, big.mark=","))))
If df is your data frame, the call would be:
df$pop_text <-
  ifelse(df$population>=1e12, sprintf("%.2f trillion", df$population/1e12),
    ifelse(df$population>=1e9, sprintf("%.2f billion", df$population/1e9),
      ifelse(df$population>=1e6, sprintf("%.2f million", df$population/1e6),
        format(df$population, big.mark=","))))
ifelse() is vectorized: for each element of x it checks the condition in the first argument (e.g. is x at least 1 trillion?) and returns the corresponding element of the second argument where the condition is true, and of the third argument where it is false. Placing another ifelse() call in the third argument means that the inner call supplies the result for every element where the outer condition is false.
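If the nesting gets deep, the same thresholds can be kept in a lookup table instead. The following is only a sketch of that alternative, not part of the approach above: it swaps the nested ifelse() calls for base R's findInterval(), and the names abbreviate_number, breaks and labels are made up for illustration.

```r
# Hypothetical helper: same thresholds as the nested ifelse(), stored as a lookup.
abbreviate_number <- function(x, digits = 2) {
  breaks <- c(1e6, 1e9, 1e12)                      # lower bound of each unit
  labels <- c("million", "billion", "trillion")
  idx <- findInterval(x, breaks)                   # 0 means below 1 million
  # Default: plain number with thousands separators, no padding
  out <- format(x, big.mark = ",", trim = TRUE, scientific = FALSE)
  big <- idx > 0
  fmt <- paste0("%.", digits, "f %s")              # e.g. "%.2f %s"
  out[big] <- sprintf(fmt, x[big] / breaks[idx[big]], labels[idx[big]])
  out
}

abbreviate_number(c(38067913, 1444216102, 130262220))
# [1] "38.07 million"  "1.44 billion"   "130.26 million"
```

Adding a unit then only means extending breaks and labels, rather than adding another level of nesting.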
CodePudding user response:
With a little finagling you could hijack how format.object_size prints:
convert_numbers_into_abbreviated_terms_in_text <- function(x, digits = 1L) {
  key <- setNames(
    c('B', 'kB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'),
    c('', 'thousands', 'millions', 'billions', 'trillions',
      'quadrillions', 'quintillions', 'sextillions', 'septillions')
  )
  sapply(x, function(xx) {
    xx <- format(
      structure(xx, class = 'object_size'), units = 'auto', standard = 'SI',
      digits = digits
    )
    xx <- strsplit(xx, ' ')[[1L]]
    trimws(paste(xx[1L], names(key)[match(xx[2L], key)], collapse = ' '))
  })
}
x <- read.table(header = TRUE, text = "location population
Canada 38067913
China 1444216102
Mexico 130262220")
convert_numbers_into_abbreviated_terms_in_text(x$population)
# [1] "38.1 millions" "1.4 billions" "130.3 millions"
convert_numbers_into_abbreviated_terms_in_text(x$population, digits = 2)
# [1] "38.07 millions" "1.44 billions" "130.26 millions"
si <- 1000^(0:8)
convert_numbers_into_abbreviated_terms_in_text(si)
# [1] "1" "1 thousands" "1 millions" "1 billions" "1 trillions" "1 quadrillions" "1 quintillions"
# [8] "1 sextillions" "1 septillions"
CodePudding user response:
We could use a switch based on a comparison with respective powers of 10.
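f <- Vectorize(function(x, digits=2) {
  # x scaled by 1, 1e3, 1e6 and 1e9
  u <- mapply(`^`, 10, 0:3*3)^-1 * x
  # the number of scaled values >= 1 selects the unit label
  # (>= rather than > so exact powers such as 1000 are handled)
  o <- sum(u >= 1) |>
    (\(x) sapply(x, \(i) {
      switch(i, '', 'thousands', 'millions', 'billions')
    }))()
  # keep the one scaled value in [1, 1000); covers x from 1 up to, but not including, 1e12
  paste(round(u[u < 1000 & u >= 1], digits), o)
})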
transform(dat, popText=f(population))
# location population popText
# 1 Andorra 77443 77.44 thousands
# 2 Canada 38067913 38.07 millions
# 3 China 1444216102 1.44 billions
# 4 Mexico 130262220 130.26 millions
# 5 Earth 7577130400 7.58 billions
Data:
dat <- structure(list(location = c("Andorra", "Canada", "China", "Mexico",
"Earth"), population = c(77443, 38067913, 1444216102, 130262220,
7577130400)), class = "data.frame", row.names = c(NA, -5L))
