I was wondering whether there is a way to convert decimal to ternary, since there are functions for binary/octal such as intToBin(x).
I need to convert a character string such as
S0 <- c("Hello Stac")
to base 3, so I thought of first converting it to its ASCII code points with:
S01<- utf8ToInt(S0)
> S01
[1] 72 101 108 108 111 32 83 116 97 99
And then convert it to base 3 and obtain something like this:
> S1
[1] 2200 10202 11000 11010 11022 1012 10002 11022 10121 10200
CodePudding user response:
For practice, you can write your own converter function, like the one below:
f <- function(x, base = 3) {
  q <- c()
  while (x) {
    q <- c(x %% base, q)
    x <- x %/% base
  }
  # as.numeric(paste0(q, collapse = ""))
  sum(q * 10^(rev(seq_along(q) - 1)))
}
or with recursion
f <- function(x, base = 3) {
  # pass base along so the recursion also works for bases other than 3
  ifelse(x < base, x, f(x %/% base, base) * 10 + x %% base)
}
then you can run
> sapply(utf8ToInt(S0),f)
[1] 2200 10202 11000 11000 11010 1012 10002 11022 10121 10200
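One way to sanity-check the result is to decode it again. The sketch below is not part of the answer above; it assumes the base-3 digits are held as character strings (the variable names digits, codes and decoded are made up for illustration) and uses base R's strtoi and intToUtf8 to get back to the original string.

```r
# Base-3 digit strings for "Hello Stac", copied from the output above
digits <- c("2200", "10202", "11000", "11000", "11010",
            "1012", "10002", "11022", "10121", "10200")

# strtoi() parses each string as an integer written in the given base
codes <- strtoi(digits, base = 3L)

# intToUtf8() turns the code points back into a single string
decoded <- intToUtf8(codes)
decoded
```

Keeping the digits as strings avoids any surprises from converting large numeric values to text before parsing.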
CodePudding user response:
You can use cwhmisc::int2B:
library(cwhmisc)
int2B(utf8ToInt(S0), 3)[[1]] |> as.numeric()
# [1] 2200 10202 11000 11000 11010 1012 10002 11022 10121 10200
CodePudding user response:
Nice programming exercise. I have vectorized @ThomasIsCoding's answer and generalized it to support any number of strings and any base from 2 to 10.
The function below takes as arguments a character vector x, a base b, and a logical flag double. It returns a list res such that res[[i]] is an nchar(x[i])-length vector giving the base-b representation of x[i]. The list elements are double vectors or character vectors depending on double.
Some caveats:
- For efficiency, the function concatenates the strings in x rather than looping over them. It throws an error if the concatenation would exceed 2^31-1 bytes, which is the maximum string size allowed by R.
- The largest Unicode code point is "\U10FFFF". The binary representation of this number exceeds 2^53 when interpreted as decimal, so it cannot be stored in a double vector without loss of precision:
x <- sub("^0+", "", paste(rev(as.integer(intToBits(0x10FFFF))), collapse = ""))
x
## [1] "100001111111111111111"
sprintf("%.0f", as.double(x))
## [1] "100001111111111114752"
As a defensive measure, the function warns if 2^53 is exceeded when b = 2 and double = TRUE.
- The b = 10 case is already handled by base R's utf8ToInt, and the b = 2 case by utf8ToInt followed by intToBits.
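For reference, the two cases covered by base R can be sketched as follows. This is only an illustration: the helper name to_binary and the variable names are hypothetical, and the binary digits are kept as character strings to sidestep the precision issue described above.

```r
s <- "Hello Stac"

# Base 10: utf8ToInt() already returns the decimal code points
codes <- utf8ToInt(s)

# Base 2: intToBits() returns 32 bits per integer, least significant bit first,
# so reverse them, paste them together and strip the leading zeros
to_binary <- function(n) {
  bits <- rev(as.integer(intToBits(n)))
  sub("^0+", "", paste(bits, collapse = ""))
}
binary <- vapply(codes, to_binary, character(1))

codes
binary
```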
utf8ToBase <- function(x, b = 10, double = TRUE) {
    ## Do some basic checks
    stopifnot(is.character(x), !anyNA(x),
              is.numeric(b), length(b) == 1L,
              b %% 1 == 0, b >= 2, b <= 10)
    if (length(x) == 0L) {
        return(list())
    }
    ## Require UTF-8 encoding
    x <- enc2utf8(x)
    ## Operate on concatenation to avoid loop over strings
    xx <- paste(x, collapse = "")
    ixx <- utf8ToInt(xx)
    ## Use common field width determined from greatest integer.
    ## Count digits exactly: ceiling(log(n, b)) is one short when n is a power of b
    width <- 1L
    while (b^width <= max(ixx)) {
        width <- width + 1L
    }
    res <- rep.int(strrep("0", width), length(ixx))
    ## Loop over digits, most significant first
    pos <- 1L
    pow <- b^(width - 1L)
    while (pos <= width) {
        quo <- ixx %/% pow
        substr(res, pos, pos) <- as.character(quo)
        ixx <- ixx - pow * quo
        pos <- pos + 1L
        pow <- pow %/% b
    }
    ## Discard leading zeros
    if (double) {
        res <- as.double(res)
        if (b == 2 && any(res > 2^53)) {
            warning("binary result not guaranteed due to loss of precision")
        }
    } else {
        res <- sub("^0+", "", res)
    }
    ## Return list
    res <- split(res, rep.int(gl(length(x), 1L), nchar(x)))
    names(res) <- names(x)
    res
}
x <- c(foo = "Hello Stack Overflow!", bar = "Hello world!")
utf8ToBase(x, 2)
$foo
[1] 1001000 1100101 1101100 1101100 1101111 100000
[7] 1010011 1110100 1100001 1100011 1101011 100000
[13] 1001111 1110110 1100101 1110010 1100110 1101100
[19] 1101111 1110111 100001
$bar
[1] 1001000 1100101 1101100 1101100 1101111 100000
[7] 1110111 1101111 1110010 1101100 1100100 100001
utf8ToBase(x, 3)
$foo
[1] 2200 10202 11000 11000 11010 1012 10002 11022 10121 10200
[11] 10222 1012 2221 11101 10202 11020 10210 11000 11010 11102
[21] 1020
$bar
[1] 2200 10202 11000 11000 11010 1012 11102 11010 11020 11000
[11] 10201 1020
utf8ToBase(x, 10)
$foo
[1] 72 101 108 108 111 32 83 116 97 99 107 32 79 118 101
[16] 114 102 108 111 119 33
$bar
[1] 72 101 108 108 111 32 119 111 114 108 100 33
utf8ToBase("\U10FFFF", 2)
[[1]]
[1] 1.000011e+20
Warning message:
In utf8ToBase("\U{10ffff}", 2) :
binary result not guaranteed due to loss of precision
utf8ToBase("\U10FFFF", 2, double = FALSE)
[[1]]
[1] "100001111111111111111"
