I have the following vector:
vec<-c("70,00 mln €", "20,50 mln €", "400 mila", "400 mila", "400 mila", "100 mila", "50 mila")
In vec, "mln" means "millions" and "mila" means "thousands". I would like to convert this vector into a numeric vector like the following:
70000000, 20500000, 400000, 400000, 400000, 100000, 50000
That is, 70000000 stands for 70,00 mln, 20500000 stands for 20,50 mln, and so on.
I tried the following:
unlist(regmatches(vec, gregexpr("[[:digit:]]+", vec)))
to extract the numeric part of the strings and then multiply it by 1000 or 1000000, but I obtained:
[1] "70" "00" "20" "50" "400" "400" "400" "100" "50"
Here, "70" "00" should be just 70, and "20" "50" should instead be 20.5 (numeric).
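The split happens because a pattern of one or more digits stops at the decimal comma, so "70,00" produces two matches. A minimal base-R sketch of the original extract-then-multiply idea, assuming every element contains exactly one number and exactly one of the units "mln" or "mila" (the names num_txt, num, mult and result are only illustrative):

```r
vec <- c("70,00 mln €", "20,50 mln €", "400 mila", "400 mila", "400 mila", "100 mila", "50 mila")

# Match digits optionally followed by a comma/dot and more digits,
# so "70,00" is captured as a single token. regexpr() keeps only the
# first match per element (assumes each element has a number).
num_txt <- regmatches(vec, regexpr("[[:digit:]]+([,.][[:digit:]]+)?", vec))

# Turn the decimal comma into a dot before converting
num <- as.numeric(chartr(",", ".", num_txt))

# Multiplier per element: 1e6 for "mln", 1e3 for "mila", NA otherwise
mult <- ifelse(grepl("mln", vec), 1e6, ifelse(grepl("mila", vec), 1e3, NA))

result <- num * mult
print(result)
```

If some element had no number at all, regmatches() would drop it and the lengths of num and mult would no longer line up, so this sketch only fits inputs shaped like the ones in the question.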
EDIT: The vector above is just an example. The real (longer) vector is the following:
vec <- c("70,00 mln €", "20,50 mln €", "7,00 mln €", "1,90 mln €",
"1,50 mln €", "16,00 mln €", "15,00 mln €", "3,00 mln €",
"10 mln €", "6,70 mln €", "5,25 mln €", "4,80 mln €",
"3,68 mln €", "1,19 mln €", "1,00 mln €", "21 mln €",
"20 mln €", "3 mln €", "2 mln €", "1,95 mln €", "14.5 mln",
"14.5 mln", "12 mln", "7 mln", "2,32 mln", "21,30 mln", "21 mln",
"20 mln", "5 mln", "3,5 mln", "2 mln", "2 mln", "1,00 mln €",
"19,92 mln", "12,70 mln", "8,00 mln", "1 mln", "4,50 mln", "1,95 mln",
"4,50 mln", "1,95 mln", "1,00 mln €", "10,00 mln €", "2,00 mln €",
"2 mln", "4,50 mln", "8,00 mln €", "4,90 mln €", "1,00 mln €",
"400 mila", "400 mila", "400 mila", "100 mila", "50 mila", "600 mila €",
"500 mila €", "500 mila €", "200 mila €", "600 mila",
"520 mila", "200 mila", "100 mila", "500 mila €", "300 mila €",
"200 mila €", "150 mila €", "20 mila €", "700 mila €",
"500 mila", "500 mila", "600 mila €", "450 mila €", "33 mila €",
"500 mila €", "700 mila €", "250 mila €", "100 mila €"
)
Answer:
A simpler option is to replace the decimal comma with a dot, remove the spaces and the € sign, replace "mln" with "e6" and "mila" with "e3", and then convert the result to numeric with as.numeric:
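library(stringr)
# decimal comma -> dot, drop spaces and the euro sign,
# then turn the units into exponents: "70,00 mln €" -> "70.00e6"
as.numeric(str_replace_all(str_remove_all(chartr(",", ".", vec), "\\s*€|\\s+"),
                           c(mln = "e6", mila = "e3")))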
Output:
[1] 70000000 20500000 400000 400000 400000 100000 50000
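If stringr is not available, the same steps can be written with base R alone. This is a sketch, not part of the original answer; the helper name to_number is hypothetical, and it assumes the unit ("mln" or "mila") is the last token once spaces and "€" are removed:

```r
# Hypothetical helper: convert strings like "70,00 mln €" or "33 mila €" to numbers
to_number <- function(x) {
  x <- chartr(",", ".", x)          # decimal comma -> decimal point
  x <- gsub("\\s*€|\\s+", "", x)    # drop spaces and the euro sign
  x <- sub("mln$", "e6", x)         # millions  -> "e6" suffix
  x <- sub("mila$", "e3", x)        # thousands -> "e3" suffix
  as.numeric(x)                     # "70.00e6" parses as 7e7
}

print(to_number(c("70,00 mln €", "14.5 mln", "33 mila €")))
```

On the longer vector, anyNA(to_number(vec)) is a quick way to check that no element had an unexpected format, since as.numeric() returns NA (with a warning) for anything it cannot parse.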
