I want to work with a very large text file (.txt) that contains more than 3 million lines. Each line holds a different name made up of letters and digits.
My idea is to clean this file up a bit (so it is easier to use) by removing the names that contain more than 2 digits.
I would like to parse the names in R, count the digits in each one, and write a name to a new file only if it contains fewer than 3 digits.
My big file would look something like this (one name per line):
susan123
susan1
john
john22345
alex55
alex1234
And then I would have this new file:
susan1
john
alex55
Is this possible in R?
Thanks
CodePudding user response:
x = c("susan123", "susan1", "john", "john22345", "alex55", "alex1234")
library(stringr)
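# Keep names that end in a run of non-digits followed by at most two digits
# (this assumes the digits only appear at the end of a name).
x[str_detect(x, pattern = "\\D+\\d{0,2}$")]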
# [1] "susan1" "john" "alex55"
CodePudding user response:
Base R:
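We can combine which() with nchar() and gsub(): gsub("\\D", "", x) deletes every non-digit character, nchar() counts the digits that remain, and which() returns the positions of the names with fewer than 3 digits (useNames = TRUE is already the default of which()):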
x = c("susan123", "susan1", "john", "john22345", "alex55", "alex1234")
x[which(nchar(gsub("\\D", "", x)) < 3, useNames = TRUE)]
# [1] "susan1" "john" "alex55"
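Since the question is about a file rather than a vector, here is a minimal base R sketch of the full read, filter, write round trip. The file names names.txt and names_clean.txt are hypothetical, and the input file is created in a temporary directory from the sample names only so that the example runs on its own; with the real file you would point in_path at it instead. It also assumes the whole file fits in memory as a character vector.

```r
# Hypothetical file names; a temporary directory keeps the sketch self-contained.
in_path  <- file.path(tempdir(), "names.txt")
out_path <- file.path(tempdir(), "names_clean.txt")

# Stand-in for the real file: the sample names from the question, one per line.
writeLines(c("susan123", "susan1", "john", "john22345", "alex55", "alex1234"), in_path)

# Read one name per line into a character vector.
x <- readLines(in_path)

# Count the digits in each name: delete every non-digit, then measure what is left.
n_digits <- nchar(gsub("\\D", "", x))

# Keep the names with fewer than 3 digits and write them out, one per line.
kept <- x[n_digits < 3]
writeLines(kept, out_path)

print(readLines(out_path))
```

Because the digits are counted wherever they occur, this version does not depend on the digits sitting at the end of the name.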
