I have a file named "test_result_20210930.xlsx". I would like to extract "20210930" into a new variable, date. How should I do that? I think I can use pattern = "[0-9]+". What if the file name contains other numbers and I only want the part that stands for the date (8 digits together)?
Any suggestions?
CodePudding user response:
Use gsub with \\D+, which matches runs of non-digits, and specify an empty string ("") as the replacement (str1 is defined at the end of this answer):
gsub("\\D+", "", str1)
[1] "20210930"
If the string also contains other digits and we want to return only the 8-digit date:
sub(".*_(\\d{8})_.*", "\\1", "test_result_20210930_01.xlsx")
[1] "20210930"
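One thing to watch with the sub approach: when the pattern does not match, sub returns the input unchanged rather than NA. Since this pattern requires an underscore on both sides of the date, that happens with the original file name from the question. A minimal sketch of the behaviour and one possible guard (the variable names here are only illustrative):

```r
pattern <- ".*_(\\d{8})_.*"
with_suffix    <- "test_result_20210930_01.xlsx"
without_suffix <- "test_result_20210930.xlsx"   # file name from the question

res_with    <- sub(pattern, "\\1", with_suffix)     # "20210930"
res_without <- sub(pattern, "\\1", without_suffix)  # input returned unchanged: no "_" after the date

# Guard: keep the result only when the pattern actually matched, otherwise NA
safe <- ifelse(grepl(pattern, without_suffix), res_without, NA_character_)
```

Checking with grepl first avoids silently treating the whole file name as a date.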
Or use str_extract
library(stringr)
str_extract("test_result_20210930_01.xlsx", "(?<=_)\\d{8}(?=_)")
[1] "20210930"
If we need to convert automatically to a date-time (POSIXct) object:
library(parsedate)
parse_date(str1)
-output
[1] "2021-09-30 UTC"
data
str1 <- "test_result_20210930.xlsx"
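If a plain Date (rather than a date-time) is enough, base R can do the conversion without extra packages. This is only a sketch, assuming the date is always written as yyyymmdd; fname, digits and date are illustrative names:

```r
fname <- "test_result_20210930.xlsx"  # file name from the question

# Extract the first run of 8 digits (base R only)
digits <- regmatches(fname, regexpr("[0-9]{8}", fname))

# Parse as year-month-day; an impossible date such as "20211340" would give NA
date <- as.Date(digits, format = "%Y%m%d")
date
```

Because as.Date returns NA for digit strings that are not valid calendar dates, it also works as a rough sanity check on the extracted value.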
CodePudding user response:
You can also use str_extract from the stringr package to obtain the desired result.
library(stringr)
str_extract("test_result_20210930.xlsx", "[0-9]{8}")
# [1] "20210930"
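Note that "[0-9]{8}" also matches the first 8 digits of a longer run of digits. If the file name might contain such a run (an assumption about the naming scheme, the name below is made up), negative lookarounds can restrict the match to exactly 8 digits:

```r
library(stringr)

# Hypothetical file name with a 10-digit ID before the date
fname <- "test_1234567890_20210930.xlsx"

loose  <- str_extract(fname, "[0-9]{8}")              # "12345678": first 8 digits of the ID
strict <- str_extract(fname, "(?<!\\d)\\d{8}(?!\\d)") # "20210930": exactly 8 digits, no digit on either side
```

The strict pattern still returns the first such match, so a name containing two separate 8-digit numbers would need a more specific rule.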
