Is there a way to get the date out of these strings? I only want to isolate the year (e.g., 2019, 2020, 2021).
Example: USP_03182019_H13
A tidyr-friendly answer would be ideal.
date <- c("USP_03182019_H13","DED_03212019_H1","EL_03202019_H8","EL_10082020_H6","DSP_05122021_H5")
# date
#1 USP_03182019_H13
#2 DED_03212019_H1
#3 EL_03202019_H8
#4 EL_10082020_H6
#5 DSP_05122021_H5
Answer:
library(lubridate)
library(readr)  # parse_number() comes from readr, not lubridate
year(mdy(parse_number(date)))
[1] 2019 2019 2019 2020 2021
or
sub('.*(\\d{4})_.*', '\\1', date)
[1] "2019" "2019" "2019" "2020" "2021"
stringr::str_extract(date, '\\d{4}(?=_)')
[1] "2019" "2019" "2019" "2020" "2021"
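Since the question asks for something tidyr-friendly, here is a minimal sketch that applies the same capture-group idea to a data-frame column with tidyr::extract(). It assumes every string has the PREFIX_MMDDYYYY_SUFFIX shape shown above; the data frame `dat` and the output column name `year` are illustrative.

```r
library(tidyr)

date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")
dat <- data.frame(date = date)

# Assumes eight digits between the first two underscores (MMDDYYYY);
# the capture group keeps only the last four, i.e. the year.
out <- extract(dat, date, into = "year", regex = "_\\d{4}(\\d{4})_",
               remove = FALSE, convert = TRUE)
out
```

With `convert = TRUE` the captured text becomes a number. Recent tidyr releases mark extract() as superseded by separate_wider_regex(), but it still works.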
Answer:
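I'm sure there's a regex-based way, but this will do it:
library(magrittr)
# parse_number() grabs the first number (e.g. 3182019); keep its last four characters
date %>% readr::parse_number %>% substr(., nchar(.)-3, nchar(.))
# [1] "2019" "2019" "2019" "2020" "2021"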
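readr::parse_number() returns a number, so the leading zero of the month is lost (03182019 becomes 3182019); counting back four characters from the end still lands on the year. A base-R sketch that keeps the eight digits as text instead, assuming each string contains exactly one run of eight digits:

```r
date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# regexpr() locates the first run of eight digits; regmatches() extracts it as text
digits <- regmatches(date, regexpr("[0-9]{8}", date))
# Characters 5-8 of MMDDYYYY are the year
year <- substr(digits, 5, 8)
year
```

Note that regmatches() silently drops elements with no match, so a string without eight digits would make the result shorter than the input.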
Answer:
Alternative to Ben's solution:
library(stringi)
stri_sub(date, stri_locate_last_regex(date, "\\d{4}"))
Output:
[1] "2019" "2019" "2019" "2020" "2021"
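stringi can also do the locate-and-extract step in a single call with stri_extract_last_regex(), which returns the last match of the pattern in each string. A minimal sketch, assuming the year is always the last run of four digits (true here because the H-suffixes have at most two digits):

```r
library(stringi)

date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# Last non-overlapping run of four digits in each string
year <- stri_extract_last_regex(date, "\\d{4}")
year
```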
Answer:
A gsub() solution:
gsub(".*_[[:digit:]]{4}|_.*","",date)
[1] "2019" "2019" "2019" "2020" "2021"
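The gsub() pattern works by deleting two pieces: everything up to and including the MMDD digits, and everything from the following underscore onward. An anchored sub() with a capture group states the expected layout more explicitly; this is a sketch assuming every string has the form PREFIX_MMDDYYYY_SUFFIX:

```r
date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# ^[^_]*_      prefix and first underscore
# [0-9]{4}     MMDD
# ([0-9]{4})   YYYY, captured
# _.*$         second underscore and suffix
year <- sub("^[^_]*_[0-9]{4}([0-9]{4})_.*$", "\\1", date)
year
```

sub() returns non-matching strings unchanged, so running grepl() with the same pattern is a quick way to flag malformed input.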
Answer:
Another way would be to split each string on the underscores, select the second piece (MMDDYYYY), and then extract the year from characters 5 to 8.
substr(sapply(strsplit(date, split = '_'), "[[", 2), 5, 8)
# [1] "2019" "2019" "2019" "2020" "2021"
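If you also want to check that the middle piece is a real date (as the lubridate answer does), the split pieces can be parsed with base R's as.Date() before taking the year. A sketch assuming the middle piece is always MMDDYYYY:

```r
date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# Second piece of each split string, e.g. "03182019"
mid <- vapply(strsplit(date, "_", fixed = TRUE), `[[`, character(1), 2)
# Parse MMDDYYYY; impossible dates become NA
d <- as.Date(mid, format = "%m%d%Y")
year <- as.integer(format(d, "%Y"))
year
```

vapply() is used instead of sapply() so the result is guaranteed to be a character vector even for empty input.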
Answer:
Using stringr and dplyr, since you asked for a tidyr-friendly solution. It's not as neat as a one-liner, but hopefully simple for a non-regex expert (like me) to follow.
library(dplyr)
library(stringr)
# Take the middle chunk (MMDDYYYY) and keep its last four characters
get_date <- function(x) {
  numbers <- str_split(x, "_", simplify = TRUE)[, 2]
  str_extract(numbers, ".{4}$")
}
dat <- data.frame(date = date)
dat %>%
  mutate(date = get_date(date))
date
1 2019
2 2019
3 2019
4 2020
5 2021
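Staying within the tidyverse, tidyr::separate() can split the column on the underscores so that each piece gets a name, after which the year is just a substring. A sketch; the piece names prefix, mdy and suffix are made up for illustration, and recent tidyr releases mark separate() as superseded by separate_wider_delim():

```r
library(dplyr)
library(tidyr)

date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")
dat <- data.frame(date = date)

out <- dat %>%
  # Split "USP_03182019_H13" into "USP", "03182019", "H13"
  separate(date, into = c("prefix", "mdy", "suffix"), sep = "_", remove = FALSE) %>%
  # Characters 5-8 of MMDDYYYY are the year
  mutate(year = as.integer(substr(mdy, 5, 8)))
out
```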
