How to isolate year from mmddyyyy string with no available delimiter?

Time:01-06

Is there a way to get the date out of these strings? I only want to isolate the year (e.g., 2019, 2020, 2021).

Example: USP_03182019_H13

A tidyr-friendly answer would be ideal.

date <- c("USP_03182019_H13","DED_03212019_H1","EL_03202019_H8","EL_10082020_H6","DSP_05122021_H5")

#              date
#1 USP_03182019_H13
#2  DED_03212019_H1
#3   EL_03202019_H8
#4   EL_10082020_H6
#5  DSP_05122021_H5

CodePudding user response:

library(lubridate)
library(readr)  # parse_number() comes from readr, not lubridate
year(mdy(parse_number(date)))
[1] 2019 2019 2019 2020 2021

or

sub('.*(\\d{4})_.*', '\\1', date)
[1] "2019" "2019" "2019" "2020" "2021"

stringr::str_extract(date, '\\d{4}(?=_)')
[1] "2019" "2019" "2019" "2020" "2021"
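
One thing to keep in mind with the parse_number() route: it returns a number, so the leading zero of the month is lost ("03182019" becomes 3182019). If that is a concern, a base R sketch along the following lines keeps the eight digits as a string and parses them with an explicit format. The names mmddyyyy, parsed and yr are only illustrative, and it assumes every string contains a run of eight digits.

```r
# Sample strings from the question
date <- c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
          "EL_10082020_H6", "DSP_05122021_H5")

# Pull out the first run of eight consecutive digits (the mmddyyyy block).
# Note: regmatches() silently drops elements with no match, so this
# assumes every string has such a block.
mmddyyyy <- regmatches(date, regexpr("[0-9]{8}", date))

# Parse with an explicit format so impossible dates become NA,
# then keep only the year as an integer
parsed <- as.Date(mmddyyyy, format = "%m%d%Y")
yr <- as.integer(format(parsed, "%Y"))
yr
```

Parsing to a Date first is optional if only the year is needed, but it gives a cheap sanity check on the month and day parts.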

CodePudding user response:

I'm sure there's a regex-based way but this will do it ...

library(magrittr)
date %>% readr::parse_number %>% substr(., nchar(.)-3, nchar(.))

CodePudding user response:

An alternative to Ben's solution:

library(stringi)
stri_sub(date, stri_locate_last_regex(date, "\\d{4}"))

Output:

[1] "2019" "2019" "2019" "2020" "2021"

CodePudding user response:

A gsub solution

gsub(".*_[[:digit:]]{4}|_.*","",date)
[1] "2019" "2019" "2019" "2020" "2021"

CodePudding user response:

Another way would be to split the character vector, select the second element and then extract the year.

substr(sapply(strsplit(date, split = '_'), "[[", 2), 5, 8)
#"2019" "2019" "2019" "2020" "2021"

CodePudding user response:

Using stringr and dplyr, since you asked for a tidyr-style solution. Not as neat as a one-liner, but hopefully simple for a non-regex expert (like me) to follow.

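library(stringr)
library(dplyr)

# assumes `dat` is a data frame holding the strings in a `date` column,
# e.g. dat <- data.frame(date = date)
get_date = function(x) {
    # middle "mmddyyyy" piece between the two underscores
    numbers = str_split(x, "_", simplify = TRUE)[, 2]
    # its last four characters are the year (NA if there is no match)
    str_extract(numbers, ".{4}$")
}

dat %>%
    mutate(date = get_date(date))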
  date
1 2019
2 2019
3 2019
4 2020
5 2021
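
Since the question asks for something tidyr-friendly, here is a minimal sketch using tidyr::extract(), which pulls regex capture groups into new columns. It assumes a data frame called dat (a hypothetical name, as in the snippet above) with the strings in a date column; the new column name year is my own choice.

```r
# Hypothetical data frame built from the question's sample strings
dat <- data.frame(
  date = c("USP_03182019_H13", "DED_03212019_H1", "EL_03202019_H8",
           "EL_10082020_H6", "DSP_05122021_H5")
)

# The pattern describes the whole string: prefix, underscore, mmdd,
# then the four-digit year (the only capture group), underscore, suffix.
# remove = FALSE keeps the original column; convert = TRUE turns the
# captured text into an integer column.
res <- tidyr::extract(
  dat, date,
  into    = "year",
  regex   = "^[^_]+_\\d{4}(\\d{4})_.*$",
  remove  = FALSE,
  convert = TRUE
)
res
```

Rows that do not fit the pattern should come back as NA in year rather than raising an error, which makes unexpected formats easy to spot.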