Parse "january 2020" string to date format returns half of NA's in R

Time:01-28

I have a data frame with 1968 observations and am trying to convert its date column from strings to dates. The values look something like this:

df$date <- c("january 2020","january 2020","january 2020","february 2020","february 2020","february 2020","march 2020","march 2020","march 2020","april 2020","april 2020","april 2020","May 2020","May 2020","May 2020","june 2020","june 2020","june 2020")

I am using the lubridate package:

date <- my(df$date)

This produces an "857 failed to parse" warning and returns a vector like this:

[1] NA NA NA NA NA NA 2020-03-01 2020-03-01 2020-03-01 NA NA NA NA NA NA 2020-06-01 2020-06-01 2020-06-01 2020-06-01

Although I do want the dates in ymd format, I need all observations to be parsed. I have also tried:

date <- as.Date(df$date)
date <- my(df$date, format = "%B %Y")

but these return all observations as NA. What is happening?

Thank you.

Answer:

# Prepend day 1 so each string is a complete date, then parse the full month name (%B)
as.Date(paste(1, df$date), '%d %B %Y')
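Why the day matters: strptime(), which as.Date() uses, requires a day of the month whenever a month is given, so a format of "%B %Y" on its own yields NA. The sketch below shows that, and wraps the answer's paste(1, ...) trick in a helper. It also temporarily sets LC_TIME to the "C" locale, on the assumption that the month names are English while the session locale might not be; parse_month_year is a hypothetical name, not an existing function.

```r
# Hypothetical helper: parse "month YYYY" strings such as "january 2020" into Dates.
# Assumes English month names, so LC_TIME is temporarily set to the "C" locale.
parse_month_year <- function(x) {
  old_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)  # restore on exit
  Sys.setlocale("LC_TIME", "C")
  # strptime() needs a day when a month is given, so prepend day 1.
  # %B also accepts abbreviated month names on input.
  as.Date(paste(1, x), format = "%d %B %Y")
}

x <- c("january 2020", "May 2020", "june 2020")
as.Date(x[1], format = "%B %Y")  # NA: the month is given but no day
parse_month_year(x)
```

Restoring the previous locale with on.exit() keeps the locale change from leaking into the rest of the session.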

Answer:

The my() function from the lubridate package should work like this:

library(dplyr)
library(lubridate)

df %>%
  mutate(my_date = my(my_date))
      my_date
1  2020-01-01
2  2020-01-01
3  2020-01-01
4  2020-02-01
5  2020-02-01
6  2020-02-01
7  2020-03-01
8  2020-03-01
9  2020-03-01
10 2020-04-01
11 2020-04-01
12 2020-04-01
13 2020-05-01
14 2020-05-01
15 2020-05-01
16 2020-06-01
17 2020-06-01
18 2020-06-01
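Month-name parsing is locale-dependent: %B in base R matches names in the session's LC_TIME locale, and lubridate's parsers take a locale argument that defaults to that same setting. A non-English locale is therefore one thing worth ruling out when only some month names fail (the thread does not confirm this is the cause). As an alternative that avoids locales entirely, the sketch below matches full English month names against base R's month.name constant; parse_month_year_en is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: locale-independent parsing of "month YYYY" strings.
# Assumes full English month names; base R's month.name is always English.
parse_month_year_en <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  month_txt <- vapply(parts, `[`, character(1), 1)  # first token, e.g. "january"
  year_txt  <- vapply(parts, `[`, character(1), 2)  # second token, e.g. "2020"
  # Case-insensitive lookup: 1-12 for known months, NA otherwise.
  month_num <- match(tolower(month_txt), tolower(month.name))
  # Build ISO "YYYY-MM-01" strings; unknown months or missing years become NA.
  iso <- sprintf("%s-%02d-01", year_txt, month_num)
  iso[is.na(month_num) | is.na(year_txt)] <- NA_character_
  as.Date(iso, format = "%Y-%m-%d")
}

res <- parse_month_year_en(c("january 2020", "May 2020", "junho 2020"))
sum(is.na(res))  # rows that still failed to parse
```

Unlike %B, this does not accept abbreviations such as "jan"; month.abb could be matched the same way if that is needed.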

Alternatively, we could use parse_date_time() from lubridate:

format(lubridate::parse_date_time(df$my_date, orders = c("m/Y")), "%m-%Y")
 [1] "01-2020" "01-2020" "01-2020" "02-2020" "02-2020" "02-2020" "03-2020"
 [8] "03-2020" "03-2020" "04-2020" "04-2020" "04-2020" "05-2020" "05-2020"
[15] "05-2020" "06-2020" "06-2020" "06-2020"
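Note that format() always returns a character vector, so the result above is text rather than a Date. A minimal base R sketch (with made-up example dates) of why it can help to keep the Date values and build the "MM-YYYY" label only for display:

```r
# Keep real Dates for sorting and arithmetic; derive "MM-YYYY" text only for display.
dates <- as.Date(c("2020-02-01", "2021-01-01"))
month_labels <- format(dates, "%m-%Y")  # character: "02-2020" "01-2021"

sort(month_labels)           # text order: "01-2021" comes before "02-2020"
month_labels[order(dates)]   # chronological order, driven by the Date values
```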

Data:

df <- structure(list(my_date = c("january 2020", "january 2020", "january 2020", 
"february 2020", "february 2020", "february 2020", "march 2020", 
"march 2020", "march 2020", "april 2020", "april 2020", "april 2020", 
"May 2020", "May 2020", "May 2020", "june 2020", "june 2020", 
"june 2020")), class = "data.frame", row.names = c(NA, -18L))