Im new to R and am having an issue converting a character format to a date/time format. Trying to get the started_at and ended_at columns mutated from character to date/time, but no matter what I have tried, I get the error nas introduced by coercion or character string is not in a standard unambiguous format. The intent is to create a new column ride_length as the difference between the ended_at and started_at values in minutes.
I have my df named sep_2021.
str(sep_2021)
spec_tbl_df [804,352 × 14] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ride_id : chr [1:804352] "9DC7B962304CBFD8" "F930E2C6872D6B32" "6EF72137900BB910" "78D1DE133B3DBF55" ...
$ rideable_type : chr [1:804352] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
$ started_at : chr [1:804352] "9/28/21 16:07" "9/28/21 14:24" "9/28/21 00:20" "9/28/21 14:51" ...
$ ended_at : chr [1:804352] "9/28/21 16:09" "9/28/21 14:40" "9/28/21 00:23" "9/28/21 15:00" ...
$ day_of_week : num [1:804352] 3 3 3 3 3 3 3 3 2 3 ...
$ start_station_name: chr [1:804352] NA NA NA NA ...
$ start_station_id : chr [1:804352] NA NA NA NA ...
$ end_station_name : chr [1:804352] NA NA NA NA ...
$ end_station_id : chr [1:804352] NA NA NA NA ...
$ start_lat : num [1:804352] 41.9 41.9 41.8 41.8 41.9 ...
$ start_lng : num [1:804352] -87.7 -87.6 -87.7 -87.7 -87.7 ...
$ end_lat : num [1:804352] 41.9 42 41.8 41.8 41.9 ...
$ end_lng : num [1:804352] -87.7 -87.7 -87.7 -87.7 -87.7 ...
$ member_casual : chr [1:804352] "casual" "casual" "casual" "casual" ...
- attr(*, "spec")=
.. cols(
.. ride_id = col_character(),
.. rideable_type = col_character(),
.. started_at = col_character(),
.. ended_at = col_character(),
.. day_of_week = col_double(),
.. start_station_name = col_character(),
.. start_station_id = col_character(),
.. end_station_name = col_character(),
.. end_station_id = col_character(),
.. start_lat = col_double(),
.. start_lng = col_double(),
.. end_lat = col_double(),
.. end_lng = col_double(),
.. member_casual = col_character()
.. )
- attr(*, "problems")=<externalptr>
I have attempted the following --
sep_2021 <- mutate(sep_2021, started_at = as.Date(started_at)Result: character string is not in a standard unambiguous format
sep_2021 <- mutate(sep_2021, started_at = as.Date.POSIXct(started_at, tz = "", tryFormats = c("%Y-%m-%d %H:%M:%OS","%Y/%m/%d %H:%M:%OS")))Result: character string is not in a standard unambiguous format
sep_2021 <- mutate(sep_2021, started_at = lubridate::as_datetime(started_at))Result: All formats failed to parse. No formats found
sep_2021 <- mutate(sep_2021, started_at = as.Date(started_at, "%m-%d-%y %H:%M:%OS"))Result: NAs introduced by coercion
Any and all suggestions or advice is greatly appreciated!
CodePudding user response:
We may use parse_date from parsedate
library(dplyr)
library(parsedate)
sep_2021 <- sep_2021 %>%
mutate(across(c(started_at, ended_at), parse_date))
The format used and the format in the columns are different i.e. it should be %m/%d/%y %H:%M
sep_2021 <- sep_2021 %>%
mutate(across(c(started_at, ended_at), as.POSIXct,
format = "%m/%d/%y %H:%M"))
CodePudding user response:
You may use mdy_hm to change the class from character to POSIXct. To calculate difference use difftime and pass units to it.
For example, to get difference in seconds you can do -
library(dplyr)
library(lubridate)
sep_2021 <- sep_2021 %>%
mutate(across(c(started_at, ended_at), mdy_hm),
diff = difftime(ended_at, started_at, units = 'secs'))
sep_2021
# started_at ended_at diff
#1 2021-09-28 16:07:00 2021-09-28 16:09:00 120 secs
#2 2021-09-28 14:24:00 2021-09-28 14:40:00 960 secs
#3 2021-09-28 00:20:00 2021-09-28 00:23:00 180 secs
#4 2021-09-28 14:51:00 2021-09-28 15:00:00 540 secs
data
It is easier to help if you provide data in a reproducible format
sep_2021 <- data.frame(started_at = c("9/28/21 16:07", "9/28/21 14:24" ,"9/28/21 00:20" ,"9/28/21 14:51"),
ended_at = c("9/28/21 16:09" ,"9/28/21 14:40", "9/28/21 00:23", "9/28/21 15:00"))
