I have temporal data on different bears ('ID'), at different positions ('Position'; land or ice). Here is a simplified version with two individuals (A and B):
ID <- rep.int(c("A", "B"), times = c(10, 10))
Dates <- c(seq(as.Date("2011-06-11"), as.Date("2011-06-20"), by = "days"),
seq(as.Date("2011-05-27"), as.Date("2011-06-05"), by="days"))
Position <- c("Land", "Ice", "Land", "Land", "Ice", "Ice", "Land", "Land", "Land", "Land",
"Land", "Land", "Land", "Ice", "Ice", "Land", "Land", "Land", "Ice", "Ice")
data <- data.frame(ID, Dates, Position)
ID Dates Position
1 A 2011-06-11 Land
2 A 2011-06-12 Ice
3 A 2011-06-13 Land
4 A 2011-06-14 Land
5 A 2011-06-15 Ice
6 A 2011-06-16 Ice
7 A 2011-06-17 Land
8 A 2011-06-18 Land
9 A 2011-06-19 Land
10 A 2011-06-20 Land
11 B 2011-05-27 Land
12 B 2011-05-28 Land
13 B 2011-05-29 Land
14 B 2011-05-30 Ice
15 B 2011-05-31 Ice
16 B 2011-06-01 Land
17 B 2011-06-02 Land
18 B 2011-06-03 Land
19 B 2011-06-04 Ice
20 B 2011-06-05 Ice
I want to create a variable Arrival, which indicates on-land arrival date for each bear. I defined on-land arrival as the date of the first row in runs of three consecutive Position on "Land". This row should be set to "Arrival", and the other rows to NA. This date must also occur after May 31st.
For this dataset, the arrival dates would look like this:
ID Dates Position Arrival
1 A 2011-06-11 Land NA
2 A 2011-06-12 Ice NA
3 A 2011-06-13 Land NA
4 A 2011-06-14 Land NA
5 A 2011-06-15 Ice NA
6 A 2011-06-16 Ice NA
7 A 2011-06-17 Land Arrival
8 A 2011-06-18 Land NA
9 A 2011-06-19 Land NA
10 A 2011-06-20 Land NA
11 B 2011-05-27 Land NA
12 B 2011-05-28 Land NA
13 B 2011-05-29 Land NA
14 B 2011-05-30 Ice NA
15 B 2011-05-31 Ice NA
16 B 2011-06-01 Land Arrival
17 B 2011-06-02 Land NA
18 B 2011-06-03 Land NA
19 B 2011-06-04 Ice NA
20 B 2011-06-05 Ice NA
Is there a way for me to do this in R, preferably using dplyr?
CodePudding user response:
We can use zoo::rollapply for this.
dplyr
library(dplyr)
data %>%
group_by(ID) %>%
mutate(
Arrival = Dates > "2011-05-31" &
lag(Position != "Land", default = FALSE) &
zoo::rollapply(Position == "Land", 3, align = "left", FUN = all, partial = TRUE)
) %>%
ungroup()
# # A tibble: 20 × 4
# ID Dates Position Arrival
# <chr> <date> <chr> <lgl>
# 1 A 2011-06-11 Land FALSE
# 2 A 2011-06-12 Ice FALSE
# 3 A 2011-06-13 Land FALSE
# 4 A 2011-06-14 Land FALSE
# 5 A 2011-06-15 Ice FALSE
# 6 A 2011-06-16 Ice FALSE
# 7 A 2011-06-17 Land TRUE
# 8 A 2011-06-18 Land FALSE
# 9 A 2011-06-19 Land FALSE
# 10 A 2011-06-20 Land FALSE
# 11 B 2011-05-27 Land FALSE
# 12 B 2011-05-28 Land FALSE
# 13 B 2011-05-29 Land FALSE
# 14 B 2011-05-30 Ice FALSE
# 15 B 2011-05-31 Ice FALSE
# 16 B 2011-06-01 Land TRUE
# 17 B 2011-06-02 Land FALSE
# 18 B 2011-06-03 Land FALSE
# 19 B 2011-06-04 Ice FALSE
# 20 B 2011-06-05 Ice FALSE
base R
data$prevnotland <- ave(
data$Position != "Land", data$ID,
FUN = function(z) c(FALSE, z[-length(z)]))
data$Arrival <- data$prevnotland & ave(
data$Dates > "2011-05-31" & data$Position == "Land", data$ID,
FUN = function(z) zoo::rollapply(z, 3, FUN=all, align="left", partial=TRUE))
data
# ID Dates Position prevnotland Arrival
# 1 A 2011-06-11 Land FALSE FALSE
# 2 A 2011-06-12 Ice FALSE FALSE
# 3 A 2011-06-13 Land TRUE FALSE
# 4 A 2011-06-14 Land FALSE FALSE
# 5 A 2011-06-15 Ice FALSE FALSE
# 6 A 2011-06-16 Ice TRUE FALSE
# 7 A 2011-06-17 Land TRUE TRUE
# 8 A 2011-06-18 Land FALSE FALSE
# 9 A 2011-06-19 Land FALSE FALSE
# 10 A 2011-06-20 Land FALSE FALSE
# 11 B 2011-05-27 Land FALSE FALSE
# 12 B 2011-05-28 Land FALSE FALSE
# 13 B 2011-05-29 Land FALSE FALSE
# 14 B 2011-05-30 Ice FALSE FALSE
# 15 B 2011-05-31 Ice TRUE FALSE
# 16 B 2011-06-01 Land TRUE TRUE
# 17 B 2011-06-02 Land FALSE FALSE
# 18 B 2011-06-03 Land FALSE FALSE
# 19 B 2011-06-04 Ice FALSE FALSE
# 20 B 2011-06-05 Ice TRUE FALSE
CodePudding user response:
library(dplyr)
left_join(data,
data %>%
arrange(ID, Dates) %>% # if not in OP order already
group_by(ID, loc_grp = cumsum(Position != lag(Position, 1, ""))) %>%
filter(Dates >= as.Date("2011-05-31"), Position == "Land",
n() >= 3, row_number() == 1) %>%
ungroup() %>%
transmute(ID, Dates, Position, Arrival = "Arrival"))
Result
Joining with `by = join_by(ID, Dates, Position)`
ID Dates Position Arrival
1 A 2011-06-11 Land <NA>
2 A 2011-06-12 Ice <NA>
3 A 2011-06-13 Land <NA>
4 A 2011-06-14 Land <NA>
5 A 2011-06-15 Ice <NA>
6 A 2011-06-16 Ice <NA>
7 A 2011-06-17 Land Arrival
8 A 2011-06-18 Land <NA>
9 A 2011-06-19 Land <NA>
10 A 2011-06-20 Land <NA>
11 B 2011-05-27 Land <NA>
12 B 2011-05-28 Land <NA>
13 B 2011-05-29 Land <NA>
14 B 2011-05-30 Ice <NA>
15 B 2011-05-31 Ice <NA>
16 B 2011-06-01 Land Arrival
17 B 2011-06-02 Land <NA>
18 B 2011-06-03 Land <NA>
19 B 2011-06-04 Ice <NA>
20 B 2011-06-05 Ice <NA>
CodePudding user response:
Not as succinct as other solutions, but step-by-step with some temporary variables.
library(tidyverse)
ddf <- data |>
arrange(ID, Dates) |>
group_by(ID) |>
mutate(n = lead(Position, n = 1)) |>
mutate(nn = lead(Position, n = 2)) |>
filter(Position == n & Position == nn & Dates > "2011-05-30") |>
slice_head(n = 1) |>
select(-(n:nn)) |>
mutate(Arrival = "Arrival")
ddf |> right_join(data) |> arrange(ID, Dates)
#> Joining, by = c("ID", "Dates", "Position")
#> # A tibble: 20 × 4
#> # Groups: ID [2]
#> ID Dates Position Arrival
#> <chr> <date> <chr> <chr>
#> 1 A 2011-06-11 Land <NA>
#> 2 A 2011-06-12 Ice <NA>
#> 3 A 2011-06-13 Land <NA>
#> 4 A 2011-06-14 Land <NA>
#> 5 A 2011-06-15 Ice <NA>
#> 6 A 2011-06-16 Ice <NA>
#> 7 A 2011-06-17 Land Arrival
#> 8 A 2011-06-18 Land <NA>
#> 9 A 2011-06-19 Land <NA>
#> 10 A 2011-06-20 Land <NA>
#> 11 B 2011-05-27 Land <NA>
#> 12 B 2011-05-28 Land <NA>
#> 13 B 2011-05-29 Land <NA>
#> 14 B 2011-05-30 Ice <NA>
#> 15 B 2011-05-31 Ice <NA>
#> 16 B 2011-06-01 Land Arrival
#> 17 B 2011-06-02 Land <NA>
#> 18 B 2011-06-03 Land <NA>
#> 19 B 2011-06-04 Ice <NA>
#> 20 B 2011-06-05 Ice <NA>
CodePudding user response:
I hope that your preferably using dplyr means that you are still open for other possibilities :) If so, here's a data.table alternative.
library(data.table)
setDT(data)
data[Dates > "2011-05-31",
Arrival := if(.N > 2 & Position[1] == "Land") c("Arrival", rep(NA, .N - 1)),
by = .(ID, rleid(Position))]
ID Dates Position Arrival
1: A 2011-06-11 Land <NA>
2: A 2011-06-12 Ice <NA>
3: A 2011-06-13 Land <NA>
4: A 2011-06-14 Land <NA>
5: A 2011-06-15 Ice <NA>
6: A 2011-06-16 Ice <NA>
7: A 2011-06-17 Land Arrival
8: A 2011-06-18 Land <NA>
9: A 2011-06-19 Land <NA>
10: A 2011-06-20 Land <NA>
11: B 2011-05-27 Land <NA>
12: B 2011-05-28 Land <NA>
13: B 2011-05-29 Land <NA>
14: B 2011-05-30 Ice <NA>
15: B 2011-05-31 Ice <NA>
16: B 2011-06-01 Land Arrival
17: B 2011-06-02 Land <NA>
18: B 2011-06-03 Land <NA>
19: B 2011-06-04 Ice <NA>
20: B 2011-06-05 Ice <NA>
Explanation:
Select relevant rows (Dates > "2011-05-31"). Create groups by 'ID' and consecutive runs of 'Position' (by = .(ID, rleid(Position))). Within each group, if number of rows are more than 2 (.N > 2) &values in the run of Positions are "Land" (Position[1] == "Land"), create the result where the first value is "Arrival" and the rest (.N-1) are NA. Add the new column by reference (:=).
CodePudding user response:
This dplyr approach uses a relative (non-hardcoded) year for the date condition. Needs library(data.table) for rleid. Can be replaced but is very handy.
library(dplyr)
data %>%
group_by(ID) %>%
mutate(grp = data.table::rleid(Position)) %>%
group_by(ID, grp) %>%
mutate(Arrival = if_else(n() >= 3 & Position == "Land" & row_number() == 1 &
Dates > paste0(format(Dates, "%Y"), "-05-31"),
"Arrival", NA_character_)) %>%
ungroup() %>%
select(-grp)
# A tibble: 20 × 4
ID Dates Position Arrival
<chr> <date> <chr> <chr>
1 A 2011-06-11 Land NA
2 A 2011-06-12 Ice NA
3 A 2011-06-13 Land NA
4 A 2011-06-14 Land NA
5 A 2011-06-15 Ice NA
6 A 2011-06-16 Ice NA
7 A 2011-06-17 Land Arrival
8 A 2011-06-18 Land NA
9 A 2011-06-19 Land NA
10 A 2011-06-20 Land NA
11 B 2011-05-27 Land NA
12 B 2011-05-28 Land NA
13 B 2011-05-29 Land NA
14 B 2011-05-30 Ice NA
15 B 2011-05-31 Ice NA
16 B 2011-06-01 Land Arrival
17 B 2011-06-02 Land NA
18 B 2011-06-03 Land NA
19 B 2011-06-04 Ice NA
20 B 2011-06-05 Ice NA
