Suppose I have a data frame as follows:
| date | price | company |
|---|---|---|
| 2000-10-01 | 18 | A |
| 2001-10-01 | 20 | A |
| 2001-10-01 | 20 | A |
| 2001-10-01 | 20 | A |
I want to create a new variable lagged_price as follows:
| date | price | company | lagged_price |
|---|---|---|---|
| 2000-10-01 | 18 | A | NA |
| 2001-10-01 | 20 | A | 18 |
| 2001-10-01 | 20 | A | 18 |
| 2001-10-01 | 20 | A | 18 |
The new variable, lagged_price, takes the lagged value of price for group company. That is, lagged_price captures the price for the company on a previous date. Using group_by is problematic since it captures the value in the preceding row of the group company. Instead, I want to capture the lagged price on the previous date for that company. I also do not want to perform distinct() on the original dataset. Although that does the job in this example, I still want to keep other rows.
my failed solution:
out <- data %>%
group_by(company) %>%
mutate(lagged_price = lag(price))
Any help is appreciated.
CodePudding user response:
Lagging before grouping gives
df %>%
mutate(lagged_price = lag(price)) %>%
group_by(date) %>%
mutate(lagged_price = lagged_price[1]) %>%
ungroup()
# A tibble: 4 × 4
date price company lagged_price
<chr> <int> <chr> <int>
1 2000-10-01 18 A NA
2 2001-10-01 20 A 18
3 2001-10-01 20 A 18
4 2001-10-01 20 A 18
