My database
| ID | Time | Score |
|:---- |:------:| -----:|
| 1 | 1 | 105 |
| 1 | 2 | 155 |
| 1 | 3 | 100 |
| 2 | 1 | 105 |
| 2 | 2 | 150 |
| 2 | 3 | Na |
How do I calculate the difference between last AVAIABLE score (Highest Time) and first score (Lowest Time)? My code so Far
DB = DB %>%
group_by(ID) %>%
dplyr::mutate (Diff_score = Score[max(Time, na.rm= T) - Score [min(Time, na.rm=T)]) %>%
ungroup()
CodePudding user response:
We can modify the code to get the index of 'Time' with which.max and which.min and use it to the subset the 'Score' to find the difference (inorder to remove the NA cases, create an index 'i1' based on the NA values)
library(dplyr)
DB %>%
na_if("Na") %>%
type.convert(as.is = TRUE) %>%
group_by(ID) %>%
mutate(i1 = complete.cases(Score),
Diff_score = Score[i1][which.max(Time[i1])] -
Score[i1][which.min(Time[i1])], i1 = NULL) %>%
ungroup
-output
# A tibble: 6 × 4
ID Time Score Diff_score
<int> <int> <int> <int>
1 1 1 105 -5
2 1 2 155 -5
3 1 3 100 -5
4 2 1 105 45
5 2 2 150 45
6 2 3 NA 45
Or use the range after converting 'Time' to NA where the 'Score' values are NA and make use of na.rm = TRUE in range for subsetting
DB %>%
na_if("Na") %>%
type.convert(as.is = TRUE) %>%
group_by(ID) %>%
mutate(Diff_score = diff(Score[range(NA^is.na(Score) *Time,
na.rm = TRUE)])) %>%
ungroup
CodePudding user response:
We could also use first and last (with ordering by Time) within the groups:
library(dplyr)
DB |>
group_by(ID) |>
mutate(diff = last(Score[!is.na(Score)], order_by = Time) - first(Score[!is.na(Score)], order_by = Time)) |>
ungroup()
Output:
# A tibble: 6 × 4
ID Time Score diff
<dbl> <dbl> <dbl> <dbl>
1 1 1 105 -5
2 1 2 155 -5
3 1 3 100 -5
4 2 1 105 45
5 2 2 150 45
6 2 3 NA 45
Data:
library(readr)
DB <- read_delim("ID | Time | Score
1 | 1 | 105
1 | 2 | 155
1 | 3 | 100
2 | 1 | 105
2 | 2 | 150
2 | 3 | NA ", delim = "|", trim_ws = TRUE)
