I have the following database. My intention is to subtract the scores of each name according to the category and Index. That is b - a of every name according the index.
My database
dat <- read.table(text="
Name Score Index Category
Pepe 5 Al a
Pepe 1 Bl b
Juan 8 Cl a
Juan 5 Al b
Mikel 1 Cl a
Mikel 6 Bl b
", header=TRUE)
I want obtain this:
Can you help me, please?
CodePudding user response:
Here are two ways, base R and package dplyr.
Base R
Use aggregate to compute the differences, then merge with the original data set to have the other columns in the result and subset by Category.
To keep the names order, the 2nd solution uses order to first get an index i.
df1 <- data.frame(
Name = c("Pepe", "Pepe", "Juan", "Juan", "Mikel", "Mikel"),
Score = c(5,1,8,5,1,6),
Index = rep(c("AI", "BI", "CI"), 2),
Category = rep(c("a", "b"), 3)
)
merge(
aggregate(Score ~ Name, df1, diff),
subset(df1, select = -Score),
by = "Name"
) |> subset(Category == "b")
#> Name Score Index Category
#> 2 Juan -3 AI b
#> 4 Mikel 5 CI b
#> 6 Pepe -4 BI b
i <- order(unique(df1$Name))
merge(
aggregate(Score ~ Name, df1, diff),
subset(df1, select = -Score),
by = "Name"
) |> subset(Category == "b") |>
(\(x) x[order(i), ])()
#> Name Score Index Category
#> 6 Pepe -4 BI b
#> 2 Juan -3 AI b
#> 4 Mikel 5 CI b
Created on 2022-03-10 by the reprex package (v2.0.1)
Package dplyr
The same algorithm as above but with dplyr functions.
suppressPackageStartupMessages(library(dplyr))
left_join(
df1 %>% group_by(Name) %>% summarise(Score = diff(Score)),
df1 %>% select(-Score),
by = "Name"
) %>%
filter(Category == "b")
#> # A tibble: 3 x 4
#> Name Score Index Category
#> <chr> <dbl> <chr> <chr>
#> 1 Juan -3 AI b
#> 2 Mikel 5 CI b
#> 3 Pepe -4 BI b
Created on 2022-03-10 by the reprex package (v2.0.1)

