I have the following database. My intention is to subtract the scores of each name according to the category and Index. That is b - a of every name according the index.

My database

dat <- read.table(text="
Name    Score   Index    Category
Pepe        5      Al           a
Pepe        1      Bl           b
Juan        8      Cl           a
Juan        5      Al           b
Mikel       1      Cl           a
Mikel       6      Bl           b
", header=TRUE)

I want obtain this:

Can you help me, please?

CodePudding user response：

Here are two ways, base R and package dplyr.

Base R

Use aggregate to compute the differences, then merge with the original data set to have the other columns in the result and subset by Category.
To keep the names order, the 2nd solution uses order to first get an index i.

df1 <- data.frame(
  Name = c("Pepe", "Pepe", "Juan", "Juan", "Mikel", "Mikel"),
  Score = c(5,1,8,5,1,6),
  Index = rep(c("AI", "BI", "CI"), 2),
  Category = rep(c("a", "b"), 3)
)

merge(
  aggregate(Score ~ Name, df1, diff),
  subset(df1, select = -Score),
  by = "Name"
) |> subset(Category == "b")
#>    Name Score Index Category
#> 2  Juan    -3    AI        b
#> 4 Mikel     5    CI        b
#> 6  Pepe    -4    BI        b


i <- order(unique(df1$Name))
merge(
  aggregate(Score ~ Name, df1, diff),
  subset(df1, select = -Score),
  by = "Name"
) |> subset(Category == "b") |>
  (\(x) x[order(i), ])()
#>    Name Score Index Category
#> 6  Pepe    -4    BI        b
#> 2  Juan    -3    AI        b
#> 4 Mikel     5    CI        b

^{Created on 2022-03-10 by the reprex package (v2.0.1)}

Package `dplyr`

The same algorithm as above but with dplyr functions.

suppressPackageStartupMessages(library(dplyr))

left_join(
  df1 %>% group_by(Name) %>% summarise(Score = diff(Score)),
  df1 %>% select(-Score),
  by = "Name"
) %>%
  filter(Category == "b")
#> # A tibble: 3 x 4
#>   Name  Score Index Category
#>   <chr> <dbl> <chr> <chr>   
#> 1 Juan     -3 AI    b       
#> 2 Mikel     5 CI    b       
#> 3 Pepe     -4 BI    b

^{Created on 2022-03-10 by the reprex package (v2.0.1)}

Base R

Package dplyr

Package `dplyr`