From this dataframe I want to subset rows which have the identical "model", "Country" and "year" values but different "factor"
model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <- c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")
df <- data.frame(model,country,year,value, factor)
model country year value factor
1 A Italy 2016 14 NEW
2 B Germany 2016 24 OLD
3 C Norway 2016 11 OLD
4 A Italy 2016 34 OLD
5 A France 2015 73 OLD
6 C Germany 2015 11 OLD
7 B Norway 2015 33 NEW
8 A Italy 2015 22 NEW
Expected output:
1 A Italy 2016 22 NEW
4 A Italy 2016 34 OLD
CodePudding user response:
What you could do is create a column which tells the number of duplicated based on the groups (model, country and year) and then filter like this:
model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <- c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")
df <- data.frame(model,country,year,value, factor)
library(dplyr)
df %>%
group_by(model, country, year) %>%
mutate(number_dups = n()) %>%
filter(number_dups > 1) %>%
select(-number_dups) %>%
ungroup()
#> # A tibble: 2 × 5
#> model country year value factor
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 A Italy 2016 14 NEW
#> 2 A Italy 2016 34 OLD
Created on 2022-08-12 by the reprex package (v2.0.1)
Please note: I assume that the expected output should be 14 instead of 22.
CodePudding user response:
You can group by all columns, except for factor, then keep only the rows where there is more than 1 observation that is the same. Then, to ensure we only have those with NEW and OLD, we can add factor to the group, and keep only the first row of each.
library(dplyr)
df %>%
group_by(across(c(model,country,year))) %>%
filter(n() > 1) %>%
group_by(factor, .add = T) %>%
slice(1)
Output
model country year value factor
<chr> <chr> <dbl> <dbl> <chr>
1 A Italy 2016 34 NEW
2 A Italy 2016 34 OLD
