Suppose I have the following table,
| Name | City | Value |
|---|---|---|
| Tom | NY | 1 |
| Tom | NY | 2 |
| Tom | NY | 4 |
| James | NY | 1 |
| James | NY | 2 |
| Tony | DC | 1 |
| Tony | DC | 2 |
I want to find the total value city-wise, however for each Name only their maximum value must be taken. So for this table I should get,
| City | Value |
|---|---|
| NY | 6 |
| DC | 2 |
I have tried using various methods using dyplr but none seem to work
CodePudding user response:
Here's another approach -
library(dplyr)
df %>%
arrange(desc(Value)) %>%
distinct(Name, City, .keep_all = TRUE) %>%
group_by(City) %>%
summarise(Value = sum(Value))
# City Value
# <chr> <int>
#1 DC 2
#2 NY 6
CodePudding user response:
Lets assume your dataset is called df1:
library(dplyr)
df1 %>% group_by(City, Name) %>% slice_max(Value, with_ties = FALSE) %>% ungroup() %>% select(City, Value) %>% group_by(City) %>% summarise(Value = sum(Value))
CodePudding user response:
Here is a data.table approach in one line:
library(data.table)
setDT(df)[df[, .I[which.max(Value)], by=c("Name", "City")]$V1][, sum(Value), by=City]
Output
City V1
1: NY 6
2: DC 2
base R
aggregate(Value ~ City, merge(aggregate(Value ~ Name City, data=df, max), df, all.x=T), sum)
City Value
1 DC 2
2 NY 6
Data
df <- structure(list(
Name = c("Tom", "Tom", "Tom", "James", "James",
"Tony", "Tony"),
City = c("NY", "NY", "NY", "NY", "NY", "DC",
"DC"),
Value = c(1L, 2L, 4L, 1L, 2L, 1L, 2L)
),
class = "data.frame",
row.names = c(NA, -7L))
