I have a data set that looks like the following:
| Member ID # | Location | Donation Amount | Donation Month |
|---|---|---|---|
| 1 | A | 5 Toys | Jan |
| 1 | B | 2 Toys | Jan |
| 2 | A | 6 Toys | Mar |
| 3 | B | 4 Toys | Feb |
| 3 | B | 9 Toys | June |
I want to create a column that will sum the number of locations that a member donated toys per period. An example output would be as follows:
| Member ID # | Location | Donation Amount | Donation Month | Count |
|---|---|---|---|---|
| 1 | A | 5 Toys | Jan | 2 |
| 1 | B | 2 Toys | Jan | 2 |
| 2 | A | 6 Toys | Mar | 1 |
| 3 | B | 4 Toys | Feb | 1 |
| 3 | B | 9 Toys | June | 1 |
This output lets me know that Member 1 was the only one to donate toys to two different centers in the same month.
I've tried the following code to accomplish this output:
output2 <- output1 %>%
group_by(`Member ID #`, `Location`, `Donation Amount`, `Donation Month`,
.groups='Keep') %>%
mutate(count = n())
But this just counts every single entry and not entries per the conditions I listed above. How can I create a conditional count to accomplish my goal?
CodePudding user response:
try to do this
df %>%
group_by(`Member ID #`) %>%
summarise(Count = n_distinct(Location))
CodePudding user response:
We may use split() to split our data by ID and Duration_month. Then determine the unique amount of locations per split.
out <- lapply(split(df, f = ~ df$ID df$Donation_month), \(x) {
cbind(x, no_of_locations = rep(length(unique(x$Location)), times = nrow(x)))
})
df_new <- do.call(rbind, out)
rownames(df_new) <- NULL
Output
> df_new
ID Location Donation_amount Donation_month no_of_locations
1 3 B 4 Toys Feb 1
2 1 A 5 Toys Jan 2
3 1 B 2 Toys Jan 2
4 3 B 9 Toys June 1
5 2 A 6 Toys Mar 1
Data
df <- data.frame(ID = c(1,1,2,3,3), Location = c('A','B','A','B','B'),
Donation_amount = paste0(c(5,2,6,4,9),' Toys'),
Donation_month = c('Jan','Jan','Mar','Feb','June'))
