I am using R..
I have a dataframe about reservations in a transportation company:
| Van | Route | Departure | Price | Customer ID |
|---|---|---|---|---|
| U21 | LA - SF | 8:00:00 | 30.00 | 467866578 |
| U21 | LA - SF | 8:00:00 | 30.00 | 234656433 |
| U21 | LA - SF | 8:00:00 | 30.00 | 654343554 |
| U21 | LA - SF | 8:00:00 | 30.00 | 466534444 |
| U21 | LA - SF | 8:00:00 | 30.00 | 354543433 |
| U22 | LA - SD | 6:00:00 | 20.00 | 345464533 |
| U22 | LA - SD | 6:00:00 | 20.00 | 345456777 |
| U22 | LA - SD | 6:00:00 | 20.00 | 344565411 |
| U22 | LA - SD | 6:00:00 | 20.00 | 119873566 |
I would like to make a new data frame that shows this:
| Van | Route | Departure | Price | Tickets Sold | Revenue |
|---|---|---|---|---|---|
| U21 | LA - SF | 8:00:00 | 30.00 | 5 | 150.00 |
| U22 | LA - SD | 6:00:00 | 20.00 | 4 | 80.00 |
Thanks in advance!!!!! Please help :)
CodePudding user response:
or using dplyr
library(dplyr)
df_new<-df %>%
group_by(Van, Route, Departure, Price) %>%
summarize(`Tickets Sold`=n(),
Revenue=sum(Price)) %>%
ungroup()
df_new
#> # A tibble: 2 × 6
#> Van Route Departure Price `Tickets Sold` Revenue
#> <chr> <chr> <chr> <int> <int> <int>
#> 1 U21 LA - SF 8:00:00 30 5 150
#> 2 U22 LA - SD 6:00:00 20 4 80
Created on 2022-02-02 by the reprex package (v2.0.1)
CodePudding user response:
# load package
library(data.table)
# set dataframe as datatable
setDT(df)
# calculate
df[, .(tickets_sold = .N
, revenue = sum(Price)
)
, .(Van, Route, Departure, Price)
]
CodePudding user response:
You can dplyr::group_by() the common variables and then use dplyr::summarize() to calculate the number of entries per group with n() and the total Revenue with sum().
library(tidyverse)
d <- structure(list(Van = c("U21", "U21", "U21", "U21", "U21", "U22", "U22", "U22", "U22"), Route = c("LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SD", "LA - SD", "LA - SD", "LA - SD"), Departure = c("8:00:00", "8:00:00", "8:00:00", "8:00:00", "8:00:00", "6:00:00", "6:00:00", "6:00:00", "6:00:00"), Price = c(30, 30, 30, 30, 30, 20, 20, 20, 20), Customer.ID = c(467866578L, 234656433L, 654343554L, 466534444L, 354543433L, 345464533L, 345456777L, 344565411L, 119873566L)), class = "data.frame", row.names = c(NA, -9L))
d %>%
group_by(across(Van:Departure)) %>%
summarize(Tickets_Sold = n(), Revenue = sum(Price), .groups = "drop")
#> # A tibble: 2 x 5
#> Van Route Departure Tickets_Sold Revenue
#> <chr> <chr> <chr> <int> <dbl>
#> 1 U21 LA - SF 8:00:00 5 150
#> 2 U22 LA - SD 6:00:00 4 80
Created on 2022-02-02 by the reprex package (v2.0.1)
