Is there a way to merge rows?-CodePudding

I am using R..

I have a dataframe about reservations in a transportation company:

Van	Route	Departure	Price	Customer ID
U21	LA - SF	8:00:00	30.00	467866578
U21	LA - SF	8:00:00	30.00	234656433
U21	LA - SF	8:00:00	30.00	654343554
U21	LA - SF	8:00:00	30.00	466534444
U21	LA - SF	8:00:00	30.00	354543433
U22	LA - SD	6:00:00	20.00	345464533
U22	LA - SD	6:00:00	20.00	345456777
U22	LA - SD	6:00:00	20.00	344565411
U22	LA - SD	6:00:00	20.00	119873566

I would like to make a new data frame that shows this:

Van	Route	Departure	Price	Tickets Sold	Revenue
U21	LA - SF	8:00:00	30.00	5	150.00
U22	LA - SD	6:00:00	20.00	4	80.00

Thanks in advance!!!!! Please help :)

CodePudding user response：

or using dplyr

library(dplyr)

df_new<-df %>%
  group_by(Van, Route, Departure, Price)  %>%
  summarize(`Tickets Sold`=n(),
            Revenue=sum(Price)) %>%
  ungroup()

df_new
#> # A tibble: 2 × 6
#>   Van   Route   Departure Price `Tickets Sold` Revenue
#>   <chr> <chr>   <chr>     <int>          <int>   <int>
#> 1 U21   LA - SF 8:00:00      30              5     150
#> 2 U22   LA - SD 6:00:00      20              4      80

^{Created on 2022-02-02 by the reprex package (v2.0.1)}

CodePudding user response：

# load package
library(data.table)

# set dataframe as datatable
setDT(df)

# calculate
df[, .(tickets_sold = .N
       , revenue = sum(Price)
       )
   , .(Van, Route, Departure, Price)
   ]

CodePudding user response：

You can dplyr::group_by() the common variables and then use dplyr::summarize() to calculate the number of entries per group with n() and the total Revenue with sum().

library(tidyverse)

d <- structure(list(Van = c("U21", "U21", "U21", "U21", "U21", "U22", "U22", "U22", "U22"), Route = c("LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SD", "LA - SD", "LA - SD", "LA - SD"), Departure = c("8:00:00", "8:00:00", "8:00:00", "8:00:00", "8:00:00", "6:00:00", "6:00:00", "6:00:00", "6:00:00"), Price = c(30, 30, 30, 30, 30, 20, 20, 20, 20), Customer.ID = c(467866578L, 234656433L, 654343554L, 466534444L, 354543433L, 345464533L, 345456777L, 344565411L, 119873566L)), class = "data.frame", row.names = c(NA, -9L))

d %>% 
  group_by(across(Van:Departure)) %>% 
  summarize(Tickets_Sold = n(), Revenue = sum(Price), .groups = "drop")
#> # A tibble: 2 x 5
#>   Van   Route   Departure Tickets_Sold Revenue
#>   <chr> <chr>   <chr>            <int>   <dbl>
#> 1 U21   LA - SF 8:00:00              5     150
#> 2 U22   LA - SD 6:00:00              4      80

^{Created on 2022-02-02 by the reprex package (v2.0.1)}