Data frame and summarizing

Time:01-21

My dataset:

dt<-data.frame(GrossIncome=seq(0, 10000, by = 1000),
               Turnover= seq(0, 100000, by = 10000),
               Sellers= seq(0, 1, by = 0.1),
               Buyers=seq(0, 1, by = 0.1))

So now I want to summarize this data and divide GrossIncome and Turnover by 1000.

OUTPUT <- data.frame(
  "GrossIncome" = round(sum(dt$GrossIncome) / 1000, 1),
  "Turnover" = round(sum(dt$Turnover) / 1000, 1),
  "GrossIncomeAndTurnover" = round((sum(dt$GrossIncome) + sum(dt$Turnover)) / 1000, 1),
  "Sellers" = round(sum(dt$Sellers), 1),
  "Buyers" = round(sum(dt$Buyers), 1))

Output:
  GrossIncome Turnover GrossIncomeAndTurnover Sellers Buyers
1          55      550                    605     5.5    5.5

Is there a more elegant solution than the one above? I tried the code below, but it only works for the first two items (GrossIncome and Turnover), not for the rest.

library(dplyr)

dt %>%
  dplyr::select(GrossIncome, Turnover) %>%
  dplyr::summarise_all(sum, na.rm = TRUE) / 1000

Can anybody help me solve this problem?

CodePudding user response:

We can use across() to apply different functions to different columns.

library(dplyr)

dt %>%
  summarize(
    across(c(GrossIncome, Turnover), ~ round(sum(.) / 1000, 1)),
    GrossIncomeAndTurnover = GrossIncome + Turnover,
    across(c(Sellers, Buyers), ~ round(sum(.), 1))
  )
#   GrossIncome Turnover GrossIncomeAndTurnover Sellers Buyers
# 1          55      550                    605     5.5    5.5

Note that inside summarize(), the GrossIncome and Turnover summaries are computed first, and these newly created (already divided and rounded) variables are then used in the GrossIncomeAndTurnover calculation. My code relies on this, simply adding them.
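A minimal sketch of the behaviour the note relies on, assuming dplyr 1.0 or later: inside summarise(), a later argument can refer to a column created by an earlier argument, and it sees the summarised value rather than the original column. The data frame `toy` and the columns `total` and `doubled` are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate the point
toy <- data.frame(x = c(1, 2, 3, 4))

res <- toy %>%
  summarise(
    total = sum(x),      # created first: 1 + 2 + 3 + 4 = 10
    doubled = total * 2  # uses the summarised `total`, not the column x
  )

print(res)
```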

CodePudding user response:

Something like this?

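# The \(DF) lambda shorthand requires R >= 4.1
round_fun <- \(DF) {
  out <- apply(DF, 2, sum)                   # column totals
  out <- ifelse(out > 1e3, out/1e3, out)     # rescale totals above 1000
  out <- c(out, out['GrossIncome'] + out['Turnover'])
  names(out)[5] <- 'GrossIncomeAndTurnover'  # name the appended total
  return(out)
}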
round_fun(dt)
# -------------------------------
> round_fun(dt)
           GrossIncome               Turnover                Sellers                 Buyers 
                  55.0                  550.0                    5.5                    5.5 
GrossIncomeAndTurnover 
                 605.0 
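round_fun() chooses which totals to divide by checking whether they exceed 1000. That happens to pick out GrossIncome and Turnover for this dataset, but it would also rescale Sellers or Buyers if their totals ever grew that large, and the function does not actually round. A base R sketch that names the columns to rescale explicitly and rounds at the end follows; summarise_scaled() and its scale_cols, divisor and digits arguments are hypothetical names chosen for illustration.

```r
# Hypothetical helper: rescale only the named columns, add the combined
# total, then round every value for display
summarise_scaled <- function(DF, scale_cols, divisor = 1000, digits = 1) {
  out <- colSums(DF, na.rm = TRUE)                 # named vector of column totals
  out[scale_cols] <- out[scale_cols] / divisor     # rescale only the chosen columns
  out["GrossIncomeAndTurnover"] <- out["GrossIncome"] + out["Turnover"]
  round(out, digits)
}

dt <- data.frame(GrossIncome = seq(0, 10000, by = 1000),
                 Turnover = seq(0, 100000, by = 10000),
                 Sellers = seq(0, 1, by = 0.1),
                 Buyers = seq(0, 1, by = 0.1))

res <- summarise_scaled(dt, c("GrossIncome", "Turnover"))
print(res)
```

Like round_fun(), the helper still hard-codes the GrossIncome and Turnover names for the combined column; a more general version could take those as arguments too.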

CodePudding user response:

Another way is to first summarise all your data and then "format" it.

dt %>%
  summarise_all(sum, na.rm = TRUE) %>%
  mutate_at(c("GrossIncome", "Turnover"), ~(.) / 1000) %>%
  mutate(GrossIncomeAndTurnover = GrossIncome + Turnover) %>%
  mutate_all(round, digits = 1)
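The scoped verbs summarise_all(), mutate_at() and mutate_all() are superseded in dplyr 1.0.0 and later in favour of across(). A sketch of the same summarise-then-format pipeline written with across(), assuming dplyr 1.0 or later is installed:

```r
library(dplyr)

dt <- data.frame(GrossIncome = seq(0, 10000, by = 1000),
                 Turnover = seq(0, 100000, by = 10000),
                 Sellers = seq(0, 1, by = 0.1),
                 Buyers = seq(0, 1, by = 0.1))

res <- dt %>%
  # 1. total every column
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
  # 2. rescale only the two money columns
  mutate(across(c(GrossIncome, Turnover), ~ .x / 1000)) %>%
  # 3. combined column built from the rescaled totals
  mutate(GrossIncomeAndTurnover = GrossIncome + Turnover) %>%
  # 4. round everything for display
  mutate(across(everything(), ~ round(.x, digits = 1)))

print(res)
```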
  