Home > Mobile >  Data Frame in R - Make operations on columns depending the value of factors in the first columns
Data Frame in R - Make operations on columns depending the value of factors in the first columns

Time:01-25

I am not sure how to type my question but here's what I am trying to do (highly simplified) I have a data frame with 4 columns that looks something like this: [table][1] [1]: https://i.stack.imgur.com/KLgBh.png

The first 2 columns are factors (segment / company). The 2 last columns are variables.

I want to divide each value on the last 2 columns by the market value of that specific segment. As you can see if you looked at the picture the problem I run into is that for some sectors I have 3 companies and the market, for other sectors I have 2 companies and the market and so on so the sizes are never the same...

I have solved this by creating tons of "auxiliar" data frames where each data frame contains only the specific sector but I am sure there is a simpler way to do it either using dplyr or using conditionals

something like if(df[Segment="Seg1"]){ df['Var1']<- df['Var1']/df[4,3] & df['Var2']<- df['Var2']/df[4,4] } else if (df[Segment="Seg2"]){ df['Var1']<- df['Var1']/df[7,3] & df['Var2']<- df['Var2']/df[7,4] } else if ....

but as you can imagine this is also not optimal code-wise and I am using the position of the market that I checked manually instead of using code to ask R to find it

Maybe something with mutating or left_join?

Hope my question is clear Does anyone have any idea?

CodePudding user response:

library(tidyverse)
segment <- c(rep_len("Seg1", 4), rep_len("Seg2", 4))
company <- c(rep_len(c("a", "b", "c", "market"), 8))
var1 <- c(100, 100, 200, 400, 150, 200, 200, 800)
var2 <- c(200, 222, 333, 4444, 555, 666, 777, 888)
df <- data_frame(segment, company, var1, var2)
#> Warning: `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

df |> group_by(segment) |>
   mutate(new1 = var1/var1[company == "market"], new2 = var2/var2[company == "market"], )
#> # A tibble: 8 × 6
#> # Groups:   segment [2]
#>   segment company  var1  var2  new1   new2
#>   <chr>   <chr>   <dbl> <dbl> <dbl>  <dbl>
#> 1 Seg1    a         100   200 0.25  0.0450
#> 2 Seg1    b         100   222 0.25  0.0500
#> 3 Seg1    c         200   333 0.5   0.0749
#> 4 Seg1    market    400  4444 1     1     
#> 5 Seg2    a         150   555 0.188 0.625 
#> 6 Seg2    b         200   666 0.25  0.75  
#> 7 Seg2    c         200   777 0.25  0.875 
#> 8 Seg2    market    800   888 1     1

Created on 2022-01-25 by the reprex package (v2.0.1)

CodePudding user response:

So I solved it like this

1)created new "aux" df with only market values

market.df<-df%>%
 filter(Company=='Market")
  1. then matched with let_join (note that my market df is much smaller than my original df so I can't just divide my original df by market.df

    new.df<-left_join(df, unique(market.aux), by=='Segment', suffix=c("",".market"))

  2. then just separated new.df into 2 data frames and divided them

    aux.1<-select(new.df, 'Variable 1', 'Variable 2') aux.2<-select(new.df, 'Variable 1.market', 'Variable 2.market') results<- aux.1/aux.2

  3. then I just took back the first 2 columns of the original data frame to add Segment and Company again...

The complicated part of this was that not all segments had the same length so left_join and unique were very important for my solution to work

  •  Tags:  
  • Related