I am not sure how to type my question but here's what I am trying to do (highly simplified) I have a data frame with 4 columns that looks something like this: [table][1] [1]: https://i.stack.imgur.com/KLgBh.png
The first 2 columns are factors (segment / company). The 2 last columns are variables.
I want to divide each value on the last 2 columns by the market value of that specific segment. As you can see if you looked at the picture the problem I run into is that for some sectors I have 3 companies and the market, for other sectors I have 2 companies and the market and so on so the sizes are never the same...
I have solved this by creating tons of "auxiliar" data frames where each data frame contains only the specific sector but I am sure there is a simpler way to do it either using dplyr or using conditionals
something like if(df[Segment="Seg1"]){ df['Var1']<- df['Var1']/df[4,3] & df['Var2']<- df['Var2']/df[4,4] } else if (df[Segment="Seg2"]){ df['Var1']<- df['Var1']/df[7,3] & df['Var2']<- df['Var2']/df[7,4] } else if ....
but as you can imagine this is also not optimal code-wise and I am using the position of the market that I checked manually instead of using code to ask R to find it
Maybe something with mutating or left_join?
Hope my question is clear Does anyone have any idea?
CodePudding user response:
library(tidyverse)
segment <- c(rep_len("Seg1", 4), rep_len("Seg2", 4))
company <- c(rep_len(c("a", "b", "c", "market"), 8))
var1 <- c(100, 100, 200, 400, 150, 200, 200, 800)
var2 <- c(200, 222, 333, 4444, 555, 666, 777, 888)
df <- data_frame(segment, company, var1, var2)
#> Warning: `data_frame()` was deprecated in tibble 1.1.0.
#> Please use `tibble()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
df |> group_by(segment) |>
mutate(new1 = var1/var1[company == "market"], new2 = var2/var2[company == "market"], )
#> # A tibble: 8 × 6
#> # Groups: segment [2]
#> segment company var1 var2 new1 new2
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Seg1 a 100 200 0.25 0.0450
#> 2 Seg1 b 100 222 0.25 0.0500
#> 3 Seg1 c 200 333 0.5 0.0749
#> 4 Seg1 market 400 4444 1 1
#> 5 Seg2 a 150 555 0.188 0.625
#> 6 Seg2 b 200 666 0.25 0.75
#> 7 Seg2 c 200 777 0.25 0.875
#> 8 Seg2 market 800 888 1 1
Created on 2022-01-25 by the reprex package (v2.0.1)
CodePudding user response:
So I solved it like this
1)created new "aux" df with only market values
market.df<-df%>%
filter(Company=='Market")
then matched with let_join (note that my market df is much smaller than my original df so I can't just divide my original df by market.df
new.df<-left_join(df, unique(market.aux), by=='Segment', suffix=c("",".market"))
then just separated new.df into 2 data frames and divided them
aux.1<-select(new.df, 'Variable 1', 'Variable 2') aux.2<-select(new.df, 'Variable 1.market', 'Variable 2.market') results<- aux.1/aux.2
then I just took back the first 2 columns of the original data frame to add Segment and Company again...
The complicated part of this was that not all segments had the same length so left_join and unique were very important for my solution to work
