Selecting variable column names for further IRR calculation in R-CodePudding

I have a table of cash flows for various projects over time (years) and want to calculate the IRR for each project. I can't seem to select the appropriate columns, which vary, for each project. The table structure is as follows:

structure(list(`Portfolio Company` = c("Ventures II", "Pal III", 
"River Fund II", "Ventures III"), 
minc = c(2007, 2008, 2008, 2012), 
maxc = c(2021, 2021, 2021, 2020), 
num_pers = c(14, 13, 13, 8), 
`2007` = c(-660000, NA, NA, NA), 
`2008` = c(-525000, -954219, -1427182.55, NA), 
`2009` = c(-351991.03, -626798, -1694353.41, NA), 
`2010` = c(-299717.06, -243248, -1193954, NA), 
`2011` = c(-239257.08, 465738, -288309, NA),
`2012` = c(-9057.31000000001, -369011, 128509.63, -480000), 
`2013` = c(-237233.9, -131111, 53718, -411734.58),
`2014` = c(-106181.76, -271181, 887640, -600000),
`2015` = c(-84760.51, 441808, 906289, -900000), 
`2016` = c(2770719.21, -377799, 166110, -150000),
`2017` = c(157820.08, -12147, 1425198, -255000),
`2018` = c(204424.36,-1626110, 361270, -180000),
`2019` = c(563463.62, 119577, 531555, 3300402.62),
`2020` = c(96247.29, 7057926, 2247027, 36111.6), 
`2021` = c(614848.68, 1277996, 258289, NA)), 
class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -4L), 
groups = structure(list(`Portfolio Company` =c("Ventures II","Ventures III","Pal III", "River Fund II"),
.rows = structure(list(1L, 4L, 2L, 3L),
ptype = integer(0), 
class = c("vctrs_list_of", "vctrs_vctr", "list"))), 
class = c("tbl_df", "tbl", "data.frame"), 
row.names = c(NA, -4L), .drop = TRUE))

Each project (Portfolio Company) has a different start and end date which is captured by the minc and maxc columns. I would like to use the text in minc and maxc to select from minc:maxc for each project to perform the IRR calculation. I get a variety of errors including: object maxc not found, incorrect arg ... Have tried about 20 combinations of !!sym, as.String (from NLP package) ... none works.

This is the code that created the table and the problematic select code:

sum_fund_CF <- funds %>% group_by(`TX_YR`, `Portfolio Company`) %>% 
  summarise(CF=sum(if_else(is.na(Proceeds),0,Proceeds)-if_else(is.na(Investment),0,Investment))) %>% ungroup() #organizes source data and calculates cash flows

sum_fund_CF <- sum_fund_CF %>%
  group_by(`Portfolio Company`) %>% mutate(minc=min(`TX_YR`),maxc=max(`TX_YR`),num_pers=maxc-minc) %>% 
  pivot_wider(names_from = TX_YR, values_from = `CF`) #creates the table and finds first year and last year of cash flow, and num of periods between them

sum_fund_CF %>% group_by(`Portfolio Company`)%>% select(!!sym(as.String(maxc))):!!sym(as.String(max))) #want to select appropriate columns for each record to do the IRR analysis ... IRR() ... need a string of cash flows and no NA.

I'm sure it's something simple, but this has me perplexed.  Thanks !

CodePudding user response：

You can modify your definition of IRR accordingly. I followed this article on how to calculate IRR using the jrvFinance package.

The filter function from the dplyr package is used after group_by, to select the years indicated by the minc and maxc columns.

library(tidyverse)
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
library(jrvFinance)

data <- structure(list(`Portfolio Company` = c("Ventures II", "Pal III", 
                                               "River Fund II", "Ventures III"), 
                       minc = c(2007, 2008, 2008, 2012), 
                       maxc = c(2021, 2021, 2021, 2020), 
                       num_pers = c(14, 13, 13, 8), 
                       `2007` = c(-660000, NA, NA, NA), 
                       `2008` = c(-525000, -954219, -1427182.55, NA), 
                       `2009` = c(-351991.03, -626798, -1694353.41, NA), 
                       `2010` = c(-299717.06, -243248, -1193954, NA), 
                       `2011` = c(-239257.08, 465738, -288309, NA),
                       `2012` = c(-9057.31000000001, -369011, 128509.63, -480000), 
                       `2013` = c(-237233.9, -131111, 53718, -411734.58),
                       `2014` = c(-106181.76, -271181, 887640, -600000),
                       `2015` = c(-84760.51, 441808, 906289, -900000), 
                       `2016` = c(2770719.21, -377799, 166110, -150000),
                       `2017` = c(157820.08, -12147, 1425198, -255000),
                       `2018` = c(204424.36,-1626110, 361270, -180000),
                       `2019` = c(563463.62, 119577, 531555, 3300402.62),
                       `2020` = c(96247.29, 7057926, 2247027, 36111.6), 
                       `2021` = c(614848.68, 1277996, 258289, NA)), 
                  class = c("grouped_df", "tbl_df", "tbl", "data.frame"),
                  row.names = c(NA, -4L), 
                  groups = structure(list(`Portfolio Company` =c("Ventures II","Ventures III","Pal III", "River Fund II"),
                                          .rows = structure(list(1L, 4L, 2L, 3L),
                                                            ptype = integer(0), 
                                                            class = c("vctrs_list_of", "vctrs_vctr", "list"))), 
                                     class = c("tbl_df", "tbl", "data.frame"), 
                                     row.names = c(NA, -4L), .drop = TRUE))


clean_data <- data %>%
  clean_names() %>%
  ungroup() %>%
  pivot_longer(cols = -1:-4,
               names_to = "year",
               values_to = "cashflow") %>%
  mutate(year = str_replace(year, "x", ""),
         year = as.numeric(year))


clean_data %>% 
  print(n = 20)
#> # A tibble: 60 x 6
#>    portfolio_company  minc  maxc num_pers  year cashflow
#>    <chr>             <dbl> <dbl>    <dbl> <dbl>    <dbl>
#>  1 Ventures II        2007  2021       14  2007 -660000 
#>  2 Ventures II        2007  2021       14  2008 -525000 
#>  3 Ventures II        2007  2021       14  2009 -351991.
#>  4 Ventures II        2007  2021       14  2010 -299717.
#>  5 Ventures II        2007  2021       14  2011 -239257.
#>  6 Ventures II        2007  2021       14  2012   -9057.
#>  7 Ventures II        2007  2021       14  2013 -237234.
#>  8 Ventures II        2007  2021       14  2014 -106182.
#>  9 Ventures II        2007  2021       14  2015  -84761.
#> 10 Ventures II        2007  2021       14  2016 2770719.
#> 11 Ventures II        2007  2021       14  2017  157820.
#> 12 Ventures II        2007  2021       14  2018  204424.
#> 13 Ventures II        2007  2021       14  2019  563464.
#> 14 Ventures II        2007  2021       14  2020   96247.
#> 15 Ventures II        2007  2021       14  2021  614849.
#> 16 Pal III            2008  2021       13  2007      NA 
#> 17 Pal III            2008  2021       13  2008 -954219 
#> 18 Pal III            2008  2021       13  2009 -626798 
#> 19 Pal III            2008  2021       13  2010 -243248 
#> 20 Pal III            2008  2021       13  2011  465738 
#> # ... with 40 more rows

clean_data %>% 
  group_by(portfolio_company) %>% 
  filter(between(year, min(minc), max(maxc))) %>% 
  summarise(irr = irr(cashflow, 
                      cf.freq = 1))
#> # A tibble: 4 x 2
#>   portfolio_company    irr
#>   <chr>              <dbl>
#> 1 Pal III           0.111 
#> 2 River Fund II     0.0510
#> 3 Ventures II       0.0729
#> 4 Ventures III      0.0251

^{Created on 2022-01-04 by the reprex package (v2.0.1)}

CodePudding user response：

Another way to do it using jvrFinance::irr().

library(jrvFinance)
library(tidyverse)

df %>% 
  rowwise() %>% 
  summarise(irr = irr(na.omit(c_across(matches('^\\d')))), .groups = 'drop')
#> # A tibble: 4 × 2
#>   `Portfolio Company`    irr
#>   <chr>                <dbl>
#> 1 Ventures II         0.0729
#> 2 Pal III             0.111 
#> 3 River Fund II       0.0510
#> 4 Ventures III        0.0251

^{Created on 2022-01-04 by the reprex package (v2.0.1)}