I made the following code for data It works fine, but the problem is that it takes too long as my dataset is huge. Could someone make the code more efficient and faster? Thanks a lot in advance!
for (f in 1: nlevels(try$IDISIN)) {
temp<-subset(try, IDISIN==levels(try$IDISIN)[f])
temp<-as.data.table(temp)
temp2<-temp %>%
arrange(TradingDate)
temp2<-as.data.table(temp2)
for (i in 1:nrow(temp2)) {
temp2$CSum[i]<-ifelse(i=="1", temp2$Dailysum[1],(temp2$CSum[i-1] temp2$Dailysum[i]))
if (temp2$CSum[i]<0) {
Selling<-bind_rows(Selling, temp2[i])
temp2$CSum[i]<-0
}
temp2$CSum[i]<-ifelse(temp2$FinalInd[i]==1,
temp2$CSum[i]/temp2$A.Factor[i],
temp2$CSum[i])
}
Rebind<-bind_rows(Rebind, temp2)
rm(list = "temp", "temp2")
}
and here is simplified dataset
try<- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"))
library(data.table)
try<-as.data.table(try)
try[,IDISIN:=paste(ISIN, ID,sep = "-")]
Selling<-try[is.na(ISIN)]
Rebind<-try[is.na(ISIN)]
ISIN ID TradingDate Dailysum A.Factor Ind IDISIN
1: abc A 2022-07-01 -4 0 0 abc-A
2: abc A 2022-07-02 8 0 0 abc-A
3: ghi A 2022-07-03 1 0.1 1 ghi-A
4: def B 2022-07-01 2 0 0 def-B
5: def B 2022-07-02 -6 0 0 def-B
6: def B 2022-07-03 9 0 0 def-B
7: ghi C 2022-07-01 4 0 0 ghi-C
8: ghi C 2022-07-02 8 0.5 1 ghi-C
9: ghi C 2022-07-03 9 0 0 ghi-C
I need to do following things
cumulative sum by ISIN and ID
If cumulatvie sume is negative
save it into seperate dataframe ("Selling" in the above code)
set cumulative sum to 0
If Ind=1, divide cumulative sum with Factor
So it should look like this
dat <- data.frame(ISIN=c("abc", "abc", "ghi", "def", "def", "def", "ghi", "ghi", "ghi"),
ID =c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
TradingDate=c("2022-07-01", "2022-07-02", "2022-07-03", "2022-07-01", "2022-07-02", "2022-07-03","2022-07-01", "2022-07-02", "2022-07-03"),
Dailysum=c("-4", "8", "1", "2", "-6","9", "4", "8", "9"),
A.Factor=c("0", "0", "0.1", "0", "0","0", "0", "0.5", "0"),
Ind=c("0", "0", "1", "0", "0","0", "0", "1", "0"),
CSum= c("0", "8", "10", "2", "0","9", "4", "24", "33"))
ISIN ID Date Quantity Factor Ind CumulativeSum
1 abc A 2022-07-01 -4 0 0 0
2 abc A 2022-07-02 8 0 0 8
3 ghi A 2022-07-03 1 0.1 1 10
4 def B 2022-07-01 2 0 0 2
5 def B 2022-07-02 -6 0 0 0
6 def B 2022-07-03 9 0 0 9
7 ghi C 2022-07-01 4 0 0 4
8 ghi C 2022-07-02 8 0.5 1 24
9 ghi C 2022-07-03 9 0 0 33
CodePudding user response:
I can't run your first code block, so I don't know for certain what Selling should look like, but I think we can do it as part of the main processing and then you can filter it out later.
First, I think many of those columns should be numbers, so
try[, names(try) := lapply(.SD, type.convert, as.is = TRUE)]
str(try)
# Classes 'data.table' and 'data.frame': 9 obs. of 6 variables:
# $ ISIN : chr "abc" "abc" "ghi" "def" ...
# $ ID : chr "A" "A" "A" "B" ...
# $ TradingDate: chr "2022-07-01" "2022-07-02" "2022-07-03" "2022-07-01" ...
# $ Dailysum : int -4 8 1 2 -6 9 4 8 9
# $ A.Factor : num 0 0 0.1 0 0 0 0 0.5 0
# $ Ind : int 0 0 1 0 0 0 0 1 0
Second, I don't think we need IDISIN, since I think you're using it as an easy grouping variable, in which case data.table's use of by= takes care of it for us.
Third, I'm assuming that you are controlling the order of data (TradingDate) externally, perhaps setkey(try, ISIN, ID, TradingDate) or similar. I make no checks (or promises if this is not true.) (Over to you if you want try[, TradingDate := as.Date(TradingDate)], it seems logical to do so but changes nothing here.)
From here,
fun <- function(prev, this) {
z <- prev[1] this[1]
c(max(z, 0), max(-z, 0)) / (if (this[2] > 0) this[3] else 1)
}
try[, c("CumulativeSum", "Sell") :=
transpose(Reduce(fun, transpose(list(Dailysum, Ind, A.Factor)),
init = c(0, 0), accumulate = TRUE)[-1]),
by = .(ISIN, ID) ]
try
# ISIN ID TradingDate Dailysum A.Factor Ind CumulativeSum Sell
# <char> <char> <char> <int> <num> <int> <num> <num>
# 1: abc A 2022-07-01 -4 0.0 0 0 4
# 2: abc A 2022-07-02 8 0.0 0 8 0
# 3: ghi A 2022-07-03 1 0.1 1 10 0
# 4: def B 2022-07-01 2 0.0 0 2 0
# 5: def B 2022-07-02 -6 0.0 0 0 4
# 6: def B 2022-07-03 9 0.0 0 9 0
# 7: ghi C 2022-07-01 4 0.0 0 4 0
# 8: ghi C 2022-07-02 8 0.5 1 24 0
# 9: ghi C 2022-07-03 9 0.0 0 33 0
No for loop required.
Admittedly, the Sell logic might need double-check to make sure that successive negatives will react as you need it. That should be handled in the function. In there, the two arguments:
prevare the previous row's values forCumulativeSumandSell, as determined by the previous call tofun(within a group). The first time it is called for a group, it is preassigned the valuec(0, 0)(by theinit=argument).thisis a triplet of the current row'sc(Dailysum, Ind, A.Factor)(all numbers, not named), so we index it directly by position.
