Run next iteration on output from previous iteration in R

Time:01-23

Let's say I have a data frame:

mydata <- data.frame(x = 1:25,
                     y = 26:50)

and another data frame with a set of min and max values:

df.remove <- data.frame(min = c(3,10,22,17),
                        max = c(6,13,24,20))

I'm looking to create an output in which every row of mydata whose x value falls between the min and max of any row in df.remove is deleted, giving me this output data frame:

  x  y
  1 26
  2 27
  7 32
  8 33
  9 34
 14 39
 15 40
 16 41
 21 46
 25 50

I figured I could use the between() function to delete the values that fall within a range. Since I need the min and max values from each row of df.remove, I tried running a loop:

result <- data.frame()
for(i in 1:nrow(df.remove)) {
  result <- mydata[!between(mydata$x,df.remove$min[i],df.remove$max[i]),]
}

As expected, this returns the output with only the last set of min and max values removed, because each iteration filters the original mydata and overwrites result. To get the output I am looking for, I would presumably have to run each iteration on the output of the previous one rather than on the original data frame mydata, but I couldn't find a way to do it.

CodePudding user response:

Since you're using the dplyr function between(), we can also use dplyr's filter(). You want to test every value of column x in mydata against every row of df.remove. This can be done with mapply(), since between() takes two bounds that vary together. The call returns a logical matrix with one row per row of mydata and one column per range in df.remove. Then use apply() across rows with any() to check whether a row contains at least one TRUE, and negate the result so that filter() keeps only the rows whose x falls in none of the ranges:

library(dplyr)
mydata %>% 
  filter(
    !mapply(function(left, right) between(mydata$x, left, right), left = df.remove$min, right = df.remove$max) %>% 
      apply(., 1, any)
    )

Returns:

    x  y
1   1 26
2   2 27
3   7 32
4   8 33
5   9 34
6  14 39
7  15 40
8  16 41
9  21 46
10 25 50
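
To make the intermediate logical matrix visible, here is a minimal base R sketch of the same idea. It assumes plain >= and <= comparisons in place of between() (both bounds inclusive) and uses rowSums() instead of apply(); the names in_range and drop_row are only illustrative.

```r
mydata <- data.frame(x = 1:25, y = 26:50)
df.remove <- data.frame(min = c(3, 10, 22, 17),
                        max = c(6, 13, 24, 20))

# One row per row of mydata, one column per range in df.remove
in_range <- mapply(function(lo, hi) mydata$x >= lo & mydata$x <= hi,
                   df.remove$min, df.remove$max)

# A row is dropped if its x falls inside at least one range
drop_row <- rowSums(in_range) > 0

result <- mydata[!drop_row, ]
```

Here dim(in_range) is 25 by 4, and result holds the same ten rows as the output above, although it keeps the original row names rather than renumbering them.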

CodePudding user response:

In your code, result only keeps the last update, because every iteration filters the original mydata data frame and assigns that single update to result.

Instead, operate on the updated data frame. You could try the following code:

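library(dplyr)  # provides between()

result <- mydata  # start from the full data
for (i in seq_len(nrow(df.remove))) {
  # filter the running result, not the original mydata
  result <- result[!between(result$x, df.remove$min[i], df.remove$max[i]), ]
}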

Because result starts as a copy of mydata and each iteration filters result itself, the removals accumulate from one iteration to the next.
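
The same "feed each step the previous step's output" pattern can also be written as a fold with base R's Reduce(). This is only a sketch of an alternative, not a replacement for the loop: it assumes plain >= and <= comparisons instead of between(), and the names result_fold and acc are illustrative.

```r
mydata <- data.frame(x = 1:25, y = 26:50)
df.remove <- data.frame(min = c(3, 10, 22, 17),
                        max = c(6, 13, 24, 20))

# Reduce() calls the function with the accumulated data frame (acc)
# and the next row index of df.remove, starting from mydata.
result_fold <- Reduce(
  function(acc, i) {
    acc[!(acc$x >= df.remove$min[i] & acc$x <= df.remove$max[i]), ]
  },
  seq_len(nrow(df.remove)),
  mydata
)
```

Each call receives the data frame returned by the previous call, which is exactly what the loop does by reassigning result.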
