Replace a row in a dataframe with two new rows derived from the former

Time:01-21

Consider the following dataframe:

 > df=tibble(x1=c(0,100,250,500,1000),x2=lead(x1),y1=c(-20,-10,20,40,60),y2=lead(y1),p=(y2-y1)/(x2-x1))
 > df
 # A tibble: 5 x 5
      x1    x2    y1    y2     p
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     0   100   -20   -10  0.1 
 2   100   250   -10    20  0.2
 3   250   500    20    40  0.08 
 4   500  1000    40    60  0.04
 5  1000    NA    60    NA NA   

## alternatively
df <- structure(list(x1 = c(0, 100, 250, 500, 1000), x2 = c(100, 250, 500, 1000, NA), y1 = c(-20, -10, 20, 40, 60), y2 = c(-10, 20, 40, 60, NA), p = c(0.1, 0.2, 0.08, 0.04, NA)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

where each record can be interpreted as the coordinates of a segment with slope p.

I need to modify df so that whenever y1<0 and y2>0 the row is replaced by two rows derived from the one I am replacing:

 # A tibble: 6 x 5
      x1    x2    y1    y2     p
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     0   100   -20   -10  0.1 
 2   100   150   -10     0  0.2
 3   150   250     0    20  0.2
 4   250   500    20    40  0.08
 5   500  1000    40    60  0.04
 6  1000    NA    60    NA NA   

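Namely: x2 in row 2 and x1 in row 3 of the final dataframe are both equal to 150, which is given by x1-y1/p (or an equivalent formula) applied to row 2 of the original dataframe. Likewise, y2 in row 2 and y1 in row 3 are both 0, and both new rows keep the slope p of the original row.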
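
As a quick check of the formula x1-y1/p, the zero crossing of row 2 can be computed directly from its endpoints. This is only a sketch in base R; the name `x0` is introduced here for illustration and does not appear in the original data.

```r
# Row 2 of the original dataframe: segment from (100, -10) to (250, 20)
x1 <- 100; x2 <- 250
y1 <- -10; y2 <- 20

p  <- (y2 - y1) / (x2 - x1)   # slope of the segment
x0 <- x1 - y1 / p             # x where the segment crosses y = 0 (hypothetical name)

x0
```

The same value follows from the right endpoint, x2 - y2/p, which is one of the equivalent formulas mentioned above.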

The difficult part is that I cannot find an efficient way (ideally in a dplyr pipe) to extract a row, duplicate it, modify the duplicated rows, and substitute them for the original row.

CodePudding user response:

Here is a suggestion for how you could do it:

library(dplyr)

df %>% 
  mutate(id = row_number()) %>%                         # add row id
  filter(y1 < 0 & y2 > 0) %>%                           # keep only rows that cross zero
  slice(rep(1:n(), each = 2)) %>%                       # duplicate each row
  group_by(id) %>%                                      # group by id
  mutate(x2 = ifelse(row_number()==1, x1 - y1/p, x2),   # first copy ends at the zero crossing
         y2 = ifelse(row_number()==1, 0, y2)) %>% 
  mutate(x1 = ifelse(row_number()==2, lag(x2), x1),     # second copy starts at the zero crossing
         y1 = ifelse(row_number()==2, lag(y2), y1)) %>% 
  bind_rows(df %>% mutate(id = row_number())) %>%       # append the original df, with id added again
  arrange(id) %>%                                       # arrange by id
  slice(-3) %>%                                         # drop the third row of each group (the original, unsplit row)
  ungroup()
     x1    x2    y1    y2     p    id
  <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1     0   100   -20   -10  0.1      1
2   100   150   -10     0  0.2      2
3   150   250     0    20  0.2      2
4   250   500    20    40  0.08     3
5   500  1000    40    60  0.04     4
6  1000    NA    60    NA NA        5

CodePudding user response:

Not knowing how you calculate the new y1 and y2, here's a start:

library(dplyr)
df %>%
  mutate(nextx1 = lead(x1)) %>%
  filter(y1 < 0, y2 > 0) %>%
  rowwise() %>%
  summarize(
    x1 = c(    x1 , x1 - y1/p ),
    x2 = c( x1[2] , nextx1[1] ),
    y1 = c(    y1 ,        NA ),       # you need to fix these, idk
    y2 = c(    y2 ,        NA ),       #
    p
  ) %>%
  bind_rows(filter(df, y1 >= 0 |  y2 <= 0)) %>%
  arrange(x1, x2)
# # A tibble: 6 x 5
#      x1    x2    y1    y2     p
#   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     0   100   -20   -10  0.1 
# 2   100   150   -10    20  0.2 
# 3   150   250    NA    NA  0.2 
# 4   250   500    20    40  0.08
# 5   500  1000    40    60  0.04
# 6  1000    NA    60    NA NA   
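
For completeness, here is one possible way to fill in the two NA placeholders, assuming (as the desired output in the question suggests) that the split point always has y = 0. This is a sketch rather than a definitive solution: the names `crosses`, `split_rows`, `x0` and `result` are made up for illustration, and it builds the two halves separately instead of using `rowwise()`.

```r
library(dplyr)

df <- tibble(
  x1 = c(0, 100, 250, 500, 1000),
  x2 = c(100, 250, 500, 1000, NA),
  y1 = c(-20, -10, 20, 40, 60),
  y2 = c(-10, 20, 40, 60, NA),
  p  = c(0.1, 0.2, 0.08, 0.04, NA)
)

# Rows whose segment crosses zero. coalesce() maps any NA comparison
# to FALSE, so incomplete rows are kept unchanged.
crosses <- coalesce(df$y1 < 0 & df$y2 > 0, FALSE)

# x0 (hypothetical name) is the x value of the zero crossing.
split_rows <- df[crosses, ] %>%
  mutate(x0 = x1 - y1 / p)

result <- bind_rows(
  df[!crosses, ],                                                      # untouched rows
  transmute(split_rows, x1 = x1, x2 = x0, y1 = y1, y2 = 0,  p = p),   # left half
  transmute(split_rows, x1 = x0, x2 = x2, y1 = 0,  y2 = y2, p = p)    # right half
) %>%
  arrange(x1)

result
```

Sorting by x1 is enough here because the segments are contiguous and each x1 is unique. If several segments could share a starting point, a row id like the one in the first answer would be a safer sort key.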