How can I store a log of the values that were replaced, compared with the original data, in R?

I already asked about this topic and got a solution.

Additionally, I want to record in a new column which values were replaced. I tried the following:

df$check <- str_match_all(df, "\\d{11}") %>% unlist

but it does not work. Ultimately, I want to get the following data set:

              original               edited       check
1        010-1234-5678        010-1234-5678            
2   John 010-8888-8888   John 010-8888-8888            
3 Phone: 010-1111-2222 Phone: 010-1111-2222            
4  Peter 018.1111.3333  Peter 018.1111.3333            
5 Year(2007,2019,2020) Year(2007,2019,2020)            
6    Alice 01077776666  Alice 010-9999-9999 01077776666

Here is my code.

x = c("010-1234-5678",
      "John 010-8888-8888",
      "Phone: 010-1111-2222",
      "Peter 018.1111.3333",
      "Year(2007,2019,2020)",
      "Alice 01077776666")

df = data.frame(
  original = x
)

df$edited <- gsub("\\d{11}", "010-9999-9999", df$original)

df$check <- c("","","","","","01077776666") # How can I compute this column instead of typing it by hand?
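A minimal base-R sketch of one way to fill that column (my own suggestion, not the approach used in the answer): log the matches with `regexpr()` and `regmatches()` using the same `\\d{11}` pattern that is later passed to `gsub()`. Because `regexpr()` returns one position per row (`-1` where nothing matches), the log stays aligned with the rows, unlike an unlisted all-matches result. It assumes each string contains at most one 11-digit run.

```r
x <- c("010-1234-5678",
       "John 010-8888-8888",
       "Phone: 010-1111-2222",
       "Peter 018.1111.3333",
       "Year(2007,2019,2020)",
       "Alice 01077776666")

df <- data.frame(original = x, stringsAsFactors = FALSE)

pattern <- "\\d{11}"

# One match position per row; -1 marks rows with no 11-digit run.
m <- regexpr(pattern, df$original)
matched <- as.vector(m) != -1

# Default to "" and fill in the matched text only for rows that matched.
# regmatches() returns the matched substrings for matching rows, in order.
df$check <- ""
df$check[matched] <- regmatches(df$original, m)

# Replace only after the matches have been logged.
df$edited <- gsub(pattern, "010-9999-9999", df$original)

df
```

If a string can contain several 11-digit runs, `regexpr()` reports only the first one while `gsub()` replaces all of them, so `gregexpr()` would be needed to log every replacement.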

Thank you.

Answer:

Inside an `ifelse()`, use `==` to test whether the `original` and `edited` columns match; where they do not, use `gsub()` to capture everything in `original` from the first digit onward.

transform(df, check=ifelse(!do.call(`==`, df[c("original", "edited")]),  # TRUE where the row was changed
                           gsub('(\\D*)(\\d.*)', '\\2', original),        # keep text from the first digit on
                           NA))                                            # unchanged rows get NA
#               original               edited       check
# 1        010-1234-5678        010-1234-5678        <NA>
# 2   John 010-8888-8888   John 010-8888-8888        <NA>
# 3 Phone: 010-1111-2222 Phone: 010-1111-2222        <NA>
# 4  Peter 018.1111.3333  Peter 018.1111.3333        <NA>
# 5 Year(2007,2019,2020) Year(2007,2019,2020)        <NA>
# 6    Alice 01077776666  Alice 010-9999-9999 01077776666
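To see what the `gsub('(\\D*)(\\d.*)', '\\2', original)` step returns, here is a small check on single strings. Group 1 matches the leading non-digits and group 2 matches everything from the first digit to the end, so `'\\2'` keeps the whole tail of the string. The second input, `"Room 12, Alice 01077776666"`, is a made-up example (not from the question) with a digit before the phone number; for a row like that, `check` would contain more than the replaced number.

```r
# Pattern from the answer: group 1 = leading non-digits,
# group 2 = everything from the first digit to the end of the string.
pat <- "(\\D*)(\\d.*)"

a <- gsub(pat, "\\2", "Alice 01077776666")
b <- gsub(pat, "\\2", "Room 12, Alice 01077776666")

print(a)  # "01077776666"
print(b)  # "12, Alice 01077776666" (includes text that was not replaced)

# Extracting with the same pattern that gsub() replaces in the question
# returns only the 11-digit run.
s <- "Room 12, Alice 01077776666"
print(regmatches(s, regexpr("\\d{11}", s)))  # "01077776666"
```

For the six rows in the question both approaches give the same result; they differ only when other digits precede the replaced number.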