I already asked about this topic and got a solution.
Now I additionally want to record, in a new column, which values were replaced. I tried the following:
df$check <- str_match_all(df, "\\d{11}") %>% unlist
but it does not work. Ultimately, I want to get the data set below.
              original               edited       check
1        010-1234-5678        010-1234-5678
2   John 010-8888-8888   John 010-8888-8888
3 Phone: 010-1111-2222 Phone: 010-1111-2222
4  Peter 018.1111.3333  Peter 018.1111.3333
5 Year(2007,2019,2020) Year(2007,2019,2020)
6    Alice 01077776666  Alice 010-9999-9999 01077776666
Here is my code.
x = c("010-1234-5678",
      "John 010-8888-8888",
      "Phone: 010-1111-2222",
      "Peter 018.1111.3333",
      "Year(2007,2019,2020)",
      "Alice 01077776666")
df = data.frame(
  original = x
)
df$edited <- gsub("\\d{11}", "010-9999-9999", df$original)
df$check <- c("","","","","","01077776666") # filled in by hand; how can I compute this column?
Thank you.
CodePudding user response:
You could compare the two columns with `==` inside an `ifelse`. Where they differ, use `gsub` to drop everything before the first digit of "original" and keep the rest of the string; where they match, return NA.
transform(df, check=ifelse(!do.call(`==`, df[c("original", "edited")]),
                           gsub('(\\D*)(\\d.*)', '\\2', original),
                           NA))
#               original               edited       check
# 1        010-1234-5678        010-1234-5678        <NA>
# 2   John 010-8888-8888   John 010-8888-8888        <NA>
# 3 Phone: 010-1111-2222 Phone: 010-1111-2222        <NA>
# 4  Peter 018.1111.3333  Peter 018.1111.3333        <NA>
# 5 Year(2007,2019,2020) Year(2007,2019,2020)        <NA>
# 6    Alice 01077776666  Alice 010-9999-9999 01077776666
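Note that the `gsub` above keeps everything from the first digit to the end of the string, so it would also pick up any text that follows the number. If you want only the 11 digits that were replaced, one possible alternative is base R's `regexpr` with `regmatches`. This is just a sketch: it assumes at most one match per row, and the names `pattern` and `m` are my own, not from the question.

```r
x <- c("010-1234-5678",
       "John 010-8888-8888",
       "Phone: 010-1111-2222",
       "Peter 018.1111.3333",
       "Year(2007,2019,2020)",
       "Alice 01077776666")
# stringsAsFactors = FALSE keeps "original" as character on R < 4.0 too
df <- data.frame(original = x, stringsAsFactors = FALSE)

pattern <- "\\d{11}"  # same pattern the question uses for the replacement

df$edited <- gsub(pattern, "010-9999-9999", df$original)

# regexpr() gives the position of the first match per element, or -1 if none
m <- regexpr(pattern, df$original)

# start with an empty string everywhere, then fill in only the matched rows;
# regmatches() returns the matched substrings for the rows that had a match
df$check <- ""
df$check[m != -1L] <- regmatches(df$original, m)

df
```

This leaves "check" as an empty string for unchanged rows, as in the desired output in the question; assign `NA_character_` instead of `""` if you prefer NA.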
