I'm trying to remove some characters from the strings, but for some reason my code doesn't seem to work. I have the following data
data <- as.data.frame (structure(list(col_name = c("applexz", "Jack", "Tablesxz", "aorange"))))
col_name
applexz
Jack
Tablesxz
aorange
and I'm trying to tell R to remove the last two characters if they are "xz". (I want to repeat this later with other strings and numbers of characters, for example removing the first character if it is 'a', as in 'aorange' here.) But when I try different options, either nothing happens or it prints "NO" in every row without detecting the substring. Where is the problem?
data$col2 <- ifelse (str_sub(data$col_name, -1) == "xz", str_sub(data$col_name,1, nchar(data$col_name))-2, data$col_name) #this is to remove the last two characters if the condition is met
data$col2 <- ifelse (str_sub(data$col_name, -1) == "xz", 'YES', 'NO')
data$col2 <- ifelse(grepl('^xz', data$col_name), 'YES', 'NO')
CodePudding user response:
A regular expression can do this much more simply:
sub("xz$", "", data$col_name)
# [1] "apple" "Jack" "Tables" "aorange"
But to your code:
str_sub(., -1) is returning the last letter only. You should check the inner values your code is using before assuming ifelse will know what to do with them:
stringr::str_sub(data$col_name, -1)
# [1] "z" "k" "z" "e"
ifelse(stringr::str_sub(data$col_name, -2) == "xz", 'YES', 'NO')
# [1] "YES" "NO" "YES" "NO"
Your regex for grepl is looking at the beginning (^) instead of the end ($) of the string.
grepl("xz$", data$col_name)
# [1] TRUE FALSE TRUE FALSE
ifelse(grepl('xz$', data$col_name), 'YES', 'NO')
# [1] "YES" "NO" "YES" "NO"
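For completeness, the original ifelse() idea can be made to work once the test looks at two characters and the subtraction happens inside the call that computes the end position. The following is only a sketch in base R (startsWith(), endsWith() and substr() in place of stringr), and the column names col2 and col3 are just illustrative:

```r
# Same sample data as in the question
data <- data.frame(col_name = c("applexz", "Jack", "Tablesxz", "aorange"),
                   stringsAsFactors = FALSE)

x <- data$col_name

# Drop the last two characters only where the string ends in "xz"
data$col2 <- ifelse(endsWith(x, "xz"), substr(x, 1, nchar(x) - 2), x)

# Drop the first character only where the string starts with "a"
# (note that "applexz" also starts with "a", so it is affected too)
data$col3 <- ifelse(startsWith(x, "a"), substr(x, 2, nchar(x)), x)

print(data)
```

Because startsWith() and endsWith() compare literal text, this variant does not need any regex escaping when the prefix or suffix contains special characters.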
CodePudding user response:
Here is a way to delete a particular set of characters if the string starts OR ends with them:
mylist <- c("applexz", "xzJack", "Tablesxz", "aorange")
sub("^xz|xz$", "", mylist)
# [1] "apple" "Jack" "Tables" "aorange"
With one use of sub, you are deleting the characters if the string starts ("^xz") OR ends ("xz$") with them. I believe this answers the question fully.
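One caveat worth checking: sub() replaces only the first match in each string, so a value that has "xz" at both ends keeps its trailing copy. A small sketch, using a made-up value "xzJackxz" that is not in the original data:

```r
both <- "xzJackxz"  # hypothetical value with "xz" at both ends

# sub() stops after the first match, which is the leading "xz"
first_only <- sub("^xz|xz$", "", both)

# gsub() keeps going and also removes the trailing "xz"
both_ends <- gsub("^xz|xz$", "", both)

print(first_only)
print(both_ends)
```

If strings like this can occur, gsub() with the same pattern is probably the safer choice.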
CodePudding user response:
str_sub() will recycle all arguments to be the same length as the longest argument. You need to use
str_sub(data$col_name, start=-2, end=-1)
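to indicate the last two characters. I am not sure how str_sub() can substitute substrings for a whole data.frame, but you could write a for loop that goes through the column one element at a time and removes every trailing "xz":
library(stringr)
# loop over the rows (length(data) would count columns, not rows)
for (i in seq_len(nrow(data))) {
  if (str_sub(data$col_name[i], -2, -1) == "xz") {
    # replace the last two characters of this element with nothing
    str_sub(data$col_name[i], -2, -1) <- ""
  }
}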
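As far as I can tell, str_sub() and its assignment form are vectorized, so the same thing should also work on the whole column at once. A sketch, assuming the stringr package is installed:

```r
library(stringr)

x <- c("applexz", "Jack", "Tablesxz", "aorange")

# TRUE for the elements whose last two characters are "xz"
hit <- str_sub(x, -2, -1) == "xz"

# Blank out the last two characters of the matching elements only
str_sub(x[hit], -2, -1) <- ""

print(x)
```

The logical index hit restricts the assignment to the matching elements, so the other strings are left untouched.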
You can also try a function like gsub() or sub():
sub("xz","",data$col_name)
This method has a potential drawback: the pattern is not anchored, so sub() removes the first "xz" found anywhere in each string (and gsub() removes every occurrence), not only a trailing one.
For example, if you have an extra entry "xzdef", the function will also turn "xzdef" into "def".
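To make the difference concrete, here is a small comparison on a few made-up strings (they are not the asker's data):

```r
x <- c("applexz", "xzdef", "xzxz")  # illustrative values only

# Unanchored sub(): removes the first "xz" wherever it occurs
first_anywhere <- sub("xz", "", x)

# Unanchored gsub(): removes every "xz"
every <- gsub("xz", "", x)

# Anchored with $: removes "xz" only at the end of the string
trailing_only <- sub("xz$", "", x)

print(first_anywhere)
print(every)
print(trailing_only)
```

Only the anchored pattern leaves "xzdef" alone, which is what the question asks for.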
