ifelse statement: if a string starts or ends with a particular set of characters, delete them

Time:01-26

I'm trying to remove some characters from the strings, but for some reason my code doesn't seem to work. I have the following data:


data <- as.data.frame(structure(list(col_name = c("applexz", "Jack", "Tablesxz", "aorange"))))

  col_name
1  applexz
2     Jack
3 Tablesxz
4  aorange


I'm trying to tell R to remove the last two characters if they are "xz". (Later I want to repeat this with other strings and numbers of characters, for example removing the first character if it is "a", as in "aorange".) But whatever I try, either nothing happens or every row gets "NO", i.e. the substring is never detected. Where is the problem?

library(stringr)

data$col2 <- ifelse(str_sub(data$col_name, -1) == "xz", str_sub(data$col_name, 1, nchar(data$col_name)) - 2, data$col_name) # this is to remove the last two characters if the condition is met

data$col2 <- ifelse (str_sub(data$col_name,  -1) == "xz", 'YES', 'NO')

data$col2 <- ifelse(grepl('^xz', data$col_name), 'YES', 'NO')

Answer:

A regular expression does this much more simply:

sub("xz$", "", data$col_name)
# [1] "apple"   "Jack"    "Tables"  "aorange"
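Since the question also mentions stripping other prefixes later (e.g. a leading "a"), here is a minimal regex-free sketch in base R using startsWith()/endsWith() (available since R 3.3.0). The helper names strip_suffix() and strip_prefix() are hypothetical, not from any package; they match the prefix/suffix literally, so characters like "." need no escaping.

```r
# Hypothetical helpers (not from any package): remove a literal suffix/prefix
# only from the strings that actually end/start with it. Base R only.
strip_suffix <- function(x, suffix) {
  has <- endsWith(x, suffix) & !is.na(x)  # NA strings are left untouched
  x[has] <- substr(x[has], 1, nchar(x[has]) - nchar(suffix))
  x
}

strip_prefix <- function(x, prefix) {
  has <- startsWith(x, prefix) & !is.na(x)
  x[has] <- substr(x[has], nchar(prefix) + 1, nchar(x[has]))
  x
}

col_name <- c("applexz", "Jack", "Tablesxz", "aorange")
strip_suffix(col_name, "xz")  # "apple"  "Jack" "Tables"   "aorange"
strip_prefix(col_name, "a")   # "pplexz" "Jack" "Tablesxz" "orange"
```

Note that strip_prefix(col_name, "a") also shortens "applexz", because it too starts with "a".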

As for your code:

  • str_sub(., -1) returns only the last character, so it can never equal the two-character string "xz". Check the intermediate values your code uses before assuming ifelse will know what to do with them:

    stringr::str_sub(data$col_name, -1)
    # [1] "z" "k" "z" "e"
    
    ifelse(stringr::str_sub(data$col_name, -2) == "xz", 'YES', 'NO')
    # [1] "YES" "NO"  "YES" "NO" 
    
  • Your regex for grepl looks at the beginning (^) of the string instead of the end ($):

    grepl("xz$", data$col_name)
    # [1]  TRUE FALSE  TRUE FALSE
    ifelse(grepl('xz$', data$col_name), 'YES', 'NO')
    # [1] "YES" "NO"  "YES" "NO" 
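Putting both fixes into the original ifelse() attempt gives a sketch like the following (assuming stringr is installed). It compares the last two characters, and moves the "- 2" inside the end position so that str_sub() keeps characters 1 to nchar - 2, instead of subtracting 2 from a character vector.

```r
library(stringr)

data <- data.frame(col_name = c("applexz", "Jack", "Tablesxz", "aorange"),
                   stringsAsFactors = FALSE)

# Compare the last TWO characters, then keep everything except those two.
data$col2 <- ifelse(str_sub(data$col_name, -2) == "xz",
                    str_sub(data$col_name, 1, nchar(data$col_name) - 2),
                    data$col_name)
data$col2  # "apple" "Jack" "Tables" "aorange"
```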
    

Answer:

Here is a way to delete a particular set of characters if the string starts OR ends with them:

mylist <- c("applexz", "xzJack", "Tablesxz", "aorange")
sub("^xz|xz$", "", mylist)
# [1] "apple"   "Jack"    "Tables"  "aorange"

With a single call to sub(), you delete the characters whether the string starts ("^xz") or ends ("xz$") with them. I believe this answers the question fully.
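One caveat worth checking: sub() replaces only the first match in each string, so a string that both starts and ends with "xz" keeps its trailing "xz". A short comparison with gsub(), which replaces every match (the extra test strings are made up for illustration):

```r
x <- c("xzJackxz", "applexz", "xzJack", "Jack")

sub("^xz|xz$", "", x)   # first match only: "Jackxz" "apple" "Jack" "Jack"
gsub("^xz|xz$", "", x)  # every match:      "Jack"   "apple" "Jack" "Jack"
```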

Answer:

str_sub() recycles all its arguments to the length of the longest one, and start = -1 on its own selects only the last character. To select the last two characters, use

str_sub(data$col_name, start=-2, end=-1)

I am not sure how to make str_sub() substitute substrings across a whole data.frame in one call, but for individual elements you could write a for loop that strips every trailing "xz":

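library(stringr)

# Loop over the elements of the column (length(data) is the number of columns)
for (i in seq_along(data$col_name)) {
  if (str_sub(data$col_name[i], -2, -1) == "xz")
    str_sub(data$col_name[i], -2, -1) <- ""
}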
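The loop can also be replaced by one vectorised assignment, since str_sub() has a replacement form that works on a whole (sub)vector. A minimal sketch, assuming stringr >= 1.4.0 for str_ends():

```r
library(stringr)  # str_ends() requires stringr >= 1.4.0

data <- data.frame(col_name = c("applexz", "Jack", "Tablesxz", "aorange"),
                   stringsAsFactors = FALSE)

# Flag the elements ending in "xz" (the pattern is a regex, but "xz" has
# no special characters) ...
ends_xz <- str_ends(data$col_name, "xz")

# ... and blank out their last two characters in a single assignment.
str_sub(data$col_name[ends_xz], -2, -1) <- ""
data$col_name  # "apple" "Jack" "Tables" "aorange"
```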

You can also try functions such as gsub() or sub():

sub("xz","",data$col_name)

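The potential drawback of this approach is that the pattern is not anchored: sub() removes the first "xz" it finds anywhere in each string (and gsub() would remove every one), not only a trailing one.

For example, if you have an extra entry "xzdef", the function will also turn "xzdef" into "def".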
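Anchoring the pattern avoids this. A sketch comparing the unanchored sub() call with stringr's str_remove() (assuming stringr >= 1.3.0); the base-R sub("xz$", "", x) behaves the same as the anchored str_remove() call here:

```r
library(stringr)  # str_remove() requires stringr >= 1.3.0

x <- c("applexz", "Jack", "Tablesxz", "aorange", "xzdef")

sub("xz", "", x)      # unanchored: "xzdef" also becomes "def"
str_remove(x, "xz$")  # anchored to the end: "xzdef" is left alone
str_remove(x, "^a")   # anchored to the start: "aorange" -> "orange"
                      # (and "applexz" -> "pplexz")
```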
