I have a dataframe as follows:
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))
In column v2 there are three possible letters (A, P, S), which can appear in any combination, e.g. "A, A", "A, P", "P, P, S", "A", "A, A, S, A", etc.
What I want to do is detect the rows that contain only the letter "A", no matter how many times it is repeated. For my sample df, the desired answer is: TRUE, TRUE, FALSE, FALSE, FALSE.
Thanks in advance.
CodePudding user response:
I would use the regex pattern ^A(?:,\s*A)*$:
df[grepl('^A(?:,\\s*A)*$', df$v2), ]
  v1      v2
1  1 A, A, A
2  2       A
Data:
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))
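Since the question asks for a TRUE/FALSE result rather than the filtered rows, the same pattern can also be used to return a logical vector. This is a minimal sketch; `perl = TRUE` and the column name `only_A` are my own choices for illustration, not part of the answer above.

```r
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))

# grepl() returns one logical per row instead of subsetting the data frame.
# perl = TRUE is an assumption made here so that (?:...) and \s are
# interpreted by the PCRE engine.
only_A <- grepl('^A(?:,\\s*A)*$', df$v2, perl = TRUE)
only_A
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Optionally keep the flag next to the data (hypothetical column name).
df$only_A <- only_A
```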
CodePudding user response:
You can split the values into vectors in a list, and then check that all values in each vector are equal to A. You can do that with this line:
sapply(strsplit(df$v2, ", "), function(x) all(x=="A"))
# [1] TRUE TRUE FALSE FALSE FALSE
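If the spacing after the commas might not be consistent, a variant of the same split-and-check idea could split on the comma alone and trim each piece. This is only a sketch under that assumption; `parts` and `only_A` are names I made up for illustration.

```r
df <- data.frame(v1 = 1:5, v2 = c('A, A, A', 'A', 'S', 'A, S', 'P, P, A'))

# Split on the comma only, then strip surrounding whitespace from each piece
# before comparing, so "A,A" and "A,  A" would be treated like "A, A".
parts <- strsplit(as.character(df$v2), ",")

# vapply() enforces that every element yields exactly one logical value.
only_A <- vapply(parts, function(x) all(trimws(x) == "A"), logical(1))
only_A
# [1]  TRUE  TRUE FALSE FALSE FALSE
```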
CodePudding user response:
Using regex, you can do:
grepl('^(A,?\\s?)+$', df$v2)
[1] TRUE TRUE FALSE FALSE FALSE
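One thing worth checking before relying on a pattern of the form `^(A,?\s?)+$`: because the comma and the space are both optional, it also accepts inputs such as "AA" or "A,". Whether that matters depends on the data. The sketch below compares it with the stricter `^A(?:,\s*A)*$` on a few made-up inputs (the vector `x` is hypothetical, and `perl = TRUE` is my own choice).

```r
# Hypothetical edge cases that do not appear in the sample data.
x <- c("A, A", "AA", "A,", "A, S")

# Permissive pattern: separator characters are optional after each A.
loose <- grepl('^(A,?\\s?)+$', x, perl = TRUE)

# Stricter pattern: every additional A must be preceded by a comma.
strict <- grepl('^A(?:,\\s*A)*$', x, perl = TRUE)

data.frame(x, loose, strict)
```

On the sample data in the question both patterns give the same result, so the difference only shows up if the column can contain strings like these.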
