Consider the following data structure (df):
| ID | Text |
|---|---|
| 1 | Example |
| 2 | Example - 1 |
| 3 | Example - 2 |
| 4 | Example - 3 |
| 5 | Example - 4 |
| 6 | Example - 5 |
| 7 | Example - NA |
| 8 | Text |
| 9 | Text - 10 |
| 10 | Text - 20 |
| 11 | Text - 30 |
| 12 | Text - 40 |
| 13 | Text - 50 |
| 14 | Text - 60 |
| 15 | Text - 70 |
| 16 | Text - 80 |
| 17 | Text - 90 |
| 18 | Text - 100 |
In the column "Text", I want to find all rows that contain the following pattern: whitespace, hyphen, whitespace, single digit (for example, " - 1").
Or in other words, I want to extract the following rows:
| ID | Text |
|---|---|
| 2 | Example - 1 |
| 3 | Example - 2 |
| 4 | Example - 3 |
| 5 | Example - 4 |
| 6 | Example - 5 |
Currently I use the grepl() function in combination with regular expressions. However, none of my attempts, such as
- df[which(grepl("s{1}-\s{1}\d{1}$", df$Text)),]
- df[which(grepl("\b\s{1}-\s{1}\d{1}\b$", df$Text)),]
have worked. Since I am a beginner in programming, I would be grateful for any advice. Thanks in advance.
Answer:
I would use the following regex pattern:
\s-\s\d(?!\d)
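This matches a hyphen with a whitespace character on each side, followed by a single digit that is not followed by another digit. In other words, the digit must be followed by either a non-digit character or the end of the input, so "Text - 10" and "Text - 100" are not matched.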
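Full R code (each backslash must be doubled inside an R string literal, and perl=TRUE is needed because R's default regex engine does not support lookahead):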
df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl=TRUE), ]
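To check this against the data in the question, here is a minimal sketch that rebuilds df and applies the filter. The way df is constructed here (with paste() and seq()) is only an assumption for illustration; the question does not show how the data frame was created. The last line shows an alternative that is not part of the answer above: if the digit is always the last character of the string, anchoring with $ works with the default engine and needs no lookahead.

```r
# Rebuild the example data frame from the question (construction is illustrative)
df <- data.frame(
  ID = 1:18,
  Text = c(
    "Example", paste("Example -", c(1:5, "NA")),
    "Text", paste("Text -", seq(10, 100, by = 10))
  ),
  stringsAsFactors = FALSE
)

# Lookahead version: the digit must not be followed by another digit
hits <- df[grepl("\\s-\\s\\d(?!\\d)", df$Text, perl = TRUE), ]
print(hits)

# Alternative (assumes the digit ends the string): anchor with $
hits_anchor <- df[grepl("\\s-\\s\\d$", df$Text), ]
```

Both versions should select the rows with ID 2 to 6 for this data. Note that grepl() already returns a logical vector, so wrapping it in which() is not required for subsetting.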
