I have two data frames. One of them contains question numbers stored as text, and I use grep() to match those numbers to the column names of my other data frame.
The problem is that part of my code doesn't work because grep() is not matching the way I expect.
My two data frames are as follows:
DF1:
| Question | Group |
|---|---|
| 11 | Redmeat |
| 100 | Chicken |
| 56 | Vegetables |
| 210 | Dairy |
DF 2 (values don't matter, only the column name):
| 1.Question | 2.Question | ... | 101.Question | ... | 250.Question |
|---|---|---|---|---|---|
| Yes | No | ... | ... | ... | ... |
| Yes | Yes | ... | ... | ... | ... |
| No | Yes | ... | ... | ... | ... |
| No | Yes | ... | ... | ... | ... |
I use the following code:
i <- n ## I change n according to the row of DF1 that I want
grep(DF1$Question[i], colnames(DF2), fixed = T)
If I do:
i <- 2 ## (Question number 100)
grep(DF1$Question[i], colnames(DF2), fixed = T)
My code returns 100, which is correct since it's the column that corresponds to "100.Question"
But if I do:
i <- 1 ## (Question number 1)
grep(DF1$Question[i], colnames(DF2), fixed = T)
My code returns 1, 11, 21 ... 101 ... 201
The same happens if I do:
i <- 3 ## (Question number 56)
grep(DF1$Question[i], colnames(DF2), fixed = T)
It returns 56, 156
I only want the exact number to match. Even with the argument fixed = TRUE it doesn't work.
Is there a solution or an alternative?
CodePudding user response:
Two options: 1) Include the . in the grep pattern, grep(paste0("^", DF1$Question[i], "\\."), colnames(DF2)), or 2) paste the full ".Question" suffix on and use exact matching (for example with match() or ==) without any grep at all: paste0(DF1$Question, ".Question"). This will likely be more efficient than regex. Since your code has [i] all over the place, I assume you're using a loop. grep and paste are vectorized, so if you provide more context we may be able to help you avoid the loop entirely.
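As a rough sketch of option 2, the lookup can be done for every row of DF1 at once with match(). The DF1 and DF2 built here are stand-ins reconstructed from the question (the cell values in DF2 are made up, and DF2 is assumed to have columns 1.Question through 250.Question), so adjust them to the real data:

```r
# Stand-in data frames; only the column names of DF2 matter here (assumption)
DF1 <- data.frame(Question = c("11", "100", "56", "210"),
                  Group = c("Redmeat", "Chicken", "Vegetables", "Dairy"))
DF2 <- as.data.frame(matrix("Yes", nrow = 4, ncol = 250))
colnames(DF2) <- paste0(1:250, ".Question")

# Build the full column names and look them all up in one vectorized call
idx <- match(paste0(DF1$Question, ".Question"), colnames(DF2))
idx
```

match() returns NA for any question number that has no corresponding column, which makes missing columns easy to spot.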
CodePudding user response:
What about specifying in the pattern that the number must come at the start (^) and be followed by .Q? The dot has to be escaped as \\. so that it matches a literal dot rather than any character.
i <- 3
grep(paste0("^", DF1$Question[i], "\\.Q"), colnames(DF2))
Output:
[1] 56
CodePudding user response:
You need each pattern to match exactly one column, so anchor it to the start of the string with ^ and follow the number with an escaped dot, \\.. In this case you cannot use the fixed = TRUE argument, since the pattern is a regular expression.
grep(paste0("^", DF1$Question[i], "\\."), colnames(DF2))
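To illustrate the difference, here is a small comparison of the fixed substring search and the anchored regex. The cols vector is a stand-in for colnames(DF2), assuming the columns run from 1.Question to 250.Question:

```r
# Stand-in for colnames(DF2) (assumption: columns 1.Question to 250.Question)
cols <- paste0(1:250, ".Question")

# Fixed substring search: "56" also occurs inside "156.Question"
loose <- grep("56", cols, fixed = TRUE)

# Anchored regex: the number must start the name and be followed by a literal dot
exact <- grep("^56\\.", cols)

loose
exact
```

Here loose contains 56 and 156, while exact contains only 56.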
