I have a dataframe that looks like the below:
BaseRating contRating Participant
5,4,6,3,2,4 5 01
4 4 01
I would first like to check whether any entries in the dataframe contain commas and, if so, return the number of the column where they occur. I have tried some of the solutions in the questions below, but they don't seem to work when searching for a comma rather than a whole string or value. I'm probably missing something simple here, but any help is appreciated!
Selecting data frame rows based on partial string match in a column
Filter rows which contain a certain string
Check if value is in data frame
Having determined whether there are commas in my data, I then want to extract just the last number in the list separated by commas in that entry, and replace the entry with that value. For instance, I want the first row in the BaseRating column to become '4' because it is last in that list.
Is there a way to do this in R without manually changing the number?
CodePudding user response:
A possible solution:
library(tidyverse)
df <- data.frame(
BaseRating = c("5,4,6,3,2,4", "4"),
contRating = c(5L, 4L),
Participant = c(1L, 1L)
)
df %>%
  # Keep either a lone number or the digits after the last comma
  mutate(BaseRating = sapply(BaseRating,
    function(x) str_extract(x, "^\\d+$|(?<=\\,)\\d+$") %>% as.integer))
#> BaseRating contRating Participant
#> 1 4 5 1
#> 2 4 4 1
Or:
library(tidyverse)
df %>%
  separate_rows(BaseRating, sep = ",", convert = TRUE) %>%
  group_by(contRating, Participant) %>%
  summarise(BaseRating = last(BaseRating), .groups = "drop") %>%
  relocate(BaseRating, .before = 1)
#> # A tibble: 2 × 3
#> BaseRating contRating Participant
#> <int> <int> <int>
#> 1 4 4 1
#> 2 4 5 1
CodePudding user response:
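If we want a quick option, we can use trimws from base R. Its whitespace argument (available from R 3.6.0) is treated as a regular expression, so ".*," strips everything up to and including the last comma.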
df$BaseRating <- as.numeric(trimws(df$BaseRating, whitespace = ".*,"))
-output
> df
BaseRating contRating Participant
1 4 5 1
2 4 4 1
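The trimws call above relies on a greedy regular expression, so the same idea can be sketched with sub from base R. This is only a minimal illustration on the example data from the question; last_rating is a name made up for this sketch.

```r
# Example data mirroring the question (BaseRating stored as character).
df <- data.frame(
  BaseRating = c("5,4,6,3,2,4", "4"),
  contRating = c(5L, 4L),
  Participant = c(1L, 1L),
  stringsAsFactors = FALSE
)

# Greedy ".*," matches everything up to and including the last comma;
# entries without a comma do not match and are left unchanged.
last_rating <- as.numeric(sub(".*,", "", df$BaseRating))
last_rating
```

Assigning last_rating back to df$BaseRating would replace the column in place, as in the trimws version.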
Or another option is stri_extract_last_regex from stringi, which returns the last run of digits in each entry
library(stringi)
df$BaseRating <- as.numeric(stri_extract_last_regex(df$BaseRating, "\\d+"))
data
df <- structure(list(BaseRating = c("5,4,6,3,2,4", "4"), contRating = 5:4,
Participant = c(1L, 1L)), class = "data.frame", row.names = c(NA,
-2L))
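For the first part of the question (finding which columns contain a comma), a minimal base R sketch on the same example data might look like the following. The names has_comma, comma_cols and comma_rows are illustrative only, and fixed = TRUE is used so the comma is matched literally rather than as a regular expression.

```r
# Example data mirroring the question.
df <- data.frame(
  BaseRating = c("5,4,6,3,2,4", "4"),
  contRating = c(5L, 4L),
  Participant = c(1L, 1L),
  stringsAsFactors = FALSE
)

# For each column, check whether any entry contains a literal comma.
has_comma <- vapply(
  df,
  function(col) any(grepl(",", col, fixed = TRUE)),
  logical(1)
)

# Column numbers (named by column) where a comma was found.
comma_cols <- which(has_comma)
comma_cols

# Row numbers within BaseRating that contain a comma.
comma_rows <- which(grepl(",", df$BaseRating, fixed = TRUE))
comma_rows
```

Here comma_cols holds the position of BaseRating, so it could be used to restrict the extraction step to only the affected columns.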
