How can I search for a regular expression in each box of data frame column, and then return just the

Time:01-25

I have the following data frame:

df <- data.frame(a=c("23034185- Breast Cancer","24586730- Glioblastoma"), b=c(25, 47))

I want to search column a to see whether each box contains a number and, if so, return just the number instead of the whole field. So the output should be "23034185" and "24586730".

Please help, thank you.

Answer:

A possible solution:

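library(tidyverse)  # loads dplyr (mutate, %>%) and stringr (str_extract)

df <- data.frame(a=c("23034185- Breast Cancer","24586730- Glioblastoma"), b=c(25, 47))

df %>% 
  # "^\\d+" = one or more digits anchored at the start of the string
  mutate(number = str_extract(a, "^\\d+"))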

#>                         a  b   number
#> 1 23034185- Breast Cancer 25 23034185
#> 2  24586730- Glioblastoma 47 24586730
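If some boxes might not start with a number, str_extract() returns NA for them. The following is a minimal base-R sketch of the same leading-number extraction for readers without tidyverse; the helper name extract_leading_number() and the third example string are hypothetical.

```r
# Hypothetical helper: return the leading run of digits of each string,
# or NA when a string does not start with a digit.
extract_leading_number <- function(x) {
  m <- regexpr("^[0-9]+", x)              # match start position, or -1 if no match
  out <- rep(NA_character_, length(x))    # default: NA for every entry
  out[m > 0] <- regmatches(x, m)          # fill in only the entries that matched
  out
}

# Third entry is hypothetical, with no leading number
x <- c("23034185- Breast Cancer", "24586730- Glioblastoma", "Unknown")
extract_leading_number(x)
#> [1] "23034185" "24586730" NA
```

regmatches() applied to a regexpr() result returns only the matched substrings, which is why the result vector is pre-filled with NA and only the matching positions are overwritten.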

Answer:

We can use gsub() from base R. The pattern [^0-9] matches any character that is not a digit, and replacing each match with '' deletes it, leaving only the digits of each entry in df$a.

df$c <- gsub('[^0-9]', '', df$a)

> df
                        a  b        c
1 23034185- Breast Cancer 25 23034185
2  24586730- Glioblastoma 47 24586730
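Because [^0-9] deletes every non-digit character, any digits elsewhere in the field are kept as well and run together with the ID. The sketch below uses a hypothetical entry to contrast that gsub() call with a sub() call that keeps only the leading digits through a capture group.

```r
x <- "12345- COVID-19"   # hypothetical entry with digits inside the name

# Removes all non-digits, so the "19" is appended to the ID
gsub("[^0-9]", "", x)
#> [1] "1234519"

# Captures the leading digits and replaces the whole string with them
sub("^([0-9]+).*$", "\\1", x)
#> [1] "12345"
```

Note that sub() returns a string unchanged when the pattern does not match, so entries without a leading number would pass through as-is.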

Note that df$c contains character strings, not numbers:

> str(df$c)
 chr [1:2] "23034185" "24586730"

If you want to do calculations with these values, convert them directly with as.numeric():

df$c <- as.numeric(gsub('[^0-9]', '', df$a))

> str(df$c)
 num [1:2] 23034185 24586730
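As a sketch of what this conversion does when a field contains no digits at all, the example below adds a hypothetical third row: gsub() leaves an empty string there, which as.numeric() turns into NA, so summaries need na.rm = TRUE.

```r
# Third row is hypothetical and contains no digits
df <- data.frame(a = c("23034185- Breast Cancer", "24586730- Glioblastoma", "Unknown"),
                 b = c(25, 47, 10))

# "Unknown" becomes "" after gsub(), and "" becomes NA after as.numeric()
df$c <- as.numeric(gsub('[^0-9]', '', df$a))

str(df$c)

sum(df$c, na.rm = TRUE)   # skip the NA row when summing
```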