I want to extract titles (Mr, Mrs, Miss) from the Name column and store them in a new column, Title. The relevant data looks like this:
snippet <- tibble(Name = c('Braund, Mr. Owen Harris', 'Cumings, Mrs. John Bradley', 'Heikkinen, Miss. Laina'), Column = c('blah', 'blah', 'blah'))
I've reviewed this answer, but I must be missing something.
Here's the best code I could come up with: snippet <- mutate(snippet, Title = str_extract(snippet$Name, "(?<=,)[^,]*(?=.)")). This does add the Title column, but every value in that column is NA. Where's my error? Thanks.
Answer:
Maybe this helps. In the 'Name' column there is a space after the comma, so we use regex lookarounds to match one or more non-whitespace characters (\\S+) that follow the comma and space ((?<=, )) and precede the period ((?=\\.)). The . is a metacharacter, so we escape it; otherwise it matches any character.
library(dplyr)
library(stringr)
snippet <- snippet %>%
  # Title = the non-space run between ", " and the next literal "."
  mutate(Title = str_extract(Name, "(?<=, )\\S+(?=\\.)"))
Output:
snippet
# A tibble: 3 × 3
Name Column Title
<chr> <chr> <chr>
1 Braund, Mr. Owen Harris blah Mr
2 Cumings, Mrs. John Bradley blah Mrs
3 Heikkinen, Miss. Laina blah Miss
Data:
snippet <- structure(list(Name = c("Braund, Mr. Owen Harris",
"Cumings, Mrs. John Bradley",
"Heikkinen, Miss. Laina"), Column = c("blah", "blah", "blah")),
class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))
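To see why the answer escapes the dot, here is a minimal base-R sketch that runs the same lookaround pattern with and without the escape. It swaps stringr for base R's regexpr()/regmatches() with perl = TRUE (so lookbehind is supported) only to keep the example dependency-free; the helper name first_match is hypothetical.

```r
# Sample names copied from the question's data
x <- c("Braund, Mr. Owen Harris",
       "Cumings, Mrs. John Bradley",
       "Heikkinen, Miss. Laina")

# Hypothetical helper: first match of a PCRE pattern in each string,
# NA where the pattern does not match (similar to stringr::str_extract)
first_match <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

# Escaped dot: the lookahead requires a literal "." right after the title,
# so \\S+ backtracks off the period and returns the bare title
escaped <- first_match(x, "(?<=, )\\S+(?=\\.)")

# Unescaped dot: "." matches any character (here the following space),
# so the period stays inside the match
unescaped <- first_match(x, "(?<=, )\\S+(?=.)")

print(escaped)    # "Mr"  "Mrs"  "Miss"
print(unescaped)  # "Mr." "Mrs." "Miss."
```

The same two patterns behave identically inside str_extract(), because stringr's ICU engine treats "." and "\\." the same way PCRE does here.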
