How can I extract the hashtags from a data frame column into a new column?
This is my data frame:
dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                        b = c("hello friends! #goodday",
                              "the flood getting worse #peoplefirst #sos",
                              "i love adele new song, it is remarkable",
                              "john doe loves judo",
                              "the new variant of covid19 is worrying #staysafe"))
The final data frame should look like this:
a  b                                                 c
A  hello friends! #goodday                           #goodday
B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
C  i love adele new song, it is remarkable           NA
D  john doe loves judo                               NA
E  the new variant of covid19 is worrying #staysafe  #staysafe
CodePudding user response:
Using the stringr package:
library(stringr)
# Extract every hashtag in each row and join multiple matches with a space
dataframe$c <- sapply(str_extract_all(dataframe$b, "#\\w+"),
                      function(x) paste(x, collapse = " "))
dataframe
  a  b                                                 c
1 A  hello friends! #goodday                           #goodday
2 B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
3 C  i love adele new song, it is remarkable
4 D  john doe loves judo
5 E  the new variant of covid19 is worrying #staysafe  #staysafe
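Note that rows without a hashtag end up as empty strings here, while the question asks for NA. A minimal sketch of one way to get NA directly, assuming a shortened copy of the question's data and a helper named collapse_hashtags (a hypothetical name, not part of stringr):

```r
library(stringr)

# Shortened copy of the question's data, for illustration only
dataframe <- data.frame(
  a = c("A", "B", "C"),
  b = c("hello friends! #goodday",
        "the flood getting worse #peoplefirst #sos",
        "john doe loves judo")
)

# Hypothetical helper: join the hashtags found in one string,
# or return NA when the string contains none
collapse_hashtags <- function(tags) {
  if (length(tags) == 0) NA_character_ else paste(tags, collapse = " ")
}

# vapply() guarantees a plain character column instead of a list column
dataframe$c <- vapply(str_extract_all(dataframe$b, "#\\w+"),
                      collapse_hashtags, character(1))
```

Using vapply() with character(1) keeps column c an ordinary character vector, which is usually easier to filter or export than a list column.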
CodePudding user response:
A more tidyverse-esque solution would be the following, using mutate, map_chr, str_extract_all and na_if.
library(tidyverse)
dataframe |>
  # For every row, extract all hashtags (a "#" followed by letters)
  # and paste them into a single character string (for multiple matches)
  mutate(c = map_chr(.x = b,
                     .f = function(x) paste(str_extract_all(x, "#[A-Za-z]+")[[1]],
                                            collapse = " ")),
         # Change empty strings to NA
         c = na_if(c, ""))
#   a  b                                                 c
# 1 A  hello friends! #goodday                           #goodday
# 2 B  the flood getting worse #peoplefirst #sos         #peoplefirst #sos
# 3 C  i love adele new song, it is remarkable           NA
# 4 D  john doe loves judo                               NA
# 5 E  the new variant of covid19 is worrying #staysafe  #staysafe
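If adding a package dependency is not desirable, the same idea can probably be expressed in base R. The following is only a sketch under that assumption: it swaps str_extract_all for regmatches() with gregexpr(), and the names b, tags and c_col are illustrative rather than taken from the answers above.

```r
# Shortened copy of the question's text column, for illustration only
b <- c("hello friends! #goodday",
       "the flood getting worse #peoplefirst #sos",
       "john doe loves judo")

# gregexpr() locates every match; regmatches() pulls the matched text out,
# giving one character vector per input string (empty when nothing matches)
tags <- regmatches(b, gregexpr("#[[:alnum:]_]+", b))

# Join multiple hashtags with a space; use NA when there are none
c_col <- vapply(tags,
                function(x) if (length(x) > 0) paste(x, collapse = " ") else NA_character_,
                character(1))
```

The result can then be assigned as a new column, for example dataframe$c <- c_col, provided b is the data frame's text column.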
