R: How to extract hashtags from a data frame column into a new column

Time:01-04

How can I extract the hashtags from a text column of a data frame into a new column?

This is my data frame:

dataframe <- data.frame(a = c('A', 'B', 'C', 'D', 'E'),
                 b = c("hello friends! #goodday", 
                       "the flood getting worse #peoplefirst #sos", 
                       "i love adele new song, it is remarkable", 
                       "john doe loves judo", 
                       "the new variant of covid19 is worrying #staysafe"))

Final data frame should be like this:

a   b                                                 c
A   hello friends! #goodday                           #goodday
B   the flood getting worse #peoplefirst #sos         #peoplefirst #sos              
C   i love adele new song, it is remarkable           NA
D   john doe loves judo                               NA
E   the new variant of covid19 is worrying #staysafe  #staysafe

CodePudding user response:

Using the stringr package:

library(stringr)

# Extract every hashtag per row, then join multiple matches with a space
dataframe$c <- sapply(str_extract_all(dataframe$b, "#\\w+"),
                      function(x) paste(x, collapse = " "))
dataframe

  a                                                b                 c
1 A                          hello friends! #goodday          #goodday
2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
3 C          i love adele new song, it is remarkable                  
4 D                              john doe loves judo                  
5 E the new variant of covid19 is worrying #staysafe         #staysafe
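
Note that the output above leaves an empty string, not NA, in rows without a hashtag, while the question asks for NA. A minimal base R sketch of one way to get NA follows. The helper name extract_hashtags and the small example_df are hypothetical and only there for illustration; they are not part of the original answer.

```r
# Hypothetical helper (not from the answer above): returns one string per
# input element, with NA where no hashtag is found.
extract_hashtags <- function(text) {
  # gregexpr() locates every match; regmatches() pulls out the matched substrings
  matches <- regmatches(text, gregexpr("#\\w+", text))
  vapply(matches,
         function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = " "),
         character(1))
}

# Small illustrative data frame (assumed, shortened from the question)
example_df <- data.frame(
  a = c("A", "B", "C"),
  b = c("hello friends! #goodday",
        "the flood getting worse #peoplefirst #sos",
        "john doe loves judo")
)

example_df$c <- extract_hashtags(example_df$b)
example_df
```

Because vapply() is told to expect character(1), the new column is a plain character vector rather than a list column, which tends to be easier to filter and export.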

CodePudding user response:

A more tidyverse-esque solution uses mutate, map_chr, str_extract_all and na_if.

library(tidyverse)

dataframe |>
  mutate(
    # For every row, extract all runs of letters following a hashtag and
    # paste them into a single string (space-separated for multiple matches)
    c = map_chr(.x = b,
                .f = function(x) paste0(str_extract_all(x, "#[A-Za-z]+")[[1]],
                                        collapse = " ")),
    # Change empty strings to NA
    c = na_if(c, "")
  )

#  a                                                b                 c
#1 A                          hello friends! #goodday          #goodday
#2 B        the flood getting worse #peoplefirst #sos #peoplefirst #sos
#3 C          i love adele new song, it is remarkable              <NA>
#4 D                              john doe loves judo              <NA>
#5 E the new variant of covid19 is worrying #staysafe         #staysafe
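
The two answers use different patterns: one matches word characters after the hashtag, the other matches letters only. The sketch below uses base R and a made-up sentence (an assumption, not taken from the question's data) to show where they diverge. The same patterns should behave the same way in str_extract_all.

```r
# Made-up example text containing a digit and an underscore in its hashtags
text <- "the new variant is worrying #covid19 #stay_safe"

# Letters only: the match stops at the first digit or underscore
letters_only <- regmatches(text, gregexpr("#[A-Za-z]+", text))[[1]]

# Word characters: letters, digits and underscore are all included
word_chars <- regmatches(text, gregexpr("#\\w+", text))[[1]]

print(letters_only)  # "#covid" "#stay"
print(word_chars)    # "#covid19" "#stay_safe"
```

For the data in the question both patterns give the same result, so the choice only matters if hashtags may contain digits or underscores.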