Removing retweets from data frame in R based on text column

Time:02-05

I pulled tweets from Twitter using the academictwitter package. I would now like to remove all retweets, i.e. tweets whose first column, "text", starts with "RT" (e.g. the third row). You can download a similar data frame, containing tweets from Trump, from GitHub.

Thank you in advance for any suggestions.

Answer:

You can use a regular expression to find the rows whose text starts with "RT" and keep only the others. If your data is in a data frame called tweets, something like this should work:

# Keep only rows whose text does NOT start with "RT" (negative lookahead, needs perl = TRUE)
tweets[grepl("^(?!RT)", tweets$text, perl = TRUE), ]
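To see what the lookahead does, here is a small self-contained sketch on a made-up data frame (the rows below are invented for illustration, not taken from the real data). The pattern ^(?!RT) matches the start of any string that does not begin with "RT", so grepl() returns TRUE for the rows to keep. Negating a plain ^RT match gives the same result without needing perl = TRUE.

```r
# Hypothetical toy data standing in for the academictwitter output
tweets <- data.frame(
  text = c("Great rally tonight!",
           "Thank you to everyone who voted.",
           "RT @someone: Big news today",
           "RT @another: Must watch"),
  stringsAsFactors = FALSE
)

# Negative lookahead: TRUE for text that does not start with "RT"
keep_lookahead <- grepl("^(?!RT)", tweets$text, perl = TRUE)

# Equivalent and simpler: match "^RT" and negate the result
keep_negated <- !grepl("^RT", tweets$text)

# drop = FALSE only matters here because the toy data has a single column;
# without it, subsetting would return a character vector instead of a data frame
no_rt <- tweets[keep_lookahead, , drop = FALSE]
print(no_rt)  # keeps rows 1 and 2
```

Both logical vectors are identical here, so either can be used for the subsetting.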

Or, if you're using the tidyverse (filter() comes from dplyr):

library(dplyr)

# Same negative-lookahead test, applied inside filter()
tweets %>% 
  filter(grepl("^(?!RT)", text, perl = TRUE))
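Both versions drop any tweet whose text merely begins with the letters "RT" (for example a made-up tweet starting "RTX ..."), and because grepl() returns FALSE for NA, they also drop rows whose text is missing. Classic retweet text has the form "RT @user: ...", so a slightly stricter sketch, assuming your data follows that convention, checks for the prefix "RT @" with base R's startsWith(), which needs no regular expression. The helper name remove_retweets and the sample rows are hypothetical.

```r
# Hypothetical helper: drop rows whose text starts with the retweet prefix "RT @"
remove_retweets <- function(df, col = "text") {
  is_rt <- startsWith(df[[col]], "RT @")  # NA where the text itself is NA
  is_rt[is.na(is_rt)] <- FALSE            # treat missing text as "not a retweet"
  df[!is_rt, , drop = FALSE]
}

# Hypothetical toy data
tweets <- data.frame(
  text = c("RTX launch day!", "RT @someone: Big news today", NA, "Hello"),
  stringsAsFactors = FALSE
)

no_rt <- remove_retweets(tweets)
print(no_rt)  # keeps "RTX launch day!", the NA row and "Hello"
```

If you would rather discard rows with missing text, set the NA entries to TRUE instead of FALSE.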