I have a dataframe with a single column like the below:
df <- data.frame(
Level = c(
'Midwest',
'Wisconsin: Good',
'Wisconsin: Neutral',
'Wisconsin: Bad',
NA,
'Minnesota: Good',
'Minnesota: Neutral',
'Minnesota: Bad',
NA,
'New England',
'New Hampshire: Good',
'New Hampshire: Neutral',
'New Hampshire: Bad',
NA,
'Vermont: Good',
'Vermont: Neutral',
'Vermont: Bad'
)
)
I'm interested in doing two things to this column:
- Grab the first instance of the
Goodlevel for each state and place it in a new row about theGoodlevel, and... - Remove the state names from each level so it just shows
Good,Neutral, andBadbelow each state name.
The result would look like this:
df_desired <- data.frame(
Level = c(
'Midwest',
'Wisconsin',
'Good',
'Neutral',
'Bad',
NA,
'Minnesota',
'Good',
'Neutral',
'Bad',
NA,
'New England',
'New Hampshire',
'Good',
'Neutral',
'Bad',
NA,
'Vermont',
'Good',
'Neutral',
'Bad'
)
)
What's the best way to go about achieving this using R, ideally using dplyr if possible?
CodePudding user response:
Here is one way in tidyverse - Extract the prefix part before the : in 'Level' with str_extract, then split the 'Level' column at the : with separate_rows, remove the duplicated elements and remove the prefix column 'new'
library(dplyr)
library(stringr)
library(tidyr)
out <- df %>%
mutate(new = str_extract(Level, "\\w (?=:)")) %>%
separate_rows(Level, sep = ":\\s*") %>%
filter(!duplicated(cur_data()) | is.na(Level)) %>%
select(-new)
-checking with OP's desired
> all.equal(out, df_desired, check.attributes = FALSE)
[1] TRUE
