I have a dataframe like below:
SampleId Col1 Col2 Col3 Col4
1 st1 k p
2 st2 k
3 st3 k p g
4 st4 k p g s
I want the empty rows in the columns to be filled based on the available values from preceding columns, so something like below:
SampleId Col1 Col2 Col3 Col4
1 st1 k p p p
2 st2 k k k k
3 st3 k p g g
4 st4 k p g s
What would be a dplyr way to do this?
CodePudding user response:
You could transform the data to long, replace "" with NA, and fill in missing values with the previous value with fill(). Finally, transform the data back to wide.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with('Col')) %>%
mutate(value = na_if(value, "")) %>%
fill(value) %>%
pivot_wider()
# # A tibble: 4 × 5
# SampleId Col1 Col2 Col3 Col4
# <chr> <chr> <chr> <chr> <chr>
# 1 st1 k p p p
# 2 st2 k k k k
# 3 st3 k p g g
# 4 st4 k p g s
Data
df <- structure(list(SampleId = c("st1", "st2", "st3", "st4"),
Col1 = c("k", "k", "k", "k"), Col2 = c("p", "", "p", "p"), Col3 = c("", "", "g", "g"),
Col4 = c("", "", "", "s")), class = "data.frame", row.names = c("1", "2", "3", "4"))
