I'm having a hard putting this into the form of a question. I have situation where the data in a column (column B) were recorded in such a way that all the values with respect to an indicator (column A) ended up in the bottom-most row within each value of the indicator. Or more simply, like this:
(my_df <- data.frame(
A = c(rep(1, 6), rep(2, 6)),
B = c(rep(NA, 5), "a,b,c,d,e,f", rep(NA, 5), "g,h,i,j,k,l")
))
#> A B
#> 1 1 <NA>
#> 2 1 <NA>
#> 3 1 <NA>
#> 4 1 <NA>
#> 5 1 <NA>
#> 6 1 a,b,c,d,e,f
#> 7 2 <NA>
#> 8 2 <NA>
#> 9 2 <NA>
#> 10 2 <NA>
#> 11 2 <NA>
#> 12 2 g,h,i,j,k,l
Created on 2022-01-28 by the reprex package (v2.0.1)
I am trying to find a simple way to distribute the cell contents upward so that they are in their correct rows, with respect to their respective codes:
(expected_df_1 <- data.frame(
A = c(rep(1, 6), rep(2, 6)),
B = c(letters[1:6], letters[7:12])
))
#> A B
#> 1 1 a
#> 2 1 b
#> 3 1 c
#> 4 1 d
#> 5 1 e
#> 6 1 f
#> 7 2 g
#> 8 2 h
#> 9 2 i
#> 10 2 j
#> 11 2 k
#> 12 2 l
Created on 2022-01-28 by the reprex package (v2.0.1)
This would also be fine:
(expected_df_2 <- data.frame(
A = c(rep(1, 6), rep(2, 6)),
B = c(rep(NA, 5), "a,b,c,d,e,f", rep(NA, 5), "g,h,i,j,k,l"),
C = c(letters[1:6], letters[7:12])
))
#> A B C
#> 1 1 <NA> a
#> 2 1 <NA> b
#> 3 1 <NA> c
#> 4 1 <NA> d
#> 5 1 <NA> e
#> 6 1 a,b,c,d,e,f f
#> 7 2 <NA> g
#> 8 2 <NA> h
#> 9 2 <NA> i
#> 10 2 <NA> j
#> 11 2 <NA> k
#> 12 2 g,h,i,j,k,l l
Created on 2022-01-28 by the reprex package (v2.0.1)
I can't for the life of me find a solution to this. Ideas? Preferably I'd like to stay within the tidyverse framework if possible, but I'll take any suggestions!
CodePudding user response:
A possible solution, removing all NA first and then separating into rows, by comma, the elements together:
library(tidyverse)
my_df <- data.frame(
A = c(rep(1, 6), rep(2, 6)),
B = c(rep(NA, 5), "a,b,c,d,e,f", rep(NA, 5), "g,h,i,j,k,l")
)
my_df %>%
drop_na(B) %>%
separate_rows(B, sep=",")
#> # A tibble: 12 × 2
#> A B
#> <dbl> <chr>
#> 1 1 a
#> 2 1 b
#> 3 1 c
#> 4 1 d
#> 5 1 e
#> 6 1 f
#> 7 2 g
#> 8 2 h
#> 9 2 i
#> 10 2 j
#> 11 2 k
#> 12 2 l
CodePudding user response:
Another alternative to try. After grouping by column A, use strsplit on the comma separated values in column B (removing NA).
library(tidyverse)
my_df %>%
group_by(A) %>%
mutate(B = unlist(strsplit(na.omit(B), ',')))
Output
A B
<dbl> <chr>
1 1 a
2 1 b
3 1 c
4 1 d
5 1 e
6 1 f
7 2 g
8 2 h
9 2 i
10 2 j
11 2 k
12 2 l
