Home > database >  How to distribute or "rotate" cell values of a dataframe column so that they fill blank ro
How to distribute or "rotate" cell values of a dataframe column so that they fill blank ro

Time:01-29

I'm having a hard putting this into the form of a question. I have situation where the data in a column (column B) were recorded in such a way that all the values with respect to an indicator (column A) ended up in the bottom-most row within each value of the indicator. Or more simply, like this:

(my_df <- data.frame(
    A = c(rep(1, 6), rep(2, 6)),
    B = c(rep(NA, 5), "a,b,c,d,e,f", rep(NA, 5), "g,h,i,j,k,l")
))
#>    A           B
#> 1  1        <NA>
#> 2  1        <NA>
#> 3  1        <NA>
#> 4  1        <NA>
#> 5  1        <NA>
#> 6  1 a,b,c,d,e,f
#> 7  2        <NA>
#> 8  2        <NA>
#> 9  2        <NA>
#> 10 2        <NA>
#> 11 2        <NA>
#> 12 2 g,h,i,j,k,l

Created on 2022-01-28 by the reprex package (v2.0.1)

I am trying to find a simple way to distribute the cell contents upward so that they are in their correct rows, with respect to their respective codes:

(expected_df_1 <- data.frame(
    A = c(rep(1, 6), rep(2, 6)),
    B = c(letters[1:6], letters[7:12])
))
#>    A B
#> 1  1 a
#> 2  1 b
#> 3  1 c
#> 4  1 d
#> 5  1 e
#> 6  1 f
#> 7  2 g
#> 8  2 h
#> 9  2 i
#> 10 2 j
#> 11 2 k
#> 12 2 l

Created on 2022-01-28 by the reprex package (v2.0.1)

This would also be fine:

(expected_df_2 <- data.frame(
    A = c(rep(1, 6), rep(2, 6)),
    B = c(rep(NA, 5), "a,b,c,d,e,f", rep(NA, 5), "g,h,i,j,k,l"),
    C = c(letters[1:6], letters[7:12])
))
#>    A           B C
#> 1  1        <NA> a
#> 2  1        <NA> b
#> 3  1        <NA> c
#> 4  1        <NA> d
#> 5  1        <NA> e
#> 6  1 a,b,c,d,e,f f
#> 7  2        <NA> g
#> 8  2        <NA> h
#> 9  2        <NA> i
#> 10 2        <NA> j
#> 11 2        <NA> k
#> 12 2 g,h,i,j,k,l l

Created on 2022-01-28 by the reprex package (v2.0.1)

I can't for the life of me find a solution to this. Ideas? Preferably I'd like to stay within the tidyverse framework if possible, but I'll take any suggestions!

CodePudding user response:

A possible solution, removing all NA first and then separating into rows, by comma, the elements together:

library(tidyverse)

my_df <- data.frame(
  A = c(rep(1, 6), rep(2, 6)),
  B = c(rep(NA, 5), "a,b,c,d,e,f", rep(NA, 5), "g,h,i,j,k,l")
)

my_df %>% 
  drop_na(B) %>% 
  separate_rows(B, sep=",") 

#> # A tibble: 12 × 2
#>        A B    
#>    <dbl> <chr>
#>  1     1 a    
#>  2     1 b    
#>  3     1 c    
#>  4     1 d    
#>  5     1 e    
#>  6     1 f    
#>  7     2 g    
#>  8     2 h    
#>  9     2 i    
#> 10     2 j    
#> 11     2 k    
#> 12     2 l

CodePudding user response:

Another alternative to try. After grouping by column A, use strsplit on the comma separated values in column B (removing NA).

library(tidyverse)

my_df %>%
  group_by(A) %>%
  mutate(B = unlist(strsplit(na.omit(B), ',')))

Output

       A B    
   <dbl> <chr>
 1     1 a    
 2     1 b    
 3     1 c    
 4     1 d    
 5     1 e    
 6     1 f    
 7     2 g    
 8     2 h    
 9     2 i    
10     2 j    
11     2 k    
12     2 l
  •  Tags:  
  • Related