How to replace the NAs with the column mean and then multiply columns of different lengths in R?

Time:01-18

My question might not be very clear, so here is an example.

My final goal is to produce

final=(df1$a*df2$b) (df1$a*df3$c*df4$d) (df4$d*df5$e)

I have five data frames (one column each) with different lengths as follows:

df1

    a
1.  1
2.  2
3.  4
4.  2

df2

    b
1.  2
2.  6

df3

    c
1.  2
2.  4 
3.  3

df4

    d
1.  1
2.  2
3.  4
4.  3

df5

    e
1.  4
2.  6
3.  2

I want a final data frame that combines them all, as follows:

finaldf

    a   b   c   d  e
1.  1   2   2   1  4
2.  2   6   4   2  6
3.  4   NA  3   4  2
4.  2   NA  NA  3  NA

I want every NA in each column to be replaced with the mean of that column, so that all columns of finaldf have the same length:

finaldf

    a   b   c   d   e
1.  1   2   2   1   4
2.  2   6   4   2   6
3.  4   4   3   4   2
4.  2   4   3   3   4

I can then compute the final result, final=(df1$a*df2$b) (df1$a*df3$c*df4$d) (df4$d*df5$e), as I need.

CodePudding user response:

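The easiest approach by far is to use the qpcR, dplyr and tidyr packages: cbind.na() from qpcR pads the shorter columns with NA, and replace_na() from tidyr then fills each NA with its column mean.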

library(dplyr)
library(qpcR)
library(tidyr)

df1 <- data.frame(a=c(1,2,4,2))
df2 <- data.frame(b=c(2,6))
df3 <- data.frame(c=c(2,4,3))
df4 <- data.frame(d=c(1,2,4,3))
df5 <- data.frame(e=c(4,6,2))

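# bind the columns side by side, padding the shorter ones with NA
mydf <- qpcR:::cbind.na(df1, df2, df3, df4, df5) %>%
  # replace each NA with the mean of its column
  tidyr::replace_na(., as.list(colMeans(., na.rm = TRUE)))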

> mydf
  a b c d e
1 1 2 2 1 4
2 2 6 4 2 6
3 4 4 3 4 2
4 2 4 3 3 4
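
Once the columns have equal length, the three products from the question can be computed element-wise. The following is a minimal sketch that rebuilds the imputed data frame by hand so it runs without qpcR; the names imputed, p1, p2 and p3 are illustrative only, and the way the three terms are combined is left open because the question does not show an operator between them.

```r
# Imputed data frame as printed above, rebuilt by hand (illustrative name).
imputed <- data.frame(
  a = c(1, 2, 4, 2),
  b = c(2, 6, 4, 4),
  c = c(2, 4, 3, 3),
  d = c(1, 2, 4, 3),
  e = c(4, 6, 2, 4)
)

# Element-wise products for the three terms in the question.
p1 <- imputed$a * imputed$b              # df1$a * df2$b
p2 <- imputed$a * imputed$c * imputed$d  # df1$a * df3$c * df4$d
p3 <- imputed$d * imputed$e              # df4$d * df5$e
```

Each of p1, p2 and p3 is a vector of length 4, so they can be combined with whatever operator the final formula calls for.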

Depending on your rgl settings, you might need to run the following at the top of your script to make the qpcR package load (see https://stackoverflow.com/a/66127391/2554330 ):

options(rgl.useNULL = TRUE)
library(rgl)

CodePudding user response:

With purrr and dplyr, we can first collect all the data frames in a list with mget(). Second, use set_names() to replace the data frame names with their respective column names. Third, extract each data frame's single column as a vector with pluck(). Then pad every vector with NAs so that all vectors have the length of the longest one. Finally, bind the vectors back into a data frame with as.data.frame(), and use mutate() with across() and replace_na() (from tidyr) to fill each NA with the mean of its column.

library(dplyr)
library(purrr)
library(tidyr)  # provides replace_na()

mget(ls(pattern = 'df\\d')) %>%
        set_names(map_chr(., colnames)) %>%
        map(pluck, 1) %>%
        map(., `length<-`, max(lengths(.))) %>%
        as.data.frame %>%
        mutate(across(everything(), ~replace_na(.x, mean(.x, na.rm=TRUE))))
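
The same steps (collect, pad with NA, fill with the column mean) can also be sketched in base R, without any extra packages. This is an alternative illustration under the question's example data, not the code from either answer; the names cols, n, padded and filled are made up for this sketch.

```r
df1 <- data.frame(a = c(1, 2, 4, 2))
df2 <- data.frame(b = c(2, 6))
df3 <- data.frame(c = c(2, 4, 3))
df4 <- data.frame(d = c(1, 2, 4, 3))
df5 <- data.frame(e = c(4, 6, 2))

# Collect the single column of each data frame into one named list of vectors.
cols <- c(as.list(df1), as.list(df2), as.list(df3), as.list(df4), as.list(df5))

# Pad every vector with NA up to the length of the longest one.
n <- max(lengths(cols))
padded <- lapply(cols, function(x) {
  length(x) <- n
  x
})

# Replace each NA with the mean of the non-missing values in that column.
filled <- as.data.frame(lapply(padded, function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}))
```

Listing the data frames explicitly, rather than looking them up with mget(ls(...)), avoids accidentally picking up other objects in the workspace whose names match the pattern.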