R datasummary: Show N and percent in one column

I am currently trying to display the count of factor levels (e.g., gender) and their relative frequency per group (e.g., treatment group) using datasummary. In addition, I would like to combine this with the display of quantitative variables (e.g., age) with their respective mean and standard deviation.

So far, I have created a function that displays the mean and SD in one column, and I have managed to calculate N and percentages. However, I am struggling with two things: writing a function that displays N and percentage in one column, and adding an empty column to the datasummary of the quantitative variable so that both data frames can be combined (based on "Show count of unique values in datasummary" and "combine two different tables of descriptive statistics using data").

library(modelsummary)
library(magrittr)
library(dplyr)

set.seed(123)
iris$gender <- factor(sample(1:3, size = 150, replace = T), 
                      labels = c("Male", "Female", "Other"))
iris$job <-  factor(sample(1:5, size = 150, replace = T), 
                    labels = c("Student", "Worker", "CEO", "Other", "None"))
empty <- function(...) ""

MeanSD = function(x) {
  M = mean(x, na.rm = T)
  SD = sd(x, na.rm = T)
  MSD = paste(round(M, 2), " (",round(SD,2), ")", sep = "")
  return(MSD)
}
#This function does not work properly
  NP = function(x, y) {
    N = N(x)
    P = Percent(x, y, denom = "col")
    out = paste(N, " (",P, ")", sep = "")
    return(NP)
  }
iris_tab1 <- iris %>% dplyr::select(Species,
                                    Gender = gender,
                                    Job = job,
                                    Length = Sepal.Length)
tbl_1 <- datasummary((Heading("")*N + Heading("")*Percent(fn = function(x, y) 100 * length(x) / length(y), denom = "col"))*(Gender + Job)~Species,
                          data = iris_tab1,
                          fmt = 2,
                          output = 'data.frame'
)
tbl_1

#Cannot add the empty column
tbl_2 <- datasummary(Heading("")*(MeanSD)*Length~empty + Species, 
                          data = iris_tab1,
                          output = 'data.frame'
)
tbl_2

CodePudding user response：

empty is a function. MeanSD is a function. All functions need to go on the same side of the datasummary formula:

library(modelsummary)
library(magrittr)
library(dplyr)

set.seed(123)
iris$gender <- factor(sample(1:3, size = 150, replace = T),
    labels = c("Male", "Female", "Other"))
iris$job <- factor(sample(1:5, size = 150, replace = T),
    labels = c("Student", "Worker", "CEO", "Other", "None"))

empty <- function(...) ""

MeanSD = function(x) {
    M = mean(x, na.rm = T)
    SD = sd(x, na.rm = T)
    MSD = paste(round(M, 2), " (", round(SD, 2), ")", sep = "")
    return(MSD)
}

iris_tab1 <- iris %>%
    dplyr::select(Species,
        Gender = gender,
        Job = job,
        Length = Sepal.Length)

tbl_2 <- datasummary(Heading("") * Length ~ empty + MeanSD * Species,
    data = iris_tab1,
    output = "data.frame")
tbl_2
#>     empty      setosa  versicolor   virginica
#> 1         5.01 (0.35) 5.94 (0.52) 6.59 (0.64)

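A simple illustration of the Percent function, whose fn argument receives the values in the current cell as x and the values in the denominator as y: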

library(modelsummary)

dat <- mtcars
dat$cyl <- as.factor(dat$cyl)

fn <- function(x, y) {
    out <- sprintf(
        "%s (%.1f%%)",
        length(x),
        length(x) / length(y) * 100)
}
datasummary(
    cyl ~ Percent(fn = fn),
    data = dat)

cyl	Percent
4	11 (34.4%)
6	7 (21.9%)
8	14 (43.8%)
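
The same idea can probably be carried over to the table from the question. The following is a minimal sketch, not a tested solution: it assumes that Percent accepts denom = "col" together with a custom fn (as in the question's own tbl_1 call), so that each percentage is computed within a Species column. The names n_pct, dat and tbl_np are made up for this illustration.

```r
library(modelsummary)

# Rebuild the example data from the question
set.seed(123)
dat <- iris
dat$Gender <- factor(sample(1:3, size = 150, replace = TRUE),
    labels = c("Male", "Female", "Other"))
dat$Job <- factor(sample(1:5, size = 150, replace = TRUE),
    labels = c("Student", "Worker", "CEO", "Other", "None"))

# Hypothetical helper: x holds the values in the current cell,
# y holds the values in the denominator (here: the whole column)
n_pct <- function(x, y) {
    sprintf("%d (%.1f%%)", length(x), length(x) / length(y) * 100)
}

# Assumption: denom = "col" makes each Species column the denominator
tbl_np <- datasummary(
    (Gender + Job) ~ Species * Percent(fn = n_pct, denom = "col"),
    data = dat,
    output = "data.frame")
tbl_np
```

Because fn returns a ready-made string, no separate N column is needed. To stack tbl_np on top of tbl_2, the column names of the two data frames would still have to be made identical first.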