I am currently trying to display the count of factor levels (e.g., gender) and their relative frequency per group (e.g., treatment group) using datasummary. In addition, I would like to combine this with the display of quantitative variables (e.g., age) with their respective mean and standard deviation.
So far, I have created a function that displays the mean and SD in one column, and I have managed to calculate N and percentages. However, I am struggling with two things: writing a function that displays N and percentage in a single column, and adding an empty column to the datasummary of the quantitative variable so that both data frames can be combined (based on "Show count of unique values in datasummary" and "combine two different tables of descriptive statistics using data").
library(modelsummary)
library(magrittr)
library(dplyr)
set.seed(123)
iris$gender <- factor(sample(1:3, size = 150, replace = T),
labels = c("Male", "Female", "Other"))
iris$job <- factor(sample(1:5, size = 150, replace = T),
labels = c("Student", "Worker", "CEO", "Other", "None"))
empty <- function(...) ""
MeanSD = function(x) {
M = mean(x, na.rm = T)
SD = sd(x, na.rm = T)
MSD = paste(round(M, 2), " (",round(SD,2), ")", sep = "")
return(MSD)
}
#This function does not work properly
NP = function(x, y) {
N = N(x)
P = Percent(x, y, denom = "col")
out = paste(N, " (",P, ")", sep = "")
return(NP)
}
iris_tab1 <- iris %>% dplyr::select(Species,
Gender = gender,
Job = job,
Length = Sepal.Length)
tbl_1 <- datasummary((Heading("") * N + Heading("") * Percent(fn = function(x, y) 100 * length(x) / length(y), denom = "col")) * (Gender + Job) ~ Species,
data = iris_tab1,
fmt = 2,
output = 'data.frame'
)
tbl_1
#Cannot add the empty column
tbl_2 <- datasummary(Heading("") * (MeanSD) * Length ~ empty + Species,
data = iris_tab1,
output = 'data.frame'
)
tbl_2
CodePudding user response:
empty is a function. MeanSD is a function. All functions need to go on the same side of the datasummary formula:
library(modelsummary)
library(magrittr)
library(dplyr)
set.seed(123)
iris$gender <- factor(sample(1:3, size = 150, replace = T),
labels = c("Male", "Female", "Other"))
iris$job <- factor(sample(1:5, size = 150, replace = T),
labels = c("Student", "Worker", "CEO", "Other", "None"))
empty <- function(...) ""
MeanSD = function(x) {
M = mean(x, na.rm = T)
SD = sd(x, na.rm = T)
MSD = paste(round(M, 2), " (", round(SD, 2), ")", sep = "")
return(MSD)
}
iris_tab1 <- iris %>%
dplyr::select(Species,
Gender = gender,
Job = job,
Length = Sepal.Length)
tbl_2 <- datasummary(Heading("") * Length ~ empty + MeanSD * Species,
data = iris_tab1,
output = "data.frame")
tbl_2
#> empty setosa versicolor virginica
#> 1 5.01 (0.35) 5.94 (0.52) 6.59 (0.64)
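As a quick sanity check, the same cell values can be reproduced in base R without modelsummary. This is only a sketch: `by_species` is a name made up for illustration, and it uses the built-in iris data, whose Sepal.Length column is what Length refers to above.

```r
# Same helper as above: mean and SD pasted into a single string.
MeanSD <- function(x) {
  paste0(round(mean(x, na.rm = TRUE), 2), " (", round(sd(x, na.rm = TRUE), 2), ")")
}

# Apply the helper to Sepal.Length within each species.
# `by_species` is a hypothetical name used only for this check.
by_species <- tapply(iris$Sepal.Length, iris$Species, MeanSD)
by_species
```

The three strings should match the setosa, versicolor and virginica cells of tbl_2.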
Simple illustration of Percent function:
library(modelsummary)
dat <- mtcars
dat$cyl <- as.factor(dat$cyl)
fn <- function(x, y) {
out <- sprintf(
"%s (%.1f%%)",
length(x),
length(x) / length(y) * 100)
}
datasummary(
cyl ~ Percent(fn = fn),
data = dat)
| cyl | Percent |
|---|---|
| 4 | 11 (34.4%) |
| 6 | 7 (21.9%) |
| 8 | 14 (43.8%) |
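To see what fn computes for a single cell, it can be called by hand. This sketch assumes, as the table above suggests, that Percent passes the values belonging to the cell as x and the values of the denominator as y; `cyl` and `four` are names made up for illustration.

```r
# Same formatter as above: count followed by percentage of the denominator.
fn <- function(x, y) {
  sprintf("%s (%.1f%%)", length(x), length(x) / length(y) * 100)
}

# Assumption: x is the subset for one cell, y is the full denominator vector.
cyl <- mtcars$cyl
four <- fn(cyl[cyl == 4], cyl)
four  # "11 (34.4%)", matching the first row of the table
```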
