I have a data frame that contains a subset of variables (model, mpg, year, etc.).
From it I created a second data frame, called reducedset, that contains only the first 200 observations.
I am trying to make a summary statistics table for only the model "cars", but I cannot figure out how. I consulted vtable.pdf but am still struggling. This is what I have so far:
st(reducedset, group='model', group.test=TRUE)
CodePudding user response:
I do not have your data, so I tried to run your analysis over the Auto dataset from the package ISLR (see Introduction to Statistical Learning, James et al., 2013). I replaced the condition model == "cars" with year == 70, but the reasoning is the same.
library(ISLR)   # provides the Auto dataset
library(vtable) # provides st()
dta = Auto # Replace this with your data!
reducedset = dta[1:200, ]
st(reducedset[reducedset$year == 70, ], group='name', group.test=TRUE) # Change the condition within square brackets!
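To make the filter step concrete with the column names from the question, here is a minimal sketch on a made-up toy data frame. The values in toy, the cut at five rows, and the name cars_only are assumptions for illustration only. The vtable call is guarded so the snippet also runs where the package is not installed.

```r
# Toy data with the column names mentioned in the question (values are made up)
toy <- data.frame(
  model = c("cars", "trucks", "cars", "vans", "cars", "trucks"),
  mpg   = c(21.0, 15.2, 24.4, 18.1, 30.4, 14.3),
  year  = c(70, 70, 71, 71, 72, 72)
)

# Stands in for "the first 200 observations"
reducedset <- toy[1:5, ]

# Keep only the rows where model is "cars"; which() drops any NA matches
cars_only <- reducedset[which(reducedset$model == "cars"), ]

# Base R summary of the filtered rows
print(summary(cars_only[, c("mpg", "year")]))

# If vtable is installed, the filtered data frame can be passed to st() directly
if (requireNamespace("vtable", quietly = TRUE)) {
  print(vtable::st(cars_only, vars = c("mpg", "year"), out = "return"))
}
```

The key point is that the row filter happens before the table is built, so st() only ever sees the rows of interest.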
CodePudding user response:
I believe you are looking for something like this. The function my_stats() below splits sub, a random subset of mtcars, into groups by the levels of a grouping_factor (here: vs) and then computes the mean, SD, min, and max of each variable within each group.
# cars data
data(mtcars)
# random sample of 20 rows (drawn with replacement, so rows can repeat)
sub <- mtcars[sample(seq_len(nrow(mtcars)), 20, replace = TRUE), ]
# function to compute the mean and sd for variables in 'df' according
# to 'grouping_factor'
my_stats <- \(df, grouping_factor){
  sum_stats <- lapply(split(df, df[[grouping_factor]]), \(x) {
    data.frame(sapply(x, \(i) cbind(
      mean(i, na.rm = TRUE), sd(i, na.rm = TRUE),
      min(i, na.rm = TRUE), max(i, na.rm = TRUE))))
  })
  sum_stats <- lapply(sum_stats, \(x) {
    rownames(x) <- c('Mean', 'SD', 'Min', 'Max'); x
  })
  for(i in 1:length(sum_stats)) {
    names(sum_stats)[i] <-
      paste(grouping_factor, '=', levels(as.factor(df[[grouping_factor]]))[i])
  }
  return(sum_stats)
}
Output (for the first three columns in each group)
> lapply(my_stats(df = sub, grouping_factor = 'vs'), '[', 1:3)
$`vs = 0`
           mpg      cyl     disp
Mean 16.650000 7.500000 296.3833
SD    3.234333 0.904534  99.4829
Min  10.400000 6.000000 145.0000
Max  21.000000 8.000000 460.0000

$`vs = 1`
           mpg cyl      disp
Mean 24.350000   4 113.21250
SD    3.000952   0  22.39256
Min  21.500000   4  79.00000
Max  30.400000   4 146.70000

If you would like to see all the output, simply run my_stats(df = sub, grouping_factor = 'vs').
Note: use function(x) instead of \(x) if your version of R is older than 4.1.0.
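As a quick cross-check for a single variable, the same four statistics can be computed per group with split() and sapply() alone. This is only a sketch on the full mtcars data (not the random subset above), and the name mpg_stats is an assumption for illustration. It uses function(x) so it also runs on older versions of R.

```r
data(mtcars)

# Split mpg by the levels of vs, then compute four statistics per group.
# sapply() returns a matrix: one row per statistic, one column per level of vs.
mpg_stats <- sapply(split(mtcars$mpg, mtcars$vs), function(x) {
  c(Mean = mean(x, na.rm = TRUE),
    SD   = sd(x, na.rm = TRUE),
    Min  = min(x, na.rm = TRUE),
    Max  = max(x, na.rm = TRUE))
})

print(mpg_stats)
```

The columns are named after the levels of vs ("0" and "1"), so a single value can be pulled out with, for example, mpg_stats["Mean", "1"].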
