This question is about prepping data for ggplot
I have a long dataframe of survey questions and option responses (the real one is 76,931 rows) of the form:
df <- data.frame(
Question = c( "Qone", "Qone", "Qone", "Qtwo", "Qtwo", "Qthree", "Qthree"),
Response = as.factor( c( "Opt1", "Opt2","Opt1","Opt1","Opt2","Opt1","Opt2"))
)
and I want to create a summary table of the form
| Question | Response | Count | Total_of_Q | Percent_C_of_Total |
|---|---|---|---|---|
| Qone | Opt1 | 2 | 3 | 66.6 |
| Qone | Opt2 | 1 | 3 | 33.3 |
| Qtwo | Opt1 | 1 | 2 | 50 |
| Qtwo | Opt2 | 1 | 2 | 50 |
| Qthree | Opt1 | 1 | 2 | 50 |
| Qthree | Opt2 | 1 | 2 | 50 |
using
library(tidyverse)
If I write
df %>%
group_by(Question) %>%
summarise(Total_of_Q = n()) -> totals
totals
Totals is
| Question | Total_of_Q |
|---|---|
| Qone | 3 |
| Qtwo | 2 |
| Qthree | 2 |
and if I write
df %>%
group_by(Question) %>%
count(Response) -> Count
Count is
| Question | Response | Count |
|---|---|---|
| Qone | Opt1 | 2 |
| Qone | Opt2 | 1 |
| Qtwo | Opt1 | 1 |
| Qtwo | Opt2 | 1 |
| Qthree | Opt1 | 1 |
| Qthree | Opt2 | 1 |
But at this point I have two data frames that are pretty hard to bind together (because of different row lengths) and I can't help thinking there must be better ways to generate the whole table all at once.
Any hints or suggestions gratefully received.
CodePudding user response:
library(dplyr)
df %>%
# shortcut for group_by(Question, Response) %>% summarize(Count = n())`
count(Question, Response, name = "Count") %>% # 1: get detailed counts
group_by(Question) %>%
mutate(total = sum(Count), # 2: get counts by queston
pct = Count / total) %>% # 1 divided by 2
ungroup()
