I have a dataframe similar to the following:
> library(lubridate)
> library(dplyr)  # provides %>% and mutate()
> df <- data.frame(name = c("george", "sara", "sam", "bill"),
                   start_date = mdy(c("January 1, 2022", "January 2, 2022", "January 5, 2022", "January 6, 2022")),
                   end_date = mdy(c("January 3, 2022", "January 4, 2022", "January 6, 2022", "January 8, 2022")),
                   group = c(1, 1, 2, 2))
> df <- df %>%
    mutate(date_range = interval(start_date, end_date))
> df
    name start_date   end_date group                     date_range
1 george 2022-01-01 2022-01-03     1 2022-01-01 UTC--2022-01-03 UTC
2   sara 2022-01-02 2022-01-04     1 2022-01-02 UTC--2022-01-04 UTC
3    sam 2022-01-05 2022-01-06     2 2022-01-05 UTC--2022-01-06 UTC
4   bill 2022-01-06 2022-01-08     2 2022-01-06 UTC--2022-01-08 UTC
I would like to create two plots, using ggplot2 if possible:
The first plot should display the date range for each person. It's easier to show what I mean; see photo.

The second plot should average the range for each group and use a boxplot or similar to show the distribution of dates within each group. See photo.

Any thoughts? I'm new to this, which is why I drew out what I want. I hope it's helpful and clear.
User response:
Allan is completely right about using geom_segment for the first plot. I just thought I'd add that the ggalt package has a geom built for exactly this.
It's called a dumbbell plot, drawn with geom_dumbbell.
Here is the code I used to create it:
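library(ggplot2)
library(ggalt)   # provides geom_dumbbell()
library(dplyr)   # provides %>%

df %>%
  ggplot(
    aes(
      x = start_date,
      xend = end_date,
      y = name
    )
  ) +
  # one dumbbell per person: light bar, darker point at the end date
  geom_dumbbell(
    colour = "#a3c4dc",
    colour_xend = "#0e668b",
    size = 4
  )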
You can then use all the usual ggplot2 functions (labels, scales, themes) to make it look prettier. More on geom_dumbbell can be found in its help documentation (?geom_dumbbell) or on this blog post.
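As a rough illustration of that polishing step, here is a minimal sketch that rebuilds the example data and adds a date scale, labels, and a theme. The title text, the daily axis breaks, and the choice of theme_minimal() are my own assumptions for illustration, not part of the original answer.

```r
library(ggplot2)
library(ggalt)      # provides geom_dumbbell()
library(lubridate)  # provides mdy()

# Same example data as in the question
df <- data.frame(
  name = c("george", "sara", "sam", "bill"),
  start_date = mdy(c("January 1, 2022", "January 2, 2022", "January 5, 2022", "January 6, 2022")),
  end_date = mdy(c("January 3, 2022", "January 4, 2022", "January 6, 2022", "January 8, 2022")),
  group = c(1, 1, 2, 2)
)

p <- ggplot(df, aes(x = start_date, xend = end_date, y = name)) +
  geom_dumbbell(colour = "#a3c4dc", colour_xend = "#0e668b", size = 4) +
  # one tick per day, labelled like "Jan 01" (an assumed formatting choice)
  scale_x_date(date_breaks = "1 day", date_labels = "%b %d") +
  # hypothetical labels, chosen only for this sketch
  labs(x = "Date", y = NULL, title = "Date range per person") +
  theme_minimal()

p
```

Storing the plot in p before printing makes it easy to keep layering on further tweaks.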
User response:
You can achieve the first plot with geom_segment:
library(ggplot2)

ggplot(df, aes(x = start_date, y = name, colour = name)) +
  # black line from start_date to end_date for each person
  geom_segment(aes(xend = end_date, yend = name), colour = "black") +
  # coloured points at both ends of the line
  geom_point(size = 3) +
  geom_point(aes(x = end_date), size = 3) +
  theme_bw() +
  theme(legend.position = "none")

The second requires a bit of data reshaping, as akrun points out:
library(dplyr)
library(tidyr)

df %>%
  # stack columns 2:3 (start_date, end_date) into a single date column
  pivot_longer(2:3, names_to = "type", values_to = "date") %>%
  ggplot(aes(date, factor(group))) +
  geom_boxplot(aes(colour = factor(group))) +
  theme_bw() +
  theme(legend.position = "none")
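Since the question also mentions averaging the range for each group, here is a minimal sketch of a numeric summary that could sit alongside the boxplot. It rebuilds the example data from the question. The column names mean_start, mean_end, and mean_days are hypothetical names chosen for illustration, and reading "average the range" as the mean start, mean end, and mean duration is my assumption.

```r
library(dplyr)
library(lubridate)

# Same example data as in the question
df <- data.frame(
  name = c("george", "sara", "sam", "bill"),
  start_date = mdy(c("January 1, 2022", "January 2, 2022", "January 5, 2022", "January 6, 2022")),
  end_date = mdy(c("January 3, 2022", "January 4, 2022", "January 6, 2022", "January 8, 2022")),
  group = c(1, 1, 2, 2)
)

group_summary <- df %>%
  group_by(group) %>%
  summarise(
    mean_start = mean(start_date),                        # average start date
    mean_end   = mean(end_date),                          # average end date
    mean_days  = mean(as.numeric(end_date - start_date))  # average duration in days
  )

group_summary
```

For this data the average duration is 2 days in group 1 and 1.5 days in group 2.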

Created on 2022-01-22 by the reprex package (v2.0.1)

