I am trying to create a correlation matrix between my X and Y variables and display this information in a nice figure. I am currently using ggpairs() from the GGally package, but if there's a better way to do this then I am happy to try something new. The figure should:
-Fit linear regression models (using lm) between X and Y variables
-Display scatterplots with a regression line
-Display the Coefficient of the determination (R2)
-Map the colour of points/lines/R2 values by group
I have been able to do most of this, but ggpairs only displays the correlation coefficient (r) and not the coefficient of determination (R2). I was able to use the suggestion from this post, but unfortunately the solution does not display R2 values by group.
So far:
library(GGally)
library(ggplot2)
cars <- mtcars
cars$group <- factor(c(rep("A", 16), rep("B", 16))) #adding grouping variable
#function to return R2 (coefficient of determination) and not just r (Coefficient of correlation) in the top portion of the figure
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
# Calculate the r^2 & format output
m <- summary(lm(y ~ x))
lbl <- paste("r^2: ", formatC(m$r.squared, digits=ndp, format="f"))
# Write out label which is centered at x&y position
ggplot(data=data, mapping=mapping)
annotate("text", x=mean(x, na.rm=TRUE), y=mean(y, na.rm=TRUE), label=lbl, parse=TRUE, ...)
theme(panel.grid = element_blank())
}
#lower function basically fits a linear model and displays points
lower_fn <- function(data, mapping, ...){
p <- ggplot(data = data, mapping = mapping)
geom_point(alpha = 0.7)
geom_smooth(method=lm, fill="blue", se = F, ...)
p
}
#The actual figure
ggpairs(cars,
columns = c(1:11),
mapping = ggplot2::aes(color = group),
upper = list(continuous = "cor", size = 15),
diag = list(continuous = "densityDiag", alpha=0.5),
lower = list(continuous = lower_fn))
CodePudding user response:
Based on Is it possible to split correlation box to show correlation values for two different treatments in pairplot?, below is a little code to get you started.
The idea is that you need to 1. split the data over the aesthetic variable (which is assumed to be colour), 2. run a regression over each data subset and extract the r^2, 3. quick calculation of where to place the r^2 labels, 4. plot. Some features are left to do.
upper_fn <- function(data, mapping, ndp=2, ...){
# Extract the relevant columns as data
x <- eval_data_col(data, mapping$x)
y <- eval_data_col(data, mapping$y)
col <- eval_data_col(data, mapping$colour)
# if no colour mapping run over full data
if(is.null(col)) {
## add something here
}
# if colour aesthetic, split data and run `lm` over each group
if(!is.null(col)) {
idx <- split(seq_len(nrow(data)), col)
r2 <- unlist(lapply(idx, function(i) summary(lm(y[i] ~ x[i]))$r.squared))
lvs <- if(is.character(col)) sort(unique(col)) else levels(col)
cuts <- seq(min(y, na.rm=TRUE), max(y, na.rm=TRUE), length=length(idx) 1L)
pos <- (head(cuts, -1) tail(cuts, -1))/2
p <- ggplot(data=data, mapping=mapping, ...)
geom_blank()
theme_void()
# you could map colours to each level
annotate("text", x=mean(x), y=pos, label=paste(lvs, ": ", formatC(r2, digits=ndp, format="f")))
}
return(p)
}
