R - Scatter plots - points() - positioning of data

Time:01-12

Good evening all,

I'm having an issue with my scatter plot: all my points end up in a single area on the x axis, but I want them to spread across the whole plot. First, here's a minimal example of what I'm working with:

data <- structure(list(names = structure(1:2, .Label = c("george", "harry"), class = "factor"), 
                       gender = structure(c(1L, 1L), .Label = "male", class = "factor"), 
                       data1 = c(0.373, 0.37), data2 = c(0.376, 0.377), data3 = c(0.371, 0.361), 
                       data4 = c(0.3786, 0.3576), data5 = c(0.3262, 0.2928)), row.names = c(NA, -2L), class = "data.frame")

#       names gender data1 data2 data3  data4  data5
#    1 george male  0.373  0.376 0.371 0.3786 0.3262
#    2  harry male  0.370  0.377 0.361 0.3576 0.2928

headers = c("data1","data2","data3","data4","data5")

names <- levels(unique(data$names))

bcol=as.factor(headers)

# colours
mapcol <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

plot(
  data[(data$gender == "male"),3],
  col=mapcol[1],
  pch=pch[20],
  xlim=c(1,1800),ylim=c(0,1),
  xlab="Names",ylab="Data Points",
  axes=F,cex.lab=1.5
)

###### This is likely where the issue is:

for(i in seq_along(names) ) 
  for(j in 3:7)
    points(
      data[(data$names == names[i]),j],
      col=mapcol[j-3],pch=pch[bcol]
    )

Here's an image (edited to remove any confidential data) showing what this code produces, together with some later code that just draws the axes, legend, etc.

[Image: example of the current plot]

My problem is that I want the data for each name to be spread across the x axis, but currently it is all displayed at "harry".

Answer:

The root of your problem is that you are giving plot and points only y values. When they receive a single vector, they treat it as the y values and use each value's index as its x value, so a single value passed to points is always drawn at x = 1. You can solve this by also passing an x value corresponding to the name.
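You can see this default directly with xy.coords(), the grDevices helper that plot.default() and points.default() use to turn their arguments into coordinates. A small sketch, using values taken from the question's data:

```r
# xy.coords() is the helper base plotting functions use to work out
# x and y from their arguments. When only one vector is given, it is
# treated as y and x becomes the index seq_along(y).
single <- xy.coords(0.373)           # one value, no x supplied
pair   <- xy.coords(c(0.373, 0.37))  # two values, no x supplied

print(single$x)  # a lone value always lands at x = 1
print(pair$x)    # two values land at x = 1 and x = 2 (their index)
```

This is why every points() call in the question's loop, each of which receives one value, draws at the same x position.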

plot(
  data[(data$gender == "male"),3],
  col=mapcol[1],
  ## pch was not included in the question, so this arg caused an error
  #  pch=pch[20], 
  ## key change: set the x limits to be 1 to the number of names
  xlim=c(1,length(names)),
  ylim=c(0,1),
  xlab="Names",ylab="Data Points",
  axes=F,cex.lab=1.5
)

for(i in seq_along(names)) {
  for(j in 3:7) {
    points(
      i, ## use i as an x coordinate, making sure i starts at 1
      data[(data$names == names[i]),j],
      col=mapcol[j-2] ## j-2 so data1 (column 3) gets mapcol[1], as in plot()
      #,pch=pch[bcol] 
      ## pch was not included in the question, so this argument caused an error
    )
  }
}


The points function is vectorized, so we can also simplify your code and eliminate one of the loops. This gives the same result, though it assumes that each name appears in exactly one row and that the data frame is sorted in the same order as the factor levels. If you define names <- unique(data$names), dropping the levels() call, the names are taken in the order they appear in the data instead.

plot(
  data[(data$gender == "male"),3],
  col=mapcol[1],
  ## pch was not included in the question, so this arg caused an error
  #  pch=pch[20], 
  ## key change: set the x limits to be 1 to the number of names
  xlim=c(1,length(names)),
  ylim=c(0,1),
  xlab="Names",ylab="Data Points",
  axes=F,cex.lab=1.5
)

for(j in 3:7) {
  points(
    x = seq_along(names),
    y = data[, j],
    col=mapcol[j-2] ## j-2 so data1 (column 3) gets mapcol[1], as in plot()
  )
}
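Going one step further, the remaining loop over columns can also be removed by building the x, y and colour vectors for all five columns up front and drawing everything with a single plot() call. This is a sketch under stated assumptions: each name appears in exactly one row, the rows are already in the order you want along the x axis, and the example data is rebuilt with data.frame() instead of the question's structure() call. It also swaps axes=F for xaxt = "n" plus axis() so the names appear as x-axis labels; pch = 19 is an arbitrary choice.

```r
# Example data from the question, rebuilt with data.frame() for brevity
data <- data.frame(
  names  = factor(c("george", "harry")),
  gender = factor(c("male", "male")),
  data1 = c(0.373, 0.37),   data2 = c(0.376, 0.377),
  data3 = c(0.371, 0.361),  data4 = c(0.3786, 0.3576),
  data5 = c(0.3262, 0.2928)
)
mapcol <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

value_cols <- 3:7          # columns data1..data5
n_rows <- nrow(data)       # assumption: one row per name

# unlist() concatenates the columns: data1 for every row, then data2, ...
y <- unlist(data[, value_cols], use.names = FALSE)
# so x repeats the row positions once per column ...
x <- rep(seq_len(n_rows), times = length(value_cols))
# ... and each column's colour is repeated once per row
cols <- rep(mapcol[seq_along(value_cols)], each = n_rows)

plot(x, y, col = cols, pch = 19,
     xlim = c(0.5, n_rows + 0.5), ylim = c(0, 1),
     xaxt = "n", xlab = "Names", ylab = "Data Points")
# label the x positions with the names instead of 1, 2, ...
axis(1, at = seq_len(n_rows), labels = as.character(data$names))
```

Building the vectors first keeps the x/y/colour pairing explicit, which makes it easy to check that each value sits above the right name.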

Answer:

You can try using data.table and ggplot2:

1. Convert the data frame to long format using the melt function.
2. Use ggplot to plot the points.

R code

library(data.table)
library(ggplot2)

data=as.data.table(data)
data_long=melt(data,id.vars=c("names","gender"),measure.vars=c("data1","data2","data3","data4","data5"))

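# melt() names the new columns "variable" and "value"; map colour to the
# variable column unquoted, and join the layers with +
ggplot(data=data_long,aes(x=names,y=value,colour=variable)) +
   geom_point()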
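If you'd rather not add data.table and ggplot2 as dependencies, here is a minimal base-R sketch of the same reshape-then-plot idea. It swaps in base R's stack() for melt(); note that stack() names its columns values and ind rather than melt's value and variable. It assumes the question's example data (rebuilt with data.frame()) and the mapcol palette.

```r
# Example data from the question, rebuilt with data.frame() for brevity
data <- data.frame(
  names  = factor(c("george", "harry")),
  gender = factor(c("male", "male")),
  data1 = c(0.373, 0.37),   data2 = c(0.376, 0.377),
  data3 = c(0.371, 0.361),  data4 = c(0.3786, 0.3576),
  data5 = c(0.3262, 0.2928)
)
mapcol <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#0072B2", "#D55E00", "#CC79A7")

# stack() turns data1..data5 into two columns: 'values' and 'ind'
# (the source column, as a factor), going column by column.
# data.frame() recycles the 2 names to match the 10 stacked rows.
long <- data.frame(names = data$names, stack(data[, 3:7]))

# Each name gets its own x position (its factor level); colour by source column
plot(as.integer(long$names), long$values,
     col = mapcol[as.integer(long$ind)], pch = 19,
     xlim = c(0.5, nlevels(long$names) + 0.5), ylim = c(0, 1),
     xaxt = "n", xlab = "Names", ylab = "Data Points")
axis(1, at = seq_len(nlevels(long$names)), labels = levels(long$names))
legend("topright", legend = levels(long$ind),
       col = mapcol[seq_len(nlevels(long$ind))], pch = 19)
```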

