How do I check whether the numbers in one vector fall within ranges defined by two other vectors in R?

I have two vectors: one holds the start and the other the stop positions of nucleotide ranges in a protein. For example, one range is 1374742-1375555.

domainStart = c(1374742,1374760,1374769,1375822,1376182,1376320,1376350)
domainStop = c(1375555, 1375726,1375516, 1378129, 1376638, 1376638, 1377382)

Next I have a long list of nucleotide mutation positions.

  db =  c(37788, 40303, 138445, 161587, 165946,172979,177605, 200118, 244427, 251156, 258459, 265170, 344062)

I want to know whether each of the mutation positions (db) falls within any of the domain ranges (e.g. 1374742-1375555), and to return TRUE/FALSE as a vector with one value per position. Thanks!

CodePudding user response:

You could use map2() from the purrr package:

domainStart = c(1374742,1374760,1374769,1375822,1376182,1376320,1376350)
domainStop = c(1375555, 1375726,1375516, 1378129, 1376638, 1376638, 1377382)

db =  c(37788, 40303, 138445, 161587, 165946,172979,177605, 200118, 244427, 251156, 258459, 265170, 344062)
purrr::map2(domainStart, domainStop, ~which(db > .x & db < .y))
# [[1]]
# integer(0)
# 
# [[2]]
# integer(0)
# 
# [[3]]
# integer(0)
# 
# [[4]]
# integer(0)
# 
# [[5]]
# integer(0)
# 
# [[6]]
# integer(0)
# 
# [[7]]
# integer(0)

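Each element of the list gives the indices of the values in db that fall strictly between the corresponding start/stop pair. None of the positions in the original db lie inside any domain, so every element is empty. Here it is with values that do match: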

db <- c(1374750, 1374761, 1374770)
purrr::map2(domainStart, domainStop, ~which(db > .x & db < .y))
# [[1]]
# [1] 1 2 3
# 
# [[2]]
# [1] 2 3
# 
# [[3]]
# [1] 3
# 
# [[4]]
# integer(0)
# 
# [[5]]
# integer(0)
# 
# [[6]]
# integer(0)
# 
# [[7]]
# integer(0)

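Update: changed to address a comment. This version returns the matching db values alongside each domain's start and stop, with NA when nothing matches: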

db <- c(1374750, 1374761, 1374770)
purrr::map2(domainStart, domainStop, function(.x, .y) {
  # values of db strictly inside this start/stop pair
  mx <- db[which(db > .x & db < .y)]
  if (length(mx) == 0) {
    mx <- NA  # keep one row for domains without a match
  }
  data.frame(domainStart = .x, domainStop = .y, db = mx)
})

# [[1]]
#   domainStart domainStop      db
# 1     1374742    1375555 1374750
# 2     1374742    1375555 1374761
# 3     1374742    1375555 1374770
# 
# [[2]]
#   domainStart domainStop      db
# 1     1374760    1375726 1374761
# 2     1374760    1375726 1374770
# 
# [[3]]
#   domainStart domainStop      db
# 1     1374769    1375516 1374770
# 
# [[4]]
#   domainStart domainStop db
# 1     1375822    1378129 NA
# 
# [[5]]
#   domainStart domainStop db
# 1     1376182    1376638 NA
# 
# [[6]]
#   domainStart domainStop db
# 1     1376320    1376638 NA
# 
# [[7]]
#   domainStart domainStop db
# 1     1376350    1377382 NA
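
If what you need is the TRUE/FALSE vector per position that the question asks for, the same comparison can be turned around to loop over db instead of over the domains. This is only a minimal base R sketch, not part of the answer above; the name in_any_domain and the extra non-matching value 37788 in db are chosen here for illustration, and the bounds are treated as exclusive to match the code above.

```r
domainStart <- c(1374742, 1374760, 1374769, 1375822, 1376182, 1376320, 1376350)
domainStop  <- c(1375555, 1375726, 1375516, 1378129, 1376638, 1376638, 1377382)
db <- c(1374750, 1374761, 1374770, 37788)

# For each position, TRUE if it lies strictly inside at least one domain
in_any_domain <- vapply(
  db,
  function(pos) any(pos > domainStart & pos < domainStop),
  logical(1)
)
in_any_domain
# [1]  TRUE  TRUE  TRUE FALSE
```

Use >= and <= instead if the start and stop positions themselves should count as inside the domain.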

CodePudding user response:

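Perhaps try the base R code below, which looks up, for each position in db, the domains (rows of df) that contain it:

df <- data.frame(Start = domainStart, Stop = domainStop)
apply(
  # rows are positions in db, columns are domains; bounds are inclusive
  outer(db, domainStart, `>=`) & outer(db, domainStop, `<=`),
  1,
  function(v) {
    df[which(v), ]
  }
)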
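
The logical matrix built with outer() can also be collapsed directly into one TRUE/FALSE per position. The following is a hedged sketch rather than part of the answer: the db values are made up for illustration, hits, in_any_domain and n_domains are names introduced here, and the bounds are assumed to be inclusive.

```r
domainStart <- c(1374742, 1374760, 1374769, 1375822, 1376182, 1376320, 1376350)
domainStop  <- c(1375555, 1375726, 1375516, 1378129, 1376638, 1376638, 1377382)
db <- c(1374750, 1375600, 1376400, 37788)

# hits[i, j] is TRUE when db[i] lies inside domain j
hits <- outer(db, domainStart, `>=`) & outer(db, domainStop, `<=`)

in_any_domain <- rowSums(hits) > 0  # one TRUE/FALSE per position
n_domains     <- rowSums(hits)      # number of domains containing each position

in_any_domain
# [1]  TRUE  TRUE  TRUE FALSE
n_domains
# [1] 1 1 4 0
```

The matrix has length(db) rows and length(domainStart) columns, so for very long position lists a per-position loop may use less memory.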