I have two vectors: one holds the start and the other the stop of each nucleotide range (domain) in a protein. For example, one range is 1374742-1375555.
domainStart = c(1374742,1374760,1374769,1375822,1376182,1376320,1376350)
domainStop = c(1375555, 1375726,1375516, 1378129, 1376638, 1376638, 1377382)
Next I have a long list of nucleotide mutation positions.
db = c(37788, 40303, 138445, 161587, 165946,172979,177605, 200118, 244427, 251156, 258459, 265170, 344062)
I want to know whether each mutation position in db falls within any of the domain ranges (e.g. 1374742-1375555), and return TRUE/FALSE as a vector with one value per position. Thanks!
CodePudding user response:
You could use map2() from the purrr package:
domainStart = c(1374742,1374760,1374769,1375822,1376182,1376320,1376350)
domainStop = c(1375555, 1375726,1375516, 1378129, 1376638, 1376638, 1377382)
db = c(37788, 40303, 138445, 161587, 165946,172979,177605, 200118, 244427, 251156, 258459, 265170, 344062)
purrr::map2(domainStart, domainStop, ~which(db > .x & db < .y))
# [[1]]
# integer(0)
#
# [[2]]
# integer(0)
#
# [[3]]
# integer(0)
#
# [[4]]
# integer(0)
#
# [[5]]
# integer(0)
#
# [[6]]
# integer(0)
#
# [[7]]
# integer(0)
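Each element of the list gives the indices in db of the values that fall inside the corresponding start/stop pair. None of the db values in the question fall inside any range, so every element is integer(0). Note that > and < exclude the endpoints; use >= and <= if the ranges are inclusive. Here it is with some values that do match: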
db <- c(1374750, 1374761, 1374770)
purrr::map2(domainStart, domainStop, ~which(db > .x & db < .y))
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 2 3
#
# [[3]]
# [1] 3
#
# [[4]]
# integer(0)
#
# [[5]]
# integer(0)
#
# [[6]]
# integer(0)
#
# [[7]]
# integer(0)
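The question asks for one TRUE/FALSE per mutation position rather than a list per domain. A minimal base R sketch of that, assuming the same strict comparison as above; the name in_domain and the extra position 37788 are only for illustration:

```r
domainStart <- c(1374742, 1374760, 1374769, 1375822, 1376182, 1376320, 1376350)
domainStop  <- c(1375555, 1375726, 1375516, 1378129, 1376638, 1376638, 1377382)

# Illustrative positions: three inside the first domain, one outside every domain
db <- c(1374750, 1374761, 1374770, 37788)

# For each position, TRUE if it lies strictly inside at least one start/stop pair
in_domain <- vapply(
  db,
  function(pos) any(pos > domainStart & pos < domainStop),
  logical(1)
)
in_domain
# [1]  TRUE  TRUE  TRUE FALSE
```

vapply() is used instead of sapply() so that the result is guaranteed to be a logical vector of the same length as db, even when db is empty.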
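Update, to address a comment: this version returns the matching db values alongside their domain, with NA when a domain has no match.
db <- c(1374750, 1374761, 1374770)
purrr::map2(domainStart, domainStop, function(.x, .y) {
  # db values strictly inside this start/stop pair
  mx <- db[which(db > .x & db < .y)]
  if (length(mx) == 0) {
    mx <- NA
  }
  data.frame(domainStart = .x, domainStop = .y, db = mx)
})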
# [[1]]
# domainStart domainStop db
# 1 1374742 1375555 1374750
# 2 1374742 1375555 1374761
# 3 1374742 1375555 1374770
#
# [[2]]
# domainStart domainStop db
# 1 1374760 1375726 1374761
# 2 1374760 1375726 1374770
#
# [[3]]
# domainStart domainStop db
# 1 1374769 1375516 1374770
#
# [[4]]
# domainStart domainStop db
# 1 1375822 1378129 NA
#
# [[5]]
# domainStart domainStop db
# 1 1376182 1376638 NA
#
# [[6]]
# domainStart domainStop db
# 1 1376320 1376638 NA
#
# [[7]]
# domainStart domainStop db
# 1 1376350 1377382 NA
CodePudding user response:
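Perhaps we can try the code below, which returns, for each mutation position, the rows of df (the domains) that contain it:
df <- data.frame(Start = domainStart, Stop = domainStop)
apply(
  # rows are positions in db, columns are domains; the upper bound is domainStop
  outer(db, domainStart, `>=`) & outer(db, domainStop, `<=`),
  1,
  function(v) {
    df[which(v), ]
  }
)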
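The same outer() comparison can be reduced to the TRUE/FALSE vector the question asks for. This is a sketch under the assumption that the ranges are inclusive; hits, n_domains, in_any and the three sample positions are illustrative names and values, not part of the original answer:

```r
domainStart <- c(1374742, 1374760, 1374769, 1375822, 1376182, 1376320, 1376350)
domainStop  <- c(1375555, 1375726, 1375516, 1378129, 1376638, 1376638, 1377382)

# Illustrative positions: a domain boundary, a point in overlapping domains, a miss
db <- c(1374742, 1376400, 37788)

# Logical matrix: one row per position, one column per domain
hits <- outer(db, domainStart, `>=`) & outer(db, domainStop, `<=`)

# Number of domains containing each position, and whether there is at least one
n_domains <- rowSums(hits)
in_any <- n_domains > 0

n_domains
# [1] 1 4 0
in_any
# [1]  TRUE  TRUE FALSE
```

The matrix has length(db) * length(domainStart) cells, so this is convenient for a few thousand positions but may need a different approach for very long position lists.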
