I just have a quick question here.
I am trying to find if there are duplicate values in a vector in R. For example, given the vector below:
numbers <- c(10, 45, 32, 10, 56, 43, 32, 9)
I want to create a for-loop nested within a for-loop to find if any values occur more than once (i.e. 10 & 32).
I want to do it with nested for loops, since I want to practice doing this with R.
I have written a working script in Python that successfully finds duplicate values:
numbers = [10, 45, 32, 10, 56, 43, 32, 9]
def similars(ourlist, container):
    for i in range(len(ourlist)):
        k = i + 1
        for j in range(k, len(ourlist)):
            if ourlist[i]==ourlist[j] and ourlist[i] not in container:
                container.append(ourlist[i])
    return container
container1=[]
similars(numbers, container1)
print(container1)
The above is the Python code, and it finds the duplicate values 10 & 32 when I print (and excuse the possible indentation errors; the indentation got somewhat mangled when I pasted it into StackOverflow :-) ).
I have some R code below that attempts to do the same:
numbers <- c(10, 45, 32, 10, 56, 43, 32, 9)
similars <- function(ourlist, container){
  for (i in 1:length(ourlist)){
    k <- i + 1
    for (j in k:length(ourlist)){
      if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
        container[i] <- ourlist[i]
      }
    }
  }
  return (container)
}
container1 <- c()
similars(numbers, container1)
print(container1)
However, I get the following error message when I attempt to run it:
Error in if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)) { :
missing value where TRUE/FALSE needed
Calls: similars
Execution halted
I feel there is a simple answer to it but, grudgingly, I do not seem to be able to intuit it. Do any of you know why it says "missing value where TRUE/FALSE needed" when the same error does not occur in Python, and perhaps also how to fix the error in R?
In advance, thank you.
Best regards
CodePudding user response:
Your inner loop is extending beyond the length of ourlist. With this example, i will iterate from 1 to 8 (length(ourlist)); on the last iteration, when i is 8, you call k <- i + 1, making it 9. You then iterate j from k to length(ourlist), which evaluates to 9:8 (a decreasing sequence, length 2). On its first pass j is 9 and ourlist[9] is NA, so the comparison ourlist[i] == ourlist[j] is NA and if() stops with "missing value where TRUE/FALSE needed".
The answer, knowing that you want to compare an element with the element(s) after it, is that your i must iterate up to but not including length(ourlist). In that way, your k <- i + 1 will never exceed the length of ourlist.
A literal fix for that:
similars <- function(ourlist, container){
  for (i in 1:(length(ourlist)-1)) {
    k <- i + 1
    for (j in k:length(ourlist)){
      if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
        if (is.na(ourlist[i])) browser()
        container[i] <- ourlist[i]
      }
    }
  }
  return (container)
}
similars(numbers, container1)
# [1] 10 NA 32
Next issue: why the NA? That's because you are assigning to the output at index i, which is not the same as "append one element to the output". Let's do the append:
similars <- function(ourlist, container){
  for (i in 1:(length(ourlist)-1)) {
    k <- i + 1
    for (j in k:length(ourlist)){
      if (ourlist[i] == ourlist[j] & !(ourlist[i] %in% container)){
        if (is.na(ourlist[i])) browser()
        container <- c(container, ourlist[i])
        # container[i] <- ourlist[i]
      }
    }
  }
  return (container)
}
similars(numbers, container1)
# [1] 10 32
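The difference between assigning at an index and appending can be seen in isolation. This is a minimal sketch; the names by_index and by_append are made up for illustration and do not appear in the code above.

```r
# Assigning at a position past the current end pads the gap with NA.
by_index <- c()
by_index[1] <- 10
by_index[3] <- 32
print(by_index)
# [1] 10 NA 32

# Appending with c() grows the vector by exactly one element each time.
by_append <- c()
by_append <- c(by_append, 10)
by_append <- c(by_append, 32)
print(by_append)
# [1] 10 32
```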
(Minor.) Inside an if clause, the conditional must always be length-1. Use && instead of &.
if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
Why? Primarily for short-circuiting. & and | are vectorized, which means they accept something like c(TRUE,FALSE) | c(FALSE, TRUE), and they always evaluate every element of both sides. && works on single values only, but it short-circuits: if the first operand already determines the result, the second is never evaluated. Examples:
TRUE || stop("oops")
# [1] TRUE
FALSE && stop("oops")
# [1] FALSE
TRUE && stop("oops")
# Error: oops
(Minor.) Passing container seems unnecessary here. R does not pass arguments by reference (a vector is copied when it is modified), so changes made to container inside the function never reach container1 in the caller, and it is not as if you are pre-allocating memory here. I suggest you remove it from the argument list and pre-define it in the function.
similars <- function(ourlist) {
  container <- c()
  for (i in 1:(length(ourlist)-1)) {
    k <- i + 1
    for (j in k:length(ourlist)){
      if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
        if (is.na(ourlist[i])) browser()
        container <- c(container, ourlist[i])
        # container[i] <- ourlist[i]
      }
    }
  }
  return (container)
}
(More minor.) Let's think along the computer-science-y (CS) lines of "allow 0 or more". In this sense, is it "reasonable" to pass an empty vector? If that is given as the argument, then one might expect an empty vector to be returned as well. However ... 1:length(.) will not work here. Demo:
vec <- 2:4
1:length(vec)
# [1] 1 2 3
seq_along(vec)
# [1] 1 2 3
seq_len(length(vec))
# [1] 1 2 3
vec <- c()
1:length(vec)
# [1] 1 0   # this is broken
seq_along(vec)
# integer(0)
seq_len(length(vec))
# integer(0)
I suggest you use seq_len(length(ourlist)) (or length(.)-1), making the final version in this answer:
similars <- function(ourlist) {
  container <- c()
  for (i in seq_len(max(0, length(ourlist)-1))) {
    k <- i + 1
    for (j in (k-1) + seq_len(max(0, length(ourlist)-(k-1)))) {
      if (ourlist[i] == ourlist[j] && !(ourlist[i] %in% container)){
        if (is.na(ourlist[i])) browser()
        container <- c(container, ourlist[i])
        # container[i] <- ourlist[i]
      }
    }
  }
  return (container)
}
similars(numbers)#, container1)
# [1] 10 32
similars(c())
# NULL
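To see how R treats a vector argument that is modified inside a function, here is a minimal sketch. The helper add_item is hypothetical and exists only for this illustration; it shows why printing container1 after the call in the question would not show the duplicates, and why the return value has to be used instead.

```r
# Hypothetical helper, for illustration only: it modifies its argument.
add_item <- function(container, value) {
  container <- c(container, value)  # changes the function's local copy only
  container
}

container1 <- c()
result <- add_item(container1, 10)

# The caller's vector is untouched; only the returned value has the new item.
print(container1)
# NULL
print(result)
# [1] 10
```

In other words, something like container1 <- similars(numbers, container1) would be needed to keep the result, unlike the Python version where append() changes the list in place.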
CodePudding user response:
The loop can be a single loop instead of a nested one. Loop over the sequence from the 2nd element to the last (length); if the current element ourlist[i] is present (%in%) among the previous elements and not (!) already present in the storage container, concatenate (c) it onto container and update by assignment (<-).
similars <- function(ourlist, container){
  for(i in 2:length(ourlist)) {
    if(ourlist[i] %in% ourlist[seq(i-1)] & !(ourlist[i] %in% container)) {
      container <- c(container, ourlist[i])
    }
  }
  container
}
Testing:
> container1 <- c()
> similars(numbers, container1)
[1] 10 32
Here, we don't need a nested loop because %in% is vectorized, which saves a lot of unnecessary iterations.
It can be done more easily with duplicated in R:
> numbers[duplicated(numbers)]
[1] 10 32
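One caveat with the duplicated approach, sketched under the assumption that a value may occur three or more times (the vector x below is a made-up input, not from the question): duplicated flags every occurrence after the first, so such a value is reported more than once unless the result is passed through unique.

```r
# Hypothetical input where one value (10) occurs three times.
x <- c(10, 45, 10, 32, 10, 32)

# duplicated() flags every occurrence after the first, so 10 is reported twice.
print(x[duplicated(x)])
# [1] 10 10 32

# unique() collapses the repeats, so each duplicated value appears once.
print(unique(x[duplicated(x)]))
# [1] 10 32
```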
Regarding why there is an error, it is already explained in the comments: your outer loop runs up to the last element, and you then assign k <- i + 1, which is outside the index range of the vector.
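The chain from the out-of-range index to the error message can be reproduced step by step. This is a small sketch using the vector from the question; the variables n and msg are introduced here for illustration, and the message text shown assumes an English locale.

```r
numbers <- c(10, 45, 32, 10, 56, 43, 32, 9)
n <- length(numbers)

# With i at the last index, k is one past the end, and k:n counts down.
k <- n + 1
print(k:n)
# [1] 9 8

# Indexing past the end gives NA, and comparing with NA gives NA.
print(numbers[k])
# [1] NA
print(numbers[n] == numbers[k])
# [1] NA

# if() cannot branch on NA; tryCatch() captures the error message here.
msg <- tryCatch(
  if (numbers[n] == numbers[k]) "equal" else "different",
  error = function(e) conditionMessage(e)
)
print(msg)
# [1] "missing value where TRUE/FALSE needed"
```

Python does not hit this because range(k, len(ourlist)) is simply empty when k is past the end, whereas k:length(ourlist) in R counts downward.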
