Assign value in vector based on presence in another vector in R?-CodePudding

I have tried to look for a similar question and I´m sure other people encountered this problem but I still couldn´t find something that helped me. I have a dataset1 with 37.000 observations like this:

id     hours
130      12
165      56
250      13
11       15
17       42

and another dataset2 with 38. 000 observations like this:

id     hours
130      6
165      23
250      9
11       14
17       11

I want to do the following: if an id of dataset1 is in dataset2, the hours of dataset1 should override the hours of dataset2. For the id´s who are in dataset1 but not in dataset2, the value for dataset2$hours should be NA.

I tried the %in% operator, ifelse(), a loop, and some base R commands but I can´t figure it out. I always get the error that the vectors don´have the same length.

Thanks for any help!

CodePudding user response：

You can replace hours with NAs for id that don't match between df1 and df2. Since both your data sets had the same values for ids, I added one row in df1 with id = 123 and hours = 12.

df1$hours <- replace(df1$hours, is.na(match(df1$id,df2$id)), NA)

df1
   id hours
1 130    12
2 165    56
3 250    13
4  11    15
5  17    42
6 123    NA

data

df1 <- structure(list(id = c(130L, 165L, 250L, 11L, 17L, 123L), hours = c(12L, 
56L, 13L, 15L, 42L, NA)), row.names = c(NA, -6L), class = "data.frame")
   id hours
1 130    12
2 165    56
3 250    13
4  11    15
5  17    42
6 123    12

df2 <- structure(list(id = c(130L, 165L, 250L, 11L, 17L), hours = c(6L, 
23L, 9L, 14L, 11L)), class = "data.frame", row.names = c(NA, 
-5L))

CodePudding user response：

First match ID's of replacement data with ID's of original data while using na.omit() for the case when replacement ID's are not contained in original data. Replace with replacement data whose ID's are in original ID's.

I expanded both data sets to fabricate cases with no matches.

dat1
#    id hours
# 1 130    12
# 2 165    56
# 3 250    13
# 4  11    15
# 5  17    42
# 6  12   232
# 7  35   456

dat2
#    id hours
# 1  11    14
# 2  17    11
# 3 165    23
# 4 999    99
# 5 130     6
# 6 250     9

Replacement

dat1[na.omit(match(dat2$id, dat1$id)), ]$hours <- 
  dat2[dat2$id %in% dat1$id, ]$hours
dat1
#    id hours
# 1 130     6
# 2 165    23
# 3 250     9
# 4  11    14
# 5  17    11
# 6  12   232
# 7  35   456

Data:

dat1 <- structure(list(id = c(130L, 165L, 250L, 11L, 17L, 12L, 35L), 
    hours = c(12L, 56L, 13L, 15L, 42L, 232L, 456L)), class = "data.frame", row.names = c(NA, 
-7L))

dat2 <- structure(list(id = c(11L, 17L, 165L, 999L, 130L, 250L), hours = c(14L, 
11L, 23L, 99L, 6L, 9L)), class = "data.frame", row.names = c(NA, 
-6L))