I have df1:
IDcode Random
11 8
2 9
3 10
18 3
21 2
6 4
9 5
10 4
I have a second df named Goats:
IDcode Random
11 10
18 22
44 9
10 7
I have a third df named NotGoats
IDcode Random
2 11
3 10
6 18
9 10
43 11
I would like to make a new Goat column in df1 that checks to see if the IDcode exists in the Goats df or the NotGoats df. If it exists in Goats df, I would like to have a value of 1, if it exists in NotGoats, I would like to have a value of 0. If it does not exist in either, I would like to have an NA.
Output would look like:
IDcode Random Goat
11 8 1
2 9 0
3 10 0
18 3 1
21 2 NA
6 4 0
9 5 0
10 4 1
CodePudding user response:
Here is a tidyverse r option:
library(tidyverse)
df1 %>%
mutate(Goat = case_when(IDcode %in% Goats$IDcode == TRUE ~ 1,
IDcode %in% NotGoats$IDcode == TRUE ~ 0))
Output
IDcode Random Goat
1 11 8 1
2 2 9 0
3 3 10 0
4 18 3 1
5 21 2 NA
6 6 4 0
7 9 5 0
8 10 4 1
Data
df1 <-
structure(list(
IDcode = c(11L, 2L, 3L, 18L, 21L, 6L, 9L, 10L),
Random = c(8L, 9L, 10L, 3L, 2L, 4L, 5L, 4L)
),
class = "data.frame",
row.names = c(NA,-8L))
Goats <-
structure(list(
IDcode = c(11L, 18L, 44L, 10L),
Random = c(10L, 22L, 9L, 7L)
),
class = "data.frame",
row.names = c(NA,-4L))
NotGoats <-
structure(list(
IDcode = c(2L, 3L, 6L, 9L, 10L, 43L),
Random = c(11L, 10L, 18L, 10L, 3L, 11L)
),
class = "data.frame",
row.names = c(NA,-6L))
CodePudding user response:
You can check it like this:
import numpy as np
df1['Goat'] = np.nan
df1['Goat'] = np.where(df1['Goat'].isin(df_goats['IDcode']), 1, df1['Goat'])
df1['Goat'] = np.where(df1['Goat'].isin(df_no_goats['IDcode']), 0, df1['Goat'])
CodePudding user response:
In pandas, you can also create "Goat" dummy columns in Goats and NotGoats DataFrames, concatenate the two and then map it to df['IDcode']:
df['Goat'] = df['IDcode'].map(pd.concat([Goats.assign(Goat=1), NotGoats.assign(Goat=0)]).set_index('IDcode')['Goat'])
Output:
IDcode Random Goat
0 11 8 1.0
1 2 9 0.0
2 3 10 0.0
3 18 3 1.0
4 21 2 NaN
5 6 4 0.0
6 9 5 0.0
7 10 4 1.0
CodePudding user response:
Here's another alternative, in case you want to keep Random from difference sources.
Add a Goat column for your other data.frames (containing either 1 or 0), and combine with bind_rows. Then left_join.
library(tidyverse)
Goats$Goat <- 1
NotGoats$Goat <- 0
left_join(
df1,
bind_rows(Goats, NotGoats),
by = "IDcode"
)
Output
IDcode Random.x Random.y Goat
1 11 8 10 1
2 2 9 11 0
3 3 10 10 0
4 18 3 22 1
5 21 2 NA NA
6 6 4 18 0
7 9 5 10 0
8 10 4 7 1
