How to create a column from different dataframes?-CodePudding

I have df1:

IDcode   Random
11       8
2        9
3        10 
18       3
21       2
6        4
9        5
10       4

I have a second df named Goats:

IDcode   Random
11       10
18       22
44       9
10       7

I have a third df named NotGoats

IDcode   Random
2        11
3        10
6        18
9        10
43       11

I would like to make a new Goat column in df1 that checks to see if the IDcode exists in the Goats df or the NotGoats df. If it exists in Goats df, I would like to have a value of 1, if it exists in NotGoats, I would like to have a value of 0. If it does not exist in either, I would like to have an NA.

Output would look like:

IDcode   Random  Goat
11       8       1
2        9       0
3        10      0
18       3       1
21       2       NA
6        4       0
9        5       0
10       4       1

CodePudding user response：

Here is a tidyverse r option:

library(tidyverse)

df1 %>% 
  mutate(Goat = case_when(IDcode %in% Goats$IDcode == TRUE ~ 1,
         IDcode %in% NotGoats$IDcode == TRUE ~ 0))

Output

  IDcode Random Goat
1     11      8    1
2      2      9    0
3      3     10    0
4     18      3    1
5     21      2   NA
6      6      4    0
7      9      5    0
8     10      4    1

Data

df1 <-
  structure(list(
    IDcode = c(11L, 2L, 3L, 18L, 21L, 6L, 9L, 10L),
    Random = c(8L, 9L, 10L, 3L, 2L, 4L, 5L, 4L)
  ),
  class = "data.frame",
  row.names = c(NA,-8L))

Goats <-
  structure(list(
    IDcode = c(11L, 18L, 44L, 10L),
    Random = c(10L, 22L, 9L, 7L)
  ),
  class = "data.frame",
  row.names = c(NA,-4L))

NotGoats <-
  structure(list(
    IDcode = c(2L, 3L, 6L, 9L, 10L, 43L),
    Random = c(11L, 10L, 18L, 10L, 3L, 11L)
  ),
  class = "data.frame",
  row.names = c(NA,-6L))

CodePudding user response：

You can check it like this:

import numpy as np

df1['Goat'] = np.nan
df1['Goat'] = np.where(df1['Goat'].isin(df_goats['IDcode']), 1, df1['Goat'])
df1['Goat'] = np.where(df1['Goat'].isin(df_no_goats['IDcode']), 0, df1['Goat'])

CodePudding user response：

In pandas, you can also create "Goat" dummy columns in Goats and NotGoats DataFrames, concatenate the two and then map it to df['IDcode']:

df['Goat'] = df['IDcode'].map(pd.concat([Goats.assign(Goat=1), NotGoats.assign(Goat=0)]).set_index('IDcode')['Goat'])

Output:

   IDcode  Random  Goat
0      11       8   1.0
1       2       9   0.0
2       3      10   0.0
3      18       3   1.0
4      21       2   NaN
5       6       4   0.0
6       9       5   0.0
7      10       4   1.0

CodePudding user response：

Here's another alternative, in case you want to keep Random from difference sources.

Add a Goat column for your other data.frames (containing either 1 or 0), and combine with bind_rows. Then left_join.

library(tidyverse)

Goats$Goat <- 1
NotGoats$Goat <- 0

left_join(
  df1,
  bind_rows(Goats, NotGoats),
  by = "IDcode"
)

Output

  IDcode Random.x Random.y Goat
1     11        8       10    1
2      2        9       11    0
3      3       10       10    0
4     18        3       22    1
5     21        2       NA   NA
6      6        4       18    0
7      9        5       10    0
8     10        4        7    1