Grep columns by values in different dataframe in python-CodePudding

I have this df1:

CHR   SNP          Pos  Ref Min
1   rs3094315   113934  A   G
1   rs12124819  126070  A   G
1   rs28765502  135853  C   T
1   rs9419478   158202  C   T
1   rs4881551   159076  G   A

and this df2:

CHR     SNP     A1  A2  MAF     NCHROBS
1   rs3094315   G   A   0.1402  214
1   rs12124819  G   A   0.1887  212
1   rs28765502  C   T   0.3113  212
1   rs7419119   G   T   0.2243  214
1   rs950122    C   G   0.1944  216

The first three rows have the same SNP names, so what I want to do is something like merge d2 with df1 based on coincidence in "SNP" and if they match evaluate if "A1" in df2 and "Ref" in df1 are the same, if not just Change the position letters and then subtract 1-MAF value as this at the end:

CHR     SNP     A1  A2  MAF     NCHROBS
1   rs3094315   A   G   0.8598  214
1   rs12124819  A   G   0.8113  212
1   rs28765502  C   T   0.3113  212
1   rs7419119   G   T   0.2243  214
1   rs950122    C   G   0.1944  216

What I tried is

import pandas as pd
import numpy as pd
df3=pd.merge(df2,df1, on=['SNP'])
df3['subtract']=np.where(df3['A1']!=df3['Ref']###print the subtract result 1-MAF= in 'subtract col')

But I don't want to lose those values that don't match in merge and print the subtract result of 1-MAF's value.

CodePudding user response：

You can first right merge on "SNP", then use np.where to evaluate the condition. Then fill NaN values with the corresponding values. Finally drop the columns with missing values and rearrange to fit the desired outcome:

merged_df = df1.merge(df2, on='SNP', how='right', suffixes=('_',''))
merged_df['MAF'] = np.where(merged_df['Ref'].eq(merged_df['A2']), 1-merged_df['MAF'], merged_df['MAF'])
merged_df['Ref'] = merged_df['Ref'].fillna(merged_df['A1'])
merged_df['Min'] = merged_df['Min'].fillna(merged_df['A2'])
merged_df = merged_df.dropna(axis=1).drop(columns=['A2','A1']).rename(columns={'Ref':'A1', 'Min':'A2'})[['CHR','SNP','A1','A2','MAF','NCHROBS']]

Output:

   CHR         SNP A1 A2     MAF  NCHROBS
0    1   rs3094315  A  G  0.8598      214
1    1  rs12124819  A  G  0.8113      212
2    1  rs28765502  C  T  0.3113      212
3    1   rs7419119  G  T  0.2243      214
4    1    rs950122  C  G  0.1944      216