How to subtract columns of one dataframe to that of another in python and store the result and both-CodePudding

I’ve 2 data frames which have one common column ‘X’ having all the unique values. I want to subtract each column of data frame 1 to that of data frame 2 which have similar names. dataframe 1

X	bar A	bar B
A1	48.0	1235
A2	53.0	4567

dataframe 2

X	par A	par B
A2	74.0	8342
A1	63.0	6531

Desired result:

X	bar A	par A	Difference	Status
A1	48.0	63.0	15.0	Ok
A2	53.0	74.0	21.0	Ok

To be noted that Difference here refers to barA - parA I.e subtraction. and this kind of data frames are required for each column There are 20 columns in my data frames. The status column gets not ok when value is less Than 0.

I’m thinking of writing a for loop to perform this but unable to start. If anyone can help in this. Ps I’m working on python.

CodePudding user response：

Your expected output and logic is a bit ambiguous, I'll assume here that you want to subtract the column of df2 that has the same trailing letter as the column in df1. Then compute the status if all differences are positive:

df3 = df1.merge(df2, on='X').set_index('X')

(df3.groupby(df3.columns.map(lambda x: x.split()[1]), axis=1)
    .apply(lambda d: d.diff(axis=1).iloc[:,-1])
    .add_prefix('diff_')
    .assign(Status=lambda d: np.where(d.ge(0).all(1), 'OK', 'Not OK'))
)

output:

     diff_A  diff_B Status
X                         
A1     15.0  5296.0     OK
A2     21.0  3775.0     OK

CodePudding user response：

Yes you can merge on column "X" to create a single dataframe with all values, then iterate through the columns and append calculated dataframes to a list.

Something like:

df = pd.merge(left=df1, right=df2, on='X', how='inner')
output_frames = []
letters = [i[-1] for i in df.columns if not i == 'X'] #Get letters in your columns 
letters = list(set(letters)) #Get list of letters with single element
for letter in letters:
    newdf = df.loc[:, ['X']   [i for i in df.columns if i.endswith(letter)]]
    newdf['Difference'] = newdf[f'par {letter}'] - newdf[f'bar {letter}']
    newdf['Status'] = newdf.apply(lambda x: 'OK' if x.Difference >= 0 else 'NOT OK', axis=1)
    output_frames.append(newdf)

Then view output:

for df in newdf:
    print(df)

Or better still, merge the outputs:

df = pd.concat(output_frames, axis=1)