I’ve 2 data frames which have one common column ‘X’ having all the unique values. I want to subtract each column of data frame 1 to that of data frame 2 which have similar names. dataframe 1
| X | bar A | bar B |
|---|---|---|
| A1 | 48.0 | 1235 |
| A2 | 53.0 | 4567 |
dataframe 2
| X | par A | par B |
|---|---|---|
| A2 | 74.0 | 8342 |
| A1 | 63.0 | 6531 |
Desired result:
| X | bar A | par A | Difference | Status |
|---|---|---|---|---|
| A1 | 48.0 | 63.0 | 15.0 | Ok |
| A2 | 53.0 | 74.0 | 21.0 | Ok |
To be noted that Difference here refers to barA - parA I.e subtraction. and this kind of data frames are required for each column There are 20 columns in my data frames. The status column gets not ok when value is less Than 0.
I’m thinking of writing a for loop to perform this but unable to start. If anyone can help in this. Ps I’m working on python.
CodePudding user response:
Your expected output and logic is a bit ambiguous, I'll assume here that you want to subtract the column of df2 that has the same trailing letter as the column in df1. Then compute the status if all differences are positive:
df3 = df1.merge(df2, on='X').set_index('X')
(df3.groupby(df3.columns.map(lambda x: x.split()[1]), axis=1)
.apply(lambda d: d.diff(axis=1).iloc[:,-1])
.add_prefix('diff_')
.assign(Status=lambda d: np.where(d.ge(0).all(1), 'OK', 'Not OK'))
)
output:
diff_A diff_B Status
X
A1 15.0 5296.0 OK
A2 21.0 3775.0 OK
CodePudding user response:
Yes you can merge on column "X" to create a single dataframe with all values, then iterate through the columns and append calculated dataframes to a list.
Something like:
df = pd.merge(left=df1, right=df2, on='X', how='inner')
output_frames = []
letters = [i[-1] for i in df.columns if not i == 'X'] #Get letters in your columns
letters = list(set(letters)) #Get list of letters with single element
for letter in letters:
newdf = df.loc[:, ['X'] [i for i in df.columns if i.endswith(letter)]]
newdf['Difference'] = newdf[f'par {letter}'] - newdf[f'bar {letter}']
newdf['Status'] = newdf.apply(lambda x: 'OK' if x.Difference >= 0 else 'NOT OK', axis=1)
output_frames.append(newdf)
Then view output:
for df in newdf:
print(df)
Or better still, merge the outputs:
df = pd.concat(output_frames, axis=1)
