df1 = pd.DataFrame(
{
"empid" : [1,2,3,4,5,6],
"empname" : ['a', 'b','c','d','e','f'],
"empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1
df2 = pd.DataFrame(
{
"empid" : [1,2,3,4,5,6],
"empname" : ['a', 'b','m','d','n','f'],
"empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2
df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all
df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final
Based on df_final data frame, need to create following output. here comparison column need to created dynamically for every identical column as i'm trying to compare two data frame(both data frame structure and column name are same) where number of columns are more than 300
CodePudding user response:
Use DataFrame.stack for possible compare all levels columns first with second, create new column in DataFrame.assign and reshape back by DataFrame.unstack with DataFrame.swaplevel and DataFrame.reindex for original order:
#original ordering
orig = df1.columns[1:].tolist()
print (orig)
['empname', 'empcity']
df_final = (df_all.stack()
.assign(comparions=lambda x: x['first'].eq(x['second']))
.unstack()
.swaplevel(axis = 'columns')
.reindex(orig, axis=1, level=0))
print (df_final)
empname empcity
first second comparions first second comparions
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
CodePudding user response:
(i) Use get_level_values to get the label values for level 0
(ii) Iterate over the outcome of (i) and for each level=0, do element-wise comparison using eq between first and second
(iii) use sort_index to sort columns in desired order
for level_0 in df_final.columns.get_level_values(0).unique():
df_final[(level_0, 'comparison')] = df_final[(level_0, 'first')].eq(df_final[(level_0,'second')])
df_final = df_final.sort_index(level=0, sort_remaining=False, axis=1)
Output:
empcity empname
first second comparison first second comparison
empid
1 aa aa True a a True
2 bb bb True b b True
3 cc cc True c m False
4 dd ddd False d d True
5 ee ee True e n False
6 ff fff False f f True
CodePudding user response:
Directly comparing 2 dataframes with ==
You can do this with a simple == between two dataframes that you need to compare. Let's start with the original 2 dataframes df1 and df2 -
first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second #<---
output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)
empname empcity
first second comparisons first second comparisons
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
Alternate approach with pandas groupby
In addition to the excellent answer by jezrael, I am adding an alternate way of doing this using pandas groupby.
- Tranpose to get columns as row indexes
- Groupby on first level which contains empcity and empname
- Apply comparison between the 2 rows
- Transpose back to columns
- Add multi index columns by product of original columns and "comparisons"
- Combine the two dataframes (original one and one with comparisons)
- Use swaplevel and reindex to get the order of columns that you need
#create comparisons
comparisons = (df_all.T
.groupby(level=-1)
.apply(lambda x: x.iloc[0]==x.iloc[1])
.T)
#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])
#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
.reindex(df1.set_index('empid')
.columns, axis=1, level=0))
print(df_final)
empname empcity
first second comparison first second comparison
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False

