Table I
| c1 | c2 | c3 |
|---|---|---|
| Ant | Bee | Apple |
| A | B | C |
| 4 | 5 | 6 |
Table II
| A1 | A2 | c3 |
|---|---|---|
| G | H | Apple |
| H | s | C |
| 8 | asd | 4 |
I would like to know if we can compare these 2 excel workbook (comparison based on the column c3) and extract only the similar values along with the entire row value of that matched value using Pandas
Expected Output:
| c1 | c2 | c3 | A1 | A2 |
|---|---|---|---|---|
| Ant | Bee | Apple | G | H |
| A | B | C | H | s |
CodePudding user response:
Here's my solution. Basically, you should:
merge dfs with outer join on "c3"
drop all rows that have any NaN values
import pandas as pd my_df = pd.DataFrame({'c1': ['Ant', 'A', 4], 'c2': ['Bee', 'B', 5], 'c3': ['Apple', 'C', 6] }) my_df2 = pd.DataFrame({'A1': ['G', 'H', 8], 'A2': ['H', 's', 'asd'], 'c3': ['Apple', 'C', 4] }) final = my_df.merge(my_df2, on='c3', how='outer').dropna()
Result:
c1 | c2 | c3 | A1 | A2
-------------------------------
Ant | Bee | Apple | G | H
A | B | C | H | s
CodePudding user response:
print (pd.merge(df1, df2, on='c3'))
