So basically I have 2 DataFrames like this:
Table_1
| Apple | Banana | Orange | Date |
|---|---|---|---|
| 1 | 2 | 4 | 2020 |
| 3 | 5 | 2 | 2021 |
| 7 | 8 | 9 | 2022 |
Table_2
| fruit | year |
|---|---|
| Apple | 2020 |
| Apple | 2021 |
| Apple | 2022 |
| Banana | 2020 |
| Banana | 2021 |
| Banana | 2022 |
| Orange | 2020 |
| Orange | 2021 |
| Orange | 2022 |
So I want to lookup the values for the fruits for Table_2 from the Table_1 based on the fruit name and the respective year.
The final outcome should look like this:
| fruit | year | number |
|---|---|---|
| Apple | 2020 | 1 |
| Apple | 2021 | 3 |
| Apple | 2022 | 7 |
| Banana | 2020 | 2 |
| Banana | 2021 | 5 |
| Banana | 2022 | 8 |
| Orange | 2020 | 4 |
| Orange | 2021 | 2 |
| Orange | 2022 | 9 |
In the Excel for an example one can do something like this:
=INDEX(Table1[[Apple]:[Orange]],MATCH([@year],Table1[Date],0),MATCH([@fruit],Table1[[#Headers],[Apple]:[Orange]],0))
But what is the way to do it in Python?
CodePudding user response:
Assuming pandas, you can melt and merge:
out = (df2
.merge(df1.rename(columns={'Date': 'year'})
.melt('year', var_name='fruit', value_name='number'),
how='left'
)
)
output:
fruit year number
0 Apple 2020 1
1 Apple 2021 3
2 Apple 2022 7
3 Banana 2020 2
4 Banana 2021 5
5 Banana 2022 8
6 Orange 2020 4
7 Orange 2021 2
8 Orange 2022 9
