I don't know how to ask this question, but i'll try do explain my case.
I have a dataset with the data as following:
| Product | Value | Value type | year |
|---|---|---|---|
| A | 21,5 | Price | 21 |
| A | 5 | Volume | 21 |
| B | 55,3 | Price | 21 |
| B | 10 | Volume | 21 |
| C | 70,0 | Price | 21 |
| D | 37,5 | Price | 21 |
| D | 7,7 | Volume | 21 |
And I want to reach something like that:
| Product | Price | Volume | Year |
|---|---|---|---|
| A | 21,5 | 5 | 21 |
| B | 55,3 | 10 | 21 |
| c | 70,0 | - | 21 |
| D | 37,0 | 7,7 | 21 |
I mind that the unstack function can solve the problem, but i don't know how, cause i'm not getting all the columns back.
I found a complex solution but it's not working.
container = []
for label, _df in df.groupby(['Year','Product']):
_df.set_index('Value type', inplace = True)
container.append(pd.DataFrame({
"Product": [label[1]],
"Price":[_df.loc['Price', 'Value']],
"Volume": [_df.loc['Volume', 'Value']],
"Year":[label[0]]}))
df_new = pd.concat(container)
This solution doesn't work, because the missing line for Volume for product C.
How can I reach the expected dataframe? Is there any fast way to calculate this?
CodePudding user response:
Use pivot:
out = df.pivot(index=['Product', 'year'], columns='Value type', values=['Value']) \
.droplevel(0, axis=1).reset_index().rename_axis(None, axis=1) \
[['Product', 'Price', 'Volume', 'year']]
>>> out
Product Price Volume year
0 A 21.5 5.0 21
1 B 55.3 10.0 21
2 C 70.0 NaN 21
3 D 37.5 7.7 21
