I have a Panda dataframe that contains two columns, as well as a default index. The first columns is the intended 'Column Name' and the second column the required value for that column.
name returnattribute
0 Customer Name Customer One Name
1 Customer Code CGLOSPA
2 Customer Name Customer Two Name
3 Customer Code COTHABA
4 Customer Name Customer Three Name
5 Customer Code CGLOADS
6 Customer Name Customer Four Name
7 Customer Code CAPRCANBRA
8 Customer Name Customer Five Name
9 Customer Code COTHAMO
I would like to povit this so that instead of 10 rows, I have 5 rows with two columns ('Customer Name' and 'Customer Code'). The hoped for result is as below:
Customer Code Customer Name
0 CGLOSPA Customer One Name
1 COTHABA Customer Two Name
2 CGLOADS Customer Three Name
3 CAPRCANBRA Customer Four Name
4 COTHAMO Customer Five Name
I have tried to use the pandas pivot function:
df.pivot(columns='name', values='returnattribute')
But this results in ten rows still with alternate blanks:
Customer Code Customer Name
0 NaN Customer One Name
1 CGLOSPA NaN
2 NaN Customer Two Name
3 COTHABA NaN
4 NaN Customer Three Name
5 CGLOADS NaN
6 NaN Customer Four Name
7 CAPRCANBRA NaN
8 NaN Customer Five Name
9 COTHAMO NaN
How to I pivot the dataframe to get just 5 rows of two columns?
CodePudding user response:
In df.pivot when index parameter is not passed df.index is used as default. Hence, the output.
index: str or object or a list of str, optional
- Column to use to make new frame’s index. If
None, uses existing index.
To get the desired output. You'd have to create a new index column like below.
df.assign(idx=df.index // 2).pivot(
index="idx", columns="name", values="returnattribute"
)
# name Customer Code Customer Name
# idx
# 0 CGLOSPA Customer One Name
# 1 COTHABA Customer Two Name
# 2 CGLOADS Customer Three Name
# 3 CAPRCANBRA Customer Four Name
# 4 COTHAMO Customer Five Name
Since every two rows represent one data point. You can reshape the data and build the required dataframe.
reshaped = df['returnattribute'].to_numpy().reshape(-1, 2)
# array([['Customer One Name', 'CGLOSPA'],
# ['Customer Two Name', 'COTHABA'],
# ['Customer Three Name', 'CGLOADS'],
# ['Customer Four Name', 'CAPRCANBRA'],
# ['Customer Five Name', 'COTHAMO']], dtype=object)
col_names = pd.unique(df.name)
# array(['Customer Name', 'Customer Code'], dtype=object)
out = pd.DataFrame(reshaped, columns=col_names)
# Customer Name Customer Code
# 0 Customer One Name CGLOSPA
# 1 Customer Two Name COTHABA
# 2 Customer Three Name CGLOADS
# 3 Customer Four Name CAPRCANBRA
# 4 Customer Five Name COTHAMO
# we can reorder the columns using reindex.
CodePudding user response:
You can also pass directly the new index to pivot_table, use aggfunc='first' as you have non numeric data:
df.pivot_table(index=df.index//2, columns='name',
values='returnattribute', aggfunc='first')
output:
name Customer Code Customer Name
0 CGLOSPA Customer One Name
1 COTHABA Customer Two Name
2 CGLOADS Customer Three Name
3 CAPRCANBRA Customer Four Name
4 COTHAMO Customer Five Name
