Python Pandas Pivot Of Two columns (ColumnName and Value)-CodePudding

I have a Panda dataframe that contains two columns, as well as a default index. The first columns is the intended 'Column Name' and the second column the required value for that column.

    name            returnattribute
0   Customer Name   Customer One Name
1   Customer Code   CGLOSPA
2   Customer Name   Customer Two Name
3   Customer Code   COTHABA
4   Customer Name   Customer Three Name
5   Customer Code   CGLOADS
6   Customer Name   Customer Four Name
7   Customer Code   CAPRCANBRA
8   Customer Name   Customer Five Name
9   Customer Code   COTHAMO

I would like to povit this so that instead of 10 rows, I have 5 rows with two columns ('Customer Name' and 'Customer Code'). The hoped for result is as below:

    Customer Code   Customer Name
0   CGLOSPA         Customer One Name
1   COTHABA         Customer Two Name
2   CGLOADS         Customer Three Name
3   CAPRCANBRA      Customer Four Name
4   COTHAMO         Customer Five Name

I have tried to use the pandas pivot function:

df.pivot(columns='name', values='returnattribute')

But this results in ten rows still with alternate blanks:

    Customer Code   Customer Name
0   NaN             Customer One Name
1   CGLOSPA         NaN
2   NaN             Customer Two Name
3   COTHABA         NaN
4   NaN             Customer Three Name
5   CGLOADS         NaN
6   NaN             Customer Four Name
7   CAPRCANBRA      NaN
8   NaN             Customer Five Name
9   COTHAMO         NaN

How to I pivot the dataframe to get just 5 rows of two columns?

CodePudding user response：

In df.pivot when index parameter is not passed df.index is used as default. Hence, the output.

From Docs DataFrame.pivot:

index: str or object or a list of str, optional

Column to use to make new frame’s index. If None, uses existing index.

To get the desired output. You'd have to create a new index column like below.

df.assign(idx=df.index // 2).pivot(
    index="idx", columns="name", values="returnattribute"
)

# name Customer Code        Customer Name
# idx                                    
# 0          CGLOSPA    Customer One Name
# 1          COTHABA    Customer Two Name
# 2          CGLOADS  Customer Three Name
# 3       CAPRCANBRA   Customer Four Name
# 4          COTHAMO   Customer Five Name

Since every two rows represent one data point. You can reshape the data and build the required dataframe.

reshaped = df['returnattribute'].to_numpy().reshape(-1, 2)
# array([['Customer One Name', 'CGLOSPA'],
#        ['Customer Two Name', 'COTHABA'],
#        ['Customer Three Name', 'CGLOADS'],
#        ['Customer Four Name', 'CAPRCANBRA'],
#        ['Customer Five Name', 'COTHAMO']], dtype=object)

col_names = pd.unique(df.name)
# array(['Customer Name', 'Customer Code'], dtype=object)

out = pd.DataFrame(reshaped, columns=col_names)

#          Customer Name Customer Code
# 0    Customer One Name       CGLOSPA
# 1    Customer Two Name       COTHABA
# 2  Customer Three Name       CGLOADS
# 3   Customer Four Name    CAPRCANBRA
# 4   Customer Five Name       COTHAMO

# we can reorder the columns using reindex.

CodePudding user response：

You can also pass directly the new index to pivot_table, use aggfunc='first' as you have non numeric data:

df.pivot_table(index=df.index//2, columns='name',
               values='returnattribute', aggfunc='first')

output:

name Customer Code        Customer Name
0          CGLOSPA    Customer One Name
1          COTHABA    Customer Two Name
2          CGLOADS  Customer Three Name
3       CAPRCANBRA   Customer Four Name
4          COTHAMO   Customer Five Name