Home > database >  How to pivot a DataFrame creating new columns, considering the max item repeated
How to pivot a DataFrame creating new columns, considering the max item repeated

Time:02-02

I have the next pd.DataFrame:

Index ID   Name   Date     Days     
1     1    Josh   5-1-20   10       
2     1    Josh   9-1-20   10       
3     1    Josh   19-1-20  6        
4     2    Mike   1-1-20   10       
5     3    George 1-4-20   10       
6     4    Rose   1-2-20   10       
7     4    Rose   11-5-20  5        
8     5    Mark   1-9-20   10       
9     6    Joe    1-4-21   10       
10    7    Jill   1-1-21   10       

I'm needing to make a DataFrame where the ID is not repeated, for that, I want to creat new columns (Date y Days), considering the case with most repeatitions (3 in this case).

The desired output is the next DataFrame:

Index ID  Name   Date 1   Date 2  Date 3   Days1  Days2   Days3  
1     1   Josh   5-1-20   9-1-20  19-1-20  10     10      6      
2     2   Mike   1-1-20                    10                    
3     3   George 1-4-20                    10                    
4     4   Rose   1-2-20   11-5-20          10     5              
5     5   Mark   1-9-20                    10                    
6     6   Joe    1-4-21                    10                    
7     7   Jill   1-1-21                    10                    

CodePudding user response:

Try:

df_out = df.set_index(['ID','Name',df.groupby('ID').cumcount() 1]).unstack()
df_out.columns = [f'{i} {j}' for i, j in df_out.columns]
df_out.fillna('').reset_index()

Output:

   ID    Name  Index 1 Index 2 Index 3  Date 1   Date 2   Date 3  Days 1 Days 2 Days 3
0   1    Josh      1.0     2.0     3.0  5-1-20   9-1-20  19-1-20    10.0   10.0    6.0
1   2    Mike      4.0                  1-1-20                      10.0              
2   3  George      5.0                  1-4-20                      10.0              
3   4    Rose      6.0     7.0          1-2-20  11-5-20             10.0    5.0       
4   5    Mark      8.0                  1-9-20                      10.0              
5   6     Joe      9.0                  1-4-21                      10.0              
6   7    Jill     10.0                  1-1-21                      10.0              

CodePudding user response:

Here is a solution using pivot with a helper column:

df2 = (df
       .assign(col=df.groupby('ID').cumcount().add(1).astype(str))
       .pivot(index=['ID','Name'], columns='col', values=['Date', 'Days'])
       .fillna('')
       )
df2.columns = df2.columns.map('_'.join)

df2.reset_index()

Output:

   ID    Name  Date_1   Date_2   Date_3  Days_1 Days_2 Days_3
0   1    Josh  5-1-20   9-1-20  19-1-20      10     10      6
1   2    Mike  1-1-20                        10              
2   3  George  1-4-20                        10              
3   4    Rose  1-2-20  11-5-20               10      5       
4   5    Mark  1-9-20                        10              
5   6     Joe  1-4-21                        10              
6   7    Jill  1-1-21                        10              
  •  Tags:  
  • Related