Home > Net >  Pivoting data and keeping only specific rows as per a condition
Pivoting data and keeping only specific rows as per a condition

Time:01-17

I have a pandas dataframe with multiple columns, which looks like the following:

Index ID Year Code #Purchase Mode
0 100 2018 ABC 1 1
1 100 2019 DEF 2 2
2 100 2019 GHI 3 3
3 102 2018 JKL 4 2
4 103 2019 MNO 5 1
5 103 2020 PQR 6 2
6 102 2019 PQR 3 3
7 104 2019 LMN 3 1
7 104 2021 LMN 1 3

I want to group rows w.r.t to ID and then pivot the results and would only like to keep IDs that have an entry against Mode_1 then Mode_2 and then Mode_3. The result should look like this:

Index ID Year_1 Code_1 #Purchase_1 Mode_1 Year_2 Code_2 #Purchase_2 Mode_2 Year_3 Code_3 #Purchase_3 Mode_3
0 100 2018 ABC 1 1 2019 DEF 2 2 2019 GHI 3 3
1 103 2019 MNO 5 1 2020 PQR 6 2 NaN NaN NaN NaN

In the example results, we can see that ID 102 is dropped because it doesn't have a Mode = 1, and ID 104 is dropped because the Mode value skips 2. So the possible combinations of Mode against an ID needed are 1, 1-->2, or 1-->2-->3.

It would be really appreciated if someone could help me with this example problem. TIA

CodePudding user response:

Try to the melt the dataframe with required column then do the pivot the results

CodePudding user response:

Use:

#filter if difference is 1 per groups
m1 = df.groupby('ID')['Mode'].transform(lambda x: x.diff().iloc[1:].eq(1).all())
#filter if first value per group is 1
m2 = df.groupby('ID')['Mode'].transform('first').eq(1)
  
#pivoting all columns by ID per groups g created by copy Mode column
df = df[m1 & m2].assign(g = lambda x: x['Mode']).pivot(index='ID', columns='g').sort_index(axis=1, level=1)
#flatten MultiIndex
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
    ID  #Purchase_1 Code_1  Mode_1  Year_1  #Purchase_2 Code_2  Mode_2  \
0  100          1.0    ABC     1.0  2018.0          2.0    DEF     2.0   
1  103          5.0    MNO     1.0  2019.0          6.0    PQR     2.0   

   Year_2  #Purchase_3 Code_3  Mode_3  Year_3  
0  2019.0          3.0    GHI     3.0  2019.0  
1  2020.0          NaN    NaN     NaN     NaN  
  •  Tags:  
  • Related