How to create a new variable based on the fourth observation of a different variable

Time:01-24

Given this table:

   ID  Date   Value
   A   Jan01      5
   A   Feb01     10
   A   Mar03      9
   A   Apr02      7
   A   Jan01      2
   B   Feb01      3
   B   Mar01      6
   B   Mar01      9
   B   Mar02      5
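To reproduce the question, the sample table can be built as a DataFrame. This is a minimal sketch assuming Date is kept as plain strings and rows stay in the order listed, since "fourth observation" here means the fourth row of each ID as it appears, not the fourth date.

```python
import pandas as pd

# Sample data from the question; Date is kept as plain strings.
df = pd.DataFrame({
    "ID":    ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Date":  ["Jan01", "Feb01", "Mar03", "Apr02", "Jan01",
              "Feb01", "Mar01", "Mar01", "Mar02"],
    "Value": [5, 10, 9, 7, 2, 3, 6, 9, 5],
})

# Number of rows per ID: A has 5, B has 4, so both have a 4th row.
print(df.groupby("ID").size())
```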

Desired table:

   ID  Date   Value  New_Variable
   A   Jan01      5             7
   A   Feb01     10             7
   A   Mar03      9             7
   A   Apr02      7             7
   A   Jan01      2             7
   B   Feb01      3             5
   B   Mar01      6             5
   B   Mar01      9             5
   B   Mar02      5             5

I know I can use

df.groupby('ID')['Value'].transform('first')

to take the first value of each group, but what about other positions, such as the fourth or the fifth?

Answer:

We can group the dataframe by ID, then transform the Value column with nth to select the nth value from each group:

df['new_col'] = df.groupby('ID')['Value'].transform('nth', n=3)

print(df)

  ID   Date  Value  new_col
0  A  Jan01      5        7
1  A  Feb01     10        7
2  A  Mar03      9        7
3  A  Apr02      7        7
4  A  Jan01      2        7
5  B  Feb01      3        5
6  B  Mar01      6        5
7  B  Mar01      9        5
8  B  Mar02      5        5

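Note: n is zero-based, so to select the 4th row you have to specify n=3. Passing 'nth' to transform relies on pandas 1.x behaviour; pandas 2.0 turned nth into a filter rather than a reduction, and newer versions may reject transform('nth', ...).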
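If your pandas version does not accept transform('nth', ...), a version-independent sketch is to pass a lambda that picks the position with iloc. The variable n and the NaN fallback for groups that are too short are choices made for this illustration, not part of the original answer; Date is omitted because it plays no part in the calculation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID":    ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [5, 10, 9, 7, 2, 3, 6, 9, 5],
})

n = 3  # zero-based position: 3 selects the 4th row of each ID

# The lambda receives each group's Value Series and returns one scalar,
# which transform broadcasts back to every row of that group.
# Groups with n or fewer rows get NaN instead of raising IndexError.
df["new_col"] = df.groupby("ID")["Value"].transform(
    lambda s: s.iloc[n] if len(s) > n else np.nan
)
print(df)
```

Changing n to 4 would pick the fifth value instead; with this data, B has only four rows and would get NaN.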

Answer:

One idea is to add a ranking column that shows each row's position within its group, for example:

df['rank'] = df.groupby('ID').cumcount()

Since cumcount starts at 0, the row with rank 3 is the 4th one for its ID, so the fourth-place rows are:

fourth_place = df[df['rank']==3]
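To see what the ranking step produces, here is a small self-contained check on the question's data (Date omitted for brevity). It assumes the rows are already in the order that defines "first", "second", and so on, because cumcount follows the current row order.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":    ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [5, 10, 9, 7, 2, 3, 6, 9, 5],
})

# cumcount() numbers rows 0, 1, 2, ... within each ID in current row order.
df["rank"] = df.groupby("ID").cumcount()

# rank == 3 keeps exactly one row per ID that has at least four rows.
fourth_place = df[df["rank"] == 3]
print(df)
print(fourth_place)
```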

From these rows you can create a mapping from ID to Value:

mapping = fourth_place.set_index('ID')['Value']

which can then be used to create the new column:

df['New_Variable'] = df['ID'].map(mapping)
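The same steps can be wrapped in a small helper so the temporary rank column is not left on the frame and any position can be requested. add_nth_value is a hypothetical name used only for this sketch; k is zero-based, so k=4 gives the fifth value, and IDs without a k-th row map to NaN.

```python
import pandas as pd

def add_nth_value(df: pd.DataFrame, k: int, col: str = "New_Variable") -> pd.DataFrame:
    """Hypothetical helper: copy the k-th (zero-based) Value of each ID onto every row."""
    rank = df.groupby("ID").cumcount()                     # position within each ID, not stored on df
    mapping = df.loc[rank == k].set_index("ID")["Value"]   # one entry per ID that has a k-th row
    out = df.copy()                                        # leave the input frame unchanged
    out[col] = out["ID"].map(mapping)                      # IDs missing from mapping become NaN
    return out

df = pd.DataFrame({
    "ID":    ["A", "A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [5, 10, 9, 7, 2, 3, 6, 9, 5],
})

fourth = add_nth_value(df, 3)  # 4th value: A -> 7, B -> 5
fifth = add_nth_value(df, 4)   # 5th value: A -> 2, B has only 4 rows -> NaN
print(fourth)
print(fifth)
```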