How to create a new variable based on the fourth observation of a different variable-CodePudding

From this table

   ID, Date,  Value
    A    Jan01     5      
    A    Feb01     10     
    A    Mar03     9      
    A    Apr02     7      
    A    Jan01     2      
    B    Feb01     3      
    B    Mar01     6   
    B    Mar01     9    
    B    Mar02     5

Desired table:

  ID, Date,  Value, New_Variable
   A    Jan01     5      7
   A    Feb01     10     7
   A    Mar03     9      7
   A    Apr02     7      7
   A    Jan01     2      5
   B    Feb01     3      5
   B    Mar01     6      5
   B    Mar01     9      5
   B    Mar02     5      5

I know I can do

df.groupby('ID')['Value'].transform('first')

if I want to take the first value, what about the other rows? like the fourth or the fifth?

CodePudding user response：

We can group the datframe by ID then transform the Value column with nth to select the nth value from each group.

df['new_col'] = df.groupby('ID')['Value'].transform('nth', n=3)

print(df)

  ID   Date  Value  new_col
0  A  Jan01      5        7
1  A  Feb01     10        7
2  A  Mar03      9        7
3  A  Apr02      7        7
4  A  Jan01      2        7
5  B  Feb01      3        5
6  B  Mar01      6        5
7  B  Mar01      9        5
8  B  Mar02      5        5

Note: The n value is zero based, so in order to select the 4th row you have to specify n=3

CodePudding user response：

one idea is add a ranking column to show which place a row is in among its group. For example:

df['rank'] = df.groupby('ID').cumcount()

In this case you know the 4th place for each ID.

fourth_place = df[df['rank']==3]

so that you can create a mapping

mapping = fourth_place.set_index('ID')['Value']

which can be used in creating the new column

df['New_Variable'] = df['ID'].map(mapping)