From this table
ID, Date, Value
A Jan01 5
A Feb01 10
A Mar03 9
A Apr02 7
A Jan01 2
B Feb01 3
B Mar01 6
B Mar01 9
B Mar02 5
Desired table:
ID, Date, Value, New_Variable
A Jan01 5 7
A Feb01 10 7
A Mar03 9 7
A Apr02 7 7
A Jan01 2 5
B Feb01 3 5
B Mar01 6 5
B Mar01 9 5
B Mar02 5 5
I know I can do
df.groupby('ID')['Value'].transform('first')
if I want to take the first value, what about the other rows? like the fourth or the fifth?
CodePudding user response:
We can group the datframe by ID then transform the Value column with nth to select the nth value from each group.
df['new_col'] = df.groupby('ID')['Value'].transform('nth', n=3)
print(df)
ID Date Value new_col
0 A Jan01 5 7
1 A Feb01 10 7
2 A Mar03 9 7
3 A Apr02 7 7
4 A Jan01 2 7
5 B Feb01 3 5
6 B Mar01 6 5
7 B Mar01 9 5
8 B Mar02 5 5
Note: The n value is zero based, so in order to select the 4th row you have to specify n=3
CodePudding user response:
one idea is add a ranking column to show which place a row is in among its group. For example:
df['rank'] = df.groupby('ID').cumcount()
In this case you know the 4th place for each ID.
fourth_place = df[df['rank']==3]
so that you can create a mapping
mapping = fourth_place.set_index('ID')['Value']
which can be used in creating the new column
df['New_Variable'] = df['ID'].map(mapping)
