Goal is to get average(integer) of marks column based on name value. If id and name column appears with exact same value more than once, then the marks with corresponding name will be considered once. For e.g. average of x = (33 14 3)/3 = 16
Sample dataframe:
id name marks
0 1 x 33
1 1 x 33
2 2 y 9
3 3 x 14
4 4 y 55
5 4 y 55
6 5 x 3
7 6 z 31
Expected output:
id name marks avg
0 1 x 33 16
1 1 x 33 16
2 2 y 9 32
3 3 x 14 16
4 4 y 55 32
5 4 y 55 32
6 5 x 3 16
7 6 z 31 31
I tried:
df["avg"] = df.groupby("name")["marks"].mean()
CodePudding user response:
Try this:
df = df.set_index('name').assign(avg=df.drop_duplicates(['name', 'marks']).groupby('name')['marks'].mean()).reset_index()
Output:
>>> df
name id marks avg
0 x 1 33 16.666667
1 x 1 33 16.666667
2 y 2 9 32.000000
3 x 3 14 16.666667
4 y 4 55 32.000000
5 y 4 55 32.000000
6 x 5 3 16.666667
7 z 6 31 31.000000
If you need it rounded, chain .astype(int) to .mean():
df = df.set_index('name').assign(avg=df.drop_duplicates(['name', 'marks']).groupby('name')['marks'].mean().astype(int)).reset_index()
Output:
>>> df
name id marks avg
0 x 1 33 16
1 x 1 33 16
2 y 2 9 32
3 x 3 14 16
4 y 4 55 32
5 y 4 55 32
6 x 5 3 16
7 z 6 31 31
CodePudding user response:
Compute mean for each name after drop duplicates (id, name) and map result value on name column:
df['avg'] = df['name'].map(df.drop_duplicates(['id', 'name']).groupby('name')['marks'].mean())
print(df)
# Output:
id name marks avg
0 1 x 33 16.666667
1 1 x 33 16.666667
2 2 y 9 32.000000
3 3 x 14 16.666667
4 4 y 55 32.000000
5 4 y 55 32.000000
6 5 x 3 16.666667
7 6 z 31 31.000000
CodePudding user response:
One option, which uses the same drop_duplicates idea, without using a groupby, is to pivot the deduplicated data:
df.assign(avg = df.name.map(df.drop_duplicates().pivot('name', 'id', 'marks').mean(1)))
id name marks avg
0 1 x 33 16.666667
1 1 x 33 16.666667
2 2 y 9 32.000000
3 3 x 14 16.666667
4 4 y 55 32.000000
5 4 y 55 32.000000
6 5 x 3 16.666667
7 6 z 31 31.000000
