I have this dataframe:
A B
0 [0, 1, 2] 1
1 foo 1
2 [3, 4] 1
I would like to use explode function for column "A" and then to keep right and fair proportion for each exploded row in case with column "B" . So the result should look like this:
A B
0 0 0.33
0 1 0.33
0 2 0.33
1 foo 1
2 3 0.5
2 4 0.5
Would this be possible with the explode function? I would manage to come to this result with for row in data.itertuples(): but the for loop is so slow in case with large dataframe. So do you have idea how to solve this with explode or with some other fast way?
I would be very grateful with any help.
CodePudding user response:
Create a boolean mask which indicates rows where A holds a list:
mask = df['A'].apply(lambda x: isinstance(x, list))
Prepopulate a new column of denominators with the integer 1 (to leave division by these values unchanged):
df['denom'] = 1
For each row where A is a list, overwrite the value of B with the length of the list:
df.loc[mask, 'denom'] = df['A'].str.len()
Explode column A, compute the new value of B, and drop denom:
res = df.explode('A').reset_index(drop=True)
res['B'] = res['B'] / res['denom']
res = res.drop(columns='denom')
Result:
print(res)
A B
0 0 0.333333
1 1 0.333333
2 2 0.333333
3 foo 1.000000
4 3 0.500000
5 4 0.500000
CodePudding user response:
You can use explode, then groupby apply:
(df.explode('A')
.assign(B=lambda d: d.groupby(level=0)['B'].apply(lambda s:s/len(s)))
)
output:
A B
0 0 0.333333
0 1 0.333333
0 2 0.333333
1 foo 1.000000
2 4 0.500000
2 5 0.500000
input:
df = pd.DataFrame({'A': [[0,1,2], 'foo', [4,5]],
'B': [1,1,1]})
CodePudding user response:
You can explode "A"; then groupby the index and transform count method (to count the number of each index) and divide the elements in 'B' by their corresponding index count.
out = df.explode('A')
out['B'] /= out['B'].groupby(level=0).transform('count')
Output:
A B
0 0 0.333333
0 1 0.333333
0 2 0.333333
1 foo 1.000000
2 3 0.500000
2 4 0.500000
