Hi I have a DataFrame which have values like
| ID| Value| comments |
| 1 | a | |
| 2 | b | |
| 3 | a;b;c| |
| 4 | b;c | |
| 5 | d;a;c| |
I need to transfer to a and b from Value to Comments for all the rows they are in. so that only value other that a and b will remain in data.
the new df would look like this
| ID| Value| comments |
| 1 | | a |
| 2 | | b |
| 3 | c | a;b |
| 4 | c | b |
| 5 | d;c | a |
Can you give me a direction where should i look for the answer to this
CodePudding user response:
Explode your Value column then label it to the right column:
out = df.assign(Value=df['Value'].str.split(';')).explode('Value')
out['col'] = np.where(out['Value'].isin(['a', 'b']), 'comments', 'Value')
print(out)
# Intermediate output
ID Value comments col
0 1 a NaN comments
1 2 b NaN comments
2 3 a NaN comments
2 3 b NaN comments
2 3 c NaN Value
3 4 b NaN comments
3 4 c NaN Value
4 5 d NaN Value
4 5 a NaN comments
4 5 c NaN Value
Now pivot your dataframe:
out = out.pivot_table(index='ID', columns='col', values='Value', aggfunc=';'.join) \
.fillna('').reset_index().rename_axis(columns=None)
print(out)
# Final output
ID Value comments
0 1 a
1 2 b
2 3 c a;b
3 4 c b
4 5 d;c a
CodePudding user response:
(i) Use str.split to split on ';' and explode the "Value" column
(ii) Use boolean indexing to filter rows where 'a' or 'b' exist, take them out and groupby index and join them with ';' as separators
exploded_series = df['Value'].str.split(';').explode()
mask = exploded_series.isin(['a','b'])
df['comments'] = exploded_series[mask].groupby(level=0).apply(';'.join)
df['Value'] = exploded_series[~mask].groupby(level=0).apply(';'.join)
df = df.fillna('')
Output:
ID Value comments
0 1 a
1 2 b
2 3 c a;b
3 4 c b
4 5 d;c a
