Movement of a specific string from one column to another-CodePudding

Hi I have a DataFrame which have values like

| ID| Value|  comments  |
| 1 | a    |            |
| 2 | b    |            |
| 3 | a;b;c|            |
| 4 | b;c  |            |
| 5 | d;a;c|            |

I need to transfer to a and b from Value to Comments for all the rows they are in. so that only value other that a and b will remain in data.

the new df would look like this

| ID| Value|  comments  |
| 1 |      |    a       |
| 2 |      |    b       |
| 3 |  c   |   a;b      |
| 4 |  c   |    b       |
| 5 | d;c  |    a       |

Can you give me a direction where should i look for the answer to this

CodePudding user response：

Explode your Value column then label it to the right column:

out = df.assign(Value=df['Value'].str.split(';')).explode('Value')
out['col'] = np.where(out['Value'].isin(['a', 'b']), 'comments', 'Value')
print(out)

# Intermediate output
   ID Value  comments       col
0   1     a       NaN  comments
1   2     b       NaN  comments
2   3     a       NaN  comments
2   3     b       NaN  comments
2   3     c       NaN     Value
3   4     b       NaN  comments
3   4     c       NaN     Value
4   5     d       NaN     Value
4   5     a       NaN  comments
4   5     c       NaN     Value

Now pivot your dataframe:

out = out.pivot_table(index='ID', columns='col', values='Value', aggfunc=';'.join) \
         .fillna('').reset_index().rename_axis(columns=None)
print(out)

# Final output
   ID Value comments
0   1              a
1   2              b
2   3     c      a;b
3   4     c        b
4   5   d;c        a

CodePudding user response：

(i) Use str.split to split on ';' and explode the "Value" column

(ii) Use boolean indexing to filter rows where 'a' or 'b' exist, take them out and groupby index and join them with ';' as separators

exploded_series = df['Value'].str.split(';').explode()
mask = exploded_series.isin(['a','b'])
df['comments'] = exploded_series[mask].groupby(level=0).apply(';'.join)
df['Value'] = exploded_series[~mask].groupby(level=0).apply(';'.join)
df = df.fillna('')

Output:

   ID Value comments
0   1              a
1   2              b
2   3     c      a;b
3   4     c        b
4   5   d;c        a