I have a dataframe in IOB format as below:
| Name | Label |
|---|---|
| Alan | B-PERSON |
| Smith | I-PERSON |
| is | O |
| Alice's | B-PERSON |
| uncle | O |
| from | O |
| New | B-LOCATION |
| York | I-LOCATION |
| city | I-LOCATION |
I would like to convert it into a new dataframe as below:
| Name | Label |
|---|---|
| Alan Smith | PERSON |
| Alice's | PERSON |
| New York city | LOCATION |
Any help is much appreciated!
You can create groups by comparing the `Label` values with `O`, strip the `B-`/`I-` prefixes from the `Label` column, and then aggregate `Name` with `' '.join` over helper groups created by a cumulative sum:
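```python
import pandas as pd

# True for rows tagged O; these act as separators between entities
m = df['Label'].eq('O')
df = (df[~m]
      # strip the B-/I- prefix (regex=True is required in pandas >= 2.0)
      .assign(Label=lambda x: x['Label'].str.replace(r'^[IB]-', '', regex=True))
      # rows between two O rows share the same cumulative-sum value
      .groupby([m.cumsum(), 'Label'])['Name']
      .agg(' '.join)
      .droplevel(0)
      .reset_index()
      .reindex(df.columns, axis=1))
print(df)
```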
```
            Name     Label
0     Alan Smith    PERSON
1        Alice's    PERSON
2  New York city  LOCATION
```
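One caveat with grouping on the cumulative sum of `O` rows: two entities that follow each other with no `O` between them (for example `Alan/B-PERSON Smith/I-PERSON Bob/B-PERSON`) end up in the same group and are joined into a single `Alan Smith Bob` row. The sketch below is a variant, not the approach from the answer above: it starts a new chunk at every `B-` tag, at every `O`, and whenever the entity type changes. The helper name `merge_iob` is hypothetical, and the code assumes the `Name`/`Label` columns from the question.

```python
import pandas as pd


def merge_iob(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse token-level IOB rows into one row per entity (hypothetical helper)."""
    tags = df['Label']
    # Entity type without the B-/I- prefix; O rows stay 'O'
    entity = tags.str.replace(r'^[BI]-', '', regex=True)
    # A new chunk starts at a B- tag, at an O tag, or when the entity type changes
    new_chunk = (tags.str.startswith('B-') | tags.eq('O')
                 | entity.ne(entity.shift()))
    # Running count of chunk starts gives every token its chunk id
    chunk_id = new_chunk.cumsum().rename('chunk')

    keep = tags.ne('O')  # O tokens are not part of any entity
    return (df[keep]
            .assign(Label=entity[keep])
            .groupby(chunk_id[keep], sort=False)
            .agg(Name=('Name', ' '.join), Label=('Label', 'first'))
            .reset_index(drop=True))


df = pd.DataFrame({
    'Name': ['Alan', 'Smith', 'is', "Alice's", 'uncle', 'from',
             'New', 'York', 'city'],
    'Label': ['B-PERSON', 'I-PERSON', 'O', 'B-PERSON', 'O', 'O',
              'B-LOCATION', 'I-LOCATION', 'I-LOCATION'],
})
result = merge_iob(df)
print(result)

# Two adjacent PERSON entities with no O between them stay separate
adjacent = pd.DataFrame({'Name': ['Alan', 'Smith', 'Bob'],
                         'Label': ['B-PERSON', 'I-PERSON', 'B-PERSON']})
print(merge_iob(adjacent))
```

On the question's data this produces the same three rows as the answer; the difference only shows up when entities touch without an `O` separator.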
