Please consider the following example:
I have a DataFrame:
| Index | Speaker | Word |
|---|---|---|
| 0 | spk_0 | can |
| 1 | spk_0 | you |
| 2 | spk_0 | see |
| 3 | spk_0 | my |
| 4 | spk_0 | screen |
| 5 | spk_0 | now |
| 6 | spk_0 | ? |
| 7 | spk_1 | yes |
| 0 | spk_1 | , |
| 8 | spk_1 | now |
| 9 | spk_1 | I |
| 10 | spk_1 | can |
| 11 | spk_1 | see |
| 12 | spk_1 | your |
| 13 | spk_1 | screen |
| 14 | spk_1 | . |
| 15 | spk_0 | Let |
| 16 | spk_0 | me |
| 17 | spk_0 | start |
| 18 | spk_0 | then |
| 19 | spk_2 | yes |
| 20 | spk_2 | sure |
I want to combine the Word column so that consecutive words from the same speaker become one sentence, like this:
| Index | Speaker | Sentence |
|---|---|---|
| 0 | spk_0 | can you see my screen now ? |
| 1 | spk_1 | yes , now I can see your screen . |
| 2 | spk_0 | Let me start then |
| 3 | spk_2 | yes sure |
Can someone please help me find a solution to this problem? I already tried `groupby`, but it didn't work.
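A minimal sketch of why a plain `groupby('Speaker')` falls short, using a small hypothetical DataFrame (not the question's data): grouping on the speaker alone merges every row for that speaker, even when the rows belong to separate, non-consecutive turns.

```python
import pandas as pd

# Hypothetical toy data: spk_0 speaks, spk_1 replies, then spk_0 speaks again
df = pd.DataFrame({
    "Speaker": ["spk_0", "spk_0", "spk_1", "spk_0"],
    "Word": ["hi", "there", "yes", "ok"],
})

# Grouping on Speaker alone collapses ALL rows of each speaker,
# so spk_0's two separate turns end up joined into one string
merged = df.groupby("Speaker", sort=False)["Word"].agg(" ".join)
print(merged)
```

This is why the grouping key has to identify each consecutive run of a speaker, not just the speaker.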
Answer:
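You can group by runs of consecutive `Speaker` values. Build a run ID by comparing `Speaker` with its shifted value (a new run starts wherever the speaker changes) and taking the cumulative sum, then aggregate the words with `' '.join`: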
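```python
# Run ID: increments whenever Speaker differs from the previous row
g = df['Speaker'].ne(df['Speaker'].shift()).cumsum()
# Group by (Speaker, run) in order of appearance, join words, drop the run level
df = df.groupby(['Speaker', g], sort=False)['Word'].agg(' '.join).droplevel(-1).reset_index()
print(df)
```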
```
  Speaker                               Word
0   spk_0        can you see my screen now ?
1   spk_1  yes , now I can see your screen .
2   spk_0                  Let me start then
3   spk_2                           yes sure
```
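A self-contained sketch of the same consecutive-run idea on the question's data. As a variation on the code above (not the answerer's exact method), it groups by the run ID alone and uses pandas named aggregation to keep each run's first `Speaker` and produce a `Sentence` column, matching the column name requested in the question. The names `turns`, `run_id`, and `out` are illustrative.

```python
import pandas as pd

# Rebuild the question's data: one (speaker, text) pair per turn, split into words
turns = [
    ("spk_0", "can you see my screen now ?"),
    ("spk_1", "yes , now I can see your screen ."),
    ("spk_0", "Let me start then"),
    ("spk_2", "yes sure"),
]
df = pd.DataFrame(
    [(spk, word) for spk, text in turns for word in text.split()],
    columns=["Speaker", "Word"],
)

# True wherever Speaker differs from the row above; cumsum numbers the runs 1, 2, ...
# Renamed so the grouper is not confused with the Speaker column
run_id = df["Speaker"].ne(df["Speaker"].shift()).cumsum().rename("run")

# One row per run: keep the run's speaker and join its words with spaces
out = (
    df.groupby(run_id, sort=False)
      .agg(Speaker=("Speaker", "first"), Sentence=("Word", " ".join))
      .reset_index(drop=True)
)
print(out)
```

Grouping by the run ID alone avoids the `droplevel` step, since the speaker is recovered with `"first"` (every row in a run has the same speaker by construction).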
