How to group words into sentences by speaker in a pandas DataFrame

Please consider the following example:

I have a DataFrame

Index	Speaker	Word
0	spk_0	can
1	spk_0	you
2	spk_0	see
3	spk_0	my
4	spk_0	screen
5	spk_0	now
6	spk_0	?
7	spk_1	yes
0	spk_1	,
8	spk_1	now
9	spk_1	I
10	spk_1	can
11	spk_1	see
12	spk_1	your
13	spk_1	screen
14	spk_1	.
15	spk_0	Let
16	spk_0	me
17	spk_0	start
18	spk_0	then
19	spk_2	yes
20	spk_2	sure

I want to combine the Word column such that it should look like the following:

Index	Speaker	Sentence
0	spk_0	can you see my screen now ?
1	spk_1	yes , now I can see your screen .
2	spk_0	let me start then .
3	spk_2	Yes sure .

Can someone please help me find a solution to this problem? I already tried groupby, but it didn't work.

CodePudding user response:

You can group by consecutive runs of the Speaker column. Compare each value with the shifted (previous) value, take the cumulative sum of the resulting change flags to get a run ID, and then aggregate each run with join:

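# Run ID: increases by 1 each time the speaker differs from the previous row
g = df['Speaker'].ne(df['Speaker'].shift()).cumsum()
# Join the words of each run, then drop the helper run-ID index level
df = df.groupby(['Speaker', g], sort=False)['Word'].agg(' '.join).droplevel(-1).reset_index()
print(df)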
  Speaker                               Word
0   spk_0        can you see my screen now ?
1   spk_1  yes , now I can see your screen .
2   spk_0                  Let me start then
3   spk_2                           yes sure
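
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift/cumsum approach. The sample data is rebuilt from the question's table without its Index column (pandas assigns a default one), and the names `changed`, `run_id`, and `sentences` are illustrative only. It also names the result column Sentence, as the question asks, which is an addition to the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question (Index column omitted).
speakers = ["spk_0"] * 7 + ["spk_1"] * 9 + ["spk_0"] * 4 + ["spk_2"] * 2
words = (
    "can you see my screen now ? "
    "yes , now I can see your screen . "
    "Let me start then "
    "yes sure"
).split()
df = pd.DataFrame({"Speaker": speakers, "Word": words})

# True wherever the speaker differs from the previous row.
# The first row is True because it is compared against NaN.
changed = df["Speaker"].ne(df["Speaker"].shift())

# The cumulative sum turns the change flags into a run ID: 1, 1, ..., 2, 2, ...
run_id = changed.cumsum().rename("run")

sentences = (
    df.groupby(["Speaker", run_id], sort=False)["Word"]
      .agg(" ".join)                      # join the words of each run
      .droplevel("run")                   # drop the helper run ID
      .reset_index(name="Sentence")       # name the joined column "Sentence"
)
print(sentences)
```

A plain `groupby('Speaker')` would merge both spk_0 turns into one row, which may be why the original attempt did not give the expected result. Grouping on the run ID keeps the two turns separate.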