Reform Pandas Data frame based on column name-CodePudding

I have a dataframe

                         text       label                 title  version
0                 Alice is in     Seattle                    SA        1
1     Alice is in wonderland.    Portlang                    SA        2
2  Mallory has done the task.     Gotland             sometitle        4
3  Mallory has done the task.  california             sometitle        4
4                 Mallory has  california             sometitle        2
5               Bob is final.    Portland  some different title        3
6            Mallory has done    Portland             sometitle        3

The final result I want is to find the hightest version text for given title and corresponding label, however the label should be divided as columns. Here is the final result:

                         text  Seattle  Portlang  Gotland  california  Portland                 title
0     Alice is in wonderland.        0         1        0           0         0                    SA
1  Mallory has done the task.        0         0        1           1         0             sometitle
2               Bob is final.        0         0        0           0         1  some different title

Thanks in advance,

CodePudding user response：

Use pivot_table. First rename text values with the title of the highest version for each title the pivot your dataframe:

out = (
  df.assign(dummy=1)
    .mask(df.groupby('title')['version'].rank(method='dense', ascending=False) > 1)
    .pivot_table('dummy', ['title', 'text'], 'label', fill_value=0)
    .reset_index()
    .rename_axis(columns=None)
)

Output:

>>> out

                  title                        text  Gotland  Portland  Portlang  california
0                    SA     Alice is in wonderland.        0         0         1           0
1  some different title               Bob is final.        0         1         0           0
2             sometitle  Mallory has done the task.        1         0         0           1