Reform pandas dataframe

Time:02-03

I have a dataframe:

import pandas

df1 = pandas.DataFrame({
    "text": ["Alice is in ", "Alice is in wonderland.", "Mallory has done the task.", "Mallory has", "Bob is final.", "Mallory has done"],
    "label": ["Seattle", "Portlang", "Gotland", "california", "california", "Portland"],
    "title": ["SA", "SA", "sometitle", "sometitle", "some different title", "sometitle"],
    "version": [1, 2, 4, 1, 2, 3]})

df1
    text                        label       title                 version
0   Alice is in                 Seattle     SA                    1
1   Alice is in wonderland.     Portlang    SA                    2
2   Mallory has done the task.  Gotland     sometitle             4
3   Mallory has                 california  sometitle             1
4   Bob is final.               california  some different title  2
5   Mallory has done            Portland    sometitle             3

For each title, I want to keep the text and version from the row with the latest (highest) version number, and collect all of that title's labels into a list. The result I expect is:

    title                 version  label                            text
0   SA                    2        [Seattle, Portlang]              Alice is in wonderland.
1   some different title  2        [california]                     Bob is final.
2   sometitle             4        [Gotland, california, Portland]  Mallory has done the task.

This is what I have so far, but I am not sure how to handle 'text':

df2 = df1.groupby(['title']).agg({'version':'max', 'label':list})[['version','label']].reset_index()
df2

    title                 version  label
0   SA                    2        [Seattle, Portlang]
1   some different title  2        [california]
2   sometitle             4        [Gotland, california, Portland]

Any help is appreciated.

Thank you.

CodePudding user response:

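Use DataFrame.merge together with GroupBy.agg. Aggregate per title first, then merge back on title and version so that only the row holding the latest version contributes its text: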

In [508]: x = df1.groupby(['title']).agg({'version':'max', 'label':list})

In [516]: df1[['title', 'version', 'text']].merge(x, on=['title', 'version'])
Out[516]: 
                  title  version                        text                            label
0                    SA        2     Alice is in wonderland.              [Seattle, Portlang]
1             sometitle        4  Mallory has done the task.  [Gotland, california, Portland]
2  some different title        2               Bob is final.                     [california]
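
For reference, here is a self-contained sketch of an alternative that was not part of the answer above: pick the latest row per title with idxmax, then attach the label lists. The names latest, labels and result are only illustrative. It assumes that ties on the highest version may be resolved by taking the first matching row, which is what idxmax does.

```python
import pandas as pd

df1 = pd.DataFrame({
    "text": ["Alice is in ", "Alice is in wonderland.", "Mallory has done the task.",
             "Mallory has", "Bob is final.", "Mallory has done"],
    "label": ["Seattle", "Portlang", "Gotland", "california", "california", "Portland"],
    "title": ["SA", "SA", "sometitle", "sometitle", "some different title", "sometitle"],
    "version": [1, 2, 4, 1, 2, 3],
})

# Index label of the highest-version row within each title
# (assumption: on ties, the first such row is acceptable).
latest_idx = df1.groupby("title")["version"].idxmax()
latest = df1.loc[latest_idx, ["title", "version", "text"]]

# All labels per title, kept in their original row order.
labels = df1.groupby("title")["label"].agg(list).reset_index()

# Combine both pieces and arrange the columns as in the expected result.
result = latest.merge(labels, on="title")[["title", "version", "label", "text"]]
print(result)
```

Unlike sorting by version before grouping, this leaves the labels in their original row order, so sometitle still yields [Gotland, california, Portland].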