Create a list with repeated terms by adding a multiplier index in a Pandas DataFrame

Given a dataframe like this:

import pandas as pd

row1 = ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
row2 = ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']
col = {'List': [row1, row2]}
df = pd.DataFrame(col)

which leads to:

	List
0	['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
1	['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']

I would like to generate the following dataframe:

	List
0	['AAA', 'BBB x 2', 'CCC', 'AAA x 2']
1	['CCC x 2', 'BBB', 'AAA x 3']

where the final column List contains a multiplier index that indicates how many times the term appears consecutively in the list.

Could you suggest a pandas instruction that solves this task?

CodePudding user response：

In your case you can flatten the lists with explode, then build a key for each run of consecutive values with shift and cumsum and group on it:

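# one row per list element; the original row number is kept in the index
s = df.explode('List')
# group by original row and by run id (incremented each time the value changes)
s = s.groupby([s.index, s['List'].shift().ne(s['List']).cumsum()])['List'].agg(['first', 'count'])
# build the label; use ' x ' instead of 'x' for the spacing shown in the question
out = s['first'] + 'x' + s['count'].astype(str)
# keep the bare value for runs of length 1, then collect back into one list per row
out = out.mask(s['count'] == 1, s['first']).groupby(level=0).agg(list)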
out
Out[202]: 
0    [AAA, BBBx2, CCC, AAAx2]
1         [CCCx2, BBB, AAAx3]
dtype: object
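
To see why the shift/cumsum step picks out consecutive runs, here is a minimal sketch on a single list from the question. The names starts, run_id and runs are illustrative only and are not part of the answer above.

```python
import pandas as pd

# One list from the question, as a Series.
s = pd.Series(['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA'])

# True wherever a value differs from the previous one, i.e. a new run starts.
starts = s.ne(s.shift())

# The cumulative sum of those flags gives every run its own id.
run_id = starts.cumsum()

# Collapse each run to its value and its length.
runs = s.groupby(run_id).agg(['first', 'count'])

print(run_id.tolist())
print(runs)
```

Grouping on run_id rather than on the values themselves is what keeps the two separate AAA runs apart.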

CodePudding user response：

itertools.groupby groups consecutive equal items, which is what you need here. The custom function appends the count to the value when a run contains more than one item.

from itertools import groupby
import pandas as pd

row1 = ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
row2 = ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']
col = {'Lists': [row1, row2]}
df = pd.DataFrame(col)

def count_items(row):
    output = []
    # groupby yields one (key, group) pair per run of consecutive equal items
    for k, d in groupby(row):
        n = len(list(d))
        if n > 1:
            output.append(f'{k} x {n}')
        else:
            output.append(k)
    return output

# apply the function to each list in the column
df['Lists'] = df['Lists'].apply(count_items)

print(df)

Output

                          Lists
0  [AAA, BBB x 2, CCC, AAA x 2]
1       [CCC x 2, BBB, AAA x 3]
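
For reference, the same idea can be sketched against the column name used in the question (List rather than Lists), counting each run without building an intermediate list. The helper name compress_runs is hypothetical, and this is only one possible variant of the approach above.

```python
from itertools import groupby

import pandas as pd


def compress_runs(items):
    """Collapse consecutive duplicates into 'value x n' labels (hypothetical helper)."""
    result = []
    for value, group in groupby(items):
        n = sum(1 for _ in group)  # run length, without materialising the group
        result.append(f"{value} x {n}" if n > 1 else value)
    return result


df = pd.DataFrame({'List': [
    ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA'],
    ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA'],
]})

# map applies the helper to each list in the column
df['List'] = df['List'].map(compress_runs)
print(df['List'].tolist())
```

Since the helper works on any iterable of strings, it can also be checked on plain Python lists without a DataFrame.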