I have a dataframe with offer data that I want to transform into portion data. The column quantity tells how many portions each offer has. If an offer has three portions out of which two were sold, that row should appear three times in the new dataframe, so that the sold column has value 1 on two rows and value 0 on one row.
The actual dataframe contains more columns that should stay intact in the transformed dataframe. Date column works as an example.
My input:
Offer name Quantity Sold Date
0 A 3 2 2022-05
1 B 2 1 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
The desired output:
Offer name Quantity Sold Date
0 A 1 1 2022-05
1 A 1 1 2022-05
2 A 1 0 2022-05
3 B 1 1 2022-01
4 B 1 0 2022-01
and so on...
Thank you!!
CodePudding user response:
Solution
s = df.reindex(df.index.repeat(df['Quantity']))
s['Quantity'] = 1
s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)
Explained
Repeat the index of dataframe exactly Quantity times
>>> s = df.reindex(df.index.repeat(df['Quantity']))
>>> s
Offer name Quantity Sold Date
0 A 3 2 2022-05
0 A 3 2 2022-05
0 A 3 2 2022-05
1 B 2 1 2022-01
1 B 2 1 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
Group the above dataframe on index and use cumcount to create a sequential counter,
>>> s.groupby(level=0).cumcount()
0 0
0 1
0 2
1 0
1 1
2 0
3 0
4 0
dtype: int64
Flag the rows where the sequential counter is less than Sold quantity then change the dtype to int and assign the result to Sold column
>>> s['Sold'] = s.groupby(level=0).cumcount().lt(s['Sold']).astype(int)
>>> s
Offer name Quantity Sold Date
0 A 1 1 2022-05
0 A 1 1 2022-05
0 A 1 0 2022-05
1 B 1 1 2022-01
1 B 1 0 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
CodePudding user response:
You can try apply on rows
def repeat(row):
offer = [row['Offer name']] * row['Quantity']
sold = [1] * row['Sold'] [0] * (row['Quantity']-row['Sold'])
return offer, sold, 1
df[['Offer name', 'Sold', 'Quantity']] = df.apply(repeat, axis=1, result_type='expand')
df = df.explode(['Offer name', 'Sold'])
print(df)
Offer name Quantity Sold Date
0 A 1 1 2022-05
0 A 1 1 2022-05
0 A 1 0 2022-05
1 B 1 1 2022-01
1 B 1 0 2022-01
2 C 1 1 2022-04
3 D 1 1 2022-04
4 E 1 1 2022-05
