Create a repetitive value a certain number of times if the value does not meet a certain count in Py-CodePudding

I wish to create a repetition of a certain value if and only if the value does not already repeat by a certain number.

Data

id  Date    type    energy
aa  Q1 22   hi      5
aa  Q2 22   hi      5
aa  Q3 22           0
aa  Q4 22           0
aa  Q1 23   hi      5
aa  Q1 23   hi      5
aa  Q1 23   hi      5
aa  Q1 23   hi      5
aa  Q1 23   hi      5
bb  Q1 23   hi      2

Desired

id  Date    type    energy
aa  Q1 22   hi      5
aa  Q2 22   hi      5
aa  Q3 22           0
aa  Q4 22           0
aa  Q1 22           0
aa  Q1 22           0
aa  Q1 22           0
aa  Q1 22           0
aa  Q2 22           0
aa  Q2 22           0
aa  Q2 22           0
aa  Q2 22           0
aa  Q3 22           0
aa  Q3 22           0
aa  Q3 22           0
aa  Q3 22           0
aa  Q4 22           0
aa  Q4 22           0
aa  Q4 22           0
aa  Q4 22           0
aa  Q1 23   hi      4
aa  Q1 23   hi      3
aa  Q1 23   hi      4
aa  Q1 23   hi      3
aa  Q1 23   hi      3
bb  Q1 23   hi      2
bb  Q1 23           0
bb  Q1 23           0
bb  Q1 23           0
bb  Q1 23           0

Logic

If 'Date'  value does not repeat at least 5 times, 
then repeat 5 times with a value of '0' in the energy column. 
If Date value repeats at least 5 times, then do nothing. (i.e. 'aa' Q1 23)
 - we see that this already repeats 5 times, so no change occurs.

Doing

result = df['Date'].repeat(repeats = 5)
df['energy'] = 0

Any suggestion is appreciated.

CodePudding user response：

convert df into an array and use np.repeat to replicate rows. To conditionally repeat, calculate the repeat frequency using groupby(), transform. Convert the output array back into df using the pd.DataFrame() constructor

   s=df.groupby(['Date','id'])['type'].transform('count')

s=np.where(s.ge(5),0,abs(pd.Series(s)-5))

new=pd.DataFrame(np.repeat(df.values,s, axis=0), columns=df.columns)

new=(df.append(new.assign(type=np.nan, energy=0)).fillna('').sort_values(by=['id','type','Date'], ascending=[True,False, False]))

  id   Date type  energy
1   aa  Q2 22   hi       5
4   aa  Q1 23   hi       5
5   aa  Q1 23   hi       5
6   aa  Q1 23   hi       5
7   aa  Q1 23   hi       5
8   aa  Q1 23   hi       5
0   aa  Q1 22   hi       5
3   aa  Q4 22            0
12  aa  Q4 22            0
13  aa  Q4 22            0
14  aa  Q4 22            0
15  aa  Q4 22            0
2   aa  Q3 22            0
8   aa  Q3 22            0
9   aa  Q3 22            0
10  aa  Q3 22            0
11  aa  Q3 22            0
4   aa  Q2 22            0
5   aa  Q2 22            0
6   aa  Q2 22            0
7   aa  Q2 22            0
0   aa  Q1 22            0
1   aa  Q1 22            0
2   aa  Q1 22            0
3   aa  Q1 22            0
9   bb  Q1 23   hi       2
16  bb  Q1 23            0
17  bb  Q1 23            0
18  bb  Q1 23            0
19  bb  Q1 23            0

CodePudding user response：

First build a dataframe of missing rows then merge with your original dataframe:

cat = pd.CategoricalDtype(df['Date'].unique(), ordered=True)

# Step-1
df1 = df[['id', 'Date']].value_counts(sort=False).clip(1, 4).rsub(5) \
                        .to_frame('repeat').assign(type='', energy=0).reset_index()
df1 = df1.reindex(df1.index.repeat(df1['repeat'])).drop(columns='repeat')

# Step-2
out = pd.concat([df, df1]).astype({'Date': cat}) \
        .sort_values(['id', 'Date']).reset_index(drop=True)

Output:

>>> out
    id   Date type  energy
0   aa  Q1 22   hi       5
1   aa  Q1 22            0
2   aa  Q1 22            0
3   aa  Q1 22            0
4   aa  Q1 22            0
5   aa  Q2 22   hi       5
6   aa  Q2 22            0
7   aa  Q2 22            0
8   aa  Q2 22            0
9   aa  Q2 22            0
10  aa  Q3 22            0
11  aa  Q3 22            0
12  aa  Q3 22            0
13  aa  Q3 22            0
14  aa  Q3 22            0
15  aa  Q4 22            0
16  aa  Q4 22            0
17  aa  Q4 22            0
18  aa  Q4 22            0
19  aa  Q4 22            0
20  aa  Q1 23   hi       5
21  aa  Q1 23   hi       5
22  aa  Q1 23   hi       5
23  aa  Q1 23   hi       5
24  aa  Q1 23   hi       5
25  aa  Q1 23            0
26  bb  Q1 23   hi       2
27  bb  Q1 23            0
28  bb  Q1 23            0
29  bb  Q1 23            0
30  bb  Q1 23            0