Fill in missing column values based on pattern in Python

Time:01-20

I have a dataframe, df, where I would like to add rows for missing quarters and fill in values in specific columns.

Data

type    date    stat  test
aa      Q1 2022 20    1
aa      Q2 2022 10    2
aa      Q3 2022 30    1
bb      Q1 2022 30    1
bb      Q2 2022 10    1

Desired

type    date    stat  test
aa      Q1 2022 20    1
aa      Q2 2022 10    2
aa      Q3 2022 30    1
aa      Q4 2022 0
bb      Q1 2022 30    1
bb      Q2 2022 10    1
bb      Q3 2022 0
bb      Q4 2022 0

Doing

Logic:

The pattern is Q1 2022, Q2 2022, Q3 2022 and Q4 2022.
If there is a 'break' in this pattern for a given type, the missing quarter should be added as a new row with a stat value of 0.

I believe I can create a dictionary and then combine it with a fill function such as fillna:

data = {  "Q1 2022":0 ,
          "Q2 2022":0 ,
          "Q3 2022":0 ,
          "Q4 2022":0 ,
       }



df["type"] = df["type"].fillna("")
df["date"] = df["date"].fillna("")   # input dictionary mapping
df["stat"] = df["stat"].fillna(0)    # numeric 0, not the string "0"

Any suggestion is appreciated.

CodePudding user response:

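You can pivot first, then reindex and stack back (note that this keeps only the stat column):

l = ['Q1 2022', 'Q2 2022', 'Q3 2022', 'Q4 2022']
# One column per quarter; add any missing quarter, fill every gap with 0, then reshape back to long form
wide = df.pivot(index='type', columns='date', values='stat').reindex(columns=l).fillna(0)
out = wide.stack().reset_index(name='stat')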
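
To keep the test column as well, the same reindex idea can be applied to a MultiIndex of (type, date) pairs instead of pivoting. This is only a minimal sketch: it assumes the sample data from the question and that each (type, date) pair occurs at most once. The names quarters, full_index and out are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "type": ["aa", "aa", "aa", "bb", "bb"],
    "date": ["Q1 2022", "Q2 2022", "Q3 2022", "Q1 2022", "Q2 2022"],
    "stat": [20, 10, 30, 30, 10],
    "test": [1, 2, 1, 1, 1],
})

quarters = ["Q1 2022", "Q2 2022", "Q3 2022", "Q4 2022"]

# Every (type, date) pair that should exist in the result
full_index = pd.MultiIndex.from_product(
    [df["type"].unique(), quarters], names=["type", "date"]
)

out = (
    df.set_index(["type", "date"])
      .reindex(full_index)    # missing pairs become rows of NaN
      .fillna({"stat": 0})    # only stat gets a default; test stays NaN
      .reset_index()
)
```

Because reindex introduces NaN, stat and test come back as floats; cast them afterwards if integer columns are needed.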

CodePudding user response:

Create a new dataframe with all combinations of type and date, then merge it with your original dataframe. Finally, fill the missing values according to your rules:

from itertools import product
import pandas as pd

dates = ['Q1 2022', 'Q2 2022', 'Q3 2022', 'Q4 2022']
df1 = pd.DataFrame(product(df['type'].unique(), dates), columns=['type', 'date'])
df1 = df1.merge(df, how='left').fillna({'stat': 0, 'test': ''})

Output:

>>> df1
  type     date  stat test
0   aa  Q1 2022  20.0  1.0
1   aa  Q2 2022  10.0  2.0
2   aa  Q3 2022  30.0  1.0
3   aa  Q4 2022   0.0     
4   bb  Q1 2022  30.0  1.0
5   bb  Q2 2022  10.0  1.0
6   bb  Q3 2022   0.0     
7   bb  Q4 2022   0.0     
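
The list of quarter labels is hard-coded above. If the range of quarters varies, the labels could be generated instead. The following is a rough sketch, where quarter_labels is a hypothetical helper (not part of pandas) and the label format "Q<n> <year>" is taken from the question's data:

```python
import pandas as pd

def quarter_labels(start, end):
    """Return labels such as 'Q1 2022' for every quarter from start to end, inclusive."""
    periods = pd.period_range(start=start, end=end, freq="Q")
    return [f"Q{p.quarter} {p.year}" for p in periods]

# Same four labels as the hard-coded list
dates = quarter_labels("2022Q1", "2022Q4")
```

The resulting dates list can be passed to product or reindex exactly like the hard-coded one.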