I have a dataframe, df, where I would like to fill in missing values in specific columns based on quarters.
Data
type  date     stat  test
aa    Q1 2022  20    1
aa    Q2 2022  10    2
aa    Q3 2022  30    1
bb    Q1 2022  30    1
bb    Q2 2022  10    1
Desired
type  date     stat  test
aa    Q1 2022  20    1
aa    Q2 2022  10    2
aa    Q3 2022  30    1
aa    Q4 2022  0
bb    Q1 2022  30    1
bb    Q2 2022  10    1
bb    Q3 2022  0
bb    Q4 2022  0
Doing
Logic:
The pattern is Q1 2022, Q2 2022, Q3 2022 and Q4 2022.
If there is a 'break' in this pattern for a given type, a row should be added for each missing
quarter with a stat value of 0 and an empty test value.
I believe I can create a dictionary of the quarters and combine it with fillna:
data = { "Q1 2022":0 ,
"Q2 2022":0 ,
"Q3 2022":0 ,
"Q4 2022":0 ,
}
df["type"].fillna("", inplace = True)
df["date"].fillna("", inplace = True) #input dictionary mapping
df["stat"].fillna("0", inplace = True)
Any suggestion is appreciated.
CodePudding user response:
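You can pivot on stat, reindex the columns to the full list of quarters, fill the gaps with 0, then stack back to long form. Note that this keeps only the stat column; test is dropped.
l = ['Q1 2022', 'Q2 2022', 'Q3 2022', 'Q4 2022']
out = (df.pivot(index='type', columns='date', values='stat')
         .reindex(columns=l).fillna(0)  # fillna also covers gaps such as bb / Q3 2022
         .stack().reset_index(name='stat'))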
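For reference, here is a minimal self-contained sketch of the pivot-and-reindex idea run against the sample data from the question. It assumes only stat needs to survive (test is not carried through), passes index/columns/values by name, and fills NaN before stacking so that a quarter missing for one type but present for another (bb / Q3 2022) is not silently dropped. The name `wide` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type": ["aa", "aa", "aa", "bb", "bb"],
    "date": ["Q1 2022", "Q2 2022", "Q3 2022", "Q1 2022", "Q2 2022"],
    "stat": [20, 10, 30, 30, 10],
    "test": [1, 2, 1, 1, 1],
})
quarters = ["Q1 2022", "Q2 2022", "Q3 2022", "Q4 2022"]

# One row per type, one column per quarter; absent combinations become NaN
wide = df.pivot(index="type", columns="date", values="stat")

# Add quarters that no type has (Q4 2022), then turn every NaN into 0
wide = wide.reindex(columns=quarters).fillna(0)

# Back to long form: one row per (type, date)
out = wide.stack().reset_index(name="stat")
out["stat"] = out["stat"].astype(int)
```

Using reindex(..., fill_value=0) on its own would only fill the newly added Q4 2022 column; the NaN already present for bb / Q3 2022 after the pivot needs the separate fillna(0).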
CodePudding user response:
Create a new dataframe with all combinations of type and date, then left-merge it with your original dataframe. Finally, fill the missing values according to your rules:
import pandas as pd
from itertools import product
dates = ['Q1 2022', 'Q2 2022', 'Q3 2022', 'Q4 2022']
df1 = pd.DataFrame(product(df['type'].unique(), dates), columns=['type', 'date'])
df1 = df1.merge(df, how='left').fillna({'stat': 0, 'test': ''})
Output:
>>> df1
  type     date  stat test
0   aa  Q1 2022  20.0  1.0
1   aa  Q2 2022  10.0  2.0
2   aa  Q3 2022  30.0  1.0
3   aa  Q4 2022   0.0
4   bb  Q1 2022  30.0  1.0
5   bb  Q2 2022  10.0  1.0
6   bb  Q3 2022   0.0
7   bb  Q4 2022   0.0
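If the float columns in that output are unwanted, a possible variant (not part of the answer above, just a sketch under the assumption that each type/date pair appears at most once) is to reindex against a full MultiIndex and then restore integer dtypes. The names `full_index` and `out` are illustrative, and using the nullable "Int64" dtype for test is a choice made here, not something the question asks for.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "type": ["aa", "aa", "aa", "bb", "bb"],
    "date": ["Q1 2022", "Q2 2022", "Q3 2022", "Q1 2022", "Q2 2022"],
    "stat": [20, 10, 30, 30, 10],
    "test": [1, 2, 1, 1, 1],
})
quarters = ["Q1 2022", "Q2 2022", "Q3 2022", "Q4 2022"]

# Every (type, quarter) combination that should exist in the result
full_index = pd.MultiIndex.from_product(
    [df["type"].unique(), quarters], names=["type", "date"]
)

# Reindexing inserts all-NaN rows for the missing combinations
out = df.set_index(["type", "date"]).reindex(full_index).reset_index()

# stat: missing means 0; test: missing stays missing (<NA>)
out["stat"] = out["stat"].fillna(0).astype(int)
out["test"] = out["test"].astype("Int64")
```

This keeps stat as plain integers and leaves test as a nullable integer column, so the added rows show a missing value rather than an empty string.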
