I have a dataframe of string values with missing values in it. It needs to be populated/filled by the below conditions.
- From the
NaNvalue index , Check the last 3 rows and next 3 rows and replace theNaNwith the most frequent/repeated value out of 6 rows. - If there is 2 strings with an equal amount of frequency occurred from the last 3 rows and next 3 rows , replace the
NaNwith the value that has lowest index out of theses 6 rows.
My DataFrame:
reading
0 talk
1 kill
2 NaN
3 vertical
4 type
5 kill
6 NaN
7 vertical
8 vertical
9 type
10 durable
11 NaN
12 durable
13 vertical
Expected output:
reading
0 talk
1 kill
2 kill
3 vertical
4 type
5 kill
6 vertical
7 vertical
8 vertical
9 type
10 durable
11 vertical
12 durable
13 vertical
Here is the minimum reproducible code:
import pandas as pd
import numpy as np
df = pd.DataFrame({'reading':['talk','kill',np.NAN,'vertical','type','kill',np.NAN,'vertical','vertical','type','durable',np.NAN,'durable','vertical']})
def filldf(df):
# Do the logic here
return df
I am not sure how to approach this problem. Any help will be appreciated !!
CodePudding user response:
If you don't have too many NaN values, you can iterate over the index of NaN "reading" values and simply look for the mode of the surrounding 6 values of it (use iloc to get the first occurrence of multiple modes) and assign the values back to the corresponding "NaN" values
msk = df['reading'].isna()
df.loc[msk, 'reading'] = [df.loc[min(0, i-3):i 3, 'reading'].mode().iloc[0] for i in df.index[msk]]
Output:
reading
0 talk
1 kill
2 kill
3 vertical
4 type
5 kill
6 vertical
7 vertical
8 vertical
9 type
10 durable
11 vertical
12 durable
13 vertical
