fill missing value based on one column to another

Time:01-19

I have two columns, 'age' and 'age_band', and some of the age_band values are missing.

What I want to do: for rows where 'age' is between 30 and 39, fill the missing age_band with 30. Likewise, for rows where 'age' is between 80 and 89, fill the missing age_band with 80. How can I do this in a pandas DataFrame?

I tried the following, but the loop seems to run forever:

for ages in data['age']:
    if 0 <= ages <= 9:
        data['age_band'] = data['age_band'].fillna(0)
    elif 10 <= ages <= 19:
        data['age_band'] = data['age_band'].fillna(10)
    elif 20 <= ages <= 29:
        data['age_band'] = data['age_band'].fillna(20)
    elif 30 <= ages <= 39:
        data['age_band'] = data['age_band'].fillna(30)
    elif 40 <= ages <= 49:
        data['age_band'] = data['age_band'].fillna(40)
    elif 50 <= ages <= 59:
        data['age_band'] = data['age_band'].fillna(50)
    elif 60 <= ages <= 69:
        data['age_band'] = data['age_band'].fillna(60)
    elif 70 <= ages <= 79:
        data['age_band'] = data['age_band'].fillna(70)
    elif 80 <= ages <= 89:
        data['age_band'] = data['age_band'].fillna(80)
    elif 90 <= ages <= 99:
        data['age_band'] = data['age_band'].fillna(90)
    elif 100 <= ages <= 109:
        data['age_band'] = data['age_band'].fillna(100)

Please help me.

CodePudding user response:

Try this shortcut:

data['age_band'] = data['age_band'].fillna(data['age'] // 10 * 10).astype(int)
print(data)

# Output
   age  age_band
0   93        90
1   46        40
2   50        50
3   56        50
4   89        80
5   19        10
6   25        20
7   17        10
8   54        50
9   42        40

Setup:

import pandas as pd
import numpy as np

np.random.seed(2022)
data = pd.DataFrame({'age': np.random.randint(1, 111, 10), 'age_band': np.nan})
print(data)

# Output
   age  age_band
0   93       NaN
1   46       NaN
2   50       NaN
3   56       NaN
4   89       NaN
5   19       NaN
6   25       NaN
7   17       NaN
8   54       NaN
9   42       NaN
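
The setup above starts with every age_band missing, so it does not show what happens to values that are already present. A minimal sketch, assuming a hand-made frame (the name df and its values are invented for illustration), suggests that fillna only touches the NaN rows because it aligns the computed Series on the index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two age_band values are already set, three are missing.
df = pd.DataFrame({
    "age": [34, 87, 5, 61, 100],
    "age_band": [30, np.nan, np.nan, 60, np.nan],
})

# Floor-divide by 10 and multiply back to get the lower edge of each decade.
computed = df["age"] // 10 * 10

# fillna aligns on the index, so only the NaN rows take the computed value.
df["age_band"] = df["age_band"].fillna(computed).astype(int)

print(df)
```

The final astype(int) is only safe once no NaN remains, which holds here because every row has an age.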

CodePudding user response:

The answer above only works when all age bins have the same width. You can try pd.cut instead, which also handles bins of unequal width.

You can pass labels to pd.cut() as well. The following example sorts each age into a ten-year band and writes the band's label to the 'age_band' column.

bins holds the interval edges: 0-9 is one interval, 10-19 is the next, and so on. The corresponding labels are "0-9", "10-19", etc.

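# 13 edges define 12 intervals, one per label; the last edge catches ages above 109
bins = [0, 9, 19, 29, 39, 49, 59, 69, 79, 89, 99, 109, float('inf')]
labels = ["0-9","10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109",">109"]
# include_lowest=True makes the first interval include age 0
data['age_band'] = pd.cut(data['age'], bins=bins, labels=labels, include_lowest=True)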
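
The question asks for numeric bands (30, 80, ...) and for only the missing entries to be filled, whereas the assignment above overwrites the whole column with string labels. A possible variant, sketched under the assumption that each band should be labelled by its lower edge (the names df, edges and lower_edges are made up for this example), combines pd.cut with fillna:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with some age_band values already present.
df = pd.DataFrame({
    "age": [0, 9, 10, 34, 87, 109],
    "age_band": [np.nan, 0, np.nan, np.nan, 80, np.nan],
})

# Interval edges; with right=False the bins are [0, 10), [10, 20), ..., [100, 110).
edges = list(range(0, 111, 10))
# Label each interval with its lower edge so it matches the numeric age_band values.
lower_edges = edges[:-1]

banded = pd.cut(df["age"], bins=edges, labels=lower_edges, right=False)

# pd.cut returns a categorical; convert it to float before filling the gaps.
df["age_band"] = df["age_band"].fillna(banded.astype(float)).astype(int)

print(df)
```

Because the edges are an ordinary list, unequal bands only require changing edges and lower_edges; the fill logic stays the same.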