Creating a new column with values dependent on other columns' values

Time:01-27

Assume I have a DataFrame like the one below:

import pandas as pd
import numpy as np
d = {'Column 1': [10, 12,13,43,np.nan], 
    'Column2':[np.nan,7,np.nan,49,8]}
df = pd.DataFrame(d)


I would like to create a third column that takes its values from Column2, falling back to Column 1 wherever Column2 is NaN.

I have found multiple topics/solutions where the condition depended on values in a single column, but none where the new column had to draw its data from more than one column.

CodePudding user response:

You could use mask:

df['Column3'] = df['Column2'].mask(df['Column2'].isna(), df['Column 1'])

A more generic version (uses any number of columns) would be to take the last valid value per row:

df['Column3'] = df.ffill(axis=1).iloc[:, -1]

output:

   Column 1  Column2  Column3
0      10.0      NaN     10.0
1      12.0      7.0      7.0
2      13.0      NaN     13.0
3      43.0     49.0     49.0
4       NaN      8.0      8.0
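
The row-wise fill also works when the priority is not simply "rightmost column wins". Below is a minimal sketch, assuming a third source column named 'Column X' that is not in the question and is added purely for illustration. The columns are listed from highest to lowest priority and back-filled along each row, so the first non-NaN value lands in the first position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column 1': [10, 12, 13, 43, np.nan],
    'Column2': [np.nan, 7, np.nan, 49, 8],
})

# Hypothetical extra column, not part of the question's data.
df['Column X'] = [np.nan, np.nan, 5, np.nan, np.nan]

# Source columns, highest priority first.
priority = ['Column X', 'Column2', 'Column 1']

# bfill along axis=1 pulls the first non-NaN value of each row
# into the leftmost column, which iloc[:, 0] then selects.
df['Column3'] = df[priority].bfill(axis=1).iloc[:, 0]
print(df)
```

Selecting the columns explicitly also keeps unrelated columns out of the fill, which df.ffill(axis=1) on the whole frame would not do.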

CodePudding user response:

You could try using np.where:

import pandas as pd
import numpy as np
d = {'Column 1': [10, 12,13,43,np.nan], 
    'Column2':[np.nan,7,np.nan,49,8]}
df = pd.DataFrame(d)

df['Column3'] = np.where(~pd.isna(df['Column2']), df['Column2'], df['Column 1'])
print(df)
   Column 1  Column2  Column3
0      10.0      NaN     10.0
1      12.0      7.0      7.0
2      13.0      NaN     13.0
3      43.0     49.0     49.0
4       NaN      8.0      8.0

CodePudding user response:

You only need:

df['Column3'] = df['Column2'].fillna(df['Column 1'])

Or:

df['Column3'] = df['Column2'].combine_first(df['Column 1'])
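
For completeness, here is a self-contained sketch of both calls run against the question's data. Note that the first column's label is 'Column 1' with a space, so it has to be spelled that way when indexing. The names via_fillna and via_combine are just illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column 1': [10, 12, 13, 43, np.nan],
    'Column2': [np.nan, 7, np.nan, 49, 8],
})

# Take Column2 and fill its NaNs from the aligned rows of 'Column 1'.
via_fillna = df['Column2'].fillna(df['Column 1'])

# combine_first does the same here: keep Column2, fall back to 'Column 1'.
via_combine = df['Column2'].combine_first(df['Column 1'])

df['Column3'] = via_fillna
print(df)
print(via_fillna.tolist() == via_combine.tolist())
```

Both align on the index, so they should agree whenever the two columns come from the same DataFrame.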