Home > Software engineering >  Fill empty columns with values from another column of another row based on an identifier
Fill empty columns with values from another column of another row based on an identifier

Time:01-14

I am trying to fill a dataframe, containing repeated elements, based on an identifier. My Dataframe is as follows:

   Code Value
0  SJHV   
1  SJIO    96B
2  SJHV    33C
3  CPO3    22A
4  CPO3    22A
5  SJHV    33C       #< -- Numbers stored as strings
6   TOY   
7   TOY             #< -- These aren't NaN, they are empty strings

I would like to remove the empty 'Value' rows only if a non-empty 'Value' row exists. To be clear, I would want my output to look like:

   Code Value
0  SJHV    33C
1  SJIO    96B
2  CPO3    22A      
3   TOY         

My attempt was as follows:

df['Value'].replace('', np.nan, inplace=True)

df2 = df.dropna(subset=['Value']).drop_duplicates('Code')

As expected, this code also drops the 'TOY' Code. Any suggestions?

CodePudding user response:

The empty strings should go to the bottom if you sort them, then you can just drop duplicates.

import pandas as pd
df = pd.DataFrame({'Code':['SJHV','SJIO','SJHV','CPO3','CPO3','SJHV','TOY','TOY'],'Value':['','96B','33C','22A','22A','33C','','']})
df = (
    df.sort_values(by=['Value'], ascending=False)
      .drop_duplicates(subset=['Code'], keep='first')
      .sort_index()
)
    

Output

   Code Value
1  SJIO   96B
2  SJHV   33C
3  CPO3   22A
6   TOY      
  •  Tags:  
  • Related