Python - lambda on pandas dataframe with nan rows

Time:01-27

I want to apply an alteration to a column of my dataframe where the cells are not empty. This is the dataframe that I am using:

import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

output:

   name    client fruit
0  nan     nan    orange
1  halley  abana  pear
2  josh    a      apple
3  kim     b      apple

I want to rename every client whose name is shorter than 5 characters to 'client_x', where x is the original name. This is what I tried:

df['client'] = df['client'].apply(lambda x: x if len(x)>5 else "client_" + x)

but depending on how the missing values are stored, I get one of these two errors:

TypeError: object of type 'float' has no len()
TypeError: object of type 'NoneType' has no len()

I don't understand why NaN is treated as a float, but I would really like a clean way to handle the missing rows.

Any help would be greatly appreciated!!

CodePudding user response:

Use Series.str.len, which returns NaN for missing values instead of raising an error, together with numpy.where:

df['client'] = np.where(df['client'].str.len()>=5, df['client'], "client_" + df['client'])
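
For reference, here is a self-contained sketch of the same idea, assuming the dataframe from the question. The variable name long_enough is only illustrative and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

# str.len() yields NaN for missing values, and NaN >= 5 evaluates to False
long_enough = df['client'].str.len() >= 5

# Missing rows fall into the second branch, but "client_" + <missing>
# is still missing, so those rows stay empty
df['client'] = np.where(long_enough, df['client'], "client_" + df['client'])

print(df)
```

Rows 1 to 3 should come out as abana, client_a and client_b, while row 0 stays missing.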

CodePudding user response:

You can use str.len to get the string length and feed it to mask, which replaces the short names with their prefixed variant. Missing values are left alone because str.len returns NaN for them, and NaN < 5 is False:

df['long_name'] = df['client'].mask(df['client'].str.len().lt(5),
                                    'client_' + df['client'])

output:

     name client   fruit long_name
0    None   None  orange      None
1  halley  abana    pear     abana
2    josh      a   apple  client_a
3     kim      b   apple  client_b
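
As for the errors in the question: NaN is itself a floating-point value, so len() fails on it just as it fails on None. If you would rather keep apply, a minimal sketch is to skip anything that is not a string. The helper name prefix_short and the small clients series below are made up for illustration.

```python
import numpy as np
import pandas as pd

# NaN is a float, which explains "object of type 'float' has no len()"
print(type(np.nan))  # <class 'float'>

# Illustrative data containing both kinds of missing value
clients = pd.Series([None, np.nan, 'abana', 'a'])

def prefix_short(x):
    # Hypothetical helper: leave non-strings (None, NaN) untouched
    if not isinstance(x, str):
        return x
    # Keep names of 5+ characters, prefix the shorter ones
    return x if len(x) >= 5 else "client_" + x

result = clients.apply(prefix_short)
print(result)
```

This runs Python code once per row, so on large frames it will usually be slower than the vectorized str.len approaches above.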