Python - lambda on pandas dataframe with nan rows

I want to apply an alteration to a column of my dataframe where the cells are not empty. This is the dataframe that I am using:

import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

output:

   name    client fruit
0  nan     nan    orange
1  halley  abana  pear
2  josh    a      apple
3  kim     b      apple

I want to rename clients whose name is shorter than 5 characters to 'client_' followed by the original name (for example 'client_a'), and this is what I tried:

df['client'] = df['client'].apply(lambda x: x if len(x) > 5 else "client_" + x)

but I get one of the following two errors, depending on the data:

TypeError: object of type 'float' has no len()
TypeError: object of type 'NoneType' has no len()

I don't understand why nan is treated as a float, but I would really like a clean way to handle this.

Any help would be greatly appreciated!!

CodePudding user response：

Use Series.str.len, which returns NaN for missing values instead of raising, together with numpy.where:

import numpy as np
df['client'] = np.where(df['client'].str.len() >= 5, df['client'], "client_" + df['client'])
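As a self-contained sketch of the approach above (the names `lengths`, `renamed` and `result` are only illustrative, and the data is the frame from the question): NaN is a float in Python, which is where the 'float' has no len() error comes from, while str.len() simply yields NaN for missing entries, so the comparison is False for them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

# NaN is a float, so calling len() on it inside a lambda raises TypeError.
assert isinstance(np.nan, float)

# str.len() does not raise on missing values; it returns NaN for them.
lengths = df['client'].str.len()

# NaN >= 5 is False, so missing rows take the second branch, where
# "client_" + <missing> is itself missing.
renamed = np.where(lengths >= 5, df['client'], "client_" + df['client'])

# Wrap the NumPy array back into a Series (illustrative name) for inspection.
result = pd.Series(renamed, index=df.index)
print(result)
```

The missing row stays missing, 'abana' is kept, and the two short names get the prefix.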

CodePudding user response：

You can use str.len to get the string length and pass the resulting condition to mask to replace the short names with their prefixed variant. Missing values are left untouched, because str.len returns NaN for them and the comparison is then False:

df['long_name'] = df['client'].mask(df['client'].str.len().lt(5),
                                    'client_' + df['client'])

output:

     name client   fruit long_name
0    None   None  orange      None
1  halley  abana    pear     abana
2    josh      a   apple  client_a
3     kim      b   apple  client_b
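For comparison, if you would rather keep the apply approach from the question, one possible sketch is to guard against non-string values before calling len(). The helper name `rename_short` and its `min_len` parameter are made up for this illustration and are not part of pandas:

```python
import pandas as pd

df = pd.DataFrame([{'name': None, 'client': None, 'fruit': 'orange'},
                   {'name': 'halley', 'client': 'abana', 'fruit': 'pear'},
                   {'name': 'josh', 'client': 'a', 'fruit': 'apple'},
                   {'name': 'kim', 'client': 'b', 'fruit': 'apple'}])

def rename_short(value, min_len=5):
    """Hypothetical helper: prefix short names, pass None/NaN through unchanged."""
    if not isinstance(value, str):
        # None and float NaN end up here, so len() is never called on them.
        return value
    return value if len(value) >= min_len else "client_" + value

df['client'] = df['client'].apply(rename_short)
print(df)
```

This runs Python code per row, so on large frames the vectorized str.len variants are likely to be faster, but it keeps the logic in one readable place.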