Change values in a dataframe column with mixed types based on a condition

One column of my dataset contains both strings and floats. I am trying to replace each string in that column with only its first 5 characters.

import numpy as np
import pandas as pd

def isfloat(num):
    try:
        float(num)
        return True
    except ValueError:
        return False

df = pd.DataFrame([[1, "Alligator"], [1, 3], [4, "Markets"]], columns=['A', 'B'])

The following two methods don't seem to change the actual dataframe.

df['B'].apply(lambda x: float(x) if isfloat(x) else x[0:5])

for index, row in df.iterrows():
    if not isfloat(row.B):
        row.B = row.B[0:5]

This next method fails with the error "cannot convert the series to <class 'float'>", I think because the isfloat function cannot be called on a whole column in this way.

df['B'] = np.where(not isfloat(df['B']), df['B'][0:5], df['B'])

I tried using .loc as well but it did not seem suitable because of the condition I need to base the change on. How would one go about this, or what am I missing?

CodePudding user response：

I believe you need:

df['B']=df['B'].apply(lambda x: float(x) if isfloat(x) else x[0:5])

This is needed because apply returns a new Series rather than editing the column in place, so the result has to be assigned back to df['B'].

Output:

   A      B
0  1  Allig
1  1    3.0
2  4  Marke
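Since the question mentions .loc, here is a minimal sketch of how it could be used with a boolean mask. It assumes that "string" means a Python str object, so it checks with isinstance instead of isfloat; the name is_text is only illustrative. Unlike the apply version above, it leaves numeric cells exactly as stored and does not convert them to float.

```python
import pandas as pd

df = pd.DataFrame([[1, "Alligator"], [1, 3], [4, "Markets"]], columns=["A", "B"])

# Boolean mask: True only where the cell holds a Python string.
is_text = df["B"].map(lambda v: isinstance(v, str))

# Truncate just the masked rows; numeric cells are left untouched.
df.loc[is_text, "B"] = df.loc[is_text, "B"].str[:5]

print(df)
```

If numeric-looking strings such as "3.5" should be treated as numbers, the mask could instead be built from the question's isfloat helper, for example ~df["B"].map(isfloat).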

CodePudding user response：

First of all, apply does not edit the dataframe in place. You simply need to assign the edited values of the df.B column back to df.B.

df.B=df.B.apply(lambda x: float(x) if isfloat(x) else x[0:5])
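The same reasoning likely explains why the iterrows loop in the question had no effect: each row handed out by iterrows is a separate Series, so assigning to row.B changes that temporary object and not the dataframe. As a rough sketch (not the fastest option for large frames), the loop can write back through df.at instead. The isinstance check here is an assumption standing in for the question's isfloat test.

```python
import pandas as pd

df = pd.DataFrame([[1, "Alligator"], [1, 3], [4, "Markets"]], columns=["A", "B"])

for index, row in df.iterrows():
    # row is a copy of the data, so read from it but write to df itself.
    if isinstance(row["B"], str):
        df.at[index, "B"] = row["B"][:5]

print(df)
```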

Alternatively, you can build the new column with an explicit loop:

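import pandas as pd

df = pd.DataFrame([[1, "Alligator"], [1, 3], [4, "Markets"]], columns=['A', 'B'])

newlist = []
for v in df.B:
    if isinstance(v, str):
        # Keep only the first 5 characters of string values.
        newlist.append(v[:5])
    else:
        # Leave non-string values (ints, floats) unchanged.
        newlist.append(v)
df['B'] = newlist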
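As for the np.where attempt in the question: isfloat(df['B']) passes the whole Series to float(), which raises the reported error, and df['B'][0:5] slices the first five rows, not the first five characters. Below is one possible element-wise version, offered as a sketch. It reuses the isfloat helper from the question; the name numeric is only illustrative.

```python
import numpy as np
import pandas as pd

def isfloat(num):
    try:
        float(num)
        return True
    except ValueError:
        return False

df = pd.DataFrame([[1, "Alligator"], [1, 3], [4, "Markets"]], columns=["A", "B"])

# Call isfloat once per cell to get an element-wise boolean mask.
numeric = df["B"].map(isfloat)

# .str[:5] slices the characters of each string and gives NaN for
# non-strings; np.where only picks those values where the mask is False.
df["B"] = np.where(numeric, df["B"], df["B"].str[:5])

print(df)
```

Like the loop above, this keeps numeric cells as they are and does not convert them to float.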