I have a dataframe where every column has numeric values like 5,12; 3,14; 12,01... in object dtype.
I want to iterate through the table to convert the dtype to float.
Therefore, I made a list of all column names to replace the ',' with '.' of every value and then convert it into the right type.
My code looks like this:
for x in columnList:
x.replace(',' , '.')
x.astype(float)
Data:
Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
2020-01-01 5,12 3,14 12,1
2020-01-02 6,84 16,4 12,1
.
.
.
Unfortunately, I always get an AttributeError. I hope someone can give me a hint on how to fix it.
CodePudding user response:
You need to iterate over each of the columns, converting each column to strings (with Series.str) to allow replacement and then converting those values to floats. To convert empty cells to NaN we first replace them with the string 'NaN':
df = pd.DataFrame({
'Timestamp': ['2020-01-01', '2020-01-02'],
'Ins_W/m2': ['5,12', '6,84'],
'GenPowerW1': ['3,14', ''],
'GenPowerW2': ['12,1', '16,4'],
'GenPowerW3': ['', '12,1']
})
df
# Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
# 0 2020-01-01 5,12 3,14 12,1
# 1 2020-01-02 6,84 16,4 12,1
columnList = ['Ins_W/m2', 'GenPowerW1', 'GenPowerW2', 'GenPowerW3']
for col in columnList :
df[col] = df[col].str.replace(',', '.').replace('', 'NaN').astype(float)
df
# Timestamp Ins_W/m2 GenPowerW1 GenPowerW2 GenPowerW3
# 0 2020-01-01 5.12 3.14 12.1 NaN
# 1 2020-01-02 6.84 NaN 16.4 12.1
df['GenPowerW1']
# 0 3.14
# 1 NaN
# Name: GenPowerW1, dtype: float64
CodePudding user response:
Perhaps you could try something like this:
df = pd.DataFrame([["1,12", "3,14", ""], ["12,1", "234,1", "21,1"]], columns=["INS", "GEN_POWER1", "GEN_POWER2"])
df[["INS", "GEN_POWER1", "GEN_POWER2"]] = df[["INS", "GEN_POWER1", "GEN_POWER2"]].apply(lambda x: x.str.replace(",", "."))
df[["INS", "GEN_POWER1", "GEN_POWER2"]] = df[["INS", "GEN_POWER1", "GEN_POWER2"]].apply(pd.to_numeric, errors='coerce')
This applies the lambda function to each column that you need, and then converts each of those columns to a float using pd.to_numeric. The output looks like this:
INS GEN_POWER1 GEN_POWER2
0 1.12 3.14 NaN
1 12.10 234.10 21.1
