I want to change the DataFrame cell values as shown below:
| index | col1 (before) | col1 (after) |
|---|---|---|
| 0 | 10.0 | 10.0 |
| 1 | 20 (15) | 20.0 |
| 2 | ND | None |
| 3 | 30.0 | 30.0 |
| 4 | 40.0 | 40.0 |
import pandas as pd

df = pd.DataFrame([10.0, '20 (15)', 'ND', 30.0, 40.0], columns=['col1'])
for data in df['col1']:
    if type(data) is str:
        temp = data.split(' ')[0]
        if data == 'ND':
            data = None
        else:
            data = float(temp)
This code doesn't update the DataFrame values.
Help, please.
CodePudding user response:
Use the pandas alternative Series.str.split first to take the first whitespace-separated token. The values that are already numbers become missing values under the .str accessor, so restore them with Series.fillna, then convert with to_numeric and errors='coerce' so that non-numeric strings such as 'ND' become NaN:
df['col1'] = pd.to_numeric(df['col1'].str.split().str[0]
.fillna(df['col1']), errors='coerce')
print (df)
col1
0 10.0
1 20.0
2 NaN
3 30.0
4 40.0
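To see why the fillna step is needed, here is a small sketch that splits the one-liner above into its intermediate steps. The names s, first_token, restored and result are only for illustration and are not part of the original answer.

```python
import pandas as pd

s = pd.Series([10.0, '20 (15)', 'ND', 30.0, 40.0], name='col1')

# .str methods only handle strings; the float entries come back as NaN.
first_token = s.str.split().str[0]   # NaN, '20', 'ND', NaN, NaN

# Put the original numeric values back where .str produced NaN.
restored = first_token.fillna(s)     # 10.0, '20', 'ND', 30.0, 40.0

# Convert everything to floats; 'ND' cannot be parsed and becomes NaN.
result = pd.to_numeric(restored, errors='coerce')
print(result)
```

Note that the result holds NaN rather than None, since a float column cannot store None.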
If you need to extract the first integer or float from each string, use Series.str.extract:
df=pd.DataFrame(['*10.0', '20 (15)', 'ND', 30.0, 40.0], columns=['col1'])
df['col1'] = pd.to_numeric(df['col1'].str.extract(r'(\d+\.\d+|\d+)', expand=False)
                                     .fillna(df['col1']), errors='coerce')
print (df)
col1
0 10.0
1 20.0
2 NaN
3 30.0
4 40.0
CodePudding user response:
You shouldn't modify your data in a loop. In your case, assigning to the loop variable data only rebinds that local name; it is no longer linked to the DataFrame's data, so nothing is written back. In addition, while there are ways to make a loop work, looping over rows is inefficient.
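A minimal sketch of the difference, assuming the same df as in the question. The first loop mirrors the original attempt; the second writes back through the index with DataFrame.at, which works but is slow on large frames. The names unchanged, token and updated are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([10.0, '20 (15)', 'ND', 30.0, 40.0], columns=['col1'])

# Rebinding the loop variable changes only the local name `data`.
for data in df['col1']:
    if isinstance(data, str):
        data = None

unchanged = df['col1'].tolist()  # still contains '20 (15)' and 'ND'

# Writing back through the index label does modify the DataFrame.
for idx, data in df['col1'].items():
    if isinstance(data, str):
        token = data.split(' ')[0]
        df.at[idx, 'col1'] = None if token == 'ND' else float(token)

updated = df['col1'].tolist()
print(unchanged)
print(updated)
```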
You can use vectorized code instead:
df['col1'] = pd.to_numeric(df['col1'].astype(str).str.extract(r'([.\d]+)',
                           expand=False), errors='coerce')
Or, if you want to match only valid numbers that stand as independent words:
df['col1'] = pd.to_numeric(df['col1'].astype(str).str.extract(r'\b(\d+(?:\.\d+)?\b)',
                           expand=False), errors='coerce')
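As a rough illustration of how the two patterns differ, here is a sketch on a few made-up strings (the sample values are hypothetical and not from the question). The looser pattern grabs any run of digits and dots, while the stricter one requires the number to start at a word boundary.

```python
import pandas as pd

samples = pd.Series(['x10', '1.2.3', '20 (15)'])

# Any run of digits and dots: 'x10' -> '10', '1.2.3' -> '1.2.3' (not a valid float).
loose = pd.to_numeric(samples.str.extract(r'([.\d]+)', expand=False),
                      errors='coerce')

# Number as an independent word: 'x10' has no match, '1.2.3' -> '1.2'.
strict = pd.to_numeric(samples.str.extract(r'\b(\d+(?:\.\d+)?\b)', expand=False),
                       errors='coerce')

print(pd.DataFrame({'sample': samples, 'loose': loose, 'strict': strict}))
```

Both patterns give 20.0 for '20 (15)', so for the data in the question either one should do.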
