I have a pandas DataFrame where one column holds values in different formats, like this:
| Name | Values |
|---|---|
| First | 5-9 |
| Second | 7 |
| Third | - |
| Fourth | 12-16 |
I need to go through the whole `Values` column and, wherever an entry is a range like `5-9` (first row) or `12-16` (fourth row), replace it with the mean of the two numbers in the string. For example, `5-9` should become 7 and `12-16` should become 14.
If the entry is just `-`, as in the third row, it should be replaced with 0.
I have tried:
```python
# Raises ValueError: `if` on a boolean Series is ambiguous
if df["Values"].str.len() > 1:
    df["Values"] = df["Values"].str.split('-')
    df["Values"] = (df["Values"][0] + df["Values"][1]) / 2
elif df["Values"].str.len() == 1:
    df["Values"] = df["Values"].str.replace('-', 0)
```
Expected output
| Name | Values |
|---|---|
| First | 7 |
| Second | 7 |
| Third | 0 |
| Fourth | 14 |
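To make the answers below easy to try out, the example frame can be rebuilt as follows. This is a minimal sketch that assumes every entry in `Values` is stored as a string (object dtype), which is what the `-` and `5-9` entries imply.

```python
import pandas as pd

# Rebuild the example frame from the table above; every 'Values' entry is a string
df = pd.DataFrame({
    'Name': ['First', 'Second', 'Third', 'Fourth'],
    'Values': ['5-9', '7', '-', '12-16'],
})
print(df)
```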
Answer:
Let us split and expand the column, then cast the values to float and calculate the mean along the column axis (that is, row-wise, `axis=1`):
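```python
s = df['Values'].str.split('-', expand=True)
# Empty strings (from '-') become NaN; mean() skips NaN, and all-NaN rows are filled with 0
df['Values'] = s[s != ''].astype(float).mean(axis=1).fillna(0)
```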
```
     Name  Values
0   First     7.0
1  Second     7.0
2   Third     0.0
3  Fourth    14.0
```
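To see why the `s != ''` mask is needed, the same pipeline can be run one step at a time. This is a sketch that rebuilds the example frame inline; the intermediate names `masked` and `means` are only for illustration. Splitting `'-'` yields two empty strings, which `astype(float)` could not parse, so they are masked to NaN first; the missing second part of `'7'` also ends up as NaN, and the row-wise mean skips it.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['First', 'Second', 'Third', 'Fourth'],
                   'Values': ['5-9', '7', '-', '12-16']})

# One column per side of the '-'; '7' has no second part, so that cell is missing
s = df['Values'].str.split('-', expand=True)
# '-' splits into two empty strings; mask them to NaN so the float cast succeeds
masked = s[s != ''].astype(float)
# Row-wise mean skips NaN; the all-NaN row (the '-' entry) gives NaN, filled with 0
means = masked.mean(axis=1).fillna(0)
print(masked)
print(means)
```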
Answer:
You can use `str.replace` with a custom replacement function:
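```python
# Convert a captured group to int; an empty group ('') counts as 0
mint = lambda s: int(s or 0)
# Replace "a-b" with the string form of (a + b) / 2
repl = lambda m: str(sum(map(mint, map(m.group, [1, 2]))) / 2)
df['Values'] = df['Values'].str.replace(r'(\d*)-(\d*)', repl, regex=True)
print(df)
```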
```
     Name  Values
0   First     7.0
1  Second       7
2   Third     0.0
3  Fourth    14.0
```
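Note that `str.replace` always returns strings, which is why `Second` prints as `7` while the replaced rows print as `7.0`. If a numeric column is needed afterwards, one option is to convert the result with `pd.to_numeric`. The sketch below rebuilds the example frame inline and assumes the replacement step from this answer.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['First', 'Second', 'Third', 'Fourth'],
                   'Values': ['5-9', '7', '-', '12-16']})

# Replacement step from the answer: 'a-b' -> str((a + b) / 2), empty groups count as 0
mint = lambda s: int(s or 0)
repl = lambda m: str(sum(map(mint, map(m.group, [1, 2]))) / 2)
df['Values'] = df['Values'].str.replace(r'(\d*)-(\d*)', repl, regex=True)

# The column now holds strings ('7.0', '7', '0.0', '14.0'); convert to float64
df['Values'] = pd.to_numeric(df['Values'])
print(df.dtypes)
```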
