I have a column with numeric values in a string format with thousand separator commas and the minus marks stated at the end of the string for minus values.
df1 = pd.DataFrame({'Name':['John', 'Tom', 'Simon', 'Jose','Jose'],
'Amount':['1,000.00','1,000.00-', '100.00', '100.00-','1,000,000.00-'],
'ExpectedAmount':[1000.00,-1000.00, 100.00, -100.00,-1000000.00],
})
I want to convert the values in the "Amount" column to a float while preserving whether the value is a plus or minus.
I can remove the commas using "replace"
df1['Amount1'] = df1['Amount'].str.replace(',','')
But I'm unable to change the position of the minus mark from the end of the string to the beginning of the string so that I can change the format from string to float. Can someone help me to do this?
CodePudding user response:
You can swap minus sign with
df1.Amount.apply(lambda x: '-' x[:-1] if x[-1] == '-' else x)
CodePudding user response:
Here is a vectorial solution:
df1['ExpectedAmount'] = (df1['Amount'].str.replace('[^\d.]', '', regex=True).astype(float)
*df1['Amount'].str.endswith('-').mul(-2).add(1))
Remove all non digit/dot and convert to float
Multiply by -1 if the initial string ends with '-'
output:
Name Amount ExpectedAmount
0 John 1,000.00 1000.0
1 Tom 1,000.00- -1000.0
2 Simon 100.00 100.0
3 Jose 100.00- -100.0
4 Jose 1,000,000.00- -1000000.0
CodePudding user response:
Use Series.replace for swap - from back to start of string, remove , and casting to numeric:
df1['Amount'] = (df1['Amount'].replace({r'(.*)(-)$': r'\2\1', ',':''}, regex=True)
.astype(float))
print (df1)
Name Amount ExpectedAmount
0 John 1000.0 1000.0
1 Tom -1000.0 -1000.0
2 Simon 100.0 100.0
3 Jose -100.0 -100.0
4 Jose -1000000.0 -1000000.0
