I am scraping a website and extracting some values, from which I need only the numeric part. For example, if the string says "-14.32 kcal/mole", I want to get the float -14.32.
To do this I am applying the following code:
import re
number_string = '-9.2 kcal/mole'
number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()
print(number)
Output: -9.2
Whenever number_string contains a float, it works fine. But when the number is a negative integer, I get the positive value of that number.
For example,
import re
number_string = '-4 kcal/mole'
number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()
print(number)
Output: 4 (instead of -4)
CodePudding user response:
| is the lowest priority operator. As written, your pattern matches either an optionally signed float
[-+]?\d*\.\d+
or an unsigned integer
\d+
You need to parenthesize the part that matches the absolute value so that the sign applies to both alternatives:
[-+]?(?:\d*\.\d+|\d+)
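or make the fractional part optional:
[-+]?\d*(?:\.\d+)?
In both cases, I've used non-capturing groups so that the subsequent call to group() behaves the same as before. Be aware that every element of the second pattern is optional, so re.search will return an empty match at the start of the string if the text does not begin with a number.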
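To make the grouped version concrete, here is a minimal sketch that wraps it in a helper and converts the match to a float. The names NUMBER_RE and extract_number are made up for illustration, and returning None when nothing matches is an assumption about what you want in that case.

```python
import re

# The sign is outside the group, so it applies to both alternatives.
NUMBER_RE = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")


def extract_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER_RE.search(text)
    return float(match.group()) if match else None


print(extract_number("-9.2 kcal/mole"))  # -9.2
print(extract_number("-4 kcal/mole"))    # -4.0
print(extract_number("no value here"))   # None
```

Checking for None before calling group() also avoids the AttributeError you would otherwise get on strings that contain no number at all.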
CodePudding user response:
I would use something like this:
[+-]?(?:\d*\.)?\d+
[+-]? - optional positive or negative sign
(?:\d*\.)? - optional leading digits followed by a decimal point
\d+ - required digits
https://regex101.com/r/WKPQ4h/1
Since you are scraping web content, this regex will simply match every number it finds.
You will probably wish to target specific units of measurement:
[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))
https://regex101.com/r/FM5ZXJ/1
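As a rough sketch of how the unit-targeted pattern could be used on scraped text: the UNITS tuple and the sample page_text below are invented for illustration, so swap in whatever units and content your pages actually contain.

```python
import re

# Hypothetical list of units to accept; escape them in case they contain regex metacharacters.
UNITS = ("kcal/mole", "kJ/mol")
unit_alternatives = "|".join(re.escape(u) for u in UNITS)

# The lookahead requires a space and a known unit after the number without consuming them.
ENERGY_RE = re.compile(rf"[+-]?(?:\d*\.)?\d+(?= (?:{unit_alternatives}))")

# Made-up sample text standing in for scraped content.
page_text = "Binding energy: -14.32 kcal/mole, 3 poses, offset +7 kJ/mol, pH 7.4"

values = [float(m) for m in ENERGY_RE.findall(page_text)]
print(values)  # [-14.32, 7.0]
```

The "3" and "7.4" are skipped because neither is followed by one of the listed units.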
CodePudding user response:
Your regular expression searches for either [-+]?\d*\.\d+ or \d+, so the optional sign only belongs to the float alternative; that is why the minus sign is dropped for integers. You can change your regular expression to something like [-+]?\d*\.\d+|[-+]?\d+ and that should give you the expected result.
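If it helps to see where each pattern starts matching, here is a small comparison; the variable names are just for illustration.

```python
import re

original = r"[-+]?\d*\.\d+|\d+"    # sign belongs to the float alternative only
fixed = r"[-+]?\d*\.\d+|[-+]?\d+"  # sign repeated on the integer alternative

text = "-4 kcal/mole"
m_orig = re.search(original, text)
m_fixed = re.search(fixed, text)

# With the original pattern nothing matches at index 0: the float alternative
# needs a decimal point, and the integer alternative cannot start with '-'.
# The search therefore moves on and matches the bare '4' at index 1.
print(m_orig.group(), m_orig.start())    # 4 1
print(m_fixed.group(), m_fixed.start())  # -4 0
```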
