Regex returns negative integers as positive

Time:01-12

I am scraping a web page and extracting some values, of which I need only the numeric part. For example, if the string says "-14.32 kcal/mole", I want to get the float -14.32.

To do this I am applying the following code:

import re

number_string = '-9.2 kcal/mole'


number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()

print(number)

Output: -9.2

Whenever the number in number_string is a float, it works fine. But when the number is a negative integer, I get the positive value of that number.

For example,

import re

number_string = '-4 kcal/mole'


number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()

print(number)

Output: 4 (instead of -4)

CodePudding user response:

| is the lowest-precedence operator, so your pattern matches either an optionally signed float

[-+]?\d*\.\d+

or an unsigned integer

\d+

You need to parenthesize the expression for matching the absolute value to make the sign apply to either:

[-+]?(?:\d*\.\d+|\d+)

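or make the fractional part optional (note the escaped dot, and that every part of this pattern is optional, so it can also match an empty string):

[-+]?\d*(?:\.\d+)?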

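In both cases, I've used non-capturing groups so that the pattern introduces no new capture groups. The following call to group() returns the whole match either way, but extra capture groups would change what groups() or re.findall() return.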
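
As a quick check of the explanation above, here is a minimal sketch that runs the original pattern and both corrected patterns against the two strings from the question. The variable names are only illustrative.

```python
import re

# Original pattern: the sign is attached to the float branch only.
original = r"[-+]?\d*\.\d+|\d+"
# Fix 1: group the alternation so the sign applies to both branches.
grouped = r"[-+]?(?:\d*\.\d+|\d+)"
# Fix 2: drop the alternation and make the fractional part optional.
optional_fraction = r"[-+]?\d*(?:\.\d+)?"

samples = ["-9.2 kcal/mole", "-4 kcal/mole"]

for pattern in (original, grouped, optional_fraction):
    results = [re.search(pattern, s).group() for s in samples]
    print(pattern, results)
```

With the original pattern the second string yields 4, because the integer branch \d+ has no sign in front of it; both corrected patterns yield -4.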

CodePudding user response:

I would use something like this:

[+-]?(?:\d*\.)?\d+
  • [+-]? - optional positive or negative sign
  • (?:\d*\.)? - optional leading digits followed by decimal
  • \d+ - required digits

https://regex101.com/r/WKPQ4h/1
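
Since the goal in the question is a float rather than a string, the pattern can be wrapped in a small helper. This is only a sketch: extract_float is a hypothetical name, and returning None when nothing matches is an assumption about how you want to handle missing values.

```python
import re

# Optional sign, optional "digits + dot" prefix, then required digits.
NUMBER = re.compile(r"[+-]?(?:\d*\.)?\d+")

def extract_float(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None

print(extract_float("-14.32 kcal/mole"))  # -14.32
print(extract_float("-4 kcal/mole"))      # -4.0
print(extract_float("no value here"))     # None
```

Checking the match before calling group() avoids the AttributeError that re.search(...).group() raises when the scraped string contains no number.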


Since you are scraping web content, this regex will simply find all numbers.

You will probably wish to target specific units of measurement:

[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))

https://regex101.com/r/FM5ZXJ/1
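
To illustrate the unit-targeted version, here is a short sketch using re.findall on a made-up line of text. The sample string is an assumption for demonstration, not real scraped data.

```python
import re

# The lookahead requires a space and one of the listed units after the number,
# but does not include the unit in the match itself.
UNIT_NUMBER = re.compile(r"[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))")

text = "Run 12: binding energy -9.2 kcal/mole, offset -4 kcal/mole, 3 butterflies"
values = UNIT_NUMBER.findall(text)
print(values)  # ['-9.2', '-4', '3']
```

The 12 in "Run 12:" is skipped because it is not followed by one of the units. findall returns whole matches here because the pattern contains only non-capturing groups.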

CodePudding user response:

Your regular expression is set up to search for [-+]?\d*\.\d+ or \d+, which is why this happens: the sign belongs only to the first alternative. You can change your regular expression to something like [-+]?\d*\.\d+|[-+]?\d+ and that should give your expected result.
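
A small sketch comparing this repeated-sign pattern with the grouped form on a few sample strings. The samples beyond the two from the question are assumptions added for illustration.

```python
import re

# Sign repeated in each alternative.
repeated_sign = re.compile(r"[-+]?\d*\.\d+|[-+]?\d+")
# Sign factored out in front of a non-capturing group.
grouped_sign = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")

samples = ["-4 kcal/mole", "-9.2 kcal/mole", "+7 kcal/mole", ".5 kcal/mole"]
for s in samples:
    a = repeated_sign.search(s).group()
    b = grouped_sign.search(s).group()
    print(s, "->", a, b)
```

On these samples both patterns return the same match, so the choice between them is mostly a matter of readability.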
