How to convert numerical portion of a string using regular expressions if the quantity is expressed

Time:01-05

I have a string containing a range of values in the form "1-10uM Ach", where 1 and 10 are not necessarily integers; they can be any floating-point numbers. "uM" stands for micromolar, and it is followed by an irrelevant substring, "Ach". Since micro means $10^{-6}$, I would like to use a regular expression to convert this string to another string of the form "$10^{-6}-10^{-5}$". However, I am not sure what pattern to use to isolate the numerical parts of the string so that it accepts any float before and after the "-" symbol.

Note:

  1. The line always starts with a range.
  2. There is always a unit after the range, which is one of mM, uM, nM or pM; I know how to convert each of these to M.
  3. There may be whitespace between the last number and the unit.

Answer:

You can capture the two (float) numbers in two capture groups, and use a character class to match one of mM, uM, nM or pM.

^(\d*\.?\d+)-(\d*\.?\d+)[munp]M \S+$
  • ^ Start of string
  • (\d*\.?\d+) Capture group 1, match a float-like number: optional digits, an optional dot and 1+ digits
  • - Match a hyphen
  • (\d*\.?\d+) Capture group 2, match another float-like number
  • [munp]M Match one of mM, uM, nM, pM
  • \S+ Match a space followed by 1 or more non-whitespace chars
  • $ End of string


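import re

# re.match only matches at the start of the string, so no leading ^ is needed
pattern = r"(\d*\.?\d+)-(\d*\.?\d+)[munp]M \S+$"
s = "1-10uM Ach"
m = re.match(pattern, s)
if m:
    # group(1) and group(2) are the lower and upper bounds as strings
    print(m.group(1), m.group(2))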

Output

1 10
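The answer stops at extracting the two numbers. To finish the conversion the question asks for, one option is to also capture the unit prefix and multiply both bounds by a factor looked up from a table. This is a hypothetical sketch, not part of the answer: the names `SCALE` and `to_molar` are made up, `\s*` is added before the unit to allow the optional space from note 3, and a trailing `\b` replaces the ` \S+$` tail so the rest of the string is not checked.

```python
import re

# Hypothetical lookup table: unit prefix -> factor to convert to molar (M).
SCALE = {"m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12}

# Two float-like bounds, optional whitespace, then the prefix (group 3) and M.
PATTERN = re.compile(r"^(\d*\.?\d+)-(\d*\.?\d+)\s*([munp])M\b")


def to_molar(s):
    """Return (low, high) converted to molar, or None if s does not match."""
    m = PATTERN.match(s)
    if m is None:
        return None
    factor = SCALE[m.group(3)]
    return float(m.group(1)) * factor, float(m.group(2)) * factor


low, high = to_molar("1-10uM Ach")
# The :g format gives compact scientific notation for small values.
print(f"{low:g}-{high:g}")
```

For "1-10uM Ach" this returns approximately (1e-06, 1e-05). Floating-point multiplication can introduce tiny rounding errors for some inputs, so round the results or use `decimal.Decimal` if exact digits matter.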

Or for a partial match with word boundaries:

\b(\d*\.?\d+)-(\d*\.?\d+)[munp]M\b

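Because the word-boundary version has no ^ or $ anchors, it can be used with `re.finditer` to pick every range out of a longer piece of text. A small sketch, using a made-up sentence as input:

```python
import re

# Word-boundary pattern: no ^/$ anchors, so it can match anywhere in the text.
PATTERN = re.compile(r"\b(\d*\.?\d+)-(\d*\.?\d+)[munp]M\b")

# Hypothetical sentence containing two ranges.
text = "Cells were treated with 1-10uM Ach, then with 0.5-2nM Ach."

# finditer yields one match object per non-overlapping range in the text.
ranges = [m.groups() for m in PATTERN.finditer(text)]
print(ranges)  # [('1', '10'), ('0.5', '2')]
```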

If you only need the two numbers of the range, you can use just the first part of the pattern:

^(\d*\.?\d+)-(\d*\.?\d+)
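Since this shorter pattern says nothing about what follows the range, `re.match` accepts any string that starts with a range, with or without a unit. A short sketch that converts the captured bounds to floats (the sample strings are illustrative):

```python
import re

# Only the numeric range at the start of the string; the rest is ignored.
RANGE = re.compile(r"^(\d*\.?\d+)-(\d*\.?\d+)")

parsed = []
for s in ["1-10uM Ach", "2.345-4.321 uM", "3-7"]:
    m = RANGE.match(s)
    # Convert the captured strings to floats for further arithmetic.
    parsed.append((float(m.group(1)), float(m.group(2))))

print(parsed)  # [(1.0, 10.0), (2.345, 4.321), (3.0, 7.0)]
```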

Answer:

The best approach is to match explicitly on two ints or floats at the start of the line, separated by a -:

import re

data = '2.345-4.321 uM'
# Each bound: digits with an optional fractional part, or a bare fraction like .456
vals = re.match(r'^([0-9]+\.?[0-9]*|\.[0-9]+)-([0-9]+\.?[0-9]*|\.[0-9]+)', data)
print(vals[1], vals[2])
# 2.345 4.321

This regex also matches numbers with a leading . but no 0 (e.g. .456). If you don't need this, you can remove |\.[0-9]+ from each part.
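To see what the |\.[0-9]+ alternative buys, the sketch below compares the full pattern with a trimmed version on an input whose first bound has no leading 0 (the input string is made up for illustration):

```python
import re

# Pattern from this answer: each bound is digits with an optional fraction,
# or a bare fraction such as .456.
FULL = re.compile(r"^([0-9]+\.?[0-9]*|\.[0-9]+)-([0-9]+\.?[0-9]*|\.[0-9]+)")
# Same pattern with the |\.[0-9]+ alternative removed from each group.
TRIMMED = re.compile(r"^([0-9]+\.?[0-9]*)-([0-9]+\.?[0-9]*)")

s = ".456-1.5 uM"
m_full = FULL.match(s)
m_trim = TRIMMED.match(s)
print(m_full.groups() if m_full else None)  # ('.456', '1.5')
print(m_trim)  # None: each bound must start with a digit
```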
