I have a string containing a range of values in the form "1-10uM Ach", where 1 and 10 are not necessarily integers; they can be any floating-point number. "uM" stands for micromolar, and it is followed by an irrelevant substring, here "Ach". Since micro means $10^{-6}$, I would like to use a regular expression to convert this string to another string of the form "$10^{-6}-10^{-5}$". However, I am not sure what pattern I should use to isolate the numerical parts of the string so that it matches any possible float before and after the "-" symbol.
Note:
- The line always starts with a range.
- There is always a unit after the range, which is one of mM, uM, nM, or pM. I already know how to convert these to M.
- There could be whitespace between the last number and the unit.
Answer:
You can capture the two float-like numbers in two capture groups, and match one of mM, uM, nM, or pM with a character class:
^(\d*\.?\d+)-(\d*\.?\d+)\s*[munp]M \S+$
- `^` Start of string
- `(\d*\.?\d+)` Capture group 1: a float-like number made of optional digits, an optional dot, and one or more digits
- `-` Match a hyphen
- `(\d*\.?\d+)` Capture group 2: another float-like number
- `\s*` Optional whitespace between the last number and the unit
- `[munp]M` Match one of mM, uM, nM, pM
- ` \S+` Match a space followed by one or more non-whitespace characters (the trailing substring such as Ach)
- `$` End of string
import re
pattern = r"^(\d*\.?\d+)-(\d*\.?\d+)\s*[munp]M \S+$"
s = "1-10uM Ach"
m = re.match(pattern, s)
if m:
    print(m.group(1), m.group(2))
Output
1 10
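To go from the captured numbers to the molar values the question asks for, here is a minimal sketch. It assumes the factors mM = 1e-3 M, uM = 1e-6 M, nM = 1e-9 M and pM = 1e-12 M, adds a third capture group for the unit plus `\s*` for optional whitespace, and uses a hypothetical helper name, `to_molar_string`. The result is in Python's e-notation (`1e-06-1e-05`); getting LaTeX such as `$10^{-6}-10^{-5}$` would need an extra formatting step.

```python
import re

# Assumed conversion factors from each unit to molar (M).
TO_MOLAR = {"mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

# Two float-like numbers, optional whitespace, then the unit captured in group 3.
pattern = re.compile(r"^(\d*\.?\d+)-(\d*\.?\d+)\s*([munp]M)\b")


def to_molar_string(s):
    """Hypothetical helper: '1-10uM Ach' -> '1e-06-1e-05', or None if s does not start with a range."""
    m = pattern.match(s)
    if m is None:
        return None
    factor = TO_MOLAR[m.group(3)]
    low = float(m.group(1)) * factor
    high = float(m.group(2)) * factor
    # :g rounds to 6 significant digits, which hides floating-point noise from the multiplication.
    return f"{low:g}-{high:g}"


print(to_molar_string("1-10uM Ach"))      # 1e-06-1e-05
print(to_molar_string("2.5-7.5 nM Ach"))  # 2.5e-09-7.5e-09
```

Keeping the factors in a dictionary keyed by the captured unit means adding another unit only requires a new entry plus a matching change to the character class.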
Or, for a partial match, use word boundaries instead of anchors (`\s*` allows optional whitespace before the unit):
\b(\d*\.?\d+)-(\d*\.?\d+)\s*[munp]M\b
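With word boundaries instead of `^...$`, the range can sit anywhere in a longer text, so `findall` can collect every range. A small sketch on a made-up sentence (the text is only an illustration):

```python
import re

# Partial-match version: \b boundaries instead of ^...$ anchors.
# \s* allows optional whitespace between the second number and the unit.
pattern = re.compile(r"\b(\d*\.?\d+)-(\d*\.?\d+)\s*[munp]M\b")

text = "Cells were treated with 1-10uM Ach and then 0.5-2.5 nM Ach."
# With two capture groups, findall returns a list of (low, high) string tuples.
print(pattern.findall(text))  # [('1', '10'), ('0.5', '2.5')]
```

One caveat: `\b` needs a word character next to it, so for a leading-dot value such as ` .5-1uM` the match starts at `5` rather than at the dot.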
If you are only interested in the numbers of the range, you can use just the first part of the pattern:
^(\d*\.?\d+)-(\d*\.?\d+)
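A short sketch of the prefix-only pattern, converting the captured strings to floats. Because nothing after the second number is checked, an unexpected unit is accepted too:

```python
import re

# Only the leading range is matched; the unit and trailing text are not validated.
prefix = re.compile(r"^(\d*\.?\d+)-(\d*\.?\d+)")

for s in ["1.5-3 uM Ach", "1.5-3 kM Ach"]:
    low, high = map(float, prefix.match(s).groups())
    print(s, "->", low, high)
# 1.5-3 uM Ach -> 1.5 3.0
# 1.5-3 kM Ach -> 1.5 3.0
```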
Answer:
One approach is to match explicitly on two ints or floats at the start of the line, separated by a -:
import re
data = '2.345-4.321 uM'
vals = re.match(r'^([0-9]+\.?[0-9]*|\.[0-9]+)-([0-9]+\.?[0-9]*|\.[0-9]+)', data)
print(vals[1], vals[2])
# 2.345 4.321
This regex also matches numbers with a leading `.` but no `0`, such as `.456`. If you don't need this, you can remove `|\.[0-9]+` from each part.
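To see what the `|\.[0-9]+` alternative changes, here is a small comparison of the pattern with and without it (the names `with_dot` and `without_dot` are only for illustration):

```python
import re

# Digits with an optional fractional part, or a bare leading-dot fraction such as .456.
with_dot = r"^([0-9]+\.?[0-9]*|\.[0-9]+)-([0-9]+\.?[0-9]*|\.[0-9]+)"
# The same pattern with the |\.[0-9]+ alternatives removed.
without_dot = r"^([0-9]+\.?[0-9]*)-([0-9]+\.?[0-9]*)"

print(re.match(with_dot, ".456-1.5 uM").groups())  # ('.456', '1.5')
print(re.match(without_dot, ".456-1.5 uM"))        # None
# [0-9]+\.?[0-9]* also accepts a trailing dot such as "1.".
print(re.match(with_dot, "1.-2 uM").groups())      # ('1.', '2')
```

The trailing-dot case is one difference from `\d*\.?\d+`, which requires at least one digit after any dot and therefore rejects `1.-2 uM`.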
