As the title says, I want to extract the text between the last two occurrences of a character in a string.
I have:
'9500 anti-Xa IU/ml - 0,6 ml 5700 IU -'
'120 mg/ml – 0.165 ml -'
'300-300-300 IR/ml or IC/ml - 10 ml -'
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
I want to have:
'0,6 ml 5700 IU'
'0.165 ml'
'10 ml'
'15 g'
I tried using -\s*.*- but it matches everything between the first and the last -. What's the correct regex to use?
CodePudding user response:
With search:
import re
[re.search(r'[-–]\s*([^-–]+?)\s*[-–][^-–]*$', x).group(1) for x in l]
Or split:
[re.split(r'\s+[-–]\s*', x, maxsplit=2)[-2] for x in l]
output: ['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
used input:
l = ['9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
'120 mg/ml – 0.165 ml -',
'300-300-300 IR/ml or IC/ml - 10 ml -',
'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -'
]
CodePudding user response:
You can use
[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)
See the regex demo. Details:
[^-–—\s] - a char other than whitespace, -, – and —
[^-–—]*? - zero or more chars other than -, – and —, as few as possible
(?=\s*[-–—][^-–—]*$) - a positive lookahead that requires zero or more whitespaces, then a -, – or — char, and then zero or more chars other than -, – and — till the end of the string, immediately to the right of the current location.
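For reference, here is a minimal sketch of how this pattern might be applied in Python to the sample strings from the question. The names LAST_FIELD, samples and last_field are illustrative and not part of the original answer.

```python
import re

# Pattern from the answer above; the lookahead anchors the match to the
# last dash-delimited field. LAST_FIELD is an illustrative name.
LAST_FIELD = re.compile(r'[^-–—\s][^-–—]*?(?=\s*[-–—][^-–—]*$)')

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml or IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

def last_field(text):
    """Return the text between the last two dashes, or None if there is no match."""
    match = LAST_FIELD.search(text)
    return match.group(0) if match else None

results = [last_field(s) for s in samples]
print(results)
```

Because the whole match is the wanted text, no capture group is needed, and the None fallback avoids an AttributeError on strings that contain no dash at all.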
CodePudding user response:
With your shown samples only, please try the following regex with Python code, written and tested in Python 3. Here is the online demo for the regex used.
import re
var="""9500 anti-Xa IU/ml - 0,6 ml 5700 IU -
120 mg/ml - 0.165 ml -
300-300-300 IR/ml or IC/ml - 10 ml -
Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -"""
[x.strip(' ') for x in re.findall(r'(?<=\s-|\s–)(.*?)(?=-)',var,re.M)]
Output will be as follows:
['0,6 ml 5700 IU', '0.165 ml', '10 ml', '15 g']
Explanation: I use the findall function of Python 3's re module with the regex r'(?<=\s-|\s–)(.*?)(?=-)' to capture the text after a dash that is preceded by whitespace, up to the next -. The strip function then removes the leading and trailing spaces from each match to give the expected output.
CodePudding user response:
Try to also match the blank space before the last dash -:
\s\-\s(.*)\s\-
By the way, the regex101 website may help the next time you have a regex issue.
EDIT
I just saw that you have two types of dash symbols: short - and long –. Try this regex instead:
\s[-–]\s(.*)\s[-–]
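As a rough sketch, this is how the pattern above could be used from Python on the sample strings from the question. The names SPACED_DASH, samples and between_dashes are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern from the answer above: a dash with whitespace on both sides,
# a greedy capture, then whitespace and a closing dash.
SPACED_DASH = re.compile(r'\s[-–]\s(.*)\s[-–]')

samples = [
    '9500 anti-Xa IU/ml - 0,6 ml 5700 IU -',
    '120 mg/ml – 0.165 ml -',
    '300-300-300 IR/ml or IC/ml - 10 ml -',
    'Fluocortolone-21-pivalate 1 mg/g, Lidocaine hydrochloride 20 mg/g - 15 g -',
]

def between_dashes(text):
    """Return the captured group, or None when the pattern does not match."""
    match = SPACED_DASH.search(text)
    return match.group(1) if match else None

extracted = [between_dashes(s) for s in samples]
print(extracted)

# Caveat: the capture is greedy and starts at the FIRST spaced dash, so a
# string with three or more spaced dashes yields more than the last field.
print(between_dashes('a - b - c -'))
```

This works on the samples because each one contains exactly one dash surrounded by spaces before the trailing dash; the hyphens inside words such as anti-Xa are skipped since they have no whitespace around them.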
