I have two findall statements that work well separately, but I'd like to combine them into one statement. How do I let a single match continue across line breaks instead of stopping at each \n?
Beautiful Soup is not an option in the bigger picture.
Code
#!/usr/bin/python
import re
import os
f = open(os.path.join("data.txt"), "r")
text = f.read()
print (text)
fValue = re.findall(r"line-height: 1.45;\"\>(.*)</h3><p class=3D", text, re.MULTILINE) #Value1
print ("fAdd: " , fValue)
fPrice = re.findall(r"(\$.*)</p>", text, re.MULTILINE) #price
print ("fPrice: " , fPrice)
fCombine = re.findall(r"(\$.*)</p>.*\n.*line-height: 1.45;\"\>(.*)</h3><p class=3D", text, re.MULTILINE) #price and Value1 combined
print ("fCombine: " , fCombine)
Data
-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; f=
ont-weight: 500; font-size: 16px; line-height: 1.38;">$144,900</p><h3 class=
=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: '=
Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight=
: 500; font-size: 13px; line-height: 1.45;">Value1</h3><p class=3D"hi=
ghlight-description" style=3D"margin: 0; font-family: 'Montserrat', sans-se=
rif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13=
Results:
fAdd: ['Value1']
fPrice: ['$144,900']
fCombine: []
Desired:
fAdd: ['Value1']
fPrice: ['$144,900']
fCombine: ['Value1','$144,900']
CodePudding user response:
Since your regex patterns already work the way you want, an easy option is to combine them with the regex alternation (OR) operator, |.
The pattern would become:
r'line-height: 1.45;\"\>(.*)</h3><p class=3D|(\$.*)</p>'
Using findall on this will return a list of two tuples, one per match, each with two groups. In each tuple only the group from the alternative that matched has a value; the other is an empty string:
pattern = r"line-height: 1.45;\"\>(.*)</h3><p class=3D|(\$.*)</p>"
matches = re.findall(pattern, text, re.MULTILINE)
print(matches)
# [('', '$144,900'), ('Value1', '')]
# The first tuple is the first match, which has only the price; the second tuple
# is the second match, which has the value but no price.
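If you want a flat list like the one in the question rather than a list of tuples, one possible way is to drop the empty strings from each tuple. This is only a sketch: `sample` is a shortened stand-in for your data, and the escaped dot in `1\.45` is my own tweak, so neither comes from the original code.

```python
import re

# Shortened stand-in for the question's data (an assumption for illustration).
sample = (
    'font-size: 16px; line-height: 1.38;">$144,900</p><h3 class=\n'
    '=3D"highlight-title" style=3D"margin: 0; font-weight=\n'
    ': 500; font-size: 13px; line-height: 1.45;">Value1</h3><p class=3D"hi=\n'
)

pattern = r'line-height: 1\.45;">(.*)</h3><p class=3D|(\$.*)</p>'

# Each tuple holds exactly one non-empty group, so keep only the non-empty strings.
pairs = re.findall(pattern, sample)
flat = [group for pair in pairs for group in pair if group]
print(flat)  # ['$144,900', 'Value1']
```

The items come out in the order they appear in the text, so the price is listed before the value here.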
You can use finditer too. With named capture groups the result becomes a lot clearer, although it is otherwise similar:
pattern = r"line-height: 1.45;\"\>(?P<value>.*)</h3><p class=3D|(?P<price>\$.*)</p>"
matches = re.finditer(pattern, text, re.MULTILINE)
for match in matches:
    print(match.groupdict())
# {'value': None, 'price': '$144,900'}
# {'value': 'Value1', 'price': None}
regex test: https://regex101.com/r/yEISii/1
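As for the original question about not stopping at \n: by default . does not match a newline, and re.MULTILINE only changes how ^ and $ behave. The re.DOTALL flag makes . match newlines as well. Here is a minimal sketch, assuming the price always comes before the value and consists only of digits and commas. The `sample` string and the `[\d,]+` part are my assumptions, not something taken from your code.

```python
import re

# Shortened stand-in for the question's data (an assumption for illustration).
sample = (
    'font-size: 16px; line-height: 1.38;">$144,900</p><h3 class=\n'
    '=3D"highlight-title" style=3D"margin: 0; font-weight=\n'
    ': 500; font-size: 13px; line-height: 1.45;">Value1</h3><p class=3D"hi=\n'
)

# re.DOTALL lets "." cross line breaks; the lazy ".*?" stops at the first
# "line-height: 1.45" after the price instead of running on to the last one.
pattern = r'(\$[\d,]+)</p>.*?line-height: 1\.45;">(.*?)</h3>'
combined = re.findall(pattern, sample, re.DOTALL)
print(combined)  # [('$144,900', 'Value1')]
```

This returns one tuple per price/value pair. The data looks quoted-printable encoded (the =3D and the trailing = signs), so a soft line break could in principle fall inside the literal text the pattern looks for, in which case the pattern would miss that pair.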
