python - findall and multiline combine results

Time:02-10

I have two findall statements that work well separately, but I'd like to combine them into one statement. How do I allow a continuous match that is not stopped by any \n?

Beautiful Soup is not an option, given the bigger picture of the project.

Code

#!/usr/bin/python
import re
import os

f = open(os.path.join("data.txt"), "r")
text = f.read()

print (text)

fValue = re.findall(r"line-height: 1.45;\"\>(.*)</h3><p class=3D", text, re.MULTILINE) #Value1
print ("fAdd: " , fValue)
fPrice = re.findall(r"(\$.*)</p>", text, re.MULTILINE) #price
print ("fPrice: " , fPrice)

fCombine = re.findall(r"(\$.*)</p>.*\n.*line-height: 1.45;\"\>(.*)</h3><p class=3D", text, re.MULTILINE) # combined attempt
print ("fCombine: " , fCombine)

Data

-family: 'Montserrat', sans-serif; text-decoration: none; color: #323232; f=
ont-weight: 500; font-size: 16px; line-height: 1.38;">$144,900</p><h3 class=
=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: '=
Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight=
: 500; font-size: 13px; line-height: 1.45;">Value1</h3><p class=3D"hi=
ghlight-description" style=3D"margin: 0; font-family: 'Montserrat', sans-se=
rif; text-decoration: none; color: #323232; font-weight: 500; font-size: 13=

Results:

fAdd:  ['Value1']
fPrice:  ['$144,900']
fCombine:  []

Desired:

fAdd:  ['Value1']
fPrice:  ['$144,900']
fCombine:  ['Value1','$144,900']

CodePudding user response:

Since your regex patterns already work as you want, an easy option is to combine them with the regex alternation (OR) operator, |.

The pattern would become: r'line-height: 1.45;\"\>(.*)</h3><p class=3D|(\$.*)</p>'

Using findall with this pattern returns a list of two tuples, one per match. Each tuple has two groups, but only the group belonging to the alternative that matched is filled; the other is an empty string:

pattern = r"line-height: 1.45;\"\>(.*)</h3><p class=3D|(\$.*)</p>"
matches = re.findall(pattern, text, re.MULTILINE)
print(matches)
# [('', '$144,900'), ('Value1', '')]
# The first tuple is the first match, which has only the price;
# the second tuple is the second match, which has a value but no price.
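To get from those tuples to the single list shown under "Desired", the non-empty groups can be collected afterwards. This is only a sketch: `sample` is a few lines copied from the question's data, and the names `value`, `price` and `combined` are made up for illustration. It assumes the text contains exactly one price and one value.

```python
import re

# A few lines copied verbatim from the question's data (soft line breaks kept).
sample = """\
ont-weight: 500; font-size: 16px; line-height: 1.38;">$144,900</p><h3 class=
=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: '=
Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight=
: 500; font-size: 13px; line-height: 1.45;">Value1</h3><p class=3D"hi=
ghlight-description" style=3D"margin: 0; font-family: 'Montserrat', sans-se=
"""

# Same alternation as above: group 1 is the value, group 2 is the price.
pattern = r'line-height: 1\.45;">(.*)</h3><p class=3D|(\$.*)</p>'

value, price = None, None
for v, p in re.findall(pattern, sample):
    # Each tuple fills only one group; keep whichever one is non-empty.
    value = v or value
    price = p or price

combined = [value, price]
print(combined)  # ['Value1', '$144,900']
```

No flag is passed here because the pattern uses neither ^ nor $, which are the only things re.MULTILINE changes.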

You can use finditer too. With named capture groups the result becomes a lot clearer, although it is similar; note that groups that did not take part in a match are reported as None instead of an empty string.

pattern = r"line-height: 1.45;\"\>(?P<value>.*)</h3><p class=3D|(?P<price>\$.*)</p>"
matches = re.finditer(pattern, text, re.MULTILINE)
for match in matches:
    print(match.groupdict())

# {'value': None, 'price': '$144,900'}
# {'value': 'Value1', 'price': None}

regex test: https://regex101.com/r/yEISii/1
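As for why the single combined pattern in the question returns nothing: re.MULTILINE only changes where ^ and $ match. It does not let . match a newline, and `.*\n.*` allows exactly one line break, while the data has three between `</p>` and the `line-height: 1.45` marker. A possible one-pattern variant, as a sketch, uses re.DOTALL with a lazy `.*?` to skip over the lines in between and `[^<]*` for the captures so they cannot run past the next tag. `sample` is copied from the question's data; `pairs` and `combined` are illustrative names.

```python
import re

# A few lines copied verbatim from the question's data (soft line breaks kept).
sample = """\
ont-weight: 500; font-size: 16px; line-height: 1.38;">$144,900</p><h3 class=
=3D"highlight-title" style=3D"margin: 0; margin-bottom: 6px; font-family: '=
Montserrat', sans-serif; text-decoration: none; color: #323232; font-weight=
: 500; font-size: 13px; line-height: 1.45;">Value1</h3><p class=3D"hi=
ghlight-description" style=3D"margin: 0; font-family: 'Montserrat', sans-se=
"""

# re.DOTALL lets "." match "\n"; ".*?" is lazy, so it stops at the first
# "line-height: 1.45" marker that follows the price.
pattern = r'(\$[^<]*)</p>.*?line-height: 1\.45;">([^<]*)</h3>'
pairs = re.findall(pattern, sample, re.DOTALL)
print(pairs)  # [('$144,900', 'Value1')]

# Reorder each pair to match the desired [value, price] layout.
combined = [[value, price] for price, value in pairs]
print(combined)  # [['Value1', '$144,900']]
```

This assumes each price is followed by its value, as in the sample. The `=3D` sequences and trailing `=` look like quoted-printable encoding, so a soft line break could in principle fall inside the price, the value, or the marker itself. If that turns out to be the case, decoding the text first (for example with the standard-library quopri module) would likely be more robust than matching the raw text.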
