I am trying to extract specific values from two lists of strings.
list1:
I would like to match \d+\n\d+\n\d+ on each string in the list, but I can't seem to get a match. The list is as follows:
['Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50', 'Club America\n1.01\nDraw\n8.75\nClub Necaxa\n100.00', 'AD Pasto\n1.85\nDraw\n3.25\nJaguares de Cordoba\n4.25', 'Red Bull Bragantino\n1.60\nDraw\n3.65\nGuarani FC SP\n5.10']
list2:
In the second list I want to extract the first three numbers in every string in the list. The list is as follows:
['9.25\n4.05\n1.45\n2.35\n4.35\n2.35\n2.85\n2.60\n2.90', '1.32\n4.60\n18.0\n3.15\n2.30\n3.10\n3.75\n1.95\n3.65', '2.45\n2.65\n3.80\n2.00\n4.65\n2.70\n2.45\n2.65\n3.80', '1.75\n3.75\n4.65\n2.55\n7.00\n1.80\n3.55\n3.15\n2.10']
CodePudding user response:
The pattern \d+\n\d+\n\d+ matches 1+ digits only, followed by a newline, in that order, so it cannot match numbers that contain a decimal point. To match numbers with an optional decimal part, you can use \d+(?:\.\d+)?
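A small sketch of the difference, using a shortened sample string (the variable names here are only for illustration): the digits-only pattern finds nothing because a dot always sits between the digits and the newline, while the version with the optional decimal part matches.

```python
import re

# Shortened sample in the same format as the strings in list2
s = '9.25\n4.05\n1.45\n2.35'

# Digits only: after "25\n4" the next character is ".", not "\n", so there is no match
plain = re.search(r"\d+\n\d+\n\d+", s)
print(plain)

# Allowing an optional decimal part lets each line match as a whole number
fixed = re.search(r"\d+(?:\.\d+)?\n\d+(?:\.\d+)?\n\d+(?:\.\d+)?", s)
print(repr(fixed.group()))
```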
In the first list, the numbers are at the start of a line, and there are also lines that do not contain digits at all.
If you want to match all those numbers, you can match a number at the start of each line (with re.M, the ^ anchor matches at the start of every line):
^\d+(?:\.\d+)?
Example
import re
lst1 = ['Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50', 'Club America\n1.01\nDraw\n8.75\nClub Necaxa\n100.00', 'AD Pasto\n1.85\nDraw\n3.25\nJaguares de Cordoba\n4.25', 'Red Bull Bragantino\n1.60\nDraw\n3.65\nGuarani FC SP\n5.10']
pattern1 = r"^\d+(?:\.\d+)?"
for s in lst1:
    # re.M makes ^ match at the start of every line, not only at the start of the string
    print(re.findall(pattern1, s, re.M))
Output
['5.10', '1.30', '9.50']
['1.01', '8.75', '100.00']
['1.85', '3.25', '4.25']
['1.60', '3.65', '5.10']
In the second list, every line is a number. To get the first 3 numbers, you can use 3 capture groups anchored at the start of the string:
^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)
Example
lst2 = ['9.25\n4.05\n1.45\n2.35\n4.35\n2.35\n2.85\n2.60\n2.90', '1.32\n4.60\n18.0\n3.15\n2.30\n3.10\n3.75\n1.95\n3.65', '2.45\n2.65\n3.80\n2.00\n4.65\n2.70\n2.45\n2.65\n3.80', '1.75\n3.75\n4.65\n2.55\n7.00\n1.80\n3.55\n3.15\n2.10']
pattern2 = r"^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)"
for s in lst2:
    # Without re.M, ^ only matches at the start of the string, so each string yields one tuple
    print(re.findall(pattern2, s))
Output
[('9.25', '4.05', '1.45')]
[('1.32', '4.60', '18.0')]
[('2.45', '2.65', '3.80')]
[('1.75', '3.75', '4.65')]
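If you would rather get a plain tuple per string instead of a one-element list, one possible variation is re.match with .groups(). This is only a sketch: the helper name first_three and the NUM constant are made up for illustration, and returning None for a non-matching string is an assumption about what you want in that case.

```python
import re

# Number with an optional decimal part, reused three times below
NUM = r"\d+(?:\.\d+)?"
# re.match already anchors at the start of the string, so no ^ is needed
pattern = re.compile(rf"({NUM})\n({NUM})\n({NUM})")

def first_three(s):
    """Return the first three numbers as a tuple of strings, or None if there is no match."""
    m = pattern.match(s)
    return m.groups() if m else None

# Shortened sample plus a string that should not match
for s in ['9.25\n4.05\n1.45\n2.35', 'no numbers here']:
    print(first_three(s))
```

The values are still strings, so wrap them in float() if you need to do arithmetic with them.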
