using regex to extract characters of a list of strings

Time:02-03

I am trying to extract specific characters from two lists of strings.

list1:

I would like to match \d+\n\d+\n\d+ on each string in the list, but I can't seem to get a match. The list is as follows:

['Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50', 'Club America\n1.01\nDraw\n8.75\nClub Necaxa\n100.00', 'AD Pasto\n1.85\nDraw\n3.25\nJaguares de Cordoba\n4.25', 'Red Bull Bragantino\n1.60\nDraw\n3.65\nGuarani FC SP\n5.10']

list2:
In the second list I want to extract the first three numbers in every string. The list is as follows:

['9.25\n4.05\n1.45\n2.35\n4.35\n2.35\n2.85\n2.60\n2.90', '1.32\n4.60\n18.0\n3.15\n2.30\n3.10\n3.75\n1.95\n3.65', '2.45\n2.65\n3.80\n2.00\n4.65\n2.70\n2.45\n2.65\n3.80', '1.75\n3.75\n4.65\n2.55\n7.00\n1.80\n3.55\n3.15\n2.10']

CodePudding user response:

The pattern \d+\n\d+\n\d+ matches only runs of 1 or more digits separated by newlines, in that order, so it cannot match values that contain a decimal point, such as 5.10. To match numbers with an optional decimal part, you can use \d+(?:\.\d+)?
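
As a minimal sketch of the difference between the two patterns, the snippet below runs both against a shortened string in the same shape as the second list. The names `sample`, `strict`, and `numbers` are illustrative and do not come from the question.

```python
import re

# Shortened sample in the same shape as the strings in the second list.
sample = '9.25\n4.05\n1.45'

# Digits-only pattern: the decimal points break every candidate match.
strict = re.search(r"\d+\n\d+\n\d+", sample)
print(strict)  # None

# Digits with an optional decimal part: every number is found.
numbers = re.findall(r"\d+(?:\.\d+)?", sample)
print(numbers)  # ['9.25', '4.05', '1.45']
```

The non-capturing group (?:...) keeps re.findall returning whole matches rather than only the decimal part.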

In the first list, the numbers are at the start of a line, and there are also lines that do not contain digits at all.

If you want to match all those numbers, regardless of the format, you can anchor the pattern at the start of each line with ^ and pass the re.M (multiline) flag:

^\d+(?:\.\d+)?


Example

import re

lst1 = ['Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50', 'Club America\n1.01\nDraw\n8.75\nClub Necaxa\n100.00', 'AD Pasto\n1.85\nDraw\n3.25\nJaguares de Cordoba\n4.25', 'Red Bull Bragantino\n1.60\nDraw\n3.65\nGuarani FC SP\n5.10']
pattern1 = r"^\d+(?:\.\d+)?"
for s in lst1:
    print(re.findall(pattern1, s, re.M))

Output

['5.10', '1.30', '9.50']
['1.01', '8.75', '100.00']
['1.85', '3.25', '4.25']
['1.60', '3.65', '5.10']
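
The re.M flag matters for the first list. As a small sketch on one of its strings: without the flag, ^ matches only at the very start of the string, where a team name sits, so nothing is found. The variable names here are illustrative.

```python
import re

s = 'Famalicao\n5.10\nDraw\n1.30\nArouca\n9.50'
pattern = r"^\d+(?:\.\d+)?"

# Without re.M, ^ anchors only at position 0, which is the letter 'F'.
without_flag = re.findall(pattern, s)
print(without_flag)  # []

# With re.M, ^ also anchors right after every newline.
with_flag = re.findall(pattern, s, re.M)
print(with_flag)  # ['5.10', '1.30', '9.50']
```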

Each string in the second list consists of numbers separated by newlines. To get the first 3 numbers, you can use 3 capture groups:

^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)


Example

import re

lst2 = ['9.25\n4.05\n1.45\n2.35\n4.35\n2.35\n2.85\n2.60\n2.90', '1.32\n4.60\n18.0\n3.15\n2.30\n3.10\n3.75\n1.95\n3.65', '2.45\n2.65\n3.80\n2.00\n4.65\n2.70\n2.45\n2.65\n3.80', '1.75\n3.75\n4.65\n2.55\n7.00\n1.80\n3.55\n3.15\n2.10']

# Three capture groups, one per number, anchored at the start of the string
pattern2 = r"^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)"
for s in lst2:
    print(re.findall(pattern2, s))

Output

[('9.25', '4.05', '1.45')]
[('1.32', '4.60', '18.0')]
[('2.45', '2.65', '3.80')]
[('1.75', '3.75', '4.65')]
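
If a flat tuple of numeric values is more convenient than a list containing one tuple of strings, one possible variation is to use re.match and convert the groups to float. This is only a sketch: the helper name `first_three` is hypothetical, and converting to float is an assumption about how the values will be used.

```python
import re

pattern2 = re.compile(r"^(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)\n(\d+(?:\.\d+)?)")

def first_three(s):
    """Return the first three numbers of s as floats, or None if s does not match."""
    m = pattern2.match(s)
    if m is None:
        return None
    return tuple(float(g) for g in m.groups())

result = first_three('1.32\n4.60\n18.0\n3.15\n2.30')
print(result)  # (1.32, 4.6, 18.0)
```

Note that float drops trailing zeros, so keep the strings if the original formatting matters.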