Extract substrings using regex in Python

I have a string like this:

string = """**Started:** 2021-07-04 11:51:31 PM BST | **Finished:** 2021-07-04 11:51:46
PM BST | **Duration:** 1 Minute  
---  
Company| Participant|  Email | Joined| Duration| Messages  
---|---|---|---|---|---  
global merchant Bank (GR) ((PM) by TR) (Disclaimer)| Bokng Kim|
[email protected]| 2021-07-04 11:51:31 PM BST| 1 Minute | 0  
Brokers LP (GR) ((PM) by TR) (KW)| Ren Kim| [email protected]|
2021-07-04 11:51:31 PM BST| 1 Minute | 2  
---"""

I want to extract the participant names and email addresses from it, i.e.:

names = ['Bokng Kim', 'Ren Kim']
email = ['[email protected]', '[email protected]']

CodePudding user response:

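Here is an option using re.findall. First, split the input text on the header separator row (---|---|---|---|---|---) and keep what follows it, which is the table body. Then run re.findall with a pattern that skips the first pipe-separated column and captures the second and third columns (participant name and email), trimming the surrounding whitespace. Because \s also matches newlines, the pattern still works when a row wraps onto a second line.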

string = """**Started:** 2021-07-04 11:51:31 PM BST | **Finished:** 2021-07-04 11:51:46
PM BST | **Duration:** 1 Minute  
---  
Company| Participant|  Email | Joined| Duration| Messages  
---|---|---|---|---|---  
global merchant Bank (GR) ((PM) by TR) (Disclaimer)| Bokng Kim|
[email protected]| 2021-07-04 11:51:31 PM BST| 1 Minute | 0  
Brokers LP (GR) ((PM) by TR) (KW)| Ren Kim| [email protected]|
2021-07-04 11:51:31 PM BST| 1 Minute | 2  
---"""

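import re

# Keep only the table body: everything after the header separator row.
inp = string.split('---|---|---|---|---|---')[1]
# Skip the first column, then capture the second (name) and third (email).
matches = re.findall(r'.*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|', inp)
names = [x[0] for x in matches]
email = [x[1] for x in matches]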
print(names)  # ['Bokng Kim', 'Ren Kim']
print(email)  # ['[email protected]', '[email protected]']
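
If the same extraction is needed in more than one place, the two steps above could be wrapped in a small function. The following is only a sketch under some assumptions: the function name extract_participants, the constant names, and the sample rows (including the example.com addresses) are made up for illustration and do not come from the original post. It also uses str.partition instead of split(...)[1], so that input without the separator row yields empty lists rather than an IndexError.

```python
import re

# Header separator row, as it appears in the question's table.
HEADER_SEPARATOR = '---|---|---|---|---|---'
# Same pattern as in the answer: skip column 1, capture columns 2 and 3.
ROW_PATTERN = re.compile(r'.*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|')


def extract_participants(text):
    """Return (names, emails) from the 2nd and 3rd pipe-separated columns."""
    # partition() gives an empty tail when the separator is missing,
    # so malformed input produces two empty lists.
    _, _, body = text.partition(HEADER_SEPARATOR)
    matches = ROW_PATTERN.findall(body)
    names = [name for name, _ in matches]
    emails = [email for _, email in matches]
    return names, emails


# Hypothetical sample in the same layout as the question (rows wrap
# across lines); the names and addresses are placeholders.
sample = (
    "Company| Participant|  Email | Joined| Duration| Messages  \n"
    "---|---|---|---|---|---  \n"
    "Acme Bank (GR)| Alice Example|\n"
    "alice@example.com| 2021-07-04 11:51:31 PM BST| 1 Minute | 0  \n"
    "Brokers LP (KW)| Bob Example| bob@example.com|\n"
    "2021-07-04 11:51:31 PM BST| 1 Minute | 2  \n"
    "---"
)

names, emails = extract_participants(sample)
print(names)
print(emails)
```

Note that the pattern depends on the name and email being the second and third columns; if the column order changes, the capture groups would need to move accordingly.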