I have a string like this:
string = """**Started:** 2021-07-04 11:51:31 PM BST | **Finished:** 2021-07-04 11:51:46
PM BST | **Duration:** 1 Minute
---
Company| Participant| Email | Joined| Duration| Messages
---|---|---|---|---|---
global merchant Bank (GR) ((PM) by TR) (Disclaimer)| Bokng Kim|
[email protected]| 2021-07-04 11:51:31 PM BST| 1 Minute | 0
Brokers LP (GR) ((PM) by TR) (KW)| Ren Kim| [email protected]|
2021-07-04 11:51:31 PM BST| 1 Minute | 2
---"""
I want to extract the participant names and email addresses from it, i.e.:
names=['Bokng Kim','Ren Kim']
email=['[email protected]','[email protected]']
CodePudding user response:
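Here is an option using re.findall. First, we split the input text on the separator row beneath the column header, which leaves only the text containing the table rows. Then we run re.findall with a pattern that skips the first pipe-separated column and captures the second (Participant) and third (Email) columns, trimming the surrounding whitespace.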
import re

string = """**Started:** 2021-07-04 11:51:31 PM BST | **Finished:** 2021-07-04 11:51:46
PM BST | **Duration:** 1 Minute
---
Company| Participant| Email | Joined| Duration| Messages
---|---|---|---|---|---
global merchant Bank (GR) ((PM) by TR) (Disclaimer)| Bokng Kim|
[email protected]| 2021-07-04 11:51:31 PM BST| 1 Minute | 0
Brokers LP (GR) ((PM) by TR) (KW)| Ren Kim| [email protected]|
2021-07-04 11:51:31 PM BST| 1 Minute | 2
---"""
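# Keep only the text after the header separator row, i.e. the table rows.
inp = string.split('---|---|---|---|---|---')[1]
# Skip the first column, then capture the second (name) and third (email) columns.
matches = re.findall(r'.*?\|\s*(.*?)\s*\|\s*(.*?)\s*\|', inp)
names = [x[0] for x in matches]
email = [x[1] for x in matches]
print(names) # ['Bokng Kim', 'Ren Kim']
print(email) # ['[email protected]', '[email protected]']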
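
If the one-line pattern above is hard to read, the same idea can be sketched with re.VERBOSE and named groups. This is only an illustration: ROW_PATTERN, SEPARATOR and extract_participants are hypothetical names, and the sketch assumes the table always has the separator row and the column order shown in the question.

```python
import re

# Hypothetical names for this sketch; the pattern is the same as the one-liner above.
SEPARATOR = '---|---|---|---|---|---'

ROW_PATTERN = re.compile(
    r"""
    .*?\|                    # first column (Company), up to its closing pipe
    \s*(?P<name>.*?)\s*\|    # second column (Participant), whitespace trimmed
    \s*(?P<email>.*?)\s*\|   # third column (Email), whitespace trimmed
    """,
    re.VERBOSE,
)

def extract_participants(text):
    """Return (name, email) pairs found after the header separator row."""
    # partition() yields an empty body when the separator is missing,
    # so the function returns [] instead of raising IndexError.
    _, _, body = text.partition(SEPARATOR)
    return [(m.group('name'), m.group('email')) for m in ROW_PATTERN.finditer(body)]

text = """**Started:** 2021-07-04 11:51:31 PM BST | **Finished:** 2021-07-04 11:51:46
PM BST | **Duration:** 1 Minute
---
Company| Participant| Email | Joined| Duration| Messages
---|---|---|---|---|---
global merchant Bank (GR) ((PM) by TR) (Disclaimer)| Bokng Kim|
[email protected]| 2021-07-04 11:51:31 PM BST| 1 Minute | 0
Brokers LP (GR) ((PM) by TR) (KW)| Ren Kim| [email protected]|
2021-07-04 11:51:31 PM BST| 1 Minute | 2
---"""

pairs = extract_participants(text)
names = [name for name, _ in pairs]
emails = [email for _, email in pairs]
print(names)   # ['Bokng Kim', 'Ren Kim']
print(emails)  # ['[email protected]', '[email protected]']
```

Note that "." does not match a newline here, while \s does. That is why a row whose email wraps onto the next line is still captured, but the pattern never runs from the end of one row into the next.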
