I have a concatenated text that I want to split using Regex. Luckily there is a pattern. The pattern is structured this way: (seconds) some text (seconds) some other text (seconds) some other text
(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU (5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... (24-29) Agent: OK AY (662-662) Customer: THANK YOU TOO (663-664) Agent: THANKS BYE NOW (664-664) Customer: BYE
I want to split the text into blocks, so the output should look like this:
(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU
(5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... ABOUT THAT BILL
(24-29) Agent: OK AY
So far I was able to create \(\d*-\d*\)\s*\w*:\s*, but this only catches the prefix, such as (1-4) Agent:, and I can't figure out the rest. I have tried many things.
Here is a Regex101 link showing where I am stuck.
CodePudding user response:
You could try match groups:
(\(\d*-\d*\)\s*\w+:[\s?\w]+[^(]){1}
I'm not much of a regex pro but I do try. Let me know if it helped :D
CodePudding user response:
With
\(\d*-\d*\)\s*\w*:[^(]*
you can catch everything after the colon that is not an open parenthesis.
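A minimal sketch of how that pattern could be used from Python, assuming a shortened sample string (the variable names s, pattern and blocks are only illustrative). Note that this approach stops early if the spoken text itself contains an open parenthesis.

```python
import re

# Shortened sample in the same shape as the transcript from the question.
s = "(1-4) Agent: HOW CAN I HELP YOU (5-22) Customer: HI KEVIN (24-29) Agent: OK AY"

# [^(]* consumes everything up to, but not including, the next "(".
pattern = r"\(\d*-\d*\)\s*\w*:[^(]*"

# Each match keeps the space before the next block, so strip it.
blocks = [m.strip() for m in re.findall(pattern, s)]
print(blocks)
# ['(1-4) Agent: HOW CAN I HELP YOU', '(5-22) Customer: HI KEVIN', '(24-29) Agent: OK AY']
```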
CodePudding user response:
In the pattern that you tried, the digits between the parentheses are optional because of the * quantifier, and \w*:\s* only matches optional word characters, a colon and optional whitespace characters, so the match ends before the text that follows.
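To make both points concrete, here is a small sketch run against a shortened, made-up sample string (the names s, original and prefixes are only illustrative):

```python
import re

# Shortened, made-up sample in the same shape as the transcript.
s = "(1-4) Agent: HELLO (5-22) Customer: HI"

# The pattern from the question: it ends right after the colon and whitespace.
original = r"\(\d*-\d*\)\s*\w*:\s*"
prefixes = re.findall(original, s)
print(prefixes)  # ['(1-4) Agent: ', '(5-22) Customer: ']

# Because \d* allows zero digits, even "(-)" is accepted as a time range.
empty_range_ok = re.fullmatch(r"\(\d*-\d*\)", "(-)") is not None
print(empty_range_ok)  # True
```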
You can use:
\(\d+-\d+\).*?(?=\(\d+-\d+\)|$)
Explanation
\(\d+-\d+\) Match (, 1+ digits, -, 1+ digits and )
.*? Match any character, as few as possible
(?= Positive lookahead
\(\d+-\d+\) The digit pattern between parentheses
| Or
$ End of string (for the last occurrence)
) Close lookahead
Example code
import re
# One block: a "(start-end)" marker followed by as few characters as possible,
# up to the next "(start-end)" marker or the end of the string.
pattern = r"\(\d+-\d+\).*?(?=\(\d+-\d+\)|$)"
s = "(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU (5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... (24-29) Agent: OK AY (662-662) Customer: THANK YOU TOO (663-664) Agent: THANKS BYE NOW (664-664) Customer: BYE"
print(re.findall(pattern, s))
Output
[
'(1-4) Agent: THANK YOU FOR CALLING XCOMPANY MY NAME IS DEVIN HOW CAN I HELP YOU ',
'(5-22) Customer: HI KEVIN I TRANSFERRED OVER TO YOU ... ',
'(24-29) Agent: OK AY ',
'(662-662) Customer: THANK YOU TOO ',
'(663-664) Agent: THANKS BYE NOW ',
'(664-664) Customer: BYE'
]
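If the seconds, speaker and text are needed as separate fields, the same lookahead idea could be extended with named groups. This is only a sketch: the group names (start, end, speaker, text), the names turn_pattern and turns, and the shortened sample string are assumptions for illustration, not part of the original question.

```python
import re

# Shortened sample in the same shape as the transcript from the question.
s = "(1-4) Agent: HELLO THERE (5-22) Customer: HI KEVIN ... (24-29) Agent: OK AY"

# Same lookahead as above, with named groups for each field.
# The \s* before the lookahead keeps trailing spaces out of "text".
turn_pattern = re.compile(
    r"\((?P<start>\d+)-(?P<end>\d+)\)\s*"   # "(start-end)" marker
    r"(?P<speaker>\w+):\s*"                 # speaker label and colon
    r"(?P<text>.*?)\s*"                     # spoken text, as short as possible
    r"(?=\(\d+-\d+\)|$)"                    # stop before next marker or at end
)

turns = [
    (int(m["start"]), int(m["end"]), m["speaker"], m["text"])
    for m in turn_pattern.finditer(s)
]
print(turns)
# [(1, 4, 'Agent', 'HELLO THERE'), (5, 22, 'Customer', 'HI KEVIN ...'), (24, 29, 'Agent', 'OK AY')]
```

Note that . does not match newlines by default, so a transcript spanning several lines would need the re.DOTALL flag.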
