I have a file containing segments that form words, in the format < segment1 segment2 segment3 segment4 >. I want output in which the segments of each group are joined into one word, i.e. the spaces between the segments and the < > signs surrounding them are removed. For example:
Input:
< play ing > < game s . >
Output:
playing games.
I first tried detecting the pattern with \<\ (.*?)\ \>, but I can't figure out how to remove the spaces inside each match.
CodePudding user response:
Use this Python code:
import re
line = '< play ing > < game s . >'
line = re.sub(r'<\ \s*(.*?)\s*\ >', lambda z: z.group(1).replace(" ", ""), line)
print(line)
Results: playing games.
The lambda receives each match and returns the captured group with its spaces removed, so only the joined segments remain.
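To make the role of the callback easier to see, the same substitution can be written with a named function instead of a lambda. This is only a sketch: the names SEGMENT, join_group and join_segments are made up for illustration, and the second sample line is invented.

```python
import re

# Same pattern as above, compiled once so it can be reused for every line.
SEGMENT = re.compile(r'<\ \s*(.*?)\s*\ >')

def join_group(match):
    # match.group(1) is the text between '< ' and ' >', e.g. 'play ing'.
    return match.group(1).replace(' ', '')

def join_segments(line):
    # Replace every '< ... >' group with its segments glued together.
    return SEGMENT.sub(join_group, line)

# Sample lines; the second one is made up for illustration.
for line in ['< play ing > < game s . >', '< un believ able >']:
    print(join_segments(line))
```

Because re.sub calls join_group once per match, anything outside a < ... > group (such as the space between two groups) is left untouched.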
REGEX EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  \                        ' '
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \                        ' '
--------------------------------------------------------------------------------
  >                        '>'
CodePudding user response:
I assume that spaces can be converted to empty strings except when they are preceded by '>' and are followed by '<'. That is, the space in the string '> <' is not to be replaced by an empty string.
You can replace each match of the following regular expression with an empty string:
<\ |\ >|(?<!>) | (?!<)
This expression can be broken down as follows.
<\ # Match '< '
| # or
\ > # Match ' >'
| # or
(?<!>) # Negative lookbehind asserts current location is not preceded by '>'
[ ] # Match a space
| # or
[ ] # Match a space
(?!<) # Negative lookahead asserts current location is not followed by '<'
I've placed each space in a character class above so it is visible.
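The replacement described above can be sketched in Python as follows. The pattern is written in verbose mode with every literal space inside a character class, which should be equivalent to the one-line expression; the names MARKERS_AND_SPACES and strip_markers are hypothetical.

```python
import re

# Verbose form of the pattern. Every literal space sits in a character
# class because re.VERBOSE ignores bare whitespace in the pattern.
MARKERS_AND_SPACES = re.compile(r"""
    <[ ]          # '< ' opening a group
  | [ ]>          # ' >' closing a group
  | (?<!>)[ ]     # a space not preceded by '>'
  | [ ](?!<)      # a space not followed by '<'
""", re.VERBOSE)

def strip_markers(line):
    # Replace every match with the empty string.
    return MARKERS_AND_SPACES.sub('', line)

print(strip_markers('< play ing > < game s . >'))
```

Note that this approach removes spaces anywhere in the line, not only inside < ... > groups, which is consistent with the assumption stated at the start of this answer. The only space that survives is the one sitting between '>' and '<'.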
