Regex to detect pattern and remove spaces from that pattern in Python

Time:01-05

I have a file containing segments that form words, in the format < segment1 segment2 segment3 segment4 >. I want the output to join the segments of each group into one word, that is, remove the spaces between the segments as well as the < > signs surrounding them. For example:

Input:

< play ing > < game s . >

Output:

playing games. 

I first tried detecting the pattern with \<\ (.*?)\ \>, but I can't work out how to remove the spaces inside each match.

CodePudding user response:

Use this Python code:

import re
line = '< play ing > < game s . >'
line = re.sub(r'<\ \s*(.*?)\s*\ >', lambda z: z.group(1).replace(" ", ""), line)
print(line)

Results: playing games.

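re.sub replaces each whole match, including the '< ' and ' >' delimiters, with the return value of the lambda, which is the captured group with its inner spaces removed.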
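
To make the two steps visible, here is a minimal sketch that uses a named function in place of the lambda and inspects what the group captures first. The names PATTERN and strip_inner_spaces are made up for this illustration; the regex is the one from the answer.

```python
import re

# Same regex as in the answer, compiled once for reuse.
PATTERN = re.compile(r'<\ \s*(.*?)\s*\ >')

def strip_inner_spaces(match):
    # match.group(1) is the text between '< ' and ' >', e.g. 'play ing'.
    return match.group(1).replace(" ", "")

line = '< play ing > < game s . >'

# Step 1: what the capture group sees for each bracketed group.
captured = PATTERN.findall(line)

# Step 2: replace each whole match with its group, spaces removed.
result = PATTERN.sub(strip_inner_spaces, line)

print(captured)
print(result)
```

The space between the two bracketed groups is outside every match, so re.sub leaves it untouched and the words stay separated.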

REGEX EXPLANATION

--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  \                        ' '
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \                        ' '
--------------------------------------------------------------------------------
  >                        '>'
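
The same breakdown can be kept next to the pattern itself with re.VERBOSE. This is only a sketch: SEGMENT_GROUP and join_segments are illustrative names, not part of the answer. In verbose mode unescaped whitespace is ignored, which is why the literal space has to stay escaped as "\ ".

```python
import re

SEGMENT_GROUP = re.compile(
    r"""
    <\      # literal '<' followed by one space
    \s*     # any further whitespace
    (.*?)   # group 1: the segments, as few characters as possible
    \s*     # any whitespace before the closing delimiter
    \ >     # one space followed by literal '>'
    """,
    re.VERBOSE,
)

def join_segments(text):
    # Replace each '< ... >' group with its contents, inner spaces removed.
    return SEGMENT_GROUP.sub(lambda m: m.group(1).replace(" ", ""), text)

print(join_segments('< play ing > < game s . >'))
```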

CodePudding user response:

I assume that spaces can be converted to empty strings except when they are preceded by '>' and are followed by '<'. That is, the space in the string '> <' is not to be replaced by an empty string.

You can replace each match of the following regular expression with an empty string:

<\ |\ >|(?<!>) | (?!<)


This expression can be broken down as follows.

<\      # Match '< '
|       # or
\ >     # Match ' >'
|       # or
(?<!>)  # Negative lookbehind asserts current location is not preceded by '>'
[ ]     # Match a space
|       # or
[ ]     # Match a space
(?!<)   # Negative lookahead asserts current location is not followed by '<'

I've placed each space in a character class above so it is visible.
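
In Python, the replacement could look like the sketch below. The name SPACE_OR_DELIM is made up here, and the bare spaces are written as [ ] as in the breakdown above.

```python
import re

# Matches '< ', ' >', a space not preceded by '>', or a space not followed by '<'.
SPACE_OR_DELIM = re.compile(r'<[ ]|[ ]>|(?<!>)[ ]|[ ](?!<)')

line = '< play ing > < game s . >'

# Every match is replaced by the empty string; only the space in '> <' survives.
cleaned = SPACE_OR_DELIM.sub('', line)
print(cleaned)
```

Under the stated assumption, spaces outside the brackets are removed as well, so 'a b < c d >' becomes 'abcd'. If text outside the brackets must keep its spaces, the capture-group approach from the first answer is the safer fit.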
