group in regex not capturing the groups as intended

Time:02-06

I have the following string

"""<note date="8/31/12">
    <to>Tove</to>
    <from>Jani</from>
    <heading type="Reminder"/>
    <body>Don't forget me this weekend!</body>"""

and I want to capture the opening tags, with the tag name and its attributes in separate groups, so that I get the following output:

Intended Output

[('note', ' date="8/31/12"'), ('to', ''), ('from', ''), ('heading', ' type="Reminder"/'), ('body', '')]

I tried using a pattern (posted only as an image, which is not reproduced here), and the output I get is:

[('note', ' date="8/31/12">'), ('t', 'o>Tove</to>'), ('fro', 'm>Jani</from>'), ('heading', ' type="Reminder"/>'), ('bod', "y>Don't forget me this weekend!</body>")]

How should I capture these two groups correctly so that I get the intended output?
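The pattern itself is not visible, but the output above is consistent with a hypothetical pattern such as <([a-z]+)(\s?\w+=?"?.*), i.e. a required second group that must start with \w+ and no closing >. This is a guess, not the asker's actual regex. The Python sketch below (assuming the question uses re.findall, as the list-of-tuples output suggests) reproduces that output and shows where the missing letters go:

```python
import re

# Sample input from the question.
s = """<note date="8/31/12">
    <to>Tove</to>
    <from>Jani</from>
    <heading type="Reminder"/>
    <body>Don't forget me this weekend!</body>"""

# HYPOTHETICAL reconstruction: the asker's real pattern was only posted as
# an image. Here group 2 is required, must start with \w+, and nothing
# requires the match to end on ">".
guess = r'<([a-z]+)(\s?\w+=?"?.*)'

matches = re.findall(guess, s)
print(matches)
# For "<to>", \w+ cannot start at ">", so [a-z]+ backtracks from "to" to "t"
# and hands the "o" to group 2; greedy .* then runs to the end of the line.
```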

Answer:

Your second group is optional, so you need to append a ? to it: <([a-z]+)(\s?\w+=?"?.*)?>
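A quick check of that fix in Python, assuming the question applies re.findall to the string above (the tuple output suggests it, but the question does not say so). The second call uses a made-up line, not from the question, to show the weakness the next tip addresses:

```python
import re

# Sample input from the question.
s = """<note date="8/31/12">
    <to>Tove</to>
    <from>Jani</from>
    <heading type="Reminder"/>
    <body>Don't forget me this weekend!</body>"""

# Second group made optional with "?", and the match must end on ">".
fixed = r'<([a-z]+)(\s?\w+=?"?.*)?>'

result = re.findall(fixed, s)
print(result)
# [('note', ' date="8/31/12"'), ('to', ''), ('from', ''),
#  ('heading', ' type="Reminder"/'), ('body', '')]

# Caveat (hypothetical input): greedy .* runs up to the LAST ">" on the line,
# so text and a closing tag after the opening tag end up in group 2.
caveat = re.findall(fixed, '<a href="x">text</a>')
print(caveat)
# [('a', ' href="x">text</a')]
```

When the optional group does not participate, re.findall reports it as an empty string, which is where the '' entries come from.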

As a general tip, describe the characters you don't want to see instead of the ones you do. For example, instead of <([a-z]+) you could capture everything up to a delimiter: <([^\s>\/]+). Likewise, instead of trying to match everything an attribute might contain, match anything except the symbol that definitely ends the tag: [^>]+.
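To illustrate the tag-name part of that tip, here is a sketch with two hypothetical tags whose names contain characters [a-z] does not cover (a hyphen and a namespace colon); neither appears in the question's data:

```python
import re

# Hypothetical tags, not from the question.
samples = ['<my-tag data-x="1">', '<xs:element name="a"/>']

# Positive class: stops at the first character it does not list.
positive = [re.findall(r'<([a-z]+)', t) for t in samples]
print(positive)  # [['my'], ['xs']]

# Negated class: runs until a delimiter (whitespace, ">" or "/").
negated = [re.findall(r'<([^\s>\/]+)', t) for t in samples]
print(negated)   # [['my-tag'], ['xs:element']]
```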

So a better solution is <(\w+)([^>]+)?>. I didn't apply the first suggestion (the negated class for the tag name) because your tag names don't appear to contain special symbols.
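A minimal Python sketch of that final pattern on the question's string, plus a made-up line with text after the tag to show that [^>]+ stops at the first > (again assuming re.findall is how the question applies the regex):

```python
import re

# Sample input from the question.
s = """<note date="8/31/12">
    <to>Tove</to>
    <from>Jani</from>
    <heading type="Reminder"/>
    <body>Don't forget me this weekend!</body>"""

# Group 1: the tag name. Group 2: everything up to the first ">", optional
# so that bare tags like <to> still match (findall then reports '').
pattern = re.compile(r'<(\w+)([^>]+)?>')

tags = pattern.findall(s)
print(tags)
# [('note', ' date="8/31/12"'), ('to', ''), ('from', ''),
#  ('heading', ' type="Reminder"/'), ('body', '')]

# Hypothetical line: the attributes stop at the first ">".
extra = pattern.findall('<a href="x">text</a>')
print(extra)  # [('a', ' href="x"')]
```

Closing tags such as </to> are never matched, because \w cannot match the /. The trailing / of the self-closing <heading .../> stays in group 2, which is what the intended output in the question shows.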

PS: You should post your regex as text next time.
