How to fix my regex expression (extract value)?

Time:01-19

Given the following:

<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA=="/>

I want to extract: DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==

My regex expression is:

 (?i)name *= *([\'\"])\w*csrf\w*\1 *value *= *(\w*) *>

The problem is that \w stops at the first character that is not a letter, digit, or underscore. How can I change that? I want to read everything until \1 is seen, with the extra condition that \ does not appear immediately before it.

I.e., given:

<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA=="/>

I want to get:

DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA==

Plus given:

I want to get:

ABC

Please note: if the value is wrapped in quotes ("), I want them removed.

CodePudding user response:

.*CSRFToken.*value=["'](.*)["']\/> works, but the value must be enclosed in quotes (single or double). It does not make sense to have an unescaped quote inside the string.

.*CSRFToken.*value=(.*)\/> handles a value with no surrounding quotes, though quotes may still appear inside the string.

But if unescaped quotes inside the value really are acceptable for you, this regex covers both cases: .*CSRFToken.*value=(?'quote'["'])?(.*)(?(quote)["'])\/>

This works as follows:

  1. It optionally captures a quote after the equals sign and stores it in the quote capture group.
  2. It captures everything in between into a second capture group for your further use.
  3. If quote captured anything, it then matches "/> or '/> at the end; if not, it matches just /> (the last character of the payload string may itself be a quote).

Shown here: https://regex101.com/r/qYVBNV/1, but I don't know the Python re syntax for that construct.
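For what it's worth, Python's re module does support this construct: a named group is written (?P<quote>...) and the conditional keeps the (?(quote)...) form. Below is a minimal sketch of the same pattern ported to Python. The helper name extract_value and the unquoted value=ABC/> sample are assumptions made for illustration, not taken from the question.

```python
import re

# Direct port of the pattern above: (?'quote'...) becomes (?P<quote>...),
# and "/" needs no escaping in Python.
PATTERN = re.compile(r""".*CSRFToken.*value=(?P<quote>["'])?(.*)(?(quote)["'])/>""")


def extract_value(html):
    """Return the text captured as the value, or None if nothing matches.

    extract_value is a hypothetical helper name used only for this sketch.
    """
    m = PATTERN.search(html)
    return m.group(2) if m else None


quoted = '<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA=="/>'
unquoted = '<input type="hidden" id="CSRFToken" name="CSRFToken" value=ABC/>'  # assumed sample

print(extract_value(quoted))    # DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==
print(extract_value(unquoted))  # ABC
```

Group 1 holds the optional opening quote and group 2 the value, so the surrounding quotes are already stripped from the result.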

CodePudding user response:

I would do something like this in Python:

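import re

# Group 1 is the quote around the name; group 2 is the value between the same kind of quote.
regex = r"name=([\'\"])\w*\1 value=\1(.*)\1.*>"

test_str = ("<input type=\"hidden\" id=\"CSRFToken\" name=\"CSRFToken\" value=\"DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==\"/>\n"
    "<input type=\"hidden\" id=\"CSRFToken\" name=\"CSRFToken\" value=\"DFPYWYdMcvDWSmYSHh k5ptFLu068\\\"s4/AA==\"/>\n")

# "." does not cross newlines, so each tag on its own line is matched separately.
matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for match in matches:
    print(match.group(2))  # the extracted value, without the surrounding quotes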

\w won't be enough to match what you want to extract, so I'm going with .* instead.

I get whatever there is in value:

DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==
DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA==
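To also cover the question's other requirements (stop at the matching quote unless a backslash precedes it, and accept a value with no quotes at all), one option is to match escaped characters explicitly rather than rely on a greedy .*. The following is only a sketch under assumptions: the names CSRF_VALUE and extract_csrf are hypothetical, and the unquoted sample value=ABC/> is a guess at the case the question describes.

```python
import re

CSRF_VALUE = re.compile(
    r"""
    name\s*=\s*(["'])\w*csrf\w*\1\s*      # name attribute whose value contains csrf
    value\s*=\s*
    (?:
        (?P<q>["'])                       # opening quote of the value
        (?P<quoted>(?:\\.|(?!(?P=q)).)*)  # backslash escape, or any char except the closing quote
        (?P=q)                            # matching closing quote
      |
        (?P<bare>[^\s"'>]+?)(?=\s*/?>|\s) # unquoted value, up to whitespace, > or />
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_csrf(html):
    """Return the CSRF value without surrounding quotes, or None if absent."""
    m = CSRF_VALUE.search(html)
    if m is None:
        return None
    # Exactly one of the two alternatives took part in the match.
    return m.group("quoted") if m.group("q") is not None else m.group("bare")


samples = [
    r'<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA=="/>',
    r'<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA=="/>',
    r'<input type="hidden" id="CSRFToken" name="CSRFToken" value=ABC/>',  # assumed sample
]
for sample in samples:
    print(extract_csrf(sample))
```

Because the \\. alternative is tried first, a backslash always consumes the character after it, so \" never ends the value. The value's quote is captured separately from the name's quote, so the two attributes may use different quote styles.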