Given the following:
<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA=="/>
I want to extract: DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==
My regex is:
(?i)name *= *([\'\"])\w*csrf\w*\1 *value *= *(\w*) *>
The problem is that \w stops at the first character that is not a letter, digit or underscore. How can I change that? I want to read everything until the quote captured in \1 appears, with the extra condition that this quote is not preceded by a backslash.
For example, given:
<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA=="/>
I want to get:
DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA==
Also, given:
I want to get:
ABC
Please note: if the value is wrapped in quotes ("), I want them removed.
Answer:
.*CSRFToken.*value=["'](.*)["']\/>
This requires the value to be enclosed in quotes (single or double). It does not make sense for the string to contain an unescaped quote.
For a value without surrounding quotes (quotes inside the string are still fine), use:
.*CSRFToken.*value=(.*)\/>
If unescaped quotes really can occur in your data, a regex can still cope, and one pattern can handle both quoted and unquoted values:
.*CSRFToken.*value=(?'quote'["'])?(.*)(?(quote)["'])\/>
How it works:
- `(?'quote'["'])?` optionally captures a quote right after the equals sign and stores it in the `quote` capture group
- `(.*)` captures everything in between into a second capture group for further use
- if `quote` captured anything, `(?(quote)["'])` requires `"/>` or `'/>` at the end; if not, the pattern ends with just `/>` (so if the payload itself ends in a quote, that quote stays in the captured value)
Demo: https://regex101.com/r/qYVBNV/1 (I don't know the Python `re` syntax for that construct, though.)
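For reference, a minimal Python sketch of the same pattern, assuming Python's standard `re` module: PCRE's named group `(?'quote'...)` has to be written `(?P<quote>...)`, while the conditional `(?(quote)...)` is supported by `re` unchanged. The names `CSRF_RE` and `extract_value` are illustrative, and the unquoted `value=ABC/>` input is hypothetical because the question's own example for `ABC` is missing.

```python
import re

# Python spelling of the regex101 pattern: (?'quote'...) becomes (?P<quote>...);
# the conditional (?(quote)["']) is accepted by re as-is.
CSRF_RE = re.compile(r""".*CSRFToken.*value=(?P<quote>["'])?(.*)(?(quote)["'])/>""")

def extract_value(line):
    """Return the value attribute with any surrounding quotes removed, or None."""
    m = CSRF_RE.match(line)
    # Group 1 is the named "quote" group, so the payload is group 2.
    return m.group(2) if m else None

samples = [
    '<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA=="/>',
    r'<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA=="/>',
    # Hypothetical unquoted input, since the question's own example is missing.
    '<input type="hidden" id="CSRFToken" name="CSRFToken" value=ABC/>',
]

for line in samples:
    print(extract_value(line))
```

This prints the two tokens without their outer quotes (the second keeps its inner `\"`) and then `ABC`.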
Answer:
I would do something like this in Python:
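import re

# Group 1 is the quote around the name; the value must use the same quote (\1).
# Group 2 is greedy, so it extends to the last such quote that is still followed by ">".
regex = r"name=([\'\"])\w*\1 value=\1(.*)\1.*>"
test_str = ("<input type=\"hidden\" id=\"CSRFToken\" name=\"CSRFToken\" value=\"DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==\"/>\n"
    "<input type=\"hidden\" id=\"CSRFToken\" name=\"CSRFToken\" value=\"DFPYWYdMcvDWSmYSHh k5ptFLu068\\\"s4/AA==\"/>\n")
# One match per line, because neither \w nor . matches a newline.
matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)
for match in matches:
    print(match.group(2))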
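\w can't match everything you want to extract (spaces, /, =, \ and " are not word characters), so I'm using .* instead. Because .* is greedy, it runs to the last matching quote on the line, which is why an escaped \" inside the value is kept.
This prints whatever is inside value: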
DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA==
DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA==
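The greedy (.*) above gets the right result by backtracking to the last quote on the line. A minimal sketch of the stricter rule the question describes (stop at the first closing quote that is not preceded by a backslash) can instead add a negative lookbehind `(?<!\\)` to the question's own pattern and make the capture lazy. It assumes the value is quoted; the names `VALUE_RE` and `csrf_value` are illustrative.

```python
import re

# Based on the question's pattern: \1 is the quote around the name and \2 the
# quote around the value. (.*?) is lazy, and (?<!\\) rejects any \2 that is
# directly preceded by a backslash, so an escaped \" does not end the value.
VALUE_RE = re.compile(
    r"""name\s*=\s*(['"])\w*csrf\w*\1\s*value\s*=\s*(['"])(.*?)(?<!\\)\2""",
    re.IGNORECASE,
)

def csrf_value(html):
    """Return the quoted CSRF value without its surrounding quotes, or None."""
    m = VALUE_RE.search(html)
    return m.group(3) if m else None

print(csrf_value('<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068s4/AA=="/>'))
print(csrf_value(r'<input type="hidden" id="CSRFToken" name="CSRFToken" value="DFPYWYdMcvDWSmYSHh k5ptFLu068\"s4/AA=="/>'))
```

One limitation of the lookbehind: a value that genuinely ends in a literal backslash (written `\\"`) would not be terminated correctly, since only the single preceding character is checked.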
