I am trying to extract integers and variable values defined in JavaScript in an HTML file using Python 3's re.findall method.
However, I am having difficulty matching digits enclosed in double quotes with \d*, and matching an alphanumeric string enclosed in double quotes as well.
Case 1:
import re
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
pattern = r'var j = i + Number(\"(\d*)\" + \"(\d*)\");'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain 6876 and 52907, but an empty list [] was obtained.
Case 2:
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
pattern = r'"bm-foo": \"(\w*)\",'
m = re.findall(pattern, s)
print(m) # Output: []
The desired output should contain AAQAAAAE/////4ytkgqq/oWI, but an empty list [] was obtained.
Can someone explain why my regex patterns are not matching?
CodePudding user response:
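In the first regexp you need to escape +, (, and ), because they are regex metacharacters: an unescaped + is a quantifier and unescaped parentheses create a group instead of matching literal characters.
In the second regexp, use [^"]* instead of \w*, since \w only matches letters, digits, and underscore, not punctuation like /.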
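import re
s = """
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
"""
# Escape +, ( and ) so they match literally; " needs no escaping in a regex.
pattern = r'var j = i \+ Number\("(\d*)" \+ "(\d*)"\);'
m = re.findall(pattern, s)
print(m)  # [('6876', '52907')]
s = """
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
"""
# [^"]* matches everything up to the closing quote, including /.
pattern = r'"bm-foo": "([^"]*)",'
m = re.findall(pattern, s)
print(m)  # ['AAQAAAAE/////4ytkgqq/oWI']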
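If escaping by hand is error-prone, one alternative is to let re.escape handle the literal parts of the pattern. The following is only a sketch, assuming the fixed text around the numbers is known in advance; the names html, js, prefix, middle, suffix, numbers, and token are illustrative and do not come from the original post.

```python
import re

html = '''
<script>
var i = 1636592595;
var j = i + Number("6876" + "52907");
</script>
'''

# re.escape backslash-escapes regex metacharacters in the literal pieces,
# so "+", "(" and ")" are matched as plain text.
prefix = re.escape('var j = i + Number("')
middle = re.escape('" + "')
suffix = re.escape('");')
pattern = prefix + r'(\d*)' + middle + r'(\d*)' + suffix

numbers = re.findall(pattern, html)
print(numbers)

js = '''
xhr.send(JSON.stringify({
"bm-foo": "AAQAAAAE/////4ytkgqq/oWI",
"pow": j
}));
'''

# A negated character class captures everything up to the closing quote,
# including characters such as "/" that \w does not match.
token = re.findall(re.escape('"bm-foo": "') + r'([^"]*)"', js)
print(token)
```

With two capture groups, re.findall returns a list of tuples, one tuple per match; with a single group it returns a list of strings.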
