I have the following data (a subset of possible log4j responders, in case someone is interested):
ap://167.172.44.255:1389/LegitimateJavaCla
ap://167.172.44.255:1389/La
ap://167.99.32.139:1389/Basic/ReverseShell/167.99.32.139/99
ldap://x.x.x.x.61k2ev3252274o2ek77941q85t0r9444o.interact.sh/ok6ll9m
ldap://c6ps4rekeidcvgqlsmsgcg37qdoyyknz4.interact.sh/a
ldap://c6ps4rekeidcvgqlsmsgcg37x9ayymcak.interact.sh/a
ldap://c6ps4ipurnhssm2608l0cg37chyyykyhk.interact.sh/a
ldap://c6ps4ipurnhssm2608l0cg37pdyyykbug.interact.sh/a
91fd9fef8958.bingsearchlib.com:39356/
550f7e1deaed.bingsearchlib.com:39356/a
2174d47e8d04.bingsearchlib.com:39356/a
da6d408517b9.bingsearchlib.com:39356/a
5463610592ef.bingsearchlib.com:39356/a
I would like to keep only the FQDN (the host and domain) or the IP address, so I tried (\S*)?(:\/\/)?(?<interesting>.*)(:)?\/ (see https://regex101.com/r/dusRR5/1)
The idea was:
(\S*)? → match or not some letters (ldap, ...)
(:\/\/)? → match or not ://
(?<interesting>.*) → match anything and call it interesting
(:)? → ... but stop at : if there is one
\/ → ... otherwise stop at /
The expected result is
167.172.44.255
167.99.32.139
x.x.x.x.61k2ev3252274o2ek77941q85t0r9444o.interact.sh
c6ps4rekeidcvgqlsmsgcg37qdoyyknz4.interact.sh
c6ps4rekeidcvgqlsmsgcg37x9ayymcak.interact.sh
(...)
But it does not work and my very limited knowledge of regex does not help.
CodePudding user response:
Modified a bit:
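^(?:\S*:\/\/)?(\S*?)[:\/]
The capturing group contains what you are interested in; the optional scheme sits in a non-capturing group, so it is not part of the capture. The key is to use a lazy quantifier (*?) together with the start-of-line anchor (^), so the match stops at the first : or / after the host.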
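As a rough illustration, here is how the lazy approach could look in Python. This is only a sketch: the function name extract_host_lazy and the sample list are made up for the example, and the pattern keeps the scheme in a non-capturing group so that group 1 holds only the host.

```python
import re

# Optional scheme (non-capturing), then the host matched lazily (group 1)
# up to the first ':' or '/'. No need to escape '/' in a Python string.
HOST_LAZY_RE = re.compile(r"^(?:\S*://)?(\S*?)[:/]")

def extract_host_lazy(line):
    """Return the host or IP from one line, or None if nothing matches."""
    match = HOST_LAZY_RE.match(line)
    return match.group(1) if match else None

# A few lines taken from the question's data.
lazy_samples = [
    "ap://167.172.44.255:1389/La",
    "ldap://c6ps4rekeidcvgqlsmsgcg37qdoyyknz4.interact.sh/a",
    "91fd9fef8958.bingsearchlib.com:39356/",
]

for sample in lazy_samples:
    print(extract_host_lazy(sample))
```

Note that this pattern needs a : or / after the host, so a bare hostname with nothing after it would return None.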
CodePudding user response:
You can use
^(?:[a-zA-Z0-9]+:\/\/)?(?<interesting>[^:\/]+)
See the regex demo. Details:
^ - start of string
(?:[a-zA-Z0-9]+:\/\/)? - an optional occurrence of one or more letters/digits followed by ://
(?<interesting>[^:\/]+) - Group "interesting": one or more chars other than : and /.
Remember that you do not have to escape / if you define your regex with a string literal (as in Python, or C#, or using constructor notations in JavaScript/Ruby/etc.).
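For example, a minimal Python sketch of this approach might look like the following. The helper name extract_interesting and the sample list are assumptions made for the illustration; also note that Python spells a named group as (?P<interesting>...) rather than (?<interesting>...).

```python
import re

# Optional alphanumeric scheme followed by '://', then the named group
# "interesting": one or more characters other than ':' and '/'.
INTERESTING_RE = re.compile(r"^(?:[a-zA-Z0-9]+://)?(?P<interesting>[^:/]+)")

def extract_interesting(line):
    """Return the host or IP from one line, or None if nothing matches."""
    match = INTERESTING_RE.match(line)
    return match.group("interesting") if match else None

# A few lines taken from the question's data.
named_samples = [
    "ap://167.99.32.139:1389/Basic/ReverseShell/167.99.32.139/99",
    "ldap://x.x.x.x.61k2ev3252274o2ek77941q85t0r9444o.interact.sh/ok6ll9m",
    "550f7e1deaed.bingsearchlib.com:39356/a",
]

for sample in named_samples:
    print(extract_interesting(sample))
```

Because the negated character class cannot cross a : or /, no lazy quantifier is needed, and a bare hostname with no port or path still matches.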
