How can I extract the "wp-*" portion of a URL with a regex?
facebook.com/wp-content/uploads/xyz
facebook.com/wp-uploads/uploads/xyz
CodePudding user response:
This depends a bit on the language you use and how flexible the regex should be.
A very generic regex could look like this: .*/(wp-[^/]+).*
In JavaScript, the code could look like this:
const url = 'facebook.com/wp-content/uploads/xyz';
const folder = url.replace(/.*\/(wp-[^\/]+).*/, '$1');
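One caveat with replace: when the pattern does not match, it returns the input string unchanged, so a URL without a wp-* folder would come back whole. A minimal sketch of an alternative that uses match with the same capture group and returns null instead; extractWpFolder is a hypothetical helper name, not something from the question:

```javascript
// Hypothetical helper: returns the "wp-*" path segment, or null if there is none.
function extractWpFolder(url) {
  // Capture "wp-" plus everything up to the next slash.
  const match = url.match(/\/(wp-[^\/]+)/);
  return match ? match[1] : null;
}

console.log(extractWpFolder('facebook.com/wp-content/uploads/xyz')); // wp-content
console.log(extractWpFolder('facebook.com/wp-uploads/uploads/xyz')); // wp-uploads
console.log(extractWpFolder('facebook.com/about'));                  // null
```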
CodePudding user response:
This can extract "content" and "uploads" from your example:
/(?<=\/wp-)[a-zA-Z0-9]+/g
The first part, (?<=\/wp-), is called a "positive lookbehind": the match only starts at the point that follows the /wp- character sequence, and that sequence itself is not part of the result. The / is escaped because the pattern is written between / delimiters. The second part, [a-zA-Z0-9]+, sets what kind of characters we expect to read: one or more lowercase letters, uppercase letters, or digits.
If these keywords can contain other characters, e.g. "-" or "_", you can add them to the character class like this (the hyphen goes last so it is read literally rather than as a range):
/(?<=\/wp-)[a-zA-Z0-9_-]+/g
The global flag (g) at the end means that you want to search the whole text, which can contain multiple matches for the pattern.
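As a rough sketch of how this behaves in JavaScript, the snippet below runs the lookbehind pattern with the g flag over a text that contains both example URLs. The variable names are only for illustration, and lookbehind needs a reasonably recent engine:

```javascript
// Both example URLs in one text, one per line.
const text = [
  'facebook.com/wp-content/uploads/xyz',
  'facebook.com/wp-uploads/uploads/xyz',
].join('\n');

// With the g flag, match() returns every match in the text
// (or null when nothing matches).
const names = text.match(/(?<=\/wp-)[a-zA-Z0-9_-]+/g);
console.log(names); // [ 'content', 'uploads' ]
```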
Edit:
If you want to read wp-content and wp-uploads, you can move wp- out of the lookbehind part, like this:
/(?<=\/)wp-[a-zA-Z0-9_-]+/g
It will read something that follows a / and starts with wp-. Do not add / to the character class in the second part, because the match would then no longer stop at the end of the folder name and would run on into the rest of the path.
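A small sketch of both cases, assuming JavaScript and one of the URLs from the question (sampleUrl, folder, and tooMuch are illustrative names):

```javascript
const sampleUrl = 'facebook.com/wp-content/uploads/xyz';

// Folder name including the "wp-" prefix.
const folder = sampleUrl.match(/(?<=\/)wp-[a-zA-Z0-9_-]+/g);
console.log(folder); // [ 'wp-content' ]

// With "/" added to the character class, the match runs on to the end of the path.
const tooMuch = sampleUrl.match(/(?<=\/)wp-[a-zA-Z0-9_\/-]+/g);
console.log(tooMuch); // [ 'wp-content/uploads/xyz' ]
```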
