I'm trying to extract the part of a URL that follows the http(s)://www. prefix, ignoring that prefix.
These URLs come from a form that users fill in, so multiple formats and errors are expected. Here's a sample:
http://www.akashicbooks.com
https://deliciouselsalvador.com
http://altaonline.com
http://https://www.amtb-la.org/
http://https://www.amovacations.com/
http://dornsife.usc.edu/jep
I've tried this in Google Sheets and Airtable using the REGEXEXTRACT function:
=REGEXEXTRACT({URL},"[^/]+$")
Unfortunately, I can't make it work for all the cases.
Any ideas on how to make it work?
Answer:
You can use
^(?:https?://(?:www\.)?)*(.*)
Details:
- ^ - start of string
- (?:https?://(?:www\.)?)* - zero or more occurrences of:
  - https?:// - http:// or https://
  - (?:www\.)? - an optional sequence of www.
- (.*) - Group 1: the rest of the string.
With REGEXEXTRACT, the returned value is the text captured by Group 1.
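To check the pattern outside a spreadsheet, here is a minimal Python sketch that applies the same regex to the sample URLs from the question and returns Group 1, which is what REGEXEXTRACT gives back when the pattern has one capturing group. It assumes Python's re module behaves like the RE2 engine used by Google Sheets for this pattern (it only uses anchors, non-capturing groups, ? and *, which both support). The function name strip_scheme_and_www is hypothetical, chosen for illustration.

```python
import re

# Same pattern as in the answer: strip any number of leading
# "http://" / "https://" prefixes, each optionally followed by "www.",
# and capture everything that remains in Group 1.
PATTERN = re.compile(r"^(?:https?://(?:www\.)?)*(.*)")


def strip_scheme_and_www(url: str) -> str:
    """Return Group 1, mimicking REGEXEXTRACT with a single capture group."""
    match = PATTERN.match(url)
    # (.*) can match the empty string, so the pattern always matches.
    return match.group(1)


SAMPLES = [
    "http://www.akashicbooks.com",
    "https://deliciouselsalvador.com",
    "http://altaonline.com",
    "http://https://www.amtb-la.org/",
    "http://https://www.amovacations.com/",
    "http://dornsife.usc.edu/jep",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(f"{url} -> {strip_scheme_and_www(url)}")
```

Note that trailing slashes and paths are kept (for example, amtb-la.org/ and dornsife.usc.edu/jep), because (.*) captures everything after the stripped prefix.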
